Error 502 when cloning a big Git repo

Hello,

I have a big Git repo (4.6 GB), and cloning fails with the following error:
Cloning into 'repo'...
error: RPC failed; HTTP 502 curl 22 The requested URL returned error: 502 Proxy Error
fatal: The remote end hung up unexpectedly

The error appears after a few minutes.

The logs are here:
2017-08-24 14:18:01.692 INFO [rhodecode.lib.utils] Logging action:pull on repo:BIGRepo by user:<User('id:4:cbrossollet')> ip:10.67.6.44
2017-08-24 14:18:01.806 INFO [rhodecode.lib.middleware.request_wrapper] IP: 10.67.6.44 Request to /BIGRepo/git-upload-pack time: 263.263s

The server has 12 GB of RAM; when I monitor it during the clone, memory climbs up to ~10 GB and CPU sits at 25% (it has 4 cores). I have no issue with this repo when serving it with a plain Git smart HTTP server. Is there anything I can do? Any parameter to tune? I'm using version 4.7.2 on Ubuntu Server 16.10.
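
For completeness, this is the kind of command I run to reproduce it; GIT_TRACE and GIT_CURL_VERBOSE are standard git tracing variables, and the URL is just a placeholder for our server:

# Re-run the failing clone with verbose git/curl tracing and keep the output.
# <server> is a placeholder for the RhodeCode host.
GIT_TRACE=1 GIT_CURL_VERBOSE=1 \
  git clone https://<server>/BIGRepo 2>&1 | tee clone-trace.log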

Hi Charles,

Please try the 4.9.0 release; it might be an issue with stream support. If this doesn't solve the problem, we'll investigate further.

Hi Marcin,

I get the same problem with 4.9.0. It now seems to fail more quickly (git-upload-pack time: 82.428s).

Are you sure it's not your client or some kind of proxy? Does it also fail if you bypass the HTTP server and clone directly from the IP:PORT of the RhodeCode server?
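
Something like this, with the front-end proxy taken out of the picture entirely; the host, port, and repo path below are placeholders, use whatever your RhodeCode instance actually listens on:

# Hypothetical direct clone against the backend, skipping nginx/Apache.
git clone http://<rhodecode-host>:<backend-port>/BIGRepo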

Yes, it's the same problem when accessing the RhodeCode server directly at 127.0.0.1:10002.
The error is now:

fatal: unable to access 'https://admin@127.0.0.1:10002/Repo/': Operation timed out after 300038 milliseconds with 0 out of 0 bytes received

In case it matters, the Git version on the server is 2.9.3 and the Ubuntu Server version is 16.10.

Hmm, this looks very odd.

Maybe your server is running out of memory? It shouldn't be the case, because Git should stream the data, unless there's another problem there. We usually test large clones using the Linux kernel Git repository, which has ~600K commits and is ~3 GB in size; memory usage doesn't go above 1 GB.

Let us re-check a few things to see whether there has been a regression in this case.
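
For reference, this is roughly how we keep an eye on worker memory while a test clone runs; watch and ps are standard tools, and the grep pattern is simply whatever the gunicorn processes are called on the box:

# Print the gunicorn processes, sorted by resident memory, every 5 seconds.
watch -n 5 "ps -eo pid,rss,etime,args --sort=-rss | grep -i '[g]unicorn'"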

With 12 GB of RAM it shouldn't run out of memory; idle usage is 1.5 GB. Unlike the kernel repo, ours contains big files (around 200 MB), which might be the difference between your tests and our context.
NB: sadly we can't use Git LFS…
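
If it helps to reproduce our case, one way to spot the big blobs is standard git plumbing like this (the 100 MB cut-off is arbitrary):

# List blobs larger than ~100 MB across all refs, biggest first.
git rev-list --objects --all \
  | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
  | awk '$1 == "blob" && $3 > 100*1024*1024 {print $3, $4}' \
  | sort -rn \
  | head -n 20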

Using top I can see that a gunicorn process is stuck using a lot of resources before the clone fails. Looking at /proc/<PID>/cmdline doesn't tell me what it is doing; how can I check that?

It would be nice to see what's going on, but I don't know how…

Based on our data there should be a Git CLI process spawned from gunicorn itself. I'd look for any git ... commands that take a lot of resources.
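
Something along these lines should show whether the busy worker has a git child and what it was started with; pstree and ps are standard tools, and <worker-pid> is a placeholder for the PID of the busy gunicorn worker:

# Show the process tree under the busy gunicorn worker, with PIDs and arguments.
pstree -p -a <worker-pid>

# Or list any running git processes together with their parent PIDs
# (the grep filter is rough, adjust as needed).
ps -eo pid,ppid,rss,etime,args | grep -E '[g]it(-| )'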

That's not what I see in my case; only a gunicorn process takes up resources. I briefly see a git process at the beginning, but then everything is in gunicorn:

top - 18:31:25 up  2:27,  1 user,  load average: 0.60, 0.41, 0.39
Tasks: 170 total,   2 running, 168 sleeping,   0 stopped,   0 zombie
%Cpu(s): 25.8 us,  0.8 sy,  0.0 ni, 73.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 12297260 total,   138508 free,  7324780 used,  4833972 buff/cache
KiB Swap:  4191228 total,  4149232 free,    41996 used.  4684332 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1264 rhodeco+  20   0 1221008 960616  11628 R  99.7  7.8   1:16.92 .gunicorn-wrapp
 4311 rhodeco+  20   0 5191692 4.621g  10772 S   5.6 39.4   2:19.55 .gunicorn-wrapp
   53 root      20   0       0      0      0 S   0.3  0.0   0:02.80 kswapd0
 1804 rabbitmq  20   0 2190152  36408   4232 S   0.3  0.3   0:24.87 beam.smp
 7484 rhodeco+  20   0   42024   3676   3028 R   0.3  0.0   0:00.02 top
    1 root      20   0   55212   6220   4404 S   0.0  0.1   0:04.44 systemd
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kthreadd
    3 root      20   0       0      0      0 S   0.0  0.0   0:00.04 ksoftirqd/0
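
If it helps, one way to catch that short-lived git process is a simple snapshot loop like this while the clone runs (plain shell; the interval and the pgrep pattern are arbitrary):

# Log process snapshots every few seconds so short-lived git children are captured.
while true; do
  date
  pgrep -a -f 'git|gunicorn'
  echo '---'
  sleep 5
done >> proc-snapshots.log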

Hmm, this seems very odd then. I feel we're missing something important here. Is it possible to have a remote session with us to help track down the problem?

Sure, I'll get in touch by email when remote access is ready.
Thanks a lot

Perfect,

I’ll wait for your message then.

Best,

Just an update: we managed to find one interesting case that could cause excessive memory usage and managed to fix it. I'd like to test that fix on your biggest repo; would that be possible?

We got this confirmed and fixed. The fix will be in the next release, either 4.9.1 or 4.10.0.
