ERROR [vcsserver.lib.exc_tracking] Failed to store exception

Our devs are having trouble when creating pull requests. The application in the browser starts reporting 500 errors, and on investigation the vcsserver instance logs show the above error, followed by "Too many open files". We've set the ulimit for the rcdev user (who runs the vcs and rhc services) to 65535, and when we check open files they never seem to get to more than 10,000-ish. Any ideas?
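For reference, this is roughly how we've been checking (a sketch; pass the vcsserver PID as the first argument, e.g. whatever `pgrep -f vcsserver` returns on your box - it falls back to the current shell so it can be tried anywhere):

```shell
#!/bin/sh
# Check the open-files limit a process actually inherited, plus its
# current file-descriptor usage. Pass the vcsserver PID as $1; defaults
# to the current shell so the script can be tested standalone.
pid=${1:-$$}

# `ulimit -n` in a fresh shell can differ from what a long-running
# service inherited at start; /proc/<pid>/limits shows the real values.
grep 'Max open files' "/proc/$pid/limits"

# Count the file descriptors the process holds open right now.
ls "/proc/$pid/fd" | wc -l
```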

Cheers
MrD

EDIT:
ERROR [vcsserver.lib.exc_tracking] Failed to store exception 139788049897520 information

Traceback (most recent call last):

File "/opt/rhodecode/store/vi14s6c820pdbdc4zik926vnf2wjnapp-python2.7-rhodecode-vcsserver-4.13.2/lib/python2.7/site-packages/vcsserver/lib/exc_tracking.py", line 96, in store_exception

File "/opt/rhodecode/store/vi14s6c820pdbdc4zik926vnf2wjnapp-python2.7-rhodecode-vcsserver-4.13.2/lib/python2.7/site-packages/vcsserver/lib/exc_tracking.py", line 82, in _store_exception

IOError: [Errno 24] Too many open files: '/tmp/rc_exception_store_v1/139788049897520_vcsserver_1570722782.042588'

2019-10-10 16:53:02.043 INFO [vcsserver.tweens] IP: 127.0.0.1 Request to path: /git time: 0.002s

2019-10-10 16:56:57.405 DEBUG [vcsserver.http_main] method called:assert_correct_path with kwargs:{} context_uid: 9c3845ab-aa76-4bbe-a467-281f725ad1e5

2019-10-10 16:56:57.406 DEBUG [dogpile.lock] NeedRegenerationException

2019-10-10 16:56:57.406 DEBUG [dogpile.lock] no value, waiting for create lock

2019-10-10 16:56:57.406 DEBUG [dogpile.lock] value creation lock <dogpile.cache.region._LockWrapper object at 0x7f230300d290> acquired

2019-10-10 16:56:57.406 DEBUG [dogpile.lock] Calling creation function for not-yet-present value

2019-10-10 16:56:57.406 DEBUG [dogpile.lock] Released creation lock

2019-10-10 16:56:57.406 INFO [vcsserver.tweens] IP: 127.0.0.1 Request to path: /git time: 0.001s

2019-10-10 16:56:57.506 DEBUG [vcsserver.http_main] method called:assert_correct_path with kwargs:{} context_uid: d61fc397-0680-46af-b511-e0573008d83f

2019-10-10 16:56:57.507 DEBUG [dogpile.lock] NeedRegenerationException

2019-10-10 16:56:57.507 DEBUG [dogpile.lock] no value, waiting for create lock

2019-10-10 16:56:57.507 DEBUG [dogpile.lock] value creation lock <dogpile.cache.region._LockWrapper object at 0x7f230300d1d0> acquired

2019-10-10 16:56:57.507 DEBUG [dogpile.lock] Calling creation function for not-yet-present value

2019-10-10 16:56:57.507 DEBUG [dogpile.lock] Released creation lock

2019-10-10 16:56:57.507 INFO [vcsserver.tweens] IP: 127.0.0.1 Request to path: /git time: 0.001s

2019-10-10 16:56:58.450 DEBUG [vcsserver.http_main] method called:assert_correct_path with kwargs:{} context_uid: 2b449694-f0be-44a7-a278-d1f185a7c26b

2019-10-10 16:56:58.450 DEBUG [dogpile.lock] NeedRegenerationException

2019-10-10 16:56:58.450 DEBUG [dogpile.lock] no value, waiting for create lock

2019-10-10 16:56:58.450 DEBUG [dogpile.lock] value creation lock <dogpile.cache.region._LockWrapper object at 0x7f22f10e5d50> acquired

2019-10-10 16:56:58.450 DEBUG [dogpile.lock] Calling creation function for not-yet-present value

2019-10-10 16:56:58.451 DEBUG [dogpile.lock] Released creation lock

2019-10-10 16:56:58.451 ERROR [vcsserver.lib.exc_tracking] Failed to store exception 139788049930432 information

This is still an issue if anyone has any ideas. It seems to start when a dev tries to merge a pull request. The pull requests are often quite large, with several hundred files in them, so I'm guessing it has something to do with that. I've increased the ulimits for the rcdev user, but the problem is still happening. Even if you have no solutions, suggestions of where I might look would be appreciated, as I'm totally stumped. Thanks

In Admin > settings > system info there's an internal check for file descriptors. Please check the values there and post them here.

Hi Marcin,

Thanks for replying. These are the values from the system admin page:

cpu time (seconds):(-1, -1), file size:(-1, -1), stack size:(8388608, -1), core file size:(0, -1), address space size:(-1,
-1), locked in mem size:(65536, 65536), heap size:(-1, -1), rss size:(-1, -1), number of processes:(63960, 63960), open files:(65535, 65535)
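(For reference, each pair above reads as (soft limit, hard limit), with -1 meaning unlimited. I cross-checked the open-files pair from a shell running as the service user:)

```shell
# Soft limit: the value currently enforced for processes started from here.
ulimit -Sn
# Hard limit: the ceiling the soft limit can be raised to without root.
ulimit -Hn
# The numbers on the RhodeCode system info page reflect what the running
# service inherited, which is the pair that actually matters.
```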

Do they look okay?

Cheers

Phill

This looks OK. I wonder if maybe the Git repo is very fragmented. Can you try running the maintenance command from the repository settings?

Hi Marcin,

I ran the maintenance and got the results below. Would you say that's fragmented or not? (I don't have anything to compare it with.)

Cheers

Phill

GIT FSCK:

executed fsck --full

dangling commit 025d607eb8d713445fc7ee11fd31db1b53009ef2
dangling commit f870e84b13026b0f9e309e079125b1a5c1be02bb
dangling commit ee9f68de7a684cb01ef69d0f060aa071d2e14957
dangling commit 96be0c98e907aefdeefdf4241890f7a8db3327c4
dangling commit 4ed08c0d345b58c267380d4b774b8929f483f096
dangling commit 1ef368b02a1eda6b2e21988c1ff9132f3161f0e2
dangling commit c5fd889520a5ffbbff752a04ab681401914becc6
dangling commit f22505663313164c43f994574d5b4843bc9121ce
dangling commit 5a2ce1ce6931ff5fd6d9789fdd5e74dad9297fc3
dangling commit c7b3d1b9fb322374d3fa71c23f5866989e396b3a
dangling commit 4fdcba25a23769fba699fcd5feb4b356ed662947
dangling commit 1dea06e942c1fad4a62a9a3ad59e78a0440e07f8
dangling commit 4f221f44ce8a8ba5986484d6e19b3f681afece0b
dangling commit 2b4fc73771c0250f94c01fa86c645710c704a6da

Does the error happen again after executing this command?
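(A handful of dangling commits like this is normal, by the way - they're typically left over from rebases or amended commits. As a sketch, this is how fsck and gc interact, demonstrated in a throwaway repo rather than a production one; on the server the same `git fsck --full` / `git gc` would run inside the bare repository directory:)

```shell
set -e
# Demo in a throwaway repo: make a commit unreachable, watch fsck report
# it as dangling, then let gc prune the unreachable loose objects.
repo=$(mktemp -d) && cd "$repo" && git init -q .
git -c user.name=t -c user.email=t@example.com commit -q --allow-empty -m keep
git -c user.name=t -c user.email=t@example.com commit -q --allow-empty -m drop
git reset -q --hard HEAD~1            # "drop" is now unreachable
git reflog expire --expire=now --all  # forget the reflog entry holding it
git fsck --full                       # reports: dangling commit <sha>
git gc -q --prune=now                 # prune unreachable loose objects
git count-objects -v                  # loose-object and pack counts
```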

Hi Marcin,

As it happens, I just had one of our devs come over and say things seem to be stable since we sorted the ulimit settings. I'm pretty sure the last crash happened while the settings were still at the lower level, because even though I'd increased all the limits for the user that runs the process, I hadn't restarted the whole application, so the changes may not have been fully applied. After the last crash I restarted/rebooted everything, and then we could see (as you suggested) within the interface that the application had picked up the new values. So, from that point of view, and given the comment from the developer, I'm happy that things look a lot better and we can close the issue.

I can always come back to you if we see the problem again.

Thanks for all your help and the time you've spent looking into this.

Cheers

Phill