Some time ago I upgraded from the old rccontrol program to rcstack. I had assumed the process was successful, as everything appeared to work. However, I have just tried to run through the rcstack upgrade process and hit a problem. Here are the services before the upgrade:
[root@vcs-rhode docker-rhodecode]# ./rcstack stack-status
Running hostname: https://vcs-rhode.....
CONTAINER ID NAMES IMAGE STATUS PORTS
50626df5800a rc_cluster_apps-celery-1 rhodecode/rhodecode-ce:4.28.0 Up 3 days
2d612d977a3b rc_cluster_apps-celery-beat-1 rhodecode/rhodecode-ce:4.28.0 Up 3 days
a39e0b310bfb rc_cluster_apps-rhodecode-2 rhodecode/rhodecode-ce:4.28.0 Up 3 days (healthy)
a4239f71eafc rc_cluster_apps-vcsserver-2 rhodecode/rhodecode-ce:4.28.0 Up 3 days (healthy)
b8a07575bb1b rc_cluster_metrics-grafana-1 grafana/grafana:9.5.15 Up 3 days 3000/tcp
3abf1aae9702 rc_cluster_metrics-loki-1 grafana/loki:2.9.3 Up 3 days 3100/tcp
6c288d080d9f rc_cluster_metrics-node-exporter-1 prom/node-exporter:v1.7.0 Up 3 days 9100/tcp
0dc1ce91e0d0 rc_cluster_metrics-prometheus-1 prom/prometheus:v2.48.1 Up 3 days 9090/tcp
f4d57911e5f4 rc_cluster_metrics-statsd-exporter-1 prom/statsd-exporter:v0.26.0 Up 3 days (healthy) 9102/tcp, 9125/tcp, 9125/udp
f7df993ab011 rc_cluster_router-traefik-1 traefik:v2.10.7 Up 3 days 0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp, 0.0.0.0:3100->3100/tcp, :::3100->3100/tcp, 0.0.0.0:9022->9022/tcp, :::9022->9022/tcp
8e179f8b6288 rc_cluster_services-channelstream-1 channelstream/channelstream:0.7.1 Up 3 days (healthy) 8000/tcp
09a2091814a7 rc_cluster_services-database-1 postgres:14.10 Up 3 days (healthy) 5432/tcp
0ae443ab4ef0 rc_cluster_services-elasticsearch-1 elasticsearch:6.8.23 Up 3 days (healthy) 9200/tcp, 9300/tcp
edfc46f58722 rc_cluster_services-nginx-errors-1 nginx:1.25.3 Up 3 days 80/tcp
22f9d0d8848f rc_cluster_services-nginx-statics-1 nginx:1.25.3 Up 3 days (unhealthy) 80/tcp
fc7e40107cd9 rc_cluster_services-redis-1 redis:7.2.4 Up 3 days (healthy) 6379/tcp
[root@vcs-rhode docker-rhodecode]#
It’s a VM, so after several attempts and rollbacks I cloned it and got the clone running under a new hostname/IP for testing. After the upgrade:
[root@test-rhode docker-rhodecode]# ./rcstack stack-status
Running hostname: https://test-rhode.....
CONTAINER ID NAMES IMAGE STATUS PORTS
b85017a9fb7b rc_cluster_apps-celery-1 rhodecode/rhodecode-ce:5.3.0 Up About an hour
b593bfc3c71e rc_cluster_apps-celery-beat-1 rhodecode/rhodecode-ce:5.3.0 Up About an hour
e5c07a454ec7 rc_cluster_apps-rhodecode-1 rhodecode/rhodecode-ce:5.3.0 Up About an hour (healthy)
d4a01db1779f rc_cluster_apps-vcsserver-1 rhodecode/rhodecode-ce:5.3.0 Up About an hour (healthy)
0838df690729 rc_cluster_metrics-grafana-1 grafana/grafana:9.5.18 Up About an hour 3000/tcp
e7df7a150cf0 rc_cluster_metrics-loki-1 grafana/loki:2.9.8 Up About an hour 3100/tcp
2abeb930d0d3 rc_cluster_metrics-node-exporter-1 prom/node-exporter:v1.8.0 Up About an hour 9100/tcp
20d0fa0d567d rc_cluster_metrics-prometheus-1 prom/prometheus:v2.51.2 Up About an hour 9090/tcp
6e47ea142023 rc_cluster_metrics-promtail-1 grafana/promtail:2.9.8 Up About an hour
b0ca9999d704 rc_cluster_metrics-statsd-exporter-1 prom/statsd-exporter:v0.26.1 Up About an hour (healthy) 9102/tcp, 9125/tcp, 9125/udp
9b2a6a530bb6 rc_cluster_router-traefik-1 traefik:v2.11.6 Up About an hour 0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp, 0.0.0.0:3100->3100/tcp, :::3100->3100/tcp, 0.0.0.0:9022->9022/tcp, :::9022->9022/tcp
cf9a094982c9 rc_cluster_services-channelstream-1 channelstream/channelstream:0.7.1 Up About an hour (healthy) 8000/tcp
6383e2895e2d rc_cluster_services-database-1 postgres:14.13 Up About an hour (healthy) 5432/tcp
573e1b91962f rc_cluster_services-elasticsearch-1 elasticsearch:6.8.23 Up About an hour (healthy) 9200/tcp, 9300/tcp
0f60ce832bd7 rc_cluster_services-nginx-errors-1 nginx:1.27.0 Up About an hour 80/tcp
909bad33957f rc_cluster_services-nginx-statics-1 nginx:1.27.0 Up About an hour (healthy) 80/tcp
eb3379c7f1eb rc_cluster_services-redis-1 redis:7.2.5 Up About an hour (healthy) 6379/tcp
[root@test-rhode docker-rhodecode]#
The website is browsable, but certain functions do not work: you cannot view a repository, and some of the admin pages fail to load.
When checking the rhodecode logs, the first error I noticed was a Python error relating to “long”. I resolved this by replacing “long” with “int” in these files:
[root@test-rhode docker-rhodecode]# grep -H long ./config/_shared/gunicorn_conf_rc.py ./config/_shared/gunicorn_conf_vcs.py
./config/_shared/gunicorn_conf_rc.py: msecs = int((now - long(now)) * 1000)
./config/_shared/gunicorn_conf_vcs.py: msecs = int((now - long(now)) * 1000)
[root@test-rhode docker-rhodecode]#
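For reference, a minimal sketch of that substitution. Python 3 has no `long` builtin, and `int` covers the same range. Only the `msecs = ...` expression and the name `now` come from the grep output above; the function wrapper `msecs_fraction` is hypothetical and exists only to make the snippet self-contained.

```python
import time


def msecs_fraction(now: float) -> int:
    """Return the millisecond part of a Unix timestamp.

    In Python 2 this expression used long(now); in Python 3 the
    builtin long no longer exists, so int(now) is used instead.
    """
    return int((now - int(now)) * 1000)


# Example: millisecond component of the current time (0..999).
print(msecs_fraction(time.time()))
```

The same one-word change would apply to both gunicorn_conf_rc.py and gunicorn_conf_vcs.py, assuming no other Python 2 idioms remain in those files.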
The next error I spotted was a permissions issue with lock files:
File \"/home/rhodecode/venv/lib/python3.11/site-packages/dogpile/cache/backends/file.py\", line 413, in _acquire\n fileno = os.open(self.filename, wrflag)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nPermissionError: [Errno 13] Permission denied: '/var/opt/rhodecode_data/cache_repo_repo.__var__opt__rhodecode_repo_store__open__crates.cache_db.dogpile.lock'\n\nD
[root@test-rhode docker-rhodecode]# ./rcstack cli storage
root@efe5abd6d8cc:/vol# ls -l /vol/datavolume/ | grep lock | grep 999 | wc
667 6003 72750
root@efe5abd6d8cc:/vol# ls -l /vol/datavolume/ | grep lock | grep root | wc
176 1584 24794
root@efe5abd6d8cc:/vol# ls -l /vol/datavolume/ | grep lock | wc
843 7587 97544
root@efe5abd6d8cc:/vol#
root@efe5abd6d8cc:/vol# ls -ld /vol/datavolume/
drwxr-xr-x. 8 999 999 131072 Oct 21 12:05 /vol/datavolume/
root@efe5abd6d8cc:/vol# chown -R 999:999 /vol/datavolume/
root@efe5abd6d8cc:/vol#
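The `ls -l | grep` counts above are approximate, since `grep 999` or `grep root` can also match other columns such as a file size or name. A rough sketch of a stricter check, assuming Python 3 is available wherever the data volume is mounted: it groups lock files by numeric owner UID. The function name `count_lock_owners` is made up for illustration and is not part of rcstack.

```python
import os
from collections import Counter


def count_lock_owners(directory: str) -> Counter:
    """Count regular files whose name contains 'lock', grouped by owner UID."""
    owners = Counter()
    with os.scandir(directory) as entries:
        for entry in entries:
            # Only look at regular files; do not follow symlinks.
            if "lock" in entry.name and entry.is_file(follow_symlinks=False):
                uid = entry.stat(follow_symlinks=False).st_uid
                owners[uid] += 1
    return owners
```

Calling `count_lock_owners("/vol/datavolume")` after the `chown -R` should return a single key, 999, if the ownership change reached every lock file. Unlike the `ls` pipeline, this counts only regular files in the top-level directory.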
This has still not resolved the problem, but the stumbling block appears to be communication with the vcsserver. Checking the vcsserver instance with docker logs shows:
2024-10-21 13:04:14 [11] [WARNING] Invalid request from ip=172.18.0.16: Invalid HTTP Header: 'USER-AGENT'
I’m beginning to think I have done something fundamentally wrong! Any suggestions? I can roll back and try again.
Thanks