Rcstack - hgrepo, regression with non utf-8 encoded filenames

Hi,

I found a regression between 4.28.0 (Python 2) and the latest beta image (Python 3): if an hg repository contains filenames encoded in something other than UTF-8, the images after 4.28.0 raise an encoding exception in the RhodeCode instance.

Note that Mercurial internals don’t seem to care about the filename encoding and treat each filename as a byte array.

With the 4.28.0 image we can list repositories with non-UTF-8 filenames, with some tradeoffs (which seem fine to me):

We can clone the repository as expected with the correct filenames.

But with the beta image we get an encoding exception:

We can still clone the repository as expected with the correct filenames.

You can find a hg repository showcase at:
https://code.rhodecode.com/u/aji/hgrepo-windows-1252

And the tracked exception here:
Exception ID: 139775684076352

For Windows users, it would be great to relax the strict error handling of bytes.decode(encoding='utf-8', errors='strict') :)

Furthermore, the RhodeCode logs show what looks like an invalid method call, which this may trigger:
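To illustrate what relaxing the error handler would mean, here is a minimal sketch using only the standard library. The helper name decode_filename is hypothetical and is not part of RhodeCode; the sample bytes are the same Windows-1252 filename used later in this thread.

```python
# Raw filename bytes as stored in the repository (Windows-1252 encoded).
raw_name = b"cp1252-filename-\xe9\xe0\xf9.txt"


def decode_filename(raw: bytes, errors: str = "strict") -> str:
    """Hypothetical helper: decode a filename as UTF-8 with a given error handler."""
    return raw.decode("utf-8", errors=errors)


# "strict" raises on the first byte that is not valid UTF-8.
try:
    decode_filename(raw_name)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# "replace" is lossy: undecodable bytes become U+FFFD.
replaced = decode_filename(raw_name, errors="replace")

# "surrogateescape" is lossless: the original bytes can be recovered.
escaped = decode_filename(raw_name, errors="surrogateescape")
round_trip = escaped.encode("utf-8", errors="surrogateescape")

print(strict_failed, repr(replaced), round_trip == raw_name)
```

Which handler fits best depends on whether the decoded name is only displayed ("replace" is enough) or later has to be turned back into the exact on-disk bytes ("surrogateescape").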

rhodecode-1    | 2024-07-11 08:10:31 [42] [ERROR] Error handling request /_admin/channelstream/connect
rhodecode-1    | Traceback (most recent call last):
rhodecode-1    |   File "/home/rhodecode/venv/lib/python3.11/site-packages/pyramid/tweens.py", line 41, in excview_tween
rhodecode-1    |     response = handler(request)
rhodecode-1    |                ^^^^^^^^^^^^^^^^
rhodecode-1    |   File "/home/rhodecode/venv/lib/python3.11/site-packages/pyramid/router.py", line 139, in handle_request
rhodecode-1    |     has_listeners and notify(ContextFound(request))
rhodecode-1    |                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
rhodecode-1    |   File "/home/rhodecode/venv/lib/python3.11/site-packages/pyramid/registry.py", line 103, in notify
rhodecode-1    |     [_ for _ in self.subscribers(events, None)]
rhodecode-1    |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
rhodecode-1    |   File "/home/rhodecode/venv/lib/python3.11/site-packages/zope/interface/registry.py", line 446, in subscribers
rhodecode-1    |     return self.adapters.subscribers(objects, provided)
rhodecode-1    |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
rhodecode-1    |   File "/home/rhodecode/venv/lib/python3.11/site-packages/zope/interface/adapter.py", line 896, in subscribers
rhodecode-1    |     subscription(*objects)
rhodecode-1    |   File "/home/rhodecode/venv/lib/python3.11/site-packages/pyramid/config/adapters.py", line 97, in derived_subscriber
rhodecode-1    |     return subscriber(arg[0])
rhodecode-1    |            ^^^^^^^^^^^^^^^^^^
rhodecode-1    |   File "/home/rhodecode/venv/lib/python3.11/site-packages/rhodecode/subscribers.py", line 93, in add_request_user_context
rhodecode-1    |     auth_user, auth_token = get_auth_user(request)
rhodecode-1    |                             ^^^^^^^^^^^^^^^^^^^^^^
rhodecode-1    |   File "/home/rhodecode/venv/lib/python3.11/site-packages/rhodecode/lib/base.py", line 485, in get_auth_user
rhodecode-1    |     cookie_store = CookieStoreWrapper(session.get('rhodecode_user'))
rhodecode-1    |                                       ^^^^^^^^^^^
rhodecode-1    |   File "/home/rhodecode/venv/lib/python3.11/site-packages/beaker/session.py", line 789, in __getattr__
rhodecode-1    |     return getattr(self._session(), attr)
rhodecode-1    |                    ^^^^^^^^^^^^^^^
rhodecode-1    |   File "/home/rhodecode/venv/lib/python3.11/site-packages/beaker/session.py", line 785, in _session
rhodecode-1    |     self.__dict__['_sess'] = session_cls(req, **params)
rhodecode-1    |                              ^^^^^^^^^^^^^^^^^^^^^^^^^^
rhodecode-1    |   File "/home/rhodecode/venv/lib/python3.11/site-packages/beaker/session.py", line 227, in __init__
rhodecode-1    |     self.load()
rhodecode-1    |   File "/home/rhodecode/venv/lib/python3.11/site-packages/beaker/session.py", line 415, in load
rhodecode-1    |     session_data = self._decrypt_data(session_data)
rhodecode-1    |                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
rhodecode-1    |   File "/home/rhodecode/venv/lib/python3.11/site-packages/beaker/session.py", line 372, in _decrypt_data
rhodecode-1    |     data = b64decode(session_data)
rhodecode-1    |            ^^^^^^^^^^^^^^^^^^^^^^^
rhodecode-1    |   File "/home/rhodecode/venv/lib/python3.11/site-packages/beaker/_compat.py", line 37, in b64decode
rhodecode-1    |     return _b64decode(b.encode('ascii'))
rhodecode-1    |                       ^^^^^^^^
rhodecode-1    | AttributeError: 'dict' object has no attribute 'encode'
rhodecode-1    | 
rhodecode-1    | During handling of the above exception, another exception occurred:
rhodecode-1    | 
rhodecode-1    | Traceback (most recent call last):
rhodecode-1    |   File "/home/rhodecode/venv/lib/python3.11/site-packages/gunicorn/workers/sync.py", line 135, in handle
rhodecode-1    |     self.handle_request(listener, req, client, addr)
rhodecode-1    |   File "/home/rhodecode/venv/lib/python3.11/site-packages/gunicorn/workers/sync.py", line 178, in handle_request
rhodecode-1    |     respiter = self.wsgi(environ, resp.start_response)
rhodecode-1    |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
rhodecode-1    |   File "/home/rhodecode/venv/lib/python3.11/site-packages/rhodecode/config/middleware.py", line 452, in pyramid_app_with_cleanup
rhodecode-1    |     return pyramid_app(environ, start_response)
rhodecode-1    |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
rhodecode-1    |   File "/home/rhodecode/venv/lib/python3.11/site-packages/rhodecode/lib/middleware/https_fixup.py", line 45, in __call__
rhodecode-1    |     return self.application(environ, custom_start_response)
rhodecode-1    |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
rhodecode-1    |   File "/home/rhodecode/venv/lib/python3.11/site-packages/pyramid/router.py", line 270, in __call__
rhodecode-1    |     response = self.execution_policy(environ, self)
rhodecode-1    |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
rhodecode-1    |   File "/home/rhodecode/venv/lib/python3.11/site-packages/pyramid/router.py", line 276, in default_execution_policy
rhodecode-1    |     return router.invoke_request(request)
rhodecode-1    |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
rhodecode-1    |   File "/home/rhodecode/venv/lib/python3.11/site-packages/pyramid/router.py", line 245, in invoke_request
rhodecode-1    |     response = handle_request(request)
rhodecode-1    |                ^^^^^^^^^^^^^^^^^^^^^^^
rhodecode-1    |   File "/home/rhodecode/venv/lib/python3.11/site-packages/rhodecode/lib/middleware/request_wrapper.py", line 54, in __call__
rhodecode-1    |     response = self.handler(request)
rhodecode-1    |                ^^^^^^^^^^^^^^^^^^^^^
rhodecode-1    |   File "/home/rhodecode/venv/lib/python3.11/site-packages/rhodecode/tweens.py", line 104, in sanity_check
rhodecode-1    |     return handler(request)
rhodecode-1    |            ^^^^^^^^^^^^^^^^
rhodecode-1    |   File "/home/rhodecode/venv/lib/python3.11/site-packages/rhodecode/tweens.py", line 56, in vcs_detection_tween
rhodecode-1    |     return handler(request)
rhodecode-1    |            ^^^^^^^^^^^^^^^^
rhodecode-1    |   File "/home/rhodecode/venv/lib/python3.11/site-packages/pyramid/tweens.py", line 43, in excview_tween
rhodecode-1    |     response = _error_handler(request, exc)
rhodecode-1    |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
rhodecode-1    |   File "/home/rhodecode/venv/lib/python3.11/site-packages/pyramid/tweens.py", line 13, in _error_handler
rhodecode-1    |     response = request.invoke_exception_view(exc_info)
rhodecode-1    |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
rhodecode-1    |   File "/home/rhodecode/venv/lib/python3.11/site-packages/pyramid/view.py", line 765, in invoke_exception_view
rhodecode-1    |     response = _call_view(
rhodecode-1    |                ^^^^^^^^^^^
rhodecode-1    |   File "/home/rhodecode/venv/lib/python3.11/site-packages/pyramid/view.py", line 674, in _call_view
rhodecode-1    |     response = view_callable(context, request)
rhodecode-1    |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
rhodecode-1    |   File "/home/rhodecode/venv/lib/python3.11/site-packages/pyramid/viewderivers.py", line 392, in viewresult_to_response
rhodecode-1    |     result = view(context, request)
rhodecode-1    |              ^^^^^^^^^^^^^^^^^^^^^^
rhodecode-1    |   File "/home/rhodecode/venv/lib/python3.11/site-packages/rhodecode/config/middleware.py", line 234, in error_handler
rhodecode-1    |     c.messages = helpers.flash.pop_messages(request=request)
rhodecode-1    |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
rhodecode-1    |   File "/home/rhodecode/venv/lib/python3.11/site-packages/rhodecode/lib/helpers.py", line 744, in pop_messages
rhodecode-1    |     for cat, msg in session.pop(self.session_key, []):
rhodecode-1    |                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
rhodecode-1    |   File "/home/rhodecode/venv/lib/python3.11/site-packages/rhodecode/lib/rc_beaker.py", line 136, in save
rhodecode-1    |     value = wrapped(session, *arg, **kw)
rhodecode-1    |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
rhodecode-1    |   File "/home/rhodecode/venv/lib/python3.11/site-packages/rhodecode/lib/rc_beaker.py", line 91, in pop
rhodecode-1    |     return self._session().pop(k, d)
rhodecode-1    |            ^^^^^^^^^^^^^^^
rhodecode-1    |   File "/home/rhodecode/venv/lib/python3.11/site-packages/beaker/session.py", line 785, in _session
rhodecode-1    |     self.__dict__['_sess'] = session_cls(req, **params)
rhodecode-1    |                              ^^^^^^^^^^^^^^^^^^^^^^^^^^
rhodecode-1    |   File "/home/rhodecode/venv/lib/python3.11/site-packages/beaker/session.py", line 227, in __init__
rhodecode-1    |     self.load()
rhodecode-1    |   File "/home/rhodecode/venv/lib/python3.11/site-packages/beaker/session.py", line 415, in load
rhodecode-1    |     session_data = self._decrypt_data(session_data)
rhodecode-1    |                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
rhodecode-1    |   File "/home/rhodecode/venv/lib/python3.11/site-packages/beaker/session.py", line 372, in _decrypt_data
rhodecode-1    |     data = b64decode(session_data)
rhodecode-1    |            ^^^^^^^^^^^^^^^^^^^^^^^
rhodecode-1    |   File "/home/rhodecode/venv/lib/python3.11/site-packages/beaker/_compat.py", line 37, in b64decode
rhodecode-1    |     return _b64decode(b.encode('ascii'))
rhodecode-1    |                       ^^^^^^^^
rhodecode-1    | AttributeError: 'dict' object has no attribute 'encode'

I hadn’t seen this error before; could you confirm on your side whether it is related?

One more note: with RhodeCode 4.27.1 Community Edition installed through rccontrol (not dockerized), the filename with a non-UTF-8 encoding is shown as-is:

Thanks a lot for the feedback and details on the issue! We will definitely look into it!

Hi,

This looks unrelated. It comes from the 4.X → 5.X migration: it seems old session data is still in the DB or Redis, and session data from 4.X and 5.X isn’t compatible. Try the session cleanup command under admin → settings → user sessions.

Thanks, that was it.
I had never cleaned up the user sessions during the migration.

Running the cleanup for all sessions removed the AttributeError: 'dict' object has no attribute 'encode' error.

Hi,

The change does indeed seem to have been introduced during the Python 3 migration.

msgpack.pack, or more specifically msgpack.packb, produces different results under Python 2 and Python 3.

Under Python 2, if we run the following:

from __future__ import print_function
import msgpack
import sys

# Python2: str is the same as bytes
data = 'cp1252-filename-\xe9\xe0\xf9.txt'
p = msgpack.packb(data)
u = msgpack.unpackb(p, raw=False)

print(sys.version)
print("data:", data)
print("packed:", p)
print("unpacked:", type(u), u)

We get the output:

2.7.17 (v2.7.17:c2f86d86e6, Oct 19 2019, 21:01:17) [MSC v.1500 64 bit (AMD64)]
data: cp1252-filename-éàù.txt
packed: Äcp1252-filename-éàù.txt
unpacked: <type 'str'> cp1252-filename-éàù.txt

Under Python 3, if we run:

import msgpack
import sys

data = b'cp1252-filename-\xe9\xe0\xf9.txt'
p = msgpack.packb(data) # use_bin_type=True by default
u = msgpack.unpackb(p)

print(sys.version)
print("data:", data)
print("packed:", p)
print("unpacked:", type(u), u)

We get the output:

3.12.4 (tags/v3.12.4:8e8a4ba, Jun  6 2024, 19:30:16) [MSC v.1940 64 bit (AMD64)]
data: b'cp1252-filename-\xe9\xe0\xf9.txt'
packed: b'\xc4\x17cp1252-filename-\xe9\xe0\xf9.txt'
unpacked: <class 'bytes'> b'cp1252-filename-\xe9\xe0\xf9.txt'

But if we change how msgpack.packb is called:

import msgpack
import sys

data = b'cp1252-filename-\xe9\xe0\xf9.txt'
p = msgpack.packb(data, use_bin_type=False)
u = msgpack.unpackb(p)

print(sys.version)
print("data:", data)
print("packed:", p)
print("unpacked:", type(u), u)

We get the same error:

Traceback (most recent call last):
  File "C:\Users\WDAGUtilityAccount\Desktop\test-python3\test2-use_bin_type=False.py", line 7, in <module>
    u = msgpack.unpackb(p)
        ^^^^^^^^^^^^^^^^^^
  File "msgpack\\_unpacker.pyx", line 194, in msgpack._cmsgpack.unpackb
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 16: invalid continuation byte
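One way to read this traceback: with use_bin_type=False the bytes are written with a msgpack string type tag, and the unpacker decodes string payloads as UTF-8 unless told otherwise (the raw parameter of msgpack.unpackb, already used in the Python 2 example, controls this). The sketch below is a hand-rolled, standard-library-only illustration of just two type tags from the msgpack format; unpack_one is a hypothetical helper, not the msgpack library's code, and the fixstr bytes are an assumption about what use_bin_type=False emits for a payload this short.

```python
def unpack_one(packed: bytes, raw: bool = False):
    """Illustration only: decode a single bin 8 or fixstr msgpack value."""
    tag = packed[0]
    if tag == 0xC4:
        # bin 8: one length byte follows, payload is returned as bytes.
        length = packed[1]
        return packed[2:2 + length]
    if 0xA0 <= tag <= 0xBF:
        # fixstr: length lives in the low 5 bits of the tag.
        length = tag & 0x1F
        payload = packed[1:1 + length]
        # A string payload is decoded as UTF-8 unless raw bytes are requested.
        return payload if raw else payload.decode("utf-8")
    raise ValueError(f"unsupported type tag: {tag:#x}")


data = b"cp1252-filename-\xe9\xe0\xf9.txt"

# Matches the b'\xc4\x17...' output shown for the default packb call.
as_bin = bytes([0xC4, len(data)]) + data
# Assumed layout when the same bytes are packed as a string.
as_str = bytes([0xA0 | len(data)]) + data

print(unpack_one(as_bin))            # bytes come back untouched
print(unpack_one(as_str, raw=True))  # bytes again, no decoding attempted
try:
    unpack_one(as_str)
except UnicodeDecodeError as exc:
    print("decode failed:", exc.reason)
```

If this reading is right, the failure depends on how the sending side tags the value, which is consistent with file paths breaking while other data survives.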

I haven’t yet looked at how RhodeCode calls msgpack.packb.

Furthermore, the commit messages don’t seem to be affected by this, because the encoding conversion is handled directly in the backend: rhodecode-enterprise-ce Files · rhodecode/lib/vcs/backends/hg/commit.py · RhodeCode Free Hosting

If you have a proper default_encoding= in rhodecode.ini, the conversion is done correctly; otherwise the unicode conversion replaces the invalid code points.

The file paths are affected by the msgpack error because the backend sends them as-is: rhodecode-enterprise-ce Files · rhodecode/lib/vcs/backends/hg/commit.py · RhodeCode Free Hosting

Hope this helps.
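The behaviour described here (try the configured encoding, fall back to replacing invalid bytes) can be sketched roughly as follows. This is an assumption about the general shape of such a conversion, not RhodeCode's actual code; safe_text and its parameters are hypothetical names.

```python
def safe_text(raw: bytes, encodings: list[str]) -> str:
    """Hypothetical helper: try each configured encoding, then fall back."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # No configured encoding matched: keep what decodes, replace the rest.
    return raw.decode("utf-8", errors="replace")


# A commit message written on a Windows-1252 system.
message = "é à ù".encode("cp1252")

with_config = safe_text(message, ["utf-8", "cp1252"])
without_config = safe_text(message, ["utf-8"])

print(repr(with_config))
print(repr(without_config))
```

With cp1252 in the list the text is recovered exactly; with only utf-8 the accented characters are replaced, which matches the difference between a configured and an unconfigured default_encoding.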

Hi,

Thanks for investigating that. We have already done some analysis, and the reason this happens is that the list of files returned by vcsserver is converted to UTF-8 strings, so CP-X encoded filenames just get broken in transfer. We’re working on a fix.

vcsserver has a special mechanism for returning raw bytes data:

diff --git a/vcsserver/remote/git_remote.py b/vcsserver/remote/git_remote.py
--- a/vcsserver/remote/git_remote.py
+++ b/vcsserver/remote/git_remote.py
@@ -433,7 +433,7 @@
                     raise exceptions.VcsException(e)(f"Unknown bulk attribute: {attr}")
             return result

-        return _bulk_request(repo_id, rev, sorted(pre_load))
+        return BinaryEnvelope(_bulk_request(repo_id, rev, sorted(pre_load)))

     @reraise_safe_exceptions
     def bulk_file_request(self, wire, commit_id, path, pre_load):
diff --git a/vcsserver/remote/hg_remote.py b/vcsserver/remote/hg_remote.py
--- a/vcsserver/remote/hg_remote.py
+++ b/vcsserver/remote/hg_remote.py
@@ -338,7 +338,7 @@
                         f'Unknown bulk attribute: "{attr}"')
             return result

-        return _bulk_request(repo_id, commit_id, sorted(pre_load))
+        return BinaryEnvelope(_bulk_request(repo_id, commit_id, sorted(pre_load)))

     @reraise_safe_exceptions
     def ctx_branch(self, wire, commit_id):
@@ -387,7 +387,8 @@
     def ctx_list(self, path, revision):
         repo = self._factory.repo(path)
         ctx = self._get_ctx(repo, revision)
-        return list(ctx)
+        byte_nodes: list[bytes] = list(ctx)
+        return BinaryEnvelope(byte_nodes)

     @reraise_safe_exceptions
     def ctx_parents(self, wire, commit_id):

But this is just part of the story; we still need to make this work across the whole system.
