Push fails on remote due to UTF16 names #1964

Open
Eddie-cz opened this issue Feb 23, 2022 · 14 comments
Labels
pinned Pinned issues are not touched by the stale bot

Comments

@Eddie-cz

Issue description

While trying to migrate an existing hg repo to SCM-Manager, the push fails on the remote with "remote: pretxnchangegroup.scm hook failed".
Some repos work, others are rejected due to this issue.
I was able to push some of the failing hg repos by converting them to git and back to hg and then pushing to SCM-Manager.
The SCM-Manager version is the latest. It is installed on both Windows and Linux, and both fail.
Mercurial on the client side was 5.9.2 or 6.0.0. No SCM proxies are set.

Log:
2022-02-23 14:13:05.375 [HgHookWorker-1] [6ASyGjI8F2w] ERROR sonia.scm.repository.spi.HgHookChangesetProvider - could not retrieve changesets
org.javahg.internals.RuntimeIOException: Input length = 1
at org.javahg.internals.Utils.asRuntime(Utils.java:438)
at org.javahg.internals.Utils.decodeBytes(Utils.java:244)
at org.javahg.internals.Utils.decodeBytes(Utils.java:224)
at org.javahg.internals.HgInputStream.textUpTo(HgInputStream.java:430)
at sonia.scm.repository.spi.javahg.AbstractChangesetCommand.createFromInputStream(AbstractChangesetCommand.java:246)
at sonia.scm.repository.spi.javahg.AbstractChangesetCommand.readListFromStream(AbstractChangesetCommand.java:164)
at sonia.scm.repository.spi.javahg.HgLogChangesetCommand.execute(HgLogChangesetCommand.java:65)
at sonia.scm.repository.spi.HgHookChangesetProvider.handleRequest(HgHookChangesetProvider.java:70)
at sonia.scm.repository.api.HgHookBranchProvider.changesets(HgHookBranchProvider.java:115)
at sonia.scm.repository.api.HgHookBranchProvider.collect(HgHookBranchProvider.java:125)
at sonia.scm.repository.api.HgHookBranchProvider.getDeletedOrClosed(HgHookBranchProvider.java:89)
at sonia.scm.repository.DefaultBranchDeleteProtection.protectDefaultBranch(DefaultBranchDeleteProtection.java:57)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at com.github.legman.InvocationContext.invoke(InvocationContext.java:108)
at com.github.legman.InvocationContext.proceed(InvocationContext.java:101)
at com.github.legman.micrometer.MicrometerInvocationInterceptor.invoke(MicrometerInvocationInterceptor.java:47)
at com.github.legman.InvocationContext.proceed(InvocationContext.java:99)
at com.github.legman.EventHandler.handleEvent(EventHandler.java:103)
at com.github.legman.SynchronizedEventHandler.handleEvent(SynchronizedEventHandler.java:52)
at com.github.legman.EventBus.dispatchSynchronous(EventBus.java:452)
at com.github.legman.EventBus.dispatch(EventBus.java:446)
at com.github.legman.EventBus.dispatchSynchronousQueuedEvents(EventBus.java:421)
at com.github.legman.EventBus.post(EventBus.java:333)
at sonia.scm.event.LegmanScmEventBus.post(LegmanScmEventBus.java:92)
at sonia.scm.repository.AbstractRepositoryManager.fireHookEvent(AbstractRepositoryManager.java:57)
at sonia.scm.repository.spi.HookEventFacade$HookEventHandler.fireHookEvent(HookEventFacade.java:137)
at sonia.scm.repository.hooks.DefaultHookHandler.fireHook(DefaultHookHandler.java:119)
at sonia.scm.repository.hooks.DefaultHookHandler.handleHookRequest(DefaultHookHandler.java:105)
at sonia.scm.repository.hooks.DefaultHookHandler.handleHookRequest(DefaultHookHandler.java:90)
at sonia.scm.repository.hooks.DefaultHookHandler.run(DefaultHookHandler.java:77)
at sonia.scm.repository.hooks.HookServer.lambda$associateSecurityManager$1(HookServer.java:103)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.nio.charset.MalformedInputException: Input length = 1
at java.base/java.nio.charset.CoderResult.throwException(CoderResult.java:274)
at java.base/java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:813)
at org.javahg.internals.Utils.decodeBytes(Utils.java:242)
... 37 common frames omitted
2022-02-23 14:13:05.403 [HgHookWorker-1] [6ASyGjI8F2w] WARN sonia.scm.repository.hooks.DefaultHookHandler - unknown error on hook occurred
java.lang.NullPointerException: null
at sonia.scm.repository.api.HgHookBranchProvider.changesets(HgHookBranchProvider.java:115)
at sonia.scm.repository.api.HgHookBranchProvider.collect(HgHookBranchProvider.java:125)
at sonia.scm.repository.api.HgHookBranchProvider.getDeletedOrClosed(HgHookBranchProvider.java:89)
at sonia.scm.repository.DefaultBranchDeleteProtection.protectDefaultBranch(DefaultBranchDeleteProtection.java:57)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at com.github.legman.InvocationContext.invoke(InvocationContext.java:108)
at com.github.legman.InvocationContext.proceed(InvocationContext.java:101)
at com.github.legman.micrometer.MicrometerInvocationInterceptor.invoke(MicrometerInvocationInterceptor.java:47)
at com.github.legman.InvocationContext.proceed(InvocationContext.java:99)
at com.github.legman.EventHandler.handleEvent(EventHandler.java:103)
at com.github.legman.SynchronizedEventHandler.handleEvent(SynchronizedEventHandler.java:52)
at com.github.legman.EventBus.dispatchSynchronous(EventBus.java:452)
at com.github.legman.EventBus.dispatch(EventBus.java:446)
at com.github.legman.EventBus.dispatchSynchronousQueuedEvents(EventBus.java:421)
at com.github.legman.EventBus.post(EventBus.java:333)
at sonia.scm.event.LegmanScmEventBus.post(LegmanScmEventBus.java:92)
at sonia.scm.repository.AbstractRepositoryManager.fireHookEvent(AbstractRepositoryManager.java:57)
at sonia.scm.repository.spi.HookEventFacade$HookEventHandler.fireHookEvent(HookEventFacade.java:137)
at sonia.scm.repository.hooks.DefaultHookHandler.fireHook(DefaultHookHandler.java:119)
at sonia.scm.repository.hooks.DefaultHookHandler.handleHookRequest(DefaultHookHandler.java:105)
at sonia.scm.repository.hooks.DefaultHookHandler.handleHookRequest(DefaultHookHandler.java:90)
at sonia.scm.repository.hooks.DefaultHookHandler.run(DefaultHookHandler.java:77)
at sonia.scm.repository.hooks.HookServer.lambda$associateSecurityManager$1(HookServer.java:103)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)

Thanks for help. Petr

@eheimbuch
Contributor

Hey @Eddie-cz,

it seems like a null pointer because your changesets could not be resolved properly. Could you possibly provide an example repository so we can reproduce it?

@Eddie-cz
Author

test.zip

It's an issue with file name encoding, probably with UTF-16 names. One such file is in the provided repo. Pushing it fails.

@eheimbuch
Contributor

Thank you. We will have a look.

@eheimbuch
Contributor

eheimbuch commented Feb 24, 2022

After a closer look, this error does not seem to be so trivial. UTF-16 encoding for filenames does seem to be a problem. When I manually create your example file in a fresh SCM-Manager Mercurial repository, it works fine.

We think that we have to dig deeper but this could take us some days. Mercurial seems not to support UTF-16 file names at all. See here: https://www.mercurial-scm.org/wiki/EncodingStrategy

@Eddie-cz
Author

It's not critical for us at the moment, as we finally know what caused the push issues, but we would definitely appreciate a fix. Thanks.

@stale

stale bot commented Mar 30, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale Issue is stale and will be closed if no further activity occurs label Mar 30, 2022
@pfeuffer pfeuffer added pinned Pinned issues are not touched by the stale bot and removed stale Issue is stale and will be closed if no further activity occurs labels Mar 30, 2022
@novikov-studio

Hi, I faced the same problem with a subrepo, so I can't use the convert-to-git-and-back workaround.

Works for me:

  1. Copy the .hg directory from the client PC
  2. Replace \data\repositories<repo_id>\data.hg on the server with the client's copy
  3. Restart SCM
  4. Clone the repo from the server on the client PC

@Eddie-cz
Author

Eddie-cz commented Oct 7, 2022

Will there be any progress on this, or should I close this one as you don't care?

@Eddie-cz Eddie-cz changed the title from "Push fails on remote" to "Push fails on remote due to UTF16 names" Oct 7, 2022
@Phil-Ah
Contributor

Phil-Ah commented Oct 10, 2022

Hey @Eddie-cz, we are aware of this issue. Unfortunately, our team is busy with other topics right now.
As SCM-Manager is open source software, anybody may try to fix this. We are open to pull requests if you or someone else is able to contribute.

@codefox42

We are experiencing the same exception (using 2.41.0):

2023-02-13 15:20:04.827 [HgHookWorker-48] [6zTVl0hL6C0u] ERROR sonia.scm.repository.spi.HgHookChangesetProvider - could not retrieve changesets
...
2023-02-13 15:20:04.877 [HgHookWorker-48] [6zTVl0hL6C0u] WARN  sonia.scm.repository.hooks.DefaultHookHandler - unknown error on hook occurred

Unfortunately @novikov-studio's workaround did not work for us - or I missed a detail. Any hints/suggestions highly appreciated!

@christophloose

Hey all,

unfortunately there is currently no solution from our side. The reason is that Mercurial does not support UTF-16, and in this case we depend on Mercurial.

@codefox42

Hello @christophloose, we finally found a workaround that looks like this:

  1. Export the problematic changeset on the client: hg bundle --rev "outgoing()" mychanges.hg
  2. Transfer the bundle file to the SCM server.
  3. Import the changeset while bypassing the pretxnchangegroup.scm hook: hg --cwd $REPOPATH/data pull mychanges.hg
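
For anyone who wants to script these steps, here is a minimal Python sketch that only assembles the two hg command lines quoted above. The function name, the `repo_path` parameter (standing in for `$REPOPATH`) and the default bundle name are placeholders for illustration, and transferring the bundle to the server is left out.

```python
import shlex


def bundle_workaround_commands(repo_path, bundle_name="mychanges.hg"):
    """Return the client-side and server-side hg commands as argv lists.

    repo_path is a hypothetical stand-in for $REPOPATH on the SCM server.
    """
    # Step 1: run on the client, inside the working copy.
    export_cmd = ["hg", "bundle", "--rev", "outgoing()", bundle_name]
    # Step 3: run on the server after the bundle file has been copied over.
    # With --cwd, a relative bundle name is resolved inside that directory.
    import_cmd = ["hg", "--cwd", f"{repo_path}/data", "pull", bundle_name]
    return export_cmd, import_cmd


if __name__ == "__main__":
    # Print shell-quoted command lines instead of executing anything.
    for cmd in bundle_workaround_commands("/path/to/repo"):
        print(shlex.join(cmd))
```

The sketch prints the commands rather than running them, so they can be reviewed before being executed on a production server.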

So from our perspective, Mercurial itself does not have a problem with the encoding. (By the way, the previous generation, SCM 1.x, was able to cope with those changesets, but I am not sure whether that is relevant in this case.)

@codefox42

After spending some more time experimenting with SCM and filenames with non-ASCII characters, I would like to summarize the current situation (or at least my understanding of it):

  • Creating a file test-äöü.txt directly in SCM (using the Editor plugin):

    • It is displayed correctly in the Web-UI (repository browser).
    • When cloned to a Linux system, the filename is correctly displayed in the console.
    • When cloned to a Windows system, the filename results in mojibake.
  • Creating a file test-äöü.txt on a Windows system:

    • The SCM server does not accept the changeset (see logs above).
    • The Web-UI (repository browser) displays an error for the affected directory.
    • When cloned to a Linux system, the filename is displayed like 'test-'$'\344\366\374''.txt' (not far from mojibake).

Like @christophloose, I come to the conclusion that the root cause is on Mercurial's side, somewhere at the point where the file is added to the repository on a Windows system. Although Mercurial can cope with this (i.e. sharing code between Windows systems works without problems), it seems to break SCM's javahg implementation.
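
To make the byte-level picture concrete, here is a small Python sketch using the octal bytes from the Linux console output above. It assumes that the server side decodes filenames as UTF-8 with a strict decoder; the MalformedInputException in the log points in that direction, but nothing in this thread confirms it. The helper name is made up for illustration.

```python
# Filename bytes as shown by the Linux console: '\344\366\374' in octal.
raw_name = b"test-\344\366\374.txt"

# Interpreted as cp1252, the bytes are exactly the name typed on Windows.
windows_view = raw_name.decode("cp1252")


def decodes_as_utf8(raw):
    """Return True if the bytes are valid UTF-8 under strict decoding."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The Windows-created name is not valid UTF-8, the UTF-8 encoded name is.
windows_name_ok = decodes_as_utf8(raw_name)
utf8_name_ok = decodes_as_utf8("test-äöü.txt".encode("utf-8"))

if __name__ == "__main__":
    print(windows_view, windows_name_ok, utf8_name_ok)
```

Under that assumption, the bytes themselves are intact and only their interpretation differs between systems, which would fit the observation that sharing between Windows systems works.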

@iret-de

iret-de commented Apr 18, 2024

We found the same issue when switching from SCM 1.x to SCM 2.x. After fiddling around we found:

Simple Workaround

  • Choose encoding "cp1252" in SCM-Manager's repository settings
    • --> pushing filenames with non-ASCII characters will now work from Windows and Linux
  • Be aware
    • The SCM server's Web-UI will display strange characters for Linux filenames that contain UTF-8 characters
    • Switching back to encoding "" may break the SCM server's Web-UI for folders containing filenames with cp1252 encoding
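
The trade-off in this workaround can be illustrated with a short Python sketch. It assumes that the repository encoding setting is what turns stored filename bytes into text for display; how SCM-Manager actually applies the setting is not confirmed here, and `display_name` is a hypothetical helper.

```python
def display_name(raw, repo_encoding):
    """Decode stored filename bytes with the configured repository encoding."""
    return raw.decode(repo_encoding)


# The same visible name, stored with two different byte encodings.
windows_raw = "test-äöü.txt".encode("cp1252")  # committed on Windows
linux_raw = "test-äöü.txt".encode("utf-8")     # committed on Linux

# With "cp1252" configured, the Windows name is shown correctly ...
shown_windows = display_name(windows_raw, "cp1252")
# ... while every UTF-8 multi-byte sequence becomes two cp1252 characters.
shown_linux = display_name(linux_raw, "cp1252")

if __name__ == "__main__":
    print(shown_windows)
    print(shown_linux)
```

A single encoding setting can only match one of the two byte representations, so whichever group of filenames uses the other one shows up garbled.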

More findings

  • The biggest problem is for inexperienced Windows users:
    • They are not aware of a problem with non-ASCII characters
    • The problem appears only after pushing (which may be hours after the commit), and the only solution is to clone the original repository again and redo the commits
  • From my point of view, the root of the problem is using non-ASCII characters in filenames at all...
  • ...followed by a fault in the SCM server's javahg: this process should at least accept every encoding, even if it results in strange-looking filenames
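
As an illustration of what "accept every encoding" could mean, here is a minimal Python sketch of lenient filename decoding. It is not javahg's code (javahg is written in Java). The function name and the candidate order are assumptions chosen for this example.

```python
def decode_leniently(raw, candidates=("utf-8", "cp1252")):
    """Decode filename bytes without ever raising.

    Tries each candidate encoding strictly, in order. If none fits,
    undecodable bytes are replaced with U+FFFD instead of failing.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: a strange-looking name is still better than an exception.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(decode_leniently(b"test-\xe4\xf6\xfc.txt"))          # cp1252 bytes
    print(decode_leniently("test-äöü.txt".encode("utf-8")))    # UTF-8 bytes
```

Trying UTF-8 first is deliberate: almost any byte sequence decodes as cp1252, so putting cp1252 first would silently turn valid UTF-8 names into mojibake.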

8 participants