Skip to content

Conversation

knguyen1
Copy link
Contributor

@knguyen1 knguyen1 commented Aug 14, 2025

Summary

  • This is a…
    • Bug fix
    • Feature addition
    • Refactoring
    • Minor / simple change (like a typo)
    • Other
  • Describe this change in 1-2 sentences:
    • Fix drag-and-drop failures for scheme-less paths and address macOS NFC/NFD path mismatches causing accented folders/files to be ignored or fail to open.

See: https://bugreports.qt.io/browse/QTBUG-19827

And see: https://bugreports.qt.io/browse/QTBUG-22382

toLocalFile returning blank is intended behaviour.

Problem

  • JIRA ticket (optional): PICARD-3041
  • Drag sources sometimes provide scheme-less paths, making QUrl.toLocalFile() return empty and dropping files no-op.
  • On macOS (incl. sshfs/FUSE), Unicode normalization (NFC/NFD) mismatches caused ENOENT and “Add Files/Add Folder” to gray out accented directories.

Solution

  • picard/ui/itemviews/basetreeview.py: In drop_urls, use url.toLocalFile() or url.path() and canonicalize the result.
  • picard/util/__init__.py: Add canonicalize_path:
    • Trims trailing NULs, normalizes, resolves paths non-strictly.
    • macOS: component-wise NFC/NFD resolution against filesystem.
    • Windows: preserves long-path handling (\\?\\ prefix) when needed.
  • picard/ui/mainwindow/__init__.py: Canonicalize paths from “Add Files” and “Add Folder” before use.
  • Tests:
    • test/test_path_handling.py: DnD coverage (scheme-less and file://, move vs add, null trimming, HTTP handling).
    • test/test_canonicalize_path.py: Canonicalization (NULs, normalization, relative/absolute, Windows long-path behavior).
    • test/test_resolve_path_components_macos.py (macOS-only): NFC/NFD resolution per component.

Action

Additional actions required:

  • Update Picard documentation (optional reference to new path canonicalization behavior)
  • Other (please specify below)

Notes:

  • Follow-up PRs can extend canonicalization to other entry points (e.g., cover art local files, open_local_path).

@@ -475,16 +483,38 @@ def scrollTo(self, index, scrolltype=QtWidgets.QAbstractItemView.ScrollHint.Ensu
def drop_urls(urls, target, move_to_multi_tracks=True):
files = []
new_paths = []
tagger = QtCore.QCoreApplication.instance()
# We need to cast to Tagger here to avoid type checking errors below.
tagger = cast("Tagger", QtCore.QCoreApplication.instance())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to this PR, but there are more places where we should do that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert or no? It was annoying me to get error intellisense highlights without it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: I recommend ty, when the project comes out of pre-release: https://github.com/astral-sh/ty It's from the same authors as ruff and uv.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't do this. As Python is not statically typed there is no need for such casts during runtime.

But getting the main application instance is a typical case. Maybe we should provide our own helper method for this. This could have the appropriate type hint set for the return value.

@zas zas requested a review from phw August 15, 2025 09:50
Copy link
Member

@phw phw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have the time right now to review the code in detail. But I have my concerns about the general approach for fixing a very specific issue reported on macOS.

The approach of verifying every path component in different Unicode normalizations against the file system by iterating over the parent directory appears to me to be a rather big performance issue. Specifically when doing so over a virtual filesystem backed by an SSH connection (which is the case for PICARD-3041). And it is not even clear to me, whether this change indeed fixes the reported issue.

In general our file handling is already rather complex, due to considering multiple special cases across operating systems. We already do path normalization at various parts, e.g. with the normpath function that also handles Windows long paths.

This change seems to me to add yet another layer to this complexity, while I think the issue behind PICARD-3041 might be related to some path handling we already do. My main suspects would be the encode_filename / decode_filename helpers that are not consistently used everywhere and also might lead to those normalization issues.

If we actually end up in situations where we need to try different Unicode normalizations, maybe we should only do this in the error case?

I'm not generally against this change, and if you all convince me that this is all needed to fix path issues on macOS then we maybe need to go this approach. But right now it does seem to add a lot of complexity.

@knguyen1
Copy link
Contributor Author

knguyen1 commented Aug 15, 2025

And it is not even clear to me, whether this change indeed fixes the reported issue.

It's not just OP's macFUSE/sshfs issue; what this will fix though is for programmes that supply schema-less URL's. QT people already made it clear they're not going to fix it. Like you, their concern was complexity and who's really responsible? I mean yeah, it would be wonderful if all apps provide full URI for files.

>>> from PyQt6.QtCore import QUrl
>>> url = QUrl("/User/music/file.mp3")
>>> url.toLocalFile()
''
>>> url.path()
'/User/music/file.mp3'
>>> url.scheme()
''
>>> 

My main suspects would be the encode_filename / decode_filename helpers that are not consistently used everywhere and also might lead to those normalization issues.

I was going to take on those next. Introducing a canonicalisation concept before encode/decode fn made more sense.

According to the LLM I'm using, there's actually a lot more to do in terms of handling non-unicode systems better.

Next tasks to standardize non‑Unicode/locale handling

  • Replace legacy helpers

    • Deprecate encode_filename/decode_filename. Introduce small I/O helpers:
      • to_io_path(x: str|bytes) -> str|bytes (pass through str; use os.fsencode only if the callee strictly needs bytes).
      • open_safe(path, ...) wrapper that logs both original and canonicalized candidates on failure.
  • Centralize canonicalization

    • Use canonicalize_path at all UI/file entry points: cover art local files, “Open Containing Folder”, drag-and-drop, clipboard paste, remote command handlers, plugin file inputs.
    • Add it to open_local_path and any path constructed from QUrl.
  • Mutagen and formats

    • Ensure formats pass str paths to Mutagen whenever possible; remove any lossy .encode(..., 'replace').
    • At write time, if a path fails due to locale/FS constraints, retry with a sanitized fallback name and surface a clear message.
  • POSIX bytes path support

    • Allow bytes paths to flow through low-level scanning (_scan_paths_recursive) and I/O when needed on non‑UTF‑8 locales; keep UI strings as str.
  • Locale and normalization strategy

    • On startup, detect problematic locales (e.g., ASCII, ISO‑8859‑1) and warn with actionable guidance.
    • macOS: expose a setting to enable/disable component-wise NFC/NFD resolution and log when it alters a path.
  • Error reporting and logging

    • Standardize path logging to include:
      • Original input, canonicalized result, NFC/NFD variants (macOS), exists/isdir checks.
      • Use safe repr for undecodable bytes without raising.
  • Test coverage and CI

    • Add matrix jobs simulating non‑UTF‑8 locales on Linux; run bytes-path tests on POSIX.
    • Unit tests for all entry points wired to canonicalize_path.
    • Mocked sshfs normalization test to emulate NFD exposure without a real mount.
  • Documentation and settings

    • Update docs on how Picard handles encodings/normalization, troubleshooting (sshfs, locales), and new settings/toggles if added.
  • Type hints and interfaces

    • Annotate path-facing APIs as str | bytes | os.PathLike where applicable; keep UI layers str-only, I/O wrappers accept both.
  • Gradual migration

    • Grep and replace remaining direct os.path usage where normalization/resolution is expected; route through canonicalize_path and I/O wrappers.

@phw
Copy link
Member

phw commented Aug 16, 2025

How exactly is the fact that we get a path and not a file URI the cause of the issue in the ticket? The functionality should generally stay the same, with or without the file:// scheme.

My main concern with this PR is really the macOS code that tries to resolve a misencoded filename against the filesystem. This feels to me as it is working around the symptoms rather to address the root cause, which is that we end up with a misencoded file path for some reason.

At a minimum we should:

  • measure the performance impact on file loading with this change
  • reproduce the issue in the ticket and ensure it is fixed

@zas
Copy link
Collaborator

zas commented Aug 16, 2025

My main concern with this PR is really the macOS code that tries to resolve a misencoded filename against the filesystem. This feels to me as it is working around the symptoms rather to address the root cause, which is that we end up with a misencoded file path for some reason.

I have the same concern regarding this patch.

@knguyen1
Copy link
Contributor Author

The functionality should generally stay the same, with or without the file:// scheme.

I don't follow. In a schema-less URL, toLocalFile (current behaviour) returns '', as referenced in https://bugreports.qt.io/browse/QTBUG-22382 Are you saying it's already possible for picard to add schema-less files? The unit tests in this PR seems to indicate no...

This feels to me as it is working around the symptoms rather to address the root cause, which is that we end up with a misencoded file path for some reason.

The work for which should be its own epic, to make picard play well with non-unicode FS and locales. I'm wondering if the juice is even worth the squeeze. There's a handful user complaints on JIRA that is yet to be resolved (mutagen errors, posix bytes-encoded paths, etc.), well, including this one...

@phw
Copy link
Member

phw commented Aug 16, 2025

I don't follow. In a schema-less URL, toLocalFile (current behaviour) returns '', as referenced in https://bugreports.qt.io/browse/QTBUG-22382 Are you saying it's already possible for picard to add schema-less files? The unit tests in this PR seems to indicate no...

Generally yes of course (e.g. if passed form command line). Inside the specific code path of dropping in BaseTreeView.drop_urls not, and your fixes there to handle this there make totally sense. I just have no indication that this actually happened in practice, nor does it seem to be related to PICARD-3041.

We should of course fix this handling, and accept schema less "URL" in this drop action.

My concerns are more about the path canonicalization code, specifically on macOS, and the PR claiming to solve PICARD-3041 (which it might or might not do).

@knguyen1
Copy link
Contributor Author

knguyen1 commented Aug 16, 2025

@phw Let me take some time to digest this. Maybe canonicalise_path is too much and too unnecessary for now. We need OP to do a live testing.

My concerns are more about the path canonicalization code, specifically on macOS, and the PR claiming to solve PICARD-3041 (which it might or might not do).

The reason canonicalize_path is part of this PR is because OP complained about the fileviewer greying out his Verite folder. Which I suspected it had to do with NFC/NFD normalisation.

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants