Skip to content

feat: support remote snakefile via gh/gl shorthand#4161

Open
clelange wants to merge 26 commits into
snakemake:mainfrom
clelange:split/remote-snakefile-shorthand
Open

feat: support remote snakefile via gh/gl shorthand#4161
clelange wants to merge 26 commits into
snakemake:mainfrom
clelange:split/remote-snakefile-shorthand

Conversation

@clelange
Copy link
Copy Markdown
Contributor

@clelange clelange commented Apr 20, 2026

I had the impression that #4077 was too big to be reviewed and that the preference was to initially only support the gh/gl shorthand. I've therefore extracted this part out so that it can be reviewed more easily and eventually be merged. I think that almost everything has been reviewed already, but please take another look.

This PR adds support for using remote Snakefiles via the explicit hosted-source shorthand syntax:

  • gh:owner/repo@ref:path/to/Snakefile
  • gl:group/project@ref:path/to/Snakefile

The trailing path remains optional and defaults to workflow/Snakefile.

The explicit shorthand form should be unambiguous and avoids the review concerns around parsing full hosted URLs whose refs may contain /. Generic remote Snakefile execution via plain HTTP/HTTPS URLs still remains supported; this PR only avoids the provider-specific inference from full GitHub/GitLab URLs.

QC

  • The PR contains a test case for the changes or the changes are already covered by an existing test case.
  • The documentation (docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).
  • I, as a human being, have checked each line of code in this pull request

AI-assistance disclosure

I used AI assistance for:

  • Code generation (e.g., when writing an implementation or fixing a bug)
  • Test/benchmark generation
  • Documentation (including examples)
  • Research and understanding

Summary by CodeRabbit

  • New Features

    • Support for remote Snakefiles via HTTP/HTTPS and hosted-source shorthands (gh:, gl:) with defaulting to workflow/Snakefile when path omitted.
  • Improvements

    • CLI, API and GUI accept and preserve snakefile URIs/strings; include paths resolve relative to remote Snakefiles; cache keys and hosted-source handling more robust (URL-encoding/ref sensitivity). Remote -s fetches only directly referenced files.
  • Documentation

    • Expanded docs and examples for shorthand syntax, default path behavior, remote resolution scope.
  • Tests

    • Added tests and a local HTTP test-server helper for remote Snakefile scenarios.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 20, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds remote Snakefile support and hosted-source shorthand (gh:/gl:); accepts snakefile as local Path or URI across API/CLI/GUI; implements hosting-provider inference, raw-URL generation with encoded refs/paths and adjusted cache keys; adds tests and an HTTP test-server helper.

Changes

Cohort / File(s) Summary
Documentation
docs/executing/cli.rst, docs/snakefiles/deployment.rst, docs/snakefiles/modularization.rst
Document remote --snakefile usage, hosted-source shorthand (gh:/gl:) with default workflow/Snakefile, include resolution relative to remote Snakefile, hosted-source convenience syntax, and note on limited retrieval vs full deployment.
API Layer
src/snakemake/api.py
resolve_snakefile() now accepts `Path
CLI Argument Handling
src/snakemake/cli.py
Removed type=Path from --snakefile/-s parser; building of workflow profile candidate paths only performed when snakefile is a Path.
GUI Integration
src/snakemake/gui.py
Import and use is_local_file; store absolute local snakefile path in app.extensions["snakefilepath"] when local, otherwise preserve remote/string; added defensive exception handling around is_local_file().
Source inference & URL generation
src/snakemake/sourcecache.py
Added shorthand parsers for gh:/gl: forms (optional host/ref/path), default path to workflow/Snakefile, URL-decode parsed pieces, URL-encode ref/path in raw-content URLs, choose raw URL format by host, change hosted-repo cache key to include provider and host, and prefer hosting-provider inference before treating as local.
Tests — helpers
tests/common.py
Added QuietHTTPRequestHandler and serve_directory(path: Path) context manager to serve a directory over an ephemeral local HTTP server for tests.
Tests — coverage
tests/test_api.py, tests/test_sourcecache.py, tests/tests.py
Added tests for remote Snakefile execution via API/CLI, shorthand URI preservation (API and GUI), comprehensive gh:/gl: parsing and cache-key/encoding behavior, and negative/fallback cases.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CLI as CLI / SnakemakeApi
    participant Resolver as resolve_snakefile
    participant Inferrer as infer_source_file
    participant Provider as HostingProviderFile
    participant Remote as Remote Host
    participant FS as Local Filesystem

    User->>CLI: request workflow(snakefile = "gh:owner/repo@ref[:path]" / URL / Path)
    CLI->>Resolver: resolve_snakefile(snakefile)
    Resolver->>Inferrer: infer_source_file(input)
    Inferrer->>Provider: parse shorthand or recognize URL
    alt shorthand or remote URL
        Provider->>Remote: construct raw content URL (encode ref/path) and GET
        Remote-->>Provider: Snakefile content or 200
        Provider-->>Resolver: SourceFile (URI)
    else local Path
        Inferrer->>FS: verify/resolve local Path
        FS-->>Resolver: Path
    end
    Resolver-->>CLI: resolved snakefile (Path | URI)
    CLI->>User: execute workflow / build DAG with resolved includes
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 8.16% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat: support remote snakefile via gh/gl shorthand' accurately and specifically summarizes the main feature added—support for remote Snakefiles using gh/gl shorthand syntax.
Description check ✅ Passed The description fully addresses all required template sections: it explains the feature being added, confirms test coverage exists, confirms documentation was updated, and discloses all AI assistance used (code generation, tests, documentation, and research).

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/snakemake/cli.py (1)

546-554: ⚠️ Potential issue | 🟡 Minor

Mention URLs and gh:/gl: shorthand in --snakefile help.

The parser now accepts non-local Snakefile strings, but the help still says “FILE”/“snakefile” only, making the new CLI feature hard to discover.

Proposed help text update
         help=(
-            "The workflow definition in form of a snakefile. "
+            "The workflow definition as a local Snakefile, HTTP(S) URL, "
+            "or hosted-source shorthand like gh:owner/repo@ref:path. "
             "Usually, you should not need to specify this. "
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/snakemake/cli.py` around lines 546 - 554, Update the --snakefile argument
help (the help string built around SNAKEFILE_CHOICES in src/snakemake/cli.py) to
mention that the argument accepts non-local sources: URLs (http/https),
GitHub/GitLab shorthand (gh:owner/repo[:path] and gl:owner/repo[:path]), and
local file paths; keep the existing sentence about default search order using
SNAKEFILE_CHOICES but append a short example of the allowed non-local forms so
users discover the feature when running --help for --snakefile.
🧹 Nitpick comments (2)
tests/test_api.py (1)

87-91: Consider also asserting the full expected output set matches.

Current check only verifies each expected file exists and has the right MD5, but does not catch extra unexpected outputs. Optional; the existing test_remote_snakefile_multiple_includes in tests/tests.py already covers equivalence via run(...), so this is nice-to-have.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_api.py` around lines 87 - 91, The test currently iterates
get_expected_files(expected_results) and checks existence and MD5 but doesn't
fail on extra files; after validating each expected file (using
get_expected_files, workdir, expected_results, md5sum), compute the actual set
of relative file paths under workdir (e.g., via workdir.rglob and
relative_to(workdir) for files) and assert that set(actual_relpaths) ==
set(get_expected_files(expected_results)) so any unexpected outputs cause the
test to fail.
docs/executing/cli.rst (1)

40-40: Minor: "snakedeploy" link text / positioning.

The inline snakedeploy link inside a .. note:: is fine, but the phrasing "usually more convenient" reads slightly awkward next to "it will only retrieve the snakefiles and everything directly referred by them". Consider "…and anything directly referenced by them…" for readability. Non-blocking nit.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/executing/cli.rst` at line 40, Update the note sentence that currently
reads "will only retrieve the snakefiles and everything directly referred by
them, allowing an ad-hoc run." to improve readability by replacing "everything
directly referred by them" with "anything directly referenced by them" and keep
the inline snakedeploy link attached to the callout "use snakedeploy" (i.e.,
ensure the link remains on the phrase "snakedeploy" following "use") so the
sentence becomes clearer while preserving the link target.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/snakefiles/deployment.rst`:
- Around line 89-92: The documentation incorrectly implies that adding an
explicit path enables persistent caching; instead update the paragraph that
discusses the module dna_seq (tag v2.0.1) and the shorthand examples
(gh:snakemake-workflows/dna-seq-gatk-variant-calling@v2.0.1 and
gh:snakemake-workflows/dna-seq-gatk-variant-calling@v2.0.1:workflow/Snakefile)
to state that cacheability is determined by using an immutable reference (a tag
or commit), not by including an explicit path, and clarify that both shorthand
forms shown are cacheable because they use the tag v2.0.1 while the presence or
absence of the path does not affect caching.

In `@docs/snakefiles/modularization.rst`:
- Around line 399-402: Add a short paragraph after the hosted source syntax
examples explaining that using the provider shorthand (e.g.,
gh:owner/repo@ref:path/to/Snakefile or gl:group/project@ref:path/to/Snakefile)
is preferred when refs (branch or tag names) may contain slashes, because
attempting to parse full hosted/raw URLs (including
GitHub/raw.githubusercontent.com URLs) to infer a GithubFile is fundamentally
ambiguous when refs contain “/”; document that full URLs remain generic and can
be ambiguous and therefore recommend the provider-prefixed shorthand as the
unambiguous form.

In `@src/snakemake/gui.py`:
- Around line 74-77: The direct call to is_local_file(args.snakefile) can raise
for URI-like inputs and crash GUI registration; wrap the call in a try/except
and on any exception treat the value as a remote path (i.e., set
app.extensions["snakefilepath"] = args.snakefile). Specifically, modify the
block that sets app.extensions["snakefilepath"] to call
is_local_file(args.snakefile) inside a try, use os.path.abspath(...) when it
returns True, and in the except branch (or when it returns False) assign the
original args.snakefile so remote/URI values are preserved; reference
is_local_file and the app.extensions["snakefilepath"] assignment to locate the
code.

In `@src/snakemake/sourcecache.py`:
- Around line 852-855: The cache key for hosted repositories only uses self.repo
and can collide across different providers/hosts; update the hosted_repo cache
to include provider and host so clones are unique per (provider, host, repo).
Locate the hosted_repo method/attribute and change the cache dictionary key
generation (where self.repo is currently used) to include self.provider (or
provider) and self.host (e.g., use a tuple like (self.provider, self.host,
self.repo) or a concatenated string) so that GithubFile and GitlabFile instances
with different hosts/providers do not share the same entry; ensure any lookups
and inserts use the same new composite key.
- Around line 662-668: The cache key collision comes from encoding the ref in
GithubFile.get_path_or_uri() (turning "perf/autobump" into "perf%2Fautobump")
while SourceFile.get_cache_path() later unquotes the URL path, making
"perf%2Fautobump" and "perf:autobump" collapse; fix by preserving encoded ref
when constructing cache keys instead of unquoting, i.e. change
SourceFile.get_cache_path() to use the already-encoded ref from
GithubFile.get_path_or_uri() (or re-encode the ref using quote(safe="") before
building the cache path) so slashes in refs remain %2F and distinct; update any
unquote(...) call that currently normalizes the URL path to avoid reverting
encoded slashes.

In `@tests/common.py`:
- Around line 46-47: The log_message method in tests/common.py currently uses
the parameter name "format" which shadows the built-in and triggers Ruff A002;
rename that parameter (e.g., to "fmt" or "message_format") in the def
log_message(self, ...) signature and update all usages inside the log_message
function (and any local references) to the new name so calls and behavior remain
identical while avoiding the builtin shadowing.

---

Outside diff comments:
In `@src/snakemake/cli.py`:
- Around line 546-554: Update the --snakefile argument help (the help string
built around SNAKEFILE_CHOICES in src/snakemake/cli.py) to mention that the
argument accepts non-local sources: URLs (http/https), GitHub/GitLab shorthand
(gh:owner/repo[:path] and gl:owner/repo[:path]), and local file paths; keep the
existing sentence about default search order using SNAKEFILE_CHOICES but append
a short example of the allowed non-local forms so users discover the feature
when running --help for --snakefile.

---

Nitpick comments:
In `@docs/executing/cli.rst`:
- Line 40: Update the note sentence that currently reads "will only retrieve the
snakefiles and everything directly referred by them, allowing an ad-hoc run." to
improve readability by replacing "everything directly referred by them" with
"anything directly referenced by them" and keep the inline snakedeploy link
attached to the callout "use snakedeploy" (i.e., ensure the link remains on the
phrase "snakedeploy" following "use") so the sentence becomes clearer while
preserving the link target.

In `@tests/test_api.py`:
- Around line 87-91: The test currently iterates
get_expected_files(expected_results) and checks existence and MD5 but doesn't
fail on extra files; after validating each expected file (using
get_expected_files, workdir, expected_results, md5sum), compute the actual set
of relative file paths under workdir (e.g., via workdir.rglob and
relative_to(workdir) for files) and assert that set(actual_relpaths) ==
set(get_expected_files(expected_results)) so any unexpected outputs cause the
test to fail.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9017739d-5007-44eb-8eb5-64d456dadcae

📥 Commits

Reviewing files that changed from the base of the PR and between d8faabd and d656631.

📒 Files selected for processing (11)
  • docs/executing/cli.rst
  • docs/snakefiles/deployment.rst
  • docs/snakefiles/modularization.rst
  • src/snakemake/api.py
  • src/snakemake/cli.py
  • src/snakemake/gui.py
  • src/snakemake/sourcecache.py
  • tests/common.py
  • tests/test_api.py
  • tests/test_sourcecache.py
  • tests/tests.py

Comment thread docs/snakefiles/deployment.rst Outdated
Comment thread docs/snakefiles/modularization.rst
Comment thread src/snakemake/gui.py
Comment thread src/snakemake/sourcecache.py
Comment thread src/snakemake/sourcecache.py
Comment thread tests/common.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
src/snakemake/sourcecache.py (2)

714-717: ⚠️ Potential issue | 🟠 Major

Include provider and host in the hosted-repo cache key.

These shorthand constructors can now produce the same repo on different providers/hosts, but HostingProviderFile.hosted_repo still caches by self.repo only. The second source can reuse the first clone and read from the wrong remote.

Proposed fix
-from typing import TYPE_CHECKING, Any, ClassVar, Dict, List, Optional
+from typing import TYPE_CHECKING, Any, ClassVar, Dict, List, Optional, Tuple
@@
-    _hosted_repos: ClassVar[Dict[str, HostedGitRepo]] = {}
+    _hosted_repos: ClassVar[Dict[Tuple[type, str, str], HostedGitRepo]] = {}
@@
     `@property`
     def hosted_repo(self) -> HostedGitRepo:
+        assert self.host is not None
+        cache_key = (self.__class__, self.host, self.repo)
         try:
-            return self._hosted_repos[self.repo]
+            return self._hosted_repos[cache_key]
         except KeyError:
-            assert self.host is not None
             if self.host.startswith("https://") or self.host.startswith("http://"):
                 raise WorkflowError(
@@
             # Ensure that multiple threads don't concurrently create the same instance
             with self._lock:
-                hosted_repo = HostedGitRepo(
-                    self.repo, self.cache_path, self.auth, self.host
-                )
-                self._hosted_repos[self.repo] = hosted_repo
-            return hosted_repo
+                try:
+                    return self._hosted_repos[cache_key]
+                except KeyError:
+                    hosted_repo = HostedGitRepo(
+                        self.repo, self.cache_path, self.auth, self.host
+                    )
+                    self._hosted_repos[cache_key] = hosted_repo
+                    return hosted_repo
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/snakemake/sourcecache.py` around lines 714 - 717, The hosted-repo cache
is keyed only by self.repo, so shorthand constructors (GithubFile/GitlabFile)
that differ by provider or host can collide; update the caching key used by
hosted_repo to include provider and host (e.g., use a tuple like (self.repo,
self.provider, self.host) or equivalent) and ensure GithubFile and GitlabFile
instances expose the provider/host attributes used in that key so clones for
different providers/hosts are cached separately.

636-642: ⚠️ Potential issue | 🟠 Major

Keep slashed refs distinct in hosted-file cache paths.

ref is encoded here, but SourceFile.get_cache_path() later unquotes the URI path. That can make gh:o/r@perf/autobump:path/to/Snakefile share a cache entry with gh:o/r@perf:autobump/path/to/Snakefile.

Proposed fix
-from urllib.parse import unquote
+from urllib.parse import quote, unquote
@@
 class HostingProviderFile(SourceFile):
@@
     def get_filename(self):
         return os.path.basename(self.path)
 
+    def get_cache_path(self):
+        assert self.host is not None
+        return os.path.join(
+            "hosted-git",
+            self.__class__.__name__.lower(),
+            self.host,
+            quote(self.repo, safe="/"),
+            quote(self.ref, safe=""),
+            quote(self.path, safe="/"),
+        )
+
@@
     def get_path_or_uri(self, secret_free: bool) -> str:
-        from urllib.parse import quote
-
         auth = f":{self.token}@" if self.token and not secret_free else ""

Verification:

#!/bin/bash
# Demonstrate how the current quote-then-unquote cache-key path collapses
# a slashed ref into the same path as a shorter ref plus deeper file path.
python - <<'PY'
from urllib.parse import quote, unquote
import os

def current_cache_suffix(ref, path):
    uri_path = f"raw.githubusercontent.com/o/r/{quote(ref, safe='')}/{quote(path, safe='/')}"
    return os.path.join("https", unquote(uri_path.lstrip("/")))

a = current_cache_suffix("perf/autobump", "path/to/Snakefile")
b = current_cache_suffix("perf", "autobump/path/to/Snakefile")

print(a)
print(b)
assert a == b, "Expected current cache-key derivation to collide"
PY
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/snakemake/sourcecache.py` around lines 636 - 642, The cache collision
happens because ref is percent-encoded (quote(self.ref, safe="")) then later
unquoted, turning %2F back into '/' and collapsing distinct refs into the same
cache path; change the encoding so slashes in the ref remain literal path
separators by using quote(self.ref, safe="/") instead of safe="" in the code
that builds the hosted-file URL (the block that sets ref = quote(self.ref, ...)
and path = quote(self.path, ...)); this preserves slashed refs as distinct path
segments when SourceFile.get_cache_path() later unquotes the URI.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/snakemake/sourcecache.py`:
- Around line 714-717: The hosted-repo cache is keyed only by self.repo, so
shorthand constructors (GithubFile/GitlabFile) that differ by provider or host
can collide; update the caching key used by hosted_repo to include provider and
host (e.g., use a tuple like (self.repo, self.provider, self.host) or
equivalent) and ensure GithubFile and GitlabFile instances expose the
provider/host attributes used in that key so clones for different
providers/hosts are cached separately.
- Around line 636-642: The cache collision happens because ref is
percent-encoded (quote(self.ref, safe="")) then later unquoted, turning %2F back
into '/' and collapsing distinct refs into the same cache path; change the
encoding so slashes in the ref remain literal path separators by using
quote(self.ref, safe="/") instead of safe="" in the code that builds the
hosted-file URL (the block that sets ref = quote(self.ref, ...) and path =
quote(self.path, ...)); this preserves slashed refs as distinct path segments
when SourceFile.get_cache_path() later unquotes the URI.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 32591b33-601f-4b4e-aaa7-f30a683209f1

📥 Commits

Reviewing files that changed from the base of the PR and between d656631 and e3f2fcc.

📒 Files selected for processing (1)
  • src/snakemake/sourcecache.py

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/snakemake/sourcecache.py`:
- Around line 696-701: In the shorthand parsing block (variables first_colon,
first_slash, host, remainder), ensure an empty host (e.g., "gh::owner/repo") is
not accepted: change the logic so after finding first_colon you extract
candidate = remainder[:first_colon] and only set host = candidate and advance
remainder = remainder[first_colon+1:] if candidate is non-empty; if candidate is
empty, treat it as no custom host (set host = None) and do not consume the colon
so the shorthand remains valid against the default host handling.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f9167ee6-e584-43fd-b048-abd73a58222b

📥 Commits

Reviewing files that changed from the base of the PR and between 1b6faf7 and 52805ee.

📒 Files selected for processing (7)
  • docs/snakefiles/deployment.rst
  • docs/snakefiles/modularization.rst
  • src/snakemake/gui.py
  • src/snakemake/sourcecache.py
  • tests/common.py
  • tests/test_api.py
  • tests/test_sourcecache.py
✅ Files skipped from review due to trivial changes (2)
  • docs/snakefiles/modularization.rst
  • docs/snakefiles/deployment.rst
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/snakemake/gui.py
  • tests/common.py
  • tests/test_sourcecache.py

Comment thread src/snakemake/sourcecache.py
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
src/snakemake/sourcecache.py (1)

733-734: Trivial wrapper.

_infer_hosting_provider_file currently just forwards to _infer_hosting_provider_file_shorthand. If you don't plan to add other inference branches in this PR, you can inline the call at line 749 and drop this function.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/snakemake/sourcecache.py` around lines 733 - 734, The function
_infer_hosting_provider_file is a trivial passthrough to
_infer_hosting_provider_file_shorthand; remove the wrapper and replace its call
sites with direct calls to _infer_hosting_provider_file_shorthand(path_or_uri)
(preserving the Optional[HostingProviderFile] return expectation), then delete
the _infer_hosting_provider_file definition to avoid dead indirection.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/snakemake/sourcecache.py`:
- Around line 705-730: The parser currently allows an empty ref (e.g.,
gh:owner/repo@:path) because ref is set to "" when ref_cand is empty; update the
shorthand parsing around ref_cand/ref (after ref_path.rpartition(":")) to reject
an empty ref by returning None when sep is truthy but ref_cand == "" (or
equivalently when ref == ""); this prevents creating GithubFile/GitlabFile with
branch="" and lets _check_git_args see a missing branch instead of an empty
string.
- Around line 751-754: The except block that calls is_local_file(path_or_uri)
currently swallows NotImplementedError, ImportError, and ValueError and silently
treats the path as non-local; update the handling in sourcecache.py so that when
is_local_file raises one of these exceptions you either (a) log the exception at
debug level (including the exception type and message) before setting
is_local=False to preserve diagnostics, or (b) add a clear comment above the
try-except explaining that malformed/unsupported URIs intentionally fall back to
GenericSourceFile; reference is_local_file, path_or_uri, and the fallback to
GenericSourceFile in your change so reviewers can find the location easily.

---

Nitpick comments:
In `@src/snakemake/sourcecache.py`:
- Around line 733-734: The function _infer_hosting_provider_file is a trivial
passthrough to _infer_hosting_provider_file_shorthand; remove the wrapper and
replace its call sites with direct calls to
_infer_hosting_provider_file_shorthand(path_or_uri) (preserving the
Optional[HostingProviderFile] return expectation), then delete the
_infer_hosting_provider_file definition to avoid dead indirection.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5aeab04e-21c6-4e49-ba3d-9abe6994eed9

📥 Commits

Reviewing files that changed from the base of the PR and between 52805ee and 74b576d.

📒 Files selected for processing (2)
  • src/snakemake/sourcecache.py
  • tests/test_sourcecache.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_sourcecache.py

Comment thread src/snakemake/sourcecache.py
Comment thread src/snakemake/sourcecache.py Outdated
@clelange
Copy link
Copy Markdown
Contributor Author

FYI @matthewfeickert @lukasheinrich

Copy link
Copy Markdown
Contributor

@johanneskoester johanneskoester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Clemens! Thanks a lot, this is very nice. Please find my comments below.

Comment thread src/snakemake/sourcecache.py Outdated
# Unsupported or malformed URIs should fall through to GenericSourceFile below.
try:
is_local = is_local_file(path_or_uri)
except (NotImplementedError, ImportError, ValueError):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would these exceptions occur in is_local_file? It worries me that we have to keep this in mind everytime we use is_local_file (this PR does it above as well). Ideally, is_local_file should not throw exceptions other than stuff caused by bugs, which in turn should not need to be handled.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I changed is_local_file() so it behaves as a direct predicate for normal path/URI inputs instead of depending on smart_open.parse_uri(). Remote or unsupported schemes such as gh:, gl:, and s3:// now return False without callers needing to catch transport/parser exceptions.

Comment thread src/snakemake/api.py Outdated
if isinstance(path, str):
try:
is_local = is_local_file(path)
except (NotImplementedError, ImportError, ValueError):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my comment on the other try-except around is_local_file below.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed via the same helper change. resolve_snakefile() now calls is_local_file() directly and preserves remote strings unchanged.

Comment thread src/snakemake/gui.py Outdated
app.extensions["snakefilepath"] = os.path.abspath(args.snakefile)
try:
is_local = is_local_file(args.snakefile)
except (NotImplementedError, ImportError, ValueError):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed via the same helper change. The GUI path handling now calls is_local_file() directly and keeps remote snakefile values unchanged.

Comment thread src/snakemake/sourcecache.py Outdated
self._hosted_repos[self.repo] = hosted_repo
return hosted_repo
try:
return self._hosted_repos[cache_key]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the second try here to get the object from the cache dict? Didn't we do that before in line 515?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rewrote this to make the intent clearer. The second lookup is still there, now in the form of .get() inside the lock, to handle the case where another thread populates the cache after the initial fast-path miss but before this thread acquires _lock. That preserves one HostedGitRepo instance per cache key. If you prefer removing the inner check entirely, let me know.

Comment thread src/snakemake/sourcecache.py Outdated
Comment on lines +746 to +747
def _infer_hosting_provider_file(path_or_uri: str) -> Optional[HostingProviderFile]:
return _infer_hosting_provider_file_shorthand(path_or_uri)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this function if it just delegates?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I removed the wrapper and now call _infer_hosting_provider_file_shorthand() directly.

@clelange clelange force-pushed the split/remote-snakefile-shorthand branch from b633509 to b042c7c Compare May 23, 2026 21:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants