Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: bowtie2 can now sort with samtools and picard #3518

Open
wants to merge 36 commits into
base: master
Choose a base branch
from

Conversation

jlanga
Copy link
Contributor

@jlanga jlanga commented Dec 3, 2024

This PR shamelessly copies functionalities from bwa-mem2 to sort or not, do it by coordinate or queryname, and choose between samtools and picard to do so

QC

While the contributions guidelines are more extensive, please particularly ensure that:

  • test.py was updated to call any added or updated example rules in a Snakefile
  • input: and output: file paths in the rules can be chosen arbitrarily
  • wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in input: or output:)
  • temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function tempfile.gettempdir() points to
  • the meta.yaml contains a link to the documentation of the respective tool or command under url:
  • conda environments use a minimal amount of channels and packages, in recommended ordering

Summary by CodeRabbit

Release Notes

  • New Features

    • Enhanced configurability for the Bowtie2 alignment tool with new sorting parameters.
    • Added support for additional font packages to improve visual output.
    • Introduced dynamic rule generation for various alignment configurations in the testing framework.
    • Updated dependencies for Bowtie2 and added new packages for improved functionality, including graphics and font libraries.
    • Added a new author to the Bowtie2 package and clarified functionality in the documentation.
    • New entry added to the genome index file for improved sequence tracking.
  • Bug Fixes

    • Improved error handling for missing Bowtie2 index files and refined input validation.
  • Tests

    • Expanded test coverage for Bowtie2 alignment and Sourmash tools to ensure robustness across multiple configurations.
    • Added new test functions to validate command parameters and output paths for Bowtie2 alignments.

Copy link
Contributor

coderabbitai bot commented Dec 3, 2024

📝 Walkthrough

Walkthrough

This pull request introduces significant updates to the Bowtie2 alignment tool's environment configuration and functionality. Key changes include modifications to the Conda environment files (environment.linux-64.pin.txt, environment.yaml), enhancements in the meta.yaml file for parameter configurability, and the addition of dynamic rules in the Snakefile for alignment processing. Furthermore, the wrapper.py file has been updated to improve error handling and command execution. New tests have been added to test_wrappers.py to ensure comprehensive coverage of the new features and configurations.

Changes

File Path Change Summary
bio/bowtie2/align/environment.linux-64.pin.txt - Added new Conda version header.
- Removed several package URLs.
- Added multiple new packages including fonts and libraries.
- Updated existing package versions.
bio/bowtie2/align/environment.yaml - Updated bowtie2 version from 2.5.4 to 2.5.
- Added dependency: picard-slim =3.3.
- Updated snakemake-wrapper-utils from 0.6.2 to 0.6.
bio/bowtie2/align/meta.yaml - Added author: Jorge Langa.
- Introduced parameters: sort_program, sort_extra, sort_order.
- Updated notes section.
bio/bowtie2/align/test/Snakefile - Introduced dynamic rules for alignment based on sorting programs, orders, and file extensions.
bio/bowtie2/align/wrapper.py - Added error handling for missing Bowtie2 index files.
- Refined input sample validation.
- Expanded sorting program options.
- Updated command construction and logging.
test_wrappers.py - Added test function: test_bowtie2_align.
- Added test function: test_sourmash_compute.
- Minor formatting adjustments in existing tests.

Possibly related PRs

Suggested reviewers

  • johanneskoester
  • fgvieira

Tip

CodeRabbit's docstrings feature is now available as part of our Early Access Program! Simply use the command @coderabbitai generate docstrings to have CodeRabbit automatically generate docstrings for your pull request. We would love to hear your feedback on Discord.


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 3b96cc1 and 9d58e61.

📒 Files selected for processing (6)
  • bio/bowtie2/align/environment.linux-64.pin.txt (1 hunks)
  • bio/bowtie2/align/environment.yaml (1 hunks)
  • bio/bowtie2/align/meta.yaml (1 hunks)
  • bio/bowtie2/align/test/Snakefile (1 hunks)
  • bio/bowtie2/align/wrapper.py (3 hunks)
  • test_wrappers.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
bio/bowtie2/align/wrapper.py (2)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.


Pattern **/wrapper.py: Do not complain about use of undefined variable called snakemake.

test_wrappers.py (1)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

🪛 Ruff (0.8.0)
bio/bowtie2/align/wrapper.py

18-18: Undefined name snakemake

(F821)


19-19: Undefined name snakemake

(F821)


20-20: Undefined name snakemake

(F821)


21-21: Undefined name snakemake

(F821)


22-22: Undefined name snakemake

(F821)


24-24: Undefined name snakemake

(F821)


26-26: Undefined name snakemake

(F821)


28-28: Undefined name snakemake

(F821)


29-29: Undefined name snakemake

(F821)


34-34: Undefined name snakemake

(F821)


39-39: Undefined name snakemake

(F821)


49-49: Undefined name snakemake

(F821)


49-49: Undefined name snakemake

(F821)


65-65: Undefined name snakemake

(F821)


103-103: Undefined name sort

(F821)


124-124: Undefined name snakemake

(F821)


125-125: Undefined name snakemake

(F821)


127-127: Undefined name snakemake

(F821)


128-128: Undefined name snakemake

(F821)


130-130: Undefined name snakemake

(F821)


132-132: Undefined name snakemake

(F821)


168-168: Undefined name snakemake

(F821)

🔇 Additional comments (5)
bio/bowtie2/align/environment.yaml (2)

Line range hint 1-4: LGTM! Well-configured channels and version pinning strategy.

The channel configuration with conda-forge and bioconda, along with nodefaults, ensures reliable and reproducible package resolution. The explicit version pinning for all dependencies is a good practice for reproducibility.


7-7: Verify picard-slim version.

Let's verify that picard-slim=3.3.0 is the latest stable version and check for any known issues.

✅ Verification successful

picard-slim version 3.3.0 is verified as the latest stable version

The verification confirms that:

  • 3.3.0 is currently the latest version available in the bioconda channel
  • Recent issues reported in the Picard repository are minor and don't indicate any critical problems with version 3.3.0
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check latest version and potential issues for picard-slim

# Get latest version from conda-forge
curl -s https://conda.anaconda.org/bioconda/noarch/repodata.json | jq -r '.packages | to_entries[] | select(.value.name=="picard-slim") | .value.version' | sort -V | tail -n 1

# Check for any reported issues in the past 6 months
gh api graphql -f query='
{
  search(query: "repo:broadinstitute/picard is:issue created:>2024-06-01", type: ISSUE, first: 5) {
    nodes {
      ... on Issue {
        title
        url
        labels(first: 5) {
          nodes {
            name
          }
        }
      }
    }
  }
}'

Length of output: 1316

test_wrappers.py (1)

2064-2079: Test function test_bowtie2_align_samtools_coordinate_extension is well-implemented

The new test function correctly iterates over all combinations of sorting programs, orders, and file extensions. The implementation is clean, and the function adheres to best practices for test definitions.

bio/bowtie2/align/environment.linux-64.pin.txt (2)

1-5: LGTM! Good practice adding Conda version.

The addition of the Conda version header improves reproducibility by documenting the exact version used to create this environment.


75-89: Verify version compatibility between tools.

The selected versions are recent and stable:

  • bowtie2 2.5.4
  • samtools 1.21
  • picard-slim 3.3.0

Good choice using picard-slim to reduce environment size. However, let's verify compatibility between these versions.

✅ Verification successful

Tools are compatible and consistently used across the repository

Based on the environment files analysis:

  • The selected versions (bowtie2 2.5.4, samtools 1.21, picard-slim 3.3.0) are consistently used together across multiple workflows in the repository
  • These versions are the current standard in the codebase, showing they are well-tested together
  • The Picard 3.3.0 release notes from earlier show no compatibility issues with samtools and includes htsjdk 4.1.3 which is compatible with samtools 1.21
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for known compatibility issues between the selected versions

# Check release notes and issues for compatibility information
gh api graphql -f query='
{
  repository(owner: "broadinstitute", name: "picard") {
    release(tagName: "3.3.0") {
      description
    }
  }
}
'

# Check if there are any reported issues between these specific versions
gh api graphql -f query='
{
  search(query: "repo:samtools/samtools repo:broadinstitute/picard is:issue samtools 1.21 picard 3.3.0", type: ISSUE) {
    nodes {
      ... on Issue {
        title
        url
        state
      }
    }
  }
}
'

Length of output: 2460


Script:

#!/bin/bash
# Let's check for any compatibility requirements or issues in the wrapper script and workflow

# Check wrapper script for version requirements
fd "bowtie2.*\.py" --exec cat {}

# Check workflow for version requirements
fd "bowtie2.*\.smk|bowtie2.*\.snakefile" -I --exec cat {}

# Check if there are any version constraints in environment files
fd "environment.*\.ya?ml" --exec grep -A5 -B5 "bowtie2\|samtools\|picard" {}

# Look for any version-specific code or comments
rg -A3 -B3 "version|Version" --glob "bowtie2*"

Length of output: 8907

bio/bowtie2/align/test/Snakefile Outdated Show resolved Hide resolved
bio/bowtie2/align/wrapper.py Outdated Show resolved Hide resolved
bio/bowtie2/align/wrapper.py Outdated Show resolved Hide resolved
bio/bowtie2/align/meta.yaml Outdated Show resolved Hide resolved
Comment on lines 8 to 75
https://conda.anaconda.org/conda-forge/noarch/font-ttf-dejavu-sans-mono-2.37-hab24e00_0.tar.bz2#0c96522c6bdaed4b1566d11387caaf45
https://conda.anaconda.org/conda-forge/noarch/font-ttf-inconsolata-3.000-h77eed37_0.tar.bz2#34893075a5c9e55cdafac56607368fc6
https://conda.anaconda.org/conda-forge/noarch/font-ttf-source-code-pro-2.038-h77eed37_0.tar.bz2#4d59c254e01d9cde7957100457e2d5fb
https://conda.anaconda.org/conda-forge/noarch/font-ttf-ubuntu-0.83-h77eed37_3.conda#49023d73832ef61042f6a237cb2687e7
https://conda.anaconda.org/conda-forge/linux-64/ld_impl_linux-64-2.43-h712a8e2_2.conda#048b02e3962f066da18efe3a21b77672
https://conda.anaconda.org/conda-forge/linux-64/python_abi-3.13-5_cp313.conda#381bbd2a92c863f640a55b6ff3c35161
https://conda.anaconda.org/conda-forge/noarch/tzdata-2024b-hc8b5060_0.conda#8ac3367aafb1cc0a068483c580af8015
https://conda.anaconda.org/conda-forge/noarch/fonts-conda-forge-1-0.tar.bz2#f766549260d6815b0c52253f1fb1bb29
https://conda.anaconda.org/conda-forge/linux-64/libgomp-14.2.0-h77fa898_1.conda#cc3573974587f12dda90d96e3e55a702
https://conda.anaconda.org/conda-forge/linux-64/_openmp_mutex-4.5-2_gnu.tar.bz2#73aaf86a425cc6e73fcf236a5a46396d
https://conda.anaconda.org/conda-forge/linux-64/libgcc-14.1.0-h77fa898_1.conda#002ef4463dd1e2b44a94a4ace468f5d2
https://conda.anaconda.org/conda-forge/linux-64/libexpat-2.6.3-h5888daf_0.conda#59f4c43bb1b5ef1c71946ff2cbf59524
https://conda.anaconda.org/conda-forge/linux-64/libgcc-ng-14.1.0-h69a702a_1.conda#1efc0ad219877a73ef977af7dbb51f17
https://conda.anaconda.org/conda-forge/linux-64/libstdcxx-14.1.0-hc0a3c3a_1.conda#9dbb9699ea467983ba8a4ba89b08b066
https://conda.anaconda.org/conda-forge/linux-64/openssl-3.3.2-hb9d3cd8_0.conda#4d638782050ab6faa27275bed57e9b4e
https://conda.anaconda.org/conda-forge/noarch/fonts-conda-ecosystem-1-0.tar.bz2#fee5683a3f04bd15cbd8318b096a27ab
https://conda.anaconda.org/conda-forge/linux-64/libgcc-14.2.0-h77fa898_1.conda#3cb76c3f10d3bc7f1105b2fc9db984df
https://conda.anaconda.org/conda-forge/linux-64/alsa-lib-1.2.13-hb9d3cd8_0.conda#ae1370588aa6a5157c34c73e9bbb36a0
https://conda.anaconda.org/conda-forge/linux-64/c-ares-1.34.3-hb9d3cd8_1.conda#ee228789a85f961d14567252a03e725f
https://conda.anaconda.org/conda-forge/linux-64/libexpat-2.6.4-h5888daf_0.conda#db833e03127376d461e1e13e76f09b6c
https://conda.anaconda.org/conda-forge/linux-64/libgcc-ng-14.2.0-h69a702a_1.conda#e39480b9ca41323497b05492a63bc35b
https://conda.anaconda.org/conda-forge/linux-64/libstdcxx-14.2.0-hc0a3c3a_1.conda#234a5554c53625688d51062645337328
https://conda.anaconda.org/conda-forge/linux-64/libzlib-1.3.1-hb9d3cd8_2.conda#edb0dca6bc32e4f4789199455a1dbeb8
https://conda.anaconda.org/conda-forge/linux-64/openssl-3.4.0-hb9d3cd8_0.conda#23cc74f77eb99315c0360ec3533147a9
https://conda.anaconda.org/conda-forge/linux-64/pthread-stubs-0.4-hb9d3cd8_1002.conda#b3c17d95b5a10c6e64a21fa17573e70e
https://conda.anaconda.org/conda-forge/linux-64/xorg-libice-1.1.1-hb9d3cd8_1.conda#19608a9656912805b2b9a2f6bd257b04
https://conda.anaconda.org/conda-forge/linux-64/xorg-libxau-1.0.11-hb9d3cd8_1.conda#77cbc488235ebbaab2b6e912d3934bae
https://conda.anaconda.org/conda-forge/linux-64/xorg-libxdmcp-1.1.5-hb9d3cd8_0.conda#8035c64cb77ed555e3f150b7b3972480
https://conda.anaconda.org/conda-forge/linux-64/xorg-xorgproto-2024.1-hb9d3cd8_1.conda#7c21106b851ec72c037b162c216d8f05
https://conda.anaconda.org/conda-forge/linux-64/bzip2-1.0.8-h4bc722e_7.conda#62ee74e96c5ebb0af99386de58cf9553
https://conda.anaconda.org/conda-forge/linux-64/c-ares-1.33.1-heb4867d_0.conda#0d3c60291342c0c025db231353376dfb
https://conda.anaconda.org/conda-forge/linux-64/giflib-5.2.2-hd590300_0.conda#3bf7b9fd5a7136126e0234db4b87c8b6
https://conda.anaconda.org/conda-forge/linux-64/keyutils-1.6.1-h166bdaf_0.tar.bz2#30186d27e2c9fa62b45fb1476b7200e3
https://conda.anaconda.org/conda-forge/linux-64/libdeflate-1.20-hd590300_0.conda#8e88f9389f1165d7c0936fe40d9a9a79
https://conda.anaconda.org/conda-forge/linux-64/libdeflate-1.21-h4bc722e_0.conda#36ce76665bf67f5aac36be7a0d21b7f3
https://conda.anaconda.org/conda-forge/linux-64/libev-4.33-hd590300_2.conda#172bf1cd1ff8629f2b1179945ed45055
https://conda.anaconda.org/conda-forge/linux-64/libffi-3.4.2-h7f98852_5.tar.bz2#d645c6d2ac96843a2bfaccd2d62b3ac3
https://conda.anaconda.org/conda-forge/linux-64/libnsl-2.0.1-hd590300_0.conda#30fd6e37fe21f86f4bd26d6ee73eeec7
https://conda.anaconda.org/conda-forge/linux-64/libstdcxx-ng-14.1.0-h4852527_1.conda#bd2598399a70bb86d8218e95548d735e
https://conda.anaconda.org/conda-forge/linux-64/libiconv-1.17-hd590300_2.conda#d66573916ffcf376178462f1b61c941e
https://conda.anaconda.org/conda-forge/linux-64/libjpeg-turbo-3.0.0-hd590300_1.conda#ea25936bb4080d843790b586850f82b8
https://conda.anaconda.org/conda-forge/linux-64/libmpdec-4.0.0-h4bc722e_0.conda#aeb98fdeb2e8f25d43ef71fbacbeec80
https://conda.anaconda.org/conda-forge/linux-64/libpng-1.6.44-hadc24fc_0.conda#f4cc49d7aa68316213e4b12be35308d1
https://conda.anaconda.org/conda-forge/linux-64/libsqlite-3.47.0-hadc24fc_1.conda#b6f02b52a174e612e89548f4663ce56a
https://conda.anaconda.org/conda-forge/linux-64/libssh2-1.11.1-hf672d98_0.conda#be2de152d8073ef1c01b7728475f2fe7
https://conda.anaconda.org/conda-forge/linux-64/libstdcxx-ng-14.2.0-h4852527_1.conda#8371ac6457591af2cf6159439c1fd051
https://conda.anaconda.org/conda-forge/linux-64/libuuid-2.38.1-h0b41bf4_0.conda#40b61aab5c7ba9ff276c41cfffe6b80b
https://conda.anaconda.org/conda-forge/linux-64/libwebp-base-1.4.0-hd590300_0.conda#b26e8aa824079e1be0294e7152ca4559
https://conda.anaconda.org/conda-forge/linux-64/libxcb-1.17.0-h8a09558_0.conda#92ed62436b625154323d40d5f2f11dd7
https://conda.anaconda.org/conda-forge/linux-64/libxcrypt-4.4.36-hd590300_1.conda#5aa797f8787fe7a17d1b0821485b5adc
https://conda.anaconda.org/conda-forge/linux-64/libzlib-1.3.1-h4ab18f5_1.conda#57d7dc60e9325e3de37ff8dffd18e814
https://conda.anaconda.org/conda-forge/linux-64/ncurses-6.5-he02047a_1.conda#70caf8bb6cf39a0b6b7efc885f51c0fe
https://conda.anaconda.org/conda-forge/linux-64/tk-8.6.13-noxft_h4845f30_101.conda#d453b98d9c83e71da0741bb0ff4d76bc
https://conda.anaconda.org/conda-forge/linux-64/xz-5.2.6-h166bdaf_0.tar.bz2#2161070d867d1b1204ea749c8eec4ef0
https://conda.anaconda.org/conda-forge/linux-64/zlib-1.3.1-hb9d3cd8_2.conda#c9f075ab2f33b3bbee9e62d4ad0a6cd8
https://conda.anaconda.org/conda-forge/linux-64/freetype-2.12.1-h267a509_2.conda#9ae35c3d96db2c94ce0cef86efdfa2cb
https://conda.anaconda.org/conda-forge/linux-64/graphite2-1.3.13-h59595ed_1003.conda#f87c7b7c2cb45f323ffbce941c78ab7c
https://conda.anaconda.org/conda-forge/linux-64/icu-75.1-he02047a_0.conda#8b189310083baabfb622af68fd9d3ae3
https://conda.anaconda.org/conda-forge/linux-64/lerc-4.0.0-h27087fc_0.tar.bz2#76bbff344f0134279f225174e9064c8f
https://conda.anaconda.org/conda-forge/linux-64/libedit-3.1.20191231-he28a2e2_2.tar.bz2#4d331e44109e3f0e19b4cb8f9b82f3e1
https://conda.anaconda.org/conda-forge/linux-64/libnghttp2-1.58.0-h47da74e_1.conda#700ac6ea6d53d5510591c4344d5c989a
https://conda.anaconda.org/conda-forge/linux-64/libsqlite-3.46.1-hadc24fc_0.conda#36f79405ab16bf271edb55b213836dac
https://conda.anaconda.org/conda-forge/linux-64/libssh2-1.11.0-h0841786_0.conda#1f5a58e686b13bcfde88b93f547d23fe
https://conda.anaconda.org/conda-forge/linux-64/libnghttp2-1.64.0-h161d5f1_0.conda#19e57602824042dfd0446292ef90488b
https://conda.anaconda.org/conda-forge/linux-64/pcre2-10.44-hba22ea6_2.conda#df359c09c41cd186fffb93a2d87aa6f5
https://conda.anaconda.org/conda-forge/linux-64/perl-5.32.1-7_hd590300_perl5.conda#f2cfec9406850991f4e3d960cc9e3321
https://conda.anaconda.org/conda-forge/linux-64/pixman-0.43.2-h59595ed_0.conda#71004cbf7924e19c02746ccde9fd7123
https://conda.anaconda.org/conda-forge/linux-64/readline-8.2-h8228510_1.conda#47d31b792659ce70f470b5c82fdfb7a4
https://conda.anaconda.org/conda-forge/linux-64/tk-8.6.13-noxft_h4845f30_101.conda#d453b98d9c83e71da0741bb0ff4d76bc
https://conda.anaconda.org/conda-forge/linux-64/xorg-libsm-1.2.4-he73a12e_1.conda#05a8ea5f446de33006171a7afe6ae857
https://conda.anaconda.org/conda-forge/linux-64/xorg-libx11-1.8.10-h4f16b4b_0.conda#0b666058a179b744a622d0a4a0c56353
https://conda.anaconda.org/conda-forge/linux-64/zstd-1.5.6-ha6fb4c9_0.conda#4d056880988120e29d75bfff282e0f45
https://conda.anaconda.org/conda-forge/linux-64/fontconfig-2.15.0-h7e30c49_1.conda#8f5b0b297b59e1ac160ad4beec99dbee
https://conda.anaconda.org/conda-forge/linux-64/krb5-1.21.3-h659f571_0.conda#3f43953b7d3fb3aaa1d0d0723d91e368
https://conda.anaconda.org/conda-forge/linux-64/python-3.12.5-h2ad013b_0_cpython.conda#9c56c4df45f6571b13111d8df2448692
https://conda.anaconda.org/conda-forge/linux-64/libglib-2.82.2-h2ff4ddf_0.conda#13e8e54035ddd2b91875ba399f0f7c04
https://conda.anaconda.org/conda-forge/linux-64/libtiff-4.7.0-h6565414_0.conda#80eaf80d84668fa5620ac9ec1b4bf56f
https://conda.anaconda.org/conda-forge/linux-64/python-3.13.0-h9ebbce0_101_cp313.conda#f4fea9d5bb3f2e61a39950a7ab70ee4e
https://conda.anaconda.org/conda-forge/linux-64/xorg-libxext-1.3.6-hb9d3cd8_0.conda#febbab7d15033c913d53c7a2c102309d
https://conda.anaconda.org/conda-forge/linux-64/xorg-libxfixes-6.0.1-hb9d3cd8_0.conda#4bdb303603e9821baf5fe5fdff1dc8f8
https://conda.anaconda.org/conda-forge/linux-64/xorg-libxrender-0.9.11-hb9d3cd8_1.conda#a7a49a8b85122b49214798321e2e96b4
https://conda.anaconda.org/conda-forge/linux-64/xorg-libxt-1.3.1-hb9d3cd8_0.conda#279b0de5f6ba95457190a1c459a64e31
https://conda.anaconda.org/bioconda/linux-64/bowtie2-2.5.4-h7071971_4.tar.bz2#69822858766e6c8b12ae90d78d54d8ea
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider optimizing dependencies for CLI usage.

The environment includes several GUI-related packages that may be unnecessary for a command-line bioinformatics tool:

  • Font packages (dejavu, inconsolata, ubuntu, etc.)
  • X11 libraries (xorg-*)
  • Graphics libraries (cairo, pixman)

These dependencies could increase the environment size and installation time without providing value for the core functionality.

Consider removing these packages if they're not direct requirements for Bowtie2, Samtools, or Picard.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (6)
bio/bowtie2/align/environment.linux-64.pin.txt (1)

16-55: Consider optimizing system library dependencies.

The environment includes a comprehensive set of system libraries. While these are likely pulled in as dependencies, consider:

  1. Evaluating if all these libraries are necessary for the core functionality
  2. Testing if any can be marked as optional dependencies
bio/bowtie2/align/wrapper.py (2)

42-45: Fix typo in error message

There's a typo in the error message: "declarad" should be "declared".

-        f"Missing required indices: {missing_idx} declarad as input.idx.\n"
+        f"Missing required indices: {missing_idx} declared as input.idx.\n"

171-179: Consider resource management optimization

The wrapper combines alignment and sorting in a single step using pipes. While this is memory-efficient, it could benefit from some resource management considerations:

  1. For large datasets, consider adding a parameter to control the memory usage of sorting tools (e.g., --memory for samtools, MAX_RECORDS_IN_RAM for Picard)
  2. Consider adding progress reporting for long-running operations

Would you like help implementing these resource management optimizations?

bio/bowtie2/align/meta.yaml (1)

Line range hint 29-31: Enhance documentation for new sorting parameters

The notes section should document the new sorting functionality and parameters.

Add documentation for the new parameters:

  * The `extra` param allows for additional arguments for bowtie2.
+ * The `sort_program` param enables output sorting using either 'samtools' or 'picard'. Default is 'none'.
+ * The `sort_extra` param allows passing additional arguments to the selected sorting program.
+ * The `sort_order` param specifies the sorting order: 'coordinate' (default) or 'queryname'.
  * This wrapper uses an inner pipe. Make sure to use at least two threads in your Snakefile.
bio/bowtie2/align/test/Snakefile (2)

140-140: Consider dynamic thread allocation for sorting

The current fixed thread count might not be optimal for all sorting operations. Consider adjusting threads based on the sorting program.

-               threads: 8  # Use at least two threads
+               threads: lambda wildcards, input: 16 if wildcards.program != "none" else 8  # Increase threads for sorting operations

138-139: Enhance log file structure

Consider organizing log files in subdirectories based on sorting configuration for better maintainability.

-               log:
-                   f"logs/bowtie2/{{sample}}.{program}_{order}_{extension}.log",
+               log:
+                   f"logs/bowtie2/{program}/{order}/{{sample}}.{extension}.log",
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 9d58e61 and 734b1bb.

📒 Files selected for processing (5)
  • bio/bowtie2/align/environment.linux-64.pin.txt (1 hunks)
  • bio/bowtie2/align/environment.yaml (1 hunks)
  • bio/bowtie2/align/meta.yaml (2 hunks)
  • bio/bowtie2/align/test/Snakefile (1 hunks)
  • bio/bowtie2/align/wrapper.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • bio/bowtie2/align/environment.yaml
🧰 Additional context used
📓 Path-based instructions (1)
bio/bowtie2/align/wrapper.py (2)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.


Pattern **/wrapper.py: Do not complain about use of undefined variable called snakemake.

🪛 Ruff (0.8.0)
bio/bowtie2/align/wrapper.py

18-18: Undefined name snakemake

(F821)


19-19: Undefined name snakemake

(F821)


20-20: Undefined name snakemake

(F821)


21-21: Undefined name snakemake

(F821)


22-22: Undefined name snakemake

(F821)


24-24: Undefined name snakemake

(F821)


26-26: Undefined name snakemake

(F821)


28-28: Undefined name snakemake

(F821)


29-29: Undefined name snakemake

(F821)


34-34: Undefined name snakemake

(F821)


39-39: Undefined name snakemake

(F821)


49-49: Undefined name snakemake

(F821)


49-49: Undefined name snakemake

(F821)


65-65: Undefined name snakemake

(F821)


124-124: Undefined name snakemake

(F821)


125-125: Undefined name snakemake

(F821)


127-127: Undefined name snakemake

(F821)


128-128: Undefined name snakemake

(F821)


130-130: Undefined name snakemake

(F821)


132-132: Undefined name snakemake

(F821)


168-168: Undefined name snakemake

(F821)

🔇 Additional comments (7)
bio/bowtie2/align/environment.linux-64.pin.txt (2)

8-11: Consider optimizing dependencies for CLI usage.

The environment includes several GUI-related packages that may be unnecessary for a command-line bioinformatics tool:

  • Font packages (dejavu, inconsolata, ubuntu, etc.)
  • X11 libraries (xorg-*)
  • Graphics libraries (cairo, pixman)

These dependencies could increase the environment size and installation time without providing value for the core functionality.

Also applies to: 56-78


79-79: Verify compatibility between bioinformatics tools.

The core bioinformatics tools are present with appropriate versions:

  • bowtie2 2.5.4
  • samtools 1.21
  • picard-slim 3.3.0

Please verify that these specific versions are compatible with each other for the sorting functionality, particularly when using Picard for query name sorting.

Consider documenting the version compatibility matrix in the wrapper's documentation to help users understand which versions work well together.

Also applies to: 89-89, 92-92, 93-93

✅ Verification successful

Based on the search results, I can now generate a final response:

The specified versions of bowtie2, samtools, and picard are compatible and commonly used together

The codebase shows extensive integration testing and usage of these tools together:

  • Multiple test files demonstrate bowtie2 2.5.x working with samtools 1.x and picard 3.x
  • No compatibility issues or errors are documented between these versions
  • The tools are frequently used together in various workflows and pipelines
  • The version combinations are actively tested in the CI pipeline
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify compatibility between tool versions

# Check if there are any known compatibility issues between these versions
gh api graphql -f query='
{
  search(query: "repo:broadinstitute/picard is:issue \"bowtie2 2.5.4\" OR \"samtools 1.21\"", type: ISSUE) {
    nodes {
      ... on Issue {
        title
        url
        state
      }
    }
  }
}'

# Check if there are any reported issues with these specific versions
rg -l "compatibility|issue|error" | xargs rg -A 5 "bowtie2.*2\.5\.4|samtools.*1\.21|picard.*3\.3\.0"

Length of output: 636


Script:

#!/bin/bash
# Let's try a different approach to verify compatibility

# 1. Check for any configuration files or workflow files that might show these tools working together
fd -e yaml -e yml -e config -e json | xargs rg -l "bowtie2|samtools|picard"

# 2. Look for any documented version constraints or requirements
fd "README|CHANGELOG|requirements" | xargs rg -i "version|compatibility|require"

# 3. Check for any test files that might use these tools together
fd -e test -e spec | xargs rg -l "bowtie2|samtools|picard"

# 4. Look for workflow definitions that might show tool interactions
fd "workflow|pipeline|snakefile|nextflow" -i | xargs rg -l "bowtie2|samtools|picard"

# 5. Check if there are any documented issues in issue templates or docs
fd "ISSUE_TEMPLATE|docs" -i | xargs rg -i "compatibility|version"

Length of output: 34600

bio/bowtie2/align/wrapper.py (2)

103-103: ⚠️ Potential issue

Fix undefined variable sort in error message

The error message uses undefined variable sort instead of sort_program.

-    raise ValueError(f"Unexpected value for params.sort ({sort})")
+    raise ValueError(f"Unexpected value for params.sort_program ({sort_program})")

86-91: ⚠️ Potential issue

Clarify thread allocation error message for Picard

The error message for Picard thread allocation is unclear. It should explain that Picard requires 2 threads minimum - one for bowtie2 and one for Picard.

     bowtie2_threads = bowtie2_threads - 1
     if bowtie2_threads <= 0:
         raise ValueError(
-            "Not enough threads requested. This wrapper requires exactly one more."
+            "Not enough threads requested. When using Picard, at least 2 threads are required (1 for bowtie2, 1 for Picard)."
         )
bio/bowtie2/align/meta.yaml (2)

8-8: LGTM: Author addition

Addition of Jorge Langa as an author is appropriate given the contributions.


25-27: LGTM: New sorting parameters

The new parameters align well with the PR objectives:

  • sort_program: Enables sorting with samtools or picard
  • sort_extra: Allows customization of sorting behavior
  • sort_order: Provides coordinate/queryname sorting options
bio/bowtie2/align/test/Snakefile (1)

106-109: LGTM: Comprehensive dynamic rule generation

The nested loops effectively generate rules for all possible combinations of sorting configurations.

Verify the generated rule combinations:

✅ Verification successful

LGTM: Dynamic rule generation is correctly implemented

The nested loops generate 18 distinct rules (2 orders × 3 extensions × 3 programs) with proper:

  • Naming convention: bowtie2_align_{program}_{order}_{extension}
  • Required inputs including reference genome for CRAM output
  • Appropriate parameters for sorting configuration
  • Structured log file naming that reflects the configuration

The implementation ensures comprehensive coverage of all sorting configurations while maintaining consistent rule structure.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if all expected rule combinations are generated

# List all rules in the Snakefile
snakemake --list-rules 2>/dev/null | grep "bowtie2_align_" | sort

Length of output: 53


Script:

#!/bin/bash
# Let's examine the Snakefile content directly to verify the rule generation
cat bio/bowtie2/align/test/Snakefile

Length of output: 4074


Script:

#!/bin/bash
# Search for rule definitions in the Snakefile
rg "rule bowtie2_align_" bio/bowtie2/align/test/Snakefile -A 5

Length of output: 64


Script:

#!/bin/bash
# Check the full context around the loops
rg -B 5 -A 10 'for order in \["coordinate", "queryname"\]' bio/bowtie2/align/test/Snakefile

Length of output: 614

bio/bowtie2/align/wrapper.py Outdated Show resolved Hide resolved
Comment on lines 124 to 126
ref="genome.fasta",
output:
f"mapped_idx/{{sample}}.{program}_{order}.{extension}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Based on the analysis of both wrapper.py and the test Snakefile, I can now provide the final response:

CRAM output handling needs improvement

The wrapper doesn't explicitly validate the presence of the reference genome for CRAM output. While the test Snakefile correctly includes ref="genome.fasta" in CRAM-related rules, the wrapper.py should:

  • Validate reference presence when output is CRAM format
  • Properly pass the reference to samtools for CRAM generation

Specific issues:

  • In wrapper.py, line 54-55: The CRAM conversion lacks -T reference parameter in the samtools view command
  • No validation check for reference input when CRAM output is detected
🔗 Analysis chain

Verify CRAM output handling

The reference genome is required for CRAM output. Ensure proper error handling when ref is missing.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if wrapper.py handles missing reference for CRAM output

# Search for CRAM-related error handling
rg -A 5 "\.cram" bio/bowtie2/align/wrapper.py

Length of output: 47


Script:

#!/bin/bash
# Let's check the wrapper.py content to understand how output formats are handled
cat bio/bowtie2/align/wrapper.py

Length of output: 5582


Script:

#!/bin/bash
# Let's check if there's any reference handling for CRAM output in the Snakefile
cat bio/bowtie2/align/test/Snakefile

Length of output: 4074

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
bio/bowtie2/align/wrapper.py (1)

41-47: Fix typo in error message

There's a typo in the error message: "declarad" should be "declared".

-        f"Missing required indices: {missing_idx} declarad as input.idx.\n"
+        f"Missing required indices: {missing_idx} declared as input.idx.\n"
🧰 Tools
🪛 Ruff (0.8.0)

41-41: Undefined name snakemake

(F821)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 734b1bb and eac0db4.

📒 Files selected for processing (1)
  • bio/bowtie2/align/wrapper.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
bio/bowtie2/align/wrapper.py (2)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.


Pattern **/wrapper.py: Do not complain about use of undefined variable called snakemake.

🪛 Ruff (0.8.0)
bio/bowtie2/align/wrapper.py

18-18: Undefined name snakemake

(F821)


19-19: Undefined name snakemake

(F821)


20-20: Undefined name snakemake

(F821)


21-21: Undefined name snakemake

(F821)


22-22: Undefined name snakemake

(F821)


24-24: Undefined name snakemake

(F821)


26-26: Undefined name snakemake

(F821)


28-28: Undefined name snakemake

(F821)


31-31: Undefined name snakemake

(F821)


36-36: Undefined name snakemake

(F821)


41-41: Undefined name snakemake

(F821)


51-51: Undefined name snakemake

(F821)


51-51: Undefined name snakemake

(F821)


67-67: Undefined name snakemake

(F821)


126-126: Undefined name snakemake

(F821)


127-127: Undefined name snakemake

(F821)


129-129: Undefined name snakemake

(F821)


130-130: Undefined name snakemake

(F821)


132-132: Undefined name snakemake

(F821)


134-134: Undefined name snakemake

(F821)


170-170: Undefined name snakemake

(F821)

🔇 Additional comments (6)
bio/bowtie2/align/wrapper.py (6)

1-12: LGTM: Appropriate imports for the new functionality

The imports are well-organized and include necessary modules for the new sorting capabilities.


13-32: LGTM: Well-structured initialization with proper thread validation

The initialization code properly defines required indices and handles thread allocation with appropriate validation.

🧰 Tools
🪛 Ruff (0.8.0)

18-18: Undefined name snakemake

(F821)


19-19: Undefined name snakemake

(F821)


20-20: Undefined name snakemake

(F821)


21-21: Undefined name snakemake

(F821)


22-22: Undefined name snakemake

(F821)


24-24: Undefined name snakemake

(F821)


26-26: Undefined name snakemake

(F821)


28-28: Undefined name snakemake

(F821)


31-31: Undefined name snakemake

(F821)


50-59: LGTM: Comprehensive input validation

The input validation properly checks for valid sample counts and sort order values.

🧰 Tools
🪛 Ruff (0.8.0)

51-51: Undefined name snakemake

(F821)


51-51: Undefined name snakemake

(F821)


88-93: Clarify Picard thread requirement message

The error message doesn't clearly explain that Picard requires at least 2 threads total, with 1 thread reserved for Picard.

-            "Not enough threads requested. This wrapper requires exactly one more."
+            "Not enough threads requested. Picard sorting requires at least 2 threads (1 for Bowtie2, 1 for Picard)."

104-105: Fix variable name in error message

The error message uses sort instead of sort_program.

-    raise ValueError(f"Unexpected value for params.sort ({sort_program})")
+    raise ValueError(f"Unexpected value for params.sort_program ({sort_program})")

173-181: LGTM: Clean command construction with proper temporary directory handling

The execution logic properly handles temporary directories and constructs the command with all necessary parameters.

- bowtie2 =2.5.4
- picard-slim =3.3.0
- bowtie2 =2.5
- picard-slim =3.3
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- picard-slim =3.3
- picard-slim =3.3.0

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (4)
bio/bowtie2/align/wrapper.py (4)

14-23: Add edge case handling to get_extension function

The function should handle edge cases such as empty filenames or files without extensions to prevent potential runtime errors.

Consider this improved implementation:

 def get_extension(filename: str) -> str:
     """
     Return file format since Bowtie2 reads files that
     could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
     """
+    if not filename:
+        raise ValueError("Empty filename")
+    parts = filename.split(".")
+    if len(parts) < 2:
+        raise ValueError(f"No extension found in filename: {filename}")
     if filename.endswith((".gz", ".bz2")):
-        return filename.split(".")[-2].lower()
+        return parts[-2].lower() if len(parts) > 2 else ""
-    return filename.split(".")[-1].lower()
+    return parts[-1].lower()

45-47: Fix typo in error message

There's a typo in the error message: "declarad" should be "declared".

-        f"Missing required indices: {missing_idx} declarad as input.idx.\n"
+        f"Missing required indices: {missing_idx} declared as input.idx.\n"

155-183: Consider using constants for repeated string literals

The code uses string literals like "none", "samtools", "picard" multiple times. Consider defining these as constants at the module level to improve maintainability.

+# Constants for sort programs
+SORT_PROGRAM_NONE = "none"
+SORT_PROGRAM_SAMTOOLS = "samtools"
+SORT_PROGRAM_PICARD = "picard"
+
 match sort_program:
-    case "none":
+    case SORT_PROGRAM_NONE:
         # ...
-    case "samtools":
+    case SORT_PROGRAM_SAMTOOLS:
         # ...
-    case "picard":
+    case SORT_PROGRAM_PICARD:
         # ...

111-116: Consider extracting CRAM validation to a separate function

The CRAM validation logic could be moved to a separate function to improve code organization and reusability.

+def validate_cram_requirements(ref: str | None, ref_fai: str | None) -> None:
+    """Validate that reference files are provided for CRAM output."""
+    if ref is None or ref_fai is None:
+        raise ValueError(
+            "Reference file and index are required for CRAM output. "
+            "Please specify them as input.ref and input.ref_fai."
+        )
+
 if bam_extension == "cram":
-    if REF is None or REF_FAI is None:
-        raise ValueError(
-            "Reference file and index are required for CRAM output."
-            "Please specify them as input.ref and input.ref_fai."
-        )
+    validate_cram_requirements(REF, REF_FAI)
🧰 Tools
🪛 Ruff (0.8.0)

111-112: Use a single if statement instead of nested if statements

Combine if statements using and

(SIM102)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between eac0db4 and 9c9c918.

📒 Files selected for processing (1)
  • bio/bowtie2/align/wrapper.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
bio/bowtie2/align/wrapper.py (2)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.


Pattern **/wrapper.py: Do not complain about use of undefined variable called snakemake.

🪛 Ruff (0.8.0)
bio/bowtie2/align/wrapper.py

26-26: Undefined name snakemake

(F821)


36-36: Undefined name snakemake

(F821)


41-41: Undefined name snakemake

(F821)


51-51: Undefined name snakemake

(F821)


54-54: Undefined name snakemake

(F821)


58-58: Undefined name snakemake

(F821)


59-59: Undefined name snakemake

(F821)


60-60: Undefined name snakemake

(F821)


61-61: Undefined name snakemake

(F821)


62-62: Undefined name snakemake

(F821)


63-63: Undefined name snakemake

(F821)


69-69: Undefined name snakemake

(F821)


72-72: Undefined name snakemake

(F821)


73-73: Undefined name snakemake

(F821)


79-79: Undefined name snakemake

(F821)


80-80: Undefined name snakemake

(F821)


81-81: Undefined name snakemake

(F821)


82-82: Undefined name snakemake

(F821)


83-83: Undefined name snakemake

(F821)


85-85: Undefined name snakemake

(F821)


87-87: Undefined name snakemake

(F821)


111-112: Use a single if statement instead of nested if statements

Combine if statements using and

(SIM102)


124-127: Use ternary operator CMD_INPUT = f"--interleaved {SAMPLE}" if interleaved else f"-U {SAMPLE}" instead of if-else-block

Replace if-else-block with CMD_INPUT = f"--interleaved {SAMPLE}" if interleaved else f"-U {SAMPLE}"

(SIM108)


197-197: Undefined name snakemake

(F821)

🔇 Additional comments (2)
bio/bowtie2/align/wrapper.py (2)

1-13: LGTM! Imports are appropriate for the new functionality

The imports are well-organized and include all necessary modules for the enhanced sorting capabilities.


72-74: Ensure proper thread allocation for sorting

The current implementation could result in 0 threads for sorting if snakemake.threads is 1.

🧰 Tools
🪛 Ruff (0.8.0)

72-72: Undefined name snakemake

(F821)


73-73: Undefined name snakemake

(F821)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
bio/bowtie2/align/test/Snakefile (2)

107-144: Consider validating program and order combinations

The current implementation allows all combinations of program and order, but some might not make sense (e.g., program="none" with order="coordinate"/"queryname"). Consider:

  1. Adding documentation about valid combinations
  2. Implementing validation logic to skip invalid combinations

Example improvement:

# Define valid combinations
VALID_COMBINATIONS = {
    "none": [""],  # no sorting
    "samtools": ["coordinate", "queryname"],
    "picard": ["coordinate", "queryname"]
}

for program, valid_orders in VALID_COMBINATIONS.items():
    for order in valid_orders:
        for extension in ["sam", "bam", "cram"]:
            # Rule definition here

129-134: Clean up commented output parameters

The commented output parameters appear to be unused. If they're not needed, consider removing them for better code clarity.

                output:
                    f"mapped_idx/{{sample}}.{program}_{order}.{extension}",
-                    # idx="",
-                    # metrics="",
-                    # unaligned="",
-                    # unpaired="",
-                    # unconcordant="",
-                    # concordant="",
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 9c9c918 and badae1c.

📒 Files selected for processing (2)
  • bio/bowtie2/align/test/Snakefile (2 hunks)
  • bio/bowtie2/align/test/genome.fasta.fai (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • bio/bowtie2/align/test/genome.fasta.fai
🔇 Additional comments (2)
bio/bowtie2/align/test/Snakefile (2)

89-89: LGTM: Proper CRAM reference handling

The addition of the FASTA index requirement (ref_fai) is correct and necessary for CRAM output handling. This change properly addresses the previous review feedback about CRAM reference validation.


107-109: LGTM: Comprehensive implementation of sorting options

The nested loop structure effectively generates test rules for all combinations of sorting programs (none/samtools/picard), orders (coordinate/queryname), and output formats (sam/bam/cram).

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
bio/bowtie2/align/wrapper.py (3)

15-24: Consider handling additional edge cases in get_extension

While the function works correctly for standard cases, consider handling these edge cases:

  1. Files without extensions
  2. Files with multiple dots in the name
  3. Case-insensitive extension comparison
 def get_extension(filename: str) -> str:
     """
     Return file format since Bowtie2 reads files that
     could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
     """
+    if not filename or "." not in filename:
+        raise ValueError("Invalid filename: must have an extension")
     if filename.endswith((".gz", ".bz2")):
-        return filename.split(".")[-2].lower()
+        parts = filename.lower().split(".")
+        return parts[-2] if len(parts) > 2 else ""
-    return filename.split(".")[-1].lower()
+    return filename.lower().split(".")[-1]

174-176: Enhance CRAM output validation

When using CRAM output with samtools, consider validating that the reference file exists and is readable.

         if bam_extension == "cram":
+            if not path.isfile(REF):
+                raise ValueError(f"Reference file not found or not readable: {REF}")
             samtools_opts += f" --reference {REF} "

205-210: Consider using a list for command construction

The command construction could be more maintainable by using a list of command parts.

-    shell(
-        "( bowtie2"
-        " --threads {bowtie2_threads}"
-        " {CMD_INPUT} "
-        " -x {index}"
-        " {extra}"
-        " " + PIPE_CMD + ") {LOG}"
-    )
+    cmd_parts = [
+        "(",
+        "bowtie2",
+        f"--threads {bowtie2_threads}",
+        CMD_INPUT,
+        f"-x {index}",
+        extra,
+        PIPE_CMD,
+        f") {LOG}"
+    ]
+    shell(" ".join(cmd_parts))
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between badae1c and 36b6145.

📒 Files selected for processing (1)
  • bio/bowtie2/align/wrapper.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
bio/bowtie2/align/wrapper.py (2)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.


Pattern **/wrapper.py: Do not complain about use of undefined variable called snakemake.

🪛 Ruff (0.8.0)
bio/bowtie2/align/wrapper.py

27-27: Undefined name snakemake

(F821)


40-40: Undefined name snakemake

(F821)


45-45: Undefined name snakemake

(F821)


55-55: Undefined name snakemake

(F821)


58-58: Undefined name snakemake

(F821)


62-62: Undefined name snakemake

(F821)


63-63: Undefined name snakemake

(F821)


64-64: Undefined name snakemake

(F821)


65-65: Undefined name snakemake

(F821)


66-66: Undefined name snakemake

(F821)


67-67: Undefined name snakemake

(F821)


73-73: Undefined name snakemake

(F821)


76-76: Undefined name snakemake

(F821)


77-77: Undefined name snakemake

(F821)


81-81: Undefined name snakemake

(F821)


82-82: Undefined name snakemake

(F821)


83-83: Undefined name snakemake

(F821)


84-84: Undefined name snakemake

(F821)


85-85: Undefined name snakemake

(F821)


87-87: Undefined name snakemake

(F821)


89-89: Undefined name snakemake

(F821)


126-129: Use ternary operator CMD_INPUT = f"--interleaved {SAMPLE}" if interleaved else f"-U {SAMPLE}" instead of if-else-block

Replace if-else-block with CMD_INPUT = f"--interleaved {SAMPLE}" if interleaved else f"-U {SAMPLE}"

(SIM108)


201-201: Undefined name snakemake

(F821)

🔇 Additional comments (3)
bio/bowtie2/align/wrapper.py (3)

1-14: LGTM! Imports and header are well-organized.

The imports are appropriate for the added functionality, including necessary modules for temporary file handling and path operations.


49-50: Fix typo in error message

The word "declarad" should be "declared" in the error message.

-        f"Missing required indices: {missing_idx} declarad as input.idx.\n"
+        f"Missing required indices: {missing_idx} declared as input.idx.\n"

76-78: Add early thread validation

The thread validation should occur before thread allocation to prevent negative thread counts.

+if snakemake.threads < 2:
+    raise ValueError("This wrapper requires at least 2 threads")
 bowtie2_threads = snakemake.threads
 sort_threads = snakemake.threads - 1
🧰 Tools
🪛 Ruff (0.8.0)

76-76: Undefined name snakemake

(F821)


77-77: Undefined name snakemake

(F821)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (2)
bio/bowtie2/align/test/Snakefile (2)

136-136: Consider making threads configurable

The hardcoded thread count of 8 might not be optimal for all environments. Consider making this configurable through the config file or workflow resources.

-                threads: 8  # Use at least two threads
+                threads: config.get("bowtie2_threads", 8)  # Configurable with default

132-132: Document sort_extra parameter usage

The sort_extra parameter is included but empty. Please add a comment documenting its purpose and expected values for different sorting programs (samtools/picard).

-                    sort_extra="",
+                    sort_extra="",  # Additional arguments for sort program (e.g., "-m 4G" for samtools, "MAX_RECORDS_IN_RAM=1000000" for Picard)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 36b6145 and ceba80f.

📒 Files selected for processing (1)
  • bio/bowtie2/align/test/Snakefile (2 hunks)
🔇 Additional comments (1)
bio/bowtie2/align/test/Snakefile (1)

107-109: LGTM! Well-structured dynamic rule generation

The nested loop structure efficiently generates all necessary combinations of sorting orders, output formats, and processing programs. This aligns well with the PR objective of supporting different sorting options.

Comment on lines 125 to 126
ref="genome.fasta",
ref_fai="genome.fasta.fai",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Conditionally include reference inputs

The reference inputs (ref and ref_fai) are currently always required, but they're only needed for CRAM output. Consider making these conditional based on the output format.

# Example approach using a lambda function in input
input:
    sample=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
    idx=multiext(...),
    **lambda wildcards: {
        "ref": "genome.fasta",
        "ref_fai": "genome.fasta.fai"
    } if extension == "cram" else {}

@@ -86,6 +86,7 @@
".rev.2.bt2",
),
ref="genome.fasta",
ref_fai="genome.fasta.fai",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Make the reference index path configurable

The hardcoded path "genome.fasta.fai" violates the requirement that input file paths should be selectable arbitrarily. Consider making this path configurable through a config file or rule parameter.

-        ref_fai="genome.fasta.fai",
+        ref_fai=config["ref_fai"],  # or another configurable source

Committable suggestion skipped: line range outside the PR's diff.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
bio/bowtie2/align/wrapper.py (1)

46-48: ⚠️ Potential issue

Improve thread allocation logic

The current thread allocation might lead to issues when snakemake.threads is 1.

Add validation before thread allocation:

 THREADS = snakemake.threads
+if THREADS < 2:
+    raise ValueError("This wrapper requires at least 2 threads")
 sort_threads = snakemake.threads - 1
🧰 Tools
🪛 Ruff (0.8.2)

46-46: Undefined name snakemake

(F821)


47-47: Undefined name snakemake

(F821)

🧹 Nitpick comments (6)
bio/bowtie2/align/meta.yaml (1)

17-21: Consider documenting optional outputs in notes section

The commented optional output paths could be moved to the notes section to maintain cleaner YAML structure while preserving the documentation of available outputs.

output:
  - SAM/BAM/CRAM file. This must be the first output file in the list.
  - idx: Optional path to bam index.
-  # - metrics: Optional path to metrics file.
-  # - unaligned: Optional path to unaligned unpaired reads.
-  # - unpaired: Optional path to unpaired reads that aligned at least once.
-  # - unconcordant: Optional path to pairs that didn't align concordantly.
-  # - concordant: Optional path to pairs that aligned concordantly at least once.
notes: |
  * The `extra` param allows for additional arguments for bowtie2.
  * This wrapper uses an inner pipe. Make sure to use at least two threads in your Snakefile.
+ * Optional outputs:
+   - metrics: Path to metrics file
+   - unaligned: Path to unaligned unpaired reads
+   - unpaired: Path to unpaired reads that aligned at least once
+   - unconcordant: Path to pairs that didn't align concordantly
+   - concordant: Path to pairs that aligned concordantly at least once
bio/bowtie2/align/wrapper.py (5)

16-23: Add input validation to get_extension

While the function handles basic cases well, it could benefit from input validation to handle edge cases.

Consider this enhanced implementation:

 def get_extension(filename: str) -> str:
+    if not filename:
+        raise ValueError("Filename cannot be empty")
     if filename.endswith((".gz", ".bz2")):
         return filename.split(".")[-2].lower()
     return filename.split(".")[-1].lower()

65-66: Enhance error message for input validation

The error message could be more descriptive to help users understand the expected format.

Consider this improvement:

 if not isinstance(SAMPLE, str) and len(SAMPLE) not in [1, 2]:
-    raise ValueError("input must have 1 (single-end) or 2 (paired-end) elements")
+    raise ValueError(
+        f"Input must have 1 (single-end) or 2 (paired-end) elements, got {len(SAMPLE)} elements"
+    )

106-110: Improve error messages for sort parameters

The error messages for sort parameters could be more user-friendly.

Consider these improvements:

 if SORT_ORDER not in {"coordinate", "queryname"}:
     raise ValueError(
-        f"Unexpected value for sort_order ({SORT_ORDER})"
-        "Valid values are 'coordinate' or 'queryname'"
+        f"Invalid sort_order '{SORT_ORDER}'. Valid values are: 'coordinate' or 'queryname'"
     )

 if SORT_PROGRAM not in {"none", "samtools", "picard"}:
     raise ValueError(
-        f"Unexpected value for sort_program ({SORT_PROGRAM})"
-        "Valid values are 'none', 'samtools' or 'picard'"
+        f"Invalid sort_program '{SORT_PROGRAM}'. Valid values are: 'none', 'samtools' or 'picard'"
     )

Also applies to: 112-116


150-153: Simplify conditional logic using ternary operator

The conditional logic for interleaved input can be simplified.

-        if IS_INTERLEAVED:
-            cmd_input = f"--interleaved {SAMPLE}"
-        else:
-            cmd_input = f"-U {SAMPLE}"
+        cmd_input = f"--interleaved {SAMPLE}" if IS_INTERLEAVED else f"-U {SAMPLE}"
🧰 Tools
🪛 Ruff (0.8.2)

150-153: Use ternary operator cmd_input = f"--interleaved {SAMPLE}" if IS_INTERLEAVED else f"-U {SAMPLE}" instead of if-else-block

Replace if-else-block with cmd_input = f"--interleaved {SAMPLE}" if IS_INTERLEAVED else f"-U {SAMPLE}"

(SIM108)


194-194: Remove unnecessary f-string prefixes

Some strings are marked as f-strings but don't contain any placeholders.

-            SAMTOOLS_OPTS += f"--write-index "
+            SAMTOOLS_OPTS += "--write-index "
-            PICARD_OPTS += f"--CREATE_INDEX true "
+            PICARD_OPTS += "--CREATE_INDEX true "

Also applies to: 214-214

🧰 Tools
🪛 Ruff (0.8.2)

194-194: f-string without any placeholders

Remove extraneous f prefix

(F541)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ceba80f and eb499d4.

📒 Files selected for processing (4)
  • bio/bowtie2/align/meta.yaml (2 hunks)
  • bio/bowtie2/align/test/Snakefile (4 hunks)
  • bio/bowtie2/align/wrapper.py (1 hunks)
  • test_wrappers.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
test_wrappers.py (1)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

bio/bowtie2/align/wrapper.py (2)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.


Pattern **/wrapper.py: Do not complain about use of undefined variable called snakemake.

🪛 Ruff (0.8.2)
bio/bowtie2/align/wrapper.py

27-27: Undefined name snakemake

(F821)


28-28: Undefined name snakemake

(F821)


29-29: Undefined name snakemake

(F821)


30-30: Undefined name snakemake

(F821)


33-33: Undefined name snakemake

(F821)


39-39: Undefined name snakemake

(F821)


43-43: Undefined name snakemake

(F821)


46-46: Undefined name snakemake

(F821)


47-47: Undefined name snakemake

(F821)


51-51: Undefined name snakemake

(F821)


52-52: Undefined name snakemake

(F821)


53-53: Undefined name snakemake

(F821)


54-54: Undefined name snakemake

(F821)


55-55: Undefined name snakemake

(F821)


57-57: Undefined name snakemake

(F821)


59-59: Undefined name snakemake

(F821)


70-70: Undefined name snakemake

(F821)


76-76: Undefined name snakemake

(F821)


150-153: Use ternary operator cmd_input = f"--interleaved {SAMPLE}" if IS_INTERLEAVED else f"-U {SAMPLE}" instead of if-else-block

Replace if-else-block with cmd_input = f"--interleaved {SAMPLE}" if IS_INTERLEAVED else f"-U {SAMPLE}"

(SIM108)


194-194: f-string without any placeholders

Remove extraneous f prefix

(F541)


214-214: f-string without any placeholders

Remove extraneous f prefix

(F541)

🔇 Additional comments (8)
bio/bowtie2/align/meta.yaml (1)

25-27: LGTM! Well-structured sorting parameters

The new sorting parameters are well-organized and provide good flexibility:

  • sort_program: Clear options for sorting implementation
  • sort_extra: Allows customization of sorting behavior
  • sort_order: Standard options for sort ordering
bio/bowtie2/align/test/Snakefile (2)

Line range hint 1-21: LGTM! Well-structured base rule

The base rule test_bowtie2_sam is well-implemented with:

  • Clear input/output paths
  • Proper thread specification (>= 2 as required)
  • Consistent log file naming

75-90: LGTM! Well-designed sorting test rules

The sorting test rules are properly implemented:

  • Inherit from base rule using 'use rule as'
  • Consistent output path structure
  • Clear parameter specification for sort programs
test_wrappers.py (1)

2045-2070: LGTM! Comprehensive test coverage

The bowtie2 alignment tests provide good coverage:

  • Tests all output formats (SAM/BAM/CRAM)
  • Tests both sorting programs (samtools/picard)
  • Consistent with Snakefile output paths
  • Proper core allocation
bio/bowtie2/align/wrapper.py (4)

1-13: LGTM: Header and imports are appropriate

The imports are well-organized and include all necessary modules for the enhanced functionality.


85-99: LGTM: Output validation is thorough

The validation of output file extensions and index compatibility is comprehensive and well-implemented.


125-139: LGTM: Compatibility checks are comprehensive

The validation of CRAM output requirements and index file compatibility is thorough and well-implemented.


234-242: LGTM: Command execution is well-implemented

The use of a temporary directory and command construction is clean and efficient.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
bio/bowtie2/align/wrapper.py (3)

34-38: Consider removing commented-out output variables.

These commented-out variables (METRICS, UNALIGNED, etc.) and their related code (lines 179-188) should either be implemented or removed to maintain code cleanliness.

Would you like me to help clean up these commented sections?


197-198: Simplify thread check condition.

The condition if sort_threads >= 1 is redundant since we've already validated that there are at least 2 threads when sorting is enabled.

-        if sort_threads >= 1:
-            SAMTOOLS_OPTS += f"--threads {sort_threads} "
+        SAMTOOLS_OPTS += f"--threads {sort_threads} "

204-205: Make SORT_EXTRA modification more explicit.

The modification of SORT_EXTRA should be more explicit about appending the queryname sort option.

-        if SORT_ORDER == "queryname":
-            SORT_EXTRA += "-n "
+        queryname_opt = "-n " if SORT_ORDER == "queryname" else ""
+        SORT_EXTRA += queryname_opt
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eb499d4 and a2e6504.

📒 Files selected for processing (1)
  • bio/bowtie2/align/wrapper.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
bio/bowtie2/align/wrapper.py (2)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.


Pattern **/wrapper.py: Do not complain about use of undefined variable called snakemake.

🪛 Ruff (0.8.2)
bio/bowtie2/align/wrapper.py

27-27: Undefined name snakemake

(F821)


28-28: Undefined name snakemake

(F821)


29-29: Undefined name snakemake

(F821)


30-30: Undefined name snakemake

(F821)


33-33: Undefined name snakemake

(F821)


39-39: Undefined name snakemake

(F821)


43-43: Undefined name snakemake

(F821)


46-46: Undefined name snakemake

(F821)


50-50: Undefined name snakemake

(F821)


51-51: Undefined name snakemake

(F821)


52-52: Undefined name snakemake

(F821)


53-53: Undefined name snakemake

(F821)


54-54: Undefined name snakemake

(F821)


56-56: Undefined name snakemake

(F821)


58-58: Undefined name snakemake

(F821)


72-72: Undefined name snakemake

(F821)


78-78: Undefined name snakemake

(F821)


165-165: Undefined name snakemake

(F821)

🔇 Additional comments (3)
bio/bowtie2/align/wrapper.py (3)

16-23: LGTM! Well-documented helper function.

The get_extension function correctly handles both compressed and uncompressed files, with proper type annotations and documentation.


242-249: LGTM! Proper temporary directory handling.

The command execution is well-implemented using Python's tempfile module for secure temporary directory management.


52-54: Verify sorting functionality across different scenarios.

The implementation supports both samtools and picard sorting with different sort orders. Let's verify this functionality:

Also applies to: 194-239

✅ Verification successful

Based on the search results, I can now generate the final response as I have sufficient information to verify the sorting functionality. The results show that:

  1. The sorting implementation in bowtie2/align is consistent with other bioinformatics tools in the codebase (bwa, bwa-mem2, dragmap, etc.)
  2. The sort parameters (sort_program, sort_order, sort_extra) are used consistently across the codebase
  3. There are test cases in bio/bowtie2/align/test/Snakefile that verify both samtools and picard sorting
  4. The sort_order parameter correctly handles both "coordinate" and "queryname" options
  5. The implementation properly handles extra arguments through sort_extra parameter

Sorting functionality implementation is verified and consistent with codebase patterns

  • The implementation follows the established pattern used across multiple bioinformatics tools in the codebase
  • Test cases exist for both samtools and picard sorting implementations
  • The sort parameters are handled consistently with other similar wrappers
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify sorting implementation and its usage across the codebase

# Check for other wrappers using similar sorting logic
echo "Checking for similar sorting implementations:"
rg -A 5 'sort_program.*=.*("samtools"|"picard")'

# Check for test cases covering sorting functionality
echo "Checking for sort-related test cases:"
rg -A 10 'sort_program.*=.*(samtools|picard)' test/

# Check for potential issues with sort order handling
echo "Checking sort order handling:"
rg -A 5 'sort_order.*=.*("coordinate"|"queryname")'

Length of output: 25780

🧰 Tools
🪛 Ruff (0.8.2)

52-52: Undefined name snakemake

(F821)


53-53: Undefined name snakemake

(F821)


54-54: Undefined name snakemake

(F821)

Comment on lines 104 to 108
if THREADS == 1 and SORT_PROGRAM != "none":
raise ValueError(
"Not enough threads requested. This wrapper requires at least two threads: "
"one for bowtie2 and one for samtools/picard."
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consolidate duplicate thread validation logic.

There are two identical thread validation blocks. Consider consolidating them into a single validation at the beginning of the validation section.

Here's a suggested consolidation:

- if THREADS == 1 and SORT_PROGRAM != "none":
-     raise ValueError(
-         "Not enough threads requested. This wrapper requires at least two threads: "
-         "one for bowtie2 and one for samtools/picard."
-     )

# ... other validations ...

- if SORT_PROGRAM != "none" and THREADS <= 1:
-     raise ValueError(
-         "Not enough threads requested. This wrapper requires at least two threads: "
-         "one for bowtie2 and one for samtools/picard."
-     )

+ # Move this to the beginning of validation section
+ if SORT_PROGRAM != "none" and THREADS < 2:
+     raise ValueError(
+         "Not enough threads requested. This wrapper requires at least two threads: "
+         "one for bowtie2 and one for samtools/picard."
+     )

Also applies to: 127-131

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
bio/bowtie2/align/wrapper.py (4)

16-23: Add input validation to get_extension function

The function should validate that the input filename is not empty and is a string.

 def get_extension(filename: str) -> str:
+    if not filename:
+        raise ValueError("Filename cannot be empty")
     if filename.endswith((".gz", ".bz2")):
         return filename.split(".")[-2].lower()
     return filename.split(".")[-1].lower()

34-38: Clean up or document commented code

Several output variables are commented out. If these features are planned for future implementation, please add a TODO comment explaining the plan. If they are no longer needed, consider removing them.


108-112: Improve error message formatting

The error message for invalid sort_order is missing proper punctuation.

     raise ValueError(
-        f"Unexpected value for sort_order ({SORT_ORDER})"
-        "Valid values are 'coordinate' or 'queryname'"
+        f"Unexpected value for sort_order ({SORT_ORDER}). "
+        "Valid values are 'coordinate' or 'queryname'."
     )

187-235: Add error handling for invalid sort program combinations

While the validation checks prevent invalid combinations, it would be good practice to add an explicit error case in the pattern matching.

     case _:
+        if SORT_PROGRAM not in {"none"}:
+            raise ValueError(f"Unexpected sort program: {SORT_PROGRAM}")
         if sort_threads >= 1:
             SAMTOOLS_OPTS += f"--threads {sort_threads} "
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a2e6504 and 3bdc418.

📒 Files selected for processing (1)
  • bio/bowtie2/align/wrapper.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
bio/bowtie2/align/wrapper.py (2)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.


Pattern **/wrapper.py: Do not complain about use of undefined variable called snakemake.

🪛 Ruff (0.8.2)
bio/bowtie2/align/wrapper.py

27-27: Undefined name snakemake

(F821)


28-28: Undefined name snakemake

(F821)


29-29: Undefined name snakemake

(F821)


30-30: Undefined name snakemake

(F821)


33-33: Undefined name snakemake

(F821)


39-39: Undefined name snakemake

(F821)


43-43: Undefined name snakemake

(F821)


46-46: Undefined name snakemake

(F821)


50-50: Undefined name snakemake

(F821)


51-51: Undefined name snakemake

(F821)


52-52: Undefined name snakemake

(F821)


53-53: Undefined name snakemake

(F821)


54-54: Undefined name snakemake

(F821)


56-56: Undefined name snakemake

(F821)


58-58: Undefined name snakemake

(F821)


72-72: Undefined name snakemake

(F821)


78-78: Undefined name snakemake

(F821)


158-158: Undefined name snakemake

(F821)

🔇 Additional comments (1)
bio/bowtie2/align/wrapper.py (1)

239-246: LGTM! Clean implementation of command execution

The implementation properly manages temporary directories and handles command execution with appropriate logging.

@jlanga
Copy link
Contributor Author

jlanga commented Dec 19, 2024

@fgvieira I think I am done with this wrapper.

I've left commented the part about outputting extra files from bowtie2 (metrics, unaligned, unpaired, concordant and unconcordant). Some of them introduce a lot of complexity in the code because bowtie can output 1 or 2 fastq files depending on if the input is SE or PE. These files can be easily obtained with samtools view and samtools fastq. Also if you want gzipped files you have to use different flags (--un-conc becomes --un-conc-gz or --un-conc-bz2). It is a headache.

Let me know if you want me to keep that functionality on to uncomment it.

@fgvieira
Copy link
Collaborator

It seems you still have un-pushed commits. Can you push them?

@jlanga
Copy link
Contributor Author

jlanga commented Dec 20, 2024

I don't understand. All my commits are on the bowtie2_sort branch. I have nothing else pending to push in my computer.

@jlanga
Copy link
Contributor Author

jlanga commented Dec 20, 2024

I clicked the Update Branch button. I hope it was that.

@fgvieira
Copy link
Collaborator

On the environment.yaml file, the patch versions are not pinned (see my comments).

@jlanga
Copy link
Contributor Author

jlanga commented Dec 20, 2024

I see that we both removed the patch number, but in a comment later you appended the one for picard-slim:

- picard-slim =3.3.0

Is that correct? That one has to be fixed to 3.3.0?

@jlanga
Copy link
Contributor Author

jlanga commented Dec 20, 2024

Ok. I see other wrappers with picard-slim =3.3.0. Fixing that and repinning everything.

@fgvieira
Copy link
Collaborator

fgvieira commented Dec 20, 2024

but also bowtie2 and wrapper utils

@jlanga
Copy link
Contributor Author

jlanga commented Dec 20, 2024

Ok. Pinned everything with the exception of samtools. The pin file stays the same after that.

@fgvieira
Copy link
Collaborator

Why the total re-write of the wrapper? As I see it, you only needed to add the logic for picard, like (for example) in the bwa wrapper:

# Determine which pipe command to use for converting to bam or sorting.
if sort == "none":
# Simply convert to bam using samtools view.
pipe_cmd = "samtools view {samtools_opts}"
elif sort == "samtools":
# Add name flag if needed.
if sort_order == "queryname":
sort_extra += " -n"
# Sort alignments using samtools sort.
pipe_cmd = "samtools sort {samtools_opts} {sort_extra} -T {tmpdir}"
elif sort == "fgbio":
if sort_order == "queryname":
sort_extra += " -s Queryname"
pipe_cmd = "fgbio SortBam -i /dev/stdin -o {snakemake.output[0]} {sort_extra}"
elif sort == "picard":
# Sort alignments using picard SortSam.
pipe_cmd = "picard SortSam {java_opts} {sort_extra} --INPUT /dev/stdin --TMP_DIR {tmpdir} --SORT_ORDER {sort_order} --OUTPUT {snakemake.output[0]}"
else:
raise ValueError(f"Unexpected value for params.sort ({sort})")

and change the line:

"| samtools view --with-header "

@jlanga
Copy link
Contributor Author

jlanga commented Dec 20, 2024

My initial plan was to just do either samtools view or sort, but I saw the wrapper from bwa-mem2 and I saw that it could be done for sam, bam and cram, sorted and unsorted, and with samtools and picard.

But bwa-mem2 only aligns and generates the sam file. Bowtie2 can produce extra outputs.

And now you brought to my attention fgbio SortSam.

And there are wrappers specific to a single sort program like bwa-mem2 and samblaster

What is the official view from snakemake wrappers? A monolithic wrapper htat handles multiple tool combinations, or multiple silly wrappers?

If we go through the silly wrappers, we can delete 100+ lines of code just for validation.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (5)
bio/bowtie2/align/wrapper.py (5)

16-23: Enhance extension handling robustness

The extension handling could be improved to be more resilient:

  • Case-insensitive extension comparison
  • Handle paths with multiple dots

Consider this improved implementation:

 def get_extension(filename: str) -> str:
     """
     Return file format since Bowtie2 reads files that
     could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
     """
-    if filename.endswith((".gz", ".bz2")):
-        return filename.split(".")[-2].lower()
-    return filename.split(".")[-1].lower()
+    filename = filename.lower()
+    base, ext = path.splitext(filename)
+    if ext in ('.gz', '.bz2'):
+        return path.splitext(base)[1][1:]  # Remove leading dot
+    return ext[1:]  # Remove leading dot

34-38: Document rationale for commented-out output variables

These commented-out variables suggest planned functionality for metrics and different alignment outputs. Consider adding a TODO comment explaining why they're disabled and when they might be re-enabled.

Add a comment explaining the rationale:

+ # TODO: These outputs are temporarily disabled due to complexity with SE/PE handling
+ # They can be re-enabled once we implement proper SE/PE output handling
 # METRICS = snakemake.output.get("metrics", None)
 # UNALIGNED = snakemake.output.get("unaligned", None)
 # UNPAIRED = snakemake.output.get("unpaired", None)
 # UNCONCORDANT = snakemake.output.get("unconcordant", None)
 # CONCORDANT = snakemake.output.get("concordant", None)

64-68: Enhance error message with actual input details

The error message could be more helpful by including the actual input received.

 if not isinstance(SAMPLE, str) and len(SAMPLE) not in [1, 2]:
     raise ValueError(
         "Input must have 1 (single-end) or 2 (paired-end) elements, "
-        f"got {len(SAMPLE)} elements"
+        f"got {len(SAMPLE)} elements: {SAMPLE}"
     )

91-101: Standardize error message formatting

The error messages for output validation should follow a consistent format.

 if bam_extension.lower() not in {"sam", "bam", "cram"}:
     raise ValueError(
-        f"Unrecognized extension for output file: {bam_extension}."
-        "Valid extensions are 'sam', 'bam' or 'cram'"
+        f"Unrecognized extension for output file: {bam_extension}. "
+        "Valid extensions are: 'sam', 'bam' or 'cram'"
     )

 if bai_extension not in {None, "bai", "crai"}:
     raise ValueError(
-        f"Unrecognized extension for index file: {bai_extension}."
-        "Valid extensions are 'bai' or 'crai'"
+        f"Unrecognized extension for index file: {bai_extension}. "
+        "Valid extensions are: 'bai' or 'crai'"
     )

200-206: Use consistent string formatting style

The command string uses inconsistent formatting styles. Consider using f-strings consistently.

         cmd_output = (
-            "| samtools sort "
-            "{SAMTOOLS_OPTS} "
-            "{SORT_EXTRA} "
-            "-T {TMPDIR} "
-            "-o {bam} "
+            f"| samtools sort "
+            f"{SAMTOOLS_OPTS} "
+            f"{SORT_EXTRA} "
+            f"-T {TMPDIR} "
+            f"-o {bam} "
         )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8242918 and a93881a.

📒 Files selected for processing (1)
  • bio/bowtie2/align/wrapper.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
bio/bowtie2/align/wrapper.py (2)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.


Pattern **/wrapper.py: Do not complain about use of undefined variable called snakemake.

🪛 Ruff (0.8.2)
bio/bowtie2/align/wrapper.py

27-27: Undefined name snakemake

(F821)


28-28: Undefined name snakemake

(F821)


29-29: Undefined name snakemake

(F821)


30-30: Undefined name snakemake

(F821)


33-33: Undefined name snakemake

(F821)


39-39: Undefined name snakemake

(F821)


43-43: Undefined name snakemake

(F821)


46-46: Undefined name snakemake

(F821)


50-50: Undefined name snakemake

(F821)


51-51: Undefined name snakemake

(F821)


52-52: Undefined name snakemake

(F821)


53-53: Undefined name snakemake

(F821)


54-54: Undefined name snakemake

(F821)


56-56: Undefined name snakemake

(F821)


58-58: Undefined name snakemake

(F821)


72-72: Undefined name snakemake

(F821)


78-78: Undefined name snakemake

(F821)


158-158: Undefined name snakemake

(F821)

🔇 Additional comments (1)
bio/bowtie2/align/wrapper.py (1)

247-254: LGTM! Proper temporary directory and shell execution handling

The implementation correctly uses a temporary directory and properly handles shell command execution with logging.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
bio/bowtie2/align/wrapper.py (4)

16-25: Add input validation to get_extension

The function should validate that the input is not None or empty to prevent potential runtime errors.

 def get_extension(filename: str) -> str:
+    if not filename:
+        raise ValueError("Filename cannot be None or empty")
     filename = filename.lower()
     base, ext = path.splitext(filename)

36-42: Consider documenting the plan for re-enabling outputs

The TODO comment indicates that several outputs are temporarily disabled due to SE/PE handling complexity. Consider:

  1. Creating a tracking issue for re-enabling these outputs
  2. Documenting the specific challenges with SE/PE handling
  3. Adding acceptance criteria for when these can be re-enabled

Would you like me to help create a GitHub issue to track this enhancement?


112-116: Improve error message formatting

The error message for invalid sort order is missing a space between the message parts.

     raise ValueError(
-        f"Unexpected value for sort_order ({SORT_ORDER})"
-        "Valid values are 'coordinate' or 'queryname'"
+        f"Unexpected value for sort_order ({SORT_ORDER}). "
+        "Valid values are 'coordinate' or 'queryname'"
     )

193-210: Consider extracting samtools command construction

The samtools command construction logic is complex and could benefit from being extracted into a helper function for better maintainability.

+def build_samtools_sort_command(bam, bai, sort_order, ref, sort_threads):
+    opts = get_samtools_opts(snakemake, parse_threads=False, param_name="sort_extra")
+    opts += f" --threads {sort_threads} "
+    if bai:
+        opts += "--write-index "
+    if sort_order == "queryname":
+        opts += "-n "
+    if get_extension(bam) == "cram":
+        opts += f"--reference {ref} --output-fmt CRAM "
+    return (
+        "| samtools sort "
+        f"{opts} "
+        "{SORT_EXTRA} "
+        "-T {TMPDIR} "
+        f"-o {bam} "
+    )

 match SORT_PROGRAM:
     case "samtools":
-        SAMTOOLS_OPTS += f"--threads {sort_threads} "
-        if BAI:
-            bam = f"{BAM}##idx##{BAI}"
-            SAMTOOLS_OPTS += "--write-index "
-        else:
-            bam = BAM
-        if SORT_ORDER == "queryname":
-            SORT_EXTRA += "-n "
-        if bam_extension == "cram":
-            SAMTOOLS_OPTS += f"--reference {REF} --output-fmt CRAM "
-        cmd_output = (
-            "| samtools sort "
-            "{SAMTOOLS_OPTS} "
-            "{SORT_EXTRA} "
-            "-T {TMPDIR} "
-            "-o {bam} "
-        )
+        output_bam = f"{BAM}##idx##{BAI}" if BAI else BAM
+        cmd_output = build_samtools_sort_command(
+            output_bam, BAI, SORT_ORDER, REF, sort_threads
+        )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a93881a and 7ba5b46.

📒 Files selected for processing (1)
  • bio/bowtie2/align/wrapper.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
bio/bowtie2/align/wrapper.py (2)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.


Pattern **/wrapper.py: Do not complain about use of undefined variable called snakemake.

🪛 Ruff (0.8.2)
bio/bowtie2/align/wrapper.py

29-29: Undefined name snakemake

(F821)


30-30: Undefined name snakemake

(F821)


31-31: Undefined name snakemake

(F821)


32-32: Undefined name snakemake

(F821)


35-35: Undefined name snakemake

(F821)


43-43: Undefined name snakemake

(F821)


47-47: Undefined name snakemake

(F821)


50-50: Undefined name snakemake

(F821)


54-54: Undefined name snakemake

(F821)


55-55: Undefined name snakemake

(F821)


56-56: Undefined name snakemake

(F821)


57-57: Undefined name snakemake

(F821)


58-58: Undefined name snakemake

(F821)


60-60: Undefined name snakemake

(F821)


62-62: Undefined name snakemake

(F821)


76-76: Undefined name snakemake

(F821)


82-82: Undefined name snakemake

(F821)


162-162: Undefined name snakemake

(F821)

🔇 Additional comments (3)
bio/bowtie2/align/wrapper.py (3)

124-127: Consolidate thread validation logic

This thread validation appears to be duplicated. Consider consolidating all thread-related validations into a single location at the beginning of the validation section.


251-258: LGTM! Good use of temporary directory

The implementation properly handles temporary files and includes appropriate logging.


161-162: Verify thread allocation between bowtie2 and sort programs

The thread allocation between bowtie2 and sort programs needs verification to ensure optimal performance.

✅ Verification successful

Based on the code analysis, I can now provide a final response regarding the thread allocation verification:

Thread allocation between bowtie2 and sort is properly implemented

The wrapper implements a sensible thread allocation strategy:

  • When sorting is enabled, it reserves 1 thread for sorting (sort_threads = snakemake.threads - 1) and assigns the rest to bowtie2 (cmd_threads = THREADS)
  • The wrapper enforces this by checking if SORT_PROGRAM != "none" and THREADS < 2: raise ValueError("Not enough threads requested...")
  • This prevents resource contention between bowtie2 alignment and the sorting process
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if other wrappers in the codebase have similar thread allocation patterns
# This helps validate our approach to thread distribution

# Look for thread allocation patterns in other wrappers
rg -A 5 "threads\s*=\s*snakemake\.threads\s*-\s*[0-9]+" "bio/**/wrapper.py"

# Look for bowtie2 thread parameter usage
rg -A 5 "--threads.*bowtie2" "bio/**/wrapper.py"

Length of output: 632


Script:

#!/bin/bash
# Let's try a different approach to find the files and analyze thread allocation

# First, let's find the wrapper.py file
fd "wrapper.py" bio/

# Then look for thread-related patterns in the specific file
fd "wrapper.py" bio/ --exec rg -l "threads"

# Get the content of the specific wrapper file to analyze thread allocation
fd "wrapper.py" bio/bowtie2/align --exec cat {}

# Look for any configuration or documentation files that might specify threading behavior
fd "meta.yaml|README.md" bio/bowtie2/align --exec cat {}

Length of output: 27634

🧰 Tools
🪛 Ruff (0.8.2)

162-162: Undefined name snakemake

(F821)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants