Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add trimmomatic to tools/ #192

Draft
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

mjgattas
Copy link

No description provided.

}

command <<<
cd /usr/local/share/trimmomatic-0.36-5
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure this is necessary? biocontainers should have added a trimmomatic executable to the $PATH, as that's what they usually do. It should be unneeded to invoke the full path here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to work out access to the executable, but without cd'ing to the exact directory I wasn't able to get it to work

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's likely because it's being called with java. There is a utility wrapper script at /usr/local/bin/trimmomatic that provides the functionality and is on PATH. So you should be able to swap java -jar trimmomatic.jar with trimmomatic and remove the directory change.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @adthrasher! Hadn't noticed that distinction

@a-frantz a-frantz changed the title Feat: Add trimmomatic tooling for trimming of single and paired-end files feat: Add trimmomatic to tools/ Dec 17, 2024
@a-frantz a-frantz changed the title feat: Add trimmomatic to tools/ feat: add trimmomatic to tools/ Dec 17, 2024
@mjgattas
Copy link
Author

@a-frantz I think the PR is in enough of a stable feature state that your review/suggestions are welcome. Please let me know how I can shore this up so it's ready for approval/merging


command <<<
ls -l /usr/local/share/trimmomatic-0.36-5/adapters/
trimmomatic PE -phred33 -threads ~{threads} \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these tasks should be combined. You can dynamically select the subcommand with ~{if (defined(<read_two>)) then "PE" else "SE"}

Comment on lines 125 to 134
-trimlog trimlog.txt ~{input_file_forward} ~{input_file_reverse} \
output_forward.fq.gz output_reverse.fq.gz \
~{if (phred64) then "-phred64" else "-phred33"} \
ILLUMINACLIP:~{illumina_clip} \
~{if defined(leading) then "LEADING:~{leading}" else ""} \
~{if defined(trailing) then "TRAILING:~{trailing}" else ""} \
SLIDINGWINDOW:~{window_size}:~{window_quality} \
MINLEN:~{minlen} \
~{if defined(crop) then "CROP:~{crop}" else ""} \
~{if defined(headcrop) then "HEADCROP:~{headcrop}" else ""} \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: indent the line continuations by one level (four spaces) and drop this last slash

tools/trimmomatic.wdl Show resolved Hide resolved
tools/trimmomatic.wdl Outdated Show resolved Hide resolved
Comment on lines 3 to 9
"leading": 3,
"trailing": 3,
"threads": 10,
"window_size": 4,
"window_quality": 15,
"minlen": 36,
"illumina_clip": "TruSeq3-SE:2:30:10",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these the defaults of trimmomatic? You should specify the defaults in the input block so that users don't need to specify all this (unless they want to override the defaults).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I'm unfamiliar with the standards for trimming these read files, I'm not sure if these are "defaults" per se, but they are values used in the documentation's examples. Would you like me to make them defaults, or should we consult with end users to see what the most likely defaults would be?

parameter_meta {
input_file: "Input FASTQ format file to run Trimmomatic on."
output_name: "Output FASTQ format file of trimmed reads."
illumina_clip: "Cut adapter and other illumina-specific sequences from the read."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain this param more? What syntax does trimmomatic expect?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will admit I don't fully understand what this adapter is, but there's more information about the Adapter on the trimmomatic documentation. It appears that this is an optional argument, so I can modify the wdl task accordingly.

The docs also mention: "Trimming occurs in the order which the steps are specified on the command line. It is recommended in most cases that adapter clipping, if required, is done as early as possible." Given this, I think the adapter cutting should be applied before other parameters, as it currently is in the wdl tasks.

Comment on lines 32 to 37
String outfile_name_two = (
if defined(read_two) then
basename(read_two, ".fastq.gz") + ".reverse.trimmed.fastq.gz"
else
""
)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@a-frantz this block of code is giving me trouble, causing this issue:
(tools/trimmomatic.wdl Ln 34 Col 26) Expected String instead of File?; for basename argument #1 basename(read_two, ".fastq.gz") + ".reverse.trimmed.fastq.gz"

I'm confused, because a similar passing of a File into basename on line 28 does not cause any errors. Curious if you know how I should handle the optional outfile_name_two variable assignment

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that the error message from sprocket or miniwdl? Either way it's misleading/incorrect.

The issue is that read_two is optional, and must be coerced to a required param before being used in the basename func. But this section is going to be rewritten once I get the REGEX to you, so don't worry about it for now

tools/trimmomatic.wdl Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants