All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog.
This project adheres to Semantic Versioning.
- Update Picard version to 3.1.1
- Update BWA-MEM2, HISAT2 images to use SAMtools version 1.17
- Update Nextflow configuration test workflows
- Update README.md to match template
- Add Action to generate documentation in GitHub Pages
- Add Action to run Nextflow configuration regression tests
- Add `setup_docker_cpus` method
- Remove old `bl-base` Docker image
- Change name `base_output_dir` to `output_dir_base`
- Update Picard version to 3.0.0 after the most recent Broad release that updated the underlying Java version
- Use the PipeVal module from pipeline-Nextflow-module
- Update SAMtools version to 1.17
- Use modularized methods and schema functions for directory handling
- Use modularized methods for resource limits and allocations
- Set up `NFTest` with a-mini-n2
- Add retry with lower CPUs for alignment processes
- Add retry with increased memory for `MarkDuplicates` with `Picard`
- Explicit parameter to control BWA-MEM2 alt-aware mode
- Support for YAML input files through `-params-file` option
- Additional test case for YAML files
- PlantUML workflow diagram
- Add GitHub Action to build PlantUML diagram
- Validate input parameters
- Old workflow diagram
- Change to GitHub Packages instead of Docker Hub
- Standardize intermediate and output filenames using generate_standardized_filename
- Update input csv according to here (Section "Input structures for alignment pipelines")
- `run_MarkDuplicatesSpark_GATK` now retries once with 130GB on F72, and 140GB on M64
- Update registered output function
- Instructions in README for setting up GitHub PAT
- Parameter `docker_container_registry` in `default.config`
- Release workflow
- Fix `sort_order` definition
- Remove `run_index_SAMtools`, output index during `run_merge_SAMtools` instead
- Update `README.md`: fix links, format code, grammar
- Remove sample name from `output_dir` in `template.config`
- Update PR template to follow here
- Remove `bam_output_dir` from `main.nf` since it is not used, undefined and causes warning
- Change "shell" to "script" in processes
- Move `F16.config` to config folder
- Rename process `Generate_Sha512sum` to `generate_sha512sum`
- Rename process `run_validate` to `run_validate_PipeVal`
- Restructure repo to follow template
- Rename `align-DNA.nf` to `main.nf`
- Change output directory of MarkDuplicatesSpark metrics file to '/QC'.
- Use SAMtools `sort` instead of Picard `SortSam`
- Add `retry` method to `run_sort_SAMtools` and `run_MarkDuplicatesSpark_GATK` (if run out of RAM then retry with more memory)
- Add process `run_merge_SAMtools`: use when `params.mark_duplicates=false` to ensure multiple BAM outputs are merged
- `.github/CODEOWNERS`
- Add config file for F16 node
- Use SAMtools index when MarkDuplicates is disabled (mark_duplicates parameter set to false)
- Add parameter to toggle Spark metric generation. Default is off.
- Update `.gitignore` file according to template
- Standardize output and log directory structure
- Update index file extension from all processes to .bam.bai
- Standardize config files
- Remove spark_temp_dir parameter from config template
- Replace temp_dir parameter with work_dir parameter
- Intermediate file removal
- Spark tempdir permission checks
- Update GATK to 4.2.4.1 to address Log4j vulnerabilities (https://github.com/advisories/GHSA-8489-44mv-ggj8, https://github.com/advisories/GHSA-p6xc-xr62-6r2g)
- Update Picard version to 2.26.10 to address Log4j vulnerabilities (https://github.com/advisories/GHSA-8489-44mv-ggj8)
- Add F32 config file
- Add mark_duplicates parameter to enable exclusion or inclusion of MarkDuplicates processes.
- Changed names of midmem.config and execute.config into F72.config and M64.config respectively.
- Rename bug report to "Issue report" and remove old node names from it
- Update GATK to 4.2.4.0 to address Log4j critical vulnerability (https://github.com/advisories/GHSA-jfh8-c2jp-5v3q)
- Fix potential Spark temp directory permissions issue
- Benchmarking report with BWA-MEM 2.1 added
- GPL2 License added
- MarkDuplicatesSpark process added as an option
- Removed explicit index creation process and enabled option for MarkDuplicate process to create index
- Allow CPU and memory allocation to dictate parallelization rather than maxForks
- HISAT2 aligner functionality and the option to run either BWA-MEM2/HISAT2 or both at once. The default aligner is BWA-MEM2.
- A python script to generate config files from command line.
- Update config file to process inputs for each aligner separately. Old config files still work and BWA-MEM2 will be run as usual.
- #112 Update BWA-MEM2 and SAMtools docker to SAMtools 1.12
- #121 Update version information in the main script
- #126 Update output directory structure
- Process names standardized
- #128 Use explicit tab delimiters to ensure proper program tagging
- Updated validation docker image to v2.1.5
- Dockerfiles for BWA-MEM2, jvarkit-cmpbams, and Picard removed and moved to their own separate repositories (docker-BWA-MEM2, docker-jvarkit-cmpbams, and docker-Picard, respectively).
- #61 Update validation to 2.1.0.
- #76 Update version documentation and manifest.
- #78 #81 Update resource settings for alignment, sort, and MarkDuplicates
- #79 Update CHANGELOG.md to reflect Keep a Changelog format.
- #82 Save outputs in directories based on FASTQ library/lane #2.
- #83 Rename main workflow module.
- #88 node specific configs are not included properly
- #89 docker permission is not set properly
- #90 Fixed dockerfiles to pass dockerfilelint
- #70 Fixes crash related to checking default node CPU and memory configurations
- #67 Run docker with group permissions of the user executing the pipeline
- #65 Check for write permission on output directories before executing
- #43 Port to DSL2
- A small pipeline to generate the reference genome index files. This is a separate Nextflow script from the main pipeline script.
- Processes in a Docker container are executed as the user automatically instead of root.
- Process name for alignment is changed to align_BWA_mem_convert_SAM_to_BAM_samtools to be readable
- Gave sudo to Docker when running on the SGE cluster
- #31 Error: Unknown method invocation `includeConfig`
- #32 Error: Please specify the disease_id, patient_id, dataset_id, sample_id, analyte, and technology in the config file
- Validation scripts are fully implemented. The pipeline stops if invalid input/output files are detected, e.g., files that are not found or are the wrong file type
- Enabled input and output directly from and to the Boutros Lab data storage
- Simplified the user-facing config file to include only essential parameters
- BWA-MEM2 is upgraded to v2.1. It provides a smaller indexed genome and lower CPU usage compared to the previous version, v2.0.
- Nextflowization of align-DNA pipeline
- Dynamic resource allocation
- Version tool updates (BWA 0.7.17, SAMtools 1.10, Picard Tools 2.23.3)
- Initial Release