v1.22
Highlights - notebooks and performance
Version 1.22 brings some major behind-the-scenes refactoring to MultiQC. This unlocks a number of new features, such as the ability to use MultiQC as a Python library in scripts / notebooks, and run-time validation of plot config attributes.
This release also introduces some huge performance improvements thanks to @rhpvorderman.
Compared to v1.21, a typical v1.22 run is 53% faster and has a 6x smaller peak-memory footprint - well worth updating! 🏃🏻♂️ 💨
Finally, support for the deprecated HighCharts plotting library is fully removed in v1.22, bringing to a close a long-standing project to migrate to Plotly.
For more information, please see the upcoming MultiQC release blog article on the Seqera website: https://seqera.io/blog/
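As a rough illustration of the library usage mentioned above, the sketch below wraps a few of the top-level functions in a helper. The helper name `summarize_run` and the directory argument are hypothetical, and the exact return types of the MultiQC calls are an assumption, so treat this as a starting point and check it against the MultiQC documentation.

```python
from typing import Dict, List


def summarize_run(data_dir: str) -> Dict[str, List[str]]:
    """Parse logs found under data_dir and list what MultiQC discovered.

    Hypothetical helper, not part of MultiQC itself.
    """
    # Imported lazily so this sketch can be loaded without MultiQC installed.
    import multiqc

    # Search data_dir for known tool outputs and run the matching modules.
    multiqc.parse_logs(data_dir)

    # Assumption: both calls return plain lists of names.
    return {
        "modules": list(multiqc.list_modules()),
        "samples": list(multiqc.list_samples()),
    }
```

In a notebook cell, calling `summarize_run("./data")` (with a placeholder path) would parse that directory and return the module and sample names found there.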
MultiQC updates
- Remove the `highcharts` template and Highcharts and Matplotlib dependencies (#2409)
- Remove CSP.txt and the linting check, move the script that prints missing hashes under `scripts`. Admins of servers with Content Security Policy can use it to print missing hashes when they install a new MultiQC version with: `python scripts/print_missing_csp.py --report full_report.html` (#2421)
- Do not maintain change log between releases (#2427)
- Use native clipboard API (#2419)
- Profile runtime: visualize per-module memory and run time (#2548, #2547)
- Refactoring for performance:
- Search file blocks rather than individual lines for faster results (#2513)
- Refactor file content search for a 40% speed increase (#2505)
- Sort `filepatterns` for faster searching (#2506)
- Use `array.array` for in-memory plot data, stream to render Jinja and dump JSON to reduce memory requirement (#2515)
- Speed up all modules by caching `spectra.scale` and using sets instead of lists (#2509)
- Stream json data to a file to save 30% of the memory (#2510)
- Do `replace_nan` in place rather than creating a new object (#2529)
- Use gzip rather than lzstring for compression and decompression of the plot data (#2504)
- Use gzip level 6 for faster json compression (#2553)
- Clean up module raw data after running each module, significantly reduces the memory footprint (#2551)
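Two of the performance ideas above, compact `array.array` storage and gzip level 6 compression of JSON, can be tried with the standard library alone. This is a minimal sketch on made-up data, not MultiQC's actual code path, and the variable names are illustrative.

```python
import array
import gzip
import json
import sys

# Hypothetical plot series: 100,000 y-values.
values = [i * 0.5 for i in range(100_000)]

# A list holds pointers to separate float objects, while array.array stores
# raw 8-byte doubles contiguously, so its total footprint is much smaller.
as_list = list(values)
as_array = array.array("d", values)

list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(v) for v in as_list)
array_bytes = sys.getsizeof(as_array)

# Serialise to JSON and compress with gzip at level 6, trading a little
# compression ratio for speed compared with the maximum level 9.
payload = json.dumps({"y": as_array.tolist()}).encode("utf-8")
compressed = gzip.compress(payload, compresslevel=6)

# Round trip: decompress and parse back into Python objects.
restored = json.loads(gzip.decompress(compressed))

print(f"list: {list_bytes} bytes, array: {array_bytes} bytes")
print(f"json: {len(payload)} bytes, gzipped: {len(compressed)} bytes")
```

The exact byte counts depend on the Python build, so none are quoted here.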
- Refactoring for interactivity and validation:
- Top-level functions for MultiQC use as a library (#2442)
- Pydantic models for plots and datasets (#2442)
- Validating plot configs with Pydantic (#2534)
- Use dataclasses for table and violin columns (#2546)
- Break up the main run function into submodules (#2446)
- Deprecate `multiqc.utils.config` and `multiqc.utils.report` in favour of `multiqc.config` and `multiqc.report` (#2542)
- Static typing of the report and config modules (#2445)
- Add type hints into core codebase (#2434)
- Consistent config options: rename `decimalPlaces` to `tt_decimals` (#2451)
- Remove encoding and shebang headers from module files (#2425)
- Refactor line plot categories: keep boolean throughout the code, and data points as pairs for simplicity (#2418)
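To give a feel for what run-time validation of plot configs means, here is a minimal sketch that uses standard-library dataclasses in place of Pydantic so that it runs without extra dependencies. The `LinePlotConfig` class, its fields, and the `load_config` helper are hypothetical and far simpler than MultiQC's real models. Mapping `decimalPlaces` onto `tt_decimals` is shown only as one way a loader might handle the rename, not as documented MultiQC behaviour.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class LinePlotConfig:
    """Hypothetical, heavily simplified plot config (not MultiQC's model)."""

    id: str
    title: str
    tt_decimals: Optional[int] = None
    ymin: Optional[float] = None

    def __post_init__(self) -> None:
        # Value-level check performed every time a config is created.
        if self.tt_decimals is not None and self.tt_decimals < 0:
            raise ValueError("tt_decimals must be non-negative")


def load_config(raw: Dict[str, Any]) -> LinePlotConfig:
    """Build a config from a plain dict, rejecting unknown keys."""
    raw = dict(raw)  # work on a copy so the caller's dict is untouched
    # Assumption for illustration: accept the old option name as an alias.
    if "decimalPlaces" in raw:
        raw.setdefault("tt_decimals", raw.pop("decimalPlaces"))
    known = {f.name for f in fields(LinePlotConfig)}
    unknown = set(raw) - known
    if unknown:
        raise ValueError(f"unknown plot config keys: {sorted(unknown)}")
    return LinePlotConfig(**raw)
```

The benefit of this style is that a misspelled option fails loudly when the plot is configured instead of being silently ignored.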
- Fixes:
- Fix error when using default sort (#2544)
- Do not attempt to render flat plot when no data (#2490)
- Fix export plots with `--export` and always export data (#2489)
- Fix: make sure `modify` lambda not present in JSON dump (#2455)
- Enable `--export` even when writing interactive plots (#2444)
- Replace `NaN` with `null` in exported JSON (#2432)
- Fix `y_minrange` option (#2415)
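The `NaN` to `null` fix matters because Python's `json` module writes `float("nan")` as the bare token `NaN`, which strict JSON parsers reject. The sketch below shows one way to replace such values in place before dumping. It borrows the name `replace_nan` from the notes above but is an independent illustration, not MultiQC's implementation.

```python
import json
import math
from typing import Any


def replace_nan(data: Any) -> Any:
    """Replace float NaN with None, mutating lists and dicts in place."""
    if isinstance(data, float) and math.isnan(data):
        return None
    if isinstance(data, dict):
        # Reassigning existing keys does not change the dict's size,
        # so it is safe to do while iterating.
        for key, value in data.items():
            data[key] = replace_nan(value)
    elif isinstance(data, list):
        for i, value in enumerate(data):
            data[i] = replace_nan(value)
    return data


# Hypothetical plot data with missing values.
plot_data = {"s1": [1.0, float("nan")], "s2": {"x": float("nan")}}
replace_nan(plot_data)
exported = json.dumps(plot_data)
```

Because containers are modified in place, no second copy of a large dataset is built, which is the memory saving the in-place change is aiming for.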
- Reduce report size: exclude plot data for sections in `remove_sections` (#2460)
- Add `ge` and `le` to `cond_formatting_rules` (#2494)
- CI: use `uv pip` (#2352)
- Lint check for use of `f["content_lines"]` (#2485)
- Allow setting the line graph style (`lines` or `lines+markers`) per plot (#2413)
- Add `CMD` to `Dockerfile` so a default run without any parameters displays the `--help` (#2279)
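The new `ge` and `le` comparisons make inclusive thresholds possible in conditional formatting rules. The following sketch evaluates a rule set shaped like those rules; the `matching_rule` function and its "last matching rule wins" behaviour are assumptions made for illustration, not a description of how MultiQC applies rules internally.

```python
import operator
from typing import Any, Callable, Dict, List, Optional

# Numeric comparison keys, including the inclusive ge / le variants.
COMPARATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "eq": operator.eq,
    "ne": operator.ne,
    "gt": operator.gt,
    "lt": operator.lt,
    "ge": operator.ge,
    "le": operator.le,
}


def matching_rule(
    value: float, rules: Dict[str, List[Dict[str, float]]]
) -> Optional[str]:
    """Return the name of the last rule with a comparison that holds.

    Hypothetical evaluator: later rules override earlier ones.
    """
    matched = None
    for name, comparisons in rules.items():
        for comparison in comparisons:
            for op, threshold in comparison.items():
                if COMPARATORS[op](value, threshold):
                    matched = name
    return matched


# Example rule set with made-up thresholds.
rules = {
    "pass": [{"ge": 80}],
    "warn": [{"lt": 80}],
    "fail": [{"le": 50}],
}
```

With only `gt` and `lt` available, a value of exactly 80 would need an extra `eq` rule to count as a pass; `ge` expresses that in a single comparison.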
New modules
- Hostile (#2501)
- New module: Hostile is a short and long host reads removal tool
- Sequali (#2441)
- New module: Sequali Universal sequencing QC
Module updates
- Adapter Removal
- Standardize module names: use camel case (#2433)
- Bamdst
- BBTools
- Set missing values to `None` for `bbmap qahist` (#2411)
- Bcftools
- Stats: add multiallelic sites column (#2414)
- BCL Convert
- Busco
- Fix barplot colors (#2453)
- Cell Ranger
- Fix parsing antibody tab without `antibody_treemap_plot` (#2525)
- Cutadapt
- Speed up module by caching parsing versions (#2528)
- DRAGEN
- Add ploidy estimation table (#2496)
- fastp
- When the sample name cannot be parsed from the command (i.e. `stdin`), use the filename and proceed (#2536)
- FastQC
- Skip per tile sequence quality section in FastQC reports for better performance (#2552)
- Fix a `ZeroDivisionError` (#2462)
- Fix memory leak to make the module 7 times faster and use 10 times less memory (#2552)
- Do not keep intermediate data in memory to reduce memory footprint further (#2516)
- Add option to ignore FastQC quality thresholds (#2486)
- goleft indexcov
- Work correctly even if no valid contigs in input (#2540)
- mosdepth
- Fix absolute coverage plot (#2488)
- nonpareil
- Change write_data_file label to be consistent with other modules (#2472)
- Picard
- qc3C
- Fix detecting sample name for relative path (#2502)
- QualiMap
- BamQC: when trimming long tails, keep at least 20x (#2431)
- Samtools
- Space Ranger
- Fix for missing `genomic_dna` section (#2429)
- xengsort
- Fix parsing long files (do not use `content_lines`) (#2484)
New Contributors
- @clintval made their first contribution in #2254
- @alanhoyle made their first contribution in #2279
- @rhpvorderman made their first contribution in #2441
- @TBradley27 made their first contribution in #2473
- @SumeetTiwari07 made their first contribution in #2501
Full Changelog: v1.21...v1.22