- Overview
- HMFtools WiGiTS
- Other Tools
- Pipeline Inputs
- Workflows
- Common Reports
- sash Module Outputs
- Coverage
- Reference Data
- FAQ
The sash Workflow is a genomic analysis framework comprising three primary pipelines:
- Somatic Small Variants (SNV somatic): Detects single nucleotide variants (SNVs) and indels in tumor samples, emphasizing clinical relevance.
- Somatic Structural Variants (SV somatic): Identifies large-scale genomic alterations (deletions, duplications, etc.) and integrates copy number data.
- Germline Variants (SNV germline): Focuses on inherited variants linked to cancer predisposition.
These pipelines use Bolt (a Python package designed for modular processing) and leverage outputs from the DRAGEN Variant Caller alongside the Hartwig Medical Foundation (HMF) tools integrated via Oncoanalyser. Each pipeline is tailored to a specific type of genomic variant and incorporates filtering, annotation, and HTML reporting for research and curation.
HMFtools is an open-source suite for cancer genomics developed by the Hartwig Medical Foundation. Key components used in sash include:
- SAGE (Somatic Alterations in Genome): A tiered SNV/indel caller targeting cancer hotspots from databases including Cancer Genome Interpreter, CIViC, and OncoKB to recover low-frequency variants missed by DRAGEN. Outputs a VCF with confidence tiers (hotspot, panel, high/low confidence).
- PURPLE: Estimates tumor purity (tumor cell fraction) and ploidy (average copy number), integrates copy number data, and calculates TMB (tumor mutation burden) and MSI (microsatellite instability).
- Cobalt: Calculates read-depth ratios from sequencing data, providing essential input for copy number analysis. Its outputs are used by PURPLE to generate accurate copy number profiles across the genome.
- Amber: Computes B-allele frequencies, which are critical for estimating tumor purity and ploidy. The Amber directory contains these measurements, supporting PURPLE's analysis.
A framework for running PCGR and other genomic reporting tools.
Tool for comprehensive clinical interpretation of somatic variants, providing tiered classifications and extensive annotation.
Tool for predisposition variant analysis and reporting in germline samples.
UMCCR-developed R package for generating cancer genomics reports.
Tool for structural variant annotation and visualization to classify complex rearrangements.
Esvee is a structural variant caller optimised for short-read sequencing that identifies somatic and germline rearrangements.
Tool for detecting viral integration events in human genome sequencing data.
- {tumor_id}.hard-filtered.vcf.gz: Somatic variant calls from the DRAGEN pipeline.
- Optional:
  - ${tumor_id}.hrdscore.csv: Homologous recombination deficiency scores (surfaced in the cancer report when present).
  - ${tumor_id}.esvee.ref_depth.vcf.gz and the accompanying esvee/ directory: Depth and preparation files used to seed eSVee structural variant calling.
- {tumor_id}.sage.somatic.vcf.gz: Somatic SNV/indel calls from SAGE.
- Directory virusbreakend/: Contains outputs from VIRUSBreakend, used for detecting viral integration events.
- Directory cobalt/: Contains read-depth ratio data required for copy number analysis by PURPLE.
- Directory amber/: Contains B-allele frequency measurements used by PURPLE to estimate tumor purity and ploidy.
- File chord/{tumor_id}.chord.prediction.tsv (optional): HRD predictions generated by oncoanalyser; incorporated into the cancer report when present.
In the Somatic Small Variants workflow, variant detection is performed using the DRAGEN Variant Caller and Oncoanalyser (relying on SAGE and PURPLE outputs). It's structured into four steps: Re-calling, Annotation, Filter, and Report. The final outputs include an HTML report summarizing the results.
- Re-call SAGE variants to recover low-frequency mutations in hotspots.
- Annotate variants with clinical and functional information using PCGR.
- Filter variants based on quality and frequency criteria, while retaining those of potential clinical significance.
- Generate comprehensive HTML reports (PCGR, Cancer Report, LINX, MultiQC).
The re-calling step uses variants from SAGE, which is more sensitive than DRAGEN, particularly for variants with low allele frequency. SAGE focuses on cancer hotspots: its filtering system prioritizes predefined genomic regions of high clinical or biological relevance. This allows biologically significant variants that DRAGEN may have missed to be re-called.
- From DRAGEN: Somatic small variant caller VCF
${tumor_id}.main.dragen.vcf.gz
- From Oncoanalyser: SAGE VCF
${tumor_id}.main.sage.filtered.vcf.gz
Filtered on chromosomes 1-22, X, Y, and M.
- Re-calling: VCF
${tumor_id}.rescued.vcf.gz
- Select High-Confidence SAGE Calls in Hotspot Regions:
  - Filter the SAGE output to retain only variants that pass quality filters and overlap known hotspot regions.
  - Compare the input VCF and the SAGE VCF to identify overlapping and unique variants.
- Annotate existing somatic variant calls in the input VCF that are also present in the SAGE calls:
  - For each variant in the input VCF, check whether it exists in the SAGE calls.
  - For variants also called by SAGE:
    - If SAGE FILTER=PASS and input VCF FILTER=PASS:
      - Set INFO/SAGE_HOTSPOT to indicate the variant is called by SAGE in a hotspot.
    - If SAGE FILTER=PASS and input VCF FILTER is not PASS:
      - Set INFO/SAGE_HOTSPOT and INFO/SAGE_RESCUE to indicate the variant is re-called from SAGE.
      - Update FILTER to PASS to include the variant in the final analysis.
    - If SAGE FILTER is not PASS:
      - Append SAGE_lowconf to the FILTER field to flag low-confidence variants.
  - Transfer SAGE FORMAT fields to the input VCF with a SAGE_ prefix.
- Combine the annotated input VCF with novel SAGE calls:
  - Prepare novel SAGE calls. For each variant in the SAGE VCF missing from the input VCF:
    - Rename certain FORMAT fields in the novel SAGE VCF to avoid namespace collisions (for example, FORMAT/SB is renamed to FORMAT/SAGE_SB).
    - Retain necessary INFO and FORMAT annotations while removing others to streamline the data.
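The re-calling rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Bolt implementation: the Record class is a hypothetical in-memory stand-in for VCF records (real code would use a VCF library), the SAGE calls are assumed to be already restricted to hotspot regions, RENAME_FORMAT is a hypothetical set holding only the SB example from the text, and the output is not re-sorted by position.

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Hypothetical: FORMAT fields renamed on novel SAGE calls to avoid collisions.
RENAME_FORMAT = {"SB"}


@dataclass
class Record:
    """Hypothetical in-memory VCF record (a real tool would use a VCF library)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    filters: list[str] = field(default_factory=list)  # [] or ["PASS"] means PASS
    info: dict[str, object] = field(default_factory=dict)
    fmt: dict[str, object] = field(default_factory=dict)

    @property
    def key(self) -> tuple[str, int, str, str]:
        return (self.chrom, self.pos, self.ref, self.alt)

    @property
    def is_pass(self) -> bool:
        return self.filters in ([], ["PASS"])


def rescue(input_calls: list[Record], sage_hotspots: list[Record]) -> list[Record]:
    """Annotate input calls from SAGE hotspot calls, then append novel SAGE calls.

    Records are modified in place; output order is input calls, then novel calls.
    """
    sage_by_key = {r.key: r for r in sage_hotspots}
    out = []
    for rec in input_calls:
        sage = sage_by_key.get(rec.key)
        if sage is not None:
            if sage.is_pass:
                rec.info["SAGE_HOTSPOT"] = True
                if not rec.is_pass:
                    # Re-called by SAGE: mark as rescued and promote to PASS.
                    rec.info["SAGE_RESCUE"] = True
                    rec.filters = ["PASS"]
            else:
                # SAGE call is not PASS: flag the variant as low confidence.
                rec.filters = [f for f in rec.filters if f != "PASS"] + ["SAGE_lowconf"]
            # Transfer SAGE FORMAT fields with a SAGE_ prefix.
            for name, value in sage.fmt.items():
                rec.fmt[f"SAGE_{name}"] = value
        out.append(rec)
    input_keys = {r.key for r in input_calls}
    for sage in sage_hotspots:
        if sage.key not in input_keys:
            # Novel SAGE call: rename colliding FORMAT fields before merging.
            sage.fmt = {(f"SAGE_{k}" if k in RENAME_FORMAT else k): v for k, v in sage.fmt.items()}
            out.append(sage)
    return out
```

Keying on (chrom, pos, ref, alt) assumes both VCFs are normalized the same way; without normalization, equivalent indels could fail to match.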
The Annotation step uses reference sources (GA4GH/GIAB problem region stratifications, GIAB high-confidence regions, gnomAD, Hartwig hotspots), a UMCCR panel of normals (built from approximately 200 normal samples), and PCGR to enrich variants with classification and clinical information. These annotations determine which variants are retained or filtered in the next step.
- Small variant VCF
${tumor_id}.rescued.vcf.gz
- Annotated VCF
${tumor_id}.annotations.vcf.gz
- Set FILTER to PASS for unfiltered variants:
  - Iterate over the input VCF file and set the FILTER field to PASS for any variant that currently has no filter status (FILTER is . or None).
- Annotate the VCF against reference sources:
  - Use vcfanno to add annotations to the VCF file:
    - gnomAD (version 2.1)
    - Hartwig Hotspots
    - ENCODE Blacklist
    - Genome in a Bottle High-Confidence Regions (v4.2.1)
    - Low and High GC Regions (< 30% or > 65% GC content, compiled by GA4GH)
    - Bad Promoter Regions (compiled by GA4GH)
- Annotate with UMCCR panel of normals counts:
- Use vcfanno and bcftools to annotate the VCF with counts from the UMCCR panel of normals.
- Standardize the VCF fields:
  - Add new INFO fields for use with PCGR:
    - TUMOR_AF, NORMAL_AF: Tumor and normal allele frequencies.
    - TUMOR_DP, NORMAL_DP: Tumor and normal read depths.
  - Add the AD FORMAT field:
    - AD: Allelic depths for the reference and alternate alleles.
- Prepare the VCF for PCGR annotation:
  - Create a minimal VCF header that keeps only the INFO AF/DP definitions and contig sizes.
  - Move the tumor and normal FORMAT/AF and FORMAT/DP annotations to the INFO field, as required by PCGR.
  - Set FILTER to PASS and remove all FORMAT and sample columns.
- Run PCGR (v1.4.1) to annotate the VCF against external sources:
  - Classify variants into tiers based on annotations and functional impact according to AMP/ASCO/CAP guidelines.
  - Add INFO fields to the VCF: TIER, SYMBOL, CONSEQUENCE, MUTATION_HOTSPOT, TCGA_PANCANCER_COUNT, CLINVAR_CLNSIG, ICGC_PCAWG_HITS, COSMIC_CNT.
  - External sources include VEP, ClinVar, COSMIC, TCGA, ICGC, Open Targets Platform, CancerMine, DoCM, CBMDB, DisGeNET, Cancer Hotspots, dbNSFP, UniProt/SwissProt, Pfam, DGIdb, and ChEMBL.
- Transfer PCGR annotations to the full set of variants:
  - Merge the PCGR annotations back into the original VCF file.
  - Ensure that all variants, including those not selected for PCGR annotation, have relevant clinical annotations where available.
  - Preserve the FILTER statuses and other annotations from the original VCF.
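Three of the annotation steps above (setting PASS on unfiltered variants, adding TUMOR_/NORMAL_ AF and DP fields, and transferring PCGR fields back) can be sketched with plain dictionaries. This is a hedged illustration, not Bolt's code: the variant dict layout is hypothetical, AF and DP are derived here from per-sample AD (the real workflow may read DRAGEN's FORMAT/AF and FORMAT/DP directly), and the PCGR_ prefix on transferred fields is an assumption modelled on names such as PCGR_MUTATION_HOTSPOT.

```python
from __future__ import annotations

# INFO fields PCGR adds, per the list above; copied back with an assumed PCGR_ prefix.
PCGR_FIELDS = ("TIER", "SYMBOL", "CONSEQUENCE", "MUTATION_HOTSPOT",
               "TCGA_PANCANCER_COUNT", "CLINVAR_CLNSIG", "ICGC_PCAWG_HITS", "COSMIC_CNT")


def set_pass_if_unfiltered(var: dict) -> None:
    """Variants with no filter status ('.' or None) become PASS."""
    if var.get("filter") in (".", None):
        var["filter"] = "PASS"


def allele_stats(ad: list[int]) -> tuple[float, int]:
    """Return (alt allele frequency, depth) from an AD list [ref, alt1, ...]."""
    depth = sum(ad)
    af = sum(ad[1:]) / depth if depth else 0.0
    return round(af, 4), depth


def standardize(var: dict) -> None:
    """Add TUMOR_/NORMAL_ AF and DP INFO fields derived from per-sample AD."""
    for prefix in ("TUMOR", "NORMAL"):
        af, dp = allele_stats(var["samples"][prefix]["AD"])
        var["info"][f"{prefix}_AF"] = af
        var["info"][f"{prefix}_DP"] = dp


def transfer_pcgr(variants: list[dict], pcgr: list[dict]) -> None:
    """Copy PCGR INFO fields onto matching variants; FILTER is left untouched."""
    by_key = {(p["chrom"], p["pos"], p["ref"], p["alt"]): p["info"] for p in pcgr}
    for var in variants:
        ann = by_key.get((var["chrom"], var["pos"], var["ref"], var["alt"]), {})
        for name in PCGR_FIELDS:
            if name in ann:
                var["info"][f"PCGR_{name}"] = ann[name]
```

Because the transfer only adds INFO keys, variants that PCGR never saw keep their original annotations and FILTER status unchanged.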
The Filter step applies a series of stringent filters to somatic variant calls in the VCF file, ensuring the retention of high-confidence and biologically meaningful variants.
- Annotated VCF
${tumor_id}.annotations.vcf.gz
- Filtered VCF
${tumor_id}*filters_set.vcf.gz
Variants matching any of the following filter criteria are filtered unless they qualify for one of the Clinical Significance Exceptions:
| Filter Type | Threshold/Criteria |
|---|---|
| Allele Frequency (AF) Filter | Tumor AF < 10% (0.10) |
| Allele Depth (AD) Filter | Fewer than 4 supporting reads (6 in low-complexity regions) |
| Non-GIAB AD Filter | Stricter thresholds outside GIAB high-confidence regions |
| Problematic Genomic Regions Filter | Overlap with ENCODE blacklist, bad promoter, or low-complexity regions |
| Population Frequency (gnomAD) Filter | gnomAD AF ≥ 1% (0.01) |
| Panel of Normals (PoN) Germline Filter | Present in ≥ 5 normal samples or PoN AF > 20% (0.20) |

Clinical Significance Exceptions:

| Exception Category | Criteria |
|---|---|
| Reference Database Hit Count | COSMIC count ≥ 10 OR TCGA pan-cancer count ≥ 5 OR ICGC PCAWG count ≥ 5 |
| ClinVar Pathogenicity | ClinVar classification of conflicting_interpretations_of_pathogenicity, likely_pathogenic, pathogenic, or uncertain_significance |
| Mutation Hotspots | Annotated as HMF_HOTSPOT, PCGR_MUTATION_HOTSPOT, or SAGE Hotspots (CGI, CIViC, OncoKB) |
| PCGR Tier Exception | Classified as TIER_1 OR TIER_2 |
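The two tables can be read as a single decision function, sketched below in Python. It is a simplification under assumptions: the Variant class and FILTER labels such as min_AF are hypothetical, the exceptions are treated as overriding every filter, and the Non-GIAB AD filter is omitted because its stricter thresholds are not stated here.

```python
from __future__ import annotations
from dataclasses import dataclass

CLINVAR_EXCEPTIONS = {
    "conflicting_interpretations_of_pathogenicity",
    "likely_pathogenic",
    "pathogenic",
    "uncertain_significance",
}


@dataclass
class Variant:
    """Hypothetical flattened view of the annotations a variant carries."""
    tumor_af: float
    alt_reads: int
    low_complexity: bool = False
    blacklist_or_bad_promoter: bool = False
    gnomad_af: float = 0.0
    pon_count: int = 0
    pon_af: float = 0.0
    cosmic_count: int = 0
    tcga_count: int = 0
    icgc_count: int = 0
    clinvar: str | None = None
    hotspot: bool = False  # HMF, PCGR or SAGE hotspot
    pcgr_tier: str | None = None


def is_exception(v: Variant) -> bool:
    """Clinical Significance Exceptions from the second table."""
    return (
        v.cosmic_count >= 10 or v.tcga_count >= 5 or v.icgc_count >= 5
        or v.clinvar in CLINVAR_EXCEPTIONS
        or v.hotspot
        or v.pcgr_tier in {"TIER_1", "TIER_2"}
    )


def filters_for(v: Variant) -> list[str]:
    """Return hypothetical FILTER labels; an empty list means the variant passes."""
    if is_exception(v):
        return []
    failed = []
    if v.tumor_af < 0.10:
        failed.append("min_AF")
    if v.alt_reads < (6 if v.low_complexity else 4):
        failed.append("min_AD")
    if v.low_complexity or v.blacklist_or_bad_promoter:
        failed.append("difficult_region")
    if v.gnomad_af >= 0.01:
        failed.append("gnomAD_common")
    if v.pon_count >= 5 or v.pon_af > 0.20:
        failed.append("PoN")
    return failed
```

For example, a TIER_1 variant at 5% AF with three supporting reads passes, while the same variant without any exception would receive both min_AF and min_AD.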
The Report step utilizes the Personal Cancer Genome Reporter (PCGR) and other tools to generate comprehensive reports.
- Purple purity data
- Filtered VCF
${tumor_id}*filters_set.vcf.gz
- DRAGEN VCF
${tumor_id}.main.dragen.vcf.gz
- PCGR Cancer report
${tumor_id}.pcgr.grch38.html
- Generate BCFtools Statistics on the Input VCF:
  - Run bcftools stats to gather statistics on variant quality and distribution.
- Calculate Allele Frequency Distributions:
- Filter and normalize variants according to high-confidence regions.
- Extract allele frequency data from tumor samples.
- Produce both a global allele frequency summary and a subset of allele frequencies restricted to key cancer genes.
- Compare Variant Counts From Two Variant Sets (DRAGEN vs. BOLT):
- Count the total number and types of variants (SNPs, Indels, Others) passing filters in both the DRAGEN VCF and the Filtered BOLT VCF.
- Count Variants by Processing Stage.
- Parse Purity and Ploidy Information (Purple Data).
- Run PCGR (GRCh38 VEP 113 / pcgr_ref_data.20250314) to generate the final report. If PCGR has trouble with very large VCFs, limit the number of variants per batch with --pcgr_variant_chunk_size.
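The DRAGEN vs. BOLT comparison step counts passing variants by type. A minimal sketch of that counting is shown below; the classification rules (one-base REF and ALT is an SNP, unequal lengths an indel, everything else such as MNPs or symbolic alleles is Other) and treating "." as passing are assumptions, and bcftools stats may classify some records differently.

```python
from __future__ import annotations
from collections import Counter
import gzip


def classify(ref: str, alt: str) -> str:
    """Classify one REF/ALT allele pair as SNP, Indel or Other."""
    if alt.startswith("<") or "[" in alt or "]" in alt or alt in {"*", "."}:
        return "Other"  # symbolic, breakend or missing alleles
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "Indel"
    return "Other"  # e.g. MNPs


def count_passing(lines) -> Counter:
    """Count passing alleles per type from VCF text lines (headers skipped)."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        ref, alts, filt = cols[3], cols[4], cols[6]
        if filt not in ("PASS", "."):
            continue
        for alt in alts.split(","):  # multi-allelic records count once per ALT
            counts[classify(ref, alt)] += 1
    return counts


def count_vcf(path: str) -> Counter:
    """Open a plain or gzip-compressed VCF and count its passing variants."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return count_passing(fh)
```

Running count_vcf on the DRAGEN VCF and on the filtered BOLT VCF gives two Counters whose per-type difference summarizes the effect of rescue and filtering.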
After filtering, the pipeline converts the somatic VCF to MAF using vcf2maf (v1.6.22) for downstream tools that expect MAF format.
- MAF file for the tumour/normal pair
${tumor_id}.maf
The Somatic Structural Variants (SVs) pipeline identifies and annotates large-scale genomic alterations, including deletions, duplications, inversions, insertions, and translocations in tumor samples. Calls now come from eSVee (replacing GRIDSS/GRIPSS), but the downstream PURPLE/SnpEff/prioritisation steps remain unchanged.
- eSVee filtering:
- Refines the structural variant calls using read counts, panel-of-normals, known fusion hotspots, and repeat masker annotations.
- PURPLE:
- Combines the eSVee-filtered SV calls with copy number variation (CNV) data and tumor purity/ploidy estimates.
- Annotation:
- Combines SV calls with CNV data and annotates using SnpEff.
- Prioritization:
- Prioritizes SV annotations with rules forked from the AstraZeneca-NGS simple_sv_annotation tool, using curated reference data.
- Report:
- Generates cancer report and MultiQC output.
- eSVee (GRIDSS/GRIPSS replacement)
${tumor_id}.esvee.somatic.vcf.gz
- eSVee filtering:
- Evaluate split-read and paired-end support; discard variants with low support.
- Apply panel-of-normals filtering to remove artifacts observed in normal samples.
- Retain variants overlapping known oncogenic fusion hotspots (using UMCCR-curated lists).
- Exclude variants in repetitive regions based on Repeat Masker annotations.
- PURPLE:
- Merge SV calls with CNV segmentation data.
- Estimate tumor purity and ploidy.
- Adjust SV breakpoints based on copy number transitions.
- Classify SVs as somatic or germline.
- Annotation:
- Compile SV and CNV information into a unified VCF file.
- Extend the VCF header with PURPLE-related INFO fields (e.g., PURPLE_baf, PURPLE_copyNumber).
- Convert CNV records from TSV format into VCF records with appropriate SVTYPE tags (e.g., 'DUP' for duplications, 'DEL' for deletions).
- Run SnpEff to annotate the unified VCF with functional information such as gene names, transcript effects, and coding consequences.
- Prioritization:
- Run the prioritization module (forked from the AstraZeneca simple_sv_annotation tool) using reference data files including known fusion pairs, known fusion 5′ and 3′ lists, key genes, and key tumor suppressor genes.
- Classify Variants:
  - Structural Variants (SVs): Variants labeled with the source sv_esvee.
  - Copy Number Variants (CNVs): Variants labeled with the source cnv_purple.
- Prioritize variants on a 4-tier system using prioritize_sv:
  - Tiers: 1 (high), 2 (moderate), 3 (low), 4 (no interest)
  - Exon loss:
    - On cancer gene list (1)
    - Other (2)
  - Gene fusion:
    - Paired (hits two genes):
      - On list of known pairs (1) (curated by HMF)
      - One gene is a known promiscuous fusion gene (1) (curated by HMF)
      - On list of FusionCatcher known pairs (2)
      - Other:
        - One or two genes on cancer gene list (2)
        - Neither gene on cancer gene list (3)
    - Unpaired (hits one gene):
      - On cancer gene list (2)
      - Others (3)
  - Upstream or downstream: A specific type of fusion where one gene comes under the control of another gene's promoter, potentially leading to overexpression (oncogene) or underexpression (tumor suppressor gene):
    - On cancer gene list (2)
  - LoF or HIGH impact in a tumor suppressor:
    - On cancer gene list (2)
    - Other TS gene (3)
  - Other (4)
- Filter Low-Quality Calls:
  - Apply Quality Filters:
    - Keep variants with sufficient read support (e.g., split reads (SR) ≥ 5 and paired reads (PR) ≥ 5).
    - Exclude Tier 3 and Tier 4 variants where SR < 5 and PR < 5.
    - Exclude Tier 3 and Tier 4 variants where SR < 10, PR < 10, and allele frequencies (AF0 or AF1) are below 0.1.
- Report:
- Generate MultiQC and cancer report outputs.
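The tiering rules and the quality filters above can be condensed into two small functions. This is an illustrative sketch, not the prioritize_sv code: the event names ("exon_loss", "fusion", "upstream_downstream", "lof_or_high") and the RefData container are hypothetical, upstream/downstream hits outside the cancer gene list fall through to tier 4, and "AF0 or AF1 below 0.1" is read as either value being below 0.1.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class RefData:
    """Hypothetical container for the curated prioritisation reference lists."""
    cancer_genes: set[str] = field(default_factory=set)
    tumor_suppressors: set[str] = field(default_factory=set)
    known_pairs: set[frozenset[str]] = field(default_factory=set)  # HMF known pairs
    promiscuous: set[str] = field(default_factory=set)  # HMF promiscuous genes
    fusioncatcher_pairs: set[frozenset[str]] = field(default_factory=set)


def sv_tier(event: str, genes: list[str], ref: RefData) -> int:
    """Return tier 1 (high) to 4 (no interest) following the rules listed above."""
    on_cancer_list = any(g in ref.cancer_genes for g in genes)
    if event == "exon_loss":
        return 1 if on_cancer_list else 2
    if event == "fusion" and len(genes) == 2:  # paired: hits two genes
        pair = frozenset(genes)
        if pair in ref.known_pairs or any(g in ref.promiscuous for g in genes):
            return 1
        if pair in ref.fusioncatcher_pairs or on_cancer_list:
            return 2
        return 3
    if event == "fusion":  # unpaired: hits one gene
        return 2 if on_cancer_list else 3
    if event == "upstream_downstream" and on_cancer_list:
        return 2
    if event == "lof_or_high":
        ts_genes = [g for g in genes if g in ref.tumor_suppressors]
        if any(g in ref.cancer_genes for g in ts_genes):
            return 2
        if ts_genes:
            return 3
    return 4


def passes_quality(tier: int, sr: int, pr: int, af0: float, af1: float) -> bool:
    """Apply the read-support filters, which only affect tier 3 and 4 calls."""
    if tier < 3:
        return True
    if sr < 5 and pr < 5:
        return False
    # Assumption: the AF condition holds if either AF0 or AF1 is below 0.1.
    if sr < 10 and pr < 10 and (af0 < 0.1 or af1 < 0.1):
        return False
    return True
```

Keeping tiering and filtering separate mirrors the text: tiers are assigned first, and read support only removes calls that are already low priority.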
The germline workflow selects passing variants within the gene panel transcript regions derived from the PMCC familial cancer clinic list, then generates a CPSR report.
- DRAGEN VCF
${normal_id}.hard-filtered.vcf.gz
- CPSR report
${normal_id}.cpsr.grch38.html
- Prepare:
  - Selection of Passing Variants:
    - Raw germline variant calls from DRAGEN are filtered to retain only those variants marked as PASS (or with no filter flag).
  - Selection of Gene Panel Variants:
    - The filtered variants are further restricted to regions defined by the gene panel transcript regions file, based on the PMCC familial cancer clinic list.
- Report:
  - Generate CPSR (Cancer Predisposition Sequencing Report) summarizing germline findings.
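The two selection steps in Prepare can be sketched as a PASS check followed by an interval lookup. This assumes the gene panel transcript regions are supplied as a BED file (0-based, half-open) and that only the VCF POS (1-based) is tested against the regions; the record dictionaries and function names are hypothetical, not the workflow's actual implementation.

```python
from __future__ import annotations
from bisect import bisect_right
from collections import defaultdict


def load_regions(bed_lines) -> dict[str, list[tuple[int, int]]]:
    """Parse BED lines into sorted, merged 0-based half-open intervals per contig."""
    raw = defaultdict(list)
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        raw[chrom].append((int(start), int(end)))
    merged = {}
    for chrom, ivs in raw.items():
        ivs.sort()
        out = [list(ivs[0])]
        for start, end in ivs[1:]:
            if start <= out[-1][1]:  # overlapping or touching: extend
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = [(s, e) for s, e in out]
    return merged


def in_regions(regions: dict, chrom: str, pos: int) -> bool:
    """True if 1-based VCF position `pos` falls inside a BED interval."""
    ivs = regions.get(chrom, [])
    i = bisect_right(ivs, (pos - 1, float("inf"))) - 1  # last interval starting <= pos-1
    return i >= 0 and ivs[i][0] <= pos - 1 < ivs[i][1]


def select_germline(records: list[dict], regions: dict) -> list[dict]:
    """Keep PASS/unfiltered records whose POS lies in a panel region."""
    return [
        r for r in records
        if r["filter"] in ("PASS", ".", None) and in_regions(regions, r["chrom"], r["pos"])
    ]
```

Merging overlapping intervals first means each lookup is a single binary search per variant.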
UMCCR cancer report containing:
- Data Source: filtered somatic VCF
- Tool: PURPLE
- Data Source: filtered somatic SNV VCF (Sigrap MutationalPatterns output)
- Tool: Sigrap (MutationalPatterns wrapper)
- Data Source: –
- Note: No dedicated contamination metric is currently generated
- Data Source: COBALT (providing read-depth ratios) and AMBER (providing B-allele frequency measurements)
- Tool: PURPLE, which uses these inputs to compute sample purity (percentage of tumor cells) and overall ploidy (average copy number)
- Data Source: optional DRAGEN HRD score (${tumor_id}.hrdscore.csv), Sigrap HRDetect JSON, and oncoanalyser CHORD predictions
- Tool: DRAGEN HRD, Sigrap HRDetect, and CHORD
- Data Source: Indels in microsatellite regions from SNV/CNV
- Tool: PURPLE
- Data Source: eSVee SV VCF and PURPLE CNV segmentation
- Tools: eSVee, PURPLE, and the AstraZeneca simple_sv_annotation prioritisation rules
- Data Source: PURPLE CNV outputs (segmentation files, gene-level CNV TSV)
- Tool: PURPLE
The LINX report includes the following:
- Tables of Variants:
- Breakends
- Links
- Driver Catalog
- Plots:
- Cluster-Level Plots
General Stats: Overview of QC metrics aggregated from all tools, providing high-level sample quality information.
DRAGEN: Mapping metrics (mapped reads, paired reads, duplicated alignments, secondary alignments), WGS coverage (average depth, cumulative coverage, per-contig coverage), fragment length distributions, trimming metrics, and time metrics for pipeline steps.
PURPLE: Sample QC status (PASS/FAIL), ploidy, tumor purity, polyclonality percentage, tumor mutational burden (TMB), microsatellite instability (MSI) status, and variant metrics for somatic and germline SNPs/indels.
BcfTools Stats: Variant substitution types, SNP and indel counts, quality scores, variant depth, and allele frequency metrics for both somatic and germline variants.
DRAGEN-FastQC: Per-base sequence quality, per-sequence quality scores, GC content (per-sequence and per-position), HRD score, sequence length distributions, adapter contamination, and sequence duplication levels.
The Personal Cancer Genome Reporter (PCGR) generates a comprehensive, interactive HTML report that consolidates the filtered and annotated variant data, providing detailed insights into the somatic variants identified.
Key Metrics:
- Variant Classification and Tier Distribution: PCGR categorizes variants into tiers based on their clinical and biological significance. The report details the proportion of variants across different tiers, indicating their potential clinical relevance.
- Mutational Signatures: The report includes analysis of mutational signatures, offering insights into the mutational processes active in the tumor.
- Copy Number Alterations (CNAs): Genome-wide plots highlight significant copy number gains and losses.
- Tumor Mutational Burden (TMB): The report presents TMB as the number of mutations per megabase, a value that can have implications for immunotherapy eligibility.
- Microsatellite Instability (MSI) Status: Assessment of MSI status is performed, relevant for certain cancer types and treatment decisions.
- Clinical Trials Information: Information on relevant clinical trials is incorporated, offering potential therapeutic options based on the identified variants.
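As a worked example of the TMB definition (mutations per megabase), the arithmetic is a count divided by the size of the assessed territory in megabases. Which variants are counted and which territory size is used are tool-specific assumptions here, so TMB values from different tools need not agree.

```python
from __future__ import annotations


def tmb_per_mb(n_mutations: int, territory_bp: int) -> float:
    """Mutations per megabase of the assessed territory (illustrative only)."""
    if territory_bp <= 0:
        raise ValueError("territory_bp must be positive")
    return n_mutations / (territory_bp / 1_000_000)
```

For instance, 300 counted mutations over a hypothetical 30 Mb territory give a TMB of 10 mutations/Mb.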
Note: The PCGR tool is designed to process a maximum of 500,000 variants. If the input VCF file contains more than this limit, variants exceeding 500,000 will be filtered out.
The CPSR (Cancer Predisposition Sequencing Report) includes the following:
Settings:
- Sample metadata
- Report configuration
- Virtual gene panel
Summary of Findings:
- Variant statistics
Variant Classification:
ClinVar and Non-ClinVar variants:
- Class 5 - Pathogenic variants
- Class 4 - Likely Pathogenic variants
- Class 3 - Variants of Uncertain Significance (VUS)
- Class 2 - Likely Benign variants
- Class 1 - Benign variants
- Biomarkers
PCGR TIER according to ACMG:
- Tier 1 (High): Highest priority variants with strong clinical relevance.
- Tier 2 (Moderate): Variants with potential clinical significance.
- Tier 3 (Low): Variants with uncertain significance.
- Tier 4 (No Interest): Variants unlikely to be clinically relevant.
The sash workflow utilizes coverage metrics from DRAGEN to evaluate the sequencing quality and depth across target regions. Coverage analysis includes:
- Mean coverage across targeted genomic regions
- Percentage of target regions covered at various depth thresholds (10X, 20X, 50X, 100X)
- Coverage uniformity metrics
- Gap analysis for regions with insufficient coverage
These metrics are integrated into the MultiQC report, providing a comprehensive overview of sequencing quality and coverage.
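The coverage metrics listed above can be illustrated on a list of per-position depths. This is a sketch, not how DRAGEN computes them: the uniformity definition used here (percentage of positions above 0.2 times the mean depth) is an assumption, and gaps are reported as half-open index ranges into the input list.

```python
from __future__ import annotations


def coverage_summary(depths: list[int], thresholds=(10, 20, 50, 100)) -> dict:
    """Mean depth, % of positions at >= each threshold, and a uniformity metric."""
    if not depths:
        raise ValueError("no positions")
    n = len(depths)
    mean = sum(depths) / n
    return {
        "mean": mean,
        "pct_at_least": {t: 100 * sum(d >= t for d in depths) / n for t in thresholds},
        # Assumed uniformity definition: % of positions above 0.2 x mean depth.
        "uniformity_pct": 100 * sum(d > 0.2 * mean for d in depths) / n,
    }


def low_coverage_gaps(depths: list[int], min_depth: int = 10) -> list[tuple[int, int]]:
    """Return half-open (start, end) index runs where depth < min_depth."""
    gaps, start = [], None
    for i, d in enumerate(depths):
        if d < min_depth and start is None:
            start = i
        elif d >= min_depth and start is not None:
            gaps.append((start, i))
            start = None
    if start is not None:
        gaps.append((start, len(depths)))
    return gaps
```

For example, depths [0, 10, 20, 50, 100] give a mean of 36 and 80% of positions at 10X or more.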
Curated gene panels for specific analyses, including the germline cancer predisposition gene panel used in the Germline Small Variants workflow.
- Ensembl reference data (GRCh38)
- Somatic driver catalogs
- Known fusion gene pairs
- Driver gene panels
- gnomAD (v2.1): Provides population allele frequencies to help distinguish common variants from rare ones.
- ClinVar (20220103): Offers clinically curated variant information, aiding in the interpretation of potential pathogenicity.
- COSMIC: Contains data on somatic mutations found in cancer, facilitating the identification of cancer-related variants.
- Gene Panels: Focuses analysis on specific sets of genes relevant to particular conditions or research interests.
- SnpEff Databases: Used for predicting the effects of variants on genes and proteins.
- Panel of Normals (PON): Helps filter out technical artifacts by comparing against a set of normal samples.
- RepeatMasker: Identifies repetitive genomic regions to prevent false-positive variant calls.
PCGR reference data (databases/datasets):
- Version: pcgr_ref_data.20250314.grch38.tgz with the GRCh38 VEP 113 cache (homo_sapiens_vep_113_GRCh38.tar.gz). Both archives are auto-extracted by the PREPARE_REFERENCE subworkflow.
- Contents: refreshed ClinVar, COSMIC, dbNSFP, gnomAD, OncoKB/CGI biomarker sets, and PCGR/CPSR configuration files aligned with PCGR v2.x.
- File: smlv_somatic/filter/{tumor_id}.pass.vcf.gz
  - Description: Contains somatic single nucleotide variants (SNVs) with filtering applied (VCF format).
- File: sv_somatic/prioritise/{tumor_id}.sv.prioritised.vcf.gz
  - Description: Contains somatic structural variants (SVs) with prioritization applied (VCF format).
- File: cancer_report/cancer_report_tables/purple/{tumor_id}-purple_cnv_som.tsv.gz
  - Description: Contains somatic copy number variation (CNV) data (TSV format).
- File: cancer_report/cancer_report_tables/purple/{tumor_id}-purple_cnv_som_gene.tsv.gz
  - Description: Contains gene-level somatic copy number variation (CNV) data (TSV format).
- File: dragen_germline_output/{normal_id}.hard-filtered.vcf.gz
  - Description: Contains germline single nucleotide variants (SNVs) with hard filtering applied (VCF format).
- File: purple/{tumor_id}.purple.purity.tsv
  - Description: Contains estimated tumor purity, ploidy, and microsatellite status (TSV format).
- File: smlv_somatic/report/pcgr/{tumor_id}.pcgr.grch38.json.gz
  - Description: Contains PCGR annotations, including tumor mutational burden (TMB) (JSON format).
- File: ${tumor_id}.hrdscore.csv (from dragen_somatic_dir)
  - Description: Optional DRAGEN homologous recombination deficiency (HRD) score propagated into the cancer report when provided.
- File: sigrap/hrdetect/hrdetect.json.gz
  - Description: HRDetect JSON summarising HRD probability from combined SNV/SV/CNV signals.
- Directory: sigrap/mutpat/
  - Description: Mutational signature TSVs/plots (SBS/DBS/indels) generated by Sigrap’s MutationalPatterns wrapper.
- File: vcf2maf/{tumor_id}.maf
  - Description: MAF representation of the filtered somatic VCF for downstream tools that prefer MAF input.
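Downstream scripts often need only purity, ploidy, and MSI status from purple/{tumor_id}.purple.purity.tsv. A minimal reader is sketched below using the standard csv module; the column names purity, ploidy, and msStatus are assumptions about the PURPLE output and should be checked against the actual file header.

```python
import csv
import io


def read_purple_purity(handle) -> dict:
    """Read the first data row of a PURPLE purity TSV (column names assumed)."""
    row = next(csv.DictReader(handle, delimiter="\t"))
    return {
        "purity": float(row["purity"]),
        "ploidy": float(row["ploidy"]),
        "ms_status": row.get("msStatus"),  # None if the column is absent
    }


if __name__ == "__main__":
    # In-memory example; real use: with open(path) as fh: read_purple_purity(fh)
    demo = io.StringIO("purity\tploidy\tmsStatus\n0.72\t3.1\tMSS\n")
    print(read_purple_purity(demo))
```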
A: Rescue is performed by BOLT using SAGE hotspot calls layered onto the DRAGEN VCF. PCGR is only used later for reporting/annotation; it does not drive the rescue step.
Q: How are hypermutated samples handled in the current version, and is there any impact on derived metrics such as TMB or MSI?
A: In the current version of sash, a sample is flagged as hypermutated when its total somatic variant count exceeds 500,000. In that case, variants that have no clinical impact and are not in hotspot regions are filtered out until the count falls within the threshold. This affects the TMB and MSI calculations by PURPLE. Currently, we use the TMB and MSI values from PURPLE in these edge cases; a future release will provide correct TMB and MSI calculations from PURPLE.
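The hypermutation handling described in this answer can be sketched as follows. The clinical and hotspot flags on each variant dict are hypothetical, and the order in which removable variants are dropped (here, from the end of the list) is an assumption, since the answer does not specify it.

```python
from __future__ import annotations

PCGR_MAX_VARIANTS = 500_000


def cap_variants(variants: list[dict], limit: int = PCGR_MAX_VARIANTS) -> list[dict]:
    """Drop non-priority variants until at most `limit` remain (order preserved)."""
    excess = len(variants) - limit
    if excess <= 0:
        return list(variants)
    # Removal candidates: no clinical impact and not in a hotspot region.
    candidates = [i for i, v in enumerate(variants) if not (v["clinical"] or v["hotspot"])]
    # Assumption: drop candidates from the end first. With fewer candidates than
    # `excess`, all are dropped and the result can still exceed `limit`.
    drop = set(candidates[-excess:])
    return [v for i, v in enumerate(variants) if i not in drop]
```

Clinically relevant and hotspot variants are never removed in this sketch, which matches the intent of keeping reportable findings while bringing the count under the cap.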
A: We filter on chromosomes 1-22 and chromosomes X, Y, M. All other non-standard chromosomes and contigs are filtered out.
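The contig restriction described here can be sketched as a simple filter over VCF lines. This assumes GRCh38 contig names with the "chr" prefix; header lines (including ##contig entries for non-standard contigs) are passed through unchanged in this sketch.

```python
from __future__ import annotations

# Assumes GRCh38 contig names with the "chr" prefix.
STANDARD_CONTIGS = {f"chr{c}" for c in [*map(str, range(1, 23)), "X", "Y", "M"]}


def keep_standard_contigs(vcf_lines):
    """Yield header lines and records on chr1-22, chrX, chrY or chrM only."""
    for line in vcf_lines:
        if line.startswith("#") or line.split("\t", 1)[0] in STANDARD_CONTIGS:
            yield line
```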
Q: What inputs for the cancer reporter - have they changed (and what can we harmonize); e.g., where is the Circos plot from at this point?
A: Circos plots are generated by PURPLE.
Q: We dropped the CACAO coverage reports. Can we discuss how to utilize DRAGEN or HMFtools coverage information instead?
A: DRAGEN coverage metrics are now integrated into the MultiQC report, providing a comprehensive overview of sequencing quality and coverage across the genome. We are exploring further integration of HMFtools coverage analysis for future releases.
A: The cancer report surfaces the PURPLE-derived TMB; the PCGR HTML also reports its own TMB estimate for comparison.
A: Sigrap MutationalPatterns uses the filtered somatic VCF (post-rescue and filtering); its outputs are published under sigrap/mutpat/ and fed into the cancer report.
A: Currently, sash does not calculate a dedicated contamination metric. Tumor purity estimation from PURPLE serves as the primary indicator of sample quality.
A: SASH reuses the WiGiTS export to re-run eSVee with UMCCR reference data and panel-of-normals, then applies PURPLE, SnpEff and simple_sv_annotation. GRIDSS/GRIPSS are no longer used.
A: No, the somatic small variant workflow data is not used in the structural variant (SV) workflow. These are independent analyses that run in parallel.
