Reference package for ComBAT harmonization of clinical MRI data. It ships ComBAT implementations for adapting clinical sites to a reference site, along with ready-to-run scripts to prepare datasets, fit a model, apply the harmonization, and analyze the outputs. Although Clinical-ComBAT was designed and tested for the harmonization of diffusion MRI metrics (such as fractional anisotropy, mean diffusivity, and apparent fiber density), it can also be used on other types of data, such as volumetric data.
- Girard, G., Edde, M., Dumais, F., et al. (2025). Clinical-ComBAT: a diffusion MRI harmonization method for clinical normative modeling applications. Submitted to Medical Image Analysis.
- Jodoin, P.-M., Edde, M., Girard, G., et al. (2025). Challenges and best practices when using ComBAT to harmonize diffusion MRI data. Nature Scientific Reports, 15, 41508. https://www.nature.com/articles/s41598-025-25400-x
- Fortin, J.-P., Parker, D., Tunç, B., et al. (2017). Harmonization of multi-site diffusion tensor imaging data. NeuroImage, 161, 149–170. https://doi.org/10.1016/j.neuroimage.2017.08.047
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
You can copy, redistribute, and adapt this work, but only for non-commercial purposes. You must give the original creator credit (Attribution), and if you adapt the work, your new version must be shared under the same or a compatible license (ShareAlike). The work cannot be used for commercial gain, meaning activities primarily intended to generate money. For commercial use, please contact Pierre-Marc Jodoin.
☝️ If you prefer not to use uv, clinical_combat can still be installed by omitting the leading uv from each installation command below.
Make sure your pip is up-to-date before trying to install:
uv pip install --upgrade pip
Then
# 1) create a Python >= 3.9 environment
python -m venv .venv
source .venv/bin/activate
# 2) install clinical_combat
uv pip install -e .

The toolbox mainly depends on numpy, pandas, matplotlib, and seaborn.
All scripts accept compressed or uncompressed CSV files.
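If you prepare or inspect input files yourself, the same convenience is easy to reproduce. The sketch below is not the package's loader; `read_rows` is a hypothetical helper that uses only the Python standard library and picks the opener from the file suffix.

```python
import csv
import gzip
from pathlib import Path


def read_rows(path):
    """Return the rows of a plain or gzip-compressed CSV as a list of dicts.

    Hypothetical helper for illustration; the clinical_combat scripts
    have their own loading code.
    """
    path = Path(path)
    # Choose the opener from the suffix; both openers are used in text mode.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, mode="rt", newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

All values come back as strings, so numeric columns still need to be converted before use.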
| Folder / file | Description |
|---|---|
| `src/` | Python package (harmonization, utilities, visualization). |
| `src/clinical_combat/cli/` | Production-ready scripts to fit, apply, and visualize ComBAT. |
| `src/clinical_combat/cli/tests/` | Automated checks (for example `pytest scripts/tests/test_combat_pipeline.py`). |
| `src/clinical_combat/cli_dev/` | Research helpers and additional plotting utilities (optional for end-users). |
| `src/clinical_combat/data/` | Example datasets and sample figures. |
| `pyproject.toml` | Package configuration file. |

Scripts expect CSV files containing at least the columns below:
sid,site,bundle,metric,mean,age,sex,handedness,disease

- `sid`: subject identifier
- `site`: site name (string)
- `bundle`: bundle or region name
- `metric`: diffusion metric (for example `md`, `fa`)
- `mean`: numeric value per bundle (mean, median, etc.)
- `age`, `sex`, `handedness`: covariates
  - use integer values (1 or 2) for `sex` and `handedness`; when a covariate is unknown, add the column filled with `1` and the scripts will disable that effect automatically
- `disease` acts as a flag; any row whose value is not `HC` is dropped before fitting the model

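Before running the scripts on your own data, a quick sanity check of the column layout can save a failed run. The following is a minimal sketch under the rules listed above; `check_rows` and `healthy_controls` are hypothetical helpers (not part of clinical_combat) that operate on rows read as dictionaries of strings, for example with `csv.DictReader`.

```python
REQUIRED_COLUMNS = ("sid", "site", "bundle", "metric", "mean",
                    "age", "sex", "handedness", "disease")


def _as_float(text):
    """Return `text` as a float, or None when it is not numeric."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def check_rows(rows):
    """Return a list of problems found in `rows` (empty when all is well)."""
    if not rows:
        return ["no data rows"]
    missing = [col for col in REQUIRED_COLUMNS if col not in rows[0]]
    if missing:
        return ["missing columns: " + ", ".join(missing)]
    problems = []
    for number, row in enumerate(rows, start=1):
        for col in ("mean", "age"):
            if _as_float(row[col]) is None:
                problems.append(f"row {number}: {col} is not numeric")
        # sex and handedness are expected to be coded as 1 or 2.
        for col in ("sex", "handedness"):
            if _as_float(row[col]) not in (1.0, 2.0):
                problems.append(f"row {number}: {col} should be 1 or 2")
    return problems


def healthy_controls(rows):
    """Keep only rows flagged HC, mirroring the filtering described above."""
    return [row for row in rows if row["disease"] == "HC"]
```

The scripts apply their own checks and filtering; this sketch only helps spot layout problems early.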
`src/clinical_combat/data/` contains complete examples (`CamCAN.md.raw.csv.gz` and
`ModifiedCamCAN.md.raw.csv.gz`) that illustrate the column layout and distribution.
The code supports two harmonization modes, namely clinical and pairwise. In both cases, the procedure harmonizes data from a moving site onto a reference site.
| Method | Description |
|---|---|
| `clinical` (default) | Harmonizes a moving site to a normative reference following the Clinical-ComBAT method (Girard et al., 2025). It fits site-specific polynomial covariate models, anchors variance with Bayesian priors suited to small cohorts, and auto-tunes the hyperparameters to keep the harmonized metrics consistent with the reference population. |
| `pairwise` | Adaptation of the original ComBAT (Fortin et al., 2017) that still fits both sites together but explicitly anchors the harmonization to a chosen reference site. For more details, see Jodoin et al. (2025), ComBAT Harmonization for Diffusion MRI: Challenges and Best Practices (Nature Scientific Reports:41508). |

Common options for both methods:

- age filtering (`--limit_age_range`)
- covariate selection (`--ignore_sex`, `--ignore_handedness`)
- age effect polynomial order (`--degree`)

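To make two of these options concrete, here is a small sketch of what age filtering and a polynomial age effect mean in practice. It is an illustration under stated assumptions, not the package's code: the bounds are taken as inclusive, and whether the scripts center ages or add an intercept is not specified here. `limit_age_range` and `age_terms` are hypothetical names.

```python
def limit_age_range(ref_rows, mov_rows):
    """Drop reference rows whose age lies outside the moving-site age range.

    Assumes inclusive bounds; rows are dicts with an "age" entry.
    """
    ages = [float(row["age"]) for row in mov_rows]
    low, high = min(ages), max(ages)
    return [row for row in ref_rows if low <= float(row["age"]) <= high]


def age_terms(age, degree):
    """Polynomial age terms [age, age**2, ..., age**degree]."""
    return [age ** power for power in range(1, degree + 1)]
```

With `degree=1` the age effect is a straight line; with `degree=2` a quadratic term is added, which lets the fitted curve bend across the lifespan.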
Run the bundled example once to check your setup:
# From the project root
combat_pipeline \
src/clinical_combat/data/CamCAN.md.raw.csv.gz \
src/clinical_combat/data/ModifiedCamCAN.md.raw.csv.gz \
--method clinical \
--out_dir quickstart_demo/

This produces:

- a fitted model (`quickstart_demo/ModifiedCamCAN-CamCAN.md.clinical.model.csv`)
- harmonized data (`quickstart_demo/ModifiedCamCAN.md.clinical.csv.gz`)
- QC metrics and figures inside `quickstart_demo/`

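The output names in this example follow a visible pattern: `<moving>-<reference>.<metric>.<method>.model.csv` for the model and `<moving>.<metric>.<method>.csv.gz` for the harmonized data. The sketch below reproduces that pattern from the input file names so you can locate the outputs in a script. This is an inference from the examples in this README; the tools may well derive the site names from the `site` column instead, and `expected_outputs` is a hypothetical helper.

```python
from pathlib import Path


def expected_outputs(ref_csv, mov_csv, method="clinical", out_dir="."):
    """Guess the default model and harmonized-data paths.

    Assumes input files are named <site>.<metric>.raw.csv[.gz], as in the
    bundled examples.
    """
    ref_site, metric = Path(ref_csv).name.split(".")[:2]
    mov_site = Path(mov_csv).name.split(".")[0]
    out = Path(out_dir)
    model = out / f"{mov_site}-{ref_site}.{metric}.{method}.model.csv"
    harmonized = out / f"{mov_site}.{metric}.{method}.csv.gz"
    return model, harmonized
```

For a custom naming scheme, `--output_model_filename` and `--output_results_filename` override the auto-generated names.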
combat_pipeline runs the full pipeline in sequence (fit → apply → QC → figures) and logs
each spawned command.

- `ref_data` (required): reference-site CSV (`*.raw.csv[.gz]`).
- `mov_data` (required): moving-site CSV to harmonize.
- `--method {clinical,pairwise}` (default `clinical`): harmonization strategy.
- `--degree` (default 2 for clinical, 1 for pairwise when omitted): polynomial degree for age.
- `--limit_age_range` (default disabled): drop reference subjects outside the moving age range.
- `--ignore_sex` (default disabled): remove sex from the covariate model.
- `--ignore_handedness` (default disabled): remove handedness from the model.
- `--no_empirical_bayes` (default disabled): skip empirical Bayes estimation.
- `--robust` (default disabled, not implemented): placeholder for robust mode.
- `--regul_ref` (clinical only, default 0): ridge penalty applied to reference regression.
- `--regul_mov` (clinical only, default -1; pairwise falls back to 0): moving-site penalty or auto-tuning.
- `--nu` (clinical only, default 5): variance hyperparameter for the moving site.
- `--tau` (clinical only, default 2): covariate hyperparameter for the moving site.
- `--bundles` (default `mni_IIT_mask_skeletonFA` in plots): bundle subset for figures (`all` for every bundle).
- `--degree_qc` (default 0): QC model degree override (0 reuses the harmonization degree).
- `--out_dir` (default `./`): root directory for models, results, and figures.
- `--output_model_filename` (default auto-generated): custom name for the saved model.
- `--output_results_filename` (default auto-generated): custom name for the harmonized CSV.
- `--verbose`/`-v` (default `WARNING`): logging verbosity (`INFO` with `-v`, `DEBUG` with `-v DEBUG`).
- `--overwrite`/`-f` (default disabled): allow overwriting existing files.

Example:
combat_pipeline src/clinical_combat/data/CamCAN.md.raw.csv.gz \
src/clinical_combat/data/ModifiedCamCAN.md.raw.csv.gz \
--method clinical \
--out_dir results/clinical_pipeline/

`combat_fit` estimates harmonization parameters and writes a `*.model.csv`.

- `ref_data` (required): reference-site CSV.
- `mov_data` (required): moving-site CSV.
- `--method {clinical,pairwise}` (default `clinical`): harmonization variant.
- `--degree` (default 2 for clinical, 1 for pairwise when omitted): polynomial age order.
- `--limit_age_range` (default disabled): match reference ages to the moving-site range.
- `--ignore_sex` (default disabled): drop sex from the design matrix.
- `--ignore_handedness` (default disabled): drop handedness from the design matrix.
- `--no_empirical_bayes` (default disabled): rely on classical estimates for alpha/sigma.
- `--ignore_bundles` (default `left_ventricle right_ventricle`): bundles removed prior to fitting.
- `--regul_ref` (clinical only, default 0): ridge penalty on the reference regression.
- `--regul_mov` (clinical only, default -1; pairwise falls back to 0): moving-site penalty or auto-tuning.
- `--nu` (clinical only, default 5): variance hyperparameter for the moving site.
- `--tau` (clinical only, default 2): covariate hyperparameter for the moving site.
- `--out_dir` (default `./`): directory for the generated model.
- `--output_model_filename` (default auto-generated): custom model filename.
- `--verbose`/`-v` (default `WARNING`): logging verbosity.
- `--overwrite`/`-f` (default disabled): authorize overwriting existing files.

Example:
combat_fit src/clinical_combat/data/CamCAN.md.raw.csv.gz \
src/clinical_combat/data/ModifiedCamCAN.md.raw.csv.gz \
--method pairwise \
--out_dir models/pairwise/

`combat_apply` consumes a moving-site CSV and a saved `*.model.csv`, then produces
harmonized measurements (`site.metric.method.csv.gz` by default).

- `mov_data` (required): moving-site CSV to transform.
- `model` (required): harmonization model generated by `combat_fit` or `combat_pipeline`.
- `--out_dir` (default `./`): directory for the harmonized output.
- `--output_results_filename` (default auto-generated): custom output filename.
- `--verbose`/`-v` (default `WARNING`): logging level.
- `--overwrite`/`-f` (default disabled): allow overwriting.

Example:
combat_apply src/clinical_combat/data/ModifiedCamCAN.md.raw.csv.gz \
models/pairwise/ModifiedCamCAN-CamCAN.md.pairwise.model.csv \
--out_dir harmonized/pairwise/

- `combat_QC`: reports Bhattacharyya distances between reference and moving datasets.
  - `ref_data` (required): reference-site CSV (only HC subjects are used).
  - `mov_data` (required): moving-site CSV.
  - `model` (required): harmonization model (`*.model.csv`).
  - `--degree_qc` (default 0): QC polynomial degree (0 reuses the model degree).
  - `--ignore_bundles` (default `left_ventricle right_ventricle`): bundles to drop.
  - `--print_only` (default disabled): skip writing the distance file.
  - `--out_dir` (default `./`): directory for QC outputs.
  - `--output_results_filename` (default auto-generated): custom QC filename.
  - `--verbose`/`-v` (default `WARNING`), `--overwrite`/`-f` (default disabled).
  - Example:

        combat_QC src/clinical_combat/data/CamCAN.md.raw.csv.gz \
        src/clinical_combat/data/ModifiedCamCAN.md.raw.csv.gz \
        models/pairwise/ModifiedCamCAN-CamCAN.md.pairwise.model.csv \
        --out_dir qc_reports/

Common helper flags: each script accepts -v/--verbose (default WARNING) and -f/--overwrite
(default disabled).
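
As background for reading the `combat_QC` output, the Bhattacharyya distance measures how much two distributions overlap: 0 means identical, and larger values mean less overlap. The sketch below uses the standard closed form for two univariate Gaussians. It is only an illustration: how `combat_QC` estimates the distributions (for example from model residuals) is not described here, and `bhattacharyya_gaussian` is a hypothetical function name.

```python
import math


def bhattacharyya_gaussian(mean_a, std_a, mean_b, std_b):
    """Bhattacharyya distance between N(mean_a, std_a^2) and N(mean_b, std_b^2)."""
    var_a, var_b = std_a ** 2, std_b ** 2
    # Term driven by the difference between the two means.
    location = 0.25 * (mean_a - mean_b) ** 2 / (var_a + var_b)
    # Term driven by the mismatch between the two spreads.
    scale = 0.5 * math.log((var_a + var_b) / (2.0 * std_a * std_b))
    return location + scale
```

A successful harmonization should bring the moving site closer to the reference, so the distance after harmonization is expected to be smaller than before.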
- `combat_visualize_data`: scatterplots for raw or harmonized datasets.
  - `in_files` (required, one or more): CSV files to display (reference first for legend clarity).
  - `--bundles` (default `mni_IIT_mask_skeletonFA`; use `all` for everything): bundles drawn.
  - `--display_marginal_hist` (default disabled): add marginal histograms.
  - `--hide_disease` (default disabled): remove non-HC subjects.
  - `--out_dir` (default `./`), `--outname` (default none), `--add_suffix` (default none): figure export controls.
  - `--fixed_ylim` (default auto): clamp Y axis to provided `[min max]`.
  - `--xlim` (default `20 90`): X-axis age range.
  - `--no_background` (default disabled): export without background styling.
  - Example:

        combat_visualize_data src/clinical_combat/data/CamCAN.md.raw.csv.gz \
        src/clinical_combat/data/ModifiedCamCAN.md.raw.csv.gz \
        harmonized/pairwise/ModifiedCamCAN.md.pairwise.csv.gz \
        --bundles mni_AF_L mni_AF_R \
        --out_dir figures/data/

- `combat_visualize_model`: overlays regression models with data.
  - `in_reference` (required): reference raw CSV.
  - `in_moving` (required): moving raw CSV.
  - `in_model` (required): harmonization model CSV.
  - `--bundles` (default `mni_IIT_mask_skeletonFA`; `all` for everything).
  - `--hide_disease` (default disabled): remove non-HC rows.
  - `--display_marginal_hist` (default disabled): add marginal histograms.
  - `--out_dir` (default `./`), `--outname` (default none), `--add_suffix` (default none).
  - `--fixed_color` (default palette-driven): manually set reference/moving colors.
  - `--lightness` (default 1.0): scale palette brightness.
  - `--only_models` (default disabled): hide scatter data and show regression lines only.
  - `--line_width` (default 2.5): width of regression lines.
  - `--fixed_ylim` (default auto) and `--xlim` (default `20 90`): axis limits.
  - `--no_background` (default disabled): export without background styling.
  - Example:

        combat_visualize_model src/clinical_combat/data/CamCAN.md.raw.csv.gz \
        src/clinical_combat/data/ModifiedCamCAN.md.raw.csv.gz \
        models/pairwise/ModifiedCamCAN-CamCAN.md.pairwise.model.csv \
        --out_dir figures/model/ \
        --only_models

- `combat_visualize_harmonization`: age curves before/after harmonization.
  - `in_reference` (required): reference raw CSV.
  - `in_movings` (required, two or more): moving raw CSV plus harmonized CSV (order matters).
  - `--out_dir` (default `./`), `--outname` (default none), `--add_suffix` (default none).
  - `--bundles` (default `mni_IIT_mask_skeletonFA`; `all` allowed), `--ages` (default `20 90`).
  - `--sexes`, `--handednesses`, `--diseases` (default all values present): cohort filters.
  - `--hide_disease` (default disabled): remove disease rows entirely.
  - `--display_point` (default disabled): scatter representation for moving site.
  - `--display_marginal_hist` (default disabled): add marginal histograms.
  - `--hide_percentiles` (default disabled): swap percentile bands for SD bands.
  - `--window_size` (default 20), `--window_count` (default 10), `--no_dynamic_window` (default disabled): sliding window controls.
  - `--min_subject_per_site` (default 10): minimum subjects per site retained.
  - `--randomize_line` (default disabled) or `--line_style` (default dashed): adjust moving-line style.
  - `--increase_ylim` (default 5): percentage padding on the Y axis when not fixed.
  - `--fixed_ylim` (default auto): clamp Y axis to specified bounds.
  - `--y_axis_percentile` (default `1 99`): percentile range used for automatic Y limits.
  - `--percentiles` (default `5 25 50 75 95`): percentile bands drawn.
  - `--line_widths` (default `0.25 1 2 1 0.25`): line widths for percentile envelopes.
  - `--display_errors` (default disabled) and `--error_metric {uncertainty,bounds}` (default `uncertainty`): plot error bars for single-subject harmonization outputs.
  - Example:

        combat_visualize_harmonization src/clinical_combat/data/CamCAN.md.raw.csv.gz \
        src/clinical_combat/data/ModifiedCamCAN.md.raw.csv.gz \
        harmonized/pairwise/ModifiedCamCAN.md.pairwise.csv.gz \
        --bundles all \
        --out_dir figures/harmonization/

- `combat_info`: prints population statistics for a single CSV.
  - `in_file` (required): dataset to summarize. No optional switches.
  - Example:

        combat_info src/clinical_combat/data/CamCAN.md.raw.csv.gz

1. Inspect the datasets

       combat_info src/clinical_combat/data/CamCAN.md.raw.csv.gz

2. Fit a harmonization model

       combat_fit \
       src/clinical_combat/data/CamCAN.md.raw.csv.gz \
       src/clinical_combat/data/ModifiedCamCAN.md.raw.csv.gz \
       --method clinical \
       --out_dir out/models/

3. Apply the harmonization

       combat_apply \
       src/clinical_combat/data/ModifiedCamCAN.md.raw.csv.gz \
       out/models/ModifiedCamCAN-CamCAN.md.clinical.model.csv \
       --out_dir out/harmonized/

4. Quality control

       combat_QC \
       src/clinical_combat/data/CamCAN.md.raw.csv.gz \
       src/clinical_combat/data/ModifiedCamCAN.md.raw.csv.gz \
       out/models/ModifiedCamCAN-CamCAN.md.clinical.model.csv

5. Visualize the results

       combat_visualize_harmonization \
       src/clinical_combat/data/CamCAN.md.raw.csv.gz \
       src/clinical_combat/data/ModifiedCamCAN.md.raw.csv.gz \
       out/harmonized/ModifiedCamCAN.md.clinical.csv.gz \
       --out_dir out/figures/

combat_pipeline can execute steps 2 through 5 in sequence and logs each
invoked command.
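
If you would rather drive the individual steps from Python (for example, to loop over several moving sites), the sequence of fit, apply, QC, and visualization can be assembled as argument lists. This is a sketch, not how `combat_pipeline` is implemented: `manual_steps` is a hypothetical helper, it uses only flags documented in this README, and it assumes the default output names follow the `<moving>-<reference>.<metric>.<method>` pattern seen in the examples.

```python
import shlex
from pathlib import Path


def manual_steps(ref_csv, mov_csv, method="clinical", out_dir="out"):
    """Build the fit, apply, QC, and visualization commands as argv lists."""
    # Assumption: inputs are named <site>.<metric>.raw.csv[.gz].
    ref_site, metric = Path(ref_csv).name.split(".")[:2]
    mov_site = Path(mov_csv).name.split(".")[0]
    model = f"{out_dir}/models/{mov_site}-{ref_site}.{metric}.{method}.model.csv"
    harmonized = f"{out_dir}/harmonized/{mov_site}.{metric}.{method}.csv.gz"
    return [
        ["combat_fit", ref_csv, mov_csv,
         "--method", method, "--out_dir", f"{out_dir}/models/"],
        ["combat_apply", mov_csv, model,
         "--out_dir", f"{out_dir}/harmonized/"],
        ["combat_QC", ref_csv, mov_csv, model],
        ["combat_visualize_harmonization", ref_csv, mov_csv, harmonized,
         "--out_dir", f"{out_dir}/figures/"],
    ]


if __name__ == "__main__":
    steps = manual_steps(
        "src/clinical_combat/data/CamCAN.md.raw.csv.gz",
        "src/clinical_combat/data/ModifiedCamCAN.md.raw.csv.gz",
    )
    for argv in steps:
        # Print each command; replace with subprocess.run(argv, check=True)
        # to actually execute it.
        print(shlex.join(argv))
```

Keeping each command as a list avoids shell-quoting problems when paths contain spaces.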
