Clinical-ComBAT

Reference package for ComBAT harmonization of clinical MRI data. It ships the ComBAT implementations for adapting clinical sites to a reference site, along with ready-to-run scripts to prepare datasets, fit a model, apply the harmonization, and analyze the outputs. Although Clinical-ComBAT was designed and tested for harmonizing diffusion MRI metrics (such as fractional anisotropy, mean diffusivity, and apparent fiber density), it can also be used on other types of data, such as volumetric data.

References

Girard, G., Edde, M., Dumais, F., et al. (2025). Clinical-ComBAT: a diffusion MRI harmonization method for clinical normative modeling applications. Submitted to Medical Image Analysis.
Jodoin, P.-M., Edde, M., Girard, G., et al. (2025). Challenges and best practices when using ComBAT to harmonize diffusion MRI data. Nature Scientific Reports, 15, 41508. https://www.nature.com/articles/s41598-025-25400-x
Fortin, J.-P., Parker, D., Tunç, B., et al. (2017). Harmonization of multi-site diffusion tensor imaging data. NeuroImage, 161, 149–170. https://doi.org/10.1016/j.neuroimage.2017.08.047

Licence

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

You can copy, redistribute, and adapt this work, but only for non-commercial purposes. You must give the original creator credit (Attribution), and if you adapt the work, your new version must be shared under the same or a compatible license (ShareAlike). The work cannot be used for commercial gain, that is, for activities primarily intended to generate money. For commercial use, please contact Pierre-Marc Jodoin.

Quick installation

⚠️ We highly recommend installing uv to speed up the clinical_combat installation: https://docs.astral.sh/uv/getting-started/installation/

☝️ If you prefer not to use uv, clinical_combat can still be installed by dropping the leading uv from each installation command below.

Make sure your pip is up-to-date before trying to install:

uv pip install --upgrade pip

Then

# 1) create a Python >= 3.9 environment
python -m venv .venv
source .venv/bin/activate

# 2) install clinical_combat
uv pip install -e .

The toolbox mainly depends on numpy, pandas, matplotlib, and seaborn. All scripts accept compressed or uncompressed CSV files.

Project layout

| Folder / file | Description |
| --- | --- |
| `src/` | Python package (harmonization, utilities, visualization). |
| `src/clinical_combat/cli/` | Production-ready scripts to fit, apply, and visualize ComBAT. |
| `src/clinical_combat/cli/tests/` | Automated checks (for example `pytest scripts/tests/test_combat_pipeline.py`). |
| `src/clinical_combat/cli_dev/` | Research helpers and additional plotting utilities (optional for end-users). |
| `src/clinical_combat/data/` | Example datasets and sample figures. |
| `pyproject.toml` | Package configuration file. |

Expected data format

Scripts expect CSV files containing at least the columns below:

sid,site,bundle,metric,mean,age,sex,handedness,disease

sid: subject identifier
site: site name (string)
bundle: bundle or region name
metric: diffusion metric (for example md, fa)
mean: numeric value per bundle (mean, median, etc.)
age, sex, handedness: covariates
- use integer values (1 or 2) for sex and handedness; when a covariate is unknown, add the column filled with 1 and the scripts will disable that effect automatically
disease acts as a flag; any row whose value is not HC is dropped before fitting the model

src/clinical_combat/data/ contains complete examples (CamCAN.md.raw.csv.gz and ModifiedCamCAN.md.raw.csv.gz) that illustrate the column layout.
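
Before running the scripts on your own data, it can help to check a file against the layout described above. The following is a minimal sketch using only the Python standard library; `open_text`, `check_rows`, and `check_csv` are hypothetical helper names and are not part of clinical_combat. It assumes sex and handedness are stored as the numbers 1 or 2, as stated above.

```python
import csv
import gzip
import io

# Column names taken from the "Expected data format" section.
REQUIRED_COLUMNS = ["sid", "site", "bundle", "metric", "mean",
                    "age", "sex", "handedness", "disease"]


def open_text(path):
    """Open a plain or gzip-compressed CSV file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt", newline="")
    return open(path, "r", newline="")


def check_rows(rows, fieldnames):
    """Return a list of human-readable problems found in the rows."""
    problems = [f"missing column: {c}" for c in REQUIRED_COLUMNS
                if c not in fieldnames]
    if problems:
        return problems
    for i, row in enumerate(rows, start=2):  # line 1 is the header
        try:
            float(row["mean"])
            float(row["age"])
        except ValueError:
            problems.append(f"line {i}: non-numeric mean or age")
        for cov in ("sex", "handedness"):
            try:
                ok = float(row[cov]) in (1.0, 2.0)
            except ValueError:
                ok = False
            if not ok:
                problems.append(f"line {i}: {cov}={row[cov]!r} is not 1 or 2")
    return problems


def check_csv(path):
    """Check a CSV file on disk (hypothetical convenience wrapper)."""
    with open_text(path) as handle:
        reader = csv.DictReader(handle)
        return check_rows(reader, reader.fieldnames or [])


if __name__ == "__main__":
    # Tiny in-memory example with made-up values.
    sample = io.StringIO(
        "sid,site,bundle,metric,mean,age,sex,handedness,disease\n"
        "sub-01,SiteA,mni_AF_L,md,0.00075,34,1,2,HC\n"
    )
    reader = csv.DictReader(sample)
    print(check_rows(reader, reader.fieldnames))
```

An empty list means no problem was found. To check a real file, call `check_csv("src/clinical_combat/data/CamCAN.md.raw.csv.gz")` from the project root.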

Choosing a ComBAT variant

The code supports two harmonization modes, namely clinical and pairwise. In both cases, the procedure harmonizes data from a moving site onto a reference site.

| Method | Description |
| --- | --- |
| `clinical` (default) | Harmonizes a moving site to a normative reference following the Clinical-ComBAT method (Girard et al., 2025). It fits site-specific polynomial covariate models, anchors variance with Bayesian priors suited to small cohorts, and auto-tunes the hyperparameters to keep the harmonized metrics consistent with the reference population. |
| `pairwise` | Adaptation of the original ComBAT (Fortin et al., 2017) that still fits both sites together but explicitly anchors the harmonization to a chosen reference site. For more details, see Jodoin et al. (2025), ComBAT Harmonization for Diffusion MRI: Challenges and Best Practices (Nature Scientific Reports:41508). |

Common options for both methods:

age filtering (--limit_age_range)
covariate selection (--ignore_sex, --ignore_handedness)
age effect polynomial order (--degree)
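
To make the covariate options more concrete, here is a small illustrative sketch of how a polynomial age model of a given degree could be laid out as one design-matrix row per subject. This is not the package's implementation: `design_row` is a hypothetical function, and the column order and raw (uncentered) age encoding are assumptions chosen for readability.

```python
def design_row(age, sex, handedness, degree=2,
               ignore_sex=False, ignore_handedness=False):
    """Build one illustrative design-matrix row.

    Layout (an assumption): intercept, optional sex, optional handedness,
    then age, age**2, ..., age**degree.
    """
    row = [1.0]  # intercept
    if not ignore_sex:
        row.append(float(sex))
    if not ignore_handedness:
        row.append(float(handedness))
    # Polynomial age terms, mirroring the idea behind --degree.
    row.extend(float(age) ** p for p in range(1, degree + 1))
    return row


if __name__ == "__main__":
    # Degree-2 age model with both covariates kept.
    print(design_row(age=30, sex=1, handedness=2, degree=2))
    # Degree-1 age model without handedness.
    print(design_row(age=30, sex=1, handedness=2, degree=1,
                     ignore_handedness=True))
```

Dropping a covariate simply removes its column, which is the effect one would expect from --ignore_sex and --ignore_handedness.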

Easy start

Run the bundled example once to check your setup:

# From the project root
combat_pipeline \
    src/clinical_combat/data/CamCAN.md.raw.csv.gz \
    src/clinical_combat/data/ModifiedCamCAN.md.raw.csv.gz \
    --method clinical \
    --out_dir quickstart_demo/

This produces:

a fitted model (quickstart_demo/ModifiedCamCAN-CamCAN.md.clinical.model.csv)
harmonized data (quickstart_demo/ModifiedCamCAN.md.clinical.csv.gz)
QC metrics and figures inside quickstart_demo/

Main scripts

Combined workflow

combat_pipeline runs the full pipeline in sequence (fit → apply → QC → figures) and logs each spawned command.

ref_data (required): reference-site CSV (*.raw.csv[.gz]).
mov_data (required): moving-site CSV to harmonize.
--method {clinical,pairwise} (default clinical): harmonization strategy.
--degree (default 2 for clinical, 1 for pairwise when omitted): polynomial degree for age.
--limit_age_range (default disabled): drop reference subjects outside the moving age range.
--ignore_sex (default disabled): remove sex from the covariate model.
--ignore_handedness (default disabled): remove handedness from the model.
--no_empirical_bayes (default disabled): skip empirical Bayes estimation.
--robust (default disabled, not implemented): placeholder for robust mode.
--regul_ref (clinical only, default 0): ridge penalty applied to reference regression.
--regul_mov (clinical only, default -1; pairwise falls back to 0): moving-site penalty or auto-tuning.
--nu (clinical only, default 5): variance hyperparameter for the moving site.
--tau (clinical only, default 2): covariate hyperparameter for the moving site.
--bundles (default mni_IIT_mask_skeletonFA in plots): bundle subset for figures (all for every bundle).
--degree_qc (default 0): QC model degree override (0 reuses the harmonization degree).
--out_dir (default ./): root directory for models, results, and figures.
--output_model_filename (default auto-generated): custom name for the saved model.
--output_results_filename (default auto-generated): custom name for the harmonized CSV.
--verbose/-v (default WARNING): logging verbosity (INFO with -v, DEBUG with -v DEBUG).
--overwrite/-f (default disabled): allow overwriting existing files.

Example:

combat_pipeline src/clinical_combat/data/CamCAN.md.raw.csv.gz \
    src/clinical_combat/data/ModifiedCamCAN.md.raw.csv.gz \
    --method clinical \
    --out_dir results/clinical_pipeline/
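
As an illustration of what --limit_age_range describes (dropping reference subjects outside the moving age range), the sketch below filters a list of row dictionaries. `limit_age_range` is a hypothetical helper written for this README, and treating both bounds as inclusive is an assumption.

```python
def limit_age_range(ref_rows, mov_rows):
    """Keep reference rows whose age lies within the moving-site age range.

    Rows are dictionaries with an "age" key, for example as produced by
    csv.DictReader. Bounds are treated as inclusive (an assumption).
    """
    mov_ages = [float(row["age"]) for row in mov_rows]
    low, high = min(mov_ages), max(mov_ages)
    return [row for row in ref_rows if low <= float(row["age"]) <= high]


if __name__ == "__main__":
    # Made-up subjects for demonstration only.
    reference = [{"sid": "r1", "age": "18"}, {"sid": "r2", "age": "45"},
                 {"sid": "r3", "age": "88"}]
    moving = [{"sid": "m1", "age": "30"}, {"sid": "m2", "age": "60"}]
    kept = limit_age_range(reference, moving)
    print([row["sid"] for row in kept])
```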

Model fitting

combat_fit estimates harmonization parameters and writes a *.model.csv.

ref_data (required): reference-site CSV.
mov_data (required): moving-site CSV.
--method {clinical,pairwise} (default clinical): harmonization variant.
--degree (default 2 for clinical, 1 for pairwise when omitted): polynomial age order.
--limit_age_range (default disabled): match reference ages to the moving-site range.
--ignore_sex (default disabled): drop sex from the design matrix.
--ignore_handedness (default disabled): drop handedness from the design matrix.
--no_empirical_bayes (default disabled): rely on classical estimates for alpha/sigma.
--ignore_bundles (default left_ventricle right_ventricle): bundles removed prior to fitting.
--regul_ref (clinical only, default 0): ridge penalty on the reference regression.
--regul_mov (clinical only, default -1; pairwise falls back to 0): moving-site penalty or auto-tuning.
--nu (clinical only, default 5): variance hyperparameter for the moving site.
--tau (clinical only, default 2): covariate hyperparameter for the moving site.
--out_dir (default ./): directory for the generated model.
--output_model_filename (default auto-generated): custom model filename.
--verbose/-v (default WARNING): logging verbosity.
--overwrite/-f (default disabled): authorize overwriting existing files.

Example:

combat_fit src/clinical_combat/data/CamCAN.md.raw.csv.gz \
    src/clinical_combat/data/ModifiedCamCAN.md.raw.csv.gz \
    --method pairwise \
    --out_dir models/pairwise/

Model application

combat_apply consumes a moving-site CSV and a saved *.model.csv, then produces harmonized measurements (site.metric.method.csv.gz by default).

mov_data (required): moving-site CSV to transform.
model (required): harmonization model generated by combat_fit or combat_pipeline.
--out_dir (default ./): directory for the harmonized output.
--output_results_filename (default auto-generated): custom output filename.
--verbose/-v (default WARNING): logging level.
--overwrite/-f (default disabled): allow overwriting.

Example:

combat_apply src/clinical_combat/data/ModifiedCamCAN.md.raw.csv.gz \
    models/pairwise/ModifiedCamCAN-CamCAN.md.pairwise.model.csv \
    --out_dir harmonized/pairwise/
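
The auto-generated file names used throughout this README follow a visible pattern: `moving-reference.metric.method.model.csv` for models and `site.metric.method.csv.gz` for harmonized data. The sketch below reproduces that pattern from the input file names so you can predict output paths in your own scripts. It is inferred from the examples in this document only; the helper names are hypothetical, and the scripts themselves may derive the site and metric differently (for example from the CSV contents).

```python
import os


def split_raw_name(path):
    """Split 'Site.metric.raw.csv[.gz]' into (site, metric)."""
    name = os.path.basename(path)
    for suffix in (".raw.csv.gz", ".raw.csv"):
        if name.endswith(suffix):
            site, metric = name[: -len(suffix)].rsplit(".", 1)
            return site, metric
    raise ValueError(f"unexpected file name: {name}")


def default_model_name(ref_path, mov_path, method):
    """Model name pattern seen in the examples of this README."""
    ref_site, metric = split_raw_name(ref_path)
    mov_site, _ = split_raw_name(mov_path)
    return f"{mov_site}-{ref_site}.{metric}.{method}.model.csv"


def default_results_name(mov_path, method):
    """Harmonized-data name pattern (site.metric.method.csv.gz)."""
    site, metric = split_raw_name(mov_path)
    return f"{site}.{metric}.{method}.csv.gz"


if __name__ == "__main__":
    ref = "src/clinical_combat/data/CamCAN.md.raw.csv.gz"
    mov = "src/clinical_combat/data/ModifiedCamCAN.md.raw.csv.gz"
    print(default_model_name(ref, mov, "pairwise"))
    print(default_results_name(mov, "pairwise"))
```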

Evaluation and quality control (QC) to assess the alignment of the harmonized population.

combat_QC: reports Bhattacharyya distances between reference and moving datasets.
- ref_data (required): reference-site CSV (only HC subjects are used).
- mov_data (required): moving-site CSV.
- model (required): harmonization model (*.model.csv).
- --degree_qc (default 0): QC polynomial degree (0 reuses the model degree).
- --ignore_bundles (default left_ventricle right_ventricle): bundles to drop.
- --print_only (default disabled): skip writing the distance file.
- --out_dir (default ./): directory for QC outputs.
- --output_results_filename (default auto-generated): custom QC filename.
- --verbose/-v (default WARNING), --overwrite/-f (default disabled).
- Example:
```
combat_QC src/clinical_combat/data/CamCAN.md.raw.csv.gz \
    src/clinical_combat/data/ModifiedCamCAN.md.raw.csv.gz \
    models/pairwise/ModifiedCamCAN-CamCAN.md.pairwise.model.csv \
    --out_dir qc_reports/
```
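
For readers unfamiliar with the Bhattacharyya distance, the sketch below computes its closed form for two univariate Gaussian distributions estimated from samples. It is meant only to show what the number measures (0 when means and variances match, larger as they diverge). `bhattacharyya_gaussian` is a hypothetical function; how combat_QC prepares the data before computing the distance is not described here, and the Gaussian assumption is ours.

```python
import math
import statistics


def bhattacharyya_gaussian(x, y):
    """Bhattacharyya distance between two samples, each modeled as a
    univariate Gaussian with its sample mean and sample variance."""
    mu_x, mu_y = statistics.fmean(x), statistics.fmean(y)
    var_x, var_y = statistics.variance(x), statistics.variance(y)
    # Term penalizing a difference in means.
    mean_term = 0.25 * (mu_x - mu_y) ** 2 / (var_x + var_y)
    # Term penalizing a difference in variances.
    var_term = 0.5 * math.log((var_x + var_y)
                              / (2.0 * math.sqrt(var_x * var_y)))
    return mean_term + var_term


if __name__ == "__main__":
    # Made-up values for demonstration only.
    reference = [0.70, 0.72, 0.74, 0.76, 0.78]
    moving = [0.75, 0.77, 0.79, 0.81, 0.83]
    print(bhattacharyya_gaussian(reference, moving))
```

A harmonization that works well should bring the distance between the reference and the harmonized moving data closer to 0 than the distance computed on the raw moving data.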

Visualization

Common helper flags: each script accepts -v/--verbose (default WARNING) and -f/--overwrite (default disabled).

combat_visualize_data: scatterplots for raw or harmonized datasets.
- in_files (required, one or more): CSV files to display (reference first for legend clarity).
- --bundles (default mni_IIT_mask_skeletonFA; use all for everything): bundles drawn.
- --display_marginal_hist (default disabled): add marginal histograms.
- --hide_disease (default disabled): remove non-HC subjects.
- --out_dir (default ./), --outname (default none), --add_suffix (default none): figure export controls.
- --fixed_ylim (default auto): clamp Y axis to provided [min max].
- --xlim (default 20 90): X-axis age range.
- --no_background (default disabled): export without background styling.
- Example:
```
combat_visualize_data src/clinical_combat/data/CamCAN.md.raw.csv.gz \
    src/clinical_combat/data/ModifiedCamCAN.md.raw.csv.gz \
    harmonized/pairwise/ModifiedCamCAN.md.pairwise.csv.gz \
    --bundles mni_AF_L mni_AF_R \
    --out_dir figures/data/
```
combat_visualize_model: overlays regression models with data.
- in_reference (required): reference raw CSV.
- in_moving (required): moving raw CSV.
- in_model (required): harmonization model CSV.
- --bundles (default mni_IIT_mask_skeletonFA; all for everything).
- --hide_disease (default disabled): remove non-HC rows.
- --display_marginal_hist (default disabled): add marginal histograms.
- --out_dir (default ./), --outname (default none), --add_suffix (default none).
- --fixed_color (default palette-driven): manually set reference/moving colors.
- --lightness (default 1.0): scale palette brightness.
- --only_models (default disabled): hide scatter data and show regression lines only.
- --line_width (default 2.5): width of regression lines.
- --fixed_ylim (default auto) and --xlim (default 20 90): axis limits.
- --no_background (default disabled): export without background styling.
- Example:
```
combat_visualize_model src/clinical_combat/data/CamCAN.md.raw.csv.gz \
    src/clinical_combat/data/ModifiedCamCAN.md.raw.csv.gz \
    models/pairwise/ModifiedCamCAN-CamCAN.md.pairwise.model.csv \
    --out_dir figures/model/ \
    --only_models
```
combat_visualize_harmonization: age curves before/after harmonization.
- in_reference (required): reference raw CSV.
- in_movings (required, two or more): moving raw CSV plus harmonized CSV (order matters).
- --out_dir (default ./), --outname (default none), --add_suffix (default none).
- --bundles (default mni_IIT_mask_skeletonFA; all allowed), --ages (default 20 90).
- --sexes, --handednesses, --diseases (default all values present): cohort filters.
- --hide_disease (default disabled): remove disease rows entirely.
- --display_point (default disabled): scatter representation for moving site.
- --display_marginal_hist (default disabled): add marginal histograms.
- --hide_percentiles (default disabled): swap percentile bands for SD bands.
- --window_size (default 20), --window_count (default 10), --no_dynamic_window (default disabled): sliding window controls.
- --min_subject_per_site (default 10): minimum subjects per site retained.
- --randomize_line (default disabled) or --line_style (default dashed): adjust moving-line style.
- --increase_ylim (default 5): percentage padding on the Y axis when not fixed.
- --fixed_ylim (default auto): clamp Y axis to specified bounds.
- --y_axis_percentile (default 1 99): percentile range used for automatic Y limits.
- --percentiles (default 5 25 50 75 95): percentile bands drawn.
- --line_widths (default 0.25 1 2 1 0.25): line widths for percentile envelopes.
- --display_errors (default disabled) & --error_metric {uncertainty,bounds} (default uncertainty): plot error bars for single-subject harmonization outputs.
- Example:
```
combat_visualize_harmonization src/clinical_combat/data/CamCAN.md.raw.csv.gz \
    src/clinical_combat/data/ModifiedCamCAN.md.raw.csv.gz \
    harmonized/pairwise/ModifiedCamCAN.md.pairwise.csv.gz \
    --bundles all \
    --out_dir figures/harmonization/
```

Dataset inspection

combat_info: prints population statistics for a single CSV.
- in_file (required): dataset to summarize. No optional switches.
- Example:
```
combat_info src/clinical_combat/data/CamCAN.md.raw.csv.gz
```

Typical pipeline

Inspect the datasets

combat_info src/clinical_combat/data/CamCAN.md.raw.csv.gz

Fit a harmonization model

combat_fit \
    src/clinical_combat/data/CamCAN.md.raw.csv.gz \
    src/clinical_combat/data/ModifiedCamCAN.md.raw.csv.gz \
    --method clinical \
    --out_dir out/models/

Apply the harmonization

combat_apply \
    src/clinical_combat/data/ModifiedCamCAN.md.raw.csv.gz \
    out/models/ModifiedCamCAN-CamCAN.md.clinical.model.csv \
    --out_dir out/harmonized/

Quality control

combat_QC \
    src/clinical_combat/data/CamCAN.md.raw.csv.gz \
    src/clinical_combat/data/ModifiedCamCAN.md.raw.csv.gz \
    out/models/ModifiedCamCAN-CamCAN.md.clinical.model.csv

Visualize the results

combat_visualize_harmonization \
    src/clinical_combat/data/CamCAN.md.raw.csv.gz \
    src/clinical_combat/data/ModifiedCamCAN.md.raw.csv.gz \
    out/harmonized/ModifiedCamCAN.md.clinical.csv.gz \
    --out_dir out/figures/

combat_pipeline can execute the fit, apply, quality control, and visualization steps above in sequence, and it logs each invoked command.
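
If you prefer to drive the individual scripts yourself (for instance to insert custom steps between them), the sequence above can be assembled from Python. The sketch below only builds and prints the command lines, using the positional arguments and flags documented in this README; `build_commands` is a hypothetical helper, the sub-directory names are arbitrary, and this is not how combat_pipeline is implemented.

```python
import shlex


def build_commands(ref, mov, method, out_dir, model_name, results_name):
    """Return the fit, apply, QC, and visualization steps as argument lists.

    Nothing is executed here. model_name and results_name are the file
    names produced by combat_fit and combat_apply, respectively.
    """
    model = f"{out_dir}/models/{model_name}"
    harmonized = f"{out_dir}/harmonized/{results_name}"
    return [
        ["combat_fit", ref, mov, "--method", method,
         "--out_dir", f"{out_dir}/models/"],
        ["combat_apply", mov, model,
         "--out_dir", f"{out_dir}/harmonized/"],
        ["combat_QC", ref, mov, model,
         "--out_dir", f"{out_dir}/qc/"],
        ["combat_visualize_harmonization", ref, mov, harmonized,
         "--out_dir", f"{out_dir}/figures/"],
    ]


if __name__ == "__main__":
    steps = build_commands(
        ref="src/clinical_combat/data/CamCAN.md.raw.csv.gz",
        mov="src/clinical_combat/data/ModifiedCamCAN.md.raw.csv.gz",
        method="clinical",
        out_dir="out",
        model_name="ModifiedCamCAN-CamCAN.md.clinical.model.csv",
        results_name="ModifiedCamCAN.md.clinical.csv.gz",
    )
    for argv in steps:
        # Print a copy-pasteable shell command for each step.
        print(shlex.join(argv))
        # To actually run a step: subprocess.run(argv, check=True)
```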
