Worked Example
First, follow the install instructions given in the README. You will need the permissions necessary to install Python packages on your system.
An example configuration file and script to generate dummy data have been provided here. Generate the data by running
python generate.py
and move the resulting ROOT files to a separate directory.
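The move step can also be scripted. The following is a minimal sketch, assuming the generated files end in .root and sit in the directory where generate.py was run; the function name move_root_files and the directory name dummy_data are hypothetical and not part of the tool.

```python
import shutil
from pathlib import Path


def move_root_files(source_dir, target_dir):
    """Move every *.root file from source_dir into target_dir.

    Returns the list of new file paths. The target directory is
    created if it does not exist yet.
    """
    source = Path(source_dir)
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    moved = []
    # Only files directly inside source_dir are considered (no recursion).
    for path in sorted(source.glob("*.root")):
        destination = target / path.name
        shutil.move(str(path), str(destination))
        moved.append(destination)
    return moved


# Example call (hypothetical directory name):
# move_root_files(".", "dummy_data")
```

Keeping the ROOT files in their own directory means input_dir can later point at a location that holds nothing but the input data.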
The configuration file is designed to classify the dummy data using a gradient boosted decision tree. The dummy data contains two signal processes labelled signal{1,2} and four background processes labelled background{1,2,3,4}, each with five features (observables) labelled f{1,2,3,4,5}.
The provided values in the configuration file do not need to be edited, with the exception of input_dir, which should point to the directory the ROOT files were moved to during setup. Don't forget the trailing slash!
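One plausible reason the trailing slash matters is that file paths are built by joining the directory string and the file name directly; that is an assumption, as the tool's internals are not shown here. The sketch below illustrates the failure mode and a small guard. The helper with_trailing_slash and the file name signal1.root are hypothetical.

```python
import os


def with_trailing_slash(path):
    """Return path unchanged if it already ends with a separator,
    otherwise append '/'."""
    if not path:
        raise ValueError("path must not be empty")
    if path.endswith(("/", os.sep)):
        return path
    return path + "/"


# Plain string concatenation silently produces a wrong path
# when the slash is missing:
broken = "data" + "signal1.root"                          # 'datasignal1.root'
fixed = with_trailing_slash("data") + "signal1.root"      # 'data/signal1.root'
```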
To perform the classification, run
tact path/to/config.yaml
The tool will initially print the event counts in each process, in the format unweighted (weighted). Next, the progress of the classification is recorded (if available). After this, the classification report and confusion matrix are given for the test and training samples, followed by the two-sample Kolmogorov-Smirnov test p-values in signal and background for the test and training samples. Finally, the feature importance is displayed, if available for the selected classifier.
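To make the Kolmogorov-Smirnov output easier to interpret, here is a generic, standard-library sketch of a two-sample test. It is not the tool's own implementation, which is not shown here. The statistic D is the largest gap between the two empirical cumulative distributions, and the p-value uses the asymptotic Kolmogorov series, so it is only approximate for small samples. The function name ks_two_sample is hypothetical.

```python
import math
from bisect import bisect_right


def ks_two_sample(sample_a, sample_b):
    """Return (D, p) for a two-sample Kolmogorov-Smirnov test.

    D is the maximum distance between the empirical CDFs of the two
    samples; p is the asymptotic (large-sample) p-value.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    n_a, n_b = len(a), len(b)
    if n_a == 0 or n_b == 0:
        raise ValueError("both samples must be non-empty")

    # The empirical CDFs only change at data points, so it is enough
    # to compare them at every observed value.
    d = max(
        abs(bisect_right(a, x) / n_a - bisect_right(b, x) / n_b)
        for x in a + b
    )

    effective_n = n_a * n_b / (n_a + n_b)
    lam = math.sqrt(effective_n) * d
    if lam < 0.2:
        # The p-value is 1 to within about 1e-12 in this regime,
        # and the series below converges slowly.
        return d, 1.0

    # Asymptotic Kolmogorov series: 2 * sum (-1)^(j-1) exp(-2 j^2 lam^2)
    p = 2.0 * sum(
        (-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
        for j in range(1, 101)
    )
    return d, min(max(p, 0.0), 1.0)
```

A small p-value indicates that the two samples are unlikely to share a distribution. If the test is applied to the classifier response in the training and test samples, which is a common use but an assumption here, a small value can hint at overtraining.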
Using the provided configuration, the tool's output will be saved into three directories: mva/, plots/, and root/.
The mva/ directory contains a .pkl file with the trained classifier.
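The saved classifier can in principle be reloaded for later use. The sketch below assumes the .pkl file was written with Python's standard pickle module, which the extension suggests but the text does not confirm. The helper load_classifier and the example path are hypothetical, and the library that defines the classifier class must be importable when loading.

```python
import pickle


def load_classifier(path):
    """Load and return the object stored in a pickle file.

    Only unpickle files from a trusted source: loading a pickle can
    execute arbitrary code.
    """
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Example call (hypothetical file name):
# classifier = load_classifier("mva/classifier.pkl")
```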
The plots/ directory contains plots showing the ROC curve (roc_all.pdf), the distribution of the features in signal and background (vars_all.pdf), the response of the classifier in signal and background for the test and training samples (response_all.pdf), and the correlation matrices for the features (corr_*.pdf).
The root/ directory contains ROOT files with TH1s of the classifier response for every TTree in the input files. These are formatted such that they can be passed to the Higgs Analysis Combined Limit tool (or THETA, if specified).