
Worked Example

lunik1 edited this page Feb 20, 2018 · 2 revisions

Set Up

First, follow the install instructions given in the README. You will need the permissions necessary to install Python packages on your system.

An example configuration file and script to generate dummy data have been provided here. Generate the data by running

python generate.py

and move the resulting ROOT files to a separate directory.
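If you prefer to script the move, the following is a minimal sketch. It assumes the generated files end in `.root` and sit in the directory where `generate.py` was run; the function name `move_root_files` and the target directory `dummy_data/` are placeholders chosen for illustration, not names used by the tool.

```python
import shutil
from pathlib import Path


def move_root_files(source_dir, target_dir):
    """Move every *.root file from source_dir into target_dir.

    Returns the list of new file locations.
    """
    source = Path(source_dir)
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)  # create the target if needed

    moved = []
    for path in sorted(source.glob("*.root")):
        destination = target / path.name
        shutil.move(str(path), str(destination))
        moved.append(destination)
    return moved


if __name__ == "__main__":
    # "dummy_data/" is a hypothetical directory name; use any directory you like.
    for new_path in move_root_files(".", "dummy_data/"):
        print(new_path)
```

Files with any other extension are left where they are.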

Configuration

The configuration file is designed to perform a classification on the dummy data using a gradient boosted decision tree. The dummy data contains two signal processes labelled signal{1,2} and four background processes labelled background{1,2,3,4}, each with five features (observables) labelled f{1,2,3,4,5}.

The values provided in the configuration file do not need to be edited, with the exception of input_dir, which should point to the directory the ROOT files were moved to during setup. Don't forget the trailing slash!
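To avoid forgetting the trailing slash, a small helper can normalise the path before you paste it into the configuration file. This is only a sketch: `with_trailing_slash` is a hypothetical helper, not part of the tool, and the example path is made up.

```python
def with_trailing_slash(path):
    """Return path unchanged if it already ends in '/', else append one."""
    return path if path.endswith("/") else path + "/"


if __name__ == "__main__":
    # "path/to/dummy_data" is a placeholder for wherever you moved the files.
    print(with_trailing_slash("path/to/dummy_data"))
```

The printed value is what should be assigned to input_dir.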

Running the tool

To perform the classification, run

tact path/to/config.yaml

The tool first prints the event counts in each process, in the format unweighted (weighted). Next, the progress of the classification is recorded (if available). The classification report and confusion matrix are then given for the test and training samples, followed by the two-sample Kolmogorov-Smirnov test p-values in signal and background for the test and training samples. Finally, the feature importance is displayed, if available for the selected classifier.
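For readers unfamiliar with the two-sample Kolmogorov-Smirnov test, the sketch below computes its test statistic: the largest vertical distance between the empirical cumulative distributions of two samples. It is an illustration only and is not the tool's implementation. It ignores event weights, does not convert the statistic to a p-value, and assumes the two samples being compared are the classifier responses in the test and training sets. The name `ks_statistic` is hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest distance between the empirical CDFs of two unweighted samples."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    largest = 0.0
    # The empirical CDFs only change at observed values, so checking
    # those points is enough to find the maximum distance.
    for value in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, value) / len(a)
        cdf_b = bisect_right(b, value) / len(b)
        largest = max(largest, abs(cdf_a - cdf_b))
    return largest


if __name__ == "__main__":
    # Made-up classifier responses, purely for illustration.
    train_response = [0.1, 0.4, 0.6, 0.9]
    test_response = [0.2, 0.5, 0.7, 0.8]
    print(ks_statistic(train_response, test_response))
```

A small statistic (and correspondingly large p-value) indicates that the two distributions are similar, which is the usual sign that the classifier has not been overtrained.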

Examining Output

Using the provided configuration, the tool's output will be saved into three directories: mva/, plots/, and root/.

mva/

This directory contains a .pkl file with the trained classifier.
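The classifier can be loaded back into Python for further study. The sketch below assumes the file can be read with the standard-library `pickle` module and that the library used to train the classifier is installed in the current environment; `load_classifier` is a hypothetical helper, and it simply picks the first `.pkl` file it finds since the exact file name is not stated here.

```python
import pickle
from pathlib import Path


def load_classifier(mva_dir="mva/"):
    """Load the first .pkl file found in mva_dir and return its contents."""
    pkl_files = sorted(Path(mva_dir).glob("*.pkl"))
    if not pkl_files:
        raise FileNotFoundError("no .pkl file found in {}".format(mva_dir))
    # Assumption: the file was written in a format that pickle.load can read.
    with open(pkl_files[0], "rb") as handle:
        return pickle.load(handle)


if __name__ == "__main__":
    classifier = load_classifier("mva/")
    print(type(classifier))
```

Only unpickle files you trust, since loading a pickle can execute arbitrary code.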

plots/

This directory contains plots showing the ROC curve (roc_all.pdf), the distribution of the features in signal and background (vars_all.pdf), the response of the classifier in signal and background for the test and training samples (response_all.pdf), and the correlation matrices for the features (corr_*.pdf).

root/

This directory contains ROOT files with TH1 histograms of the classifier response for every TTree in the input files. These are formatted so that they can be passed to the Higgs Analysis Combined Limit tool (or THETA, if specified).
