GitHub - bourcierj/higgsml: A solution to the Higgs boson machine learning challenge

#masterdac2019 #reds #projet_higgsml

This repository contains our solution to the Higgs Boson machine learning challenge held on Kaggle in 2014.

The dataset consists of simulated events, each described by features characterizing particle decays detected by the ATLAS experiment. The task is to classify each event as either signal ("tau tau decay of a Higgs boson") or "background". The ultimate goal is to improve the discovery significance of the experiment. The evaluation metric is the approximate median significance (AMS), an estimate of that significance computed from the weighted counts of true positives (signal) and false positives (background) among the events selected as signal.
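For reference, the AMS can be sketched as follows. This is a minimal illustration of the formula as defined for the challenge, with the regularization term fixed at 10; the function and argument names are ours for illustration and are not taken from `HiggsBosonCompetition_AMSMetric_rev1.py`.

```python
import math


def ams(s, b, b_reg=10.0):
    """Approximate median significance.

    s: sum of weights of selected events that are truly signal.
    b: sum of weights of selected events that are truly background.
    b_reg: regularization term (10 in the challenge definition).
    """
    if s < 0 or b < 0:
        raise ValueError("weighted counts must be non-negative")
    return math.sqrt(2.0 * ((s + b + b_reg) * math.log(1.0 + s / (b + b_reg)) - s))


def ams_from_predictions(y_true, y_pred, weights):
    """Compute the AMS from binary labels (1 = signal, 0 = background)."""
    s = sum(w for t, p, w in zip(y_true, y_pred, weights) if p == 1 and t == 1)
    b = sum(w for t, p, w in zip(y_true, y_pred, weights) if p == 1 and t == 0)
    return ams(s, b)
```

Only events predicted as signal contribute to the score, so the AMS rewards a selection that is both pure and large enough in weighted signal.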

We used tree-based boosting and bagging techniques, specifically XGBoost and the random forest algorithm.

We ran a hyperparameter search using the tree-structured Parzen estimator (TPE) algorithm with cross-validation. In addition, the threshold used to convert predicted probabilities into one of the two classes was tuned as a hyperparameter to maximize the AMS.
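The threshold tuning can be illustrated with a small sketch. It is not the TPE search itself and not the code of this repository: it simply sweeps a grid of candidate thresholds on each validation fold, keeps the one that maximizes the AMS, and summarizes the folds by mean and variance. All names are hypothetical, the population variance is an assumption, and the fold weights are assumed to be already rescaled to the totals of the full dataset.

```python
import math
from statistics import mean, pvariance


def ams(s, b, b_reg=10.0):
    """Approximate median significance with regularization term b_reg."""
    return math.sqrt(2.0 * ((s + b + b_reg) * math.log(1.0 + s / (b + b_reg)) - s))


def best_threshold(probs, labels, weights, thresholds):
    """Return (best_ams, best_threshold) for one validation fold.

    An event is selected as signal when its predicted probability is
    greater than or equal to the threshold. On ties, the first threshold
    in the given order is kept.
    """
    best_score, best_thr = -1.0, None
    for thr in thresholds:
        s = sum(w for p, y, w in zip(probs, labels, weights) if p >= thr and y == 1)
        b = sum(w for p, y, w in zip(probs, labels, weights) if p >= thr and y == 0)
        score = ams(s, b)
        if score > best_score:
            best_score, best_thr = score, thr
    return best_score, best_thr


def summarize_folds(folds, thresholds):
    """Aggregate per-fold optima; each fold is a (probs, labels, weights) tuple."""
    results = [best_threshold(p, y, w, thresholds) for p, y, w in folds]
    scores = [r[0] for r in results]
    thrs = [r[1] for r in results]
    return {
        "mean_ams": mean(scores),
        "var_ams": pvariance(scores),  # assumption: population variance
        "mean_threshold": mean(thrs),
        "var_threshold": pvariance(thrs),
    }
```

In a search loop, a quantity such as `mean_ams` would serve as the objective for each trial, with the threshold grid replaced or complemented by a threshold sampled by the optimizer.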

The CERN data can be downloaded at this link.

Results

| Cross-validation best results | XGBoost | Random forest |
| --- | --- | --- |
| Mean AMS | 3.6011 | 3.5434 |
| Variance AMS | 0.0023 | 0.0105 |
| Mean threshold | 0.8541 | 0.8204 |
| Variance threshold | 0.0024 | 0.0009 |
| Best trial | 92 | 43 |

| AMS scores by dataset | XGBoost | Random forest |
| --- | --- | --- |
| Train | 4.1752 | 4.8762 |
| Test (private leaderboard) | 3.4904 | 3.5274 |
| Public leaderboard | 3.3889 | 3.4102 |

Fig. 1: XGBoost AMS scores vs threshold for each of the cross-validation folds used at the best trial (the trial with the highest mean, across folds, of each curve's maximum AMS).

Fig. 2: Random forest AMS scores vs threshold for each of the cross-validation folds used at the best trial.

Repository files (21 commits):

figures
.gitignore
HiggsBosonCompetition_AMSMetric_rev1.py
README.md
adaboost_hyperopt.py
adaboost_inference.py
baseline.py
data_exploration.html
data_exploration.ipynb
extratrees_hyperopt.py
extratrees_inference.py
extratrees_submission.py
plot_utils.py
randomforest_hyperopt.py
randomforest_inference.py
randomforest_submission.py
requirements.txt
utils.py
xgboost_default_hparams.py
xgboost_hyperopt.py
xgboost_inference.py
xgboost_submission.py
