Hackaton Kaggle competition of Statistical Learning course, University La Sapienza, Rome (IT).
The competitons lasted 48 hours and saw 15 teams of students from Scienze Statistiche, Data Science and Statistical Methods for Data science degree courses compete against each other.
Briefly speaking the task assigned during this hackathon was to correctly classify three alert levels (target variable) in the high-frequency work of compressors located inside the particle accelerator (LHC) at CERN,Geneva. The dataset shows a very strong imbalance in the target alert level 3, which is the most important (and difficult) to classify. The data are presented in the form of time series, informed by time period of detection of the compressors and the location in which it is located within the accelerator.
Our team developed, in the light of the data handled an adequate and informative feature extraction by transforming the time series with a Fourier transform. The magnitude peaks extracted from the transformations will represent our most informative features, taking into account especially those detected at low compressor frequencies, as well as one versus all correlations and autocorrelations. and autocorrelations.
![]() |
![]() |
![]() |
After over and undersampling we implement a logistic regression with selected best parameters in cross validation. Note that we implelemented a model agnostic permutations feature importance in order to filter the most importart variables and reduce the dimensionality of our data.
The strong imbalance of input data made this hackathon a very compelling challenge. We can state that the low frequencies of the power spectrum extracted are very informative for the purpose of alert detection, however in order to improve our models we could try to eliminate the oversampling procedure and work better on the logistics regression weights. Please notice that this work has been done in 48 hours of hackaton.