A machine learning model that predicts who survived the Titanic disaster

Cucchi01/titanic-kaggle

Titanic - Machine Learning from Disaster - Kaggle competition

This project is an entry for the Kaggle competition built on passenger data from the Titanic disaster. The aim is to predict, from the information available about each passenger, who survived. The passenger data is split into two sets. The data folder contains files provided by the competition and files generated by the code in main.py. Some important files are:

  • train.csv contains the data for most of the passengers, including whether they survived
  • test.csv contains the data for the remaining passengers, whose survival we have to predict
  • gender_submission.csv is a sample submission that predicts the outcome using only the sex attribute
  • results.csv is a possible solution generated by the code
  • main.py contains the code I used to generate the predictions
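
To make the gender_submission.csv baseline concrete, here is a minimal sketch of a predictor that uses only the sex attribute (female passengers are predicted to survive, male passengers are not). The sample rows and the function name gender_baseline are hypothetical and only illustrate the layout; the real file is provided by the competition.

```python
import csv
import io

# Hypothetical sample rows in the layout of test.csv (only the columns used here).
SAMPLE_TEST_CSV = """PassengerId,Sex
892,male
893,female
894,male
"""


def gender_baseline(test_csv_text):
    """Predict survival from the Sex column alone: female -> 1, male -> 0."""
    reader = csv.DictReader(io.StringIO(test_csv_text))
    return [
        {
            "PassengerId": int(row["PassengerId"]),
            "Survived": int(row["Sex"] == "female"),
        }
        for row in reader
    ]


predictions = gender_baseline(SAMPLE_TEST_CSV)
for row in predictions:
    print(row["PassengerId"], row["Survived"])
```

Any trained model should score at least as well as this baseline to be worth keeping.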

Results

I achieved a score of 0.77990 out of 1.000 using model.pth, which means the model correctly predicts the outcome of that tragic day for 77.99% of the scored passengers, based on the information provided.

Be wary of the top results on the Leaderboard that achieved a perfect score. They used information available online to create the predictions manually.

Technology

  • Python version: 3.10.0
  • PyTorch version: 1.12.0+cpu
  • Numpy version: 1.23.1
  • Pandas version: 1.4.3
  • Matplotlib version: 3.5.2

Program logic

The code is structured so that it can be read from top to bottom, starting from main(). The first step is to retrieve the data from the files and generate the Tensors that are used for training and for making the predictions. After the files are read, the data is cleaned and the "Name", "Ticket", "Fare", and "Cabin" features are dropped. [1] The categorical data is handled with one-hot encoding, and the generated Tensors are normalized. The Tensors are then saved to file, so that these steps can be skipped in later runs.

The creation of the model works in a similar way. If the model has already been created, the code retrieves it from file and uses it to generate the output. Otherwise, the model is created and trained in trainingLoop(...). I added an option to force the creation of a new model, which overwrites the old version on file. The number of epochs and the learning rate (LR) can be changed using the constants at the top of the file.
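
The data preparation described above could be sketched roughly as follows. This is not the code from main.py: the function names, the choice of "Sex" and "Embarked" as the categorical columns, the per-column standardization, and the sample rows are all assumptions made for illustration, and the sample deliberately contains no missing values.

```python
import os
import tempfile

import pandas as pd
import torch

DROPPED_COLUMNS = ["Name", "Ticket", "Fare", "Cabin"]  # dropped per the README
CATEGORICAL_COLUMNS = ["Sex", "Embarked"]  # assumption: the one-hot encoded columns


def build_features(df):
    """Drop unused columns, one-hot encode categoricals, standardize each column."""
    df = df.drop(columns=DROPPED_COLUMNS)
    df = pd.get_dummies(df, columns=CATEGORICAL_COLUMNS, dtype=float)
    x = torch.tensor(df.to_numpy(dtype="float32"))
    mean = x.mean(dim=0, keepdim=True)
    # Guard against division by zero for constant columns.
    std = x.std(dim=0, keepdim=True).clamp_min(1e-8)
    return (x - mean) / std


def load_or_build(path, df):
    """Reuse the cached Tensor if it exists, otherwise build and save it."""
    if os.path.exists(path):
        return torch.load(path)
    features = build_features(df)
    torch.save(features, path)
    return features


# Hypothetical sample rows, not taken from train.csv.
sample = pd.DataFrame(
    {
        "Pclass": [3, 1, 3, 2],
        "Name": ["A", "B", "C", "D"],
        "Sex": ["male", "female", "female", "male"],
        "Age": [22.0, 38.0, 26.0, 35.0],
        "Ticket": ["T1", "T2", "T3", "T4"],
        "Fare": [7.25, 71.28, 7.92, 13.0],
        "Cabin": [None, "C85", None, None],
        "Embarked": ["S", "C", "S", "Q"],
    }
)

with tempfile.TemporaryDirectory() as tmp:
    cache_path = os.path.join(tmp, "train_features.pt")
    first = load_or_build(cache_path, sample)   # builds and saves the Tensor
    second = load_or_build(cache_path, sample)  # loads the Tensor from the cache
```

Caching the Tensor this way trades a little disk space for faster repeated runs. The cache has to be deleted whenever the preprocessing changes.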

The structure of the model is defined in the only class in the file, which comes right after the constants.
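
As a rough illustration of this layout (constants, then a single model class, then a training loop with load-or-train logic), here is a minimal PyTorch sketch. The class name TitanicNet, the layer sizes, the optimizer, the loss, the constant values, and the toy data are assumptions, not the contents of main.py. The README's trainingLoop(...) appears here as the hypothetical training_loop.

```python
import os
import tempfile

import torch
from torch import nn

EPOCHS = 200            # hypothetical value
LEARNING_RATE = 0.05    # hypothetical value
FORCE_NEW_MODEL = False  # set to True to retrain and overwrite the saved model


class TitanicNet(nn.Module):
    """Hypothetical small feed-forward network that outputs one logit."""

    def __init__(self, n_features):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(n_features, 8),
            nn.ReLU(),
            nn.Linear(8, 1),
        )

    def forward(self, x):
        return self.layers(x)


def training_loop(model, x, y, epochs=EPOCHS, lr=LEARNING_RATE):
    """Full-batch gradient descent; returns the loss recorded at each epoch."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    losses = []
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses


def load_or_train(path, x, y, force_new=FORCE_NEW_MODEL):
    """Load saved weights if present, otherwise train a new model and save it."""
    model = TitanicNet(x.shape[1])
    if os.path.exists(path) and not force_new:
        model.load_state_dict(torch.load(path))
        return model
    training_loop(model, x, y)
    torch.save(model.state_dict(), path)
    return model


# Toy data standing in for the normalized feature Tensors.
torch.manual_seed(0)
x = torch.randn(32, 4)
y = (x[:, :1] > 0).float()

with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "model.pth")
    trained = load_or_train(model_path, x, y)   # trains and saves
    reloaded = load_or_train(model_path, x, y)  # loads the saved weights
```

Saving only the state dict keeps the file independent of how the class is pickled, but the class definition must match when the weights are loaded.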

Footnotes

  1. I created the branch "consideringTitleFare" to explore whether the title in the "Name" feature and the "Fare" feature could help with the prediction. For the moment, the model that uses this information is less effective.
