Titanic - Machine Learning from Disaster - Kaggle competition

This project tackles the Kaggle competition built on the data available from the Titanic disaster. The aim is to predict, from the information about each passenger, who survived. The passenger data is split into two sets. The data folder contains some files provided by the competition and some generated by the code in main.py. The most important files are:

train.csv contains the data for most of the passengers, including whether they survived
test.csv contains the data for the remaining passengers, whose survival has to be predicted
gender_submission.csv is an example submission that predicts the outcome using only the sex attribute
results.csv is the submission generated by the code
main.py contains the code I used to generate the predictions
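
As an illustration of what a sex-only submission like gender_submission.csv looks like, here is a minimal sketch. It assumes the usual rule for this baseline (every female passenger is predicted to survive, every male passenger is not) and uses a tiny made-up DataFrame in place of the real test.csv.

```python
import pandas as pd

# Tiny stand-in for test.csv; the real file has more rows and columns.
test = pd.DataFrame(
    {
        "PassengerId": [892, 893, 894],
        "Sex": ["male", "female", "male"],
    }
)

# Baseline rule (assumed): females are predicted to survive, males are not.
submission = pd.DataFrame(
    {
        "PassengerId": test["PassengerId"],
        "Survived": (test["Sex"] == "female").astype(int),
    }
)

print(submission)
```

A file in the submission format can then be written with `submission.to_csv(..., index=False)`.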

Results

I achieved a score of 0.77990 out of 1.000 using model.pth, which means the model correctly predicts the outcome of that tragic day for 77.99% of the scored passengers, based on the information provided.
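
The score is an accuracy: the fraction of passengers whose outcome was predicted correctly. A minimal sketch of that calculation follows; the function name and the sample values are illustrative only and are not taken from main.py.

```python
def accuracy(predicted, actual):
    """Return the fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Made-up labels: three of the four predictions match.
predicted = [1, 0, 0, 1]
actual = [1, 0, 1, 1]
score = accuracy(predicted, actual)
print(score)
```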

Note that the top results on the leaderboard achieve a perfect score. Those entries used information available online to create the predictions manually.

Technology

Python version: 3.10.0
PyTorch version: 1.12.0+cpu
Numpy version: 1.23.1
Pandas version: 1.4.3
Matplotlib version: 3.5.2

Program logic

The code is structured so that it can be read from top to bottom, starting from main(). The first step is to read the data from the files and generate the tensors used for training and prediction. After the files are read, the data is cleaned and the "Name", "Ticket", "Fare", and "Cabin" features are dropped. ¹ Categorical data is handled with one-hot encoding, and the resulting tensors are normalized. The tensors are then saved to file so that later runs can skip these steps.

Model creation works similarly. If a model has already been saved, the code loads it from file and uses it to generate the output. Otherwise, a new model is created and trained in trainingLoop(...). There is also an option to force the creation of a new model, overwriting the version on file. The number of epochs and the learning rate (LR) can be changed through the constants at the top of the file.
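
The preprocessing steps described above can be sketched roughly as follows. This is not the code from main.py: the function name `preprocess`, the median fill for missing ages, dropping "PassengerId", and the min-max normalization are all assumptions made for illustration, and the sample rows are a small stand-in for train.csv.

```python
import numpy as np
import pandas as pd

# Small stand-in for train.csv using the standard Kaggle Titanic columns.
raw = pd.DataFrame(
    {
        "PassengerId": [1, 2, 3, 4],
        "Survived": [0, 1, 1, 0],
        "Pclass": [3, 1, 3, 2],
        "Name": ["Braund, Mr. Owen", "Cumings, Mrs. John",
                 "Heikkinen, Miss. Laina", "Doe, Mr. John"],
        "Sex": ["male", "female", "female", "male"],
        "Age": [22.0, 38.0, 26.0, np.nan],
        "SibSp": [1, 1, 0, 0],
        "Parch": [0, 0, 0, 0],
        "Ticket": ["A/5 21171", "PC 17599", "STON/O2. 3101282", "1234"],
        "Fare": [7.25, 71.2833, 7.925, 13.0],
        "Cabin": [None, "C85", None, None],
        "Embarked": ["S", "C", "S", "S"],
    }
)


def preprocess(df):
    """Hypothetical helper: drop unused features, one-hot encode, normalize."""
    # Drop the features named in the README, plus the identifier and the
    # label (dropping "PassengerId" here is an assumption).
    features = df.drop(
        columns=["Name", "Ticket", "Fare", "Cabin", "PassengerId", "Survived"]
    )
    # Assumption: fill missing ages with the median age.
    features["Age"] = features["Age"].fillna(features["Age"].median())
    # One-hot encode the categorical columns.
    features = pd.get_dummies(features, columns=["Sex", "Embarked"], dtype=float)
    # Assumption: min-max normalization of every column to [0, 1].
    values = features.to_numpy(dtype=np.float32)
    low = values.min(axis=0)
    span = values.max(axis=0) - low
    span[span == 0] = 1.0  # avoid dividing by zero for constant columns
    return (values - low) / span, list(features.columns)


x, columns = preprocess(raw)
y = raw["Survived"].to_numpy(dtype=np.float32)
print(columns)
print(x.shape)
```

The resulting arrays could be turned into tensors with `torch.from_numpy` and cached with `torch.save`, which would match the "save the tensors to file" step, although the exact calls used in main.py are not shown here.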

The structure of the model is defined in the only class in the file, which appears right after the constants.
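
To make the overall shape concrete, here is a minimal sketch of constants, a single model class, a training loop, and the load-or-train behaviour. Everything in it is assumed for illustration: the class name `TitanicNet`, the layer sizes, the constant values, the loss and optimizer, and the choice to store a state dict in model.pth are not taken from main.py, and random data stands in for the real tensors.

```python
import os
import tempfile

import torch
from torch import nn

# Hypothetical constants; the real values live at the top of main.py.
EPOCHS = 50
LR = 0.05
FORCE_NEW_MODEL = False


class TitanicNet(nn.Module):
    """Hypothetical small feed-forward binary classifier."""

    def __init__(self, n_features):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(n_features, 16),
            nn.ReLU(),
            nn.Linear(16, 1),
        )

    def forward(self, x):
        return self.layers(x)  # raw logits, one per passenger


def training_loop(model, x, y, epochs=EPOCHS, lr=LR):
    """Full-batch gradient descent; returns the last loss value."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    last_loss = float("nan")
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
        last_loss = loss.item()
    return last_loss


def load_or_train(path, x, y, force_new=FORCE_NEW_MODEL):
    """Reuse the saved weights if present, otherwise train and save."""
    model = TitanicNet(x.shape[1])
    if os.path.exists(path) and not force_new:
        model.load_state_dict(torch.load(path))
    else:
        training_loop(model, x, y)
        torch.save(model.state_dict(), path)
    return model


# Random stand-in data: 32 passengers with 8 normalized features each.
torch.manual_seed(0)
x = torch.rand(32, 8)
y = (x[:, 0] > 0.5).float().unsqueeze(1)

path = os.path.join(tempfile.mkdtemp(), "model.pth")
first = load_or_train(path, x, y)   # trains and saves
second = load_or_train(path, x, y)  # loads the saved weights
```

Calling `load_or_train(path, x, y, force_new=True)` would retrain and overwrite the saved file, mirroring the "force a new model" option.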

¹ I created the branch "consideringTitleFare" to explore whether the title in the "Name" feature and the "Fare" information could help with the prediction. For the moment, the model that considers this information performs worse.
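
For reference, names in this dataset follow the pattern "Surname, Title. Given names", so a title can be pulled out with a regular expression. The sketch below shows one common way to do it; it is not necessarily how the "consideringTitleFare" branch does it.

```python
import pandas as pd

# Sample names in the dataset's "Surname, Title. Given names" format.
names = pd.Series(
    [
        "Braund, Mr. Owen Harris",
        "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
        "Heikkinen, Miss. Laina",
    ]
)

# Capture the text between the comma and the first period.
titles = names.str.extract(r",\s*([^.]+)\.", expand=False).str.strip()
print(titles.tolist())
```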

Cucchi01/titanic-kaggle

Folders and files

Latest commit

History

Repository files navigation

Titanic - Machine Learning from Disaster- Kaggle competition

Results

Technology

Program logic

Footnotes

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages