In this notebook, the goal is to correctly predict whether someone survived the Titanic shipwreck, using different machine learning models and hyperparameter tuning.
- Understand the shape of the data (histograms, box plots, etc.)
- Data Cleaning
- Data Exploration
- Feature Engineering
- Data Preprocessing for Model
- Basic Model Building
- Model Tuning
- Ensemble Model Building
- Results
Competition sites like Kaggle define the problem to solve or the questions to ask, and provide datasets for training your data science model and for testing its results against a test dataset. The problem definition for the Titanic Survival competition is described here at Kaggle.
Given a training set of passengers labeled as having survived or not survived the Titanic disaster, can our model determine whether the passengers in a test dataset, which does not contain the survival information, survived?
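To make the train/test framing concrete, here is a minimal sketch of a baseline that learns the majority outcome per group from labeled rows and applies it to unlabeled rows. The rows are toy values invented for illustration, not real passenger records, and the helper name `fit_majority_by_group` is hypothetical. The column names `PassengerId`, `Sex`, and `Survived` follow the Kaggle dataset.

```python
from collections import Counter, defaultdict

# Toy rows for illustration only; these are NOT real passenger records.
train_rows = [
    {"PassengerId": 1, "Sex": "male", "Survived": 0},
    {"PassengerId": 2, "Sex": "female", "Survived": 1},
    {"PassengerId": 3, "Sex": "female", "Survived": 1},
    {"PassengerId": 4, "Sex": "male", "Survived": 0},
    {"PassengerId": 5, "Sex": "male", "Survived": 1},
]
# The test rows carry no "Survived" field; that is what we must predict.
test_rows = [
    {"PassengerId": 6, "Sex": "female"},
    {"PassengerId": 7, "Sex": "male"},
]


def fit_majority_by_group(rows, feature, label):
    """Return {feature value: most common label} learned from labeled rows."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[feature]][row[label]] += 1
    return {group: c.most_common(1)[0][0] for group, c in counts.items()}


# "Train" on the labeled rows, then predict for the unlabeled rows.
rule = fit_majority_by_group(train_rows, "Sex", "Survived")
predictions = [
    {"PassengerId": row["PassengerId"], "Survived": rule[row["Sex"]]}
    for row in test_rows
]
print(predictions)
```

A baseline like this is only a reference point; the models in the notebook are expected to do better than a single-feature rule.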
We may also want to develop some early understanding about the domain of our problem. This is described on the Kaggle competition description page here. Here are the highlights to note.
On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg, killing 1502 out of 2224 passengers and crew. That translates to a survival rate of about 32%. One of the reasons the shipwreck led to such loss of life was that there were not enough lifeboats for the passengers and crew. Although there was some element of luck involved in surviving the sinking, some groups of people were more likely to survive than others, such as women, children, and the upper class.
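The survival rate follows directly from the two figures quoted above. A quick check, with variable names chosen here for illustration:

```python
# Figures quoted in the competition description.
total_on_board = 2224
deaths = 1502

survivors = total_on_board - deaths
survival_rate = survivors / total_on_board

print(f"{survivors} survivors, {survival_rate:.1%} survival rate")
# prints "722 survivors, 32.5% survival rate"
```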
The competition solution workflow goes through seven stages described in the Data Science Solutions book.
- Question or problem definition.
- Acquire training and testing data.
- Wrangle, prepare, cleanse the data.
- Analyze, identify patterns, and explore the data.
- Model, predict and solve the problem.
- Visualize, report, and present the problem solving steps and final solution.
- Supply or submit the results.
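The wrangle, model, and submit stages above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it assumes scikit-learn (which this README does not name), uses synthetic stand-in data rather than the Kaggle files, and the `prepare` helper and the parameter grid are hypothetical choices.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the training data (NOT the real Kaggle data).
rng = np.random.default_rng(0)
n = 200
train = pd.DataFrame({
    "PassengerId": np.arange(1, n + 1),
    "Pclass": rng.integers(1, 4, size=n),
    "Sex": rng.choice(["male", "female"], size=n),
    "Age": rng.uniform(1, 70, size=n),
})
# Make the label loosely depend on Sex so there is something to learn.
train["Survived"] = ((train["Sex"] == "female") ^ (rng.random(n) < 0.2)).astype(int)
# Blank out a few ages to mimic missing values.
train.loc[train.index[::10], "Age"] = np.nan


def prepare(df):
    """Wrangle: encode Sex as 0/1 and fill missing Age with the median."""
    return pd.DataFrame({
        "Pclass": df["Pclass"],
        "IsFemale": (df["Sex"] == "female").astype(int),
        "Age": df["Age"].fillna(df["Age"].median()),
    })


X, y = prepare(train), train["Survived"]

# Model tuning: cross-validated grid search over a small, arbitrary grid.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, 5]},
    cv=3,
    scoring="accuracy",
)
search.fit(X, y)

# Ensemble: soft voting over the tuned forest and a logistic regression.
ensemble = VotingClassifier(
    estimators=[
        ("rf", search.best_estimator_),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    voting="soft",
)
ensemble.fit(X, y)

# Submission-style table: one predicted label per PassengerId.
# Here we predict on the training rows only to show the shape of the output.
submission = pd.DataFrame({
    "PassengerId": train["PassengerId"],
    "Survived": ensemble.predict(X),
})
print(search.best_params_)
print(submission.head())
```

In a real run, `prepare` would be applied to the Kaggle train and test files, and the predictions would be made on the test set rather than on the rows used for fitting.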
Libraries used: Pandas | NumPy | Matplotlib | Seaborn
Kaggle Project Link: Titanic Survival Prediction 🛳️ 🔗
Kaggle Titanic Datasets: Titanic Train & Titanic Test
- Spotify Data Analysis using Python 📊
- Statistics for Data Science using Python 📊
- Sales Insights - Data Analysis using Tableau & SQL 📊
- Kaggle - Pandas Solved Exercises 📊
- Python Libraries for Data Science 🗂️