The data is obtained from https://www.kaggle.com/c/titanic. The aim of the task was to use machine learning to create a model that predicts which passengers survived the Titanic shipwreck.
My overall goal with this Kaggle dataset was to learn how competitions work and how submissions are assessed, in preparation for future competitions.
I will explore the data and apply pre-processing techniques to make the dataset suitable for the model fitting that follows.
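The specific pre-processing steps live in the full report. As a rough illustration only, a pandas sketch of common Titanic clean-up steps might look like the following. The choices here are assumptions, not necessarily what the report does: median imputation for `Age`, most-frequent-port imputation for `Embarked`, 0/1 encoding of `Sex`, one-hot encoding of `Embarked`, and dropping `Name`, `Ticket` and `Cabin`. The helper name `preprocess` is hypothetical.

```python
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Return a model-ready copy of a Titanic-style DataFrame (illustrative sketch)."""
    out = df.copy()
    # Assumption: fill missing ages with the median age
    out["Age"] = out["Age"].fillna(out["Age"].median())
    # Assumption: fill missing embarkation ports with the most frequent port
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode()[0])
    # Encode Sex numerically so tree-based models can split on it
    out["Sex"] = out["Sex"].map({"male": 0, "female": 1})
    # One-hot encode the port of embarkation (C, Q, S in the Kaggle data)
    out = pd.get_dummies(out, columns=["Embarked"], prefix="Embarked", dtype=int)
    # Assumption: drop free-text / mostly-missing columns
    out = out.drop(columns=["Name", "Ticket", "Cabin"], errors="ignore")
    return out


# Tiny hand-made frame using the Kaggle column names (not real passenger data)
raw = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Survived": [0, 1, 1, 0],
    "Pclass": [3, 1, 3, 2],
    "Name": ["A", "B", "C", "D"],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, None, 26.0, 35.0],
    "Ticket": ["t1", "t2", "t3", "t4"],
    "Cabin": [None, "C85", None, None],
    "Embarked": ["S", "C", None, "S"],
})
clean = preprocess(raw)
print(clean)
```

In a real run the same function would be applied to the training and test CSVs, ideally computing the imputation values from the training set only.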
A quick graphical analysis of relevant attributes against the response variable (Survived) is also performed to identify, by inspection, the key variables that affect the probability of survival.
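Graphs of this kind usually plot the survival rate within each level of an attribute. A minimal pandas sketch of the underlying numbers, assuming a DataFrame with a 0/1 `Survived` column (the helper `survival_rate_by` and the toy data are hypothetical):

```python
import pandas as pd


def survival_rate_by(df: pd.DataFrame, column: str) -> pd.Series:
    """Proportion of passengers who survived within each level of `column`."""
    return df.groupby(column)["Survived"].mean().sort_index()


# Toy data for illustration only, not the Kaggle training set
toy = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "male", "female"],
    "Pclass": [3, 1, 3, 1, 3, 2],
    "Survived": [0, 1, 1, 1, 0, 0],
})

for col in ["Sex", "Pclass"]:
    print(survival_rate_by(toy, col))
```

Each returned Series can be passed to pandas' `.plot.bar()` (which requires matplotlib) to get the bar charts typically used for this kind of inspection.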
Classification models are implemented using each of the following techniques, which are also used to determine feature importance within the dataset:
- Decision Tree
- Bagging
- Boosting
- Random Forest
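The four techniques can be compared side by side. Below is a minimal scikit-learn sketch, assuming a Python workflow (the report's own implementation may differ). It uses `GradientBoostingClassifier` as the boosting method and 5-fold cross-validated accuracy; both choices are assumptions. The data comes from `make_classification` with placeholder feature names `f0`..`f4` rather than the Titanic features. `BaggingClassifier` has no `feature_importances_` attribute, so its importances are averaged over the fitted trees. The helper `fit_and_rank` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier


def fit_and_rank(X, y, feature_names, seed=0):
    """Cross-validate four tree-based models and rank features by importance."""
    models = {
        "Decision Tree": DecisionTreeClassifier(max_depth=4, random_state=seed),
        "Bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                                     random_state=seed),
        "Boosting": GradientBoostingClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(n_estimators=200, random_state=seed),
    }
    results = {}
    for name, model in models.items():
        # Mean accuracy over 5 folds (stratified for classifiers)
        acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
        model.fit(X, y)
        if name == "Bagging":
            # Default bagging gives every tree all features in the same order,
            # so per-tree importances can be averaged directly
            importances = np.mean(
                [tree.feature_importances_ for tree in model.estimators_], axis=0)
        else:
            importances = model.feature_importances_
        ranked = sorted(zip(feature_names, importances),
                        key=lambda pair: pair[1], reverse=True)
        results[name] = (acc, ranked)
    return results


# Synthetic stand-in for the pre-processed Titanic features
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)
names = [f"f{i}" for i in range(X.shape[1])]
results = fit_and_rank(X, y, names)
for name, (acc, ranked) in results.items():
    print(f"{name}: CV accuracy {acc:.3f}, top feature {ranked[0][0]}")
```

A single shallow tree is easy to read but has high variance. Bagging and Random Forest reduce that variance by averaging many trees, with Random Forest also sampling features at each split, and boosting fits trees sequentially to the previous errors. This is why the ensembles usually score higher than the single tree.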
The full report can be found on my GitHub: https://github.com/TingHanGan/titanic-kaggle.
Of the four classification models, Random Forest performed best, with a Kaggle score (accuracy, i.e. the proportion of passengers predicted correctly) of 0.78468.
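A score like this is obtained by uploading a CSV of predictions with a `PassengerId` column and a 0/1 `Survived` column for each passenger in the test set. A minimal pandas sketch of writing such a file follows. The helper `make_submission` and the IDs and predictions are illustrative, not taken from the report.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def make_submission(passenger_ids, predictions, path):
    """Write a Kaggle-style Titanic submission CSV and return it as a DataFrame."""
    sub = pd.DataFrame({
        "PassengerId": passenger_ids,
        # Kaggle expects integer 0/1 labels, not probabilities or booleans
        "Survived": np.asarray(predictions).astype(int),
    })
    sub.to_csv(path, index=False)  # no index column in the uploaded file
    return sub


# Illustrative IDs and predictions only
with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "submission.csv")
    sub = make_submission([892, 893, 894], [0, 1, 0], out_path)
    with open(out_path) as fh:
        header = fh.readline().strip()

print(header)  # PassengerId,Survived
```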