- Name: Tapiwanashe Emmanuel Matare
- Email Address: [email protected]
- Date: 4 December 2024
- Model Version: 1.0
- License: MIT License
Copyright (c) 2021 [email protected]
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
- Model Implementation Code: Link to Colab
- Intended Uses: This model is intended for predicting survival on the Titanic based on passenger characteristics.
- Intended Users: Data scientists, researchers, and educators interested in machine learning applications.
- Out-of-Scope Uses: This model should not be used for real-time decision-making in critical situations.
- Source of Training Data: The Titanic dataset from Kaggle.
- Training Data Division: The training data was divided into 70% training and 30% validation.
- Number of Rows:
- Training Data: 623 rows
- Validation Data: 134 rows
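The card lists the split and row counts but not the code that produced them. Below is a minimal sketch of how a 70/30 train/validation split can be made with scikit-learn; the file name `train.csv`, the random seed, and the use of stratification are illustrative assumptions, not details taken from the original notebook.

```python
# Illustrative 70/30 train/validation split (not the original notebook code).
import pandas as pd
from sklearn.model_selection import train_test_split

titanic = pd.read_csv("train.csv")  # assumed path to the Kaggle Titanic training file

train_df, valid_df = train_test_split(
    titanic,
    test_size=0.30,                 # 30% held out for validation
    random_state=42,                # illustrative seed; the original seed is not documented
    stratify=titanic["Survived"],   # assumed: keep the survival ratio similar in both splits
)
print(len(train_df), len(valid_df))
```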
- Data Dictionary:

| Column Name | Modeling Role | Measurement Level | Description |
|---|---|---|---|
| Pclass | Input | Nominal | Passenger class (1st, 2nd, or 3rd) |
| Sex | Input | Nominal | Gender of the passenger |
| Age | Input | Continuous | Age of the passenger |
| SibSp | Input | Discrete | Number of siblings/spouses aboard |
| Parch | Input | Discrete | Number of parents/children aboard |
| Fare | Input | Continuous | Fare paid by the passenger |
| Embarked | Input | Nominal | Port of embarkation (C = Cherbourg; Q = Queenstown; S = Southampton) |
| Survived | Target | Binary | Survival status (0 = No, 1 = Yes) |
- Exploratory Data Analysis (EDA): To understand the dataset in depth, I use several visualization techniques:
- Visualize distributions of key numerical features such as Age, number of siblings/spouses (SibSp), number of parents/children (Parch), and Fare. This helps me uncover trends and identify outliers.
- Explore relationships between different variables, revealing correlations that could impact survival predictions. (A sketch of these steps follows this list.)
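The sketch below shows one way these EDA plots could be produced with matplotlib and seaborn; the library choice and figure layout are assumptions, and `titanic` refers to the frame loaded in the split sketch above.

```python
# Illustrative EDA: distributions of key numerical features and a correlation heatmap.
import matplotlib.pyplot as plt
import seaborn as sns

numeric_cols = ["Age", "SibSp", "Parch", "Fare"]

# Histograms of the key numerical features
fig, axes = plt.subplots(1, len(numeric_cols), figsize=(16, 3))
for ax, col in zip(axes, numeric_cols):
    sns.histplot(titanic[col].dropna(), ax=ax)
    ax.set_title(col)
plt.tight_layout()
plt.show()

# Pairwise correlations between the numerical features and the target
sns.heatmap(titanic[numeric_cols + ["Survived"]].corr(), annot=True, cmap="coolwarm")
plt.show()
```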
- Data cleaning is a pivotal stage in my analysis. In this phase, I:
- Address missing values, choosing strategies such as imputation to fill gaps in the data.
- Drop columns that do not contribute to the analysis: "PassengerId", "Cabin", "Name", and "Ticket".
- Engage in feature engineering, creating new features or transforming existing ones to enrich the dataset and improve predictive performance. (A cleaning sketch follows this list.)
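A sketch of these cleaning steps is shown below. The specific imputation choices (median Age and Fare, modal Embarked) and the use of one-hot encoding for the categorical inputs are assumptions; the original notebook may handle them differently.

```python
# Illustrative cleaning / feature-engineering steps (assumed choices, not the original code).
def clean(df):
    df = df.drop(columns=["PassengerId", "Cabin", "Name", "Ticket"], errors="ignore")
    df["Age"] = df["Age"].fillna(df["Age"].median())                  # assumed: median imputation
    df["Fare"] = df["Fare"].fillna(df["Fare"].median())               # assumed: median imputation
    df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])  # assumed: modal port
    # One-hot encode the categorical inputs so the tree can consume them.
    # Note: in practice, align the dummy columns between splits (e.g., by reindexing).
    return pd.get_dummies(df, columns=["Sex", "Embarked"], drop_first=True)

train_clean = clean(train_df)
valid_clean = clean(valid_df)
```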
- Source of Test Data: The Titanic dataset from Kaggle.
- Number of Rows in Test Data: 134 rows
- Differences in Columns: The test data does not include the 'Survived' column.
- Input Columns: Pclass, Sex, Age, SibSp, Parch, Fare, Embarked
- Target Column: Survived
- Type of Model: Decision Tree Classifier
- Software Used: Python with scikit-learn
- Version of Software: scikit-learn version 1.5.2
- Hyperparameters:
- Max Depth: 5
- Min Samples Split: 2
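The snippet below shows how a tree with these hyperparameters would be fit in scikit-learn; `train_clean` comes from the cleaning sketch above, and the random seed is an assumption.

```python
# Fit the Decision Tree Classifier with the hyperparameters listed above.
from sklearn.tree import DecisionTreeClassifier

X_train = train_clean.drop(columns=["Survived"])
y_train = train_clean["Survived"]

model = DecisionTreeClassifier(
    max_depth=5,          # Max Depth from the hyperparameter list
    min_samples_split=2,  # Min Samples Split from the hyperparameter list
    random_state=42,      # assumed seed, not documented in this card
)
model.fit(X_train, y_train)
```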
- Metrics Used for Evaluation:
- AUC (Area Under the Curve)
- AIR (Accuracy Improvement Rate)
Below is a summary table showing the metrics for Train, Validation, and Test datasets:
| Metric | Train | Validation | Test |
|---|---|---|---|
| AUC | 0.895773 | 0.82433 | 0.819393 |
| Accuracy | N/A | N/A | 0.768657 |
| AIR | N/A | N/A | 0.768657 |
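For reference, the AUC and accuracy figures in the table can be computed as sketched below (shown for the validation split; the same calls apply to the other splits). The AIR calculation is not sketched because its computation is not documented in this card.

```python
# Illustrative computation of AUC and accuracy on the validation split.
from sklearn.metrics import roc_auc_score, accuracy_score

X_valid = valid_clean.drop(columns=["Survived"])
y_valid = valid_clean["Survived"]

valid_scores = model.predict_proba(X_valid)[:, 1]  # probability of Survived = 1
valid_preds = model.predict(X_valid)

print("Validation AUC:", roc_auc_score(y_valid, valid_scores))
print("Validation accuracy:", accuracy_score(y_valid, valid_preds))
```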
(Figure: model heatmap, available in the linked Colab notebook.)
This model is a Decision Tree Classifier trained to predict Survival Status:
- Survival = 0: No (Did not survive)
- Survival = 1: Yes (Survived)
A chart in the linked Colab notebook illustrates the model's performance as a function of tree depth, comparing Training AUC and Validation AUC; a sketch of how such a comparison can be generated follows.
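The sketch below reuses the variables from the earlier fitting and metrics sketches; the depth range and seed are assumptions.

```python
# Illustrative Training vs Validation AUC as a function of tree depth.
depths = range(1, 13)  # assumed range of depths to compare
train_auc, valid_auc = [], []
for d in depths:
    tree = DecisionTreeClassifier(max_depth=d, min_samples_split=2, random_state=42)
    tree.fit(X_train, y_train)
    train_auc.append(roc_auc_score(y_train, tree.predict_proba(X_train)[:, 1]))
    valid_auc.append(roc_auc_score(y_valid, tree.predict_proba(X_valid)[:, 1]))

plt.plot(depths, train_auc, label="Training AUC")
plt.plot(depths, valid_auc, label="Validation AUC")
plt.xlabel("Tree depth")
plt.ylabel("AUC")
plt.legend()
plt.show()
```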
- AUC on Test Data: 0.8194
- Accuracy on Test Data: 0.7687
- Confusion Matrix (Test Data):

| True \ Predicted | No (Survival = 0) | Yes (Survival = 1) |
|---|---|---|
| No (Survival = 0) | 69 | 18 |
| Yes (Survival = 1) | 13 | 34 |
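The matrix above can be produced as sketched below, assuming the labels for the held-out test rows are available (as the reported test metrics imply); the frame name `test_df` is an assumption.

```python
# Illustrative confusion matrix on the held-out test frame.
from sklearn.metrics import confusion_matrix

test_clean = clean(test_df)  # test_df: assumed held-out frame that still carries 'Survived'
X_test = test_clean.drop(columns=["Survived"])
y_test = test_clean["Survived"]

cm = confusion_matrix(y_test, model.predict(X_test))
print(cm)  # rows = true class (0, 1), columns = predicted class (0, 1)
```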
- Confusion Matrix:

| True \ Predicted | No (Survival = 0) | Yes (Survival = 1) |
|---|---|---|
| No (Survival = 0) | 65 | 7 |
| Yes (Survival = 1) | 12 | 3 |

- Accuracy: 0.7816
- Confusion Matrix:

| True \ Predicted | No (Survival = 0) | Yes (Survival = 1) |
|---|---|---|
| No (Survival = 0) | 4 | 11 |
| Yes (Survival = 1) | 1 | 31 |

- Accuracy: 0.7447
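The two matrices above appear to be per-subgroup breakdowns. The sketch below shows how such a breakdown could be computed by grouping on the Sex column, an assumption based on the male/female accuracy remarks later in this card; it reuses `test_df`, `X_test`, `y_test`, and `model` from the earlier sketches.

```python
# Illustrative per-subgroup confusion matrices and accuracies.
from sklearn.metrics import confusion_matrix, accuracy_score

for sex, idx in test_df.groupby("Sex").groups.items():
    sub_preds = model.predict(X_test.loc[idx])
    print(sex)
    print(confusion_matrix(y_test.loc[idx], sub_preds))
    print("Accuracy:", accuracy_score(y_test.loc[idx], sub_preds))
```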
- The model is designed to predict survival status, with Survival = 0 representing "Did not survive" and Survival = 1 representing "Survived."
- The overall accuracy on test data is 76.87%, with differences in accuracy observed between males and females.
- The visualization of tree depth vs. AUC highlights potential overfitting, as seen by the divergence between training and validation AUC as tree depth increases.
- Further tuning of the model might reduce overfitting.
- Consider additional stratified analysis by other variables to evaluate performance consistency across subgroups.
- Math or Software Problems:
- The model may produce biased predictions if trained on non-representative data.
- Real-world Risks:
- Misclassification could lead to incorrect assumptions about passenger safety.
- Math or Software Problems:
- Variability in model performance due to changes in input data quality.
- Real-world Risks:
- Decisions based on model predictions could affect public perception and safety measures.
- The model's performance may vary significantly between different demographic groups (e.g., gender, age).
- At a predictive accuracy of 76.87%, my model demonstrates its potential to forecast Titanic passenger survival effectively. This project not only illustrates the practical application of machine learning techniques on historical data but also provides insight into the factors that influenced survival during the Titanic disaster.