- Project Motivation
- Installation
- About the Data
- About the Web App
- Licensing, Authors & Acknowledgements
Climate change has increased the number and severity of natural disasters all over the world. The shorter the response time of emergency services, the greater the number of lives that can be saved. This app uses a machine learning model trained on over 25,000 natural disaster-related messages.
Disaster relief workers can use this app to input a message, and it will return the message's classifications, such as "aid-related", "medical help", or "water". With better identification of messages, we hope that emergency services can allocate resources and assistance more effectively.
See the live dashboard here
You can replicate this dashboard locally by following these instructions.
The code requires Python version 3.6 or above.
- Start by cloning the repository to your local machine
- The additional libraries required to execute the code can be installed by running
pip install -r requirements.txt
- Go into the app/ directory and run
python run.py
to run the web app locally
- Open http://0.0.0.0:3001/ in your browser to view the web app
- Dataset: disaster_messages.csv and disaster_categories.csv contain the messages and their corresponding classifications
- process_data.py: This script cleans and merges the two datasets, then saves the result into an SQLite database. In this project, the database is named DisasterResponse.db. The whole process is executed by running
python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
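The cleaning and merging step can be sketched roughly as follows. This is a minimal illustration with pandas and SQLAlchemy, assuming the two CSVs share an `id` column and that `categories` is a semicolon-separated string like `related-1;request-0;...`; the actual process_data.py may differ in details such as column and table names:

```python
import pandas as pd
from sqlalchemy import create_engine

def process(messages_path, categories_path, db_path):
    # Load the two CSVs and merge them on the shared "id" column
    messages = pd.read_csv(messages_path)
    categories = pd.read_csv(categories_path)
    df = messages.merge(categories, on="id")

    # Split the semicolon-separated "categories" string into one
    # binary column per label (format assumed: "related-1;request-0;...")
    labels = df["categories"].str.split(";", expand=True)
    labels.columns = [c.split("-")[0] for c in labels.iloc[0]]
    for col in labels.columns:
        labels[col] = labels[col].str[-1].astype(int)

    # Recombine, remove duplicate rows, and save to an SQLite database
    df = pd.concat([df.drop(columns=["categories"]), labels], axis=1)
    df = df.drop_duplicates()
    engine = create_engine(f"sqlite:///{db_path}")
    df.to_sql("messages", engine, index=False, if_exists="replace")
```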
- train_classifier.py: This script loads the data from DisasterResponse.db and uses it as input for the ML model. The fine-tuned model is then saved as a pickle file called classifier.pkl. The whole process is executed by running
python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl
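The training step follows a common scikit-learn pattern: load the cleaned table, fit a multi-output text classification pipeline, and pickle the result. The sketch below assumes a table named `messages` with `id` and `message` columns plus one binary column per label, and swaps in a simple TF-IDF + logistic regression pipeline for illustration; the project's actual model and tuning may differ:

```python
import pickle
import pandas as pd
from sqlalchemy import create_engine
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

def train(db_path, model_path):
    # Load the cleaned messages table written by process_data.py
    engine = create_engine(f"sqlite:///{db_path}")
    df = pd.read_sql_table("messages", engine)

    # Text input and one binary target column per label
    X = df["message"]
    y = df.drop(columns=["id", "message"])

    # TF-IDF features feeding one classifier per label
    model = Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("clf", MultiOutputClassifier(LogisticRegression(max_iter=1000))),
    ])
    model.fit(X, y)

    # Persist the fitted pipeline as a pickle file
    with open(model_path, "wb") as f:
        pickle.dump(model, f)
    return model
```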
- templates folder: Contains files used for the frontend part of the web app
- run.py: A file that contains pre-processing of the data that will be passed into the files in the templates folder. It also contains the script necessary to render the web app.
Some of the labels in the original dataset are highly imbalanced, which hurts the model's F1 score. For simplicity, these highly imbalanced fields are dropped in process_data.py. In the future, pre-processing methods for imbalanced multi-label classification, such as MLSMOTE, could be used instead.
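Dropping highly imbalanced labels can be done by checking each label's positive rate, along these lines. The helper name and the 2% cutoff are illustrative assumptions, not the project's actual values:

```python
import pandas as pd

def drop_imbalanced(df, label_cols, min_positive_rate=0.02):
    # Compute the fraction of positive (1) examples per label column
    rates = df[label_cols].mean()

    # Drop any label whose positive rate falls below the threshold
    # (0.02 here is an illustrative cutoff, not the project's value)
    dropped = rates[rates < min_positive_rate].index.tolist()
    return df.drop(columns=dropped), dropped
```

A label present in only a handful of messages gives the classifier too few positive examples to learn from, so its per-label F1 tends toward zero and drags down the averaged score.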
This project was completed as part of the Udacity Data Science Nanodegree. Credits to Figure Eight for providing the dataset.