Many people experience emotional distress when going through a significant life change or a financial crisis, when acting as caregivers, or because of various physical and mental health conditions. An inability to regulate emotions during such episodes can lead to self-destructive behavior such as substance abuse, self-harm or suicide. However, because of the public and personal “stigma” associated with mental health, most people do not reach out for help. Moreover, therapeutic consultations are limited and are not available 24/7 to support people when they are going through a traumatic episode. Therefore, it is important to assess the ability of AI-driven chatbots to help people deal with emotional distress and regulate their emotions. One of the major limitations in developing such a chatbot is the lack of a curated dialogue dataset containing emotional support. With this project, we aim to curate and analyse such a dataset, which has the potential to train and evaluate mental health caregiving chatbots that can support people in emotional distress.
The code is implemented in Python 3. You will need the following dependencies installed:
- $ pip install requests
- $ pip install nltk
- $ pip install contractions
- $ pip install swifter
- $ pip install language-tool-python
- $ pip install tqdm
- $ pip install emoji
- $ pip install joblib
- $ pip install profanity-check
- $ pip install vaderSentiment
- $ pip install tensorflow
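Several of these pip names differ from the module names they are imported under (for example, `profanity-check` is imported as `profanity_check`). A minimal sketch for checking which ones are still missing, not part of the repository: `PIP_TO_MODULE` and `missing_packages` are hypothetical names introduced here, and the import names are the ones these packages are commonly imported under. Note that `importlib.util.find_spec` only confirms a module can be located, not that it imports cleanly.

```python
import importlib.util
from typing import Dict, List

# Hypothetical mapping: pip distribution name (as listed above) -> top-level import name.
PIP_TO_MODULE: Dict[str, str] = {
    "requests": "requests",
    "nltk": "nltk",
    "contractions": "contractions",
    "swifter": "swifter",
    "language-tool-python": "language_tool_python",
    "tqdm": "tqdm",
    "emoji": "emoji",
    "joblib": "joblib",
    "profanity-check": "profanity_check",
    "vaderSentiment": "vaderSentiment",
    "tensorflow": "tensorflow",
}


def missing_packages(mapping: Dict[str, str]) -> List[str]:
    """Return the pip names whose top-level module cannot be found."""
    return [pip for pip, module in mapping.items() if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages(PIP_TO_MODULE)
    if missing:
        # Print a single install command covering everything that is missing.
        print("pip install " + " ".join(missing))
    else:
        print("All dependencies are installed.")
```

Running it as a script prints either one combined `pip install` command or a confirmation that nothing is missing.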
reddit-scrape-pushshift.ipynb
: The notebook is mainly used for scraping Reddit textual data using the Pushshift APIs.

preprocess.ipynb
: Preprocesses the raw scraped conversation data and converts it into table-like data frames.

EDA.ipynb
: The notebook presents various analyses and graphical representations to gain insights and find patterns.

utils4text.py
: This file contains the supporting functions used in EDA.ipynb.

EmoBERT.ipynb
: The notebook for making emotion predictions on the messages in dialogues. Before running it, make sure to load the checkpoints HERE.
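The scraping notebook's own queries are not reproduced in this README. As a rough sketch of how paging backwards through a subreddit with Pushshift can work, the code below assumes the historical submission-search endpoint, its `subreddit`, `size` and `before` query parameters, and a JSON response that holds results under `"data"`, newest first. These are assumptions, and because public Pushshift access was restricted in 2023 the endpoint may no longer answer anonymous requests. The sketch uses the standard-library `urllib` rather than `requests` so it runs without extra packages, does not fetch comments, and `build_submission_url` and `iter_submissions` are hypothetical helper names.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Iterator, Optional

# Assumption: historical public Pushshift submission-search endpoint.
# It may no longer serve anonymous requests.
PUSHSHIFT_SUBMISSIONS = "https://api.pushshift.io/reddit/search/submission/"


def build_submission_url(subreddit: str, before: Optional[int] = None, size: int = 100) -> str:
    """Build the URL for one page of submissions from `subreddit`."""
    params = {"subreddit": subreddit, "size": size}
    if before is not None:
        # Paging cursor: only posts created before this Unix timestamp.
        params["before"] = before
    return PUSHSHIFT_SUBMISSIONS + "?" + urllib.parse.urlencode(params)


def iter_submissions(subreddit: str, max_pages: int = 5, pause: float = 1.0) -> Iterator[dict]:
    """Yield submissions page by page, moving the `before` cursor back in time."""
    before: Optional[int] = None
    for _ in range(max_pages):
        with urllib.request.urlopen(build_submission_url(subreddit, before), timeout=30) as resp:
            posts = json.load(resp).get("data", [])
        if not posts:
            break
        yield from posts
        # Assumes newest-first ordering, so the last post is the oldest on this page.
        before = posts[-1]["created_utc"]
        time.sleep(pause)  # wait between requests to respect rate limits
```

For example, `list(iter_submissions("depression", max_pages=2))` would request at most two pages. The pause between requests is one reason scraping large subreddits takes hours.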
- The dataset can be found in two folders, raw and dataset, in Google Drive.
- Categories
- For scraping dialogues from Reddit, run the notebook reddit-scrape-pushshift.ipynb. Note that scraping subreddits like r/depression, r/offmychest and r/suicidewatch can take several hours to finish.
- Run preprocess.ipynb to transform the scraped data from JSON format into data frames.
- If you want to explore the dialogues, check EDA.ipynb for more details.
- Run EmoBERT.ipynb to get the emotion predictions for the utterances.
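The preprocessing step turns scraped JSON threads into table-like rows. The repository's actual JSON schema is not shown here, so this minimal sketch assumes a hypothetical layout: a submission dict with `id`, `author`, `title`, `selftext` and `comments`, where each comment has `id`, `author`, `body` and an optional nested `replies` list. `flatten_thread` is likewise a hypothetical name. Each row keeps a `parent_id` so the reply structure of the dialogue can be reconstructed later.

```python
from typing import Any, Dict, List

# Hypothetical layout (not taken from the repository):
# submission = {"id", "author", "title", "selftext", "comments": [comment, ...]}
# comment    = {"id", "author", "body", "replies": [comment, ...]}
Row = Dict[str, Any]


def flatten_thread(submission: Dict[str, Any]) -> List[Row]:
    """Flatten one submission and its comment tree into one row per utterance."""
    root_id = submission["id"]
    rows: List[Row] = [{
        "dialogue_id": root_id,
        "utterance_id": root_id,
        "parent_id": None,
        "depth": 0,
        "author": submission.get("author"),
        # The opening post is the title followed by the body text.
        "text": (submission.get("title", "") + "\n" + submission.get("selftext", "")).strip(),
    }]

    def walk(comments: List[Dict[str, Any]], parent_id: str, depth: int) -> None:
        # Pre-order depth-first traversal: a comment is emitted before all of its replies.
        for comment in comments:
            rows.append({
                "dialogue_id": root_id,
                "utterance_id": comment["id"],
                "parent_id": parent_id,
                "depth": depth,
                "author": comment.get("author"),
                "text": comment.get("body", "").strip(),
            })
            walk(comment.get("replies", []), comment["id"], depth + 1)

    walk(submission.get("comments", []), root_id, 1)
    return rows
```

With pandas, `pandas.DataFrame(flatten_thread(post))` yields one data frame row per utterance. Following `parent_id` links from any comment back to the root gives a single multi-turn dialogue path.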
Licensed under the MIT License.