This repository contains the Unstructured Data Analysis Final Project for Imperial College London. It presents an analysis of six novels by Charles Dickens, using several data analysis techniques to explore aspects such as thematic elements, character development, and emotional undertones.
The code requires the following Python version and libraries:
- Python >= 3.11.5
- numpy >= 1.24.3
- pandas >= 2.0.3
- nltk >= 3.8.1
- matplotlib >= 3.7.2
- seaborn >= 0.12.2
- scikit-learn >= 1.3.0
- wordcloud >= 1.9.3
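To check an environment against the versions listed above, something like the following could be used. It is a minimal sketch and not part of the project: the helper names `as_tuple` and `check` are made up for this example, the version comparison is a simplified numeric one (pre-release suffixes are ignored), and it relies only on the standard library's `importlib.metadata`.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the list above.
MIN_VERSIONS = {
    "numpy": "1.24.3",
    "pandas": "2.0.3",
    "nltk": "3.8.1",
    "matplotlib": "3.7.2",
    "seaborn": "0.12.2",
    "scikit-learn": "1.3.0",
    "wordcloud": "1.9.3",
}


def as_tuple(ver, width=3):
    """Turn '1.24.3' into (1, 24, 3), padding short versions with zeros.

    Simplification: parsing stops at the first component that does not
    start with a digit, and suffixes such as 'rc1' are ignored.
    """
    nums = []
    for piece in ver.split(".")[:width]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        nums.append(int(match.group()))
    nums += [0] * (width - len(nums))
    return tuple(nums)


def check(min_versions):
    """Return a dict mapping each package name to 'ok', 'missing', or 'too old (...)'."""
    report = {}
    for name, minimum in min_versions.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        if as_tuple(installed) >= as_tuple(minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed} < {minimum})"
    return report


if __name__ == "__main__":
    python_ok = sys.version_info[:3] >= (3, 11, 5)
    print(f"python: {'ok' if python_ok else 'too old'}")
    for name, status in check(MIN_VERSIONS).items():
        print(f"{name}: {status}")
```

Any package reported as `missing` or `too old` would need to be installed or upgraded before running the notebook.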
The dataset comprises six novels by Charles Dickens, sourced from Project Gutenberg.
The analysis is conducted in a Jupyter Notebook (UDA_FinalProject_Liu.ipynb), which contains all the code and visualizations created during the analysis. I ran the code locally on my laptop.
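For readers unfamiliar with lexicon-based emotion analysis, the general idea can be sketched as follows. This is an illustration only and is not taken from the notebook: `TOY_LEXICON` is a made-up stand-in whose entries are not from the NRC data, the sample sentence is invented, and tokenization uses a plain regular expression rather than nltk so that the example runs without extra dependencies.

```python
import re
from collections import Counter

# Toy stand-in for an emotion lexicon: word -> set of associated emotions.
# These entries are illustrative assumptions, NOT taken from the NRC lexicon.
TOY_LEXICON = {
    "happy": {"joy"},
    "afraid": {"fear"},
    "dark": {"sadness", "fear"},
    "love": {"joy", "trust"},
}


def emotion_counts(text, lexicon):
    """Count how many tokens in `text` are associated with each emotion."""
    # Lowercase the text and keep runs of letters and apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        # A word may map to several emotions; count each of them once.
        for emotion in lexicon.get(token, ()):
            counts[emotion] += 1
    return counts


# Invented sample sentence, not a quotation from any of the novels.
sample = "The dark house made her afraid, but love made her happy."
counts = emotion_counts(sample, TOY_LEXICON)
print(dict(counts))
```

With a full lexicon and a whole novel as input, the same counting step would give an emotion profile per book or per chapter, usually normalized by the total number of tokens so that texts of different lengths can be compared.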
This repository is released under the MIT license. See the LICENSE file for details.
Special thanks to:
- Project Gutenberg, for providing the text of Charles Dickens' novels.
- The NRC Word-Emotion Association Lexicon, for the emotion lexicon used in this analysis.