Every day, while browsing social media, we encounter comments, reviews, and tweets that may hurt the sentiments of a particular group or community. Such comments are considered toxic. This project addresses the problem of classifying social media comments into six categories of toxicity: Toxic, Severe-toxic, Obscene, Threat, Insult, and Identity_hate. This is a multi-label classification problem: a given comment may belong to more than one category at the same time.
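To make the multi-label idea concrete, the sketch below encodes the categories of a comment as a binary indicator vector, one position per category. The lowercase label identifiers and the `encode`/`decode` helpers are illustrative assumptions, not names taken from the project code.

```python
# Assumed identifiers for the six categories listed above (hypothetical spelling).
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def encode(active_labels):
    """Turn a set of category names into a 0/1 vector aligned with LABELS."""
    unknown = set(active_labels) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [int(label in active_labels) for label in LABELS]


def decode(vector):
    """Turn a 0/1 vector back into the list of active category names."""
    return [label for label, flag in zip(LABELS, vector) if flag]


# A single comment can carry several labels at once, or none at all.
both = encode({"toxic", "insult"})
clean = encode(set())
```

A comment tagged as both toxic and insulting has two positions set to 1, which is what distinguishes this setting from ordinary multi-class classification, where exactly one class is active.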
- Python 3.7
- NumPy
- Pandas
- Matplotlib
- NLTK
- Seaborn
- Getting the dataset.
- Gaining insights from the dataset using visualisation tools.
- Preprocessing the data using NLTK.
- Applying multi-label classification algorithms.
- Comparing the results and choosing the best-performing approach.
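One common way to apply multi-label classification is Binary Relevance, which trains one independent binary classifier per label and combines their outputs. The sketch below shows that structure only. `KeywordClassifier` is a deliberately simple, hypothetical stand-in for a real base classifier such as an SVM, and the two labels and the tiny training set are made up for illustration.

```python
from collections import Counter


class KeywordClassifier:
    """Toy binary classifier (assumption: stands in for a real model such as an SVM)."""

    def fit(self, texts, targets):
        pos, neg = Counter(), Counter()
        for text, target in zip(texts, targets):
            (pos if target else neg).update(text.lower().split())
        # Keep only words that occur in positive examples and never in negative ones.
        self.keywords = {word for word in pos if word not in neg}
        return self

    def predict(self, texts):
        return [int(any(w in self.keywords for w in t.lower().split())) for t in texts]


class BinaryRelevance:
    """Train one independent binary classifier per label."""

    def __init__(self, make_classifier, n_labels):
        self.models = [make_classifier() for _ in range(n_labels)]

    def fit(self, texts, label_rows):
        for j, model in enumerate(self.models):
            # Column j of the label matrix is the binary target for label j.
            model.fit(texts, [row[j] for row in label_rows])
        return self

    def predict(self, texts):
        per_label = [model.predict(texts) for model in self.models]
        # Transpose from one list per label to one row per comment.
        return [list(row) for row in zip(*per_label)]


# Hypothetical data; label columns are [toxic, insult].
train_texts = [
    "you are stupid",
    "have a nice day",
    "i hate this awful thing",
    "thank you for the help",
]
train_labels = [[1, 1], [0, 0], [1, 0], [0, 0]]

model = BinaryRelevance(KeywordClassifier, n_labels=2).fit(train_texts, train_labels)
predictions = model.predict(["stupid idea", "awful day", "nice help"])
```

Because each label gets its own model, Binary Relevance ignores correlations between labels. That keeps it simple and easy to parallelise, at the cost of not exploiting the fact that some categories tend to occur together.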
Achieved an accuracy score of 88.16% using the Binary Relevance method with an SVM classifier.
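In a multi-label setting, "accuracy" can be computed in more than one way, and the figure above does not state which definition was used. The sketch below contrasts two common choices, exact-match (subset) accuracy and per-label accuracy, on made-up predictions. The function names and data are illustrative assumptions.

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of comments whose full label vector is predicted exactly."""
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


def label_accuracy(y_true, y_pred):
    """Fraction of individual label decisions that are correct (1 - Hamming loss)."""
    total = sum(len(row) for row in y_true)
    correct = sum(
        t == p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return correct / total


# Hypothetical ground truth and predictions for four comments and three labels.
y_true = [[1, 0, 1], [0, 0, 0], [1, 1, 0], [0, 1, 0]]
y_pred = [[1, 0, 1], [0, 0, 0], [1, 0, 0], [0, 1, 1]]

exact = subset_accuracy(y_true, y_pred)
per_label = label_accuracy(y_true, y_pred)
```

Subset accuracy is the stricter of the two, since a single wrong label makes the whole comment count as incorrect, so the two numbers can differ substantially on the same predictions.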