Spotting the Unseen: Identifying Sexism in Twitter Movements

Author: Jerin Easo Thomas
Date: Spring 2023
Affiliation: Luddy School of Informatics, Computing and Engineering, Indiana University Bloomington
Contact: [email protected]

Introduction

Sexist comments on social media propagate harmful stereotypes and gender-based prejudice, harming users psychologically and fostering discrimination. This project aims to identify sexism in tweets from Twitter movements using machine learning and deep learning approaches. By analyzing sexist speech, we seek to create safer online environments, combat gender-based discrimination, and gain insights into societal prejudices.

Research Questions

This study focuses on two key questions:

How reliable and efficient are machine learning methods in spotting sexism in Twitter movements?
What are the source intentions behind tweeting sexist comments?
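
The first question becomes measurable once "reliable" is tied to standard classification metrics. The following is a minimal sketch, assuming a binary sexist / non-sexist labelling; the function name `binary_metrics`, the label strings, and the toy labels are illustrative and are not taken from the project notebooks.

```python
def binary_metrics(y_true, y_pred, positive="sexist"):
    """Return accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard every division so empty or one-sided inputs do not raise.
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels for illustration only; these are NOT results from this study.
y_true = ["sexist", "sexist", "non-sexist", "non-sexist"]
y_pred = ["sexist", "non-sexist", "non-sexist", "sexist"]
metrics = binary_metrics(y_true, y_pred)
```

Reporting precision and recall alongside accuracy matters when the classes are imbalanced, since accuracy alone can look high for a model that rarely predicts the minority class.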

Methods

The study used the following tools and models:

Valence Aware Dictionary and Sentiment Reasoner (VADER): Used for sentiment analysis, especially on social media text.
Neural Network: Deep learning model loosely inspired by the structure of the human brain.
Robustly Optimized BERT Pretraining Approach (RoBERTa): Transformer-based neural network architecture pre-trained on a large text corpus.
Pytesseract: Python library for extracting text from image-based data.
Google Translate API: Utilized for language translations.
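
As an illustration of how a VADER score is typically turned into a sentiment label: VADER reports a normalized `compound` score between -1 and 1, and its documentation suggests cutoffs of ±0.05 for positive, negative, and neutral. The sketch below applies those cutoffs to a score that has already been computed. The function name `label_from_compound` is hypothetical, and whether this project used the same cutoffs is an assumption.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER-style compound score in [-1, 1] to a sentiment label.

    The 0.05 default follows the commonly suggested VADER cutoffs;
    it is an assumption here, not a setting confirmed by this project.
    """
    if not -1.0 <= compound <= 1.0:
        raise ValueError("compound score must lie in [-1, 1]")
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Example scores chosen by hand for illustration.
labels = [label_from_compound(score) for score in (0.6, -0.3, 0.0)]
```

Sentiment is only a proxy here: a tweet can be negative without being sexist, which is one reason a dedicated classifier is used for the sexism label itself.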

Data

Tweets from the Twitter movements #MeToo, #8M, and #TimesUp were drawn from the EXIST dataset. The dataset includes tweets in English and Spanish, annotated for sexism. Approximately 7,900 rows were collected, each comprising the tweet text, language, annotators, user details, and labels.

Analysis

The analysis proceeded in three stages:

Data Gathering and Preprocessing: Included data analysis, preprocessing, and text normalization techniques.
Model Building and Evaluation: Used VADER for sentiment analysis, and a neural network and RoBERTa for tweet classification.
Tweet Classification and Data Visualization: Employed models to classify tweets as sexist or non-sexist, analyze source intentions, and visualize the results.
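
The text normalization step in the first stage can be sketched as follows. This is a minimal example using only Python's `re` module; the specific steps (lowercasing, dropping URLs and @mentions, keeping hashtag words) and the name `normalize_tweet` are assumptions about what a typical tweet-cleaning pass looks like, not a description of the exact preprocessing in the notebooks.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
WHITESPACE_RE = re.compile(r"\s+")


def normalize_tweet(text):
    """Lowercase a tweet and strip URLs, mentions and hashtag symbols."""
    text = text.lower()
    text = URL_RE.sub(" ", text)        # drop links
    text = MENTION_RE.sub(" ", text)    # drop @user mentions
    text = HASHTAG_RE.sub(r"\1", text)  # keep the hashtag word, drop '#'
    text = WHITESPACE_RE.sub(" ", text) # collapse runs of whitespace
    return text.strip()


cleaned = normalize_tweet("@user Check THIS out https://t.co/abc #MeToo  now")
```

Because `\w` is Unicode-aware in Python 3, the same patterns handle accented characters in the Spanish tweets without extra configuration.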

Results

Key findings from the analysis include:

Distribution of Classified Tweets: A higher share of Spanish tweets than English tweets was classified as sexist.
Effectiveness of Image-based Data Classification: Models effectively distinguished sexist and non-sexist tweets from image-based data.
Source Intention Distribution: The majority of English tweets showed a 'Direct' source intention, while Spanish tweets displayed more varied intentions.
Most Often Used Words: Word clouds revealed specific words more common in sexist tweets, providing insights into language usage.
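
The per-language distribution and the word counts behind a word cloud both reduce to simple tallies over classified tweets. The sketch below shows one way to compute them with the standard library; the row layout `(language, label, text)`, the function names, and the placeholder rows are hypothetical and do not reproduce the study's data or results.

```python
from collections import Counter


def sexism_rate_by_language(rows, positive="sexist"):
    """Return the fraction of tweets labelled `positive` for each language."""
    totals = Counter()
    positives = Counter()
    for lang, label, _text in rows:
        totals[lang] += 1
        if label == positive:
            positives[lang] += 1
    return {lang: positives[lang] / totals[lang] for lang in totals}


def top_words(rows, label, n=3):
    """Return the n most frequent whitespace-separated words for one label."""
    counts = Counter()
    for _lang, row_label, text in rows:
        if row_label == label:
            counts.update(text.lower().split())
    return counts.most_common(n)


# Placeholder rows for illustration only; NOT data from the EXIST dataset.
rows = [
    ("en", "sexist", "alpha beta"),
    ("en", "non-sexist", "beta gamma"),
    ("es", "sexist", "alpha alpha"),
    ("es", "sexist", "delta"),
]
rates = sexism_rate_by_language(rows)
most_common = top_words(rows, "sexist", n=1)
```

In practice the text would be normalized and stop words removed before counting, otherwise the most frequent words are dominated by function words.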

Conclusion

The RoBERTa-large model emerged as the most accurate for tweet classification, demonstrating its efficacy in identifying sexism and detecting source intentions. This study provides valuable insights into combating sexism on social media platforms and fostering inclusive online communities.

References

Thomas Davidson, Dana Warmsley, Michael Macy, and Ingmar Weber. "Automated Hate Speech Detection and the Problem of Offensive Language."
Shimi Gersome and Jerin Mahibha. "Sexism Identification In Social Media Using Deep Learning Models."
EXIST: sEXism Identification in Social Networks.
Francisco Rodriguez-Sanchez, Jorge Carrillo-de-Albornoz, and Laura Plaza. "Automatic Classification of Sexism in Social Networks: An Empirical Study on Twitter Data."
Regina Konig and Angela Heine. "Learning to detect sexism: An evaluation of the effects of a brief video-based intervention using ROC analysis."
Google Translate API.
Training and evaluation with the built-in methods.
Sentiment Analysis using VADER.

