Sense Exploring of Emojis with Word2Vec Model and Using Emojis to Modify Short-text Sentiments Classification

step1-Data collection

- period: 2022-04-15 to 2022-05-28
- filter: retweet or media
- query: tweets that contain at least one concerned emoji

Our dataset consists of tweets that contain at least one of the concerned emojis. The baseline selection of commonly used emojis follows Ian D. Wood and Sebastian Ruder (2016).

Following the original paper, each emoji's tag is based on human intuition rather than on its meaning in context.
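The query condition (keep a tweet only if it contains at least one concerned emoji) can be sketched as a local filter over already-downloaded tweet texts. This is a minimal sketch, not the project's collection code: `CONCERNED_EMOJIS`, `contains_concerned_emoji` and `filter_tweets` are hypothetical names, and the emoji set is an illustrative subset because the full list is not reproduced here.

```python
# Hypothetical subset of concerned emojis; the project's actual list is not reproduced here.
CONCERNED_EMOJIS = {"\U0001F602", "\U0001F62D", "\U0001F60D", "\u2764"}  # 😂 😭 😍 ❤


def contains_concerned_emoji(text, emojis=CONCERNED_EMOJIS):
    """Return True if the tweet text contains at least one concerned emoji."""
    # Substring test, so "❤" also matches "❤️" (heart followed by variation selector U+FE0F).
    return any(e in text for e in emojis)


def filter_tweets(tweets):
    """Keep only the tweets that satisfy the query condition."""
    return [t for t in tweets if contains_concerned_emoji(t)]
```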

step2-Data cleaning

2.1-First EDA

2.2-Data cleaning

For English tweets, we apply the following steps:

1. remove URLs
2. remove user names
3. remove punctuation
4. remove stopwords
5. lowercase the words
6. lemmatize the words
7. keep only English characters and emojis
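A minimal sketch of the seven English steps, assuming regular expressions for URLs and `@` handles, an approximate emoji character range, and a tiny hypothetical stopword list. Lemmatization is left as a pluggable `lemmatize` hook (identity by default) rather than guessing which lemmatizer the project uses; `clean_en`, `STOPWORDS` and `EMOJI_RE` are illustrative names.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
USER_RE = re.compile(r"@[A-Za-z0-9_]+")  # Twitter handles are ASCII letters, digits, underscore
# Approximate emoji ranges (an assumption, not an exhaustive emoji definition).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
# Hypothetical tiny stopword list; a full English list would be used in practice.
STOPWORDS = {"a", "an", "the", "i", "is", "are", "this", "to", "and", "of"}


def clean_en(text, lemmatize=lambda w: w):
    """Apply steps 1-7 to one tweet and return its tokens.

    `lemmatize` is a placeholder hook (identity by default); plug in a real lemmatizer.
    """
    text = URL_RE.sub(" ", text)  # 1. remove URLs
    text = USER_RE.sub(" ", text)  # 2. remove user names
    text = EMOJI_RE.sub(lambda m: f" {m.group(0)} ", text)  # give each emoji its own token
    text = "".join(  # 3. remove punctuation, keeping letters, digits, spaces and emojis
        ch if ch.isalnum() or ch.isspace() or EMOJI_RE.match(ch) else " "
        for ch in text
    )
    tokens = [t for t in text.split() if t.lower() not in STOPWORDS]  # 4. remove stopwords
    tokens = [lemmatize(t.lower()) for t in tokens]  # 5. lowercase, 6. lemmatize
    kept = []
    for t in tokens:  # 7. keep only English letters and emojis
        t = "".join(ch for ch in t if "a" <= ch <= "z" or EMOJI_RE.match(ch))
        if t:
            kept.append(t)
    return kept
```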

For Chinese tweets, we apply the following steps:

1. remove URLs
2. remove user names
3. segment the words
4. remove stopwords (en & zh)
5. convert to Simplified Chinese
6. remove punctuation (en & zh)
7. remove sensitive Chinese words
8. remove non-Chinese words
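A matching sketch of the eight Chinese steps. Because the project's word segmenter and Traditional-to-Simplified converter are not named, both are passed in as functions (`segment`, `to_simplified`); with the jieba package installed, `segment=jieba.lcut` would be one option. The stopword and sensitive-word sets are hypothetical placeholders, and "Chinese" is approximated by the basic CJK Unified Ideographs block; `clean_zh` and `is_punct` are illustrative names.

```python
import re
import unicodedata

URL_RE = re.compile(r"https?://\S+|www\.\S+")
USER_RE = re.compile(r"@[A-Za-z0-9_]+")  # ASCII handles, so adjacent Chinese text is not swallowed
CJK_RE = re.compile(r"[\u4e00-\u9fff]+")  # basic CJK Unified Ideographs block only
STOPWORDS = {"的", "了", "是", "the", "a"}  # hypothetical en & zh stopword list
SENSITIVE = {"敏感词"}  # hypothetical placeholder list


def is_punct(token):
    """True if every character is Unicode punctuation (covers en & zh marks)."""
    return all(unicodedata.category(ch).startswith("P") for ch in token)


def clean_zh(text, segment, to_simplified=lambda s: s):
    text = URL_RE.sub(" ", text)  # 1. remove URLs
    text = USER_RE.sub(" ", text)  # 2. remove user names
    tokens = [t.strip() for t in segment(text)]  # 3. cut the words
    tokens = [t for t in tokens if t and t.lower() not in STOPWORDS]  # 4. stopwords
    tokens = [to_simplified(t) for t in tokens]  # 5. convert to Simplified Chinese
    tokens = [t for t in tokens if not is_punct(t)]  # 6. punctuation
    tokens = [t for t in tokens if t not in SENSITIVE]  # 7. sensitive words
    return [t for t in tokens if CJK_RE.fullmatch(t)]  # 8. non-Chinese words
```

Here `str.split` stands in for a real segmenter on pre-spaced text, e.g. `clean_zh("今天 的 天气 很好 ！ @bob ok", segment=str.split)`.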

2.3-Extract pure emojis and text

We then generate three new columns:

1. text with all emojis removed
2. all emojis
3. only the concerned emojis
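Given one tweet's cleaned tokens, the three columns can be derived as below. This is a sketch: the column names and the `CONCERNED_EMOJIS` subset are hypothetical, and emojis are detected with an approximate character range.

```python
import re

# Approximate emoji ranges (an assumption, not an exhaustive emoji definition).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
# Hypothetical subset of the concerned emojis.
CONCERNED_EMOJIS = {"\U0001F602", "\U0001F62D", "\u2764"}


def split_columns(tokens):
    """Derive the three new columns from one tweet's cleaned tokens."""
    return {
        "text_no_emoji": [t for t in tokens if not EMOJI_RE.search(t)],
        "all_emojis": [t for t in tokens if EMOJI_RE.fullmatch(t)],
        "concerned_emojis": [t for t in tokens if t in CONCERNED_EMOJIS],
    }
```

Applied row by row, each key of the returned dictionary can become one DataFrame column.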

For English tweets, the steps are:

1. remove URLs
2. remove user names
3. remove punctuation
4. remove stopwords
5. lowercase the words
6. lemmatize the words

We then generate three new columns:

1. text with all emojis and non-English words removed
2. all emojis
3. only the concerned emojis

For Chinese tweets, the steps are:

1. remove URLs
2. remove user names
3. segment the words
4. remove stopwords (en & zh)
5. convert to Simplified Chinese
6. remove punctuation (en & zh)
7. remove sensitive Chinese words
8. remove English words

We then generate three new columns:

1. text with all emojis removed
2. all emojis
3. only the concerned emojis

step3-EDA & Word embedding

3.1 data exploring

We plot the most frequent emojis.
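Counting is the only non-plotting part of this step. A minimal sketch with `collections.Counter`, assuming each tweet's emojis are already available as a list; `top_emojis` is a hypothetical helper name.

```python
from collections import Counter


def top_emojis(emoji_lists, k=10):
    """Count emojis across all tweets and return the k most frequent as (emoji, count)."""
    counts = Counter(e for emojis in emoji_lists for e in emojis)
    return counts.most_common(k)
```

The returned `(emoji, count)` pairs can then be drawn as a bar chart, for example with matplotlib's `plt.bar`.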

3.2 word embedding

We train two word embeddings:

- a Word2Vec embedding with emojis only
- an embedding with both emojis and words
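The two embeddings differ mainly in their training corpus. A sketch, assuming the gensim 4.x `Word2Vec` API; `vector_size=300`, `window=5`, `min_count=1` and `sg=1` (skip-gram) are assumed settings rather than values confirmed by this README, and the helper names are hypothetical.

```python
import re

# Approximate emoji ranges (an assumption, not an exhaustive emoji definition).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def emoji_only_corpus(token_lists):
    """One 'sentence' per tweet containing only its emojis; empty sentences are dropped."""
    corpus = [[t for t in toks if EMOJI_RE.fullmatch(t)] for toks in token_lists]
    return [s for s in corpus if s]


def mixed_corpus(token_lists):
    """One sentence per tweet keeping both words and emojis."""
    return [list(toks) for toks in token_lists if toks]


def train_word2vec(corpus, vector_size=300, sg=1):
    """Train a gensim 4.x Word2Vec model; sg=1 selects skip-gram, sg=0 CBOW."""
    # Imported lazily so the corpus helpers above run without gensim installed.
    from gensim.models import Word2Vec
    return Word2Vec(sentences=corpus, vector_size=vector_size, window=5, min_count=1, sg=sg)
```

For example, `train_word2vec(mixed_corpus(docs)).wv.most_similar("😂", topn=10)` lists the tokens closest to 😂 in the mixed embedding.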


step4-Sentiment Classification

4.1 Method 1: only text

4.2 Method 2: text + emojis' description names
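One way to realize "text + emojis' description names" is to replace each emoji with the words of its name. The README does not say which description source is used; this sketch assumes Unicode character names from the standard-library `unicodedata` module, which can differ from CLDR short names (U+2764 ❤ is named `HEAVY BLACK HEART`, for instance). `add_emoji_names` is a hypothetical helper.

```python
import re
import unicodedata

# Approximate emoji ranges (an assumption, not an exhaustive emoji definition).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def add_emoji_names(tokens):
    """Replace each single-character emoji token by the lowercased words of its Unicode name."""
    out = []
    for t in tokens:
        # unicodedata.name returns the default "" for code points without a name.
        name = unicodedata.name(t, "") if len(t) == 1 and EMOJI_RE.match(t) else ""
        if name:
            out.extend(name.lower().split())
        else:
            out.append(t)  # ordinary text token, or emoji without a name: keep as-is
    return out
```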

4.3 Method 3: text + emojis' most similar text tokens
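A sketch of Method 3, assuming a Word2Vec-style lookup: each emoji is replaced by its `topn` most similar non-emoji tokens. `most_similar` is passed in so that any model can be used; with gensim it could be `lambda t: model.wv.most_similar(t, topn=20)`. Whether the project replaces emojis or appends the neighbours alongside them is not stated; replacement, the `topn` value and the helper name `add_similar_tokens` are assumptions.

```python
import re

# Approximate emoji ranges (an assumption, not an exhaustive emoji definition).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def add_similar_tokens(tokens, most_similar, topn=3):
    """Replace each emoji token by its `topn` most similar non-emoji tokens.

    `most_similar(token)` must return (token, score) pairs sorted by similarity,
    as gensim's KeyedVectors.most_similar does.
    """
    out = []
    for t in tokens:
        if not EMOJI_RE.fullmatch(t):
            out.append(t)  # ordinary text token: keep as-is
            continue
        try:
            neighbours = most_similar(t)
        except KeyError:  # gensim raises KeyError for out-of-vocabulary tokens
            neighbours = []
        words = [w for w, _ in neighbours if not EMOJI_RE.search(w)]
        out.extend(words[:topn])
    return out
```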

4.4 Method 4: text + pre-trained word embedding model

Repository: leyaoliatan/Sentiments-in-Tweets-with-Emojis
