example-data

This repo includes the processing scripts for each dataset included in the TID-8 datasets.

Note that the data processed here explicitly includes, for each annotator, the annotations that annotator made on other examples, to help the later modeling process. The TID-8 datasets omit this information for simplicity.
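To illustrate the note above about keeping an annotator's annotations on other examples, here is a minimal sketch. It is not taken from the processing scripts in this repo: the function name `attach_other_annotations`, the field names `annotator_id`, `example_id`, and `label`, and the output key `other_annotations` are all assumptions made for the example.

```python
from collections import defaultdict


def attach_other_annotations(records):
    """Attach to each annotation the same annotator's labels on other examples.

    `records` is a list of dicts with the hypothetical keys
    'annotator_id', 'example_id', and 'label'.
    """
    # Group all annotations by the annotator who produced them.
    by_annotator = defaultdict(list)
    for record in records:
        by_annotator[record["annotator_id"]].append(record)

    enriched = []
    for record in records:
        # Collect this annotator's annotations on every other example.
        others = [
            {"example_id": other["example_id"], "label": other["label"]}
            for other in by_annotator[record["annotator_id"]]
            if other["example_id"] != record["example_id"]
        ]
        # Copy the record and add the hypothetical 'other_annotations' field.
        enriched.append({**record, "other_annotations": others})
    return enriched
```

Each output row keeps the original fields and gains a list of that annotator's other labels, which a model could use as per-annotator context. The actual scripts may store this information differently.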

If you want a cleaned version of these datasets, you can use the TID-8 datasets directly.

Otherwise, you can download each raw dataset, then rename and process the files accordingly. Here are the links we used to download the raw datasets:

CommitmentBank dataset: https://github.com/mcdm/CommitmentBank
FriendsQIA dataset: https://github.com/friendsQIA/Friends_QIA, specifically at https://github.com/friendsQIA/Friends_QIA/tree/main/Data/Friends_data
GoEmotions dataset: https://github.com/google-research/google-research/tree/master/goemotions
HS-Brexit dataset: https://le-wi-di.github.io/, specifically at https://github.com/Le-Wi-Di/le-wi-di.github.io/blob/main/data_post-competition.zip
Humor dataset: https://github.com/ukplab/acl2019-GPPL-humour-metaphor, specifically at https://github.com/UKPLab/acl2019-GPPL-humour-metaphor/blob/master/data/pl-humor-full/results.tsv
MultiDomain Agreement dataset: https://le-wi-di.github.io/, specifically at https://github.com/Le-Wi-Di/le-wi-di.github.io/blob/main/data_post-competition.zip
Pejorative dataset: https://github.com/t-davidson/hate-speech-and-offensive-language, specifically at https://github.com/t-davidson/hate-speech-and-offensive-language/tree/master/data
Sentiment dataset: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/F6EMTS
Toxicity ratings dataset: This data is not publicly available and is subject to many usage constraints. If you are interested, please contact the authors of this dataset to request access.
