A compilation of the most widely known and used argument mining corpora in English.
Id | Corpus | Relevant Papers |
---|---|---|
1 | AraucariaDB | Reed, 2005; Reed et al., 2008 |
2 | European Court of Human Rights (ECHR) | Mochales & Moens, 2007; Mochales & Moens, 2008 |
3 | Internet Argument Corpus (IAC) | Walker et al., 2012 |
4 | Argument Annotated Essays Corpus (AAEC) | Stab & Gurevych, 2014a |
5 | Wikipedia articles | Aharoni et al., 2014 |
6 | User-generated Web Discourse (gold standard Toulmin corpus, study 2) | Habernal & Gurevych, 2017 |
Some statistics and characteristics of the previously listed corpora are presented below.
1 - AraucariaDB | |
---|---|
Domain | Newspapers and court cases |
Language | English |
Size | Over 700 analyses, with a total of 80,000 words |
Argument model | Walton’s argumentation schemes |
Annotation process | |
Agreement | Unknown |
Comments | Text gathered from newspaper editorials, parliamentary records, judicial summaries and discussion boards |
URL | https://arg-tech.org/index.php/research/araucariadb/ |
2 - European Court of Human Rights (ECHR) | |
---|---|
Domain | Legal |
Language | English |
Size | 12,904 sentences (10,133 non-argumentative and 2,771 argumentative); 2,355 premises and 416 conclusions |
Time | 4 weeks; documents analyzed by 2 lawyers |
Argument model | Argumentation schemes; AC: conclusion and premise; AR: support / attack |
Annotation process | |
Agreement | K = 0.58, K = 0.80 |
Comments | 55 documents composed of 25 legal cases and 29 admissibility reports |
3 - Internet Argument Corpus (IAC) | |
---|---|
Domain | Political |
Language | English |
Size | 390,704 posts in 11,800 discussions from the debate site 4forums.com |
Annotation process | |
Agreement | K(topic) = 0.22–0.60; K(avg) = 0.47 |
Comments | A corpus for research on political debate in Internet forums, consisting of approximately 11,000 discussions, 390,000 posts, and some 73,000,000 words |
URL | https://nlds.soe.ucsc.edu/iac |
4 - Argument Annotated Essays Corpus (AAEC) | |
---|---|
Domain | Persuasive essays (various) |
Language | English |
Size | 90 persuasive essays, 1,673 sentences with 34,917 tokens |
Argument model | AC: major claim, claim, and premise; AR: support / attack |
Annotation process | |
Agreement | αU(comp) = 0.72; αU(rel) = 0.81 |
Comments | The corpus consists of 90 English persuasive essays collected from an essay forum and contains 1,879 sentences. The extended version (AAECv2) contains 402 essays about 8 controversial topics |
URL | http://corpora.aifdb.org/AAECv2 |
5 - Wikipedia articles | |
---|---|
Domain | Various |
Language | English |
Size | ~50,000 sentences; 2,683 argument elements collected in the context of 33 controversial topics |
Argument model | AC: claim and its associated supporting evidence; in detail: Topic, Context Dependent Claim (CDC), and Context Dependent Evidence (CDE) |
Annotation process | 20 carefully trained in-house labelers; two-stage labeling approach |
Agreement | K(claim) = 0.39; K(evidence) = 0.4 |
Comments | A corpus of 2,683 argument elements, collected in the context of 33 predefined controversial topics |
6 - User-generated Web Discourse | |
---|---|
Language | English |
Size | 340 documents |
Time | Each annotator spent 35 hours annotating over the course of 5 weeks; discussion and consolidation of the gold data took another 6 hours |
Argument model | Adaptation of Toulmin’s model; AC: claim, premise, backing, rebuttal, and refutation |
Annotation process | All documents were annotated by 3 independent annotators in three phases |
Agreement | αU = 0.48 for joint logos (claim, premise, backing, rebuttal, refutation) across articles, blog posts, comments, and forum posts |
Comments | Contains 340 documents about 6 controversial topics in education |
URL | https://bit.ly/2vdkHOD |
Created on Oct 04, 2022
Created by:
This project is licensed under the terms of the Apache License 2.0.
This work was supported by the Spanish Ministry of Science and Innovation (PID2019-108965GB-I00).