Universal Dependencies - English Dependency Treebank
Universal Dependencies English Web Treebank v1.3 -- 2016-05-15
https://github.com/UniversalDependencies/UD_English
A Gold Standard Universal Dependencies Corpus for English,
built over the source material of the English Web Treebank
LDC2012T13 (https://catalog.ldc.upenn.edu/LDC2012T13).
LICENSE/COPYRIGHT
Universal Dependencies English Web Treebank © 2013, 2014, 2015, 2016
by The Board of Trustees of The Leland Stanford Junior University.
All Rights Reserved.
The annotations and database rights of the Universal Dependencies
English Web Treebank are licensed under a
Creative Commons Attribution-ShareAlike 4.0 International License.
You should have received a copy of the license along with this
work. If not, see <http://creativecommons.org/licenses/by-sa/4.0/>.
The underlying texts come from various sources collected for the
LDC English Web Treebank. Some parts are in the public domain.
Portions may be © 2012 Google Inc., © 2011 Yahoo! Inc.,
© 2012 Trustees of the University of Pennsylvania and/or
© other original authors.
STRUCTURE
This directory contains a corpus of sentences annotated according to the Universal Dependencies
guidelines. The corpus comprises 254,830 words and 16,622 sentences, taken from various web media
including weblogs, newsgroups, emails, reviews, and Yahoo! Answers; see the LDC2012T13 documentation
for more details on the source of the sentences. The trees were automatically converted into
Stanford Dependencies and then hand-corrected to Universal Dependencies. All the dependency
annotations have been single-annotated, and a limited portion of them has also been double-annotated,
with an interannotator agreement of approximately 96%.
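The word and sentence totals above can be recounted from a local copy of the CoNLL-U files.
The following is a minimal Python sketch, not a script shipped with this treebank. It assumes
that a "word" is any line whose ID column is a plain integer (so multiword-token ranges such as
"3-4" and "#" comment lines are skipped) and that sentences are separated by blank lines. The
function name count_conllu_lines is hypothetical, and whether the output matches the published
figures depends on the release and on which files (train, dev, test) are passed in.

```python
import sys
from typing import Iterable, Tuple


def count_conllu_lines(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (sentences, words) for the lines of a CoNLL-U file.

    Assumption: a word is any line whose ID column is a plain integer;
    multiword-token ranges such as "3-4" and "#" comment lines are skipped.
    Sentences are separated by blank lines.
    """
    sentences = words = 0
    in_sentence = False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            if in_sentence:  # a blank line closes the current sentence
                sentences += 1
                in_sentence = False
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        if line.split("\t", 1)[0].isdigit():
            words += 1
            in_sentence = True
    if in_sentence:  # final sentence without a trailing blank line
        sentences += 1
    return sentences, words


if __name__ == "__main__":
    # Hypothetical usage: python count_conllu.py FILE.conllu [FILE.conllu ...]
    total_sentences = total_words = 0
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            s, w = count_conllu_lines(handle)
        total_sentences += s
        total_words += w
    print(f"{total_sentences} sentences, {total_words} words")
```

Counting only integer IDs keeps a contraction split into two syntactic words from being counted
three times (once for the surface range line and once for each part).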
This corpus is compatible with the CoNLL-U format defined for Universal Dependencies. See:
http://universaldependencies.github.io/docs/format.html
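As an illustration of that format, the sketch below reads CoNLL-U lines into one dictionary per
word, keyed by the ten tab-separated column names used in the UD v1 format documentation (ID,
FORM, LEMMA, UPOSTAG, XPOSTAG, FEATS, HEAD, DEPREL, DEPS, MISC). It is a minimal Python sketch,
not part of this distribution; the function name read_sentences is hypothetical, and it does no
validation beyond checking the column count. All values are kept as strings, so HEAD "0" marks
the root.

```python
from typing import Dict, Iterable, Iterator, List

# The ten CoNLL-U columns, named as on the UD v1 format page.
FIELDS = ["ID", "FORM", "LEMMA", "UPOSTAG", "XPOSTAG",
          "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"]


def read_sentences(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield each sentence as a list of token dicts keyed by FIELDS."""
    sentence: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # Blank line: end of the current sentence, if any.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comments are ignored here
        columns = line.split("\t")
        if len(columns) != len(FIELDS):
            raise ValueError(f"expected 10 columns, got {len(columns)}: {line!r}")
        sentence.append(dict(zip(FIELDS, columns)))
    if sentence:  # last sentence without a trailing blank line
        yield sentence


if __name__ == "__main__":
    sample = [
        "# text = Go!\n",
        "1\tGo\tgo\tVERB\tVB\tMood=Imp|VerbForm=Fin\t0\troot\t_\t_\n",
        "2\t!\t!\tPUNCT\t.\t_\t1\tpunct\t_\t_\n",
        "\n",
    ]
    for sent in read_sentences(sample):
        for token in sent:
            print(token["ID"], token["FORM"], token["HEAD"], token["DEPREL"])
```

Run as a script, the demo prints "1 Go 0 root" and "2 ! 1 punct". Because read_sentences accepts
any iterable of lines, an open file handle can be passed in directly.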
The dependency taxonomy can be found on the Universal Dependencies web site:
http://universaldependencies.github.io/docs/
DEVIATIONS FROM UD
Version 1.3 of the English UD treebank conforms to the UD guidelines in
almost all respects, but there are a couple of remaining deviations:
* The UD dependency 'name' is only used for person names.
* Person names are annotated right-headed whereas they should be left-headed.
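To illustrate the second deviation, the sketch below turns a right-headed 'name' group into the
left-headed form the guidelines call for, so that in "John Smith" the token John becomes the head
and Smith attaches to it with 'name'. It is a minimal Python sketch under stated assumptions, not
the maintainers' conversion: tokens are dicts with integer id and head values and a deprel string;
every part of a name attaches directly to the name's current head; and any other dependents of
the old head (for example an amod) are moved to the new head. The function name left_head_names
is hypothetical.

```python
from typing import Dict, List


def left_head_names(tokens: List[Dict]) -> List[Dict]:
    """Return a copy of tokens with right-headed 'name' groups made left-headed.

    Each token is a dict with integer 'id' and 'head' (0 = root) and a string
    'deprel'. Assumption: every part of a name attaches directly to the
    name's current head token with the relation 'name'.
    """
    tokens = [dict(t) for t in tokens]  # work on copies; input is untouched
    by_id = {t["id"]: t for t in tokens}

    # Group the 'name' dependents by the token they currently attach to.
    groups: Dict[int, List[int]] = {}
    for t in tokens:
        if t["deprel"] == "name":
            groups.setdefault(t["head"], []).append(t["id"])

    for old_head, parts in groups.items():
        if old_head not in by_id:
            continue  # malformed input: 'name' attached to the root
        members = sorted(parts + [old_head])
        new_head = members[0]
        if new_head == old_head:
            continue  # already left-headed
        # The leftmost token takes over the attachment of the whole name.
        old = by_id[old_head]
        first = by_id[new_head]
        first["head"], first["deprel"] = old["head"], old["deprel"]
        # Every other part of the name, including the old head, hangs off it.
        for tid in members[1:]:
            by_id[tid]["head"], by_id[tid]["deprel"] = new_head, "name"
        # Assumption: remaining dependents of the old head (e.g. amod) move too.
        for t in tokens:
            if t["head"] == old_head and t["id"] not in members:
                t["head"] = new_head
    return tokens


if __name__ == "__main__":
    # "young John Smith left": Smith heads the name in the right-headed input.
    sentence = [
        {"id": 1, "head": 3, "deprel": "amod"},   # young
        {"id": 2, "head": 3, "deprel": "name"},   # John
        {"id": 3, "head": 4, "deprel": "nsubj"},  # Smith
        {"id": 4, "head": 0, "deprel": "root"},   # left
    ]
    for t in left_head_names(sentence):
        print(t["id"], t["head"], t["deprel"])
```

For the demo sentence, John (2) ends up as the nsubj of "left", Smith (3) attaches to John with
'name', and "young" (1) follows the name to its new head. Whether modifiers of a name should move
this way is a design choice the sketch makes, not something this README specifies.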
CHANGELOG
2016-05-15 v1.3
-- Improved mapping of WDT to UPOS
-- Corrected lemma of "n't" to "not"
-- Fixed some errors between advcl, ccomp and parataxis
-- Fixed inconsistent analyses of sentences repeated between dev and train sets
-- Fixed miscellaneous syntactic issues in a few sentences
2015-11-15 v1.2
-- Bugfix: removed _NFP suffix from some lemmas
-- Fixed date annotations to adopt UD standard
-- Removed escaping of ( and ) from word tokens (XPOSTAGs are still -LRB- and -RRB-)
-- Improved precision of xcomp relation
-- Improved recall of name relation
-- Corrected lemmas for reduced auxiliaries
-- Corrected UPOS tags of pronominal uses of this/that/these/those (from DET to PRON)
-- Corrected UPOS tags of subordinating conjunctions (from ADP to SCONJ)
-- Corrected UPOS tags of some main verbs (from AUX to VERB)
FIXES
To help improve the corpus, please alert us to any errors you find in it.
The best way to do this is to file a GitHub issue at:
https://github.com/UniversalDependencies/UD_English/issues
CONTRIBUTORS
Annotation of the Universal Dependencies English Web Treebank was carried out by
(in order of size of contribution):
Natalia Silveira
Timothy Dozat
Miriam Connor
Marie-Catherine de Marneffe
Samuel Bowman
Hanzhi Zhu
Daniel Galbraith
Christopher Manning
John Bauer
Creation of the CoNLL-U files, including calculating UPOS, feature, and lemma information
was primarily done by
Sebastian Schuster
Natalia Silveira
The construction of the Universal Dependencies English Web Treebank was partially funded
by a gift from Google, Inc., which we gratefully acknowledge.
CITATIONS
You are encouraged to cite this paper if you use the Universal Dependencies English Web Treebank:
@inproceedings{silveira14gold,
year = {2014},
author = {Natalia Silveira and Timothy Dozat and Marie-Catherine de Marneffe and Samuel Bowman
and Miriam Connor and John Bauer and Christopher D. Manning},
title = {A Gold Standard Dependency Corpus for {E}nglish},
booktitle = {Proceedings of the Ninth International Conference on Language
Resources and Evaluation (LREC-2014)}
}
Documentation status: complete
Data source: manual
Data available since: UD v1.0
License: CC BY-SA 4.0
Genre: blog social reviews
Contributors: Silveira, Natalia; Dozat, Timothy; Manning, Christopher; Schuster, Sebastian; Bauer, John; Connor, Miriam; de Marneffe, Marie-Catherine; Bowman, Sam; Zhu, Hanzhi; Galbraith, Daniel