Add support for .conllu format? #2
Comments
Also: CoNLL-U allows comments (lines starting with |
Hello,
Do you have any ideas on that? Some weeks ago I already experimented with the UD representation tool Annodoc (which is utterly complex) in order to include it as an output format on Arborator's quick page, because I needed it for a paper where I wanted the trees to look "universal". You can see the ramshackle implementation here: http://arborator.ilpga.fr/q.cgi (click on the "Show Annodoc graph" button). Sometimes you have to reload to get the graph back; their JavaScript and mine are still interfering. It's usable but not yet "pushable" to GitHub...
Hi Kim - if you can expose the textual format underlying the annodoc graph, that could be useful for preparing UD documentation, like this:
My examples for Coptic UD are all in CoNLL10, since they're all annotated in Arborator, and it would be nice to move from a separate PDF to the UD online documentation system with automatic conversion of the examples.
Hi guys, the quick page (see http://arborator.ilpga.fr/q.cgi) now supports the CoNLL-U format (http://universaldependencies.org/format.html).
These special lines are used only to construct the sentence on top of the tree and are not graphically modifiable. They are rewritten at the right position in the CoNLL.
One problem is the definition of extra dependencies beyond the tree. For now, Arborator writes the first governor (by order) into the normal spot, and additional governors that appear later in the sentence are written into the special column. This means that the order of multiple governors can change after a tree modification (between the usual governor columns and the extra governors' column, as well as inside that special column). You can try it out on the Arborator page.

The next step would be to integrate it into the Python code of the database-based side of Arborator. This would have to be done in the tree2nodedic function of conll.py, and possibly in the database.py file, in order to store the additional information somewhere for re-exporting. I won't have much time for that any time soon, so if you can give it a try, I'd be grateful.

Amir: concerning the textual format you mention, Annodoc also supports CoNLL-U directly, so why would you need this other (old Stanford) format? You can also try the new button on the quick page to get UD-style graphs. If you could improve this (remove the interfering JS, possibly export to SVG, ...), that would be great!
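The governor handling described in this comment (first governor in the normal spot, later ones in the extra column) could be sketched roughly as below. This is only an illustration, not Arborator's actual code: the function name `split_governors` and the `(head_id, relation)` input shape are assumptions, and the `head:relation` pairs joined by `|` follow the CoNLL-U DEPS column syntax.

```python
def split_governors(governors):
    """Split a token's governors into (HEAD, DEPREL, DEPS) column values.

    `governors` is a hypothetical list of (head_id, relation) pairs.
    The governor that comes first in the sentence goes into HEAD/DEPREL;
    all later ones are serialized into the extra column as head:relation
    pairs joined by "|". An unused column is written as "_".
    """
    if not governors:
        return "_", "_", "_"
    # "First by order" is taken here to mean the lowest head position.
    ordered = sorted(governors, key=lambda g: g[0])
    head, deprel = ordered[0]
    extra = "|".join(f"{h}:{rel}" for h, rel in ordered[1:]) or "_"
    return str(head), deprel, extra


print(split_governors([(5, "conj"), (2, "nsubj")]))  # ('2', 'nsubj', '5:conj')
```

Because the split depends only on sentence position, adding a governor that sits earlier in the sentence pushes the previous primary governor into the extra column, which is the reordering effect mentioned above.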
Do you mean, functions containing colons?
It says "enhanced representations may require additional dependency relations", which is indeed vague. For English, the "enhanced" and "enhanced++" representations are described here. As I understand it, there are tools that heuristically add the enhanced edges given the basic tree. So in a sense, the enhanced edges are secondary, but I don't know if there will be much need to annotate them manually if they can be added automatically.
The visualization looks great! One request that I hope would be easy to add: also displaying an orthographic word layer below the token layer, if they differ. E.g. for
The token layer is what is currently displayed ("that ~be a terribly...") and the orthographic word layer would be "that's _ a terribly...", with "that's" spanning 2 tokens.
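As a rough sketch of how the two layers could be derived from CoNLL-U input: multiword tokens appear as range lines (ID `1-2`) placed before the tokens they cover. The function name `layers`, the helper `row`, and the sample rows are made up for illustration and are not taken from Arborator.

```python
def layers(conllu_lines):
    """Return (token_layer, orthographic_layer) for one sentence.

    Tokens covered by a range line (ID like "1-2") are represented in the
    orthographic layer by the range's form followed by "_" placeholders.
    """
    tokens, ortho = [], []
    covered_until = 0  # highest token ID covered by the last range line
    for line in conllu_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        tok_id, form = cols[0], cols[1]
        if "-" in tok_id:
            start, end = map(int, tok_id.split("-"))
            ortho.append(form)
            ortho.extend("_" for _ in range(end - start))
            covered_until = end
        elif "." in tok_id:
            continue  # empty nodes (IDs like "8.1") belong to neither layer
        else:
            tokens.append(form)
            if int(tok_id) > covered_until:
                ortho.append(form)
    return tokens, ortho


def row(tok_id, form):
    """Build a 10-column CoNLL-U line with only ID and FORM filled in."""
    return "\t".join([tok_id, form] + ["_"] * 8)


sample = [row("1-2", "that's"), row("1", "that"), row("2", "~be"),
          row("3", "a"), row("4", "terribly")]
print(layers(sample))
# (['that', '~be', 'a', 'terribly'], ["that's", '_', 'a', 'terribly'])
```

Both lists have the same length, so a display could draw the second layer under the first and merge each form with the `_` cells that follow it.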
Concerning the graph structure beyond the tree: I didn't know that paper. Astonishing how they can speak for pages about the representation of the syntax and semantics of light verbs in dependencies and not cite Mel'cuk!

Concerning the orthographic layer: this is part of a bigger problem, the non-configurability of the quick page. If we add extra lines, they should be optional, so as not to make each tree too tall (just like the tree height and other parameters). Even the gloss we just added takes up space (and I reduced the distance between the lines). I think the orthographic word should definitely be displayable in the main database-based part of Arborator, where everything can be configured. But the quick page is mainly for looking at CoNLL files and modifying them slightly.
UD v2 includes a new CoNLL-U specification: http://universaldependencies.org/format.html The changes from v1 are summarized here: http://universaldependencies.org/v2/conll-u.html
Thanks for an excellent tool! I've started using it for annotating with Universal Dependencies, which normally use the CoNLL-U format. This is an enhancement to CoNLL10, the main difference being the ability to represent multiword tokens. I have been stripping out the multiword lines before uploading to Arborator, but it would be nice if they were preserved, and even better if they were displayed (e.g., by underlining groups of tokens). How hard would this be to add?
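For reference, the workaround of stripping the multiword lines before upload might look something like this. It is a minimal sketch under the assumption that the input is tab-separated CoNLL-U; the function name `strip_multiword_lines` is hypothetical.

```python
def strip_multiword_lines(conllu_text):
    """Drop multiword-token range lines (ID like "1-2") from CoNLL-U text.

    Comment lines, blank sentence separators, and ordinary token lines are
    kept unchanged. The result has no trailing newline.
    """
    kept = []
    for line in conllu_text.splitlines():
        first = line.split("\t", 1)[0]
        is_range = (
            not line.startswith("#")
            and "-" in first
            and first.replace("-", "").isdigit()
        )
        if not is_range:
            kept.append(line)
    return "\n".join(kept)


example = "# text = that's a\n1-2\tthat's\t_\n1\tthat\t_\n2\t~be\t_\n3\ta\t_\n"
print(strip_multiword_lines(example))
```

The information in the dropped lines is lost, which is why preserving them inside the tool would be preferable.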