Base package for upcoming Swiss German NLP sources.
The source data will be placed in the /data folder, to keep it separate from the code.
A package can produce several builds, such as:
- wordlist
- POS
- NER
- all
This list will expand in the future.
A short description of each build type follows.
wordlist: This build generates a wordlist.txt file, which contains a list of words (one word per line) that can be used as a dictionary.
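A minimal Python sketch of how a package might write wordlist.txt. The function names `format_wordlist` and `write_wordlist` are hypothetical, and stripping whitespace, removing duplicates, and sorting are assumptions; the only stated requirement is one word per line. UTF-8 is used so that Swiss German umlauts are written correctly.

```python
from pathlib import Path
from typing import Iterable


def format_wordlist(words: Iterable[str]) -> str:
    """Return wordlist.txt contents: one word per line."""
    # Assumption: surrounding whitespace is stripped and empty entries dropped.
    cleaned = {word.strip() for word in words if word.strip()}
    # Assumption: duplicates are removed and words sorted for stable output.
    return "".join(f"{word}\n" for word in sorted(cleaned))


def write_wordlist(words: Iterable[str], data_dir: str = "data") -> Path:
    """Write wordlist.txt into the data folder (assumed relative to the repo root)."""
    path = Path(data_dir) / "wordlist.txt"
    path.parent.mkdir(parents=True, exist_ok=True)
    # UTF-8 keeps characters such as ä, ö and ü intact.
    path.write_text(format_wordlist(words), encoding="utf-8")
    return path


# Usage (writes ./data/wordlist.txt):
# write_wordlist(["Grüezi", "Chäschüechli", "Grüezi"])
```

Keeping the formatting separate from the file writing makes the output easy to check without touching the /data folder.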
POS: This build generates a pos.txt file, which contains pairs of a word and its part-of-speech tag, separated by a slash (/).
For example: Chäschüechli/N
For the tags used for other word types, see "A Universal Part-of-Speech Tagset".
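The slash format can be parsed by splitting each entry at its last slash, so a word that itself contains a slash stays intact. The sketch below assumes one word/tag pair per line in pos.txt; `parse_tagged_line` and `read_pos` are hypothetical helper names, not part of this repository.

```python
from typing import List, Tuple


def parse_tagged_line(line: str) -> Tuple[str, str]:
    """Split 'word/TAG' at the last slash, so a word containing '/' stays whole."""
    word, sep, tag = line.strip().rpartition("/")
    if not sep or not word or not tag:
        raise ValueError(f"expected 'word/TAG', got {line!r}")
    return word, tag


def read_pos(text: str) -> List[Tuple[str, str]]:
    """Parse pos.txt contents, assuming one word/tag pair per line."""
    # Blank lines are skipped.
    return [parse_tagged_line(line) for line in text.splitlines() if line.strip()]


# read_pos("Chäschüechli/N\n") returns [("Chäschüechli", "N")]
```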
NER: This build generates a ner.txt file, which contains pairs of a named entity and its entity tag, separated by a slash (/).
For example: Wilhelm Tell/PERSON
For other types of entities, see "Named Entity Recognition".
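Unlike POS entries, NER entries can span several words (as in Wilhelm Tell/PERSON), so an entry cannot simply be split on whitespace. The sketch assumes one entity per line and splits at the last slash; `format_ner` and `parse_ner_line` are illustrative names only.

```python
from typing import Iterable, Tuple


def format_ner(entities: Iterable[Tuple[str, str]]) -> str:
    """Serialise (entity, label) pairs as 'entity/LABEL', one per line (assumed layout)."""
    lines = []
    for entity, label in entities:
        # The slash separates entity from label, so the label must not contain one.
        if not entity or not label or "/" in label:
            raise ValueError(f"invalid entry: {(entity, label)!r}")
        lines.append(f"{entity}/{label}\n")
    return "".join(lines)


def parse_ner_line(line: str) -> Tuple[str, str]:
    """Split at the last slash, keeping multi-word entities such as 'Wilhelm Tell' whole."""
    entity, _, label = line.strip().rpartition("/")
    if not entity or not label:
        raise ValueError(f"expected 'entity/LABEL', got {line!r}")
    return entity, label
```

Writing and parsing are inverses, so a line produced by `format_ner` reads back as the same pair.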
all: Runs every available build.
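One way a package could wire up the build types, including all, is a small registry that maps each type to a builder function. Everything here is hypothetical: the lowercase type names, the `BUILDERS` registry, `run_build`, and the placeholder builders, which only return the file name they would produce.

```python
from typing import Callable, Dict, List


# Placeholder builders (hypothetical): a real package would create the files in /data.
def build_wordlist() -> str:
    return "wordlist.txt"


def build_pos() -> str:
    return "pos.txt"


def build_ner() -> str:
    return "ner.txt"


# Registry of individual build types; dicts keep insertion order (Python 3.7+).
BUILDERS: Dict[str, Callable[[], str]] = {
    "wordlist": build_wordlist,
    "pos": build_pos,
    "ner": build_ner,
}


def run_build(build_type: str) -> List[str]:
    """Run one build type, or every registered build for 'all'."""
    if build_type == "all":
        return [builder() for builder in BUILDERS.values()]
    if build_type not in BUILDERS:
        raise ValueError(f"unknown build type: {build_type!r}")
    return [BUILDERS[build_type]()]
```

Because all is derived from the registry, a build type added later is included in all automatically.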
To create a package, fork this repository, or create a new repository and copy the contents of this one into it.
This repository serves as a base template for sources.