German lemmatizer confused by capitalization #9466
giopina started this conversation in Language Support
The current, extremely simple lookup lemmatizer is just not very good; see the closely related discussion in #8695 (comment). It doesn't know anything about POS, casing, or spelling variation. We do have some good news to report, though: we have internal work in progress on a statistical lemmatizer that should be much, much better than the lookup lemmatizer, which was only meant as a stopgap and has been the default for German for far too long. With the new lemmatizer, the accuracy on TIGER is ~97%. Keep an eye out for an official announcement soon!
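To make the casing problem concrete, here is a minimal sketch of how a pure lookup lemmatizer behaves: it maps exact surface forms to lemmas, ignoring POS and case. The table and the function name lookup_lemma are hypothetical illustrations, not spaCy's actual German lookup data or API.

```python
# A toy lookup lemmatizer: maps exact surface forms directly to lemmas.
# HYPOTHETICAL table for illustration only; not spaCy's real German lookup data.
LOOKUP_TABLE = {
    "meldet": "melden",
    "meldete": "melden",
    "Meldung": "Meldung",
}


def lookup_lemma(form: str) -> str:
    """Return the lemma for an exact surface form, or the form itself if absent.

    The lookup ignores part of speech and does not normalize case, so a
    capitalized "Meldet" (e.g. sentence-initial) misses the "meldet" entry.
    """
    return LOOKUP_TABLE.get(form, form)


if __name__ == "__main__":
    for word in ["meldet", "Meldet"]:
        print(word, "->", lookup_lemma(word))
```

Running it prints `meldet -> melden` and `Meldet -> Meldet`: the capitalized form falls through and is returned unchanged, which matches the symptom described in the question.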
I'm having issues with the lemmatizer for German (in both v3.0.6 and v2.3.2, using both de_core_news_lg and de_dep_news_trf). Basically, the lemmatizer gets confused by the capitalization of the verb and can't assign the correct lemma to it, even though the POS tagger tags it correctly.
Code:
Output:
(The lemma of "meldet" should be "melden".)
Is there a general solution or fix for this issue?
Originally posted by @giopina in #2668 (comment)
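Until a statistical lemmatizer is available, one possible stopgap is to retry the lookup with a lowercased form whenever the token is not a noun, since German capitalizes nouns but other words are usually capitalized only by sentence position or headline style. The sketch below assumes (form, Universal POS tag) pairs as input; in spaCy these could come from `[(t.text, t.pos_) for t in doc]`. The table and the function lemmatize_with_case_fallback are hypothetical and are not part of spaCy.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL lookup table for illustration; not spaCy's real data.
LOOKUP_TABLE: Dict[str, str] = {
    "meldet": "melden",
    "Bericht": "Bericht",
    "der": "der",
}

# German nouns and proper nouns are capitalized, so their case carries
# information; for other parts of speech, capitalization usually just
# reflects sentence position or headline style.
CASE_SENSITIVE_POS = {"NOUN", "PROPN"}


def lemmatize_with_case_fallback(
    tagged: List[Tuple[str, str]], table: Dict[str, str]
) -> List[str]:
    """Lemmatize (form, UPOS tag) pairs, retrying lowercase for non-nouns."""
    lemmas = []
    for form, pos in tagged:
        if form in table:
            # Exact surface form is in the table.
            lemmas.append(table[form])
        elif pos not in CASE_SENSITIVE_POS and form.lower() in table:
            # Non-noun: capitalization is likely positional, so retry lowercase.
            lemmas.append(table[form.lower()])
        else:
            # No entry found: fall back to the surface form itself.
            lemmas.append(form)
    return lemmas


if __name__ == "__main__":
    tokens = [("Meldet", "VERB"), ("Der", "DET"), ("Bericht", "NOUN")]
    print(lemmatize_with_case_fallback(tokens, LOOKUP_TABLE))
```

This relies on the tagger being right (as it is in the question); if a capitalized verb is mistagged as a noun, the fallback is skipped and the form is returned unchanged.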