Addition of FST analysis as part of entries in importjson does not completely reproduce previous behaviour #122
Cause is that |
e.g. entry for cats should have:
Currently, there are 7247 entries that should have an analysis and do not. Entries like Also, there are 4834 entries that did not have an analysis and now do have one. Entries like
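Counts like the ones above can be reproduced by diffing two importjson files. What follows is only a minimal sketch: it assumes each entry is a dict with a unique `slug` key and an optional `analysis` key, and the function name `analysis_changes` and the placeholder slugs are hypothetical, not taken from the project.

```python
def analysis_changes(old_entries, new_entries):
    """Compare two lists of importjson-style entries.

    Returns (lost, gained): slugs whose analysis disappeared, and slugs
    that newly received one. Only slugs present in both lists are compared.
    """
    def has_analysis(entry):
        # Assumption: a missing, null, or empty "analysis" means no analysis.
        return bool(entry.get("analysis"))

    old = {e["slug"]: has_analysis(e) for e in old_entries}
    new = {e["slug"]: has_analysis(e) for e in new_entries}
    shared = old.keys() & new.keys()

    lost = sorted(s for s in shared if old[s] and not new[s])
    gained = sorted(s for s in shared if not old[s] and new[s])
    return lost, gained


# Placeholder data, not real dictionary entries.
old = [{"slug": "a", "analysis": [[], "a", ["+N"]]}, {"slug": "b"}]
new = [{"slug": "a"}, {"slug": "b", "analysis": [[], "b", ["+Ipc"]]}]
lost, gained = analysis_changes(old, new)
```

Listing the slugs in `lost` and `gained` rather than only counting them makes it easier to spot patterns, such as one word class dominating the regressions.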
There will be several notes to add about these examples, but for starters, this again reduces to an issue with POS tag matching. The first step is therefore to document and implement an appropriate linguist-based approach for matching POS tags between dictionaries and FSTs. There has been considerable discussion about this (some of it in emails), which will be added to the appropriate (new) issue.
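To make the POS tag matching problem concrete, here is a rough sketch of one possible approach: keep only the FST analyses whose tags contain everything the dictionary word class requires. The mapping table, the word-class labels, the entry shape, and the function name are all assumptions made for illustration; the actual linguist-approved mapping is what still needs to be documented.

```python
# Hypothetical mapping from dictionary word-class labels to the FST tags
# an analysis must carry. The real table would come from the linguists.
WORD_CLASS_TO_TAGS = {
    "NA": {"+N", "+A"},
    "NI": {"+N", "+I"},
    "VTA": {"+V", "+TA"},
    "IPC": {"+Ipc"},
}


def matching_analyses(word_class, analyses):
    """Return the analyses whose tags include all tags required by word_class.

    Each analysis is assumed to be a dict with "lemma" and "tags" keys.
    An unknown word class yields no matches rather than a guess.
    """
    required = WORD_CLASS_TO_TAGS.get(word_class)
    if required is None:
        return []
    return [a for a in analyses if required <= set(a["tags"])]


# Placeholder candidates for a single wordform with two competing analyses.
candidates = [
    {"lemma": "stem", "tags": ["+N", "+A", "+Sg"]},
    {"lemma": "stem", "tags": ["+V", "+AI", "+Ind", "+3Sg"]},
]
noun_matches = matching_analyses("NA", candidates)
```

Using a subset test instead of string equality on the whole tag sequence keeps the match independent of tag order and of any inflectional tags the FST adds.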
After matching against the referenced https://github.com/giellalt/lang-crk/blob/main/tools/shellscripts/add-explicit-fields-to-crkeng.sh, issues with
To fix regression, after ensuring that analysis includes |
Most missing entries were Ipc, and a buggy comparison where |
Some analyses:
Many of the other elements, such as mac-âyiwiwin, are cases where there is orthographic variation at the preverb/prenoun-stem junction, based on reduction in speech. The full form would be maci-âyiwiwin, but because the stem starts with a vowel the preverb-final We started a discussion with Arok about how to deal with these forms. One would be inclined to choose one variant as the more standard form and then accept the variants (rather than creating two FST lemmas, as one would by enumerating both in the LEXC file for stems). Currently these are sort of caught by the script, in that the
Yes, I'd need to revise some parts of how the FST is generated for these sandhi forms. |
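For the junction variation discussed above (mac-âyiwiwin vs. maci-âyiwiwin), one option on the import side is to generate the reduced variant from the full form, so that both spellings resolve to a single entry. This is only a sketch under a strong assumption drawn from that one example: that the variation is the loss of a preverb-final i before a vowel-initial stem. The vowel inventory and the function name are likewise assumptions, and the FST-side handling may well differ.

```python
# Assumed vowel inventory for the orthography used in the examples.
VOWELS = set("aioâêîô")


def junction_variants(preverb, stem):
    """Return the full form first, followed by a reduced variant if one applies.

    Assumption: a preverb-final "i" may be dropped before a vowel-initial stem.
    """
    variants = [f"{preverb}-{stem}"]
    if preverb.endswith("i") and stem[:1] in VOWELS:
        variants.append(f"{preverb[:-1]}-{stem}")
    return variants


forms = junction_variants("maci", "âyiwiwin")
```

Putting the full form first reflects the idea of treating one variant as the standard form and accepting the others as alternates.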
(Was "Search regression: my cats / my dogs", but that behaviour has been fixed. Keeping the issue open for the major source of inconsistencies that caused the previously observable bug. See discussion after #122 (comment))
There is some issue (likely associated with the English Phrase FST not adding an +A tag) that prevents the dev version from correctly providing an inflected form when searching my cats/my dogs. However, the FST behaviours are equivalent, so a different explanation for the failure must be identified to make the problem reproducible. Needs fixing.