Handling of dialect markers from CW (ý) #115

Open
3 tasks done
fbanados opened this issue Jun 18, 2024 · 6 comments
@fbanados
Member

fbanados commented Jun 18, 2024

CW uses the character ý to mark a sound that varies between Cree dialects. For itwewina, each ý must be replaced with y. We need a more consistent and future-proof way of handling these characters, both in the crk-db repo and in morphodict.

See also #44, #30, #68, UAlbertaALTLab/morphodict#929, UAlbertaALTLab/morphodict#649, UAlbertaALTLab/morphodict#197, UAlbertaALTLab/morphodict#255, UAlbertaALTLab/morphodict#96 (comment), UAlbertaALTLab/itwewina#104 (comment), and:

function proto2sro(string) {

  • Ensure the generation process currently replaces ý with y for the itwewina morphodict
  • Ensure crk-db utilizes correct spellings for merging with AECD (otherwise entries are removed)
  • Decide on a future process to handle these characters uniformly
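
For reference, the ý to y replacement described above could look roughly like the sketch below. `stripDialectMarker` is a hypothetical name, not an existing function in crk-db or morphodict, and the sketch assumes the marker may arrive either precomposed (U+00FD) or as y plus a combining acute accent.

```javascript
// Hypothetical helper, not part of crk-db or morphodict.
function stripDialectMarker(text) {
  // NFC turns "y" + combining acute (U+0301) into the single code point ý,
  // so both encodings of the marker are caught by the same replacement.
  return text
    .normalize("NFC")
    .replace(/ý/g, "y")
    .replace(/Ý/g, "Y");
}

console.log(stripDialectMarker("ýowênam")); // yowênam
```

Normalizing first keeps the replacement to a single rule; other diacritics such as ê and ô are left untouched.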
@fbanados fbanados self-assigned this Jun 18, 2024
@aarppe
Contributor

aarppe commented Jun 18, 2024

Here's where this is discussed for the FST side:

giellalt/lang-crk#30

@fbanados
Member Author

This issue also affects the merging of entries: e.g. AECD has a yôwênam entry, while CW uses ýowênam. If the merging process does not handle this, definitions are left out.
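
One possible way to make the merge tolerant of the marker is to compare entries on a key with ý folded to y, while keeping the CW spelling as the stored headword. This is only a sketch under assumptions: `mergeKey`, `mergeEntries`, and the `head`/`definitions` fields are hypothetical and do not reflect the actual crk-db data model, and the sample headword and definitions are illustrative placeholders rather than real dictionary content.

```javascript
// Hypothetical merge key: fold the dialect marker so ý and y compare equal.
function mergeKey(headword) {
  return headword.normalize("NFC").replace(/ý/g, "y").replace(/Ý/g, "Y");
}

// Hypothetical merge: CW entries are inserted first, so their spelling
// (including ý) is kept as the headword of the merged entry.
function mergeEntries(cwEntries, aecdEntries) {
  const merged = new Map();
  for (const entry of cwEntries) {
    merged.set(mergeKey(entry.head), {
      head: entry.head,
      definitions: [...entry.definitions],
    });
  }
  for (const entry of aecdEntries) {
    const key = mergeKey(entry.head);
    const existing = merged.get(key);
    if (existing) {
      // Same word under the folded key: keep both sets of definitions.
      existing.definitions.push(...entry.definitions);
    } else {
      merged.set(key, { head: entry.head, definitions: [...entry.definitions] });
    }
  }
  return [...merged.values()];
}

// Illustrative placeholder data only.
const result = mergeEntries(
  [{ head: "ýôtin", definitions: ["CW definition"] }],
  [{ head: "yôtin", definitions: ["AECD definition"] }]
);
console.log(result.length, result[0].head, result[0].definitions.length);
```

Note that the key only folds the dialect marker; spellings that also differ in vowel length marking would still need separate handling.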

@fbanados
Member Author

fbanados commented Jul 3, 2024

The agreement is that the strict analyzer FST should be slightly relaxed to accept ý -> y. Eventually, the generator FST should also generate ý whenever the data used to build the FST supports it (that is, not every y should become ý). This way, main entries with ý can keep the letter in their identifier, and itwewina can later implement an option to either show or hide the ý in the presentation.

@aarppe
Contributor

aarppe commented Jul 3, 2024

We can have an optional conversion of ý (->) y on the analysis side of the strict (normative) analyzing FST.
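
To illustrate what the optional ý (->) y conversion means for acceptance, here is a toy per-character stand-in. It is not FST source code and does not use any FST library; `acceptsWithOptionalMarker` is a hypothetical name, and the sample forms are illustrative placeholders.

```javascript
// Toy model of an optional ý (->) y rule on the analysis side:
// every ý in the lexical form may be typed as either ý or y,
// but a plain y in the lexical form is never matched by ý.
function acceptsWithOptionalMarker(lexicalForm, inputForm) {
  const lexical = Array.from(lexicalForm.normalize("NFC"));
  const input = Array.from(inputForm.normalize("NFC"));
  if (lexical.length !== input.length) return false;
  return lexical.every(
    (ch, i) => ch === input[i] || (ch === "ý" && input[i] === "y")
  );
}

console.log(acceptsWithOptionalMarker("ýôtin", "yôtin")); // true
console.log(acceptsWithOptionalMarker("ýôtin", "ýôtin")); // true
console.log(acceptsWithOptionalMarker("yôtin", "ýôtin")); // false
```

The asymmetry in the last call matches the point that not every y should become ý: the relaxation only runs from ý to y.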

@fbanados
Member Author

fbanados commented Jul 5, 2024

I've recompiled the FSTs and these have been deployed to the dev version for testing.

@fbanados
Member Author

A pending issue (linguist work) is to ensure that the other dictionaries (MD and AECD) are consistently marked with dialect markers when they have no matching entry in CW. A first approach is to collect, for each dictionary, a list of words that are spelled with y but could actually call for ý, and wait for a linguist to confirm their status.
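
A first pass at that candidate list could be as simple as the sketch below: take the headwords from MD or AECD that contain a plain y and have no CW counterpart once ý is folded to y. `candidatesForReview` is a hypothetical name, the inputs are assumed to be plain arrays of headword strings, and the sample words are illustrative placeholders.

```javascript
// Hypothetical helper: list headwords a linguist should check for a missing ý.
function candidatesForReview(otherHeadwords, cwHeadwords) {
  const stripMarker = (word) => word.normalize("NFC").replace(/ý/g, "y");
  // CW headwords with the marker folded away, for lookup.
  const cwKeys = new Set(cwHeadwords.map(stripMarker));
  return otherHeadwords.filter((word) => {
    const nfc = word.normalize("NFC");
    // Only words with a plain y can be hiding a ý, and words that already
    // match a CW entry get their marking from CW.
    return nfc.includes("y") && !cwKeys.has(stripMarker(nfc));
  });
}

// Illustrative placeholder data only.
console.log(candidatesForReview(["yôtin", "maskwa", "miywâsin"], ["ýôtin", "maskwa"]));
// [ 'miywâsin' ]
```

The output is deliberately over-inclusive: it only narrows down what needs linguist confirmation and makes no guess about which y should become ý.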
