Alternative spellings from CW #123

Open
fbanados opened this issue Jul 9, 2024 · 6 comments
Labels: enhancement (New feature or request), ready-for-review, source:CW (Arok Wolvengrey's Cree Words)

Comments

@fbanados
Member

fbanados commented Jul 9, 2024

\alt tags in Toolbox mark alternative spellings (of the dictionary head) that should be included in the lexicographical info presented for entries. Generation of linguistInfo.analysis should include this info as well.

fbanados added the enhancement (New feature or request) and source:CW (Arok Wolvengrey's Cree Words) labels Jul 9, 2024
fbanados self-assigned this Jul 9, 2024
@aarppe
Contributor

aarppe commented Jul 9, 2024

Here are counts of how often entries have alternative spellings, and how many such alternatives per entry:

less crk/Wolvengrey_altlab.toolbox | gawk 'BEGIN { FS="\n"; RS=""; } $0 ~ /\n\\alt/ { nalt++; alt=0; for(i=1; i<=NF; i++) if(match($i,"^.alt ")!=0) alt++; n[alt]++; } END { printf "n(alt)\talt\n"; for(i in n) printf "%i\t%i\n", n[i], i; }'
n(alt)	alt
20918	0
4570	1
777	2
185	3
53	4
24	5
5	6
3	7
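
For anyone without gawk at hand, roughly the same tally can be sketched in Python. This is only an illustration: it assumes, as the one-liner does, that entries are blank-line-separated records and that each alternative spelling sits on its own line starting with `\alt `. The function name `count_alt_fields`, the `\lx`/`\ps` markers, and the sample text are made up for the example and are not taken from the real file.

```python
import re
from collections import Counter


def count_alt_fields(toolbox_text: str) -> Counter:
    """Map 'number of \\alt fields' to 'number of entries with that many'."""
    tally = Counter()
    # Assumption: one entry per blank-line-separated record.
    for record in re.split(r"\n\s*\n", toolbox_text):
        lines = [ln for ln in record.splitlines() if ln.strip()]
        if not lines:
            continue  # skip empty records caused by extra blank lines
        n_alt = sum(1 for ln in lines if ln.startswith("\\alt "))
        tally[n_alt] += 1
    return tally


# Toy data: two entries, the first with two \alt fields, the second with none.
sample = (
    "\\lx pâmwayês\n\\ps IPC\n\\alt maywês\n\\alt mwayês\n"
    "\n"
    "\\lx placeholder\n\\ps NA\n"
)

print("n(alt)\talt")
for n_alt, n_entries in sorted(count_alt_fields(sample).items()):
    print(f"{n_entries}\t{n_alt}")
```

Sorting the tally by the number of alternatives keeps the rows in a stable order, which an awk `for (i in n)` loop does not guarantee.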

@aarppe
Contributor

aarppe commented Jul 9, 2024

E.g., entry pâmwayês (IPC) has seven \alt fields:

\alt maywês
\alt maywêsk
\alt mwayê
\alt mwayês
\alt pâmayas
\alt pâmayês
\alt pâmoyês

These could be presented in a tabular format, like the following:

Alternatives
maywês
maywêsk
mwayê
mwayês
pâmayas
pâmayês
pâmoyês

I can't imagine what could be meaningful row labels; perhaps this could rather be a single column table.

In any event, the "Alternatives" header would then require some relabelings, e.g. "Different spellings" in plain English, and something else in Cree.
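
A minimal sketch of the single-column idea, in Python: pull the `\alt` values out of one entry and print them under a header that can be swapped per label set. The function names and the `HEADER_LABELS` dictionary are hypothetical and not the project's actual code; the Cree label is deliberately left out, since it has not been decided here.

```python
def alt_spellings(entry_text: str) -> list[str]:
    """Return the values of all \\alt fields in one Toolbox entry, in file order."""
    prefix = "\\alt "
    return [
        line[len(prefix):].strip()
        for line in entry_text.splitlines()
        if line.startswith(prefix)
    ]


def render_single_column(header: str, values: list[str]) -> str:
    """Render a one-column table: the header on top, one value per row."""
    return "\n".join([header, *values])


# Hypothetical relabelings for the column header.
HEADER_LABELS = {
    "default": "Alternatives",
    "english": "Different spellings",
}

entry = "\n".join(
    [
        "\\alt maywês",
        "\\alt maywêsk",
        "\\alt mwayê",
        "\\alt mwayês",
        "\\alt pâmayas",
        "\\alt pâmayês",
        "\\alt pâmoyês",
    ]
)

print(render_single_column(HEADER_LABELS["english"], alt_spellings(entry)))
```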

fbanados added a commit that referenced this issue Jul 24, 2024
fbanados added a commit that referenced this issue Jul 24, 2024
There was a bug in entry merging when there are notes.  This manifested
in pinawêw not showing up for "she sheds", as the removal of the note
left the entry as "s/hesheds" instead.
@fbanados
Member Author

Implemented:
Screenshot 2024-07-24 at 5 09 59 PM

This is still missing relabellings, especially for Cree.

@aarppe
Contributor

aarppe commented Jul 26, 2024

The relabelings have been added to crk.altlab.tsv.

@aarppe
Contributor

aarppe commented Jul 30, 2024

I realized that we could have as a first column the dialect that a variant pertains to, as Arok codes that as wC for Woods Cree and sC for Swampy Cree. For instance, for the CW entry awêýiwa, one could present the following alternative forms in a tabular format:

Dial Alt
pC awîniwa
sC awêńiwa
wC awîthiwa

The default dialect would be pC for Plains Cree. I can add relabelings for the dialect codes, perhaps using the ISO codes for the linguistic relabelings, the full language names for the plain English ones, and then the exonyms for the nêhiyawêwin ones.
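
To make the proposal concrete, here is a rough Python sketch of the two-column table with a default dialect and per-label-set relabelings. It assumes the variants are already available as (form, dialect code) pairs, with `None` when no dialect is coded; how the codes are actually stored in the Toolbox source is not covered. `DIALECT_LABELS` and `dialect_table` are hypothetical names, the ISO 639-3 codes are the standard ones for these three varieties, and the nêhiyawêwin labels are omitted because they have not been supplied yet.

```python
DEFAULT_DIALECT = "pC"  # Plains Cree is the default when no dialect is coded

# Hypothetical relabeling table, keyed by the dialect codes used in CW.
DIALECT_LABELS = {
    "pC": {"linguistic": "crk", "english": "Plains Cree"},
    "wC": {"linguistic": "cwd", "english": "Woods Cree"},
    "sC": {"linguistic": "csw", "english": "Swampy Cree"},
}


def dialect_table(variants, label_set="english"):
    """Return (dialect label, form) rows; unknown codes are shown unchanged."""
    rows = []
    for form, code in variants:
        code = code or DEFAULT_DIALECT
        label = DIALECT_LABELS.get(code, {}).get(label_set, code)
        rows.append((label, form))
    return rows


# Assumed input shape: the first variant has no dialect code and falls back to pC.
variants = [("awîniwa", None), ("awêńiwa", "sC"), ("awîthiwa", "wC")]

for label, form in dialect_table(variants):
    print(f"{label}\t{form}")
```

Keeping the raw code as the fallback label means an unexpected dialect code still shows up in the table rather than being silently dropped.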

@aarppe
Contributor

aarppe commented Aug 15, 2024

We want the following features:

  • the orthographical variants should be accepted and suggested by the spellchecker --> they need to be included in the general normative model.
  • the orthographical variants should not be generated in the inflectional paradigms --> they need to be excluded from the dictionary normative model.

In LEXC, this can be coded as follows:

LEXICON NOUNS_STEMS
...
maci-ayiwiwin:maci-ayiwiwin NI ;
maci-ayiwiwin:mac-âyiwiwin NI_sandhi ;
maci-ayiwiwin:macâyiwiwin NI_sandhi ;
...

LEXICON NI_sandhi
@P.Var.Sandhi@ NI ;

LEXICON NOUN_ENDLEX
...
@R.Var.Sandhi@+Var/Sandhi:@R.Var.Sandhi@ # ;
...

Then, the normative generator for dictionary purposes needs to filter out the variant cases. However, in spell-checking, we do want to recognize and generate those forms, so they will need to remain in the general normative analyzer and generator. The descriptive analyzers will always include the variants.
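
The intended split can be illustrated outside the FST with a small Python sketch. It assumes generator output is available as (analysis, surface form) pairs and that variant analyses end in the `+Var/Sandhi` tag from the LEXC snippet above. In practice the filtering would more likely happen when the FSTs are compiled; the function names and the `+N+I+Sg` tags are invented for the example.

```python
VARIANT_TAG_PREFIX = "+Var/"  # matches +Var/Sandhi and any similar variant tag


def is_variant(analysis: str) -> bool:
    """True if the analysis string carries a variant tag."""
    return VARIANT_TAG_PREFIX in analysis


def for_dictionary(pairs):
    """Dictionary normative model: drop variants so paradigms stay clean."""
    return [(analysis, form) for analysis, form in pairs if not is_variant(analysis)]


def for_spellchecker(pairs):
    """General normative model: keep everything, variants included."""
    return list(pairs)


# Illustrative generator output for the forms in the LEXC example.
generated = [
    ("maci-ayiwiwin+N+I+Sg", "maci-ayiwiwin"),
    ("maci-ayiwiwin+N+I+Sg+Var/Sandhi", "mac-âyiwiwin"),
    ("maci-ayiwiwin+N+I+Sg+Var/Sandhi", "macâyiwiwin"),
]

print([form for _, form in for_dictionary(generated)])
print([form for _, form in for_spellchecker(generated)])
```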

We will also need to consider the spellrelax rules, so that they do not duplicate analyses for the variants.
