Alternative spellings from CW #123

fbanados · 2024-07-09T00:54:38Z

\alt tags in Toolbox mark alternative spellings (of the dictionary head) that should be included in the lexicographical info presented for entries. Generation of linguistInfo.analysis should include this info as well.

aarppe · 2024-07-09T00:57:40Z

Here are counts of how often entries have alternative spellings, and how many such alternatives per entry:

less crk/Wolvengrey_altlab.toolbox | gawk 'BEGIN { FS="\n"; RS=""; } $0 ~ /\n\\alt/ { nalt++; alt=0; for(i=1; i<=NF; i++) if(match($i,"^.alt ")!=0) alt++; n[alt]++; } END { printf "n(alt)\talt\n"; for(i in n) printf "%i\t%i\n", n[i], i; }'
n(alt)	alt
20918	0
4570	1
777	2
185	3
53	4
24	5
5	6
3	7
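
For readers less used to gawk's paragraph mode, the same tally can be sketched in Python. This is only an illustrative re-statement of the one-liner above, not code from the repository: the function name `count_alt_fields` and the field markers `\sro` and `\ps` in the sample are assumptions, and the sample records are made up apart from the two pâmwayês spellings quoted in this thread.

```python
import re
from collections import Counter


def count_alt_fields(toolbox_text):
    """Return a Counter mapping (number of non-empty \\alt fields) to (number of entries)."""
    histogram = Counter()
    # Records are separated by blank lines, like gawk with RS="".
    for record in re.split(r"\n{2,}", toolbox_text.strip("\n")):
        lines = record.split("\n")
        # Mirror the gawk filter $0 ~ /\n\\alt/: some line after the first starts with \alt.
        if not any(line.startswith("\\alt") for line in lines[1:]):
            continue
        # Mirror match($i, "^.alt "): only \alt fields followed by a space are counted.
        n_alt = sum(1 for line in lines if line.startswith("\\alt "))
        histogram[n_alt] += 1
    return histogram


# Hypothetical sample: the field markers \sro and \ps are assumed for illustration.
sample = (
    "\\sro pâmwayês\n\\ps IPC\n\\alt maywês\n\\alt mwayês\n"
    "\n"
    "\\sro atim\n\\ps NA\n\\alt\n"
    "\n"
    "\\sro hypothetical\n\\ps NA\n"
)

histogram = count_alt_fields(sample)
print("n(alt)\talt")
for n_alt in sorted(histogram):
    print(f"{histogram[n_alt]}\t{n_alt}")
```

In this sketch, as in the gawk command, a record whose \alt field is present but empty passes the filter and is tallied under 0, while a record with no \alt field at all is skipped.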

aarppe · 2024-07-09T01:06:54Z

E.g., entry pâmwayês (IPC) has seven \alt fields:

\alt maywês
\alt maywêsk
\alt mwayê
\alt mwayês
\alt pâmayas
\alt pâmayês
\alt pâmoyês

These could be presented in a tabular format, like the following:

Alternatives
	maywês
	maywêsk
	mwayê
	mwayês
	pâmayas
	pâmayês
	pâmoyês

I can't imagine what could be meaningful row labels; perhaps this could rather be a single-column table.

In any event, Alternatives would then require some relabelings, e.g. Different spellings in plain English, and something else in Cree.

There was a bug in entry merging when there are notes. This manifested in pinawêw not showing up for "she sheds", as the removal of the note left the entry as "s/hesheds" instead.

fbanados · 2024-07-24T23:12:58Z

Implemented:

This is still missing relabelings, especially for Cree.

aarppe · 2024-07-26T15:36:18Z

The relabelings have been added to crk.altlab.tsv.

aarppe · 2024-07-30T21:34:56Z

I realized that we could have as a first column the dialect that a variant pertains to, as Arok codes that as wC for Woods Cree and sC for Swampy Cree. For instance, for the CW entry awêýiwa, one could present the following alternative forms in a tabular format:

Dial	Alt
pC	awîniwa
sC	awêńiwa
wC	awîthiwa

The default dialect would be pC for Plains Cree. I can add relabelings for the dialect codes, perhaps using the ISO codes for the linguistic relabelings, the full language names for the plain English ones, and then the exonyms for the nêhiyawêwin ones.
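
A minimal sketch of how such a two-column table could be built, assuming the alternatives have already been parsed into (dialect code, form) pairs with `None` where no dialect is marked. How the dialect code is actually stored in the \alt field is not stated in this thread, so the input shape and the names `DEFAULT_DIALECT` and `alternatives_table` are hypothetical.

```python
DEFAULT_DIALECT = "pC"  # Plains Cree, used when no dialect code is given


def alternatives_table(alternatives):
    """Render (dialect, form) pairs as a tab-separated Dial/Alt table."""
    rows = ["Dial\tAlt"]
    for dialect, form in alternatives:
        # Fall back to the default dialect for unmarked variants.
        rows.append(f"{dialect or DEFAULT_DIALECT}\t{form}")
    return "\n".join(rows)


# Assumed pre-parsed input for the CW entry awêýiwa.
alts = [(None, "awîniwa"), ("sC", "awêńiwa"), ("wC", "awîthiwa")]
print(alternatives_table(alts))
```

The relabelings for the dialect codes would then apply to the first column only, leaving the forms untouched.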

aarppe · 2024-08-15T10:23:25Z

We want the following features:

The orthographical variants should be accepted and suggested by the spellchecker --> they need to be included in the general normative model.
The orthographical variants should not be generated in the inflectional paradigms --> they need to be excluded from the dictionary normative model.

In LEXC, this can be coded as follows:

LEXICON NOUNS_STEMS
...
maci-ayiwiwin:maci-ayiwiwin NI ;
maci-ayiwiwin:mac-âyiwiwin NI_sandhi ;
maci-ayiwiwin:macâyiwiwin NI_sandhi ;
...

LEXICON NI_sandhi
@P.Var.Sandhi@ NI ;

LEXICON NOUN_ENDLEX
...
@R.Var.Sandhi@+Var/Sandhi:@R.Var.Sandhi@ # ;
...

Then, the normative generator for dictionary purposes needs to filter out the variant cases. However, in spell-checking, we do want to recognize and generate those forms, so they will need to remain in the general normative analyzer and generator. The descriptive analyzers will always include the variants.
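
The intended split can be illustrated on already-generated (analysis, form) pairs. This is a sketch only: in practice the filtering would more likely happen when the FSTs are compiled, and the tag string `+N+I+Sg` as well as the function names are assumptions made for the example. Only `+Var/Sandhi` and the three surface forms come from the LEXC snippet above.

```python
VARIANT_TAG = "+Var/Sandhi"  # tag added on the analysis side in NOUN_ENDLEX


def dictionary_forms(pairs):
    """Keep only pairs whose analysis lacks the variant tag (dictionary paradigms)."""
    return [(analysis, form) for analysis, form in pairs if VARIANT_TAG not in analysis]


def spellchecker_forms(pairs):
    """Keep everything: the spellchecker should accept and suggest variants too."""
    return list(pairs)


# Hypothetical generator output; the +N+I+Sg tags are assumed for illustration.
generated = [
    ("maci-ayiwiwin+N+I+Sg", "maci-ayiwiwin"),
    ("maci-ayiwiwin+N+I+Sg+Var/Sandhi", "mac-âyiwiwin"),
    ("maci-ayiwiwin+N+I+Sg+Var/Sandhi", "macâyiwiwin"),
]

print(dictionary_forms(generated))
print(len(spellchecker_forms(generated)))
```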

We will also need to consider the spellrelax rules, so that they do not duplicate analyses for the variants.

fbanados added enhancement New feature or request source:CW Arok Wolvengrey's Cree Words labels Jul 9, 2024

fbanados self-assigned this Jul 9, 2024

fbanados added a commit that referenced this issue Jul 24, 2024

Include spelling variants for #123

f25cdf9

fbanados added the ready-for-review label Jul 24, 2024
