Refactor: Make process of collection independent of the choice of main dictionary #124

Open
7 tasks
fbanados opened this issue Jul 11, 2024 · 1 comment
@fbanados
Member

Currently, the process of generating an importjson file strongly depends on the contents of the CW dictionary. We intend to make the following changes:

  • Each dictionary is currently processed by its own copy of similar code. Isolate the code that processes complete dictionaries in one place.
  • Encapsulate source-specific changes in each source's class.
  • Move the special initialization currently done while converting CW into a general class, so that any source can be "the first to appear".
  • Turn aggregation from a global operation into a one-to-one process, so that source priority can be changed easily.
  • Ensure entries that have no mapping to previously aggregated entries are still included (depends on previous issues).
  • Refactor instructions to use the _altlab versions of alternative dictionaries instead of the main (mostly immutable) sources.
  • Change documentation and dependencies on the use of FSTs: currently we only use the relaxed analyzer to account for spelling differences between dictionaries.
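
To make the intended shape concrete, here is a minimal sketch of the pairwise aggregation, not the existing code. Every name in it (`Entry`, `Source`, `CW`, `MD`, `merge`, `aggregate`, `relaxed_key`) is hypothetical, and `relaxed_key` is only a stand-in for the FST relaxed analyzer: it strips diacritics so that spelling variants land on the same key.

```python
import unicodedata
from dataclasses import dataclass, field


@dataclass
class Entry:
    lemma: str
    senses: list
    sources: list = field(default_factory=list)


class Source:
    """Hypothetical base class; one subclass per dictionary source."""

    name = "source"

    def __init__(self, rows):
        # rows: iterable of (lemma, sense) pairs; a real source would parse a file.
        self.rows = list(rows)

    def normalize(self, lemma):
        # Source-specific spelling fixes belong in subclass overrides.
        return lemma

    def entries(self):
        for lemma, sense in self.rows:
            yield Entry(self.normalize(lemma), [sense], [self.name])


class CW(Source):
    name = "CW"


class MD(Source):
    name = "MD"


def relaxed_key(lemma):
    # Assumption: a stand-in for the relaxed analyzer, not the real FST.
    decomposed = unicodedata.normalize("NFD", lemma.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def merge(aggregated, source, key=relaxed_key):
    """Fold one source into the entries aggregated so far (one-to-one step)."""
    result = {
        k: Entry(e.lemma, list(e.senses), list(e.sources))
        for k, e in aggregated.items()
    }
    for entry in source.entries():
        k = key(entry.lemma)
        if k not in result:
            # No mapping to a previous entry: keep it instead of discarding it.
            result[k] = entry
            continue
        target = result[k]
        target.senses.extend(s for s in entry.senses if s not in target.senses)
        if source.name not in target.sources:
            target.sources.append(source.name)
    return result


def aggregate(sources, key=relaxed_key):
    """Source priority is just the order of the list; no source is special."""
    aggregated = {}
    for source in sources:
        aggregated = merge(aggregated, source, key)
    return aggregated
```

With this shape, changing priority means reordering the list passed to `aggregate`, for example `aggregate([md, cw])` instead of `aggregate([cw, md])`. The first source to contribute a key supplies the lemma spelling, and later sources only add senses.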

Finalizing this process is expected to expand the crkeng_dictionary.importjson file by around 8k senses that the matching process currently discards.

@fbanados fbanados self-assigned this Jul 11, 2024
@aarppe
Contributor

aarppe commented Jul 12, 2024

The fields in CW and MD relevant to dictionary comparison from the LEXC perspective are discussed in #125.
