replace indexing step with nearest-neighbor search #131

Open
thatbudakguy opened this issue Nov 14, 2020 · 1 comment

@thatbudakguy
Member

thatbudakguy commented Nov 14, 2020

this is useful for two reasons:

  • it allows seeding on non-exact matches, which in turn lets us capture potentially interesting matches where not all (or even none!) of the match is exact
  • it helps us solve the multiple readings problem: if we represent sound content as a vector, there no longer has to be a 1-to-1 relationship between graphs and phonemes. this would let us compare using more abstract metrics like cosine distance, instead of text-specific metrics like edit distance.
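
to make the second point concrete, here is a minimal sketch of seeding by nearest-neighbor search under cosine distance. everything in it is an assumption for illustration: the 3-dimensional "sound" vectors, the `seq_*` keys, and the helper names `cosine_distance` and `nearest_neighbors` are hypothetical and not part of the current codebase. it is a brute-force search, not an optimized index.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def nearest_neighbors(query, index, k=2):
    """Brute-force search: rank every indexed vector by distance to the query."""
    scored = [(key, cosine_distance(query, vec)) for key, vec in index.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]

# Hypothetical sound vectors; a real encoding would have many more dimensions
# and could blend several readings of the same graph into one vector.
sound_vectors = {
    "seq_a": [1.0, 0.0, 0.0],
    "seq_b": [0.9, 0.1, 0.0],
    "seq_c": [0.0, 1.0, 0.0],
    "seq_d": [0.0, 0.0, 1.0],
}

# The query matches nothing exactly, but still finds close seeds.
query = [1.0, 0.05, 0.0]
seeds = nearest_neighbors(query, sound_vectors, k=2)
seed_keys = [key for key, _ in seeds]
```

an exact-match index would return nothing for this query, while the distance ranking still surfaces `seq_a` and `seq_b` as candidate seeds.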

we could start by looking into libraries for locality-sensitive hashing (LSH) or approximate nearest-neighbor search, like datasketch or the popular annoy. there's a great explanation of LSH here and a more detailed one, related to document comparison, in section 3.4.1 of Mining of Massive Datasets.
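
for a feel of what LSH would do for us, here is a rough hand-rolled sketch of one LSH family, random-hyperplane hashing for cosine similarity. it is not the API of datasketch or annoy, and the class name `HyperplaneLSH`, its parameters, and the example vectors are all hypothetical. vectors that point in similar directions tend to fall on the same side of each random hyperplane, so they tend to share a bucket.

```python
import random
from collections import defaultdict

class HyperplaneLSH:
    """Toy single-table LSH index using random hyperplanes (illustrative only)."""

    def __init__(self, dim, n_bits=8, seed=0):
        rng = random.Random(seed)  # seeded so the sketch is reproducible
        # Each hyperplane is represented by a random Gaussian normal vector.
        self.planes = [
            [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)
        ]
        self.buckets = defaultdict(list)

    def signature(self, vec):
        # One bit per hyperplane: which side of the plane the vector lies on.
        return tuple(
            1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
            for plane in self.planes
        )

    def add(self, key, vec):
        self.buckets[self.signature(vec)].append(key)

    def candidates(self, vec):
        # Only keys in the query's bucket are returned; they still need to be
        # re-ranked with an exact distance afterwards.
        return list(self.buckets.get(self.signature(vec), []))

lsh = HyperplaneLSH(dim=3, n_bits=8, seed=42)
lsh.add("seq_a", [1.0, 0.0, 0.0])
lsh.add("seq_c", [0.0, 1.0, 0.0])

# A rescaled copy of seq_a has the same direction, so it hashes identically.
hits = lsh.candidates([2.0, 0.0, 0.0])
```

a single table like this can miss near neighbors that straddle a hyperplane, which is one reason to prefer an existing library (multiple tables or trees) over rolling our own.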

thatbudakguy added the enhancement (New feature or request) label Nov 14, 2020
thatbudakguy added this to the v3.0 milestone Feb 19, 2021
@thatbudakguy
Member Author

if we use a vector database as suggested by #152, we could get this type of search built in.
