replace indexing step with nearest-neighbor search #131

Open

thatbudakguy opened this issue Nov 14, 2020 · 1 comment

thatbudakguy commented Nov 14, 2020 (edited)

this is useful for two reasons:

- it allows seeding from non-exact matches, which in turn lets us capture potentially interesting matches where little (or none!) of the match is exact.
- it helps us solve the multiple readings problem: if we represent sound content as a vector, there no longer has to be a 1-to-1 relationship between graphs and phonemes. this would let us compare using more abstract metrics like cosine distance, instead of text-specific metrics like edit distance.
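a rough sketch of the second point, to make it concrete. everything here is hypothetical: the three-sound inventory, the `reading_vector` helper, and the even split of weight between readings are assumptions for illustration, not how the project currently encodes sound.

```python
import math

# Hypothetical toy inventory of sounds; a real one would be far larger.
SOUNDS = ["sound_a", "sound_b", "sound_c"]


def reading_vector(readings):
    """Encode a graph's possible readings as weights over SOUNDS.

    `readings` maps a sound name to a weight, so one graph can carry
    several readings at once instead of exactly one phoneme.
    """
    return [readings.get(sound, 0.0) for sound in SOUNDS]


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty vector as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


# A graph with two possible readings, weighted evenly (an assumption).
multi = reading_vector({"sound_a": 0.5, "sound_b": 0.5})
# Graphs with a single reading each.
single = reading_vector({"sound_b": 1.0})
other = reading_vector({"sound_c": 1.0})

# The two-reading graph is partially close to a graph sharing one of its
# readings, and maximally distant from one sharing none.
print(cosine_distance(multi, single))
print(cosine_distance(multi, other))
```

with edit distance the two-reading graph would have to be forced into one reading before comparing; here the partial overlap shows up directly as a distance between 0 and 1.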

we could start by looking into libraries that do locality-sensitive hashing (LSH), like datasketch or the popular annoy. there's a great explanation of LSH here, and a detailed one related to document comparison in section 3.4.1 of Mining of Massive Datasets.
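for reference, here is a minimal pure-Python sketch of the MinHash-plus-banding flavor of LSH used for document comparison. it does not use the datasketch or annoy APIs; the function names, the character 3-shingles, and the 32 hashes / 16 bands split are all assumptions picked for illustration.

```python
import hashlib
from collections import defaultdict


def shingles(text, k=3):
    """Return the set of character k-grams in `text` (assumes len(text) >= k)."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def minhash_signature(items, num_hashes=32):
    """Build a MinHash signature: one minimum per seeded hash function."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{item}".encode("utf-8"),
                                digest_size=8).digest(),
                "big",
            )
            for item in items
        ))
    return signature


def lsh_candidates(signatures, bands=16):
    """Return pairs of names whose signatures agree on at least one band.

    Each signature is cut into `bands` equal slices; two documents become
    a candidate pair if any slice matches exactly. Rows left over when the
    signature length is not divisible by `bands` are ignored.
    """
    rows = len(next(iter(signatures.values()))) // bands
    buckets = defaultdict(set)
    for name, sig in signatures.items():
        for band in range(bands):
            key = (band, tuple(sig[band * rows:(band + 1) * rows]))
            buckets[key].add(name)
    pairs = set()
    for names in buckets.values():
        ordered = sorted(names)
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                pairs.add((first, second))
    return pairs


# Hypothetical passages: two near-duplicates and one unrelated string.
docs = {
    "doc1": "locality sensitive hashing finds similar items",
    "doc2": "locality sensitive hashing finds similar things",
    "doc3": "0123456789 9876543210",
}
signatures = {name: minhash_signature(shingles(text)) for name, text in docs.items()}
print(lsh_candidates(signatures))
```

the point for seeding is that candidate pairs come out of bucket collisions rather than an exact index lookup, so passages that only partly agree can still be picked up. more bands with fewer rows each makes the filter more permissive.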

thatbudakguy added the enhancement label

thatbudakguy added this to the v3.0 milestone

thatbudakguy commented Apr 19, 2024

if we use a vector database as suggested by #152, we could get this type of search built-in.
