Fuzzy finding #121

Open
EmilianoCostantini opened this issue Jul 9, 2019 · 7 comments

@EmilianoCostantini

Buku has interesting options involving search with regexps, such as:

  • buku --sany foo bar baz
  • buku --deep foo bar baz

It would be great if they could be leveraged from within the Bukubrow search box, perhaps through dedicated settings in the addon's Preferences.

Consider for instance a bookmark having

  • Gradle▶Repository▶Central

as a substring in either the title or the comments.
Such a bookmark should appear in the Bukubrow search results for each of the following keyword combinations:

  • Gradle▶Repository▶Central
  • Gradle Repository Central
  • Grad Repo Cent
  • grad repo cent
  • cent repo grad

just like it does in the ordinary bookmark sidebar's searchbox (Ctrl-B).
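
The expected behaviour above could be sketched roughly as follows. This is only an illustration, not Bukubrow's actual filtering code: the `Bookmark` shape, its field names, and the `matchesAllKeywords` helper are all assumptions made for the example.

```typescript
// Hypothetical bookmark shape; the field names are assumptions for illustration.
interface Bookmark {
  title: string;
  desc: string;
}

// True when every whitespace-separated keyword occurs, case-insensitively,
// somewhere in the title or the description, in any order.
function matchesAllKeywords(query: string, bookmark: Bookmark): boolean {
  const haystack = `${bookmark.title}\n${bookmark.desc}`.toLowerCase();
  const keywords = query
    .toLowerCase()
    .split(/\s+/)
    .filter((k) => k.length > 0);
  return keywords.every((k) => haystack.includes(k));
}

const example: Bookmark = {
  title: "Build tools",
  desc: "Gradle▶Repository▶Central mirror setup",
};

// The five keyword combinations listed above.
const queries = [
  "Gradle▶Repository▶Central",
  "Gradle Repository Central",
  "Grad Repo Cent",
  "grad repo cent",
  "cent repo grad",
];

// One boolean per query; each should be true for the example bookmark.
const results = queries.map((q) => matchesAllKeywords(q, example));
```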

@samhh
Owner

samhh commented Jul 10, 2019

Is this feature request essentially to add fuzzy matching? That's something I've wondered about.

@samhh changed the title from "Regexps in SearchBox" to "Fuzzy finding" on Jul 12, 2019

@EmilianoCostantini
Author

Fuzzy matching would be awesome, but it is even more taxing than what I had in mind.
My idea was to compute the keyword permutations deterministically, then turn each one into a regexp.
No stochasticity involved.

Say for instance the user enters in the searchbox the keywords:

  • xxxxxx yyyyyy zzzzzz wwwwww

That is 4 different words, which can be ordered in 4! (that is, 24) different ways:

  1. xxxxxx yyyyyy zzzzzz wwwwww
  2. xxxxxx yyyyyy wwwwww zzzzzz
  3. xxxxxx wwwwww zzzzzz yyyyyy
  4. wwwwww yyyyyy zzzzzz xxxxxx
  5. zzzzzz yyyyyy xxxxxx wwwwww
  6. ...

(and so on, up to the 24th possible permutation.)

Each permutation should be converted to a regexp:

  1. /([\s\S]*)xxxxxx([\s\S]*)yyyyyy([\s\S]*)zzzzzz([\s\S]*)wwwwww([\s\S]*)/i
  2. /([\s\S]*)xxxxxx([\s\S]*)yyyyyy([\s\S]*)wwwwww([\s\S]*)zzzzzz([\s\S]*)/i
  3. /([\s\S]*)xxxxxx([\s\S]*)wwwwww([\s\S]*)zzzzzz([\s\S]*)yyyyyy([\s\S]*)/i
  4. /([\s\S]*)wwwwww([\s\S]*)yyyyyy([\s\S]*)zzzzzz([\s\S]*)xxxxxx([\s\S]*)/i
  5. /([\s\S]*)zzzzzz([\s\S]*)yyyyyy([\s\S]*)xxxxxx([\s\S]*)wwwwww([\s\S]*)/i
  6. ...

Then each bookmark matching at least one of the 24 regexps in either title or description/comments should be added to results.

As you can check on RegExr, such a bookmark would be, for instance, one containing the substrings:

  • foobar--xxxxxx----foobar foobar foobar...YYYYYY...foobar foobar foobar****ZZzzzz**foobar foobar foobar__wwwWWW____foobar

regardless of both order and case, as long as every keyword is present.
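
A minimal sketch of the permutation approach described above, for illustration only. The helper names (`escapeRegExp`, `permutations`, `permutationRegexps`, `matchesAnyPermutation`) are hypothetical, and the leading and trailing `([\s\S]*)` groups are dropped because an unanchored `test` does not need them.

```typescript
// Escape regex metacharacters so that keywords are matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// All orderings of the input array (n! of them for n distinct items).
function permutations<T>(items: T[]): T[][] {
  if (items.length <= 1) return [items.slice()];
  const out: T[][] = [];
  items.forEach((item, i) => {
    const rest = items.slice(0, i).concat(items.slice(i + 1));
    permutations(rest).forEach((tail) => out.push([item].concat(tail)));
  });
  return out;
}

// One case-insensitive regexp per ordering, with [\s\S]* between keywords.
function permutationRegexps(keywords: string[]): RegExp[] {
  return permutations(keywords.map(escapeRegExp)).map(
    (order) => new RegExp(order.join("[\\s\\S]*"), "i"),
  );
}

// A text is a hit when at least one of the generated regexps matches it.
function matchesAnyPermutation(keywords: string[], text: string): boolean {
  return permutationRegexps(keywords).some((re) => re.test(text));
}

const sample =
  "foobar--xxxxxx----foobar foobar foobar...YYYYYY...foobar foobar foobar" +
  "****ZZzzzz**foobar foobar foobar__wwwWWW____foobar";

// Keywords entered in a different order than they occur in the sample.
const hit = matchesAnyPermutation(["wwwwww", "zzzzzz", "yyyyyy", "xxxxxx"], sample);
```

The number of regexps grows factorially with the number of keywords (24 for four words, 120 for five), which is the cost that any real implementation of this idea would have to keep in check.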


Even when made as efficient as possible, though, this kind of logic could noticeably hurt performance, so it may be worth making it optional through one or more dedicated items in the addon's Preferences page.

@samhh
Owner

samhh commented Jul 12, 2019

I have surprisingly little knowledge about algorithms so I'll have to do some research on what you've touched on in your comment before coming back to this ticket, but I would like to incorporate a feature like this at some point. It might be a while though, I need to prioritise performance issues with enormous databases.

@EmilianoCostantini
Author

You're doing a priceless job, really.
The Mozilla folks' uncanny decision to remove the 'Description' field has caused me some serious discomfort.
Buku coupled with your addon could really be a lifesaver for those who, like me, depend heavily on that field.

P.S. as you can see I've tried to move the feature request upstream; this seems to make more sense to me.

@samhh
Owner

samhh commented Jul 15, 2019

Ah, a heavy user of the description field, you are the perfect target audience for a question I've been pondering.

Which of these would be preferable to you?

  • Search name by default, and search the description specifically with > (present behaviour)
  • Search both name and description by default, with priority given to name matches
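
For concreteness, the second option might look something like the sketch below. It is an assumption-laden illustration rather than Bukubrow's code: the `Bookmark` shape and the `matchRank` and `search` helpers are hypothetical names.

```typescript
// Hypothetical bookmark shape; the field names are assumptions for illustration.
interface Bookmark {
  title: string;
  desc: string;
}

// 2 = the name (title) matches, 1 = only the description matches, 0 = no match.
function matchRank(query: string, b: Bookmark): number {
  const q = query.toLowerCase();
  if (b.title.toLowerCase().includes(q)) return 2;
  if (b.desc.toLowerCase().includes(q)) return 1;
  return 0;
}

// Search both fields, dropping non-matches and listing name matches first.
function search(query: string, bookmarks: Bookmark[]): Bookmark[] {
  return bookmarks
    .map((b) => ({ b, rank: matchRank(query, b) }))
    .filter((x) => x.rank > 0)
    .sort((x, y) => y.rank - x.rank)
    .map((x) => x.b);
}

const found = search("gradle", [
  { title: "Notes", desc: "gradle cache tips" },
  { title: "Gradle docs", desc: "" },
  { title: "Other", desc: "nothing relevant" },
]);
// found holds "Gradle docs" (name match) followed by "Notes" (description match).
```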

@EmilianoCostantini
Author

The name is almost negligible to me, so I would choose name and description without hesitation :)

@samhh
Owner

samhh commented Sep 1, 2019

Note to self: Given that there's work ongoing to try and move the filtering logic over to the host, when that happens it may be possible to easily implement fuzzy finding for each regex-matched group using a library like this. fzf (written in Go) is a good benchmark for how intuitive the matching can be.
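
As a rough idea of what fuzzy finding means at its simplest, here is a sketch of a case-insensitive subsequence test. It is not fzf's algorithm (fzf also scores and ranks its matches) and not the library mentioned above; `fuzzyMatch` is a hypothetical helper written for this example.

```typescript
// True when the characters of `query` appear in `text` in the same order,
// not necessarily adjacent, ignoring case.
function fuzzyMatch(query: string, text: string): boolean {
  const q = query.toLowerCase();
  const t = text.toLowerCase();
  let qi = 0; // index of the next query character still to be found
  for (let ti = 0; ti < t.length && qi < q.length; ti++) {
    if (t[ti] === q[qi]) qi++;
  }
  return qi === q.length;
}

const inOrder = fuzzyMatch("grdl", "Gradle"); // g, r, d, l occur in this order
const outOfOrder = fuzzyMatch("glr", "Gradle"); // no r after the l
```

A real fuzzy finder would additionally rank the candidates, for example by favouring consecutive characters or matches at word boundaries.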
