Fuzzy finding #121

EmilianoCostantini · 2019-07-09T19:31:07Z

Buku has interesting options involving search with regexps, such as:

buku --sany foo bar baz
buku --deep foo bar baz

It would be great if they could be leveraged from within Bukubrow searchbox, maybe by means of dedicated settings in the addon Preferences.

Consider for instance a bookmark having

Gradle▶Repository▶Central

as a substring in either title or comments.
Such a bookmark should appear in Bukubrow search results for each of the following keyword combinations:

Gradle▶Repository▶Central
Gradle Repository Central
Grad Repo Cent
grad repo cent
cent repo grad

just like it does in the ordinary bookmark sidebar's searchbox (Ctrl-B).
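The behaviour requested above can be read as "every keyword must occur somewhere in the text, ignoring case and order". A minimal sketch of that reading follows; the function name `matchesAllKeywords` is hypothetical and is not part of Bukubrow or Buku.

```typescript
// Hypothetical helper: true when every whitespace-separated keyword in
// `query` occurs in `text` as a case-insensitive substring, in any order.
const matchesAllKeywords = (query: string, text: string): boolean => {
  const haystack = text.toLowerCase();
  return query
    .toLowerCase()
    .split(/\s+/)
    .filter((keyword) => keyword !== "") // ignore leading/trailing whitespace
    .every((keyword) => haystack.indexOf(keyword) !== -1);
};

// The example bookmark text and the keyword combinations listed above.
const exampleText = "Gradle▶Repository▶Central";
const exampleQueries = [
  "Gradle▶Repository▶Central",
  "Gradle Repository Central",
  "Grad Repo Cent",
  "grad repo cent",
  "cent repo grad",
];
const exampleResults = exampleQueries.map((q) => matchesAllKeywords(q, exampleText));
```

Under this reading, each of the five queries matches the example text, while a query containing a keyword that is absent from the text does not.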


samhh · 2019-07-10T17:19:53Z

Is this feature request essentially to add fuzzy matching? That's something I've wondered about.

EmilianoCostantini · 2019-07-12T20:03:22Z

Fuzzy matching would be awesome, but even more taxing than what I was thinking.
My idea was to deterministically compute the permutations of the keywords, then turn each one into a regexp.
No stochasticity involved.

Say for instance the user enters in the searchbox the keywords:

xxxxxx yyyyyy zzzzzz wwwwww

That would be 4 different words, that can be ordered in 4! (that is, 24) different ways:

xxxxxx yyyyyy zzzzzz wwwwww
xxxxxx yyyyyy wwwwww zzzzzz
xxxxxx wwwwww zzzzzz yyyyyy
wwwwww yyyyyy zzzzzz xxxxxx
zzzzzz yyyyyy xxxxxx wwwwww
...

(and so on and so forth, up till the 24th possible permutation.)

Each permutation should be converted to regexp:

/([\s\S]*)xxxxxx([\s\S]*)yyyyyy([\s\S]*)zzzzzz([\s\S]*)wwwwww([\s\S]*)/i
/([\s\S]*)xxxxxx([\s\S]*)yyyyyy([\s\S]*)wwwwww([\s\S]*)zzzzzz([\s\S]*)/i
/([\s\S]*)xxxxxx([\s\S]*)wwwwww([\s\S]*)zzzzzz([\s\S]*)yyyyyy([\s\S]*)/i
/([\s\S]*)wwwwww([\s\S]*)yyyyyy([\s\S]*)zzzzzz([\s\S]*)xxxxxx([\s\S]*)/i
/([\s\S]*)zzzzzz([\s\S]*)yyyyyy([\s\S]*)xxxxxx([\s\S]*)wwwwww([\s\S]*)/i
...

Then each bookmark matching at least one of the 24 regexps in either title or description/comments should be added to results.

As you can check on RegExr, such a bookmark would be —for instance— one containing the substrings:

foobar--xxxxxx----foobar foobar foobar...YYYYYY...foobar foobar foobar****ZZzzzz**foobar foobar foobar__wwwWWW____foobar

regardless of both order and case, just as long as each and every keyword is there.
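The permutation approach described above could be sketched roughly as follows. The names `escapeRegExp`, `permutations`, `permutationRegExps` and `matchesSomeOrdering` are hypothetical. The sketch omits the leading and trailing `([\s\S]*)` groups, since an unanchored `test` makes them unnecessary, and it escapes regex metacharacters in the keywords, which is an added assumption.

```typescript
// Escape regex metacharacters so that keywords are matched literally.
const escapeRegExp = (s: string): string =>
  s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// All orderings of the input array (n! results for n distinct items).
function permutations<T>(xs: T[]): T[][] {
  if (xs.length <= 1) return [xs.slice()];
  const out: T[][] = [];
  xs.forEach((x, i) => {
    const rest = xs.slice(0, i).concat(xs.slice(i + 1));
    for (const p of permutations(rest)) out.push([x].concat(p));
  });
  return out;
}

// One case-insensitive regexp per keyword ordering, with "[\s\S]*"
// between consecutive keywords.
const permutationRegExps = (query: string): RegExp[] => {
  const keywords = query
    .split(/\s+/)
    .filter((k) => k !== "")
    .map(escapeRegExp);
  return permutations(keywords).map(
    (ordering) => new RegExp(ordering.join("[\\s\\S]*"), "i"),
  );
};

// A text matches when at least one of the orderings matches it.
const matchesSomeOrdering = (query: string, text: string): boolean =>
  permutationRegExps(query).some((re) => re.test(text));

const sample =
  "foobar--xxxxxx----foobar foobar foobar...YYYYYY...foobar foobar foobar" +
  "****ZZzzzz**foobar foobar foobar__wwwWWW____foobar";
const regexps = permutationRegExps("xxxxxx yyyyyy zzzzzz wwwwww");
const sampleMatches = matchesSomeOrdering("wwwwww zzzzzz yyyyyy xxxxxx", sample);
```

With four keywords this builds 24 regexps, and the sample string matches whatever order the keywords are typed in. The count grows factorially with the number of keywords, which is where the cost of this approach comes from.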

Even if made as efficient as possible, though, this kind of logic could noticeably impact performance; it would therefore be good to make it optional by means of dedicated item/s in the addon Preferences page.

samhh · 2019-07-12T21:39:53Z

I have surprisingly little knowledge about algorithms so I'll have to do some research on what you've touched on in your comment before coming back to this ticket, but I would like to incorporate a feature like this at some point. It might be a while though, I need to prioritise performance issues with enormous databases.

EmilianoCostantini · 2019-07-15T18:00:52Z

You're doing a priceless job, really.
The uncanny decision to remove the 'Description' field by Mozilla folks has caused me some serious discomfort.
Buku coupled with your addon could really be a life saver for those who, like me, heavily depend on such field.

P.S. as you can see I've tried to move the feature request upstream; this seems to make more sense to me.

samhh · 2019-07-15T19:06:13Z

Ah, a heavy user of the description field, you are the perfect target audience for a question I've been pondering.

Which of these would be preferable to you?

Search name by default, and search the description specifically with > (present behaviour)
Search both name and description by default, with priority given to name matches
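For illustration, the second option could look roughly like the sketch below. The `Bookmark` shape and the function `searchNameThenDescription` are hypothetical and do not reflect Bukubrow's actual data model; "priority" is interpreted here simply as listing name matches before description-only matches.

```typescript
// Hypothetical bookmark shape, for illustration only.
interface Bookmark {
  name: string;
  description: string;
}

// Search both fields case-insensitively; name matches are listed first,
// followed by bookmarks that match only in the description.
const searchNameThenDescription = (
  query: string,
  bookmarks: Bookmark[],
): Bookmark[] => {
  const q = query.toLowerCase();
  const nameMatches: Bookmark[] = [];
  const descriptionOnlyMatches: Bookmark[] = [];
  for (const b of bookmarks) {
    if (b.name.toLowerCase().indexOf(q) !== -1) {
      nameMatches.push(b);
    } else if (b.description.toLowerCase().indexOf(q) !== -1) {
      descriptionOnlyMatches.push(b);
    }
  }
  return nameMatches.concat(descriptionOnlyMatches);
};

const ranked = searchNameThenDescription("gradle", [
  { name: "Maven Central", description: "Also used by Gradle builds" },
  { name: "Gradle docs", description: "Build tool manual" },
  { name: "Unrelated", description: "Nothing to see" },
]).map((b) => b.name);
```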

EmilianoCostantini · 2019-07-17T16:50:13Z

The name is almost negligible to me, therefore I would choose name and description without hesitation :)

samhh · 2019-09-01T11:28:04Z

Note to self: Given that there's work ongoing to try and move the filtering logic over to the host, when that happens it may be possible to easily implement fuzzy finding for each regex-matched group using a library like this. fzf (written in Go) is a good benchmark for how intuitive the matching can be.
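As a rough idea of what fuzzy finding per keyword might mean, here is a minimal subsequence-matching sketch. It is not fzf's algorithm and not the library referred to above; it has no scoring or ranking, and the names `isSubsequence` and `fuzzyMatchesAllKeywords` are hypothetical.

```typescript
// True when every character of `pattern` occurs in `text` in the same
// order (case-insensitive), though not necessarily adjacent.
const isSubsequence = (pattern: string, text: string): boolean => {
  const p = pattern.toLowerCase();
  const t = text.toLowerCase();
  let matched = 0;
  for (let j = 0; j < t.length && matched < p.length; j++) {
    if (t[j] === p[matched]) matched++;
  }
  return matched === p.length;
};

// Each whitespace-separated keyword must fuzzily match the text;
// keyword order does not matter.
const fuzzyMatchesAllKeywords = (query: string, text: string): boolean =>
  query
    .split(/\s+/)
    .filter((k) => k !== "")
    .every((k) => isSubsequence(k, text));

const fuzzyHit = fuzzyMatchesAllKeywords("cntrl grdl", "Gradle▶Repository▶Central");
const fuzzyMiss = fuzzyMatchesAllKeywords("grdl xyz", "Gradle▶Repository▶Central");
```

A real fuzzy finder would also score candidates (for example favouring contiguous runs and word-boundary hits) so that results can be ranked; this sketch only answers yes or no.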

samhh changed the title ~~Regexps in SearchBox~~ Fuzzy finding Jul 12, 2019

EmilianoCostantini mentioned this issue Jul 15, 2019

Enhanced Regexp Search | Permutations, Fuzzy matching jarun/buku#404

Closed

samhh mentioned this issue Jan 1, 2021

searching: order should not matter #141

Open
