Future of Sphinx client-side search? #12419
-
Thanks for the great write up @wlach!
Also, @jayaddison, what are your general thoughts?
-
From some other research: the Sphinx Extension Survey (last updated in 2020) is a nice project and includes a few search-related extensions; however, they appear to be unmaintained nowadays: https://sphinxext-survey.readthedocs.io/en/latest/searchfind.html
-
I mentioned that I think the search functionality is fairly good, especially in comparison to other in-browser engines, but I didn't mention why. I think that:
I do generally prefer incremental solutions (and am very reluctant to drop any existing functionality), but I also think there are some opportunities to improve client-side search performance (ref #12045). At the moment I'm curious about https://github.com/tinysearch/tinysearch/; does anyone else have thoughts on or experience with it?
-
I believe that @kaycebasques has been doing some work on search usability, though I'm not sure it goes as far as the (needed) rewrite that Will mentions.
-
(continuing discussion from #12391)
Earlier this year, @jayaddison and I made various "improvements" to the client-side search implementation. Mostly this just meant fixing bugs and clearly incorrect code. For example:
#11959
#12041
#11957
Unfortunately, in at least some cases this actually made client-side search results worse. In particular, some of the bugs were hiding bad search results, which are now being displayed to the user (as seen in the above issue).
I did a bit of spelunking through the source code again to see what's going on, and I have to say I think the implementation is just not up to modern standards for a feature like this.
The base of it is a very naive term-based search (i.e. it just checks whether a token appears in the document at all), whereas best practice in information retrieval for decades has been to use at least term frequency–inverse document frequency (TF-IDF). On top of that, people appear to have layered various additions, such as title- and object-based search, to augment those results, but in a way that feels inconsistent and lacks cohesion. For example, there are two implementations of title-based search (one of which involves magic numbers) that run alongside each other, yielding duplicate results:
sphinx/themes/basic/static/searchtools.js, line 327 (commit 48cbb43)
sphinx/themes/basic/static/searchtools.js, line 490 (commit 48cbb43)
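To make the contrast concrete, here is a minimal Python sketch of presence-only matching versus TF-IDF ranking. It is not how searchtools.js or the Sphinx indexer actually work: the function names, the toy corpus, and the lowercase-and-split-on-whitespace tokenizer are hypothetical simplifications. It uses the classic tf × log(N/df) weighting; production engines typically go further (e.g. BM25, which adds document-length normalization).

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase and split on whitespace only.
    # A real indexer also needs stemming, stop words and per-language rules.
    return text.lower().split()


def presence_scores(query: str, docs: dict[str, str]) -> dict[str, int]:
    """Naive scoring: +1 for each query term that occurs in the document at all."""
    terms = set(tokenize(query))
    return {name: len(terms & set(tokenize(body))) for name, body in docs.items()}


def tfidf_scores(query: str, docs: dict[str, str]) -> dict[str, float]:
    """Score each document by summing tf * idf over the distinct query terms."""
    tokenized = {name: tokenize(body) for name, body in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df: Counter[str] = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))
    terms = set(tokenize(query))
    scores: dict[str, float] = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        score = 0.0
        for term in terms:
            if counts[term] == 0:
                continue  # term absent: contributes nothing
            tf = counts[term] / len(tokens)  # relative term frequency
            idf = math.log(n_docs / df[term])  # rarer terms weigh more
            score += tf * idf
        scores[name] = score
    return scores


# Toy corpus (hypothetical page names and text).
DOCS = {
    "install": "install sphinx with pip install sphinx",
    "search": "the search page uses the search index",
    "intro": "sphinx is a documentation generator",
}

if __name__ == "__main__":
    print(presence_scores("sphinx search", DOCS))  # {'install': 1, 'search': 1, 'intro': 1}
    scores = tfidf_scores("sphinx search", DOCS)
    print(sorted(DOCS, key=scores.get, reverse=True))  # ['search', 'install', 'intro']
```

With presence-only matching every page ties, whereas TF-IDF ranks the page containing the rarer term "search" first: "sphinx" appears in two of the three documents, so it carries less weight.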
I think part of the reason this functionality has languished for so long is its lack of visibility: most Sphinx sites are hosted on Read the Docs, which uses its own server-side search implementation. However, the CPython docs (which many people depend on) notably use the client-side version included with Sphinx, so it's still important that it work correctly. Other non-public sites that don't use Read the Docs also rely on this functionality. (This is actually how I became interested in this area: my current employer uses Sphinx extensively for internal documentation.)
To be honest, I'm a little scared to make more minor changes, as doing this type of thing correctly across languages is hard. Any small change is as likely to make things worse as better (as we saw above). Instead, I think a ground-up rewrite is called for.
An example of this done right might be mkdocs-material; you can clearly see that its author spent probably hundreds of hours tweaking and tuning the search implementation to make it work well.
I suspect the way forward is to build something new as a Sphinx extension (at least initially) so it can be tested and tuned without impacting the main repository. sphinxcontrib-lunrsearch is probably not exactly what we want (it just provides "search as you type" functionality in the search box), but it does show that this type of approach is possible.