I am using Sphinx for documentation, together with the MyST Parser.
My documentation is technical, so I often want to search for command-line options such as --run. However, the "-" (dash, minus) symbols are ignored in the search field, and I see lots of unrelated results that contain just the word "run".
I also tried quoting the term, as in '--run', but that did not help. I found out there is also an existing request for that:
would be helpful, such as not splitting words in quotes
Moreover, the "-" is treated as a separator (like a space), so if I search for --start-program, for example, I get unrelated results containing the word "starting".
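To make the behaviour concrete, here is a minimal Python sketch of how a `\w+`-style splitter treats these queries. It assumes a plain `\w+` pattern as a stand-in for the default word splitter; the helper name `split_default` is made up for illustration and is not part of Sphinx. The hits for "starting" are presumably a later effect of stemming the `start` token, which is not modelled here.

```python
import re

# Assumption: a plain ``\w+`` pattern stands in for the default word splitter.
DEFAULT_WORD_RE = re.compile(r"\w+")


def split_default(text: str) -> list[str]:
    """Return the runs of word characters found in ``text``."""
    return DEFAULT_WORD_RE.findall(text)


print(split_default("--run"))            # ['run']
print(split_default("--start-program"))  # ['start', 'program']
```

With this kind of splitting the dashes never reach the search itself, so a query for an option name behaves like a query for its individual words.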
The feature request is to make it possible to configure Sphinx so that it treats certain symbols as normal letters. For example, in conf.py:
I have tried adding "-" to the regular expression in searchtools.js#L167:

    /**
     * Default splitQuery function. Can be overridden in ``sphinx.search`` with a
     * custom function per language.
     *
     * The regular expression works by splitting the string on consecutive characters
     * that are not Unicode letters, numbers, underscores, or emoji characters.
     * This is the same as ``\W+`` in Python, preserving the surrogate pair area.
     */
    if (typeof splitQuery === "undefined") {
      var splitQuery = (query) => query
          .split(/[^\p{Letter}\p{Number}_\p{Emoji_Presentation}-]+/gu)
          .filter(term => term)  // remove remaining empty strings
    }
I also tried to change the regular expression that splits words in search/__init__.py#L71:

    _word_re = re.compile(r'[\w-]+')

which is used in the split method:

    def split(self, input: str) -> list[str]:
        """
        This method splits a sentence into words.  Default splitter splits input
        at white spaces, which should be enough for most languages except CJK
        languages.
        """
        return self._word_re.findall(input)
Changing these two patterns does not seem to be sufficient. I guess the "-" characters are also stripped from the search query somewhere else.
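One thing that may be worth ruling out is a mismatch between how the index was built and how the query is split. The sketch below is a toy inverted index, not Sphinx's implementation. The names `build_index` and `search` and the sample documents are invented for illustration, and whether this explains the behaviour reported here is only a guess.

```python
import re
from collections import defaultdict

PLAIN_RE = re.compile(r"\w+")      # splits on dashes
DASHED_RE = re.compile(r"[\w-]+")  # keeps dashes inside terms


def build_index(docs: dict[str, str], word_re: re.Pattern[str]) -> dict[str, set[str]]:
    """Map every term produced by ``word_re`` to the documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in docs.items():
        for term in word_re.findall(text.lower()):
            index[term].add(name)
    return dict(index)


def search(index: dict[str, set[str]], query: str, word_re: re.Pattern[str]) -> set[str]:
    """Return the documents that contain every term of the query."""
    terms = word_re.findall(query.lower())
    if not terms:
        return set()
    return set.intersection(*(index.get(term, set()) for term in terms))


# Hypothetical sample documents.
docs = {
    "cli": "Use --run to execute.",
    "intro": "How to run the tests.",
}

old_index = build_index(docs, PLAIN_RE)   # index built before the change
new_index = build_index(docs, DASHED_RE)  # index rebuilt after the change

print(sorted(search(old_index, "--run", PLAIN_RE)))   # ['cli', 'intro']
print(sorted(search(old_index, "--run", DASHED_RE)))  # []
print(sorted(search(new_index, "--run", DASHED_RE)))  # ['cli']
```

In this toy model the dashed query only finds the option once the index has been rebuilt with the same splitter, so changing only one side either changes nothing or returns no results at all.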
Would be glad to hear any suggestions.
Also, in the comment from above:
Default splitQuery function. Can be overridden in sphinx.search with a custom function per language.
My documentation is in English, so I guess that would still require a separate option.
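As a rough illustration of what such an option could do, independent of the language, the sketch below builds the word pattern from a configurable set of extra characters. The function `make_word_re` and its `extra_word_chars` parameter are hypothetical and are not an existing Sphinx setting or API.

```python
import re


def make_word_re(extra_word_chars: str = "") -> re.Pattern[str]:
    """Build a splitter pattern that treats ``extra_word_chars`` as letters."""
    # re.escape keeps characters such as "-" or "]" literal inside the class.
    return re.compile(rf"[\w{re.escape(extra_word_chars)}]+")


default_re = make_word_re()   # behaves like a plain \w+ splitter
cli_re = make_word_re("-")    # additionally treats "-" as a letter

print(default_re.findall("--start-program"))  # ['start', 'program']
print(cli_re.findall("--start-program"))      # ['--start-program']
print(cli_re.findall("a - b"))                # ['a', '-', 'b']
```

The last call shows a side effect of this simple approach: a free-standing dash becomes a term of its own, so a real option would probably need to filter such tokens or use a stricter pattern.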
picnixz changed the title from 'Add ability to treat "-" as a normal letter, to not split search term into several words' to '[search] Add ability to treat "-" as a normal letter, to not split search term into several words' on Jun 5, 2024
This is something I would agree with, but we could solve this issue by implementing quote-based matching. However, until we fix the current search algorithm, I don't think we should push for new features (or maybe we can?).