
400 status code / invalid query when using the " character #131

Open
mosheduminer opened this issue Jan 25, 2023 · 5 comments

Comments

@mosheduminer

Hitting the `indexes/{index}/search` endpoint with a query containing a `"` character:

```json
{
  "query": {
    "normal": {
      "ctx": "test\""
    }
  }
}
```

results in the response

```json
{"status":400,"data":"invalid query: SyntaxError(\"test\\\"\")"}
```

Maybe there's a decoding bug on my end? If so, it may be the HTTP library I'm using. I'm running the Docker image.
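One way to rule out a client-side encoding bug is to let a JSON library build the body and then decode it again. The sketch below uses Python's standard `json` module; it only reproduces the query from the request above and does not call the search endpoint. The decoded `ctx` value is `test"`, and the error text agrees: after JSON decoding, `data` reads `invalid query: SyntaxError("test\"")`, which appears to be a Rust debug-formatted string holding `test"`. That suggests the quote reaches the query parser intact.

```python
import json

# Build the request body with the encoder rather than hand-writing backslashes.
body = {"query": {"normal": {"ctx": 'test"'}}}
encoded = json.dumps(body)
print(encoded)  # {"query": {"normal": {"ctx": "test\""}}}

# Decode it again, as the server would: the value is test" with one quote.
decoded = json.loads(encoded)
ctx = decoded["query"]["normal"]["ctx"]
print(repr(ctx))  # 'test"'
```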

@ChillFish8
Collaborator

Hello! Sorry for the late response, I didn't see the notification :)

The issue is that your query is missing a closing `"`. The parser treats `"` as the start of a phrase query, so `"hello world"` matches exactly `hello world`, but a lone `"` on its own isn't valid query syntax (see https://docs.rs/tantivy/latest/tantivy/query/struct.QueryParser.html).
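A toy model of that behaviour, to make the failure mode concrete. This is not tantivy's actual grammar (which also handles field prefixes, boolean operators, and more); `parse_query` is a hypothetical name used only for illustration. A `"` opens a phrase that must be closed by a second `"`, so `"hello world"` parses as one phrase, while `test"` leaves a quote unmatched and is rejected.

```python
def parse_query(q: str) -> list[tuple[str, str]]:
    """Toy query parser (hypothetical): split q into ("term", word) and
    ("phrase", text) parts; an unmatched '"' raises SyntaxError."""
    parts = []
    i = 0
    while i < len(q):
        ch = q[i]
        if ch.isspace():
            i += 1
        elif ch == '"':
            # A quote opens a phrase; look for the matching closing quote.
            end = q.find('"', i + 1)
            if end == -1:
                raise SyntaxError(f"unclosed phrase in {q!r}")
            parts.append(("phrase", q[i + 1:end]))
            i = end + 1
        else:
            # A bare term runs until whitespace or the next quote.
            start = i
            while i < len(q) and not q[i].isspace() and q[i] != '"':
                i += 1
            parts.append(("term", q[start:i]))
    return parts


print(parse_query('"hello world"'))  # [('phrase', 'hello world')]
try:
    parse_query('test"')
except SyntaxError as err:
    print(err)  # unclosed phrase in 'test"'
```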

@mosheduminer
Author

mosheduminer commented Feb 15, 2023

Hi @ChillFish8! Thanks for the response. To clarify, does that mean there is no way to match text with quotes?

I'm asking because I have many texts where `"` appears in the middle of a word, and this is expected for the texts I'm dealing with: it indicates that the word is a contraction of multiple words, similar to how `'` is used in English in words like didn't.

@mosheduminer
Author

I guess I should open an issue in the tantivy repo requesting the ability to escape quotes?

@ChillFish8
Collaborator

> Thanks for the response. To clarify, does that mean there is no way to match text with quotes?

So technically you could support it in the parser, but it won't behave how you expect it to.

Under the hood, words like that get split up: `didn't` becomes `didn` and `t`, and `test"ing` becomes `test` and `ing`. The tokenizer removes special characters like these.
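A rough approximation of that splitting, assuming the field uses tantivy's default text analysis. As far as I know that is a `SimpleTokenizer` (split on any non-alphanumeric character) followed by filters such as `LowerCaser`; the sketch below reproduces only the split-and-lowercase part and ignores details like the filter that drops very long tokens. `simple_tokenize` is a hypothetical name.

```python
def simple_tokenize(text: str) -> list[str]:
    """Approximation (hypothetical helper): split on every non-alphanumeric
    character and lowercase each resulting token."""
    tokens = []
    current = []
    for ch in text:
        if ch.isalnum():
            current.append(ch)
        elif current:
            # Any other character (quote, apostrophe, space...) ends a token.
            tokens.append("".join(current).lower())
            current = []
    if current:
        tokens.append("".join(current).lower())
    return tokens


print(simple_tokenize("didn't"))    # ['didn', 't']
print(simple_tokenize('test"ing'))  # ['test', 'ing']
```

Because the quote never ends up inside an indexed token, teaching the query parser to accept an escaped `"` would not by itself give you a single `test"ing` term to match.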

@ChillFish8
Collaborator

If you're looking for a specific word and don't want that behaviour, you'd need to use the `string` field type, which doesn't do any tokenizing, and then match the entire value using a term query.
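The difference between the two field types can be sketched with a toy inverted index. This is plain Python, not lnx or tantivy code, and every function name is hypothetical: `tokenize_text` loosely approximates a tokenized text field, `tokenize_string` models an untokenized `string` field by indexing the whole value as one term, and `term_query` is an exact term lookup with no analysis applied.

```python
import re
from collections import defaultdict


def tokenize_text(value: str) -> list[str]:
    # Rough stand-in for a tokenized text field: runs of word characters
    # (\w also keeps '_'), lowercased. Quotes are dropped.
    return [t.lower() for t in re.findall(r"\w+", value)]


def tokenize_string(value: str) -> list[str]:
    # An untokenized string field: the whole value is a single term.
    return [value]


def build_index(docs, tokenize):
    """Map each indexed term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        for term in tokenize(value):
            index[term].add(doc_id)
    return index


def term_query(index, term):
    # Exact lookup of the term as given; returns matching document ids.
    return index.get(term, set())


docs = {1: 'test"ing', 2: "testing", 3: "test ing"}
text_index = build_index(docs, tokenize_text)
string_index = build_index(docs, tokenize_string)

print(term_query(text_index, 'test"ing'))    # set(): the quote was stripped at index time
print(term_query(text_index, "test"))        # {1, 3}: docs 1 and 3 look identical here
print(term_query(string_index, 'test"ing'))  # {1}: exact match on the whole value
```

The trade-off in this model is that the `string` field only matches the complete stored value, so a longer document that merely contains `test"ing` among other words would not be found by a term query for `test"ing`.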
