Improve single language detection when words in other languages are quoted #112

Open
schrmh opened this issue Jan 17, 2023 · 2 comments
Labels
enhancement New feature or request

Comments


schrmh commented Jan 17, 2023

When I pass in German sentences that contain quoted Japanese words, lingua sometimes claims the text is 100% Japanese.
For example:
`Wir stoßen an: "かんぱい". Er lächelte.` (in English, if you are interested: »We toasted: "kanpai". He smiled«) yields a ConfidenceValue of 1.0 for Japanese, while `Wir stoßen an. Er lächelte.` yields a ConfidenceValue of 0.6014287047855706 for German and 0.0 for Japanese (I included all languages for detection).

The expected result in both cases is German, perhaps with a slight Japanese confidence in the first case since a Japanese word is quoted, but it should not be 100% Japanese.
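
To make the expectation concrete, here is a minimal sketch that measures how much of the example sentence is written in each script. It uses only the Python standard library and is not how lingua computes its confidence values. The helper names `script_of` and `script_shares` and the coarse script labels are assumptions made up for this illustration.

```python
import unicodedata
from collections import Counter


def script_of(ch: str) -> str:
    """Return a coarse script label derived from the Unicode character name."""
    name = unicodedata.name(ch, "")
    if name.startswith(("HIRAGANA", "KATAKANA", "CJK")):
        return "JAPANESE_OR_CJK"
    if name.startswith("LATIN"):
        return "LATIN"
    return "OTHER"


def script_shares(text: str) -> dict[str, float]:
    """Share of alphabetic characters per coarse script (hypothetical helper)."""
    # Punctuation, digits, and whitespace are ignored; only letters are counted.
    counts = Counter(script_of(ch) for ch in text if ch.isalpha())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {script: n / total for script, n in counts.items()}


shares = script_shares('Wir stoßen an: "かんぱい". Er lächelte.')
print(shares)
```

For the example sentence, 21 of the 25 letters are Latin and only 4 are Hiragana, so a result that takes the share of each script into account would still favor German over Japanese.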

@pemistahl pemistahl changed the title Increase single language detection when words in other languages are quoted Improve single language detection when words in other languages are quoted Jan 19, 2023
@pemistahl pemistahl added the enhancement New feature or request label Jan 19, 2023
@pemistahl
Owner

Thanks for reaching out to me. I will try to improve language detection for inputs like yours, even though it's not a trivial problem to solve.

@datatalking

@pemistahl If you could point me to the general area of the code, I could look at a few options and test adding this feature.

3 participants