
Characters are replaced from the original string #1142

Open
yarnping opened this issue Aug 27, 2024 · 4 comments

@yarnping

Hi team

We're using this to split text into sentences, but we found that some characters are replaced after splitting,

e.g., characters 32 (space) and 160 (non-breaking space).
How can I keep the original characters? I need to compare the output against the original text.
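One possible workaround, sketched under the assumption that the splitter only changes whitespace (for example U+00A0 becoming U+0020) and returns sentences in their original order: search for each returned sentence in the original text, letting any run of whitespace match any other, and keep the exact slice that matched. `recoverOriginalSentences` and `escapeRegExp` are hypothetical helpers, not part of compromise. The approach relies on JavaScript's `\s` class matching U+00A0 as well as the ordinary space.

```javascript
// Escape characters that have a special meaning inside a RegExp pattern.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper (not a compromise API): for each sentence, find its
// occurrence in the original text, treating any whitespace run as equivalent,
// and return the exact original slice (or null if it cannot be aligned).
function recoverOriginalSentences(original, sentences) {
  const recovered = [];
  let cursor = 0; // search forward only, assuming sentences are in order
  for (const sentence of sentences) {
    // "Hello world." becomes the pattern Hello\s+world\.
    const pattern = sentence
      .trim()
      .split(/\s+/)
      .map(escapeRegExp)
      .join('\\s+');
    const re = new RegExp(pattern, 'g');
    re.lastIndex = cursor; // with the g flag, exec() starts searching here
    const match = re.exec(original);
    if (match === null) {
      recovered.push(null);
      continue;
    }
    recovered.push(match[0]); // exact characters from the original text
    cursor = match.index + match[0].length;
  }
  return recovered;
}

// Assumed sample input: non-breaking spaces that a splitter normalized away.
const original = 'Hello\u00A0world. Second\u00A0one.';
const recovered = recoverOriginalSentences(original, ['Hello world.', 'Second one.']);
console.log(recovered[0] === 'Hello\u00A0world.'); // true
```

Searching forward from the end of the previous match keeps repeated sentences from all mapping to the first occurrence.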

@spencermountain
Owner

hey yarnping - sure, I'm happy to help. You're right, it should never lose or change characters when splitting sentences.
Can you create an example of it failing?
thanks

@yarnping
Author

yarnping commented Aug 29, 2024

Thank you, here's the screenshot from Sublime Text:

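<!-- load compromise, which defines the global nlp() -->
<script src="https://unpkg.com/compromise"></script>
<script>
  const text = "“I . . . maybe. I must say, the line between excellent career choice and critical life screwup is getting a bit blurry.”";
  const doc = nlp(text);
  // split into sentences and return each one as a plain string
  const sentences = doc.sentences().out('array');
  // print the input and the first sentence so they can be compared
  console.log(text);
  console.log(sentences[0]);
</script>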
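To see exactly which characters differ between `text` and `sentences[0]`, a small diagnostic like the following could help. `findCharDifferences` and `hex` are hypothetical helpers (not compromise APIs) that compare two strings index by index and report the code points that disagree; this assumes characters were substituted rather than inserted or removed, so positions stay aligned. The sample strings are assumed inputs, not the reporter's actual text.

```javascript
// Hypothetical diagnostic helper: walk two strings in parallel and report
// every UTF-16 index where the characters differ.
function findCharDifferences(original, output) {
  const diffs = [];
  const length = Math.min(original.length, output.length);
  for (let i = 0; i < length; i++) {
    const a = original.codePointAt(i);
    const b = output.codePointAt(i);
    if (a !== b) {
      diffs.push({ index: i, original: a, output: b });
    }
  }
  return diffs;
}

// Format a code point as U+XXXX for readable output.
const hex = (cp) => 'U+' + cp.toString(16).toUpperCase().padStart(4, '0');

// Assumed example: a non-breaking space normalized to a plain space.
const before = 'I\u00A0. . . maybe.';
const after = 'I . . . maybe.';
for (const d of findCharDifferences(before, after)) {
  console.log(`index ${d.index}: ${hex(d.original)} -> ${hex(d.output)}`);
}
// prints: index 1: U+00A0 -> U+0020
```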

@spencermountain
Owner

hey yarnping, I think the unicode characters that were giving you trouble may be missing from your example text. This case works as expected for me:

nlp(`I . . . maybe. I must say, the line between `).debug()

Maybe the GitHub UI cleaned them up somehow? Let me know if I can help reproduce this problem.
thanks
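Since invisible characters can be lost when text is pasted into an issue, one way to make a reproduction survive is to share it as a JavaScript string literal with every non-ASCII character written as an escape. The helper below, `toEscapedLiteral`, is a hypothetical sketch for producing such a literal; its output could be pasted directly into a call like the `nlp(...)` one above.

```javascript
// Hypothetical helper: rewrite every character outside printable ASCII
// (plus " and \) as an escape, so invisible characters such as U+00A0
// cannot be lost or normalized when the string is copied around.
function toEscapedLiteral(str) {
  let out = '';
  for (const ch of str) { // for...of iterates by code point
    const cp = ch.codePointAt(0);
    if (cp >= 0x20 && cp <= 0x7e && ch !== '"' && ch !== '\\') {
      out += ch; // plain printable ASCII is kept as-is
    } else if (cp <= 0xffff) {
      out += '\\u' + cp.toString(16).toUpperCase().padStart(4, '0');
    } else {
      out += '\\u{' + cp.toString(16).toUpperCase() + '}'; // astral code points
    }
  }
  return '"' + out + '"';
}

console.log(toEscapedLiteral('\u201CI\u00A0. . . maybe.'));
// prints: "\u201CI\u00A0. . . maybe."
```

Because the result is a valid JavaScript string literal, evaluating it gives back the exact original characters.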

@yarnping
Author

Sure, here's the example text: example.txt

