Unicode error at line #31 in embeddings.py #23

Open
sawan16 opened this issue Mar 27, 2019 · 3 comments

sawan16 commented Mar 27, 2019

UnicodeEncodeError: 'utf-8' codec can't encode character '\udcf6' in position 0: surrogates not allowed

@artetxem
Owner

This obviously looks like an encoding problem, but I would need more details to know where it happens. Please report the full stack trace.

@SouravDutta91

The 'utf-8' codec sometimes fails when encoding or decoding certain symbols or letters. In those cases, you can either ignore such errors by passing errors='ignore' along with the encoding, or try a different encoding such as latin-1 (also known as ISO-8859-1). Hope this helps.
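For illustration, here is a minimal sketch of one way the reported message can arise and of what the two suggestions do. It assumes the offending input contains the byte 0xF6 ("ö" in Latin-1) and that the text was decoded with the `surrogateescape` error handler at some point. Neither is confirmed by the report, and the variable names are made up for the example.

```python
# Assumption: the input holds the single byte 0xF6, which is "ö" in Latin-1
# but is not a valid UTF-8 sequence on its own.
raw = "ö".encode("latin-1")  # b'\xf6'

# Decoding as UTF-8 with errors="surrogateescape" keeps the undecodable byte
# as the lone surrogate U+DCF6, the character named in the error message.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding that string back to UTF-8 with the default strict handler fails.
try:
    text.encode("utf-8")
    failed = False
    message = ""
except UnicodeEncodeError as exc:
    failed = True
    message = str(exc)

# Suggestion 1: errors="ignore" silently drops the undecodable byte.
ignored = raw.decode("utf-8", errors="ignore")

# Suggestion 2: decode with the encoding the data was actually written in.
as_latin1 = raw.decode("latin-1")

print(repr(text), failed, message)
print(repr(ignored), repr(as_latin1))
```

Note that `errors='ignore'` discards data, so the affected words end up shortened or empty. Choosing the right encoding, when it is known, preserves them.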

@suman101112

The input embedding file is not in the correct format. Use model.save_word2vec_format(filename) to save the fastText or word2vec model.
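As a rough sketch of what that format looks like, assuming the loader expects the word2vec text format (a header line with the vocabulary size and dimension, then one line per word followed by its values): the helpers `write_word2vec_text` and `read_word2vec_text` below are hypothetical and are not part of this project or of gensim.

```python
import io


def write_word2vec_text(vectors, stream):
    """Write a {word: [floats]} mapping in word2vec text format (hypothetical helper)."""
    dim = len(next(iter(vectors.values())))
    # Header line: "<vocabulary size> <dimension>"
    stream.write(f"{len(vectors)} {dim}\n")
    for word, vec in vectors.items():
        stream.write(word + " " + " ".join(f"{x:.4f}" for x in vec) + "\n")


def read_word2vec_text(stream):
    """Read the same format back, checking each row against the header."""
    count, dim = (int(n) for n in stream.readline().split())
    vectors = {}
    for _ in range(count):
        word, *values = stream.readline().rstrip("\n").split(" ")
        if len(values) != dim:
            raise ValueError(f"expected {dim} values for {word!r}, got {len(values)}")
        vectors[word] = [float(v) for v in values]
    return vectors


# Round trip through an in-memory text buffer instead of a real file.
buffer = io.StringIO()
write_word2vec_text({"hello": [0.1, 0.2], "wörld": [0.3, 0.4]}, buffer)
buffer.seek(0)
loaded = read_word2vec_text(buffer)
print(loaded)
```

A file saved in a binary or library-specific format would not parse this way, which may be why reading it as UTF-8 text produces undecodable bytes.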
