
About text embedding from text tokens. #48

Closed
ZhaojunCP opened this issue Jul 5, 2024 · 1 comment

Comments

ZhaojunCP commented Jul 5, 2024

Thank you for your work! May I ask whether you pass the tokens of the text directly into text-embedding-ada-002? As far as I know, text-embedding-ada-002 expects a string rather than a list of integer token IDs. Could you explain? Thank you.

DVampire (Collaborator) commented Jul 6, 2024

Our OpenAI LLM provider primarily follows LangChain's implementation. The specific reference code and reasons are as follows:

  1. The LangChain code is https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/embeddings/openai.py.
    At line 397, LangChain notes that it primarily follows OpenAI's cookbook.
  2. The OpenAI cookbook notebook is https://github.com/openai/openai-cookbook/blob/main/examples/Embedding_long_inputs.ipynb. It is written this way primarily to handle embedding texts that are longer than the model's maximum context length: the text is tokenized, the token list is split into chunks, and each chunk is embedded. The embeddings endpoint also accepts lists of token IDs (not only strings), which is what the cookbook relies on.

Please refer to the above code. If you have any questions, feel free to contact us. Thanks.
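For reference, here is a minimal sketch of the chunk-and-average approach the cookbook uses for over-length inputs. The `embed_chunk` callable stands in for the real API call (e.g. the embeddings endpoint invoked on a token list); the chunk size, weighting, and normalization mirror the cookbook's approach, but the exact helper names here are illustrative, not the actual cookbook or LangChain API.

```python
import math

MAX_TOKENS = 8191  # ada-002's context limit, as used in the cookbook


def chunk_tokens(tokens, chunk_size=MAX_TOKENS):
    """Split a list of token IDs into consecutive chunks of at most chunk_size."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def embed_long_text(tokens, embed_chunk, chunk_size=MAX_TOKENS):
    """Embed an over-length token list.

    Each chunk is embedded separately (embed_chunk stands in for the API call),
    then the chunk embeddings are combined by a length-weighted average and
    L2-normalized, following the cookbook's recipe.
    """
    chunks = chunk_tokens(tokens, chunk_size)
    embeddings = [embed_chunk(c) for c in chunks]  # each: list[float]
    weights = [len(c) for c in chunks]
    total = sum(weights)
    dim = len(embeddings[0])
    avg = [sum(e[j] * w for e, w in zip(embeddings, weights)) / total
           for j in range(dim)]
    norm = math.sqrt(sum(x * x for x in avg)) or 1.0
    return [x / norm for x in avg]
```

For example, with a chunk size of 4, a 10-token input is split into chunks of lengths 4, 4, and 2, each chunk is embedded, and the results are averaged with weights 4/10, 4/10, and 2/10 before normalization.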
