
About text embedding from text tokens. #48

Closed
ZhaojunCP opened this issue Jul 5, 2024 · 1 comment

Comments

ZhaojunCP commented Jul 5, 2024

Thank you for your work! May I ask whether you pass the tokens of the text directly into text-embedding-ada-002? As far as I know, text-embedding-ada-002 expects a string rather than a list of integer token IDs. Could you explain? Thank you.

DVampire (Collaborator) commented Jul 6, 2024

Our OpenAI LLM provider primarily follows LangChain's implementation. The specific reference code and reasons are as follows:

  1. The LangChain code is https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/embeddings/openai.py.
    At line 397, LangChain notes that it primarily follows OpenAI's cookbook.
  2. The OpenAI cookbook notebook is https://github.com/openai/openai-cookbook/blob/main/examples/Embedding_long_inputs.ipynb. It is written this way primarily to handle embedding texts that are longer than the model's maximum context length: the text is tokenized, the token list is split into chunks, and each chunk is embedded. The embeddings endpoint also accepts lists of token IDs (not only strings), which is what the cookbook relies on.

Please refer to the above code. If you have any questions, feel free to contact us. Thanks.
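For reference, here is a minimal sketch of the chunk-and-average approach the cookbook uses for over-length inputs. The `embed_chunk` callable stands in for the real API call (e.g. the embeddings endpoint invoked on a token list); the chunk size, weighting, and normalization mirror the cookbook's approach, but the exact helper names here are illustrative, not the actual cookbook or LangChain API.

```python
import math

MAX_TOKENS = 8191  # ada-002's context limit, as used in the cookbook


def chunk_tokens(tokens, chunk_size=MAX_TOKENS):
    """Split a list of token IDs into consecutive chunks of at most chunk_size."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def embed_long_text(tokens, embed_chunk, chunk_size=MAX_TOKENS):
    """Embed an over-length token list.

    Each chunk is embedded separately (embed_chunk stands in for the API call),
    then the chunk embeddings are combined by a length-weighted average and
    L2-normalized, following the cookbook's recipe.
    """
    chunks = chunk_tokens(tokens, chunk_size)
    embeddings = [embed_chunk(c) for c in chunks]  # each: list[float]
    weights = [len(c) for c in chunks]
    total = sum(weights)
    dim = len(embeddings[0])
    avg = [sum(e[j] * w for e, w in zip(embeddings, weights)) / total
           for j in range(dim)]
    norm = math.sqrt(sum(x * x for x in avg)) or 1.0
    return [x / norm for x in avg]
```

For example, with a chunk size of 4, a 10-token input is split into chunks of lengths 4, 4, and 2, each chunk is embedded, and the results are averaged with weights 4/10, 4/10, and 2/10 before normalization.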
