-
Hi @vontainment, TextEmbedding does everything in synchronous mode; it performs CPU-bound operations and can be considered a kind of primitive. Multiple techniques might be applied to increase throughput. P.S. `parallel` is useful when you have a large amount of data to compute embeddings for. Regarding the threaded code: I suppose you're writing about the thread settings for the onnx session; those are internal threads of the onnx session and help make computations more efficient in/between operators (more on this)
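Since the embedding call is synchronous and CPU-bound, one common technique in an async server is to offload it to a worker thread so the event loop stays responsive. A minimal sketch of that pattern, assuming a stand-in `fake_embed` function in place of a real model call (the actual embedding would come from `TextEmbedding.embed`, which is not imported here):

```python
import asyncio
import hashlib

# Stand-in for a synchronous, CPU-bound embedding call such as
# TextEmbedding.embed (hypothetical: real vectors come from the model).
def fake_embed(text: str) -> list[float]:
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255 for b in digest[:4]]

async def embed_endpoint(text: str) -> list[float]:
    # Offload the blocking call to a worker thread; concurrent requests
    # can then overlap instead of blocking the event loop one at a time.
    return await asyncio.to_thread(fake_embed, text)

async def main() -> None:
    # Ten "simultaneous" requests, as in the question below.
    results = await asyncio.gather(
        *(embed_endpoint(f"doc {i}") for i in range(10))
    )
    print(len(results), len(results[0]))

if __name__ == "__main__":
    asyncio.run(main())
```

In a FastAPI app the same effect can be had by declaring the endpoint with plain `def` (FastAPI then runs it in its own threadpool) or by using `asyncio.to_thread` inside an `async def` endpoint as above.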
-
So I use this in an async FastAPI app. My question is: does TextEmbedding handle requests concurrently, or does it process one at a time? I see there are options for parallel and for threads in the code. Can these be used to allow vectors to be created concurrently?
I guess my main question is: does using TextEmbedding with one of the models mean it can process concurrent requests? Or does it require a particular setting to be added, or does it literally produce one embedding at a time? I know you can send multiple texts to be vectorized in a single request, but what I'm asking is more like: if 10 people were requesting embeddings from the API, given appropriate resources, will it create those embeddings concurrently or one at a time? And if it can run concurrently, is that the default, or do I have to change or add a setting somewhere?
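The "parallel" option mentioned in the reply follows the general pattern of splitting a large batch across worker processes. A generic sketch of that idea under stated assumptions: `embed_chunk` is a hypothetical stand-in for a per-chunk embedding call, not fastembed's actual implementation, and the chunking scheme here is illustrative only.

```python
from concurrent.futures import ProcessPoolExecutor

# Stand-in CPU-bound embedding of a chunk of documents (hypothetical;
# a real implementation would run the model over each chunk).
def embed_chunk(docs: list[str]) -> list[list[float]]:
    return [[float(len(d)), float(sum(map(ord, d)))] for d in docs]

def embed_parallel(docs: list[str], workers: int = 2) -> list[list[float]]:
    # Split the batch into roughly one chunk per worker and embed the
    # chunks in separate processes, sidestepping the GIL for CPU work.
    size = max(1, len(docs) // workers)
    chunks = [docs[i:i + size] for i in range(0, len(docs), size)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = pool.map(embed_chunk, chunks)
    # Flatten back into one vector per input document, order preserved.
    return [vec for chunk in results for vec in chunk]

if __name__ == "__main__":
    docs = [f"document {i}" for i in range(8)]
    print(len(embed_parallel(docs)))  # one vector per document
```

This kind of process-level parallelism pays off for large offline batches; for many small online requests, per-request offloading to threads (plus the onnx session's internal threads) is usually the more relevant knob.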