nlp.pipe() is not faster than processing by example #5431
-
Hello! I need to process incoming texts from one queue and send the results to another queue. Since processing one example at a time is too slow, I considered using the nlp.pipe() function, but as it returns a generator I have to iterate over it to get the actual results, and that doesn't give me any time saving: timing the one-by-one loop and the nlp.pipe() version gives essentially the same run time.
Is this expected, or am I just not using it right?
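Roughly, the two patterns I'm comparing look like this (a simplified sketch; the model name and texts are placeholders, not my actual code):

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # placeholder model
texts = ["An incoming message pulled from the queue."] * 32  # synthetic input

# One text at a time: every call runs the full pipeline on a single doc.
docs_one_by_one = [nlp(text) for text in texts]

# Batched: nlp.pipe() returns a generator, so the work only happens as
# the results are consumed (here, by list()).
docs_batched = list(nlp.pipe(texts))
```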
Replies: 5 comments
-
32 texts probably isn't enough to see a whole lot of benefit from using nlp.pipe(). It does depend a lot on the language/components, but using a much larger number of texts, try comparing the run time just for the spaCy calls, nlp(text) vs. nlp.pipe(), and I think you will see a clearer difference.
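For example, something along these lines, where the model, text, and sizes are just placeholders:

```python
import time

import spacy

nlp = spacy.load("en_core_web_sm")  # placeholder model
texts = ["This is a sample sentence to process."] * 10000  # synthetic corpus

# Time the one-by-one loop.
start = time.time()
for text in texts:
    doc = nlp(text)
print("nlp(text), one by one:", time.time() - start)

# Time the batched version; iterating the generator is what triggers the work.
start = time.time()
for doc in nlp.pipe(texts, batch_size=1000):
    pass
print("nlp.pipe():", time.time() - start)
```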
-
Thank you for your reply @adrianeboyd, but I also tried with sizes of 512 and 1024 texts, and the time increased linearly, just as it does without nlp.pipe().
-
Even if the model is suspiciously slow at processing, my main question is still about the difference between batch processing and processing one by one, which is nonexistent in my case.
-
What components do you have in your pipeline?
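You can check with nlp.pipe_names, e.g. (en_core_web_sm is just a placeholder model):

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # placeholder model
print(nlp.pipe_names)  # e.g. ['tagger', 'parser', 'ner']
print(nlp.pipeline)    # (name, component) pairs, including custom components
```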
-
@adrianeboyd you were right! The problem was my custom BPE tokenizer. It wasn't efficient, so I replaced it with one that supports batch processing, and now I'm so happy to see the difference. THANK YOU!
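For reference, a rough sketch of what a batch-capable custom tokenizer can look like. This uses the Hugging Face tokenizers package; the class name, vocab path, and other details are made up rather than the actual code from this thread, and how much nlp.pipe() picks up a tokenizer's pipe() method depends on the spaCy version, so treat it only as an illustration of the batching idea:

```python
from spacy.tokens import Doc
from spacy.util import minibatch
from tokenizers import Tokenizer as HFTokenizer


class BatchBPETokenizer:
    """Illustrative custom tokenizer that can encode texts in batches."""

    def __init__(self, vocab, bpe_path="bpe-tokenizer.json"):  # hypothetical path
        self.vocab = vocab
        self.bpe = HFTokenizer.from_file(bpe_path)

    def __call__(self, text):
        # Single-text path: encode one string per call.
        encoding = self.bpe.encode(text)
        return Doc(self.vocab, words=encoding.tokens)

    def pipe(self, texts, batch_size=1000):
        # Batch path: encode_batch() tokenizes many texts in one call,
        # which is where the speedup over per-text encoding comes from.
        for batch in minibatch(texts, size=batch_size):
            for encoding in self.bpe.encode_batch(list(batch)):
                yield Doc(self.vocab, words=encoding.tokens)


# Usage sketch: nlp.tokenizer = BatchBPETokenizer(nlp.vocab, "bpe-tokenizer.json")
```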