Fast path for already-vectorized inputs (e.g. pyarrow string arrays) #13661
NickCrews started this conversation in New Features & Project Ideas
I am working with text data in duckdb, doing bulk operations to perform NER on text. I'd like to make these as fast as possible.
In duckdb, you can register Python UDFs. You can tell duckdb to call your UDF either with a single Python value per row (a `str` in my case) or with a whole pyarrow array of values.

Currently, spaCy only supports a scalar inference API, `nlp(str)`, or a batch API, `nlp.pipe(Iterable[str])`. If I want to use the batch API (which I assume will be faster), I still have to pay the cost of translating the pyarrow array into an `Iterable[str]`.

Several questions:
Thank you!