How to use multiple regex patterns with the Normalizer in Spark NLP? #2599
Unanswered · SameekshaS asked this question in Q&A
I am working with a PySpark DataFrame. I need to compute TF-IDF, and for the prior steps (tokenizing, normalization, etc.) I am using Spark NLP.
I have a DataFrame that looks like this after applying the Tokenizer:
The next step is to apply the Normalizer.
I want to set multiple cleanup patterns. So far,
cleanup = ["[^A-Za-z]"]
fulfils the first condition, but I don't understand how to apply the second one. I tried this:
Help would be much appreciated!
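For context, the Normalizer's cleanupPatterns parameter accepts a Python list, so several regexes can be supplied at once, and each one is applied to every token. Below is a minimal end-to-end sketch of the pipeline described above, feeding into Spark ML's CountVectorizer and IDF for the TF-IDF step. The sample data, column names, and the second regex are assumptions for illustration, not the asker's actual patterns:

```python
import sparknlp
from sparknlp.base import DocumentAssembler, Finisher
from sparknlp.annotator import Tokenizer, Normalizer
from pyspark.ml import Pipeline
from pyspark.ml.feature import CountVectorizer, IDF

spark = sparknlp.start()

# Toy input; the column name "text" is an assumption.
df = spark.createDataFrame(
    [("The quick brown fox, 42 times!!",), ("Another short doc.",)],
    ["text"],
)

document_assembler = DocumentAssembler().setInputCol("text").setOutputCol("document")
tokenizer = Tokenizer().setInputCols(["document"]).setOutputCol("token")

# cleanupPatterns takes a list: every regex in it is applied to each token,
# and whatever matches is removed. The second pattern here is only an example.
normalizer = (
    Normalizer()
    .setInputCols(["token"])
    .setOutputCol("normalized")
    .setLowercase(True)
    .setCleanupPatterns(["[^A-Za-z0-9]", "[0-9]+"])
)

# Finisher turns the annotations back into a plain array<string> column,
# which Spark ML's feature transformers can consume directly.
finisher = Finisher().setInputCols(["normalized"]).setOutputCols(["finished"])

cv = CountVectorizer(inputCol="finished", outputCol="tf")
idf = IDF(inputCol="tf", outputCol="tfidf")

pipeline = Pipeline(stages=[document_assembler, tokenizer, normalizer, finisher, cv, idf])
model = pipeline.fit(df)
model.transform(df).select("finished", "tfidf").show(truncate=False)
```

The patterns are applied in order, so the list above first strips non-alphanumeric characters and then digit runs. Conditions that cannot be expressed as "delete whatever matches" are better handled elsewhere in the pipeline, as the reply below suggests.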
Reply:
Spark NLP's Tokenizer has minLength and maxLength parameters; you can set minLength to filter out tokens shorter than a certain length.
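A minimal sketch of that suggestion, assuming a recent Spark NLP version; the column names and both length thresholds are arbitrary choices for illustration:

```python
from sparknlp.annotator import Tokenizer

# minLength drops tokens shorter than the threshold at tokenization time;
# maxLength bounds the other end. Both values here are illustrative.
tokenizer = (
    Tokenizer()
    .setInputCols(["document"])
    .setOutputCol("token")
    .setMinLength(3)
    .setMaxLength(30)
)
```

Filtering by length at the Tokenizer stage is usually simpler than trying to encode a length condition as a Normalizer regex.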