SparseSpacyFeaturizer #29

koaning · 2020-09-02T07:50:44Z

If you look at the attributes that spaCy generates for its tokens, you can imagine that some of them would be useful features for machine learning pipelines. To name a few:

is_oov: is the token out of vocabulary, i.e. does it lack a word vector?
is_stop: is the token a stopword?
lemma_: what is the lemma of the token?
pos_/tag_: coarse/fine-grained part-of-speech information
morphological features
grammatical dependency

All of these have a discrete representation, so they could be added as sparse features to a Rasa pipeline.
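To make the "discrete representation" idea concrete, here is a minimal sketch of how token attributes could be turned into sparse one-hot columns. It is not a Rasa component: the function names (`token_features`, `fit_vocab`, `encode`) are hypothetical, and the tokens are stand-in objects so the sketch runs without a spaCy model. Real spaCy `Token` objects expose the same attribute names, so a parsed `Doc` could presumably be passed in the same way.

```python
from types import SimpleNamespace

# Attribute names as exposed on spaCy Token objects.
ATTRS = ("is_oov", "is_stop", "lemma_", "pos_", "tag_", "dep_")


def token_features(token, attrs=ATTRS):
    """Turn one token into discrete 'name=value' feature strings."""
    return [f"{name}={getattr(token, name)}" for name in attrs]


def fit_vocab(docs, attrs=ATTRS):
    """Assign a column index to every feature string seen in the docs."""
    vocab = {}
    for doc in docs:
        for token in doc:
            for feat in token_features(token, attrs):
                vocab.setdefault(feat, len(vocab))
    return vocab


def encode(doc, vocab, attrs=ATTRS):
    """Return one sparse row per token: the sorted active column indices."""
    rows = []
    for token in doc:
        feats = token_features(token, attrs)
        # Feature values never seen during fitting are silently dropped.
        rows.append(sorted(vocab[f] for f in feats if f in vocab))
    return rows


# Hypothetical stand-in tokens (assumption: values a tagger might produce).
doc = [
    SimpleNamespace(is_oov=False, is_stop=True, lemma_="the",
                    pos_="DET", tag_="DT", dep_="det"),
    SimpleNamespace(is_oov=False, is_stop=False, lemma_="cat",
                    pos_="NOUN", tag_="NN", dep_="nsubj"),
]
vocab = fit_vocab([doc])
rows = encode(doc, vocab)
```

Each row lists only the active columns, which is the information a sparse matrix would store. Open-ended attributes such as `lemma_` grow the vocabulary quickly, while boolean flags add at most two columns each.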

koaning · 2020-10-21T14:23:09Z

It's probably best to wait until spaCy 3.0 before adding this one.

koaning · 2021-01-21T09:06:48Z

We might also just start with is_oov, is_stop and is_numeric.
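A reduced version along these lines might look like the sketch below, which counts how many tokens in a message have each flag set. One assumption to flag: as far as I know spaCy tokens expose `like_num` (and `is_digit`) rather than an attribute called `is_numeric`, so `like_num` is used as a stand-in here. The `flag_counts` helper and the example tokens are hypothetical.

```python
from types import SimpleNamespace

# Assumption: `like_num` stands in for the "is_numeric" idea.
FLAGS = ("is_oov", "is_stop", "like_num")


def flag_counts(doc, flags=FLAGS):
    """Count, per flag, how many tokens in the doc have it set."""
    return [sum(bool(getattr(tok, f)) for tok in doc) for f in flags]


# Hypothetical stand-in tokens so the sketch runs without a spaCy model.
doc = [
    SimpleNamespace(text="send", is_oov=False, is_stop=False, like_num=False),
    SimpleNamespace(text="me", is_oov=False, is_stop=True, like_num=False),
    SimpleNamespace(text="the", is_oov=False, is_stop=True, like_num=False),
    SimpleNamespace(text="42", is_oov=False, is_stop=False, like_num=True),
    SimpleNamespace(text="florps", is_oov=True, is_stop=False, like_num=False),
]
counts = flag_counts(doc)  # one column per flag, in FLAGS order
```

With only three boolean flags the feature vector has a fixed width regardless of the training data, which keeps a first version simple.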

koaning mentioned this issue Sep 3, 2020

Lemmatization and CountVectorFeaturizers RasaHQ/rasa#6536

Closed

koaning self-assigned this Apr 20, 2021
