Tensors generated by tok2vec for similarity analysis #13396
darioprencipe asked this question in Help: Model Advice
Hello spaCy team,
I've recently trained `ner` and `textcat` components for a blank Italian spaCy language model that I use both to extract features (entities) from a very specific type of document and to classify those documents. The two components share a `tok2vec` layer. I've done pretraining on a corpus of 1.5M documents, and the overall NER and categorization results are good enough for a small, compact model that is easily trainable in 2 hours on CPU (Apple M1, 8 cores and 8GB memory). Long story short, there's no need for transformers (even assuming transformers would perform better for my document type and use case).

I've gone through both your docs and this nice primer, but I can't find a clear enough answer to the following question: can I use the tensors produced by the shared `tok2vec` layer (i.e. the `Doc.tensor` objects) as sort-of contextual embeddings?

Before venturing into testing tok2vec tensor-based similarity (e.g. storing these tensors somewhere and analysing them), I'd like to understand from you whether this makes sense or not, from both a theoretical and an empirical perspective.
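Concretely, the kind of test I have in mind looks roughly like the sketch below. The model path is a placeholder, and mean-pooling `Doc.tensor` into a single vector per document is just my assumption about a reasonable way to compare documents:

```python
import numpy as np
import spacy

# Placeholder path to the trained pipeline; the shared tok2vec component
# writes its per-token output to doc.tensor (shape: n_tokens x width).
nlp = spacy.load("training/model-best")

def doc_vector(text: str) -> np.ndarray:
    """Mean-pool the tok2vec rows into a single document vector.
    (Assumption: mean pooling is a sensible aggregation for similarity.)"""
    doc = nlp(text)
    return doc.tensor.mean(axis=0)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two document vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

vec_a = doc_vector("Testo di esempio del primo documento.")
vec_b = doc_vector("Testo di esempio del secondo documento.")
print(f"tok2vec tensor similarity: {cosine(vec_a, vec_b):.3f}")
```

In other words: does it make sense to read that score as a rough semantic similarity between documents, given that the tok2vec layer was only ever pretrained and then trained for NER and text categorization?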
Here's my .cfg file setup so you can see what my training task looks like and understand what those tensors really mean.
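For context, the sharing is done with the standard listener pattern; a simplified excerpt (with illustrative hyper-parameter values, not necessarily the ones I actually use) looks like this:

```ini
[components.tok2vec]
factory = "tok2vec"

[components.tok2vec.model]
@architectures = "spacy.Tok2Vec.v2"

[components.tok2vec.model.embed]
@architectures = "spacy.MultiHashEmbed.v2"
width = 96
attrs = ["NORM", "PREFIX", "SUFFIX", "SHAPE"]
rows = [5000, 1000, 2500, 2500]
include_static_vectors = false

[components.tok2vec.model.encode]
@architectures = "spacy.MaxoutWindowEncoder.v2"
width = 96
depth = 4
window_size = 1
maxout_pieces = 3

# ner and textcat don't own a tok2vec of their own: they listen to the
# shared component above.
[components.ner.model.tok2vec]
@architectures = "spacy.Tok2VecListener.v1"
width = ${components.tok2vec.model.encode.width}
upstream = "tok2vec"

[components.textcat.model.tok2vec]
@architectures = "spacy.Tok2VecListener.v1"
width = ${components.tok2vec.model.encode.width}
upstream = "tok2vec"
```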
Many many thanks in advance,
Dario