Is there a way to get vocabulary (vector representation of all words) of the pre-trained model of word embedding? #9295

valmirosjunior · 2022-06-14T14:58:17Z


When I load a pretrained model like this:

WordEmbeddingsModel.pretrained("glove_6B_300", "xx")

Is there a method I can call to get the model's whole vocabulary?
Something like a dictionary that maps every word to its vector representation.

wgeul · 2023-08-19T12:39:11Z


I face the same issue. I want to verify how representative the embeddings are by running them through an encoder-decoder model, and I need the vocabulary to assess the accuracy.


maziyarpanahi · 2023-08-19T12:54:14Z

Maintainer

There is no public method in WordEmbeddingsModel that exposes the whole vocabulary with its vectors (especially in Python). We can open a feature request for this.

1 reply

maziyarpanahi Aug 19, 2023
Maintainer

We will track the progress here: #13930

maziyarpanahi · 2023-09-13T16:04:49Z

Maintainer

I believe this is now possible for all three annotators (WordEmbeddingsModel, Doc2VecModel, and Word2VecModel) as of the Spark NLP 5.1.1 release: https://github.com/JohnSnowLabs/spark-nlp/releases/tag/5.1.1
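
For anyone who cannot upgrade, a possible workaround that does not touch Spark NLP at all is to build the word-to-vector dictionary from the original GloVe text file. The sketch below assumes the plain-text GloVe layout (one token per line, followed by space-separated floats) and assumes the pretrained `glove_6B_300` model was built from such a file, which this thread does not confirm. The helper name `load_glove_vectors` and the three-dimensional sample data are made up for illustration.

```python
import io
from typing import Dict, List, TextIO


def load_glove_vectors(handle: TextIO) -> Dict[str, List[float]]:
    """Parse GloVe-style text (token, then space-separated floats) into a dict."""
    vocab: Dict[str, List[float]] = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        token, values = parts[0], parts[1:]
        vocab[token] = [float(v) for v in values]
    return vocab


# Hypothetical sample standing in for a real file such as glove.6B.300d.txt.
sample = io.StringIO("the 0.1 0.2 0.3\ncat -0.5 0.25 1.0\n")
vectors = load_glove_vectors(sample)
print(len(vectors), vectors["cat"])
```

With a real file you would pass `open(path, encoding="utf-8")` instead of the in-memory sample. Holding every vector as a Python list is simple but memory-hungry for large vocabularies, so a NumPy array per token may be a better fit there.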

