If I'm not mistaken, the reason is that UMAP, the algorithm that reduces the BERT embeddings to a manageable dimension (5) for HDBSCAN, is stochastic: it relies on randomly generated values that affect the output across runs. This is expected and is likely what is causing the behavior you are seeing. The way to deal with this is to seed UMAP with a fixed value to control its random number generator. It doesn't matter what value you use, only that you use the same number across all runs that you want to produce the same output. You can do this by instantiating a UMAP instance before calling
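As a rough sketch of the seeding idea above: umap-learn's `UMAP` class takes a `random_state` argument, and BERTopic accepts a custom reducer through its `umap_model` argument. The helper names `seeded_umap_params` and `build_topic_model` are made up for illustration, and the other UMAP settings are what I believe BERTopic uses by default, so treat them as assumptions and adjust them to your setup.

```python
def seeded_umap_params(seed: int = 42) -> dict:
    """Keyword arguments for umap.UMAP with a fixed random seed (hypothetical helper)."""
    return {
        "n_neighbors": 15,      # assumed to mirror BERTopic's default
        "n_components": 5,      # the reduced dimension mentioned above
        "min_dist": 0.0,        # assumed to mirror BERTopic's default
        "metric": "cosine",     # assumed to mirror BERTopic's default
        "random_state": seed,   # fixes UMAP's random number generator
    }


def build_topic_model(seed: int = 42):
    """Build a BERTopic model whose UMAP step is seeded (hypothetical helper)."""
    # Imported lazily so the parameter helper works without the packages installed.
    from bertopic import BERTopic
    from umap import UMAP

    umap_model = UMAP(**seeded_umap_params(seed))
    return BERTopic(umap_model=umap_model)
```

Something like `build_topic_model(seed=42).fit_transform(docs)` would then use the same seed on every run. As far as I know, fixing `random_state` makes UMAP run single-threaded, so fitting may be slower.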
Given a trained model, if the same document(s) are passed to `transform`, is the result of the `transform` method expected to be the same for each run (assuming the same model each time)?
For example, if I have trained a model and call `transform` for a set of 10 documents, should the result for those 10 documents be the same EVERY time (of course, the `topics` of the 10 documents may differ)?
There are no side effects of predicting documents, correct? And there are no assumptions about the documents, correct?
The reason I ask is that I am currently seeing some documents receive the `-1` (unknown) topic_id, but then on subsequent runs a valid topic_id is predicted. Before digging in further, I want to make sure my expectation is correct.
Thank you,
Derek
Here's a snippet of code if this helps: