Optimizing topic extraction from the amount of topics created #1477
-
I just happened to be answering issues, so a quick reply it is!
Instead, I would advise controlling the number of topics generated with HDBSCAN. I think the best source for you would be the page on best practices. It explains how to use HDBSCAN to control the number of topics generated, and it also collects a number of best practices for quickly getting good results.
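As a rough sketch of what that could look like: you pass your own HDBSCAN model to BERTopic and raise `min_cluster_size`, which generally gives fewer, larger topics. The helper name `build_topic_model` and the value of 150 are only illustrations I am assuming here, not a recommendation for your data, so you would need to experiment with the value.

```python
# Illustrative HDBSCAN settings; the min_cluster_size value is an assumption
# and should be tuned on your own dataset.
HDBSCAN_PARAMS = {
    "min_cluster_size": 150,            # larger value -> fewer, larger topics
    "metric": "euclidean",
    "cluster_selection_method": "eom",
    "prediction_data": True,
}


def build_topic_model(params=None):
    """Build a BERTopic model whose topic count is driven by HDBSCAN.

    `build_topic_model` is a hypothetical helper, not part of BERTopic.
    """
    # Imported lazily so the heavy dependencies load only when a model is built.
    from bertopic import BERTopic
    from hdbscan import HDBSCAN

    hdbscan_model = HDBSCAN(**(params or HDBSCAN_PARAMS))
    # nr_topics is deliberately left unset; the clustering step decides the count.
    return BERTopic(hdbscan_model=hdbscan_model)
```

You would then call `topic_model = build_topic_model()` followed by `topics, probs = topic_model.fit_transform(docs)`, where `docs` is your list of document strings.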
-
Hello, greetings to you, Mr. Maarten. I'm new to BERTopic and wanted to ask something about how BERTopic works.
I'm currently working with a combination of long and short documents, and I want to extract the topic of each document using BERTopic. The problem is that I have a fairly large dataset (~35,000 rows), and when I run the simple BERTopic code without setting nr_topics, it creates about 500 topics from the data. If I set nr_topics to, let's say, 150, it mostly creates topics that I don't really want.
After some more experiments, I found that if I don't set nr_topics, the topics created are mostly what I wanted, so I can manually merge them with merge_topics into topics that make sense to me. But if I set nr_topics to a smaller number, it seems to automatically merge some topics that I don't want in the same topic.
So my question is: is this something you usually run into when working with BERTopic? If yes, what would you suggest I do? Should I keep leaving nr_topics unset, or is there another method I should try? Having to look over 500 topics and merge them manually is a lot of extra work. But if that is simply what has to be done with BERTopic, could you explain how documents that should be in the same topic can end up split across several topics by the model? Any kind of help would be greatly appreciated.