This repository has been archived by the owner on Mar 1, 2022. It is now read-only.

Can the thresholds for left/right neighbor-character richness and internal cohesion in new-word mining be customized? #68

Open
zaobao opened this issue Feb 28, 2021 · 1 comment

Comments

@zaobao

zaobao commented Feb 28, 2021

I don't see any parameters for setting these two thresholds in the usage instructions.
Are these two values fixed?

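corpus: required, a file object from open(), a database connection, or a list
example: corpus = open(file_name, 'r', encoding='utf-8')
corpus = conn.execute(query)
corpus = list(***)
top_k: float or int, the proportion or the number of phrases to extract
chunk_size: int, the chunk size used to read the file in blocks
min_n: int, extract n-grams of length min_n or longer
max_n: int, extract n-grams of length max_n or shorter
min_freq: int, the minimum frequency an extracted item must have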
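A minimal sketch of how parameters like chunk_size, min_n, max_n and min_freq typically interact when collecting candidate n-grams. It assumes `corpus` yields one string per line (a file object or a list of strings; database rows would first need to be mapped to their text column) and that chunk_size counts lines. `iter_chunks` and `count_ngrams` are hypothetical helpers, not this library's functions.

```python
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator, List


def iter_chunks(corpus: Iterable, chunk_size: int) -> Iterator[List]:
    """Yield successive lists of at most chunk_size items from any iterable."""
    it = iter(corpus)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def count_ngrams(corpus: Iterable[str], chunk_size: int = 1000,
                 min_n: int = 2, max_n: int = 4, min_freq: int = 5) -> Counter:
    """Hypothetical sketch: count character n-grams with min_n <= n <= max_n,
    reading the corpus chunk_size lines at a time, then drop rare ones."""
    counts: Counter = Counter()
    for chunk in iter_chunks(corpus, chunk_size):
        for line in chunk:
            text = line.strip()
            for n in range(min_n, max_n + 1):
                for i in range(len(text) - n + 1):
                    counts[text[i:i + n]] += 1
    # Keep only n-grams seen at least min_freq times.
    return Counter({g: c for g, c in counts.items() if c >= min_freq})
```

For example, `count_ngrams(["abab", "abab"], chunk_size=1, min_n=2, max_n=2, min_freq=3)` keeps only `ab` with a count of 4.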

Is top_k ranked by word frequency, or by left/right neighbor-character richness or internal cohesion?
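A minimal sketch of how the two measures are commonly computed in new-word mining, assuming "left/right neighbor richness" means the Shannon entropy of the characters adjacent to a candidate and "internal cohesion" means the minimum pointwise mutual information (PMI) over its split points. The threshold names `min_entropy` and `min_pmi`, the `rank` helper and its `by` argument are hypothetical, not this library's parameters; the sketch only shows that both thresholds and the top_k sort key are choices an implementation could expose.

```python
import math
from collections import Counter, defaultdict


def entropy(counter: Counter) -> float:
    """Shannon entropy (natural log) of a neighbor-character distribution."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log(c / total) for c in counter.values())


def score_candidates(text, min_n=2, max_n=4, min_freq=2,
                     min_entropy=0.5, min_pmi=1.0):
    """Return (gram, freq, pmi, entropy) tuples passing both thresholds.
    min_entropy and min_pmi are hypothetical threshold names."""
    grams = Counter()
    left = defaultdict(Counter)
    right = defaultdict(Counter)
    # Count every substring of length 1..max_n; shorter counts feed the PMI.
    for n in range(1, max_n + 1):
        for i in range(len(text) - n + 1):
            g = text[i:i + n]
            grams[g] += 1
            if i > 0:
                left[g][text[i - 1]] += 1
            if i + n < len(text):
                right[g][text[i + n]] += 1

    total = len(text)  # simple normalizer shared by all n-gram lengths

    def prob(s):
        return grams[s] / total

    results = []
    for g, freq in grams.items():
        # Single characters have no split point, so require length >= 2.
        if len(g) < max(min_n, 2) or freq < min_freq:
            continue
        # Internal cohesion: PMI at the weakest split point.
        pmi = min(math.log(prob(g) / (prob(g[:k]) * prob(g[k:])))
                  for k in range(1, len(g)))
        # Neighbor richness: the poorer of the two sides.
        h = min(entropy(left[g]), entropy(right[g]))
        if pmi >= min_pmi and h >= min_entropy:
            results.append((g, freq, pmi, h))
    return results


def rank(results, top_k=10, by="freq"):
    """Keep the top_k results sorted by frequency, PMI, or entropy.
    A float top_k is treated as a proportion, an int as a count."""
    index = {"freq": 1, "pmi": 2, "entropy": 3}[by]
    k = int(len(results) * top_k) if isinstance(top_k, float) else top_k
    return sorted(results, key=lambda r: r[index], reverse=True)[:k]
```

For example, `score_candidates("xabyzabwqabr", min_n=2, max_n=2)` keeps only `ab` (frequency 3, PMI ln 4, neighbor entropy ln 3); raising `min_pmi` to 2.0 filters it out.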

@zaobao
Author

zaobao commented Feb 28, 2021

I couldn't be bothered to dig through the source code, so I'm not sure whether I've understood this correctly :)
