
When building the dictionary trie, it always fails with "OutOfMemoryError: GC overhead limit exceeded" once the number of entries exceeds 1000000 #38

Open · gaohang opened this issue Jul 7, 2020 · 2 comments


gaohang commented Jul 7, 2020

Is there any limit on the dictionary capacity?
The machine has 64 GB of RAM, so memory should not be the problem.
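One thing worth checking, independent of the data structure: unless -Xmx is set, the JVM's default max heap is only a fraction of physical RAM (typically about a quarter), so a 64 GB machine may still run the build inside a heap of roughly 16 GB. A hypothetical invocation that raises the limit (the jar and class names are placeholders):

    java -Xmx48g -XX:-UseGCOverheadLimit -cp app.jar com.example.BuildTrie

Note that -XX:-UseGCOverheadLimit only disables the early-abort check behind this particular error message; the actual fix is a larger -Xmx.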


hankcs (Owner) commented Jul 7, 2020

This structure uses UTF-16 as its code table, so it is not well suited to storing large dictionaries. Chinese characters occupy the Unicode range 0x4E00--0x9FA5, which is fairly scattered. You could try using bytes as the code table instead.
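A minimal sketch of that suggestion (my own illustration, not this library's API): re-encode every key as UTF-8 before insertion, so each transition label falls in the dense range 0--255 instead of the sparse CJK block.

    import java.nio.charset.StandardCharsets;
    import java.util.TreeMap;

    public class ByteCodedKeys {
        // Widen each UTF-8 byte to a char in [0, 255], so the trie's code
        // table is dense instead of spanning the sparse CJK range.
        static String toByteCodedKey(String key) {
            byte[] utf8 = key.getBytes(StandardCharsets.UTF_8);
            StringBuilder sb = new StringBuilder(utf8.length);
            for (byte b : utf8) {
                sb.append((char) (b & 0xFF));
            }
            return sb.toString();
        }

        public static void main(String[] args) {
            TreeMap<String, Integer> dict = new TreeMap<>();
            dict.put(toByteCodedKey("自然语言处理"), 1); // sample entry
            // Feed `dict` to the trie builder; queries must be passed
            // through toByteCodedKey() the same way before lookup.
        }
    }

One caveat with this trick: match offsets come back as byte positions rather than character positions, so they need to be mapped back if you report spans against the original string.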


gaohang (Author) commented Jul 18, 2020

Compared with a HashMap, a DAT consumes less memory. So how can a HashMap with 100000000 entries be built in memory, while a DAT with 10000000 entries leads to an OOM?
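One way to put numbers on that comparison is to measure the heap before and after each build on the same word list. A rough, hypothetical harness (GC-based measurements are only approximate, and populating the map is left as a stand-in comment):

    import java.util.HashMap;
    import java.util.Map;

    public class FootprintCheck {
        static long usedHeap() {
            Runtime rt = Runtime.getRuntime();
            System.gc(); // a hint only; figures are approximate
            return rt.totalMemory() - rt.freeMemory();
        }

        public static void main(String[] args) {
            long before = usedHeap();
            Map<String, Integer> map = new HashMap<>();
            // ... populate from the same word list used for the trie ...
            long after = usedHeap();
            System.out.printf("approx. footprint: %d MB%n", (after - before) >> 20);
        }
    }

Comparing the steady-state footprint also misses the construction peak: a double-array trie can need far more transient memory while being built than it occupies once finished, which is consistent with the build (rather than the finished structure) hitting the GC overhead limit.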
