
GC overhead limit exceeded when the dictionary exceeds 3 million entries #24

Open
yuye2133 opened this issue May 27, 2019 · 4 comments

@yuye2133

As the title says: with 2 million entries the build takes 91997 ms, but at 3 million it starts throwing the error. Increasing the heap size didn't help either. How should I deal with this?

@lyrachord

Too many small objects. Consider the buffered-object technique: allocate one large buffer and store the objects directly as primitive values. This is similar to manual memory management in C: each object is represented by an int, all operations are performed against the buffer, and that int can be interpreted as an object. This layout is very GC-friendly. You can search GitHub for the HugeXXX project.
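A minimal Java sketch of the buffered-object idea described above, assuming a dictionary entry that holds two int fields (say, a word id and a frequency). EntryPool, alloc, and the field layout are illustrative names, not from HanLP or any HugeXXX project:

// Instead of allocating millions of small Entry objects, all entries live in
// one large int[] and an entry is referred to by its int offset ("handle").
public final class EntryPool {
    private static final int FIELDS = 2; // ints per entry: [wordId, freq]
    private final int[] buffer;          // one big allocation; GC sees a single object
    private int next = 0;                // bump-allocator cursor

    public EntryPool(int capacity) {
        buffer = new int[capacity * FIELDS];
    }

    /** Allocates an entry and returns its handle (an int, not an object). */
    public int alloc(int wordId, int freq) {
        int handle = next;
        buffer[handle] = wordId;
        buffer[handle + 1] = freq;
        next += FIELDS;
        return handle;
    }

    public int getWordId(int handle) { return buffer[handle]; }
    public int getFreq(int handle)   { return buffer[handle + 1]; }

    public static void main(String[] args) {
        EntryPool pool = new EntryPool(3_000_000);
        int h = pool.alloc(42, 7);
        System.out.println(pool.getWordId(h) + " " + pool.getFreq(h)); // 42 7
    }
}

With millions of entries, the collector then traces one array instead of millions of individually headered objects, which is what makes this layout GC-friendly.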

@lyrachord
Copy link

Or use the JVM option:
-XX:-UseGCOverheadLimit
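For example, combined with a larger heap (Main is a hypothetical entry point; -Xmx sets the maximum heap size):

java -Xmx8g -XX:-UseGCOverheadLimit Main

Note that this flag only disables the "too much time spent in GC" early-failure check; if the heap genuinely cannot hold the data, the JVM will still eventually throw OutOfMemoryError.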

@yuye2133
Author

yuye2133 commented Jul 7, 2019

Replying to the comment above: I set the -XX flag but it didn't help. It seems there is an array size threshold in the source code: if the values fit in a 32-bit representation it is fast, but otherwise it switches to a 64-bit representation and becomes very slow. In the end I just reduced the size of the dictionary.

@hankcs
Owner

hankcs commented Jul 7, 2019

The current double-array trie implementation is indeed memory-hungry. You could consider mapping frequently used characters to consecutive integers.
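A minimal sketch of that suggestion, assuming the dictionary can be scanned to assign codes on first sight. CharTable and its methods are hypothetical, not HanLP's actual API:

import java.util.HashMap;
import java.util.Map;

// Maps each distinct character to a small consecutive integer so the
// double-array trie's alphabet is dense instead of spanning all 65536 chars.
public final class CharTable {
    private final Map<Character, Integer> codes = new HashMap<>();

    /** Returns a consecutive code for c, assigning a new one on first sight. */
    public int code(char c) {
        return codes.computeIfAbsent(c, k -> codes.size() + 1); // 0 is reserved
    }

    public int alphabetSize() {
        return codes.size() + 1;
    }

    /** Encodes a word into the dense alphabet before trie insertion. */
    public int[] encode(String word) {
        int[] out = new int[word.length()];
        for (int i = 0; i < word.length(); i++) {
            out[i] = code(word.charAt(i));
        }
        return out;
    }

    public static void main(String[] args) {
        CharTable table = new CharTable();
        System.out.println(java.util.Arrays.toString(table.encode("词典"))); // [1, 2]
        System.out.println(table.alphabetSize()); // 3, not 65536
    }
}

Because real-world dictionaries use only a few thousand distinct characters, a dense alphabet keeps the double array's base/check tables small and makes collision-free states easier to find during construction.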
