Learn learn = new Learn();
// Train the model
learn.learnFile(new File("library/xh.txt"));
// Save the model
learn.saveModel(new File("library/javaVector"));
Word2VEC w1 = new Word2VEC();
// Load the model
w1.loadJavaModel("library/javaVector");
System.out.println(w1.distance("朋友"));   // "friend"
System.out.println(w1.distance("主席"));   // "chairman"
System.out.println(w1.distance("邓小平")); // "Deng Xiaoping"
System.out.println(w1.distance("魔术队")); // "Magic" (the basketball team)
Output:
Vocab size: 26
Words in train file: 31
sucess train over!
模型加载成功 (model loaded successfully)
[]
[]
[]
[]
[]
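Debug the map and check whether its contents are all garbled, or try characters such as 1 and 2 to see whether they return results. Everything must be UTF-8 encoded from start to finish.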
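One way to run the encoding check suggested above is to decode the corpus bytes strictly as UTF-8 before training. This is a minimal sketch and not part of the project: the class name `Utf8Check` and the helper `isValidUtf8` are hypothetical, and the default path is simply the one used in the snippet above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of the project under discussion.
class Utf8Check {

    /** Returns true only if the bytes decode as UTF-8 without any malformed sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)      // fail instead of substituting U+FFFD
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the corpus path from the issue; pass another path as the first argument.
        String path = args.length > 0 ? args[0] : "library/xh.txt";
        byte[] data = Files.readAllBytes(Paths.get(path));
        System.out.println(path + (isValidUtf8(data) ? " is valid UTF-8" : " is NOT valid UTF-8"));
    }
}
```

A file that passes this check can still have been written in another encoding whose bytes happen to be valid UTF-8, so inspecting the loaded map in a debugger remains worthwhile.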
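I finally got results using the corpus provided by the author. For example, querying 魔术队 (Magic) returns 奥兰多 (Orlando) 0.8990011, 新泽西 (New Jersey) 0.83124423, 奇才队 (Wizards) 0.82303494, 网队 (Nets) 0.6876496, and so on. Here is what I had to do:
1. Make sure the corpus text file is UTF-8 encoded; if it is not, convert it.
2. The corpus provided by the author separates words with tabs, one sentence per line, but the code splits on spaces, so every tab has to be replaced with a space. Alternatively, change the code: at line 271 of Learn.java, use String[] split = temp.split("[\\s\u3000]+"); which accepts runs of half-width spaces, full-width spaces, or tabs as separators.
3. I found a bug: in the two distance methods of Word2VEC, min = result.last().score; should probably go inside the resultSize < result.size() block. The last score should only be assigned to min, as the minimum allowed score, once the number of results has exceeded resultSize. While the number of results is no larger than resultSize, min should not be assigned; otherwise min is raised after every insertion and lower-scoring candidates are rejected before the result set has filled up.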
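Steps 2 and 3 above can be sketched in isolation as follows. This is an illustrative, self-contained sketch under stated assumptions, not the project's actual code: the class `CorpusFixSketch`, the methods `tokenize` and `topN`, and the `Scored` type are all hypothetical names, and the scores are passed in as a plain map instead of being computed from word vectors.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch; names and structure are assumptions, not the project's code.
class CorpusFixSketch {

    /** Splits a line on runs of whitespace (spaces, tabs) or full-width spaces (U+3000). */
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        for (String t : line.split("[\\s\\u3000]+")) {
            if (!t.isEmpty()) {          // a leading separator yields an empty first piece
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** A word with its score; ordered by descending score, then by word. */
    static final class Scored implements Comparable<Scored> {
        final String word;
        final float score;

        Scored(String word, float score) {
            this.word = word;
            this.score = score;
        }

        @Override
        public int compareTo(Scored other) {
            int c = Float.compare(other.score, score);
            return c != 0 ? c : word.compareTo(other.word);
        }

        @Override
        public String toString() {
            return word + " " + score;
        }
    }

    /** Keeps the n highest-scoring entries, raising the threshold only once the set is full. */
    static List<Scored> topN(Map<String, Float> scores, int n) {
        TreeSet<Scored> result = new TreeSet<>();
        if (n <= 0) {
            return new ArrayList<>(result);
        }
        float min = Float.NEGATIVE_INFINITY;
        for (Map.Entry<String, Float> e : scores.entrySet()) {
            if (e.getValue() > min) {
                result.add(new Scored(e.getKey(), e.getValue()));
                if (result.size() > n) {
                    result.pollLast();               // drop the lowest-scoring entry
                    min = result.last().score;       // update the threshold only here
                }
            }
        }
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a\tb  c\u3000d"));
        Map<String, Float> scores = new LinkedHashMap<>();
        scores.put("w1", 0.9f);
        scores.put("w2", 0.8f);
        scores.put("w3", 0.7f);
        System.out.println(topN(scores, 2));
    }
}
```

Because the threshold stays at negative infinity until more than n entries have been collected, the outcome does not depend on whether the highest-scoring word happens to be visited first.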