
use HanLP or Jieba create word_cloud_cn #430

Closed
wants to merge 14 commits into from

Conversation

TianFengshou
Contributor

pyhanlp is one of the most powerful Chinese natural language processing libraries available today, and it is extremely easy to use: you can run 'pip install pyhanlp' to install it, just like jieba.

Its named entity recognition and word segmentation are better than jieba's, and it offers more ways to tune them, so you'll save a lot of time when you use it.

And thanks to its excellent performance, we don't have to use user-defined dictionaries when handling large amounts of Chinese text.

TianFengshou and others added 2 commits September 10, 2018 22:04
@jcfr
Collaborator

jcfr commented Sep 10, 2018

Thanks for the contribution.

Would it be possible to squash the commits together and fix the styling issues?

To run the style check locally, you could run flake8 from the source directory.

Wordcloud is a very good tool, but wordcloud alone is not enough to create a Chinese word cloud. The file shows how to use wordcloud with Chinese text. First, you need a Chinese word segmentation library such as jieba or HanLP; you can run 'pip install jieba' or 'pip install pyhanlp' to install one. As you can see, using wordcloud together with jieba or HanLP is very convenient. Jieba is lighter, while HanLP requires a larger download but is more powerful: HanLP's named entity recognition and word segmentation are better than jieba's, and it offers more ways to tune them. You'll save a lot of time when you use it.
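The pattern the file describes boils down to: segment the Chinese text into tokens, then join them with spaces so wordcloud can split them again. A minimal sketch of that flow, where the toy greedy dictionary segmenter is a hypothetical stand-in for jieba.cut or HanLP's segmenter (a real run would install jieba or pyhanlp and pass the joined string to WordCloud):

```python
# Sketch of the segment-then-join pattern used by the example.
# `segment` is a toy greedy longest-match segmenter standing in for
# jieba.cut / HanLP.segment; real code would use one of those libraries.

def segment(text, vocab, max_len=4):
    """Greedy longest-match segmentation over a small dictionary."""
    tokens, i = [], 0
    while i < len(text):
        # Try the longest dictionary match first, fall back to one char.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in vocab or j == i + 1:
                tokens.append(text[i:j])
                i = j
                break
    return tokens

vocab = {"自然语言", "处理", "有趣"}
tokens = segment("自然语言处理很有趣", vocab)
text_for_wordcloud = " ".join(tokens)
# WordCloud splits input on whitespace, so in the real example this
# joined string is what gets passed to WordCloud().generate(...).
```

The space-joining step is the whole trick: wordcloud's default tokenizer assumes whitespace-separated words, which Chinese text does not have until a segmenter supplies the boundaries.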
@TianFengshou TianFengshou changed the title use HanLP replace Jieba use HanLP or Jieba create word_cloud_cn Sep 19, 2018
@TianFengshou
Contributor Author

What do I need to do now?

max_font_size=100, random_state=42, width=1000, height=860, margin=2,)
# The function for processing text with HanLP
def pyhanlp_processing_txt(text, isUseStopwordsOfHanLP=True):
    CustomDictionary = JClass("com.hankcs.hanlp.dictionary.CustomDictionary")
Owner

JClass is not defined?

Contributor Author

If you use HanLP, you must also install jpype1. I forgot about it:

'pip install jpype1'

Contributor Author

Pyhanlp provides a Python interface for HanLP. HanLP is currently the best-performing open-source Chinese natural language processing library, but it is implemented in Java, so we must use jpype to call its Java classes.

Owner

Ok but you must also import it, right?

Owner

You need to make sure the continuous integration passes.

Contributor Author

@TianFengshou TianFengshou Sep 25, 2018

I don't need to import it; pyhanlp imports it from jpype in its __init__.py:

from jpype import JClass, startJVM, getDefaultJVMPath, isThreadAttachedToJVM, attachThreadToJVM

Contributor Author

If you use pyhanlp, line 114 has a line like 'from pyhanlp import *'.
If you use jieba, line 106 has a line like 'import jieba'. Importing only the chosen backend improves efficiency, and the code works.
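A deferred import like the one described keeps the unused backend from loading at all. A minimal sketch of that pattern (the function name and the whitespace fallback are hypothetical, not from the pull request):

```python
def segment_text(text, backend="jieba"):
    """Segment `text`, importing the chosen backend only when needed."""
    if backend == "jieba":
        import jieba  # only loaded if this backend is actually chosen
        return list(jieba.cut(text))
    if backend == "pyhanlp":
        from pyhanlp import HanLP  # requires jpype1 and a JVM
        return [str(term.word) for term in HanLP.segment(text)]
    # Fallback: treat the input as already space-separated.
    return text.split()
```

Since the import runs inside the function body, choosing one backend never pays the startup cost of the other.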

Owner

If the tests don't pass (see the 8 failing checks below) there's an issue with the code.

@amueller
Owner

The tests still don't pass, meaning there are issues with your implementation. Are you intending to fix those?

@TianFengshou
Contributor Author

I tried many times, but it always fails in the class library called by pyhanlp. I don't quite understand why this is happening.

@amueller
Owner

Oh it's actually just flake8 failing now, it looks like.

@jcfr
Collaborator

jcfr commented Dec 13, 2018

Indeed, errors are the following:

./examples/wordcloud_cn.py:81: [F405] 'HanLP' may be undefined, or defined from star imports: pyhanlp
    HanLP.Config.ShowTermNature = False
    ^
./examples/wordcloud_cn.py:82: [F405] 'HanLP' may be undefined, or defined from star imports: pyhanlp
    CRFnewSegment = HanLP.newSegment("viterbi")
                    ^
./examples/wordcloud_cn.py:85: [E712] comparison to True should be 'if cond is True:' or 'if cond:'
    if isUseStopwordsByHanLP == True:
                             ^
./examples/wordcloud_cn.py:114: [F403] 'from pyhanlp import *' used; unable to detect undefined names
    from pyhanlp import *
    ^
./examples/wordcloud_cn.py:115: [F401] 'jpype.startJVM' imported but unused
    from jpype import JClass, startJVM, getDefaultJVMPath, isThreadAttachedToJVM, attachThreadToJVM
    ^
./examples/wordcloud_cn.py:115: [F401] 'jpype.getDefaultJVMPath' imported but unused
    from jpype import JClass, startJVM, getDefaultJVMPath, isThreadAttachedToJVM, attachThreadToJVM
    ^
./examples/wordcloud_cn.py:115: [F401] 'jpype.isThreadAttachedToJVM' imported but unused
    from jpype import JClass, startJVM, getDefaultJVMPath, isThreadAttachedToJVM, attachThreadToJVM
    ^
./examples/wordcloud_cn.py:115: [F401] 'jpype.attachThreadToJVM' imported but unused
    from jpype import JClass, startJVM, getDefaultJVMPath, isThreadAttachedToJVM, attachThreadToJVM
    ^

To run flake8 locally:

  1. Activate your environment
  2. Go to source directory
  3. Make sure development requirements are installed
  4. Execute flake8
workon wordcloud # Or similar command to activate your python environment
cd /path/to/src/word_cloud
pip install -r requirements-dev.txt
flake8

Simplified part of the code
Removed some code that references the class library
Modified some code that references the class library
@TianFengshou
Contributor Author

I downloaded flake8 and made some changes to the code. It may be OK now.

@TianFengshou
Contributor Author

As time has passed, Chinese natural language processing technology has matured considerably. But jieba is still the fastest native Python library; other libraries need to call external code or use deep learning models, so I personally think that, for an example, jieba may be the best choice. The question of optimal performance should be left to the user. So it's almost time to close this pull request.

Delete conflicting code to merge the source code
@amueller
Owner

So are you saying you'd leave the example as it was in #329 and close this pull request?

@TianFengshou
Contributor Author

I'm sorry for closing this merge request years later. I didn't merge it at the time due to concerns: it was just a demonstration, not an engineering application. HanLP is still the best statistics-based Chinese model to this day, and in some fields it even has certain advantages over deep learning models. However, for a demonstration, its model is still too large.

@TianFengshou TianFengshou deleted the master branch October 30, 2024 07:02