104 languages, 10 tasks, dual backends, HanLPv2.1
Showing 570 changed files with 45,601 additions and 5,646 deletions.
@@ -0,0 +1,21 @@
language: python
cache: pip
python:
  - '3.6'
install:
  - pip install .
deploy:
  provider: pypi
  username: __token__
  password:
    secure: KU0S/z54UMdS3rJT0fNndVnvhKB48YBzpwBZQZAOUJafFyqw1Nm366cpn9OdyWPQ54LolQNEKyQZc7xDpV89j1ukKQ1aGgZ5rXD8zrAqivcWEzEWzRpO8uPGbGT0TSJDfd3zX8vHO5UznmW2nNuRJfHFkEmB/27TlZAs2ph/SrEGvuBQOFgQZMShzFWGRKL+kEXX946qlw1EdLe2XvpK7jkWQpG9c8S5mNhbqBMAofVAXyNoHqX3FrPdEvN9MY9iRx3FxusHBqHeRLwrPHK2aQLVUE5D0WE1NzKwNZ4UxbY4PfiESYDueqGR8O/awpuLwg+6itk6FbtExAIAZyDLvGS4o88AGks6VJlJKwdT0LZ6cR1+WOGXyewSjHiJmjdBnFCtvyjn/O6sDEIDmku4FINuNIcmXy2bYwns9D3lNzb2EYpSTu5A9Q4EAAWZ4t0DsWBSRJmuauv6VNTHOENPRXR3fA9honp6GWiEh+4b/yfIaT9p0VnkR7D3KoN27eNmouU4s68hAfnFVPnB/OWU/DNoWs2PbLo4ztficmGOcOyDbS4BjrLjxuyU3aAHYIeXAff6A3I/a1tz+QknYCOJz/ZnQ3e4FC+2lm/cCGzPTfi+IVQ7QJryAY8hbblDX48PHCzVLa0PPer+v2NZVrnfddMZoLd1ox65hM2gHuy6NkQ=
  on:
    python: 3.6
    branch: master
env:
  global:
    - HANLP_VERBOSE=0
script:
  - python -m unittest discover ./tests
notifications:
  email: false
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
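This Makefile forwards every target to `sphinx-build -M <target> <sourcedir> <builddir>`, Sphinx's "make mode". The sketch below, built around a hypothetical helper `sphinx_make_command`, spells out which argument list `make <target>` ends up running, including how `?=` lets environment values of `SPHINXBUILD` and `SPHINXOPTS` take precedence. It only builds the list; it does not invoke Sphinx.

```python
import shlex
from typing import List, Mapping, Optional


def sphinx_make_command(target: str = "help",
                        env: Optional[Mapping[str, str]] = None,
                        o: str = "") -> List[str]:
    """Argument list that `make <target>` runs through the Makefile above.

    "help" is the default because the help rule comes first; every other
    target goes through the catch-all `%: Makefile` rule, which has the same shape.
    """
    env = {} if env is None else env
    # `?=` only assigns when the variable is not already defined, so values
    # exported in the environment take precedence for these two.
    sphinxbuild = env.get("SPHINXBUILD", "sphinx-build")
    sphinxopts = env.get("SPHINXOPTS", "")
    # Plain `=` assignments: the environment does not override these (without `make -e`).
    sourcedir, builddir = ".", "_build"
    # Make substitutes the variables textually and the shell then splits words,
    # which shlex.split approximates; -M selects sphinx-build's "make mode".
    return (shlex.split(sphinxbuild) + ["-M", target, sourcedir, builddir]
            + shlex.split(sphinxopts) + shlex.split(o))
```

For example, `sphinx_make_command("html", {"SPHINXOPTS": "-W"})` corresponds to running `SPHINXOPTS=-W make html` and yields `["sphinx-build", "-M", "html", ".", "_build", "-W"]`.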
@@ -0,0 +1,54 @@
<!--
# ========================================================================
# Copyright 2020 hankcs
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
# ========================================================================
-->

# Chinese Tree Bank

See also [The Bracketing Guidelines for the Penn Chinese Treebank (3.0)](https://repository.upenn.edu/cgi/viewcontent.cgi?article=1040&context=ircs_reports).

| Tag | Definition | Definition (Chinese) | Examples |
|------|------|------|------|
| ADJP | adjective phrase | 形容词短语,以形容词为中心词 | 不完全、大型 |
| ADVP | adverbial phrase headed by AD (adverb) | 副词短语,以副词为中心词 | 非常、很 |
| CLP | classifier phrase | 由量词构成的短语 | 系列、大批 |
| CP | clause headed by C (complementizer) | 从句,通常带补语(如“的”、“吗”等) | 张三喜欢李四吗? |
| DNP | phrase formed by "XP + DEG" | 结构为XP + DEG(的)的短语,其中XP可以是ADJP、DP、QP、PP等等,用于修饰名词短语。 | 大型的、前几年的、五年的、在上海的 |
| DP | determiner phrase | 限定词短语,通常由限定词和数量词构成 | 这三个、任何 |
| DVP | phrase formed by "XP + DEV" | 结构为XP+地的短语,用于修饰动词短语VP | 心情失落地、大批地 |
| FRAG | fragment | 片段 | (完) |
| INTJ | interjection | 插话,感叹语 | 哈哈、切 |
| IP | simple clause headed by I (INFL) | 简单子句或句子,通常不带补语(如“的”、“吗”等) | 张三喜欢李四。 |
| LCP | phrase formed by "XP + LC" | 用于表示地点+方位词(LC)的短语 | 生活中、田野上 |
| LST | list marker | 列表短语,包括标点符号 | 一. |
| MSP | some particles | 其他小品词 | 所、而、来、去 |
| NN | common noun | 名词 | HanLP、技术 |
| NP | noun phrase | 名词短语,中心词通常为名词 | 美好生活、经济水平 |
| PP | preposition phrase | 介词短语,中心词通常为介词 | 在北京、据报道 |
| PRN | parenthetical | 插入语 | ,(张三说), |
| QP | quantifier phrase | 量词短语 | 三个、五百辆 |
| ROOT | root node | 根节点 | 根节点 |
| UCP | unidentical coordination phrase | 不对称的并列短语,指并列词两侧的短语类型不一致 | (养老、医疗)保险 |
| VCD | coordinated verb compound | 复合动词 | 出版发行 |
| VCP | verb compounds formed by VV + VC | VV + VC形式的动词短语 | 看作是 |
| VNV | verb compounds formed by A-not-A or A-one-A | V不V形式的动词短语 | 能不能、信不信 |
| VP | verb phrase | 动词短语,中心词通常为动词 | 完成任务、努力工作 |
| VPT | potential form V-de-R or V-bu-R | V不R、V得R形式的动词短语 | 打不赢、打得过 |
| VRD | verb resultative compound | 动补结构短语 | 研制成功、降下来 |
| VSB | verb compounds formed by a modifier + a head | 修饰语+中心词构成的动词短语 | 拿来支付、仰头望去 |
@@ -0,0 +1,7 @@
# Constituency Parsing

```{toctree}
ctb
ptb
```

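In a MyST `{toctree}` directive, each line names a document without its file suffix, relative to the current page, so `ctb` and `ptb` are expected to resolve to sibling source files. The Python sketch below checks that every entry has a matching file. The helper names and the accepted suffixes (which in a real project follow Sphinx's `source_suffix` setting) are assumptions, and entries written as `Title <docname>` are not handled.

```python
import re
from pathlib import Path
from typing import Iterable, List

# Body of a MyST ```{toctree} fenced directive, up to the closing fence.
TOCTREE = re.compile(r"```\{toctree\}[^\n]*\n(.*?)```", re.DOTALL)


def toctree_entries(markdown: str) -> List[str]:
    """Return the document names listed in every {toctree} block, in order."""
    entries = []
    for body in TOCTREE.findall(markdown):
        for line in body.splitlines():
            line = line.strip()
            # Skip blank lines and directive options such as ":maxdepth: 2".
            if line and not line.startswith(":"):
                entries.append(line)
    return entries


def missing_documents(index_path, suffixes: Iterable[str] = (".md", ".rst")) -> List[str]:
    """Toctree entries of index_path with no matching source file next to it."""
    index = Path(index_path)
    folder = index.parent
    suffixes = tuple(suffixes)
    return [
        name for name in toctree_entries(index.read_text(encoding="utf-8"))
        if not any((folder / (name + suffix)).exists() for suffix in suffixes)
    ]
```

Sphinx reports unresolved toctree entries as warnings during a build; a standalone check like this can flag them earlier, for instance in a test.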
@@ -0,0 +1,53 @@
<!--
# ========================================================================
# Copyright 2020 hankcs
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
# ========================================================================
-->

# Penn Treebank

| Tag | Description |
|--------|-------------|
| ADJP | Adjective Phrase. |
| ADVP | Adverb Phrase. |
| CONJP | Conjunction Phrase. |
| FRAG | Fragment. |
| INTJ | Interjection. Corresponds approximately to the part-of-speech tag UH. |
| LST | List marker. Includes surrounding punctuation. |
| NAC | Not a Constituent; used to show the scope of certain prenominal modifiers within an NP. |
| NP | Noun Phrase. |
| NX | Used within certain complex NPs to mark the head of the NP. Corresponds very roughly to N-bar level but used quite differently. |
| PP | Prepositional Phrase. |
| PRN | Parenthetical. |
| PRT | Particle. Category for words that should be tagged RP. |
| QP | Quantifier Phrase (i.e. complex measure/amount phrase); used within NP. |
| ROOT | Root node of the tree. |
| RRC | Reduced Relative Clause. |
| S | Simple declarative clause, i.e. one that is not introduced by a (possibly empty) subordinating conjunction or a wh-word and that does not exhibit subject-verb inversion. |
| SBAR | Clause introduced by a (possibly empty) subordinating conjunction. |
| SBARQ | Direct question introduced by a wh-word or a wh-phrase. Indirect questions and relative clauses should be bracketed as SBAR, not SBARQ. |
| SINV | Inverted declarative sentence, i.e. one in which the subject follows the tensed verb or modal. |
| SQ | Inverted yes/no question, or main clause of a wh-question, following the wh-phrase in SBARQ. |
| UCP | Unlike Coordinated Phrase. |
| VP | Verb Phrase. |
| WHADJP | Wh-adjective Phrase. Adjectival phrase containing a wh-adverb, as in "how hot". |
| WHADVP | Wh-adverb Phrase. Introduces a clause with an NP gap. May be null (containing the 0 complementizer) or lexical, containing a wh-adverb such as how or why. |
| WHNP | Wh-noun Phrase. Introduces a clause with an NP gap. May be null (containing the 0 complementizer) or lexical, containing some wh-word, e.g. who, which book, whose daughter, none of which, or how many leopards. |
| WHPP | Wh-prepositional Phrase. Prepositional phrase containing a wh-noun phrase (such as of which or by whose authority) that either introduces a PP gap or is contained by a WHNP. |
| X | Unknown, uncertain, or unbracketable. X is often used for bracketing typos and in bracketing the…the-constructions. |

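Raw Penn Treebank annotations often extend these categories with function tags and co-indices (for example `NP-SBJ-1`, `ADVP-TMP`, or `S-TPC=2`), while the table lists only base categories. The sketch below strips such suffixes before looking a label up in the table. Whether HanLP's parser emits these suffixes is not stated here; the sketch assumes raw treebank labels, and `base_label` / `phrase_tag` are hypothetical names.

```python
import re
from typing import Optional

# Phrase-level categories listed in the Penn Treebank table above.
PTB_PHRASE_TAGS = frozenset({
    "ADJP", "ADVP", "CONJP", "FRAG", "INTJ", "LST", "NAC", "NP", "NX",
    "PP", "PRN", "PRT", "QP", "ROOT", "RRC", "S", "SBAR", "SBARQ", "SINV",
    "SQ", "UCP", "VP", "WHADJP", "WHADVP", "WHNP", "WHPP", "X",
})


def base_label(label: str) -> str:
    """Drop function tags and indices: "NP-SBJ-1" -> "NP", "S-TPC=2" -> "S"."""
    if label.startswith("-"):
        # Tokens such as -NONE- or -LRB- start with a hyphen; return them unchanged.
        return label
    return re.split(r"[-=]", label, maxsplit=1)[0]


def phrase_tag(label: str) -> Optional[str]:
    """Base category if it is one of the phrase tags in the table, else None."""
    base = base_label(label)
    return base if base in PTB_PHRASE_TAGS else None
```

Splitting at the first `-` or `=` is a simplification: it is enough for phrase labels, which never contain those characters in their base category, but it is not a general PTB label parser.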
@@ -0,0 +1,7 @@
# Dependency Parsing

```{toctree}
sd
ud
```
