Given a sentence, constituency parsing produces a parse tree whose internal nodes are constituents and whose leaf nodes are words.
Input:
柴犬是一种像精灵一样的犬种。 (English gloss: "The Shiba Inu is an elf-like breed of dog.")
Output:
(IP (NP-SBJ (NN 柴犬)) (VP (VC 是) (NP-PRD (QP (CD 一) (CLP (M 种))) (DVP (IP (VP (PP (P 像) (NP (NN 精灵))) (VP (VA 一样)))) (DEV 的)) (VP (VA 犬种)))) (PU 。))
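The bracketed output can be loaded with standard treebank tooling. Below is a minimal sketch (not part of the task definition) that reads the example output with NLTK's `Tree.fromstring`:

```python
# Minimal sketch: reading the bracketed example output with NLTK.
# Assumes NLTK is installed (pip install nltk).
from nltk import Tree

pred = Tree.fromstring(
    "(IP (NP-SBJ (NN 柴犬)) (VP (VC 是) (NP-PRD (QP (CD 一) (CLP (M 种))) "
    "(DVP (IP (VP (PP (P 像) (NP (NN 精灵))) (VP (VA 一样)))) (DEV 的)) "
    "(VP (VA 犬种)))) (PU 。))"
)

print(pred.label())    # root constituent label: 'IP'
print(pred.leaves())   # the words of the sentence
pred.pretty_print()    # ASCII rendering of the tree
```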
- Exact match (EM): the percentage of predicted parse trees that match the ground truth exactly.
- F1 score of constituents in the predicted parse tree.
- Labeled precision (LP): precision of constituents in the predicted parse tree.
- Labeled recall (LR): recall of constituents in the predicted parse tree (a rough sketch of how these metrics are computed from labeled spans follows below).
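As a rough sketch of how these metrics are defined, each tree can be reduced to a multiset of labeled spans and compared against the gold spans. This is a simplification for illustration only; the official Evalb scorer (see below) additionally handles punctuation, label equivalences, and length cutoffs via its parameter file.

```python
# Hedged sketch of EM / F1 / LP / LR over labeled constituent spans.
# Not the official scorer; use Evalb for reported numbers.
from collections import Counter


def labeled_spans(tree):
    """Multiset of (label, start, end) for constituents above the POS level."""
    spans = Counter()

    def walk(t, start):
        if isinstance(t, str):                      # a leaf word
            return start + 1
        end = start
        for child in t:
            end = walk(child, end)
        preterminal = len(t) == 1 and isinstance(t[0], str)
        if not preterminal:                         # POS tags are not constituents
            spans[(t.label(), start, end)] += 1
        return end

    walk(tree, 0)
    return spans


def corpus_scores(gold_trees, pred_trees):
    """Corpus-level EM, F1, LP, LR over parallel lists of nltk.Tree objects."""
    matched = gold_total = pred_total = exact = 0
    for gold, pred in zip(gold_trees, pred_trees):
        g, p = labeled_spans(gold), labeled_spans(pred)
        matched += sum((g & p).values())            # multiset intersection
        gold_total += sum(g.values())
        pred_total += sum(p.values())
        exact += int(g == p)                        # span-level exact match
    lp = 100.0 * matched / pred_total
    lr = 100.0 * matched / gold_total
    f1 = 2 * lp * lr / (lp + lr)
    em = 100.0 * exact / len(gold_trees)
    return em, f1, lp, lr
```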
- Released by the Linguistic Data Consortium (LDC). An LDC license is required to acquire the dataset.
- Link: https://verbs.colorado.edu/chinese/ctb.html
- First used for constituency parsing by Liu and Zhang (2017).
- Prior work has adopted the preprocessing in distance-parser, which selects the subset of CTB 8.0 that corresponds to CTB 5.1.
Dataset | # sentences (train) | # sentences (dev) | # sentences (test) |
---|---|---|---|
CTB 5.1 | 17,544 | 352 | 348 |
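Once the LDC data has been preprocessed into bracketed split files (the preprocessing itself is done by the distance-parser scripts mentioned above), the splits can be read with NLTK. The directory and file names below are hypothetical placeholders:

```python
# Hedged sketch: loading preprocessed, bracketed CTB 5.1 split files.
# "data/ctb5.1" and the file names are assumed placeholders, not a fixed layout.
from nltk.corpus.reader import BracketParseCorpusReader

reader = BracketParseCorpusReader("data/ctb5.1", r"(train|dev|test)\.txt")

train_trees = reader.parsed_sents("train.txt")
print(len(train_trees))                   # ~17,544 if the split matches the table
print(" ".join(train_trees[0].leaves()))  # first training sentence
```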
EM, F1, LP and LR can be calculated using the Evalb tool.
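For reference, a hedged sketch of calling the Evalb binary from Python; the binary and parameter-file paths are assumptions that depend on the local installation:

```python
# Hedged sketch: running Evalb on gold/predicted bracketed tree files.
# "./EVALB/evalb" and "COLLINS.prm" are assumed local paths.
import subprocess

result = subprocess.run(
    ["./EVALB/evalb", "-p", "./EVALB/COLLINS.prm", "gold.txt", "pred.txt"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)   # the summary block reports precision, recall, F1, exact match
```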
System | EM | F1 | LP | LR | Code |
---|---|---|---|---|---|
Liu and Zhang (2017) | 44.94 | 91.81 | - | - | GitHub |
Zhou and Zhao (2019) | - | 92.18 | 92.33 | 92.03 | GitHub |
Mrini et al. (2020) | - | 92.64 | 93.45 | 91.85 | GitHub |
Yang and Deng (2020) | 49.72 | 93.59 | 93.80 | 93.40 | GitHub |
Suggestions? Changes? Please send email to [email protected]