en_core_web_trf version 3.7.3 versus 3.7.2 #13375
I am seeing the quality of the nouns in noun_chunks and of the dependency graph regress in many of my examples after going from 3.7.2 to 3.7.3. Is this just my examples, or has anyone else had a similar experience? For example, for the sentence "Our aim is to harness retinal pigment epithelium (RPE) ADAR enzymes, especially ADAR2.", the noun_chunks in version 3.7.2 were [Our aim, retinal pigment epithelium (RPE) ADAR enzymes, especially ADAR2]. In version 3.7.3 I see [Our aim, (RPE, especially ADAR2]. My sample size is only about 20 sentences, but I see a regression even on this small sample.
Replies: 1 comment
I've had a look and nothing should have changed in the actual noun chunking, so this difference must come from the parser: the new parses can be the same or better on average and still be worse on your sample. The difference might be systematic for your domain, or it could just be the few examples you happen to be looking at.
If you want to dig deeper and verify this, you can double-check that the parses returned by the two versions are actually different. If they're the same, then the difference is in the chunking rules, which would be surprising. If the parse structure differs, that's hard for us to resolve easily. The models are statistical, so results on individual sentences will always go up and down somewhat between versions.
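One way to do the double-check suggested above, as a rough sketch rather than an official recipe: since two versions of the same pipeline package can't normally be installed side by side, dump the parse to JSON in one environment per version and diff the files afterwards. The helper names `parse_rows`, `dump_parse` and `diff_parses`, and the output file layout, are made up for this illustration; only `spacy.load`, `nlp.meta`, `doc.noun_chunks` and the token attributes are spaCy's own.

```python
import json


def parse_rows(doc):
    """Flatten a spaCy Doc into comparable [text, pos, dep, head index] rows."""
    return [[t.text, t.pos_, t.dep_, t.head.i] for t in doc]


def dump_parse(model_name, text, out_path):
    """Run once in each environment (one per model version) to save the parse."""
    import spacy  # imported lazily so diff_parses also works without spaCy

    nlp = spacy.load(model_name)
    doc = nlp(text)
    record = {
        "model_version": nlp.meta.get("version"),
        "tokens": parse_rows(doc),
        "noun_chunks": [chunk.text for chunk in doc.noun_chunks],
    }
    with open(out_path, "w", encoding="utf8") as f:
        json.dump(record, f, indent=2)


def diff_parses(old_tokens, new_tokens):
    """Return (index, old_row, new_row) for every token whose analysis differs."""
    if len(old_tokens) != len(new_tokens):
        raise ValueError("tokenization differs; compare the token lists first")
    return [
        (i, list(old), list(new))
        for i, (old, new) in enumerate(zip(old_tokens, new_tokens))
        if list(old) != list(new)
    ]
```

To use it, call `dump_parse("en_core_web_trf", sentence, "parse_old.json")` in the 3.7.2 environment and the same with a different file name in the 3.7.3 environment, then load both files with `json.load` and pass their `"tokens"` lists to `diff_parses`. An empty result with different `"noun_chunks"` would point at the chunking rules; any differing rows point at the parser or tagger.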