Entity linker returns NIL if no aliases are defined #5597
Replies: 7 comments
-
Aliases are definitely required as input. The KB does not know anything else about the entities other than their unique ID, so it's not able to produce sensible candidates if there are no aliases.
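To illustrate why the linker falls back to NIL, here is a minimal pure-Python sketch of alias-based candidate lookup. It is not spaCy's implementation: `ToyKB`, `link`, and the IDs `Q1`/`Q2` are hypothetical names made up for this example, and the "scoring" step simply picks the highest prior instead of using a trained model.

```python
class ToyKB:
    """Hypothetical stand-in for a KB: entity IDs plus optional aliases."""

    def __init__(self):
        self.entities = set()
        self.aliases = {}  # alias text -> list of (entity_id, prior)

    def add_entity(self, entity_id):
        self.entities.add(entity_id)

    def add_alias(self, alias, entity_ids, priors):
        for entity_id, prior in zip(entity_ids, priors):
            if entity_id not in self.entities:
                raise ValueError(f"unknown entity: {entity_id}")
            self.aliases.setdefault(alias, []).append((entity_id, prior))

    def get_candidates(self, mention):
        # Candidates come only from the alias table; entity IDs alone
        # carry no surface form that could be matched against the text.
        return list(self.aliases.get(mention, []))


def link(kb, mention):
    """Return the best candidate ID, or "NIL" if there are no candidates."""
    candidates = kb.get_candidates(mention)
    if not candidates:
        return "NIL"
    # Assumption: score by prior only; a real linker would also use context.
    return max(candidates, key=lambda pair: pair[1])[0]


kb = ToyKB()
kb.add_entity("Q1")
kb.add_entity("Q2")
print(link(kb, "some mention"))  # no aliases defined, so no candidates

kb.add_alias("some mention", ["Q1", "Q2"], [0.3, 0.7])
print(link(kb, "some mention"))  # now there are candidates to choose from
```

With only entity IDs in the KB, the candidate list is always empty, so every mention ends up as NIL regardless of how the scoring step was trained.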
-
Thanks a lot for the answer. I thought the EL would learn to produce the candidates from the training examples.
-
That does seem like a reasonable approach, but internally it would mean quite some changes. Right now there are two steps: first, the KB generates the candidates; then, the EL scores each candidate. Currently, only the latter step is trained; the candidate generation itself isn't. You can of course use your training data to define the aliases and their prior probabilities, but you'd do this only once, and the KB would remain the same throughout further training of the EL algorithm.
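As a rough sketch of that one-off step, the snippet below derives aliases and prior probabilities from annotated training data by counting how often each mention text is linked to each entity ID. The function name `alias_priors`, the `(mention, entity_id)` input format, and the example IDs are assumptions made for illustration, not part of spaCy.

```python
from collections import Counter, defaultdict


def alias_priors(annotations):
    """Map each mention text to {entity_id: prior} from (mention, entity_id) pairs."""
    counts = defaultdict(Counter)
    for mention, entity_id in annotations:
        counts[mention][entity_id] += 1

    priors = {}
    for mention, entity_counts in counts.items():
        total = sum(entity_counts.values())
        # Prior = relative frequency of the entity among this mention's links.
        priors[mention] = {
            entity_id: n / total for entity_id, n in entity_counts.items()
        }
    return priors


# Hypothetical gold annotations: (mention text, entity ID)
annotations = [
    ("cold", "Q_common_cold"),
    ("cold", "Q_common_cold"),
    ("cold", "Q_low_temperature"),
    ("flu", "Q_influenza"),
]

for alias, entity_priors in alias_priors(annotations).items():
    entities = list(entity_priors)
    probabilities = [entity_priors[e] for e in entities]
    print(alias, entities, probabilities)
```

Each printed triple has the shape that the KB's `add_alias(alias=..., entities=..., probabilities=...)` call expects, so the result could be loaded into the KB once before training the EL. Mentions that never occur in the training data would still get no candidates.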
-
I'm thinking about this some more, and it would definitely be less efficient to adjust the aliases / prior probabilities on the fly. Basically when encountering the combination entity
-
Thanks a lot for your answer. Indeed, your explanation makes a lot of sense. Actually, my use case is a little bit different, and maybe not the intended use case of EL: I'm trying to disambiguate medical terms (similar to scispaCy), and ideally I would like the model to learn to identify a medical term even if I had not added an alias for it in the exact same words.
-
After a little bit of digging, it seems this is something that has been discussed before: #4981, #4572, #4988. Do you think we may get official support for this kind of thing in the near future? Thanks.
-
Yep, your use case sounds a lot like what @kabirkhan was working on. I'll close this issue, and let's continue at Issue #4981 and the related PR #4988 to keep the discussion in one place :-) And yes, we do want to support this in the near future. We probably want to get a stable release of v.3 out first, then focus on additional features like this. The ongoing work on v.3 has been holding up that PR, but it would be good to get back to that soonish.
-
How to reproduce the behaviour
I was trying to train a NEL and was getting NIL for every token every time. I tried the examples from https://spacy.io/usage/examples and it worked fine. I realized the difference is that I don't have any aliases defined in my KB. I then proceeded to test the same examples "Creating a Knowledge Base for Named Entity Linking" and then "Training spaCy’s Named Entity Linker" but removing this line:
Without this line I reproduce the issue on these simple examples, getting this as a result:
I guess I'm probably misunderstanding something, but I thought aliases were optional, sort of a way to define synonyms or something. Do I need to add aliases for every entry in my KB?
Your Environment