token attribute to return containing ent #4898
Replies: 4 comments
-
I'm not sure we'd want to have this as part of the core library, because the function would be pretty inefficient, as it needs to loop through all entities every time it is called. However, what I would suggest is that if you want this kind of functionality for your specific use case, you could loop through all entities ONCE and set a custom attribute that keeps track of the information you want. That keeps things efficient, instead of looping through the entities each time.
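A minimal sketch of the "loop once, set a custom attribute" idea, assuming spaCy v3 (`spacy.blank`, `Token.set_extension`). The extension name `ent_label` and the helper `annotate_ent_labels` are hypothetical, and the entities are set by hand so no trained model is needed. For the label alone, the built-in `Token.ent_type_` already covers this; the point here is the single-pass pattern.

```python
import spacy
from spacy.tokens import Span, Token

# Hypothetical extension name; force=True avoids an "already registered"
# error if this code runs more than once (e.g. in a notebook).
Token.set_extension("ent_label", default=None, force=True)


def annotate_ent_labels(doc):
    """Copy each entity's label onto its tokens in a single pass over doc.ents."""
    for ent in doc.ents:
        for token in ent:  # iterating over a Span yields its Tokens
            token._.ent_label = ent.label_
    return doc


nlp = spacy.blank("en")
doc = nlp("Sofie works at Explosion in Berlin")
# Entities set manually so the example runs without a trained pipeline.
doc.ents = [Span(doc, 3, 4, label="ORG"), Span(doc, 5, 6, label="GPE")]
annotate_ent_labels(doc)
print([(t.text, t._.ent_label) for t in doc])
```

After this one pass, looking up `token._.ent_label` is a constant-time read rather than a scan over `doc.ents`.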
-
Thanks, Sofie. You're right about the inefficiency of the function I suggested, of course; I was just trying to show the desired functionality. If I'm reading your suggestion correctly, I would loop through the entities, and for each entity, loop through the tokens in doc[ent.start : ent.end], and for each of those tokens, set my custom attribute. But what exactly would the attribute be? I'd like it to be analogous to Span.ents, which gives a list of the ents (i.e., the Span objects associated with the ents) contained in the span in question. If I store the span on the token, then I cannot serialize my docs to disk (I'm not sure how to write something like doc.to_disk(path, exclude=[<token._.ent>])). It does seem like this would be a good use case for the .ent_id attribute, which is (for some reason?) unfortunately not writable from the span. I hope to be able to use ent.id eventually.
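One possible way around the serialization problem, sketched under assumptions: instead of storing the Span itself, store a plain integer (the entity's position in `doc.ents`) and look the Span up on demand. This assumes spaCy v3, where custom extension values live in `doc.user_data` and are included by `Doc.to_bytes()` as long as they are msgpack-serializable (an `int` is; a `Span` is not). The names `ent_index`, `index_ents`, and `containing_ent` are hypothetical. The stored indices go stale if `doc.ents` changes, so they would need recomputing after edits.

```python
import spacy
from spacy.tokens import Doc, Span, Token

# Hypothetical extension: position of the token's entity in doc.ents, or None.
Token.set_extension("ent_index", default=None, force=True)


def index_ents(doc):
    """Store, on each entity token, the index of its entity in doc.ents."""
    for i, ent in enumerate(doc.ents):
        for token in doc[ent.start:ent.end]:
            token._.ent_index = i
    return doc


def containing_ent(token):
    """Resolve the stored integer back to a Span (None if not in an entity)."""
    i = token._.ent_index
    return None if i is None else token.doc.ents[i]


nlp = spacy.blank("en")
doc = nlp("Sofie works at Explosion in Berlin")
doc.ents = [Span(doc, 3, 4, label="ORG"), Span(doc, 5, 6, label="GPE")]
index_ents(doc)

# Round-trip through bytes: the integers travel along in doc.user_data.
restored = Doc(nlp.vocab).from_bytes(doc.to_bytes())
print(containing_ent(restored[5]))
```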
-
Hi @jack-rory-staunton, I was assuming that you want to access some property of the entities at the token level, such as the NER type or so, and that you wanted to do something like … If you really want to be able to access the actual Span object from the token, perhaps the best workaround is to just store a mapping in your code. That avoids copying over too much information or looping inefficiently.
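A mapping of the kind suggested here could look roughly like this: a plain dict from token index to entity Span, built in one pass and kept outside the Doc so nothing extra needs to be serialized. This is a hypothetical sketch assuming spaCy v3; `build_token_to_ent` is an illustrative name, not a spaCy API.

```python
import spacy
from spacy.tokens import Span


def build_token_to_ent(doc):
    """Map token index -> entity Span, built with one pass over doc.ents."""
    token_to_ent = {}
    for ent in doc.ents:
        for i in range(ent.start, ent.end):
            token_to_ent[i] = ent
    return token_to_ent


nlp = spacy.blank("en")
doc = nlp("Sofie works at Explosion AI in Berlin")
doc.ents = [Span(doc, 3, 5, label="ORG"), Span(doc, 6, 7, label="GPE")]

token_to_ent = build_token_to_ent(doc)
for token in doc:
    # .get() returns None for tokens that are not part of any entity
    print(token.text, token_to_ent.get(token.i))
```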
-
Thanks again! So I think this should work:
It seems like a fair enough solution, though I'll still need to rebuild the mapping whenever I change an entity (e.g. when retokenizing or overwriting a span). Perhaps it's not directly related, but why are Token.ent_id and Span.ent_id not writable? It seems that the functionality I want is already lying dormant somewhere inside spaCy.
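To illustrate why the mapping has to be rebuilt, here is a small sketch assuming spaCy v3's `Doc.retokenize()` context manager and `Retokenizer.merge`. Merging two tokens shifts every later token index, so a dict keyed by token index goes stale. The helper `build_token_to_ent` is hypothetical (redefined here so the block runs on its own).

```python
import spacy
from spacy.tokens import Span


def build_token_to_ent(doc):
    """Map token index -> entity Span (hypothetical helper)."""
    return {i: ent for ent in doc.ents for i in range(ent.start, ent.end)}


nlp = spacy.blank("en")
doc = nlp("Sofie works at Explosion AI in Berlin")
doc.ents = [Span(doc, 3, 5, label="ORG"), Span(doc, 6, 7, label="GPE")]
old_mapping = build_token_to_ent(doc)  # "Berlin" is at index 6 here

# Merge "Explosion AI" into one token; every later token moves left by one,
# so old_mapping now points at indices that no longer line up with the doc.
with doc.retokenize() as retokenizer:
    retokenizer.merge(doc[3:5])

token_to_ent = build_token_to_ent(doc)  # rebuild after any token/entity change
print({i: (ent.text, ent.label_) for i, ent in token_to_ent.items()})
```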
-
Feature description
Converting things back and forth between tokens, spans and entities is often laborious. A function like
simply returns the entity of which the supplied token is a part. I submit that Token should have a Token.ent attribute that would return the entity's span if it exists, or [] if tok.ent_type == 0.
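A naive helper with the behaviour described could be sketched as follows, assuming spaCy v3. `get_token_ent` is a hypothetical name, and this version returns `None` rather than `[]` for non-entity tokens, a design choice that keeps the return type to "Span or nothing". It scans `doc.ents` on every call, which is exactly the inefficiency raised in the replies above.

```python
import spacy
from spacy.tokens import Span


def get_token_ent(token):
    """Return the entity Span containing `token`, or None.

    Naive version: scans doc.ents on every call (O(number of entities)).
    """
    if token.ent_type == 0:  # 0 means the token is not part of an entity
        return None
    for ent in token.doc.ents:
        if ent.start <= token.i < ent.end:
            return ent
    return None


nlp = spacy.blank("en")
doc = nlp("Sofie works at Explosion AI in Berlin")
doc.ents = [Span(doc, 3, 5, label="ORG"), Span(doc, 6, 7, label="GPE")]
print(get_token_ent(doc[4]))
print(get_token_ent(doc[0]))
```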