Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modify EL batching to doc-wise streaming approach #12367

Merged
merged 109 commits into from
Apr 9, 2024

Conversation

rmitsch
Copy link
Contributor

@rmitsch rmitsch commented Mar 6, 2023

Description

Modify EL batching to work doc-based instead of a mention-based. For prior discussion as to why this is useful see #11669 (comment).

Review and merge after #12341.
Split off of #11669.

Types of change

Enhancement.

Checklist

  • I confirm that I have the right to submit this contribution under the project's MIT license.
  • I ran the tests, and all new and existing tests passed.
  • My changes don't require a change to the documentation, or if they do, I've added all required information.

rmitsch and others added 23 commits February 28, 2023 13:49
Co-authored-by: Sofie Van Landeghem <[email protected]>
Co-authored-by: Sofie Van Landeghem <[email protected]>
…ntions

# Conflicts:
#	spacy/ml/models/entity_linker.py
#	website/docs/api/inmemorylookupkb.mdx
Co-authored-by: Sofie Van Landeghem <[email protected]>
@rmitsch rmitsch self-assigned this Mar 6, 2023
@rmitsch rmitsch added 🔜 v4.0 Related to upcoming v4.0 feat / nel Feature: Named Entity linking labels Mar 6, 2023
@svlandeg svlandeg reopened this Jan 29, 2024
@svlandeg svlandeg changed the base branch from v4 to v5 January 29, 2024 09:38
@svlandeg svlandeg changed the base branch from v5 to main January 29, 2024 09:39
Copy link
Member

@svlandeg svlandeg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks good, had some comments that still need addressing. Was the rewrite sufficiently tested on some sample docs to determine that it all works well?

spacy/kb/kb.pyx Outdated
@@ -30,26 +30,15 @@ cdef class KnowledgeBase:
self.entity_vector_length = entity_vector_length
self.mem = Pool()

def get_candidates_batch(self, mentions: SpanGroup) -> Iterable[Iterable[Candidate]]:
def get_candidates(self, mentions: Iterator[SpanGroup]) -> Iterator[Iterable[Iterable[Candidate]]]:
"""
Return candidate entities for a specified Span mention. Each candidate defines at least the entity and the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still requires updating - the text refers to a single specified Span Mention.

spacy/ml/models/entity_linker.py Outdated Show resolved Hide resolved
spacy/pipeline/entity_linker.py Show resolved Hide resolved
website/docs/api/entitylinker.mdx Show resolved Hide resolved
spacy/ml/models/entity_linker.py Outdated Show resolved Hide resolved
spacy/pipeline/entity_linker.py Outdated Show resolved Hide resolved
website/docs/usage/v3.mdx Outdated Show resolved Hide resolved
spacy/pipeline/entity_linker.py Show resolved Hide resolved
Copy link
Member

@svlandeg svlandeg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's get this in v4 😎

@svlandeg svlandeg merged commit 304b933 into explosion:v4 Apr 9, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feat / nel Feature: Named Entity linking 🔜 v4.0 Related to upcoming v4.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants