Integrating Custom Entity Extraction with POS Tagging and Parsing in spaCy: Seeking Advice and Clarifications #13483
Unanswered
ANoubani
asked this question in
Help: Installation
I'm not sure exactly what to ask, because I don't know whether I'm approaching this correctly, so I'll describe my case first and then list my questions.

I want to train a spaCy NER model to automatically extract custom entities (UML entities: ACTOR, USECASE, RELATION). As I understand it, I need to prepare annotated data that uses these new labels in order to train the NER component. However, I also want the trained pipeline to perform POS tagging, dependency parsing, and lemmatization. I believe this will improve the accuracy of the new model's predictions: for instance, I want it to be more likely to recognize names as actors and verbs as use cases. Is this assumption correct?
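To make the annotation step concrete, here is a minimal sketch of what character-offset training data for the custom labels could look like. The sentence, the `find_span` helper, and the `has_overlap` check are made up for illustration; only the `(start, end, label)` span format is meant to match what spaCy's `Doc.char_span` accepts.

```python
# Hypothetical example: build (start, end, label) entity spans for one sentence.
# The sentence and helper names are illustrative, not taken from a real dataset.

def find_span(text, phrase, label):
    """Return (start, end, label) for the first occurrence of phrase in text."""
    start = text.index(phrase)  # raises ValueError if the phrase is missing
    return (start, start + len(phrase), label)


def has_overlap(spans):
    """NER spans must not overlap; report whether any two of them do."""
    ordered = sorted(spans)
    return any(prev[1] > cur[0] for prev, cur in zip(ordered, ordered[1:]))


text = "The customer places an order."
entities = [
    find_span(text, "customer", "ACTOR"),
    find_span(text, "places an order", "USECASE"),
]

# One training record: the raw text plus its labelled character spans.
TRAIN_DATA = [(text, {"entities": entities})]

if has_overlap(entities):
    raise ValueError("Overlapping entity spans cannot be used for NER training")

for start, end, label in entities:
    print(label, repr(text[start:end]))
```

With spaCy installed, each span would typically be turned into a `Span` with `doc.char_span(start, end, label=label)`, assigned to `doc.ents`, and the resulting docs collected in a `DocBin` that is saved with `to_disk` for `spacy train`.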
I've read the spaCy documentation and understand that if I want to use components without updating their weights, I need to freeze them. In that case, should I list them both in the main pipeline and in the frozen components, or only in the frozen components? Whenever I list components in both places, the training command fails with this error:
ValueError: [E143] Labels for component 'tagger' not initialized. This can be fixed by calling add_label, or by providing a representative batch of examples to the component's initialize method.
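For reference, here is a minimal sketch of the kind of partial config the freezing setup seems to call for, generated from Python so the pieces are explicit. It rests on several assumptions that are not confirmed for this exact setup: that the existing components are sourced from `en_core_web_lg` (with `source = ...`) rather than created blank from a factory, that they appear in both `[nlp] pipeline` and `[training] frozen_components`, and that the new `ner` uses its own embedding layer instead of listening to the frozen `tok2vec`. One plausible reading of E143 is that a blank, frozen `tagger` never gets its labels initialized, which sourcing would avoid.

```python
import configparser
import io
import json

# Assumption: the pretrained pipeline to copy components from.
SOURCE = "en_core_web_lg"

# Components reused as-is (weights not updated) and the full pipeline order.
frozen = ["tok2vec", "tagger", "parser", "attribute_ruler", "lemmatizer"]
pipeline = frozen + ["ner"]

cfg = configparser.ConfigParser()
# spaCy config values are JSON-formatted, hence json.dumps for strings and lists.
cfg["nlp"] = {"lang": json.dumps("en"), "pipeline": json.dumps(pipeline)}
for name in frozen:
    # "source" copies the trained component instead of creating a blank one.
    cfg[f"components.{name}"] = {"source": json.dumps(SOURCE)}
# The new NER component is created blank and is the only one trained.
cfg["components.ner"] = {"factory": json.dumps("ner")}
cfg["training"] = {"frozen_components": json.dumps(frozen)}

buffer = io.StringIO()
cfg.write(buffer)
config_text = buffer.getvalue()
print(config_text)
```

The printed text is only a fragment. Saving it as, say, `base.cfg` and running `python -m spacy init fill-config base.cfg config.cfg` should fill in the remaining defaults, though I have not verified the result against the environment listed below.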
Questions:
1- If I only include ["tok2vec", "ner"] in the pipeline, will the other components be trained as well? If not, how can I include them in each scenario: with their weights updated and with their weights frozen?
2- How do I initialize a component properly?
3- For my purposes, should I use en_core_web_trf or en_core_web_lg?
4- Which component do I need to train to extract relations between actors and use cases? For example, I want my application to state that a given ACTOR performs a given USECASE.
spaCy version 3.7.4
Platform Windows-11-10.0.22631-SP0
Python version 3.12.3
Pipelines en_core_web_trf (3.7.3)
Thank you!