Extracting Consumer information from 10-k text #2122
Replies: 2 comments
-
Hi @yoonchanheee, it's not always easy to tag arbitrary text with the NER model. The NER model uses the surrounding context to figure out what the tag is. Tags like "organization" work well because the context provides good clues about whether a word is an organization. But how is the model supposed to predict whether something is a customer?
I think you're best off focusing on getting high accuracy on labelling organizations, and then having a follow-up process figure out whether each one is a customer. I think this will probably look like checking against a list of organizations that are customers. You might need another process to match the name of the organization against your list, e.g. if you need some name normalization. For improving the training of the ORG model, you might find our annotation tool Prodigy useful: https://prodi.gy. Hope that helps!
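As a rough sketch of that follow-up step, here is one way the name matching could look. It assumes you already have ORG strings from the NER model; `KNOWN_CUSTOMERS`, the suffix list, `normalize`, `match_customer` and the 0.85 cutoff are all made up for illustration, and it only uses the standard library's `difflib`, not any spaCy API.

```python
import difflib
import re

# Hypothetical customer list; in practice this comes from your own data.
KNOWN_CUSTOMERS = ["Staples", "OfficeMax", "United Stationers"]

# Assumed set of legal suffixes to ignore when comparing names.
_SUFFIXES = {"inc", "corp", "corporation", "co", "company", "ltd", "llc"}


def normalize(name):
    """Lowercase, drop punctuation, legal suffixes and whitespace."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return "".join(w for w in words if w not in _SUFFIXES)


def match_customer(org_name, customers=KNOWN_CUSTOMERS, cutoff=0.85):
    """Return the canonical customer name for an ORG string, or None."""
    index = {normalize(c): c for c in customers}
    key = normalize(org_name)
    if not key:
        return None
    # Fuzzy match on the normalized form to tolerate small spelling variants.
    hits = difflib.get_close_matches(key, list(index), n=1, cutoff=cutoff)
    return index[hits[0]] if hits else None


for org in ["Office Max", "Staples, Inc.", "Home Depot"]:
    print(org, "->", match_customer(org))
```

The cutoff is a judgment call: a higher value gives fewer false matches but misses more spelling variants, so it is worth checking against a sample of your own data.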
-
Hi!
I am so surprised by how efficient this spaCy module is!
Thanks for the great help!
I'm trying to extract customer information from 10-K filings.
Below are example sentences.
I want to extract Staples, Office Max, United Stationers from this text.
At first, I thought NER could deal with this problem.
However, some of the sentences contain entities that are not customers.
For example,
Since the NER pipeline classifies both Fleetwood and Home Depot as organizations, it can only partly solve the problem.
Next, I thought the dependency parser would help me. However, there are many different constructions and verbs that indicate whether an entity is a customer or not...
To deal with those problems, I tried to train the NER pipeline. I annotated 2000+ sentences in IOB format, marking whether each word is part of a customer name or not. However, when I train the NER pipeline on those texts, the loss does not go down, and the overall accuracy seems bad.
I suspect that this is because the NER pipeline cannot capture the context in the text.
So my question is: is there any way I could deal with these problems? (I'm thinking of extracting features (entities, dependency parse features) with spaCy and trying machine learning on those features.)
Do you have any suggestions?
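To make the feature idea concrete, here is a minimal sketch of context-window features for one ORG candidate. It assumes the tokens and the entity span already come from spaCy's tokenizer and NER. The function name `context_features`, the window size and the example sentence are hypothetical, and dependency features are left out.

```python
def context_features(tokens, start, end, window=3):
    """Features from the words around the entity span tokens[start:end]."""
    feats = {}
    # Words immediately to the left and right of the candidate entity.
    for word in tokens[max(0, start - window):start]:
        feats["left=" + word.lower()] = 1
    for word in tokens[end:end + window]:
        feats["right=" + word.lower()] = 1
    # Number of tokens in the entity itself.
    feats["entity_len"] = end - start
    return feats


# Hypothetical sentence, not taken from a real filing.
tokens = "We sell our products to Staples and Office Max .".split()
print(context_features(tokens, 5, 6))  # features for the span "Staples"
```

Feature dicts like these could then go into an ordinary classifier (for example scikit-learn's `DictVectorizer` followed by `LogisticRegression`) that predicts customer vs. not customer for each organization.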