
Subject-verb-object extraction as part of narrative network needs to be more accurate #91

Open
ThijsVroegh opened this issue Sep 9, 2024 · 0 comments

Collaborator

ThijsVroegh commented Sep 9, 2024

The current subject-verb-object (SVO) extraction in the narrative network needs to be more accurate.

  • Account for verb combinations that together form the 'action'. E.g.: "ik ben blij dat ik met hem ook een nieuwe vorm heb gevonden" ("I am glad that I have also found a new form with him").
    subject: ik
    verb: heb gevonden (auxiliary and participle as one combined group)
    object: een nieuwe vorm

  • Account for subordinate clauses ('bijzinnen') and split the results per clause. Currently, results appear to be retrieved only for the final part after the comma.

  • E.g., "Vanuit een diep dal zijn we herrezen, we hebben onze draken in de ogen gekeken" ("From a deep valley we have risen, we have looked our dragons in the eye") currently gives:
    we - gekeken - onze draken
    we - gekeken - de ogen

No results are returned for the first part of the sentence (before the comma).
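
The two requirements above (combined verb groups, one result per clause) can be sketched on top of a dependency parse. This is a minimal sketch, not the project's implementation: the `Token` class, the `parse` helper and `extract_svo` are hypothetical names, and the parses are hand-annotated with Universal Dependencies style labels rather than produced by a model, so a real parser may label these sentences differently.

```python
from dataclasses import dataclass

@dataclass
class Token:
    text: str
    pos: str   # coarse part-of-speech tag, e.g. "VERB"
    dep: str   # dependency label relative to the head
    head: int  # index of the head token (the root points to itself)

def parse(annotated):
    """Build tokens from a hand-annotated 'word/POS/dep/head' string (illustrative format)."""
    return [Token(w, p, d, int(h)) for w, p, d, h in
            (item.split("/") for item in annotated.split())]

def children(tokens, i):
    return [j for j, t in enumerate(tokens) if t.head == i and j != i]

def subtree(tokens, i):
    """Indices of token i and all of its descendants, in sentence order."""
    out = [i]
    for c in children(tokens, i):
        out.extend(subtree(tokens, c))
    return sorted(out)

def phrase(tokens, idxs):
    return " ".join(tokens[j].text for j in idxs)

def extract_svo(tokens):
    """One (subject, verb group, object) tuple per main verb that has a subject."""
    triples = []
    for i, t in enumerate(tokens):
        if t.pos != "VERB":
            continue
        kids = children(tokens, i)
        subjects = [j for j in kids if tokens[j].dep == "nsubj"]
        objects = [j for j in kids if tokens[j].dep == "obj"]
        aux = [j for j in kids if tokens[j].dep == "aux"]
        if not subjects:
            continue
        verb = phrase(tokens, sorted(aux + [i]))  # auxiliary + participle as one group
        subj = phrase(tokens, subtree(tokens, subjects[0]))
        obj = phrase(tokens, subtree(tokens, objects[0])) if objects else None
        triples.append((subj, verb, obj))
    return triples

# Hand-annotated parses (assumptions, not model output).
S1 = ("ik/PRON/nsubj/2 ben/AUX/cop/2 blij/ADJ/root/2 dat/SCONJ/mark/12 "
      "ik/PRON/nsubj/12 met/ADP/case/6 hem/PRON/obl/12 ook/ADV/advmod/12 "
      "een/DET/det/10 nieuwe/ADJ/amod/10 vorm/NOUN/obj/12 heb/AUX/aux/12 "
      "gevonden/VERB/ccomp/2")
S2 = ("Vanuit/ADP/case/3 een/DET/det/3 diep/ADJ/amod/3 dal/NOUN/obl/6 "
      "zijn/AUX/aux/6 we/PRON/nsubj/6 herrezen/VERB/root/6 ,/PUNCT/punct/15 "
      "we/PRON/nsubj/15 hebben/AUX/aux/15 onze/DET/det/11 draken/NOUN/obj/15 "
      "in/ADP/case/14 de/DET/det/14 ogen/NOUN/obl/15 gekeken/VERB/parataxis/6")

print(extract_svo(parse(S1)))  # [('ik', 'heb gevonden', 'een nieuwe vorm')]
print(extract_svo(parse(S2)))  # [('we', 'zijn herrezen', None), ('we', 'hebben gekeken', 'onze draken')]
```

Looping over every main verb, instead of only the sentence root, is what yields a result for the clause before the comma. Because "herrezen" is intransitive, its object slot is left as `None` here; whether such clauses should be kept, or filled with the oblique phrase ("vanuit een diep dal"), is a design choice for the narrative network.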

Consider a slightly different approach to extracting the SVOs, e.g. with textacy: create a function and apply it to the text column of interest, as shown:

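import pandas as pd
import spacy
import textacy

# Model name as posted; the Dutch examples above would need a Dutch pipeline such as 'nl_core_news_sm'.
nlp = spacy.load('en_core_web_sm')

def extract_SVOs(text):
    # Parse the text and collect all subject-verb-object triples.
    doc = nlp(text)
    tuples = textacy.extract.subject_verb_object_triples(doc)
    return list(tuples)

# df is an existing DataFrame; 'my_text_column' is a placeholder for the text column.
df['new_column_with_SVOs'] = df['my_text_column'].apply(extract_SVOs)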

see https://stackoverflow.com/questions/50906510/how-to-build-a-subject-verb-object-extraction-model-in-python
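
To store the results as readable text in the DataFrame, the triples could be flattened into strings. The sketch below assumes that each triple has `subject`, `verb` and `object` parts, each a list of tokens with a `.text` attribute, which is how recent textacy versions are understood to return them (older versions may differ). `Triple`, `tok` and `triple_to_text` are hypothetical stand-ins so the example runs without a spaCy model, and the example triple is made up for illustration.

```python
from collections import namedtuple
from types import SimpleNamespace

# Stand-in for the triple type yielded by the extraction function (assumption).
Triple = namedtuple("Triple", ["subject", "verb", "object"])

def tok(text):
    """Stand-in for a spaCy token: only the .text attribute is used."""
    return SimpleNamespace(text=text)

def triple_to_text(triple):
    """Join the tokens of each part into a single string."""
    return tuple(" ".join(t.text for t in part) for part in triple)

# Hypothetical triple, not actual model output.
example = Triple(
    subject=[tok("we")],
    verb=[tok("hebben"), tok("gekeken")],
    object=[tok("onze"), tok("draken")],
)
print(triple_to_text(example))  # ('we', 'hebben gekeken', 'onze draken')
```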
