Subject-verb-object extraction as part of narrative network needs to be more accurate #91

ThijsVroegh · 2024-09-09T11:04:00Z

The current subject-verb-object (SVO) extraction used for the narrative network needs to be more accurate.

Account for verb combinations that together form the 'action'. E.g.: "ik ben blij dat ik met hem ook een nieuwe vorm heb gevonden" ("I am glad that I have also found a new form with him").
subject: ik
verb: heb gevonden (auxiliary and participle combined into one group)
object: een nieuwe vorm
Account for subordinate clauses ('bijzinnen') and split the results per clause. Currently, results appear to be retrieved only for the final part of the sentence, after the comma.
E.g., "Vanuit een diep dal zijn we herrezen, we hebben onze draken in de ogen gekeken" ("From a deep valley we have risen, we have looked our dragons in the eye") currently gives:
we - gekeken - onze draken
we - gekeken - de ogen

No results are returned for the first part of the sentence (before the comma).
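
To make the two points above concrete, here is a rough sketch of the behaviour I have in mind: one triple per main verb (so per clause), with auxiliaries merged into the verb group. It does not call spaCy at all. The `Tok` class, the `phrase` and `extract_svos` helpers, and the hand-written dependency annotations (UD-style labels such as `nsubj`, `obj`, `aux`) are all assumptions for illustration, not actual parser output, so a real model may label these sentences differently.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Tok:
    """Hypothetical stand-in for a parsed token (not a spaCy object)."""
    text: str
    pos: str   # coarse part-of-speech tag, e.g. "VERB", "AUX", "NOUN"
    dep: str   # dependency label linking this token to its head
    head: int  # index of the head token (the root points to itself)


def phrase(tokens: List[Tok], root: int) -> str:
    """Return the token at `root` plus all its descendants, in sentence order."""
    keep = {root}
    grew = True
    while grew:  # repeat until no new descendants are found
        grew = False
        for i, tok in enumerate(tokens):
            if i not in keep and tok.head in keep:
                keep.add(i)
                grew = True
    return " ".join(tokens[i].text for i in sorted(keep))


def extract_svos(tokens: List[Tok]) -> List[Tuple[str, str, Optional[str]]]:
    """Emit one (subject, verb group, object) triple per main verb."""
    triples = []
    for v, verb in enumerate(tokens):
        if verb.pos != "VERB":
            continue  # auxiliaries are merged into their main verb below
        children = [i for i, t in enumerate(tokens) if t.head == v and i != v]
        subjects = [i for i in children if tokens[i].dep == "nsubj"]
        if not subjects:
            continue  # no subject attached to this verb, so no triple
        objects = [i for i in children if tokens[i].dep == "obj"]
        # Verb group = main verb plus its auxiliaries, in sentence order.
        group = sorted([v] + [i for i in children if tokens[i].dep == "aux"])
        verb_text = " ".join(tokens[i].text for i in group)
        subj_text = phrase(tokens, subjects[0])
        obj_text = phrase(tokens, objects[0]) if objects else None
        triples.append((subj_text, verb_text, obj_text))
    return triples


# Hand-annotated parse (an assumption) of:
# "Vanuit een diep dal zijn we herrezen, we hebben onze draken in de ogen gekeken"
sentence = [
    Tok("Vanuit", "ADP", "case", 3),
    Tok("een", "DET", "det", 3),
    Tok("diep", "ADJ", "amod", 3),
    Tok("dal", "NOUN", "obl", 6),
    Tok("zijn", "AUX", "aux", 6),
    Tok("we", "PRON", "nsubj", 6),
    Tok("herrezen", "VERB", "root", 6),
    Tok(",", "PUNCT", "punct", 15),
    Tok("we", "PRON", "nsubj", 15),
    Tok("hebben", "AUX", "aux", 15),
    Tok("onze", "PRON", "nmod:poss", 11),
    Tok("draken", "NOUN", "obj", 15),
    Tok("in", "ADP", "case", 14),
    Tok("de", "DET", "det", 14),
    Tok("ogen", "NOUN", "obl", 15),
    Tok("gekeken", "VERB", "parataxis", 6),
]

triples = extract_svos(sentence)
print(triples)
# [('we', 'zijn herrezen', None), ('we', 'hebben gekeken', 'onze draken')]
```

With this annotation, the first clause yields a triple without an object, and "de ogen" is no longer reported as an object because it hangs off the verb as an oblique rather than as `obj`. Whether an object-less triple should be kept in the network is a separate design choice.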

Consider using a slightly different approach to extract the SVOs, e.g. with textacy.

Create a function and apply it to the text column of interest, as shown:

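import pandas as pd
import spacy
import textacy

# English pipeline; for Dutch text such as the examples above, load a Dutch
# pipeline instead, e.g. 'nl_core_news_sm'.
nlp = spacy.load('en_core_web_sm')

def extract_SVOs(text):
    # Parse the text and collect all subject-verb-object triples found in it.
    doc = nlp(text)
    tuples = textacy.extract.subject_verb_object_triples(doc)
    return list(tuples)

# df is an existing DataFrame; 'my_text_column' is a placeholder column name.
df['new_column_with_SVOs'] = df['my_text_column'].apply(extract_SVOs)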
