The sentence span is wrong if there are sentences containing only space tokens
```python
>>> import spacy
>>> import spacy_udpipe
>>> spacy_udpipe.download("nl")
Already downloaded a model for the 'nl' language
>>> nlp = spacy_udpipe.load("nl")
>>>
>>> def line_splitter(x):
...     text = str(x)
...     text = text.split(sep = "\n")
...     text = [sent + "\n" for sent in text]
...     return text
...
>>> text_raw = "We gingen naar Brussel \n\n \nen kochten op 13/12/2021 veel eten. Jullie ook?"
>>> text = line_splitter(text_raw)
>>> text
['We gingen naar Brussel \n', '\n', ' \n', 'en kochten op 13/12/2021 veel eten. Jullie ook?\n']
>>> doc = nlp(text)
>>> for sent_i, sent in enumerate(doc.sents):
...     print(sent.start_char, sent.end_char)
...
0 22
23 70
>>> text_raw[0:(22+1)]
'We gingen naar Brussel '
>>> text_raw[23:(70+1)]
'\n\n \nen kochten op 13/12/2021 veel eten. Jullie o'
>>>
```
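For comparison, here is a plain-Python sketch (no spaCy or UDPipe involved) that computes where the two non-blank sentences actually sit in `text_raw`. It assumes spaCy's convention that `Span.end_char` is exclusive, so `text[start:end]` is the span text and no `+1` is needed. The helper `char_span` and the `collapsed` string are hypothetical, introduced only for this check. The second half tests one possible explanation: the reported offsets match a text in which the whitespace-only segments were dropped and the sentences were joined by a single space.

```python
# Plain-Python check: where do the two real sentences sit in the original string?
# Offsets use an exclusive end, as spaCy's Span.end_char does.
text_raw = "We gingen naar Brussel \n\n \nen kochten op 13/12/2021 veel eten. Jullie ook?"

first = "We gingen naar Brussel"
second = "en kochten op 13/12/2021 veel eten. Jullie ook?"


def char_span(text, sentence, start_at=0):
    """Hypothetical helper: return (start, end) of `sentence` in `text`, end exclusive."""
    start = text.index(sentence, start_at)
    return start, start + len(sentence)


# Spans the sentences would have relative to text_raw.
expected = [char_span(text_raw, first), char_span(text_raw, second)]
print(expected)  # [(0, 22), (27, 74)]

# Assumption being tested: whitespace-only segments are dropped and the
# remaining sentences are joined by one space. This reproduces the reported spans.
collapsed = first + " " + second
reported = [char_span(collapsed, first), char_span(collapsed, second)]
print(reported)  # [(0, 22), (23, 70)]
```

If this reading is right, the offsets are consistent with each other but no longer index into the original text, so the second sentence should start at 27 in `text_raw`, not 23.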