Flavio/issue 15 #28
When the user does not explicitly select a number of segments, it defaults to 0, which causes errors in the tagger. Now a 0 is simply replaced with 1.
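Below is a minimal sketch of this default handling. It assumes the selected value may arrive either as an int or as a string from the comboBox. The helper name effective_n_segments is hypothetical and is not taken from OWSNTagger.py.

```python
from typing import Optional, Union


def effective_n_segments(selected: Optional[Union[int, str]]) -> int:
    """Map the user's selection to a usable number of segments.

    Hypothetical helper: an unset selection (0, "0", None or "") is treated
    as 1, i.e. each story forms a single segment.
    """
    # Accept both ints and the string values a comboBox may deliver.
    n = int(selected) if selected not in (None, "") else 0
    # Only the problematic default 0 is replaced; other values pass through.
    return 1 if n == 0 else n
```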
Instead of creating a new dataframe, store the segment_id in a new column of the dataframe with the tags.
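The change can be sketched with pandas as follows. The sketch makes three assumptions. First, the tag dataframe has the columns storyid and sentence_id (hypothetical names). Second, each story's sentences are split into segments with np.array_split. Third, segment ids start at 1. The function add_segment_id is illustrative only.

```python
import numpy as np
import pandas as pd


def add_segment_id(tags: pd.DataFrame, n_segments: int) -> pd.DataFrame:
    """Return a copy of `tags` with a segment_id column (hypothetical column names)."""
    out = tags.copy()
    out["segment_id"] = -1  # placeholder, overwritten for every row below
    for story_id, story in out.groupby("storyid"):
        # Split the story's sentence ids into n_segments consecutive chunks.
        sentences = np.sort(story["sentence_id"].unique())
        for seg_id, chunk in enumerate(np.array_split(sentences, n_segments), start=1):
            mask = (out["storyid"] == story_id) & out["sentence_id"].isin(chunk)
            out.loc[mask, "segment_id"] = seg_id
    return out


tags = pd.DataFrame({
    "storyid": [0, 0, 0, 0, 1, 1],
    "sentence_id": [0, 1, 2, 3, 0, 1],
    "token": list("abcdef"),
})
print(add_segment_id(tags, 2)["segment_id"].tolist())  # [1, 1, 2, 2, 1, 2]
```

This version writes into a copy of the existing frame instead of concatenating per-story pieces, so the original row order is kept.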
The output of the tagger now differs from the output without story segmentation: the order of rows is different. @kodymoodley, if this is an issue, let me know and I can try to fix it.
@f-hafner By 'output' do you mean the dataframe? And by 'differs' do you mean only the additional column indicating the story segment number? If so, there is no issue. Just to be clear, the intention (as per our offline discussion yesterday) is still to retain a single dataframe as the output of the tagger, right?
Looks good, thanks @f-hafner !
Issue
Closes #15
Description of changes
Open questions
comboBox? Currently I have to transform from str to int in line 142 and line 147 of orangecontrib/storynavigation/widgets/OWSNTagger.py. Ideally, we would change the type when the tagger is instantiated and when the input value changes.

The number of segments is computed with np.array_split. The number of segments is now defined at a global level (for all stories). As a result, segment sizes can differ greatly between stories when story length varies a lot across stories. Statistical conclusions from comparing segments within a story will therefore be more or less accurate depending on the size of the segments. One way to deal with this is to communicate this uncertainty to the user and write clear documentation about it. The documentation could include a hint that the user should inspect the segment lengths in their stories.
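The concern can be illustrated by applying one global number of segments to stories of different lengths. In the sketch below, segment_sizes is a hypothetical helper. np.array_split gives the first (length % n) chunks one extra element, so very short stories end up with one-sentence or even empty segments, while long stories get large ones.

```python
from typing import List

import numpy as np


def segment_sizes(story_len: int, n_segments: int) -> List[int]:
    """Sizes of the chunks np.array_split produces for a story of story_len sentences."""
    sentences = np.arange(story_len)
    return [len(chunk) for chunk in np.array_split(sentences, n_segments)]


# The same global setting (4 segments) applied to stories of different lengths.
for length in (3, 5, 400):
    print(length, segment_sizes(length, 4))
# 3 [1, 1, 1, 0]
# 5 [2, 1, 1, 1]
# 400 [100, 100, 100, 100]
```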
Another way could be to let the user define, instead of (or in addition to?) the number of segments, the minimum segment size they want.
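Below is one possible rule for a minimum-segment-size option. It is a sketch only: the function name and the capping rule are hypothetical and are not part of this PR. The rule caps the requested number of segments per story so that every segment has at least min_size sentences. Stories shorter than min_size fall back to a single segment.

```python
import numpy as np


def n_segments_for_story(story_len: int, requested: int, min_size: int) -> int:
    """Number of segments to use for one story (hypothetical capping rule)."""
    if requested < 1 or min_size < 1:
        raise ValueError("requested and min_size must be at least 1")
    # story_len // min_size is the largest segment count for which
    # np.array_split still yields segments of at least min_size sentences.
    return max(1, min(requested, story_len // min_size))


for length in (1, 5, 400):
    k = n_segments_for_story(length, requested=4, min_size=2)
    sizes = [len(c) for c in np.array_split(np.arange(length), k)]
    print(length, k, sizes)
# 1 1 [1]
# 5 2 [3, 2]
# 400 4 [100, 100, 100, 100]
```

Compared with a purely global number of segments, this prevents short stories from producing near-empty segments. The cost is that stories no longer share the same number of segments.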
Includes