Flavio/issue 15 #28

f-hafner · 2024-01-23T09:09:56Z

Issue

Closes #15

Description of changes

Added the drop-down menu to the GUI.
Segmented the story by sentence; created a second dataframe with one row per sentence.

Open questions

The resulting dataframe is not output anywhere. One solution would be a data structure holding a set of relational dataframes/tables, which can then be used as input to downstream widgets.
Where should the type of the comboBox be defined? Currently I have to convert from str to int in line 142 and line 147 of orangecontrib/storynavigation/widgets/OWSNTagger.py. Ideally we change the type when the tagger is instantiated and the input value changes.

The segments are computed with np.array_split. Because the number of segments is now defined at a global level (for all stories), segment sizes can differ greatly across stories when story lengths vary widely (and thus, statistical conclusions from comparing segments within a story will be more or less accurate depending on the size of the segment).
One way to deal with this is to communicate this uncertainty to the user and document it clearly, perhaps including a hint that the user should inspect the segment lengths in their stories.
Another way could be to let the user define, instead of (or in addition to?) the number of segments, the minimum segment size they want.
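
To make the concern above concrete, here is a minimal sketch. It does not call NumPy; `split_sizes` reproduces the documented size rule of np.array_split (for length l and n sections: l % n sections of size l//n + 1, the rest of size l//n). The function names, the story lengths, and the minimum-size rule are assumptions for illustration, not code from this PR.

```python
def split_sizes(n_items: int, n_segments: int) -> list[int]:
    """Segment sizes that np.array_split yields for n_items elements."""
    base, extra = divmod(n_items, n_segments)
    return [base + 1] * extra + [base] * (n_segments - extra)


def segments_for_min_size(n_items: int, min_size: int) -> int:
    """Largest segment count that keeps every segment at or above min_size.

    Falls back to a single segment when the story is shorter than min_size.
    """
    return max(1, n_items // min_size)


# Hypothetical sentence counts per story, all split with one global setting.
story_lengths = {"short": 4, "medium": 12, "long": 60}
n_segments = 5
min_size = 3

for name, n_sentences in story_lengths.items():
    # Fixed number of segments: the short story gets an empty segment.
    fixed = split_sizes(n_sentences, n_segments)
    # Cap the requested number of segments by the minimum segment size.
    capped_n = min(n_segments, segments_for_min_size(n_sentences, min_size))
    capped = split_sizes(n_sentences, capped_n)
    print(f"{name}: fixed={fixed} capped={capped}")
```

With a fixed count of 5, the 4-sentence story ends up with single-sentence segments and one empty segment, while the 60-sentence story gets 12 sentences per segment. Capping by a minimum size trades a uniform segment count for segments that are never tiny.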

Includes

Code changes
Tests
Documentation

When the user does not explicitly select a number of segments, it is set to 0 by default, which causes errors in the tagger. For now, a 0 is simply replaced with 1.
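
A minimal sketch of how the str-to-int conversion and the 0-to-1 fallback could live in one place. `coerce_n_segments` is a hypothetical helper, not the code in OWSNTagger.py, and it assumes the selection may arrive as a string, an int, or nothing at all.

```python
def coerce_n_segments(raw_value, default: int = 1) -> int:
    """Convert a combo-box selection to a usable number of segments.

    Anything that is not a positive integer (unset, empty, 0, negative)
    falls back to `default`, so the tagger never splits into 0 segments.
    """
    try:
        n = int(raw_value)
    except (TypeError, ValueError):
        # None, "" or other non-numeric input: use the fallback.
        return default
    return n if n >= 1 else default


print(coerce_n_segments("3"))  # string from the GUI
print(coerce_n_segments(0))    # unselected default
```

Calling the helper once when the value changes would avoid repeating the cast at each place the value is used.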

f-hafner · 2024-01-23T13:32:26Z

Instead of a new dataframe, store the segment_id in a new column in the dataframe with the tags.
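
A minimal sketch of the single-table idea, with plain dicts standing in for dataframe rows. The column names other than segment_id, the example sentences, and the `segment_ids` helper are assumptions for illustration; the size rule matches what np.array_split documents.

```python
def segment_ids(n_sentences: int, n_segments: int) -> list[int]:
    """One segment id per sentence, in order, using array_split-style sizes."""
    base, extra = divmod(n_sentences, n_segments)
    sizes = [base + 1] * extra + [base] * (n_segments - extra)
    ids = []
    for segment, size in enumerate(sizes):
        ids.extend([segment] * size)
    return ids


# Hypothetical tagger output: one row per sentence of a single story.
sentences = ["One.", "Two.", "Three.", "Four.", "Five."]
rows = [
    {"story_id": 0, "sentence_id": i, "sentence": text}
    for i, text in enumerate(sentences)
]

# Add the segment as an extra column instead of building a second table.
for row, segment in zip(rows, segment_ids(len(rows), n_segments=2)):
    row["segment_id"] = segment

for row in rows:
    print(row)
```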

f-hafner · 2024-01-23T13:41:10Z

The output of the tagger now differs from the output without story segmentation: the order of rows is different. @kodymoodley, if this is an issue, let me know and I can try to fix it.

kodymoodley · 2024-01-24T09:01:06Z

> The output of the tagger now differs from the output without story segmentation: the order of rows is different. @kodymoodley, if this is an issue, let me know and I can try to fix it.

@f-hafner By 'output' do you mean the dataframe? And by 'differs' do you mean solely the additional column indicating the story segment number? If so, then there is no issue. Just to be clear, the intention (as per our offline discussion yesterday) is still to retain a single dataframe as the output for the tagger, right?

kodymoodley

Looks good, thanks @f-hafner!

f-hafner added 4 commits January 17, 2024 14:23

add N segments dropdown to tagger gui

17ca807

use n_segments when processing stories

17c9b4d

use single dataframe for tagger output

233d3d7

fix problem with n_segments

f7df021

add docs

ff9f50d

f-hafner marked this pull request as ready for review January 23, 2024 13:41

f-hafner requested a review from kodymoodley January 23, 2024 13:41

kodymoodley approved these changes Jan 24, 2024

kodymoodley merged commit dfd39e8 into master Jan 24, 2024
0 of 9 checks passed

kodymoodley deleted the flavio/issue-15 branch January 24, 2024 18:26
