Flavio/issue 15 #28

Merged
merged 5 commits into from
Jan 24, 2024

Collaborator

@f-hafner f-hafner commented Jan 23, 2024

Issue

Closes #15

Description of changes
  • added the drop-down menu to the GUI
  • segmented the story by sentence; created a second dataframe with one row per sentence

Open questions

  • The resulting dataframe is not output anywhere. One solution would be a data structure holding a set of relational dataframes/tables, which could then be used as input to downstream widgets.
  • Where should the type of the comboBox value be defined? Currently I have to convert from str to int in line 142 and line 147 of orangecontrib/storynavigation/widgets/OWSNTagger.py. Ideally, we would change the type when the tagger is instantiated and whenever the input value changes.
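
One possible answer to the second question, as a rough sketch only: convert the combo-box value once, in a property setter, so that downstream code always sees an int. The class and attribute names (`SegmentSetting`, `n_segments`) are hypothetical and are not taken from OWSNTagger.py.

```python
class SegmentSetting:
    """Hypothetical holder for the number of segments chosen in the GUI."""

    def __init__(self, n_segments="1"):
        # The assignment goes through the setter below, so the stored
        # value is an int from the moment the object is created.
        self.n_segments = n_segments

    @property
    def n_segments(self) -> int:
        return self._n_segments

    @n_segments.setter
    def n_segments(self, value):
        # Combo boxes typically deliver text; convert once, here,
        # instead of at every place the value is used.
        self._n_segments = int(value)


setting = SegmentSetting()
setting.n_segments = "5"  # simulates the user picking a new value
print(type(setting.n_segments).__name__, setting.n_segments)
```

With this pattern the conversion happens both at instantiation and on every later change, which is the behaviour asked for above.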

The segments are computed with np.array_split. Because the number of segments is now defined at a global level (for all stories), segment sizes can be highly unequal when story length varies a lot across stories (and thus, statistical conclusions from comparing segments within a story will be more or less accurate depending on the size of the segment).
One way to deal with this is to communicate this uncertainty to the user and document it clearly, perhaps including a hint that users should inspect the segment lengths in their stories.
Another way could be to let the user define, instead of (or in addition to?) the number of segments, the minimum segment size they want.
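
To make the concern concrete, here is a small sketch with two made-up stories of very different length (the story names, lengths, and `min_size` value are assumptions for illustration). It shows the segment sizes that np.array_split produces for one global number of segments, and how a number of segments could instead be derived from a minimum segment size.

```python
import numpy as np

n_segments = 4  # one global setting applied to every story

# Hypothetical stories, represented only by their sentence indices.
stories = {"short": np.arange(5), "long": np.arange(41)}

# Segment sizes per story when the same n_segments is used everywhere.
segment_sizes = {
    name: [len(part) for part in np.array_split(sentences, n_segments)]
    for name, sentences in stories.items()
}
print(segment_sizes)

# Alternative: derive the number of segments from a minimum segment size.
min_size = 10  # hypothetical user-defined minimum
n_by_min_size = {
    name: max(1, len(sentences) // min_size)
    for name, sentences in stories.items()
}
print(n_by_min_size)
```

In this toy case the short story ends up with segments of one or two sentences while the long story has segments of ten or eleven, so comparisons within the short story rest on far less data.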

Includes
  • Code changes
  • Tests
  • Documentation

When the user does not explicitly select a number, it is set to 0 by
default, causing errors in the tagger. Now, a 0 is simply replaced with 1.
@f-hafner
Collaborator Author

Instead of a new dataframe, store the segment_id in a new column in the dataframe with the tags.
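
A minimal sketch of what this could look like, assuming a tagger output with one row per token and columns named `story_id` and `sentence_id` (these column names, the toy data, and the helper `sentence_to_segment` are hypothetical, not the actual implementation):

```python
import numpy as np
import pandas as pd

# Hypothetical tagger output: one row per token.
tags = pd.DataFrame({
    "story_id":    [0, 0, 0, 0, 0, 1, 1, 1],
    "sentence_id": [0, 0, 1, 2, 3, 0, 1, 1],
    "token":       ["a", "b", "c", "d", "e", "f", "g", "h"],
})
n_segments = 2


def sentence_to_segment(sentence_ids, n):
    """Map each sentence id of one story to a segment id via np.array_split."""
    unique_ids = np.sort(sentence_ids.unique())
    chunks = np.array_split(unique_ids, n)
    return {sid: seg for seg, chunk in enumerate(chunks) for sid in chunk}


# Segment each story separately and store the result as a new column.
tags["segment_id"] = (
    tags.groupby("story_id")["sentence_id"]
        .transform(lambda s: s.map(sentence_to_segment(s, n_segments)))
)
print(tags)
```

Keeping the segment in a column means the tagger still returns a single dataframe, and all tokens of a sentence share the same segment.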

@f-hafner
Collaborator Author

The output of the tagger now differs from the output without story segmentation: the order of rows is different. @kodymoodley, if this is an issue, let me know and I can try to fix it.
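
If it helps to check that only the row order (and the extra column) changed, one way is to compare the two outputs after sorting on the shared columns. This is a sketch on made-up data; the column names and the helper `canonical` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical outputs: same rows, different order, one extra column.
without_segments = pd.DataFrame({
    "story_id": [0, 0, 1],
    "token":    ["a", "b", "c"],
    "tag":      ["N", "V", "N"],
})
with_segments = pd.DataFrame({
    "story_id":   [1, 0, 0],
    "token":      ["c", "b", "a"],
    "tag":        ["N", "V", "N"],
    "segment_id": [0, 0, 0],
})

shared = list(without_segments.columns)


def canonical(df):
    """Keep the shared columns and put rows in a deterministic order."""
    return df[shared].sort_values(shared).reset_index(drop=True)


same_content = canonical(without_segments).equals(canonical(with_segments))
print(same_content)
```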

@f-hafner f-hafner marked this pull request as ready for review January 23, 2024 13:41
@kodymoodley
Contributor

kodymoodley commented Jan 24, 2024

The output of the tagger now differs from the output without story segmentation: the order of rows is different. @kodymoodley, if this is an issue, let me know and I can try to fix it.

@f-hafner By 'output' do you mean the dataframe? And by 'differs' do you mean solely with the additional column indicating the story segment number? If so then there is no issue. Just to be clear, the intention (as per our offline discussion yesterday) is still to retain a single dataframe as the output for the tagger, right?

Contributor

@kodymoodley kodymoodley left a comment

Looks good, thanks @f-hafner !

@kodymoodley kodymoodley merged commit dfd39e8 into master Jan 24, 2024
0 of 9 checks passed
@kodymoodley kodymoodley deleted the flavio/issue-15 branch January 24, 2024 18:26
Development

Successfully merging this pull request may close these issues.

Story segmentation