Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disaggregated stats wrong total #191

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

noliveleger
Copy link
Contributor

Fixes #190.

@noliveleger noliveleger self-assigned this Dec 20, 2018
@noliveleger noliveleger requested a review from jnm December 20, 2018 21:51
@coveralls
Copy link

coveralls commented Dec 20, 2018

Coverage Status

Coverage increased (+0.02%) to 83.865% when pulling 43e3aa0 on 190-get-disaggregated-stats-bad-total into 2d68ce0 on master.

@noliveleger
Copy link
Contributor Author

Per discussion with @jnm, we may need to show stats for None values of the grouping field (when field is not restricted).
Perhaps we should open another issue for this. It won't be added to this PR because of this:
https://github.com/kobotoolbox/formpack/blob/master/src/formpack/schema/fields.py#L308-L310.

Copy link
Member

@jnm jnm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you include a unit test with this PR?

if val is None:
not_provided += sum(counter.values())
else:
# `counter['__submissions__'] contains the count of all submissions,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having trouble figuring out where this happens. I found these:

if value is not None:
counters['__submissions__'] += 1

counter.update(values)
counter['__submissions__'] += 1

...but they both increment counter['__submissions__'] only when the value is not None.

Copy link
Contributor Author

@noliveleger noliveleger Feb 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jnm,

Let's say, we have a form with two questions (select one), not mandatory:

Favorite coffee:

  • Nespresso
  • Keurig
  • Tim Horton

Type of coffee:

  • Regular
  • Espresso
  • Latte

Three users submit data (first case):
First one:
- Favorite coffee: Nespresso
- Type of coffee: Regular

Second one:
- Favorite coffee: Tim Horton
- Type of coffee: Latte

Third one:
- Favorite coffee: Keurig
- Type of coffee: No choices checked

Disaggregating the stats by grouping by Favorite coffee should return 3 Counters

  • None => {'keuring': 1}
  • Latte => {'tim_horton': 1}
  • Regular => {'nespresso': 1}

Now, a fourth user submits this (second case):
- Favorite coffee: No choices checked
- Type of coffee: latte

It should still return the 3 sames counters, except that Latte Counter should have changed for
{'None': 1, 'tim_horton': 1}

Having that said, the if condition handles the first case, the else handles the second one.

@jnm
Copy link
Member

jnm commented Feb 18, 2021

@noliveleger, sorry for abandoning this for so long. I'm trying once again to understand it, but I've found that the unit test passes without any of the other changes:

(e) john@world:~/Documents/kobo/dockdev/formpack$ git log origin/master...
commit 95fe7eeda02130df5e476a9421f78ddbb4eca161 (HEAD -> cherry-pick-191-test, origin/cherry-pick-191-test, master)
Author: Olivier Léger <[email protected]>
Date:   Mon Feb 11 16:02:57 2019 -0500

    Added a unittest for disaggregated stats with empty answers
(e) john@world:~/Documents/kobo/dockdev/formpack$ pytest -v | grep test_disaggregate_without
tests/test_autoreport.py::TestAutoReport::test_disaggregate_without_answers PASSED [ 22%]

Here is the branch with the unit test commit cherry-picked on top of master: https://github.com/kobotoolbox/formpack/tree/cherry-pick-191-test.

Did we implement a fix for #190 somewhere along the line after you opened this PR, or does the test need to be modified?

@noliveleger
Copy link
Contributor Author

noliveleger commented Feb 19, 2021

@jnm, no. Unfortunately, my test is totally wrong. The issue is that total_count exceeds the number of submissions when data is submitted with null values. It does not test this at all. The fixture seems wrong too. Let me correct that.

@noliveleger noliveleger requested a review from jnm February 26, 2021 20:24
@noliveleger noliveleger removed their assignment Feb 26, 2021
@joshuaberetta
Copy link
Member

@noliveleger what's the status of this PR?

@noliveleger
Copy link
Contributor Author

@noliveleger what's the status of this PR?

I've applied requested changes ;-)
It's pretty old, we should discussed (as the back-end team) if it's still an improvement and merge if it is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

get_disaggregated_stats() returns wrong counts
4 participants