-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle a question whose type changes but name doesn't. #187
base: main
Are you sure you want to change the base?
Conversation
…51-question-with-different-types
… with different types
… but different type
src/formpack/reporting/autoreport.py
Outdated
raw_value = entry.get(field.path) | ||
field_contextual_name = "{}_{}_{}".format( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(self: revise this) i guess this is the expected contextual name, used for comparison not assignment?
possible to use the create_unique_name()
method to avoid hard-coding this here?
https://github.com/kobotoolbox/formpack/pull/187/files#diff-7845025cfa4cd9a06d7a4399e1ba9c0bR58
or, maybe if field.contextual_name.startswith("{}_{}_v".format(
below could be changed to something like if field.use_unique_name
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jnm: I now remember why I did this comparison this way.
We have
if field.contextual_name.startswith("{}_{}_v".format(
field.name,
field.data_type
)):
# We have a match, we won't evaluate other fields with same name
# for this submission.
if field.contextual_name == field_contextual_name:
fields_to_skip.append(field.name)
else:
continue
Let's explain:
All fields that are added by get_fields_for_versions
(because another type has been detected in the latest version) match the pattern name
_data_type
_version_id
.
- Trivial case: Pattern doesn't match. Data belongs to field. Skip first if statement.
- Other case: We have a match with this pattern
version_id
is the same. Data belongs to this field. All other fields with same name will be skipped for this submission.version_id
is not the same, we want to iterate over loop immediately because the data
should be appended to the corresponding field (or column in the export) only - which is a further occurrence of the loop.
The {}_{}_v
pattern is used to avoid matching another field name.
For example:
- Question 1: restaurant
- Question 2: restaurant_text
Adding the "_v" in the pattern decreases the risk of matching the wrong field.
Obviously if all these rules are true:
- Form has several versions
- Form has a question named
restaurant
has multiple types among versions - One of its types is
text
- Form has a question name
restaurant_text_v
Export will have data shifted again. The comparison would be a regex to decrease the risk a little bit more.
Something like _v[\d\w]{21,}
. What do you think?
Having this said: Your suggestion to use create_unique_name
is really good idea.
About the idea of maybe if field.contextual_name.startswith("{}_{}_v".format( below could be changed to something like if field.use_unique_name?
Actually it can't.
In case the form has more than 3 versions and each version contains a question with the same name but 3 different types.
In that case fields
should be:
[<FormField type="text" contextual_name="question_text_v123456">,
<FormField type="integer" contextual_name="question_integer_v234567">,
<FormField type="date_time" contextual_name="question">]
Let's say, we have 1 submission for each version where value of the question question
would be
- Submission 1:
answer
- Submission 2:
1
- Submission 3:
2018-12-18 00:00:00
Using field.use_unique_name
, submission 1 and 2 would be matched with field question_text_v123456
because both fields question_text_v123456
and question_integer_v234567
have use_unique_name
property equals True
. Using the string comparison submission 2
can't be matched with question_text_v123456
(because its version_id
is not the same)
src/formpack/schema/datadef.py
Outdated
self.has_stats = has_stats | ||
|
||
def __repr__(self): | ||
return "<%s name='%s'>" % (self.__class__.__name__, self.name) | ||
return "<%s name='%s'>" % (self.__class__.__name__, self.contextual_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: everywhere else, this is changed to contextual_name='%s'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have hesitated between using self.name
for FormDataDef
or changing the representation string because contextual_name
is more related to FormField
than its parent class.
Still wondering...
There's still a problem when disaggregating (using "group by") in the autoreport after I changed an
Sentry has the details, but make sure to switch to "Full" view instead of "App Only": https://sentry.kbtdev.org/kobo/kpi-backend/issues/18374/ I'm guessing Bonus: what the heck causes |
@jnm: Nice catch. |
…rder to match data with correct field
@jnm: Sentry is savior but he is perhaps the bad guy as well. |
whitespace and `contextual_name` changes
@noliveleger what's the way forward on this one? |
@joshuaberetta last discussion about that with @jnm was, we wanted to think about it a little bit more. |
Fixes #151.
Fixes kobotoolbox/kpi#2109
Covers #183 and #185.