
Adding Survey Responses to Public Dashboard #124

Merged · 75 commits into e-mission:main · May 8, 2024

Conversation

Abby-Wheelis
Member

Starting to add the survey responses to the public dashboard - we want to allow for any survey configuration, including multiple conditional surveys; see the discussion in the issue

Abby Wheelis added 30 commits March 13, 2024 11:57
notebook currently uses the survey_info part of the config
uses the config to get the surveys, and uses the xlsx files to translate between data values and readable labels
generates a plot for every question present in the data

hopefully will work with multiple surveys, but only tested with 1 so far
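
A rough sketch of the config-driven label loading described above (the xlsx path derivation and sheet/column names are placeholders for illustration, not the notebook's actual code):

```python
import pandas as pd

# surveys come from the deployment's dynamic config (survey_info section)
surveys = dynamic_config["survey_info"]["surveys"]

label_maps = {}
for survey_name, survey in surveys.items():
    # placeholder derivation: however the notebook locates the XLSForm
    # spreadsheet for this survey
    xlsx_path = survey["formPath"].replace(".xml", ".xlsx")
    # the "choices" sheet of an XLSForm maps stored values to readable labels
    choices = pd.read_excel(xlsx_path, sheet_name="choices")
    label_maps[survey_name] = dict(zip(choices["name"], choices["label"]))
```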
starting to read via XML instead of the spreadsheet, for better support across languages
there was a bug (duplicate code) in how I was creating the dictionaries
We only want to display questions that are still in the survey, as those are the questions we will actually have responses for
added code to the index.html file to fetch surveys, add each of the options for the survey questions, and display the charts by default on the dashboard
use labels, except for Likert questions, where we use values
These will never have answers, so no need to chart them. Example was "Please rate the following statements: "
all question names begin with a capital letter, so we can safely drop all columns that begin with a lowercase letter

had to change from a hardcoded list because different survey data had different extra columns, and we need this to be as general as possible to prevent extra maintenance in the future
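
As a sketch (dataframe name is illustrative), the generic filter amounts to keeping only the uppercase-starting columns:

```python
# survey question names start with a capital letter, so columns that start
# lowercase are metadata/bookkeeping and can be dropped without a hardcoded list
question_cols = [c for c in survey_responses_df.columns if c[:1].isupper()]
survey_responses_df = survey_responses_df[question_cols]
```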
now properly showing the debug df and the alt text on all charts - including the ones that don't have enough data
before, I was dropping all columns that were not survey questions

no longer needed because we are generating charts directly from the list of questions (not the data) - this also matches better with the frontend!
Filter out the "input" type questions -- such as "other, please specify" -- so that we can choose the best way to display them later
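
A hedged sketch of that filter; it assumes the parsed form metadata records each question's XForm type, and the shape of the questions dict is an assumption:

```python
# drop free-text ("input") questions such as "Other, please specify" from the
# set of questions we chart; we can revisit how to display them later
chartable_questions = {
    name: meta for name, meta in questions.items()
    if meta.get("type") != "input"
}
```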
In order to properly generate the quality text, we need to know how many trips a particular survey was presented for

To accomplish this, we can use the strings in the config that determine what survey to show the user (or a python version thereof)

We need to use the composite trips to have the sections, and perform some other data manipulations in order to have this work properly

Next is finding all of the responses that are actually present for each survey ... and associating that with each question ...
Each survey will have a different "denominator" for its quality text, which is presented to the user to show what percentage of total eligible trips have survey responses

using "showsIfPy" as a backup for strings that are too complicated, changing access notation for dictionaries in e-mission/nrel-openpath-deploy-configs#88, and replacing && and ! allows us to evaluate the strings in python

needed to track the survey names in order to know which denominator to use, so I turned the list of URLs into a dictionary instead of a list, and turned the dictionary of questions into a dictionary of dictionaries, one for each question
if there is only one survey, it isn't conditional, so the denominator is all composite trips
when preparing to evaluate the string, I had pulled individual values as a workaround

using row.to_dict() instead is more generalizable
editing this filtering process to explicitly pass the eval string rather than rely on the fact the function shares scope with where it is called - for clarity
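
A minimal sketch of the condition evaluation described in the commits above; the exact replacements and the showsIfPy fallback handling in the notebook may differ:

```python
import re

def js_condition_to_python(shows_if):
    # translate the JS-style condition from the config into a Python expression:
    # swap && / || for and / or, and turn '!' into 'not ' (leaving '!=' alone)
    expr = shows_if.replace("&&", " and ").replace("||", " or ")
    return re.sub(r"!(?!=)", " not ", expr)

def survey_prompted_for_trip(eval_str, trip_row):
    # evaluate the condition against the trip's fields; row.to_dict() keeps this
    # general instead of pulling out individual values
    try:
        return bool(eval(eval_str, {}, trip_row.to_dict()))
    except Exception:
        return False
```

With only one survey configured, the condition is skipped entirely and the denominator is simply all composite trips.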
if there is no data - we catch and create empty dataframes
removed empty cell and changed "None" inputs to dummy values
removing these from the html in the same way I did in the notebook -- maybe we would add these back and a different chart type later
polishing the way quality text is handled
Even though there are no labels, we can still display charts with sensed mode and other generic metrics
emulating the comment for the study case
similar to the mode-specific notebooks, we ONLY want to run this notebook if the trip labels are surveys (ENKETO)
@Abby-Wheelis
Member Author

Some of the charts are turning out a little squished - the only change I've made is to increase their bounding boxes to 10x4, which seems to be what the other stacked bar charts get, but I'll investigate that further
Screenshot 2024-05-07 at 8 10 40 AM

make stacked bar plots larger, add the surveys to the list of stacked metrics so they can have "more info" buttons
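
The sizing fix amounts to something like the following (the notebooks' actual plotting helpers may wrap this differently):

```python
import matplotlib.pyplot as plt

# give the survey stacked-bar plots the same 10x4 bounding box the other
# stacked bar charts use, so they are not squished
fig, ax = plt.subplots(figsize=(10, 4))
```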
@Abby-Wheelis
Member Author

Starting to test ... most of the baseline charts are failing to generate ... going back to the notebooks to try and reproduce ...

Also failing in the notebooks ... in generic_metrics the only charts that generated were number of trips by day and number of trips by weekday - the rest either show an error or a debug df table, but there is data; shouldn't the charts be showing up with just sensed data?

Actually, to answer my own question, maybe not - the sensed mode is different (bluetooth), so we may need a workaround to introduce it ... but we do have the data (see below) - we'll want to update this to be the matched mode, but maybe that can be a next step

Screenshot 2024-05-07 at 11 26 59 AM

So maybe it's the "labels"? We have user input but not traditional labels ...

`expanded_ct.groupby("Mode_confirm").agg({distance_col: 'count'})` throws a KeyError, which is why we show the debug dataframe ... I bet the single bar could be plotted. Yep, I commented out the line that plots the labeled data and the sensed data plots just fine.
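
A hedged sketch of a guard that would let the sensed-mode bar plot even when the labeled column is absent (the column handling here is an assumption, not the eventual fix):

```python
# only attempt the labeled-data aggregation if the label column actually exists;
# otherwise fall back to plotting just the sensed modes
if "Mode_confirm" in expanded_ct.columns:
    labeled_counts = expanded_ct.groupby("Mode_confirm").agg({distance_col: "count"})
else:
    labeled_counts = None  # survey-only deployment: no traditional labels
```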

So, there is an issue with all the charts that expect labeled data not being able to plot the sensed data because of an error with the labeled data. We have a couple choices here:

  • remove the charts from display for survey configs (for now) until we debug & add the bluetooth sensed modes
  • debug now so that the plain sensed modes show
  • debug & add the bluetooth modes ... can we just replace the primary_mode with the primary sensed mode?

If I can do it quickly, I think just debugging is the best plan, given that the bluetooth sensed mode is also buggy (which is why we took it out of the quality text for now) - else just cut all broken charts for the time being. @shankari what do you think?

@shankari
Contributor

shankari commented May 7, 2024

@Abby-Wheelis it might be more meaningful to create a separate notebook for the fleet case. We won't have mode_confirm, and the sensed mode for BLE is also stored separately (ble_sensed_mode instead of sensed_mode). So both bars in the standard notebooks would be different. We can just change the index.html to show different charts if it is in "fleet mode". We should revisit this once we start incorporating bluetooth into regular OpenPATH, but that should be good enough for an alpha version.

To plot the `ble_sensed_mode`:

can we just replace the primary_mode with the primary sensed mode?

Yes, as in, you would need to compute the primary_ble_sensed_mode like we compute the primary mode now, but then you can just use it in the same way.

We may want to also group the bars differently for the fleet case - since we won't have labeled versus confirmed, we may want to do trip count versus distance. So basically only one figure but with two bars.
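
A rough sketch of computing primary_ble_sensed_mode the same way the primary mode is computed today (the section column names are assumptions for illustration):

```python
def primary_mode(section_modes, section_distances):
    # the mode of the longest section wins, mirroring how the primary mode is
    # picked from sensed sections today
    if len(section_modes) == 0 or len(section_distances) == 0:
        return "UNKNOWN"
    return max(zip(section_modes, section_distances), key=lambda md: md[1])[0]

expanded_ct["primary_ble_sensed_mode"] = expanded_ct.apply(
    lambda row: primary_mode(row.get("section_ble_sensed_modes", []),
                             row.get("section_distances", [])),
    axis=1)
```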

removed stale lines that were breaking notebook syntax!
@Abby-Wheelis
Member Author

it might be more meaningful to create a separate notebook for the fleet case. We won't have mode_confirm, and the sensed mode for BLE is also stored separately (ble_sensed_mode instead of sensed_mode). So both bars in the standard notebooks would be different. We can just change the index.html to show different charts if it is in "fleet mode"

I think I could handle the dfc-fermata changes this way, but it would still be broken for all other survey cases (like washingtoncommons). Maybe a metrics notebook for survey cases, which handles both fleet and non-fleet cases (the difference being whether we display sensed mode or BLE sensed mode), would be best?

We may want to also group the bars differently for the fleet case - since we won't have labeled versus confirmed, we may want to do trip count versus distance. So basically only one figure.

I think this would work; we might duplicate or lose some of the charts, though:

  • number of trips + distance of trips + distance (land [omit in fleet case?]) - 1 chart
  • lose trips under 80%

That's fairly minimal loss and fairly minimal duplication, which feels OK to me, so I will proceed with this

@shankari
Contributor

shankari commented May 7, 2024

lose trips under 80%

We can retain trips under 80% and land trips as two more figures with count and distance. I am not sure they are as meaningful for the fleet case, but they might be for the other survey cases (like washington commons)

@Abby-Wheelis
Member Author

Three charts for the sensed modes: count/distance, under 80%, and land modes:
image

image

image

We may want to refine the text on these axes/titles in particular - going to pipe this through to the frontend first

create a dedicated notebook for handling metrics on survey deployments

currently the option to show the "bluetooth sensed" mode is commented out -- this just shows a lot of unknowns; would this one day be the exact vehicles and how much they are used, or just car/ecar?
naming the charts from the survey-specific notebook with a _survey suffix so they don't get overwritten by the other notebooks
errors encountered while testing
@Abby-Wheelis
Member Author

Testing done with the dfc-fermata (May 6th) dataset - can run everything from the command line without anything breaking!

When dashboard loads:
Screenshot 2024-05-07 at 4 11 47 PM

Non-survey metrics:
Screenshot 2024-05-07 at 4 12 23 PM

More survey examples:
Screenshot 2024-05-07 at 4 13 04 PM

More info examples:
Screenshot 2024-05-07 at 4 29 54 PM

Surveys for May are showing the debug df since there are no responses

Testing with washingtoncommons as well, getting one weird behavior:
image
The last segment is failing to plot for some reason ....

@Abby-Wheelis
Member Author

Ah, it was the "Other" color not being present in the color map for survey responses - patch incoming

@Abby-Wheelis
Member Author

fixed!
better

@Abby-Wheelis marked this pull request as ready for review May 7, 2024 23:51
Contributor

@shankari left a comment


I am going to fix a couple of the issues and then merge. Let's fix the other issues in a cleanup PR.

@@ -69,6 +69,16 @@ def load_all_participant_trips(program, tq, load_test_users):
disp.display(participant_ct_df.head())
return participant_ct_df

def filter_composite_trips(all_comp_trips, program, load_test_users):
Contributor

Now that you have removed load_composite_trips, you can also remove filter_composite_trips

Contributor

I see that filter_composite_trips is not identical to filter_labeled_trips because it doesn't filter by blank user input. However, I don't understand why you are not filtering by blank user input. It should be possible to just call load_all_participant_trips instead of load_all_confirmed_trips followed by filter_composite_trips to achieve the same result.
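
For reference, a sketch of the suggested simplification, using the signatures visible in this diff (assuming the notebook already imports scaffolding):

```python
# instead of loading all confirmed trips and filtering afterwards ...
all_confirmed_trips = scaffolding.load_all_confirmed_trips(tq)
all_comp_trips = scaffolding.filter_composite_trips(all_confirmed_trips, program, include_test_users)

# ... the existing participant-trip loader should give the same result in one call
all_comp_trips = scaffolding.load_all_participant_trips(program, tq, include_test_users)
```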

Member Author

However, I don't understand why you are not filtering by blank user input.

I do, later, but I need all of the trips in order to create the "all trips for which a survey was prompted" information

Member Author

I made some changes to the data loading (including pushing it to scaffolding) in #135 - I have the filtered, unfiltered, and file suffix all returned from a function in scaffolding now - I don't have the debug_df or quality_text included since it varies chart to chart

viz_scripts/docker/environment36.dashboard.additions.yml (outdated review thread, resolved)
Comment on lines +243 to +247
"tq = scaffolding.get_time_query(year, month)\n",
"all_confirmed_trips = scaffolding.load_all_confirmed_trips(tq)\n",
"#we need to filter out trips (based on if including test users)\n",
"all_confirmed_trips = scaffolding.filter_composite_trips(all_confirmed_trips, program, include_test_users)\n",
"\n",
Contributor

I think this should all go into scaffolding (similar to load_viz_notebook_sensor_inference_data)
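
A hypothetical shape for that scaffolding helper, mirroring the quoted cell; the function name and return values are assumptions:

```python
# in scaffolding.py: wrap the time query + composite-trip loading + test-user
# filtering so survey notebooks don't repeat this boilerplate
def load_viz_notebook_survey_data(year, month, program, include_test_users):
    tq = get_time_query(year, month)
    all_confirmed_trips = load_all_confirmed_trips(tq)
    all_comp_trips = filter_composite_trips(all_confirmed_trips, program, include_test_users)
    return tq, all_comp_trips
```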

Comment on lines +329 to +332
"#merge any cols with the same name into 1 col -- should have different values in their survey_name col\n",
"#https://stackoverflow.com/questions/24390645/python-pandas-merge-samed-name-columns-in-a-dataframe\n",
"def sjoin(x): return ';'.join(x[x.notnull()].astype(str))\n",
"df_responses = df_responses.groupby(level=0, axis=1).apply(lambda x: x.apply(sjoin, axis=1))"
Contributor

what is this for? If the responses are in different surveys, should they be combined into a single column?

Contributor

I see this was in #124 (comment) but I don't fully understand why the column is duplicated. Shouldn't the survey id be part of the column name as well?

Member Author

As it stands, the survey id is not part of the column name - I could refactor to include it as a part of the column name and update the way that I translate the column names to be the chart titles

They should end up in different charts since we group and filter by the survey_name, which is its own column

viz_scripts/docker/crontab (review thread, resolved)
We do not need to manually install em-common since it was already installed in the base server image as part of
e-mission/e-mission-server#965
and
e-mission/e-mission-server@d33ce2e

So we removed it in 334c95b
But we do need to bump up the base image to include it
- use `.get` to check whether this is an enketo survey so that it works for
  older deployments that predate the `survey_info` functionality as well. This
  has no functional difference since, if there is no `survey_info`, this is not
  a survey. But we get a better exception and avoid confusion later.
- improve error handling for survey_responses, by splitting the attribute and
  name errors from the more complex errors
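
A minimal sketch of the `.get` guard from the first bullet; the trip-labels key follows the dynamic config's survey_info section, and the notebook's actual check and error message may differ:

```python
# works for older deployments that predate survey_info: a missing key means
# this is not a survey (ENKETO) deployment, so the survey notebook is skipped
trip_labels = dynamic_config.get("survey_info", {}).get("trip-labels")
if trip_labels != "ENKETO":
    raise Exception("This notebook only applies to survey (ENKETO) deployments")
```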
@shankari
Contributor

shankari commented May 8, 2024

Testing done:

  • metrics work
    image
  • survey responses don't work, but that is because there are only TripConfirm entries in the dataset ?!
    image

@shankari shankari merged commit 5aa3434 into e-mission:main May 8, 2024
@Abby-Wheelis
Member Author

survey responses don't work, but that is because there are only TripConfirm entries in the dataset ?!

What dataset were you using? 336 trips is a lot less than the 990 I have in the May 8th dataset; if it's old enough, I would expect to see only TripConfirm survey entries, since I believe that is what we were using for our very preliminary internal testing - then we went to Roam/Return/Gas around the end of March

@Abby-Wheelis
Member Author

This broke normal deployments on staging!!!

Screenshot 2024-05-08 at 9 49 21 AM

I'll work on a cleanup for this issue and the points mentioned above
