Change from pie charts to 100% stacked bar chart #123

Merged


iantei
Contributor

@iantei iantei commented Feb 26, 2024

  • Changes to transform single pie charts into collective 100% Stacked Bar Charts.
  • Merge generic_metrics_sensed and mode_specific_metrics into the generic_metrics notebook. Delete the generic_metrics_sensed and mode_specific_metrics notebooks, and remove them from crontab.

…data-sizex to 10 from 4 to adjust horizontal length of the stacked bar charts. Update with new option values.
@iantei
Contributor Author

iantei commented May 4, 2024

Comparing the Laos-Chart comment to the above charts.
A few observations:

  1. All the label values for Labeled by user Trip Types are identical, except the Other label's value, which has been condensed in the above charts.
  2. All the sensed values are identical.
  3. There is a difference in chart values for "Number of Trips (under 80th percentile of total trips)", since the cutoff values have been changed to match the sensed ones in the above case.

Overall, looks good.

@iantei
Contributor Author

iantei commented May 4, 2024

@shankari
Encountered a few issues:
The charts load different variations of the error description (table or text) when the Month is set to 5/2024.

Dataset used: openpath-prod-usaid-laos-ev-snapshot-dec-20

| Chart Name | Chart |
|---|---|
| Number of Trips by Purpose | Number of Trips by Purpose |
| Number of Commute Trips | Commute_Trip_Error |
| Under 80th Percentile | Under 80th Percentile |
| Total Trip Length | Total_Trip_Length_Error |

@iantei
Contributor Author

iantei commented May 4, 2024

UPDATE: @shankari I have lost access to the NREL system account.

@shankari
Contributor

shankari commented May 4, 2024

@iantei That is largely expected because there is no data for 5/2024. As you can see, there were no trips in that month. We should not be displaying the backtrace instead of the missing data (see upcoming commit) but it is not really a showstopper since it wouldn't have worked anyway.

A bigger showstopper is the one that I highlighted here: #123 (comment)

While I am fine with working on code structure, I was really really hoping to not have to tweak pandas horizontal bar generation and spacing. My hope was that the visualization intern(s) would make the charts look pretty and I only needed to make the code look pretty.

@Abby-Wheelis
Member

I'm starting to catch up with these changes to hopefully be able to incorporate the surveys smoothly, and on this part of the showstopper mentioned above -

> The bar extends beyond 100% with a gray color that is not in the legend

I think this is the plot itself and not part of the bar (this is the background for the chart area and you can see a faint white gridline) so if we want it removed for aesthetic reasons we can do that but I don't think it represents a value/needs to be in the legend
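A minimal sketch of why a gray strip can show up to the right of a 100% bar, assuming the chart is drawn with matplotlib `barh` on a shaded axes background. The shares and colors below are made up for illustration and are not taken from the dashboard. matplotlib pads the data range with a default margin, so the axes can extend past 100 even though no bar does:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical percentage shares that sum to 100
shares = {"walk": 30.0, "bike": 50.0, "car": 20.0}

fig, ax = plt.subplots()
ax.set_facecolor("lightgray")  # stand-in for a gray plot background

left = 0.0
for label, pct in shares.items():
    # Each block starts where the previous one ended
    ax.barh("Labeled", pct, left=left, label=label)
    left += pct

auto_right = ax.get_xlim()[1]     # right limit chosen by autoscaling (includes the margin)
ax.set_xlim(0, 100)               # clamp the axes to the 0-100% range
clamped_right = ax.get_xlim()[1]  # right limit after clamping
plt.close(fig)
```

In isolation, `ax.set_xlim(0, 100)` (or `ax.margins(x=0)`) removes the padding. Whether the same call works inside the dashboard's plotting function is not established here; a later comment in this thread reports an error when setting the x limit there.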

@shankari
Contributor

shankari commented May 4, 2024

> I think this is the plot itself and not part of the bar (this is the background for the chart area and you can see a faint white gridline) so if we want it removed for aesthetic reasons we can do that but I don't think it represents a value/needs to be in the legend

I agree that it should not be in the legend. But I am not sure that it is so easy to fix, because we are computing the blocks to generate the horizontal bar. Is there a reason why we didn't just use pandas `DataFrame.plot.barh` with `stacked=True`?

Also, the second point in the showstopper (the legend for the labeled trips is too big and overlaps the sensed) will be even more true for the survey changes, correct? Do you have any thoughts on how to change that? Maybe a horizontal legend instead of vertical?

@Abby-Wheelis
Member

> Also, the second point in the showstopper (the legend for the labeled trips is too big and overlaps the sensed) will be even more true for the survey changes, correct? Do you have any thoughts on how to change that? Maybe a horizontal legend instead of vertical?

I think that we'll put it horizontally, below the chart itself, since the labels may have much longer text. I also anticipate this being the most difficult part to work out visually when it comes to the surveys.

As for using pandas `DataFrame.plot.barh`, I'm not sure there is any particular reason why we did it this way. I have a general sense that the matplotlib method is a little easier to customize, but I haven't tried the pandas method; maybe that would work.

@shankari
Contributor

shankari commented May 5, 2024

After improving error handling, testing done: set the date to one for which we have no data, including for the ones which showed a backtrace in #123 (comment)

image
image

Both alt_text and alt_html files have been created for the newly created stacked bar charts, not for the older bar charts

$ ls -al plots/*_2020_11* | grep "May *4"
-rw-r--r--  1 kshankar  staff  85045 May  4 18:59 plots/average_miles_mode_confirm_2020_11_default.png
-rw-r--r--  1 kshankar  staff    342 May  4 18:59 plots/average_miles_mode_confirm_2020_11_default.txt

-rw-r--r--  1 kshankar  staff    355 May  4 18:59 plots/ntrips_commute_mode_confirm_2020_11_default.html
-rw-r--r--  1 kshankar  staff  88215 May  4 18:59 plots/ntrips_commute_mode_confirm_2020_11_default.png
-rw-r--r--  1 kshankar  staff    355 May  4 18:59 plots/ntrips_commute_mode_confirm_2020_11_default.txt

-rw-r--r--  1 kshankar  staff  81393 May  4 18:59 plots/ntrips_per_day_2020_11_default.png
-rw-r--r--  1 kshankar  staff    320 May  4 18:59 plots/ntrips_per_day_2020_11_default.txt

-rw-r--r--  1 kshankar  staff  82616 May  4 18:59 plots/ntrips_per_weekday_2020_11_default.png
-rw-r--r--  1 kshankar  staff    324 May  4 18:59 plots/ntrips_per_weekday_2020_11_default.txt

-rw-r--r--  1 kshankar  staff    350 May  4 18:59 plots/ntrips_purpose_2020_11_default.html
-rw-r--r--  1 kshankar  staff  86380 May  4 18:59 plots/ntrips_purpose_2020_11_default.png
-rw-r--r--  1 kshankar  staff    350 May  4 18:59 plots/ntrips_purpose_2020_11_default.txt

-rw-r--r--  1 kshankar  staff    369 May  4 18:58 plots/ntrips_total_2020_11_default.html
-rw-r--r--  1 kshankar  staff  90649 May  4 18:58 plots/ntrips_total_2020_11_default.png
-rw-r--r--  1 kshankar  staff    369 May  4 18:58 plots/ntrips_total_2020_11_default.txt

-rw-r--r--  1 kshankar  staff    372 May  4 18:59 plots/ntrips_under80_2020_11_default.html
-rw-r--r--  1 kshankar  staff  91797 May  4 18:59 plots/ntrips_under80_2020_11_default.png
-rw-r--r--  1 kshankar  staff    370 May  4 18:59 plots/ntrips_under80_2020_11_default.txt

-rw-r--r--  1 kshankar  staff    339 May  4 18:59 plots/total_trip_length_2020_11_default.html
-rw-r--r--  1 kshankar  staff  84806 May  4 18:59 plots/total_trip_length_2020_11_default.png
-rw-r--r--  1 kshankar  staff    339 May  4 18:59 plots/total_trip_length_2020_11_default.txt

-rw-r--r--  1 kshankar  staff    374 May  4 18:59 plots/total_trip_length_land_2020_11_default.html
-rw-r--r--  1 kshankar  staff  93236 May  4 18:59 plots/total_trip_length_land_2020_11_default.png
-rw-r--r--  1 kshankar  staff    374 May  4 18:59 plots/total_trip_length_land_2020_11_default.txt

In f7f3590, we added try/except blocks for
`AttributeError` and `pd.errors.UndefinedVariableError` to handle errors in
pre-processing. However, apparently these were removed during subsequent
changes/refactors.

Since we have now moved the pre-processing to the cell in
c16e4eb, this is even more important since
errors are very likely outside of the plot function.

In this change, we:
- recreate the merged_debug_df to use in cells where we plot both labeled and
  unlabeled data
- standardize the error handling by:
    - catching `KeyError` in addition to the others (consistent with observed behavior)
    - clearing the existing figure first so that we don't get two blank axes
    - plotting and generating the alt text with the same `plot_title_no_quality`
    - making sure to add alt_html to all of the except clauses for the new stacked bar charts
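The error-handling pattern listed above could look roughly like the following. This is a sketch under stated assumptions, not the notebook's actual code: `plot_or_explain`, the `distance` column, and the alt-text wording are hypothetical, while the three exception types and the "clear the figure first" step come from the commit message.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

def plot_or_explain(df, plot_title_no_quality):
    """Return alt text for the chart, or the reason it could not be drawn.

    Hypothetical helper; the column names are assumptions for illustration.
    """
    fig, ax = plt.subplots()
    try:
        # Pre-processing that fails when the data or its columns are missing
        counts = df.query("distance > 0").groupby("Mode_confirm")["distance"].count()
        counts.plot.barh(ax=ax)
        ax.set_title(plot_title_no_quality)
        alt_text = f"Bar chart of {plot_title_no_quality}"
    except (AttributeError, KeyError, pd.errors.UndefinedVariableError) as e:
        # Clear the partially built figure so no blank axes are left behind
        fig.clf()
        # Use the same title for the error text as for the plot
        alt_text = f"Unable to generate {plot_title_no_quality}. Reason: {e!r}"
    finally:
        # Closed only to keep this sketch free of open figures
        plt.close(fig)
    return alt_text

no_columns = plot_or_explain(pd.DataFrame(), "Number of trips")
no_labels = plot_or_explain(pd.DataFrame({"distance": [1.0]}), "Number of trips")
with_data = plot_or_explain(
    pd.DataFrame({"distance": [1.0, 2.0], "Mode_confirm": ["walk", "bike"]}),
    "Number of trips",
)
```

An empty frame makes `query` raise `pd.errors.UndefinedVariableError`, and a frame without `Mode_confirm` makes `groupby` raise `KeyError`, which is why both need to be in the `except` clause.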

This fixes e-mission#123 (comment)

Testing done:
e-mission#123 (comment)
@shankari
Contributor

shankari commented May 5, 2024

Unable to generate
Bar chart of Number of trips for each purpose (selected by users).
Reason: Number of trips is 0. Participant_with_at_least_one_labeled_trip is 0. Participants_with_at_least_one_trip is 0. Registered_participants is 15. Trips_with_at_least_one_label is 0. Trips_with_mode_confirm_label is 0. Trips_with_trip_purpose_label is 0. month is 11. year is 2020.

After

        <html>
        <body>
        <h2>Unable to generate
Bar chart of Number of trips for each purpose (selected by users). Reason:</h2>

    <table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>value</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>Number of trips</th>
      <td>0</td>
    </tr>
    <tr>
      <th>Participant_with_at_least_one_labeled_trip</th>
      <td>0</td>
    </tr>
    <tr>
      <th>Participants_with_at_least_one_trip</th>
      <td>0</td>
    </tr>
    <tr>
      <th>Registered_participants</th>
      <td>15</td>
    </tr>
...

Screenshot:
Screenshot 2024-05-04 at 7 13 41 PM

In the heavy lift commit (12b00e3)
there were several TODOs that I hoped would be handled by the public dashboard team.
Since that wasn't possible due to internship timelines, I have handled at least
one user-visible TODO by displaying the missing HTML as HTML instead of text.

Testing done:
e-mission#123 (comment)
@shankari
Contributor

shankari commented May 5, 2024

The solution to have the legend items side by side is to use matplotlib's legend `ncol`/`ncols` parameter:
https://stackoverflow.com/a/54870776

@Abby-Wheelis I'm thinking that we might want to keep the legend for the regular bar charts in the same place, but just increase the number of columns. For surveys, we may want to put the legend below the bar (to accommodate longer options) but still have only one column. We could specify that as an option to the plot function.

Thoughts?

@shankari
Contributor

shankari commented May 5, 2024

Here's the option with 'lower right'
image

Here's the option with 'lower left' and the anchor at (1,0) which is not too bad
image

Here's the option with the ncols
image

Since this will likely change as we include survey results, and then the inferred results, I am going to punt on this. I will leave in the n_cols calculation, but switch to lower left and (1,0) to unblock us now.
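For reference, a minimal sketch of the two ideas combined: a column count derived from the number of legend entries, and a legend pinned at `'lower left'` with the anchor at `(1, 0)`. The helper `legend_columns`, the five-rows-per-column limit, and the mode names are assumptions for illustration; the actual n_cols calculation in the PR may differ.

```python
import math
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

def legend_columns(n_labels, max_rows=5):
    """Number of legend columns needed to keep each column at max_rows entries."""
    return max(1, math.ceil(n_labels / max_rows))

fig, ax = plt.subplots()
left = 0.0
for i in range(12):
    # Twelve equal hypothetical shares stacked into one 100% bar
    ax.barh("Labeled", 100 / 12, left=left, label=f"mode_{i}")
    left += 100 / 12

handles, labels = ax.get_legend_handles_labels()
legend = ax.legend(
    handles,
    labels,
    loc="lower left",          # corner of the legend box that gets anchored
    bbox_to_anchor=(1, 0),     # place that corner at the axes' lower right
    ncol=legend_columns(len(labels)),
)
n_entries = len(legend.get_texts())
plt.close(fig)
```

`ncol` is used here because it is accepted by both older and newer matplotlib releases; `ncols` is the newer spelling of the same parameter.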

A temporary fix for now is to pin the bbox to the bottom and not the top.

I think that the real fix is to use ncols, because then it will work for both
sensed and labeled legends. What if we have more than 5 sensed values?

But we will need to revisit this anyway when we have inferred modes, and when
we display surveys, and when we use pandas barh, so punting on this and
implementing the temporary fix for now

Testing done:
e-mission#123 (comment)
@shankari
Contributor

shankari commented May 5, 2024

We can't use pandas' barh because stacked doesn't seem to work easily

>>> expanded_ct.groupby("Mode_confirm").agg({distance_col: 'count'}).reset_index().set_axis(["label", "value"], axis="columns").plot.barh(x="label", y="value", stacked=True)

generates
image

In lieu of investigating this further, let's just try to fix the gray value. This is displayed to avoid RuntimeError: Unknown return type

I tried to set the xlim to 100, because we know that it will never be more than that. However, 100, 100.01 and 100.1 and 101 all returned the same error. Leaving this for the pandas investigation...
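One possible explanation for the `barh` result above, offered as a sketch and not verified against the dashboard data: pandas `stacked=True` stacks the columns of a DataFrame within each index row, so a long frame with a single `value` column has nothing to stack. Reshaping to one row with one column per label gives a single stacked bar. The `expanded_ct` frame below is a made-up stand-in with a hypothetical `distance` column.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for expanded_ct
expanded_ct = pd.DataFrame({
    "Mode_confirm": ["walk", "bike", "walk", "car", "bike", "walk"],
    "distance": [1.0, 2.5, 0.8, 10.0, 3.1, 1.2],
})

# Trip counts per label, converted to percentages of the total
counts = expanded_ct.groupby("Mode_confirm")["distance"].count()
pct = counts / counts.sum() * 100

# One row ("Labeled") with one column per mode, so stacked=True has columns to stack
wide = pct.to_frame(name="Labeled").T
ax = wide.plot.barh(stacked=True)

# Width of each stacked block, in percent
widths = [patch.get_width() for patch in ax.patches]
plt.close(ax.figure)
```

A second row (for example "Sensed") could be added to `wide` to draw both bars on the same axes, with missing labels filled as 0.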

@shankari
Contributor

shankari commented May 5, 2024

One final fix, because the quality text annoys me and is confusing IMHO. Saying "For Labeled & Sensed: Based on X confirmed trips from Y users of X' trips from Y' users" is technically correct but requires cognitive load to determine which set of numbers goes with which bar, and what is labeled and what is confirmed. Going to try to fix that before declaring that this is done.

@Abby-Wheelis
Copy link
Member

The quality text is updated in the inferred changes! That puts it on each bar, where it says something like "Confirmed trips, 100 trips from 10 users 15%" or similar (see the linked comment). I agree that this quality text is confusing, and I like what ended up on the inferred trips more. I think we had decided to hold off on that change in this PR to limit the changes here, but that was a while ago, when the timeline wasn't as crunched.

@shankari
Contributor

shankari commented May 5, 2024

I don't want to change the base quality_text methods since they are used in other plots as well. Let's hack together a method to pull out only the fields that we want for now. The correct fix would be to return the quality text as a tuple and format it in the notebook, which is also why one should not put presentation layer logic in the libraries.

Used some regular expression magic hacking to pull these out.
I now have
image
image
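A rough sketch of the kind of regular-expression extraction described above, assuming the quality text follows the "Based on X confirmed trips from Y users of X' trips from Y' users" pattern quoted earlier in this thread. The pattern, the field names, and `parse_quality_text` are hypothetical and are not the code that was committed.

```python
import re

# Assumed shape of the quality text; the real strings may differ in detail
QUALITY_RE = re.compile(
    r"Based on (?P<confirmed_trips>\d+) confirmed trips from (?P<confirmed_users>\d+) users"
    r"\s+of (?P<total_trips>\d+) trips from (?P<total_users>\d+) users"
)

def parse_quality_text(text):
    """Return the four counts as integers, or None if the text does not match."""
    match = QUALITY_RE.search(text)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}

# Made-up sample string for illustration
sample = "Based on 120 confirmed trips from 8 users\nof 2728 trips from 13 users"
fields = parse_quality_text(sample)

# Per-bar text built from the extracted fields
labeled_text = f"Labeled: {fields['confirmed_trips']} trips from {fields['confirmed_users']} users"
```

Returning the counts as a tuple or dict from the library, as suggested above, would make this parsing step unnecessary.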

While testing this with include_test_users=True I also found a regression in the scaffolding code.

def get_quality_text_sensed(df, cutoff_text="", include_test_users=False):

now has a cutoff_text argument. However, the invocation from scaffolding was still

    quality_text = get_quality_text_sensed(expanded_ct, include_test_users)

So the boolean True was interpreted as the cutoff_text and so the text that was generated was "Based on 2728 trips (True) from 13 users".

After fixing that, this option seems to work.
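The regression can be reproduced in a few lines. Only the function signature below comes from this thread; the body is a simplified stand-in, and the keyword-only variant is a suggestion rather than what was merged.

```python
def get_quality_text_sensed(df, cutoff_text="", include_test_users=False):
    # Stand-in body: the real function computes trip and user counts from df
    cutoff = f" ({cutoff_text})" if cutoff_text else ""
    return f"Based on {len(df)} trips{cutoff}"

trips = ["t1", "t2", "t3"]   # stand-in for expanded_ct
include_test_users = True

# Old call site: the boolean lands in the cutoff_text slot
buggy = get_quality_text_sensed(trips, include_test_users)

# Fixed call site: pass the flag by keyword
fixed = get_quality_text_sensed(trips, include_test_users=include_test_users)

# Hypothetical hardening: a bare * makes the optional arguments keyword-only,
# so the old call fails loudly instead of producing "(True)"
def get_quality_text_sensed_kwonly(df, *, cutoff_text="", include_test_users=False):
    cutoff = f" ({cutoff_text})" if cutoff_text else ""
    return f"Based on {len(df)} trips{cutoff}"

try:
    get_quality_text_sensed_kwonly(trips, include_test_users)
    raised = False
except TypeError:
    raised = True
```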

…d participants

```
def get_quality_text_sensed(df, cutoff_text="", include_test_users=False):
```

now has a `cutoff_text` argument. However, the invocation from scaffolding was still

```
    quality_text = get_quality_text_sensed(expanded_ct, include_test_users)
```

So the boolean `True` was interpreted as the cutoff_text and so the text that was generated was "Based on 2728 trips (True) from 13 users".

After fixing that, this option seems to work.

Testing done:
e-mission#123 (comment)
@shankari
Contributor

shankari commented May 5, 2024

Made the appropriate changes to all the plots, although I kept some single-bar plots unchanged

image
image
image
image
image
image

Note that I had to remove the y-axis label "Trip Types" because:
a) it didn't add anything beyond what the axis label already had
b) it was overlapping with some of the label text, particularly when including testers

image

@shankari
Contributor

shankari commented May 5, 2024

Also does not crash for missing data
image

This makes it clearer which parts of the quality text correspond to which
bar.
- I left the commute plot untouched because there is only one bar, and
  recreating the quality text would be a bit complicated.
- I also left the non-stacked bar chart plots untouched

Testing done:
e-mission#123 (comment)
e-mission#123 (comment)
This was fairly straightforward.
Since this now introduces pre-processing into the cell, we also need to copy
over the proper error handling.

I have chosen to keep the quality text untouched here, since all of these are
single-bar plots.

Testing done:
- Ran with and without data, notebook ran with no errors
@shankari
Contributor

shankari commented May 5, 2024

I am going to declare that this PR is done, at least to the extent that I am able to spend time on it at this time.
Unless @iantei or @Abby-Wheelis has any objections, I plan to merge and push a release to staging

@shankari
Contributor

shankari commented May 5, 2024

#128 (comment) does look cool! We should revisit the quality text construction code when we implement inferred modes. I am not super happy with that code.

@shankari
Contributor

shankari commented May 6, 2024

I have now tested with three different configurations:

  • NREL commute (study, default config)

image
image

  • USAID Laos (study, custom config)

image
image
image

  • Ride2own (program, default config)

image
image
image
image

All run without errors. @iantei has already tested the core code extensively before. Since I only restructured the code, I have tested that my changes generate the same results as his. I think this is ready to merge.

@shankari
Contributor

shankari commented May 6, 2024

I have been debating a bit on whether to squash merge this or just merge it with a merge commit.
Squash merging will give a cleaner merge history, but will also result in a significantly more complex giant PR and will lose some important back and forth around the discussions in the structure.

Let's go ahead and non-squash merge (we didn't squash merge for the UI rewrite either). But we should really be more careful about the plethora of commits in the future.

@shankari shankari merged commit a963ba0 into e-mission:main May 6, 2024