Change from pie charts to 100% stacked bar chart #123

iantei · 2024-02-26T02:50:20Z

Changes to transform the individual pie charts into combined 100% stacked bar charts.
Merge the generic_metrics_sensed and mode_specific_metrics notebooks into the generic_metrics notebook. Delete the generic_metrics_sensed and mode_specific_metrics notebooks, and also remove them from crontab.
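
For readers unfamiliar with the chart type: a 100% stacked bar can be built by converting counts to percentages and drawing each segment with `barh`, offset by the running total of the previous segments. The sketch below is illustrative only. `to_percentages` and `stacked_bar_100` are hypothetical names, not the functions added to plots.py in this PR, and the labels and counts are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def to_percentages(values):
    """Convert raw counts to percentages of their total (all zeros if the total is 0)."""
    total = sum(values)
    if total == 0:
        return [0.0 for _ in values]
    return [100 * v / total for v in values]


def stacked_bar_100(ax, bar_name, labels, values):
    """Draw one horizontal 100% stacked bar; return the final right edge."""
    left = 0.0
    for label, pct in zip(labels, to_percentages(values)):
        # Each segment starts where the previous one ended.
        ax.barh(bar_name, pct, left=left, label=label)
        left += pct
    return left


fig, ax = plt.subplots(figsize=(10, 2))
# Made-up mode counts, for illustration only.
total_width = stacked_bar_100(ax, "Labeled", ["walk", "bike", "car"], [5, 3, 2])
ax.set_xlabel("Proportion (%)")
```

Several bars (for example, labeled and sensed) can share the same axes by calling the helper once per bar name.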

… Count and Proportion.

…etrics notebook.

…ed mode.

…metrics notebook.

…etrics notebook.

…e structure.

…data-sizex to 10 from 4 to adjust horizontal length of the stacked bar charts. Update with new option values.

iantei · 2024-05-04T04:04:31Z

Comparing the Laos-Chart comment to the above charts.
A few observations:

All the label values for the Labeled by user Trip Types are identical, except for the Other label's value, which has been condensed in the above charts.
All the sensed values are identical.
There is a difference in the chart values for "Number of Trips (under 80th percentile of total trips)", since the cutoff value has been changed to match the sensed one in the above case.

Overall, looks good.

iantei · 2024-05-04T04:32:24Z

@shankari
Encountered a few issues:
Different variations of the error description table/text are loaded for the charts when the month selected is 5/2024.

Dataset used: openpath-prod-usaid-laos-ev-snapshot-dec-20

| Chart Name | Chart |
| --- | --- |
| Number of Trips by Purpose | (screenshot) |
| Number of Commute Trips | (screenshot) |
| Under 80th Percentile | (screenshot) |
| Total Trip Length | (screenshot) |

iantei · 2024-05-04T06:10:55Z

UPDATE: @shankari I have lost access to the NREL system account.

shankari · 2024-05-04T19:28:29Z

@iantei That is largely expected because there is no data for 5/2024. As you can see, there were no trips in that month. We should not be displaying the backtrace instead of the missing data (see upcoming commit) but it is not really a showstopper since it wouldn't have worked anyway.

A bigger showstopper is the one that I highlighted here: #123 (comment)

While I am fine with working on code structure, I was really really hoping to not have to tweak pandas horizontal bar generation and spacing. My hope was that the visualization intern(s) would make the charts look pretty and I only needed to make the code look pretty.

Abby-Wheelis · 2024-05-04T19:38:58Z

I'm starting to catch up with these changes to hopefully be able to incorporate the surveys smoothly, and on this part of the showstopper mentioned above -

> The bar extends beyond 100% with a gray color that is not in the legend

I think this is the plot itself and not part of the bar (this is the background for the chart area and you can see a faint white gridline) so if we want it removed for aesthetic reasons we can do that but I don't think it represents a value/needs to be in the legend

shankari · 2024-05-04T20:48:51Z

> I think this is the plot itself and not part of the bar (this is the background for the chart area and you can see a faint white gridline) so if we want it removed for aesthetic reasons we can do that but I don't think it represents a value/needs to be in the legend

I agree that it should not be in the legend. But I am not sure that it is so easy to fix, because we are computing the blocks to generate the horizontal bar. Is there a reason why we didn't just use pandas `DataFrame.plot.barh` with `stacked=True`?

Also, the second point in the showstopper (the legend for the labeled trips is too big and overlaps the sensed) will be even more true for the survey changes, correct? Do you have any thoughts on how to change that? Maybe a horizontal legend instead of vertical?

Abby-Wheelis · 2024-05-04T20:54:06Z

> Also, the second point in the showstopper (the legend for the labeled trips is too big and overlaps the sensed) will be even more true for the survey changes, correct? Do you have any thoughts on how to change that? Maybe a horizontal legend instead of vertical?

I think that we'll put it horizontally, below the chart itself, since the labels may have much longer text. I anticipate this being the most difficult part to get worked out visually when it comes to the surveys.

As far as using pandas `DataFrame.plot.barh` goes, I'm not sure if there is any particular reason why we did it this way. I have a general sense that the matplotlib method is a little easier to customize, but I haven't tried the pandas method; maybe that would work.

To be consistent with e-mission#86 (comment)

shankari · 2024-05-05T01:41:05Z

After improving error handling, testing done: set the date to one for which we have no data, including for the ones which showed a backtrace in #123 (comment)

Both alt_text and alt_html files have been created for the newly created stacked bar charts, not for the older bar charts

$ ls -al plots/*_2020_11* | grep "May *4"
-rw-r--r--  1 kshankar  staff  85045 May  4 18:59 plots/average_miles_mode_confirm_2020_11_default.png
-rw-r--r--  1 kshankar  staff    342 May  4 18:59 plots/average_miles_mode_confirm_2020_11_default.txt

-rw-r--r--  1 kshankar  staff    355 May  4 18:59 plots/ntrips_commute_mode_confirm_2020_11_default.html
-rw-r--r--  1 kshankar  staff  88215 May  4 18:59 plots/ntrips_commute_mode_confirm_2020_11_default.png
-rw-r--r--  1 kshankar  staff    355 May  4 18:59 plots/ntrips_commute_mode_confirm_2020_11_default.txt

-rw-r--r--  1 kshankar  staff  81393 May  4 18:59 plots/ntrips_per_day_2020_11_default.png
-rw-r--r--  1 kshankar  staff    320 May  4 18:59 plots/ntrips_per_day_2020_11_default.txt

-rw-r--r--  1 kshankar  staff  82616 May  4 18:59 plots/ntrips_per_weekday_2020_11_default.png
-rw-r--r--  1 kshankar  staff    324 May  4 18:59 plots/ntrips_per_weekday_2020_11_default.txt

-rw-r--r--  1 kshankar  staff    350 May  4 18:59 plots/ntrips_purpose_2020_11_default.html
-rw-r--r--  1 kshankar  staff  86380 May  4 18:59 plots/ntrips_purpose_2020_11_default.png
-rw-r--r--  1 kshankar  staff    350 May  4 18:59 plots/ntrips_purpose_2020_11_default.txt

-rw-r--r--  1 kshankar  staff    369 May  4 18:58 plots/ntrips_total_2020_11_default.html
-rw-r--r--  1 kshankar  staff  90649 May  4 18:58 plots/ntrips_total_2020_11_default.png
-rw-r--r--  1 kshankar  staff    369 May  4 18:58 plots/ntrips_total_2020_11_default.txt

-rw-r--r--  1 kshankar  staff    372 May  4 18:59 plots/ntrips_under80_2020_11_default.html
-rw-r--r--  1 kshankar  staff  91797 May  4 18:59 plots/ntrips_under80_2020_11_default.png
-rw-r--r--  1 kshankar  staff    370 May  4 18:59 plots/ntrips_under80_2020_11_default.txt

-rw-r--r--  1 kshankar  staff    339 May  4 18:59 plots/total_trip_length_2020_11_default.html
-rw-r--r--  1 kshankar  staff  84806 May  4 18:59 plots/total_trip_length_2020_11_default.png
-rw-r--r--  1 kshankar  staff    339 May  4 18:59 plots/total_trip_length_2020_11_default.txt

-rw-r--r--  1 kshankar  staff    374 May  4 18:59 plots/total_trip_length_land_2020_11_default.html
-rw-r--r--  1 kshankar  staff  93236 May  4 18:59 plots/total_trip_length_land_2020_11_default.png
-rw-r--r--  1 kshankar  staff    374 May  4 18:59 plots/total_trip_length_land_2020_11_default.txt

In f7f3590, we added try/except blocks for `AttributeError` and `pd.errors.UndefinedVariableError` to handle errors in pre-processing. However, these were apparently removed during subsequent changes/refactors. Since we have now moved the pre-processing into the cell in c16e4eb, this is even more important, since errors are now very likely to occur outside of the plot function.

In this change, we:
- recreate the merged_debug_df to use in cells where we plot both labeled and unlabeled data
- standardize the error handling by:
  - catching `KeyError` in addition to the others (consistent with observed behavior)
  - clearing the existing figure first so that we don't get two blank axes
  - plotting and generating the alt text with the same `plot_title_no_quality`
  - making sure to add alt_html to all of the except clauses for the new stacked bar charts

This fixes e-mission#123 (comment)

Testing done: e-mission#123 (comment)
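
The error-handling pattern described in this commit message could look roughly like the sketch below. It is a minimal illustration, not the notebook code: `plot_or_explain`, the `Mode_confirm` grouping, and the shape of `debug_df` (one `value` column indexed by metric name) are assumptions made for the example. The exception types and the "clear the figure first" step are the ones named in the commit message.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_or_explain(df, plot_title_no_quality, debug_df):
    """Plot trip counts per mode, or return alt text/HTML explaining why not.

    Hypothetical helper: the name, arguments, and return shape are assumptions.
    """
    fig, ax = plt.subplots()
    try:
        # Pre-processing happens inside the try, so a missing column is caught too.
        counts = df.groupby("Mode_confirm").size()
        counts.plot.barh(ax=ax, title=plot_title_no_quality)
        return fig, None, None
    except (AttributeError, KeyError, pd.errors.UndefinedVariableError):
        # Clear the figure first so that no blank axes are left behind.
        fig.clf()
        reason = " ".join(f"{k} is {v}." for k, v in debug_df["value"].items())
        alt_text = f"Unable to generate\n{plot_title_no_quality}\nReason: {reason}"
        alt_html = (
            f"<html><body><h2>Unable to generate {plot_title_no_quality} "
            f"Reason:</h2>{debug_df.to_html()}</body></html>"
        )
        return fig, alt_text, alt_html


# Made-up debug values, for illustration only.
debug_df = pd.DataFrame(
    {"value": [0, 15]}, index=["Number of trips", "Registered_participants"]
)
# An empty frame has no Mode_confirm column, so the groupby raises KeyError.
fig_missing, alt_text, alt_html = plot_or_explain(
    pd.DataFrame(), "Number of trips for each purpose.", debug_df
)
```

Using the same title string for both the plot and the alt text keeps the two outputs consistent, which is the point of the `plot_title_no_quality` item above.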

shankari · 2024-05-05T02:13:52Z

Before

Unable to generate
Bar chart of Number of trips for each purpose (selected by users).
Reason: Number of trips is 0. Participant_with_at_least_one_labeled_trip is 0. Participants_with_at_least_one_trip is 0. Registered_participants is 15. Trips_with_at_least_one_label is 0. Trips_with_mode_confirm_label is 0. Trips_with_trip_purpose_label is 0. month is 11. year is 2020.

After

        <html>
        <body>
        <h2>Unable to generate
Bar chart of Number of trips for each purpose (selected by users). Reason:</h2>

    <table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>value</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>Number of trips</th>
      <td>0</td>
    </tr>
    <tr>
      <th>Participant_with_at_least_one_labeled_trip</th>
      <td>0</td>
    </tr>
    <tr>
      <th>Participants_with_at_least_one_trip</th>
      <td>0</td>
    </tr>
    <tr>
      <th>Registered_participants</th>
      <td>15</td>
    </tr>
...

Screenshot:

In the heavy lift commit (12b00e3) there were several TODOs that I hoped would be handled by the public dashboard team. Since that wasn't possible due to internship timelines, I have handled at least one, user visible TODO by displaying the missing HTML as HTML instead of text. Testing done: e-mission#123 (comment)

shankari · 2024-05-05T02:39:06Z

The solution to have the legend items side by side is to use the legend's `ncols` argument:
https://stackoverflow.com/a/54870776

@Abby-Wheelis I'm thinking that we might want to keep the legend for the regular bar charts in the same place, but just increase the number of columns. For surveys, we may want to put the legend below the bar (to accommodate longer options) but still have only one column. We could specify that as an option to the plot function.

Thoughts?

shankari · 2024-05-05T03:08:04Z

Here's the option with 'lower right'

Here's the option with 'lower left' and the anchor at (1,0) which is not too bad

Here's the option with the ncols

Since this will likely change as we include survey results, and then the inferred results, I am going to punt on this. I will leave in the n_cols calculation, but switch to lower left and (1,0) to unblock us now.

A temporary fix for now is to pin the bbox to the bottom and not the top. I think that the real fix is to use ncols, because then it will work for both sensed and labeled legends. What if we have more than 5 sensed values? But we will need to revisit this anyway when we have inferred modes, when we display surveys, and when we use pandas barh, so I am punting on this and implementing the temporary fix for now.

Testing done: e-mission#123 (comment)
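
The two legend options discussed here can be compared with a small matplotlib sketch. `legend_columns` and its `max_rows=5` default are assumptions for illustration, not the n_cols calculation left in the PR; the `loc="lower left"` and `bbox_to_anchor=(1, 0)` values are the ones mentioned in the comment above. The sketch uses the `ncol` keyword, which older matplotlib versions accept as well.

```python
import math

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def legend_columns(n_entries, max_rows=5):
    """Hypothetical helper: columns needed so no column exceeds max_rows entries."""
    return max(1, math.ceil(n_entries / max_rows))


fig, ax = plt.subplots(figsize=(10, 2))
labels = [f"mode_{i}" for i in range(7)]  # made-up legend entries
left = 0.0
for label in labels:
    width = 100 / len(labels)
    ax.barh("Labeled", width, left=left, label=label)
    left += width

# Temporary fix: pin the legend's lower-left corner to the axes' lower-right
# corner, so a tall legend grows upward instead of overlapping the bar below.
pinned = ax.legend(loc="lower left", bbox_to_anchor=(1, 0))

# Alternative: same anchor, but spread the entries over several columns.
# Calling ax.legend() again replaces the legend created above.
n_cols = legend_columns(len(labels))
multi = ax.legend(loc="lower left", bbox_to_anchor=(1, 0), ncol=n_cols)
```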

shankari · 2024-05-05T03:28:15Z

We can't use pandas' barh because stacked doesn't seem to work easily

>>> expanded_ct.groupby("Mode_confirm").agg({distance_col: 'count'}).reset_index().set_axis(["label", "value"], axis="columns").plot.barh(x="label", y="value", stacked=True)

generates

In lieu of investigating this further, let's just try to fix the gray value. This is displayed to avoid RuntimeError: Unknown return type

I tried to set the xlim to 100, because we know that it will never be more than that. However, 100, 100.01, 100.1 and 101 all returned the same error. Leaving this for the pandas investigation...
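
One possible starting point for that pandas investigation: `plot.barh(stacked=True)` stacks the columns of each row, so a long-format frame with one row per label produces one bar per label and nothing to stack. Reshaping to a single row with one column per label gives a single stacked bar. The sketch below uses made-up data and has not been tried against the dashboard notebooks, so whether it also avoids the gray area and the `RuntimeError` mentioned above is untested.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd

# Made-up long-format counts, shaped like the label/value frame in the snippet above.
long_df = pd.DataFrame({"label": ["walk", "bike", "car"], "value": [5, 3, 2]})

# Convert to percentages and reshape to ONE row with one column per label.
pct = 100 * long_df["value"] / long_df["value"].sum()
wide_df = pd.DataFrame([pct.to_numpy()], columns=long_df["label"], index=["Labeled"])

# Each column becomes one segment of the single "Labeled" bar.
ax = wide_df.plot.barh(stacked=True, figsize=(10, 2))
ax.set_xlim(0, 100)
```

Adding a second row (for example, "Sensed") to `wide_df` would draw a second stacked bar on the same axes.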

shankari · 2024-05-05T03:31:34Z

One final fix, because the quality text annoys me and is confusing IMHO. Saying "For Labeled & Sensed: Based on X confirmed trips from Y users of X' trips from Y' users" is technically correct but requires cognitive load to determine which set of numbers goes with which bar, and what is labeled and what is confirmed. Going to try to fix that before declaring that this is done.

Abby-Wheelis · 2024-05-05T03:41:18Z

The quality text is updated in the inferred changes! That puts it on each bar, where it says something like "Confirmed trips, 100 trips from 10 users 15%" or similar, like this: comment. I agree that this quality text is confusing, and I like what ended up on the inferred trips more. I think we had decided to hold off on that change in this PR to limit the changes here, but that was a while ago, when the timeline wasn't as crunched.

shankari · 2024-05-05T04:41:51Z

I don't want to change the base quality_text methods since they are used in other plots as well. Let's hack together a method to pull out only the fields that we want for now. The correct fix would be to return the quality text as a tuple and format it in the notebook, which is also why one should not put presentation layer logic in the libraries.
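
A regular-expression extraction of this kind might look like the sketch below. It assumes the quality text follows the "Based on X confirmed trips from Y users of X' trips from Y' users" wording quoted earlier in this thread; `extract_counts` and the pattern are hypothetical and are not the code in the commit.

```python
import re

# Matches "<n> trips ... from <m> users", with an optional "confirmed" qualifier.
_COUNTS = re.compile(r"(\d+)\s+(?:confirmed\s+)?trips.*?from\s+(\d+)\s+users")


def extract_counts(quality_text):
    """Return a list of (trip_count, user_count) pairs found in the quality text."""
    return [(int(trips), int(users)) for trips, users in _COUNTS.findall(quality_text)]


# Made-up numbers in the wording quoted in the thread.
pairs = extract_counts("Based on 100 confirmed trips from 10 users of 150 trips from 12 users")
```

Returning the counts as a structured value from the library and formatting them in the notebook, as suggested above, would remove the need for this parsing.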

Used some regular expression magic hacking to pull these out.
I now have

While testing this with include_test_users=True I also found a regression in the scaffolding code.

def get_quality_text_sensed(df, cutoff_text="", include_test_users=False):

now has a cutoff_text argument. However, the invocation from scaffolding was still

    quality_text = get_quality_text_sensed(expanded_ct, include_test_users)

So the boolean True was interpreted as the cutoff_text and so the text that was generated was "Based on 2728 trips (True) from 13 users".

After fixing that, this option seems to work.
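
The regression is easy to reproduce in isolation. In the sketch below, only the function signature and the two call forms come from this thread; the function body is a stand-in written for the example, and the three-row frame with a `user_id` column is made up.

```python
import pandas as pd


def get_quality_text_sensed(df, cutoff_text="", include_test_users=False):
    # Stand-in body for illustration only; NOT the scaffolding implementation.
    cutoff = f" ({cutoff_text})" if cutoff_text else ""
    return f"Based on {len(df)} trips{cutoff} from {df['user_id'].nunique()} users"


expanded_ct = pd.DataFrame({"user_id": ["a", "a", "b"]})
include_test_users = True

# Positional call: True lands in cutoff_text, the second positional parameter.
buggy = get_quality_text_sensed(expanded_ct, include_test_users)

# Keyword call: the flag reaches the intended parameter.
fixed = get_quality_text_sensed(expanded_ct, include_test_users=include_test_users)
```

One way to guard against this class of bug is to make the optional parameters keyword-only by adding a bare `*` after `df` in the signature; that is a suggestion, not something this PR does.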

…d participants

```
def get_quality_text_sensed(df, cutoff_text="", include_test_users=False):
```

now has a `cutoff_text` argument. However, the invocation from scaffolding was still

```
quality_text = get_quality_text_sensed(expanded_ct, include_test_users)
```

So the boolean `True` was interpreted as the cutoff_text and so the text that was generated was "Based on 2728 trips (True) from 13 users". After fixing that, this option seems to work.

Testing done: e-mission#123 (comment)

shankari · 2024-05-05T05:18:06Z

Made the appropriate changes to all the plots, although I kept some single-bar plots unchanged.

Note that I had to remove the y-axis label "Trip Types" because:
a) it didn't add anything beyond what the axis label already had
b) it was overlapping with some of the label text, particularly when including testers

shankari · 2024-05-05T05:20:26Z

Also does not crash for missing data

This makes it clearer which parts of the quality text correspond to which bar.
- I left the commute plot untouched because there is only one bar, and recreating the quality text would be a bit complicated.
- I also left the non-stacked bar chart plots untouched.

Testing done: e-mission#123 (comment) e-mission#123 (comment)

This was fairly straightforward. Since this now introduces pre-processing into the cell, we also need to copy over the proper error handling. I have chosen to keep the quality text untouched here, since all of these are single-bar plots.

Testing done:
- Ran with and without data, notebook ran with no errors

shankari · 2024-05-05T05:48:07Z

I am going to declare that this PR is done, at least to the extent that I am able to spend time on it at this time.
Unless @iantei or @Abby-Wheelis has any objections, I plan to merge and push a release to staging

shankari · 2024-05-05T06:11:40Z

#128 (comment) does look cool! We should revisit the quality text construction code when we implement inferred modes. I am not super happy with that code.

shankari · 2024-05-06T00:54:20Z

I have now tested with three different configurations:

NREL commute (study, default config)

USAID Laos (study, custom config)

Ride2own (program, default config)

All run without errors. @iantei has already tested the core code extensively before. Since I only restructured the code, I have tested that my changes generate the same results as his. I think this is ready to merge.

shankari · 2024-05-06T01:30:14Z

I have been debating a bit on whether to squash merge this or just merge it with a merge commit.
Squash merging will have a cleaner merge history, but will also result in a significantly more complex giant PR and will also lose some important back and forth around the discussions in the structure.

Let's go ahead and non-squash merge (we didn't squash merge for the UI rewrite either). But we should really be more careful about the plethora of commits in the future.

iantei added 30 commits February 25, 2024 18:29

Function to create 100% Stacked Bar Chart

581b492

Add calculate_pct(labels, values) to calculate the proportion.

d6f1b82

Introduce a process_trip_data() to create a dataframe with cols Mode,…

5ddc396

… Count and Proportion.

Merge list of all dataframes into one.

1165562

Remove functions related with Pie Chart from plots.py.

2e9ad7d

Remove code and markdown blocks related with Pie Chart from generic_m…

a55e9f6

…etrics notebook.

Add sensed_algo_prefix for generic_metrics notebook.

7e71308

Added load_viz_notebook_sensor_inference_data() function in generic m…

45db963

…etrics notebook.

Filter for mode_of_interest.

ec9037b

Remove Generic Metrics markdown.

ebeb9dd

Markdown and code for 100% Stacked Bar Charts based on Number of Trips.

a1283bb

2. 100% Stacked Bar Charts representing 80th of number of trips.

441019c

3. 100% Stacked Bar Charts representing commute trips.

e53c0cc

Clean up.

eb3dcfc

4. 100% Stacked Bar Charts representing Distance by Mode.

e582011

5. 100% Stacked Bar Charts representing Count by Purpose.

5fedbb0

6. 100% Stacked Bar Charts representing Replaced mode.

f31e43e

Update markdown for 6.

7b6cef2

7. 100% Stacked Bar Charts representing Trip Distance based on Replac…

428982c

…ed mode.

Update Markdown text.

119d707

Copied the bar plots from generic_metrics_sensed notebook to generic_…

1c5e00a

…metrics notebook.

Copied the bar plots from mode_specific_metrics notebook to generic_m…

4d58d4c

…etrics notebook.

Added Store alt text function for stacked bar charts. Added html tabl…

7f31c5d

…e structure.

Added alt_text for stacked bar charts in generic metric notebook.

3388258

Retrieve htmlFile in index.html.

11345f6

Update data-sizex to 8 from 4 for ntrips_mode_confirm.

2f7b0da

Add X and Y axis labels.

c17027d

Update the quality text for sensed in scaffolding.py

5d83605

Update the default loaded configs.

f91f379

Remove older file names and option values associated with it. Update …

339f0cd

…data-sizex to 10 from 4 to adjust horizontal length of the stacked bar charts. Update with new option values.

Abby-Wheelis mentioned this pull request May 4, 2024

Adding Survey Responses to Public Dashboard #124

Merged

♻️ More clearly delineate the pre-processing and plotting

c16e4eb

To be consistent with e-mission#86 (comment)

shankari added 2 commits May 4, 2024 22:28

shankari merged commit a963ba0 into e-mission:main May 6, 2024
