BedDays rework and slight performance gains #1392

Draft
wants to merge 47 commits into
base: master
from

Conversation

willGraham01
Collaborator

@willGraham01 willGraham01 commented Jun 3, 2024

This PR attempts to reduce the time footprint of the BedDays class in the overall simulation, and reworks the class so that it can be tested as a standalone object and contains no circular references to other instances.

TODO:

  • Translate the old tests/test_beddays.py integration tests into either unit tests, or retain them as integration tests using the new BedDays class. File can be deleted or incorporated into test_bed_days.py.
  • The tests/test_HealthSystem.py file contains some references to now-removed attributes of the BedDays class, which will need to be cleared up.

Previous BedDays Tracking Method

The previous method of tracking bed days was:

  • 1 bool column in the population dataframe (hs_is_inpatient) which flagged whether a person was an inpatient (occupying, or scheduled to occupy, a bed).
  • 2 * (number of bed types) further Date (hs_next_{first/last}_day_in_bed_{bed_type}) columns in the population dataframe, which tracked when a person would next occupy/free a bed space.
  • (number of bed types) DataFrames ("trackers") within the BedDays class, each tracking 150 days into the future (with each day being stored as a row) and columns corresponding to the facility IDs of all facilities in the simulation. Entries in these dataframes were the number of beds available of the given type, on the given day, at the given facility.

This necessitated an expensive daily update of the population dataframe (a multi-column operation to refresh the hs_is_inpatient column), and also meant that the trackers were continually having rows appended, deleted, and reordered. Furthermore, facilities with no beds still occupied columns in the trackers, even though no bed would ever be allocated there.

New Method of Tracking

The new method of tracking bed occupations introduces a dataclass BedOccupancy, and the BedDays class now tracks a list of such BedOccupancies rather than tracking all facilities a fixed number of days into the future. This means:

  • There is no longer a need for the hs_next_{first/last}_day_in_bed_{bed_type} columns in the population dataframe.
  • The hs_is_inpatient column can be updated slightly quicker. We may even be able to skip the update step entirely on days when it is not requested.
  • There is no need for the "trackers". Instead, the BedDays class can produce a forecast for the number of available beds, any number of days into the future, based on the current occupancies that are scheduled.
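
As a rough illustration of the idea above (not the code in this PR), the forecast can be derived on demand from a list of occupancies. The field names, the `forecast_availability` helper, the bed type names, and the capacity dictionary are all assumptions made for this sketch:

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List


@dataclass(frozen=True)
class BedOccupancy:
    """One scheduled stay in a bed. Field names are illustrative assumptions."""

    bed_type: str
    facility: int
    patient_id: int
    start_date: date  # first day in the bed, inclusive
    end_date: date  # last day in the bed, inclusive

    def occupies(self, day: date) -> bool:
        """True if this occupancy holds a bed on the given day."""
        return self.start_date <= day <= self.end_date


def forecast_availability(
    occupancies: List[BedOccupancy],
    capacity: Dict[str, int],
    facility: int,
    start: date,
    n_days: int,
) -> List[Dict[str, int]]:
    """Beds free, per bed type, at one facility for each of n_days from start.

    Hypothetical helper: it recomputes the forecast from the occupancy list
    each time, rather than maintaining a fixed-length tracker table.
    """
    forecast = []
    for offset in range(n_days):
        day = start + timedelta(days=offset)
        free = dict(capacity)  # start from full capacity for this day
        for occ in occupancies:
            if occ.facility == facility and occ.occupies(day):
                free[occ.bed_type] -= 1
        forecast.append(free)
    return forecast


# Example data (made up for illustration).
occupancies = [
    BedOccupancy("general_bed", 0, 1, date(2010, 1, 1), date(2010, 1, 3)),
    BedOccupancy("general_bed", 0, 2, date(2010, 1, 2), date(2010, 1, 2)),
    BedOccupancy("high_dependency_bed", 1, 3, date(2010, 1, 1), date(2010, 1, 5)),
]
capacity = {"general_bed": 5, "high_dependency_bed": 1}
forecast = forecast_availability(
    occupancies, capacity, facility=0, start=date(2010, 1, 1), n_days=4
)
```

In this sketch the cost of a forecast scales with the number of scheduled occupancies times the number of days requested, which is why the large-population check mentioned below matters.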

Since BedDays is now tracking a list of individual objects, rather than using DataFrames, it would be worthwhile running the re-worked class in a large initial-population, large bed-capacity setting to check for any changes in performance due to the change in storage structure.

Other Updates

  • BedDays now contains unit tests in tests/test_bed_days.py, which can be conducted without tying the class to a simulation object.
  • The BedDaysFootprint class has been added. It provides a convenient wrapper for creating footprints and converting them to BedOccupancies, and it prevents modules from adding non-existent bed types to a requested footprint. The latter was previously checked in multiple places across the codebase; those checks can be (and have been) removed.
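
To show what centralising the bed-type check might look like, here is a minimal sketch of a footprint wrapper. It is not the class added in this PR: the constructor signature, the `total_days` method, and the bed type names are assumptions, and the conversion to BedOccupancies is omitted:

```python
from typing import Dict, Iterable


class BedDaysFootprint:
    """Maps bed type to days requested, rejecting unknown bed types.

    Hypothetical sketch: the real class's interface may differ.
    """

    def __init__(self, permitted_bed_types: Iterable[str], **days_by_type: float):
        # Every permitted bed type starts at zero days requested.
        self._days: Dict[str, float] = {bed: 0 for bed in permitted_bed_types}
        for bed_type, n_days in days_by_type.items():
            self[bed_type] = n_days  # routed through the validation below

    def __setitem__(self, bed_type: str, n_days: float) -> None:
        # The single place where bed types are validated.
        if bed_type not in self._days:
            raise KeyError(f"Unknown bed type: {bed_type!r}")
        if n_days < 0:
            raise ValueError("Days requested must be non-negative")
        self._days[bed_type] = n_days

    def __getitem__(self, bed_type: str) -> float:
        return self._days[bed_type]

    def total_days(self) -> float:
        """Total days requested across all bed types."""
        return sum(self._days.values())


footprint = BedDaysFootprint(["general_bed", "high_dependency_bed"], general_bed=3)
footprint["high_dependency_bed"] = 2

try:
    footprint["imaginary_bed"] = 1
    rejected = False
except KeyError:
    rejected = True
```

Because every write goes through `__setitem__`, calling modules would not need their own bed-type checks.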

Performance

  • Profiling on edcced1 indicates that the footprint we were seeing from the BedDays class is now gone. The proportion of time spent in HealthSystemScheduler.apply (which wraps all bed-days related code) has also fallen, implying the new functionality is indeed faster than the HEAD of master.
  • As expected, the population dataframe uses 8 fewer columns and ~5MB less memory.

@willGraham01
Collaborator Author

/run profiling

@ucl-comment-bot

Profiled run of the model failed ❌

🆔 25740868490
⏲️ 34.1 minutes
#️⃣ f3720c7

@willGraham01
Collaborator Author

/run profiling

@ucl-comment-bot

Profiled run of the model failed ❌

🆔 25933059488
⏲️ 29.9 minutes
#️⃣ 73c6c1a

@willGraham01
Collaborator Author

/run profiling

@ucl-comment-bot

Profiled run of the model succeeded ✅

🆔 25947662985
⏲️ 474 minutes
#️⃣ edcced1

@willGraham01 willGraham01 changed the title from "Wgraham/beddays rework" to "BedDays rework and slight performance gains" Jun 10, 2024
@tamuri tamuri added this to In progress in PR priorities Jun 19, 2024