Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[8pt] Upgrade agg_by_huc logging #1325

Open
RobHanna-NOAA opened this issue Oct 20, 2024 · 0 comments
Open

[8pt] Upgrade agg_by_huc logging #1325

RobHanna-NOAA opened this issue Oct 20, 2024 · 0 comments
Labels
enhancement New feature or request FIM4

Comments

@RobHanna-NOAA
Copy link
Contributor

We see weird performance issues with aggregate_by_huc.py, but it is not yet clear on why.

I would like to add a very specific logging upgrade aimed at two things: Watching for key variables at key times, upgrade on how we track errors and add a separate huc processing duration log.

  1. Let's bolt in the new src/utilts/fim_logger.py system. It is now running in the CatFIM system. I can show you how to plug it in. Inherently, it automatically creates a basic logging file, an separate warning and a separate error log files.

  2. We also want to add a separate "duration-by-huc" csv file which is not a feature of the logging system. agg by huc uses multi proc. We will want to add a system called a "futures" system into the multi-proc. I don't think we have it anywhere else but it could help in other places. It exactly solves a problem we have here.

    • For each multi-proc, you can have that process send back values. From each future (return from each huc mp), we want it to return the huc number, a true/false if error exists and run time duration in seconds.
    • ensure that there is a try/except inside the function used for MP (aka. one per HUC). I can show you some tricks to ensure we always get the logs we need plus the right futures return value. In the try / except, if the error is huc related, trap it, log it and continue to return the "futures" values. If it is catastrophic where we need to stop the entire app, you can re-throw the error to the processpool, who can shut down the entire tool. I can show you how.
    • For each future returned (from each HUC mp), we add it to a list of dictionaries outside the MP block.
  3. When Process Pool MP set is done, output a independent csv file purely based on those durations. Columns requested:

    • HUC (watch for zero padding)
    • Success (True / False or some variant of that)
    • duration based in time of mm:ss (skip hours)
    • duration based on 10 base of time, ie) time as a percentage. Where normal time would show 4:30 (4 min 30 seconds), this column would show 4.50. This percent based makes it easy for doing averaging and summing later if we want.

The additional "duration-by-huc" part will help tell us which HUCs are being slower then others. There is also a suspicion the problem might not be so much HUC related but something about the code itself. ie) it is possible that it slows down as each huc progresses if there is a memory leak or object that is not cleaning up fast enough.

@RobHanna-NOAA RobHanna-NOAA self-assigned this Oct 20, 2024
@RobHanna-NOAA RobHanna-NOAA added enhancement New feature or request FIM4 labels Oct 20, 2024
@RobHanna-NOAA RobHanna-NOAA removed their assignment Oct 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request FIM4
Projects
None yet
Development

No branches or pull requests

1 participant