Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feat] IDSSE-306: round_half_away utility #7

Closed
wants to merge 6 commits into from

Conversation

mackenzie-grimes-noaa
Copy link
Contributor

@mackenzie-grimes-noaa mackenzie-grimes-noaa commented Aug 11, 2023

Linear Task

IDSSE-306

Changes

  • Add round_half_away() function to util.py
    • Accepts optional parameter precision to control how many decimal places to preserve
    • Uses math.floor() or math.ceil() to enforce "round half away from zero", instead of the default behavior of python's built-in round(), which is "round half to even", a.k.a. "banker's rounding".
    • For example:
      • round_half_up(2.5) = 3, where round(2.5) = 2
      • round_half_up(-2.5) = -3, where round(-2.5) = 2
    • Why? Python is attempting to reduce bias towards larger numbers when rounding.
      • Floating point numbers have innate imprecision in how they are stored in computers. Floating point guide describes the cause in more detail
  • Unit tests included, over 80% coverage for this file
    • Had to add tests for a few pre-existing utils functions too

@mackenzie-grimes-noaa
Copy link
Contributor Author

Adding test coverage report here, since our CI doesn't have it replicating to the README or anything yet:

(.venv) mackenzie.grimes@macgrimes-lt idsse-engine-commons % cd python/idsse_common/idsse/common
(.venv) mackenzie.grimes@macgrimes-lt common % pytest ../../test --cov=. --cov-report=term-missing

Name                 Stmts   Miss  Cover   Missing
--------------------------------------------------
__init__.py              0      0   100%
aws_utils.py            78      6    92%   136, 145-146, 168-169, 187
config.py               74      9    88%   72-74, 84, 89-91, 100, 126
json_message.py         22     22     0%   12-70
log_util.py             38     38     0%   12-76
path_builder.py        127     10    92%   41, 44, 95, 100, 253, 256, 259, 323, 343-344
publish_confirm.py     141    141     0%   4-345
utils.py                82     11    87%   57, 60, 67, 70-71, 92-94, 97, 100-101
--------------------------------------------------
TOTAL                  562    237    58%

Copy link
Contributor

@Geary-Layne Geary-Layne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -166,3 +167,19 @@ def datetime_gen(dt_: datetime,
for i in range(0, max_num):
logger.debug('dt generator %d/%d', i, max_num)
yield dt_ + time_delta * i


def round_half_up(number: float, precision: int = 0) -> Union[int, float]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a required task, but do you know the timing for this function vs the ceil/floor approach vs the built in round? I understand that the ceil/floor only works with precision=0, but that is way well use round for GridProj. Most important is that we have function that behaves the way we need, I'm just curious to know if there will be any real impact in speed.

Copy link
Contributor Author

@mackenzie-grimes-noaa mackenzie-grimes-noaa Aug 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decimal has been called out to be a bit slower, yes (for example in this Stack Overflow discussion). Although I didn't find it to be 100x slower than round() like it may have once been; in python 3.11 it seems to be only 1.5-2x times slower on average.

  • This was with 1 million randomly generated floats between negative and positive 1 million, and I timed each function rounding all the floats to 6 decimal places. Code snippet here
% py round_benchmark.py
[INFO] Running benchmark with dataset size 100000
[INFO] 
[INFO] |    Method    |   Avg (ns)   |   Min (ns)   |   Max (ns)   | Std Dev (ns) 
[INFO] | ------------ | ------------ | ------------ | ------------ | ------------ 
[INFO] | round        | 1020         | 708          | 19208        | 245          
[INFO] | math.ceil    | 667          | 459          | 33417        | 338          
[INFO] | decimal      | 1431         | 1167         | 26542        | 325     

For the sake of having the precision parameter functionality I think it might be acceptable, unless there's a way to use math.ceil and also support decimal precision, which looks to be faster than anything else.

Copy link
Contributor Author

@mackenzie-grimes-noaa mackenzie-grimes-noaa Aug 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: added a hybrid function that uses both Decimal if we need precision and math.ceil/floor if we don't. I also tweaked my benchmarking to round to 0 decimal places 80% of the time--I had previously had the functions always round to 6 decimal places.

The hybrid option is now even faster than the built-in round, on average:

[INFO] Running benchmark with dataset size 100000
[INFO] Dataset was rounded to precision 0: 80.12%, precision 6: 19.88% of tests
[INFO] 
[INFO] |    Method    |   Avg (ns)   |   Min (ns)   |   Max (ns)   | Std Dev (ns) 
[INFO] | ------------ | ------------ | ------------ | ------------ | ------------ 
[INFO] | round        | 608          | 375          | 22625        | 272          
[INFO] | math.ceil    | 421          | 250          | 18833        | 266          
[INFO] | decimal      | 1035         | 750          | 19500        | 349          
[INFO] | hybrid       | 509          | 208          | 14125        | 465          

This is of course "faster" due to the assumption I made--that actual usage of the function will be "round to the nearest int" more often than it will be "round to x decimal places".

If in reality we never used this function to round to the nearest int, the hybrid would be just as slow as the regular decimal function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update to the update:

Unfortunately this benchmarking was all a bit of a waste. I figured out how to round with precision using only math.ceiling() and math.floor().

The pseudocode is essentially this:

1. Multiply by 10 ^ <decimal_places> to slide decimal point over
2. If decimal place is now at least 0.5, round away from zero. Otherwise round towards zero
3. Divide by 10 ^ < decimal_places> to slide the decimal point back to its original spot

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for add enough tests to get coverage up.

@mackenzie-grimes-noaa mackenzie-grimes-noaa changed the title [feat] IDSSE-306: round_half_up utility [feat] IDSSE-306: round_half_away utility Aug 15, 2023
@mackenzie-grimes-noaa
Copy link
Contributor Author

Pylint is failing in the main branch (another task exists to resolve all the warnings), not the fault of these changes.

@mackenzie-grimes-noaa
Copy link
Contributor Author

Closing this PR since it has been made redundant by PR #5 (these changes were merged to main by that PR already)

@mackenzie-grimes-noaa mackenzie-grimes-noaa deleted the feat/idsse-306/round-half-up branch September 18, 2023 18:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants