Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

replacing median with 50th percentile #97

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

oliversus
Copy link
Contributor

Method correct_times_median should provide median values for year, doy, and milliseconds to correct invalid timestamps. However, np.median will return floats, which could even be fractions due to np.median's internal interpolation method of the two closest values. I'd suggest to use np.percentile instead, where one can explicitly set the interpolation method. Setting this to "nearest" will always return the value closest to the statistical median, thus not changing its type.

np.median([1, 2, 3, 4, 5, 6])
returns 3.5
np.percentile([1, 2, 3, 4, 5, 6], 50, interpolation="nearest")
returns 3

This will also avoid a ValueError in method to_datetime64, line 449, where conversion to datetime64 is only possible with year as an integer.

@codecov
Copy link

codecov bot commented Sep 29, 2021

Codecov Report

Merging #97 (07d4de7) into main (a8b68a3) will increase coverage by 0.05%.
The diff coverage is 90.90%.

@@            Coverage Diff             @@
##             main      #97      +/-   ##
==========================================
+ Coverage   74.39%   74.44%   +0.05%     
==========================================
  Files          32       32              
  Lines        2882     2888       +6     
==========================================
+ Hits         2144     2150       +6     
  Misses        738      738              
Impacted Files Coverage Δ
pygac/reader.py 81.85% <85.71%> (+0.07%) ⬆️
pygac/tests/test_reader.py 98.36% <100.00%> (+0.01%) ⬆️

Copy link
Member

@sfinkens sfinkens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch, thanks! Your proposed solution looks good to me!

Regarding code cleanliness, I think it would make sense to create a helper method for the 50th percentile, so that the keyword args don't have to be repeatet.

def fiftieth_percentile(data):
    return np.percentile(data, 50, interpolation='nearest')

And please make sure to add a test ;)

@mraspaud
Copy link
Member

@oliversus do you have time to fix this soon?

@oliversus
Copy link
Contributor Author

@oliversus do you have time to fix this soon?

Hi Martin and Stephan,
sorry for ignoring this for so long. I do have a bit of time right now and will try to implement this as Stephan suggested.
Cheers
Olli

Copy link
Member

@sfinkens sfinkens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, thanks a lot!

@@ -820,7 +820,26 @@ def get_angles(self):
arr[self.mask] = np.nan

return sat_azi, sat_zenith, sun_azi, sun_zenith, rel_azi

def _get_true_median(self, input_array, q=50, interpolation='nearest'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait :D If you have some spare time left...

This method doesn't access any class attribute, so I'd propose to move it out of the class to the module level. Or maybe to pygac.utils?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants