[Feature]: skipna parameter for averager #582

Open
lee1043 opened this issue Dec 14, 2023 · 6 comments · May be fixed by #655
Labels
type: enhancement New enhancement request

Comments

@lee1043
Collaborator

lee1043 commented Dec 14, 2023

Is your feature request related to a problem?

xarray's mean function accepts a skipna parameter (default None) that lets the user decide whether to skip NaN values when averaging (so the average is computed from the non-NaN values only) or to return NaN whenever any of the values being averaged is NaN.

https://docs.xarray.dev/en/stable/generated/xarray.DataArray.mean.html
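
For reference, a minimal sketch of how skipna behaves in xarray's DataArray.mean. The sample values and variable names are made up for illustration.

```python
import numpy as np
import xarray as xr

# Made-up sample data with one missing value.
da = xr.DataArray([1.0, 2.0, np.nan, 4.0], dims="time")

# skipna=True: average over the three non-NaN values only.
mean_skip = float(da.mean(dim="time", skipna=True))

# skipna=False: any NaN in the input makes the result NaN.
mean_keep = float(da.mean(dim="time", skipna=False))

print(mean_skip, mean_keep)
```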

Describe the solution you'd like

Pass the skipna keyword through to this call:

weighted_mean = data_var.cf.weighted(weights).mean(dim=dim)

Likewise for the temporal averaging functions where .mean is used.
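
A rough sketch of what forwarding the keyword could look like. The helper name weighted_average is hypothetical and is not xCDAT's actual code; it also uses plain xarray .weighted rather than the .cf.weighted accessor, on the assumption that both end up in xarray's weighted mean, which accepts skipna.

```python
import numpy as np
import xarray as xr


def weighted_average(data_var, weights, dim, skipna=None):
    """Hypothetical helper (not xCDAT's code): forward skipna to xarray."""
    return data_var.weighted(weights).mean(dim=dim, skipna=skipna)


# Made-up data and weights; weights must not contain NaN.
data_var = xr.DataArray([1.0, 2.0, np.nan, 4.0], dims="x")
weights = xr.DataArray([1.0, 2.0, 3.0, 4.0], dims="x")

# NaN is ignored, and its weight is excluded from the normalization.
avg_skip = float(weighted_average(data_var, weights, dim="x", skipna=True))

# The NaN propagates into the result.
avg_keep = float(weighted_average(data_var, weights, dim="x", skipna=False))

print(avg_skip, avg_keep)
```

Leaving skipna=None as the default would keep xarray's current behavior for existing callers.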

Describe alternatives you've considered

No response

Additional context

It would be even more helpful if users could set some criteria, for example the maximum fraction of NaN values allowed.

Let's say I have 10 values, 2 of which are NaN. I want to get an average with skipna=True. But with 3 NaN values, I want the average to be NaN.

This would help the obs4MIPs process when handling time-varying NaN values caused by missed observation points.
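
One possible way to express such a criterion, as a sketch only. The function name mean_with_nan_limit and the max_nan_fraction parameter are hypothetical and do not exist in xarray or xCDAT; the example is unweighted for simplicity.

```python
import numpy as np
import xarray as xr


def mean_with_nan_limit(da, dim, max_nan_fraction):
    """Hypothetical helper: NaN-skipping mean, masked where too many NaNs."""
    nan_fraction = da.isnull().mean(dim)
    return da.mean(dim, skipna=True).where(nan_fraction <= max_nan_fraction)


values = np.arange(1.0, 11.0)  # 10 made-up values: 1.0 ... 10.0

two_nans = values.copy()
two_nans[[0, 5]] = np.nan  # 2 of 10 missing

three_nans = two_nans.copy()
three_nans[9] = np.nan  # 3 of 10 missing

# Allow up to 25% missing values: 2 NaNs pass, 3 NaNs do not.
ok = mean_with_nan_limit(xr.DataArray(two_nans, dims="x"), "x", 0.25)
too_many = mean_with_nan_limit(xr.DataArray(three_nans, dims="x"), "x", 0.25)

print(float(ok), float(too_many))
```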

@tomvothecoder
Collaborator

Thanks for this feature suggestion Jiwoo. I agree, we should have a skipna flag to replicate what Xarray offers.

> Additional context
>
> It would be even more helpful if users could set some criteria, for example the maximum fraction of NaN values allowed.
>
> Let's say I have 10 values, 2 of which are NaN. I want to get an average with skipna=True. But with 3 NaN values, I want the average to be NaN.
>
> This would help the obs4MIPs process when handling time-varying NaN values caused by missed observation points.

This sounds similar to the weight_threshold feature mentioned in #531.

Can you provide some pseudo-code? Better yet, a prototype Python implementation would be great.

@tomvothecoder tomvothecoder added the type: enhancement New enhancement request label Dec 19, 2023
@tomvothecoder
Collaborator

I think an alternative solution to skipna is for the user to drop NaN values before calculating the average. @pochedls any thoughts on this specific enhancement?

@pochedls
Collaborator

I'm wondering if this would work. If we were dealing with time series:

ds.time = ["2010-01-01", "2010-02-01", "2010-03-01", "2010-04-01"]
ds.ts = [1, 2, np.nan, 4]

I think dropping the NaN would also drop the time point, which would create problems for a lot of applications. If I instead had a [lat, lon] matrix:

ts = [[1, 2, 3],
      [4, 5, 6],
      [7, np.nan, 9]]

I'm not sure how this would work. What would the ts matrix shape be – it would no longer be a [lat, lon] grid?

Or am I thinking about this the wrong way?
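
The concern can be checked with a small sketch built on the made-up values above. It assumes plain xarray objects and uses dropna as one way a user might drop NaN values; the variable names are illustrative.

```python
import numpy as np
import xarray as xr

time = np.array(
    ["2010-01-01", "2010-02-01", "2010-03-01", "2010-04-01"],
    dtype="datetime64[ns]",
)
ts = xr.DataArray([1.0, 2.0, np.nan, 4.0], dims="time", coords={"time": time})

# Dropping the NaN also drops the 2010-03-01 time point.
dropped = ts.dropna(dim="time")

grid = xr.DataArray(
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, np.nan, 9.0]],
    dims=("lat", "lon"),
)

# On a 2-D grid, dropna removes the whole row (or column) containing the NaN.
rows_dropped = grid.dropna(dim="lat")

# skipna leaves the grid intact and only ignores the NaN in the reduction.
grid_mean = float(grid.mean(skipna=True))

print(dropped.sizes["time"], rows_dropped.shape, grid_mean)
```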

@lee1043
Collaborator Author

lee1043 commented May 28, 2024

@tomvothecoder @pochedls sorry that I haven't fully followed this, but I'm wondering whether there is any chance to follow up on this, since Celine reached out about the same issue: she wants to compute a spatial average over data that includes NaN values.

@pochedls
Collaborator

@lee1043 – I don't think I can work on this soon. This could be an easy PR (or "dev day" issue), depending on the complexity of the implementation. There might also be workarounds using get_weights (and then computing the mean yourself).
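
A sketch of what computing the mean yourself might look like once the weights are available. The weights below are hand-written stand-ins rather than output from get_weights, and masked_weighted_mean is a hypothetical name.

```python
import numpy as np
import xarray as xr


def masked_weighted_mean(data, weights, dim):
    """Hypothetical helper: weighted mean that ignores NaN cells."""
    # Keep only the weights of cells that actually hold data.
    valid_weights = weights.where(data.notnull())
    # sum() skips NaN for float data, so missing cells contribute nothing.
    return (data * weights).sum(dim=dim) / valid_weights.sum(dim=dim)


lat = [0.0, 60.0]
data = xr.DataArray(
    [[1.0, 2.0], [3.0, np.nan]],
    dims=("lat", "lon"),
    coords={"lat": lat, "lon": [0.0, 180.0]},
)
# Stand-in weights (roughly cos(lat)); real ones could come from get_weights.
weights = xr.DataArray([1.0, 0.5], dims="lat", coords={"lat": lat})

result = float(masked_weighted_mean(data, weights, dim=["lat", "lon"]))
print(result)
```

Here the NaN cell's weight is left out of the denominator, so the result is normalized over the valid cells only.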

@tomvothecoder
Collaborator

> @tomvothecoder @pochedls sorry that I haven't fully followed this, but I'm wondering whether there is any chance to follow up on this, since Celine reached out about the same issue: she wants to compute a spatial average over data that includes NaN values.

If you or somebody else can provide pseudo-code or a prototype Python implementation, it would help speed up the implementation whenever @pochedls or I (or somebody else) has time. My dev time for new xCDAT features will be limited for the next few months because of conferences and other priorities.

Projects
Status: In Review
3 participants