Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

switch from _linregress to da.polyfit? #177

Open
semvijverberg opened this issue Nov 14, 2023 · 3 comments
Open

switch from _linregress to da.polyfit? #177

semvijverberg opened this issue Nov 14, 2023 · 3 comments

Comments

@semvijverberg
Copy link
Member

  • Firstly, We noticed that the current implementation of preprocess does not support dask arrays.
  • Secondly, I'm now noticing that preprocess._linregress returns NaNs while da.polyfit does work. That might be related to a floating point precision issue.

This is the input data before it enters preprocess._detrend():
image

This is what I tried:

slope, intercept = xr.apply_ufunc(
        preprocess._linregress,
        data["time"].astype(float),
        data.astype(float),
        input_core_dims=[["time"], ["time"]],
        output_core_dims=[[], []],
        vectorize=True,
    )

The slope & intercept is all NaN.

While:

coeffs = data.polyfit(dim="time", deg=1, skipna=True)["polyfit_coefficients"]

coeffs.sel(degree=1).plot():
image

I don't know why this goes wrong..
It was able to fit at this floating precision before.
These are some trend values for daily soil moisture in EU:
image

@semvijverberg
Copy link
Member Author

Oeh, I found a bug, some years were all NaN, this means there no timeseries at all with no NaNs and then the linregress with return all NaNs as well.

Maybe nice to add a check for this type of error.

@geek-yang
Copy link
Member

Hi Sem, not sure if I get it correctly. I just tested the old implementation of preprocess (still using scipy.stats.linregress(x, y)) with a sst dataset, where I manually turn all values to np.nan for one time step. The function still gives me meaningful values for both slope and intercept. 😅I will put my minimum example here.

@geek-yang
Copy link
Member

Here is an example:

import numpy as np
import urllib
import xarray as xr
from s2spy import preprocess
import matplotlib.pyplot as plt

# URL of the dataset from zenodo
sst_url = "https://zenodo.org/record/8186914/files/sst_daily_1959-2021_5deg_Pacific_175_240E_25_50N.nc"
sst_field = "sst_daily_1959-2021_5deg_Pacific_175_240E_25_50N.nc"
urllib.request.urlretrieve(sst_url, sst_field)

data = xr.open_dataset(sst_field)

# manually turn all values to np.nan for one time step.
data_subset['sst'].loc[dict(time="2020-12-20")] = np.ones(data_subset['sst'].isel(time=100).values.shape) * np.nan

preprocessor = preprocess.Preprocessor(
    rolling_window_size=25,
    timescale="daily",
    detrend="linear",
    subtract_climatology=True,
)

preprocessor.fit(data_subset)

preprocessor.trend["slope"].sst.plot()

The values I got are not all nan:

Slope:

image

Intercept:

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants