[MLForecast, Lags] Global model. Predicting with lags, individual time series, multiple time series #455

Closed
mkrech opened this issue Nov 24, 2024 · 2 comments

mkrech commented Nov 24, 2024

What happened + What you expected to happen

Thank you for your fantastic library.

I have a question or bug report about lags and a global model using MLForecast.

My expectation is that in a global model, the results should be identical whether predicting an individual time series or multiple time series, as long as all features are provided correctly and consistently.

However, when I use lag features, the forecast for a time series that is fitted and predicted on its own differs from the forecast for the same series when it is fitted and predicted together with other time series.

Versions / Dependencies

mlforecast 0.15.0

Reproduction script

import pandas as pd
import numpy as np
from mlforecast import MLForecast
from sklearn.ensemble import RandomForestRegressor

print("#  Create data")
def create_data():
    np.random.seed(42)
    data = []
    for ts_id in range(3):  # three time series
        for t in range(30):
            data.append({
                'unique_id': f'ts_{ts_id}',
                'ds': t,
                'y': np.sin(t / 5) + ts_id + np.random.normal(0, 0.1)
            })
    return pd.DataFrame(data)

df = create_data()

print("# single time series")
df_single = df[df['unique_id'] == 'ts_0']

print("# lag features")
lags = [1, 2, 3]

print("# mlforecast model with RandomForestRegressor")
forecast = MLForecast(
    models=RandomForestRegressor(),
    freq=1,  
    lags=lags
)

print("# Prediction single time series")
forecast.fit(df_single, id_col='unique_id', time_col='ds', target_col='y')
single_forecast = forecast.predict(5)  # forecast 5 time steps

print("# prediction multiple time series")
forecast.fit(df, id_col='unique_id', time_col='ds', target_col='y')
multi_forecast = forecast.predict(5)

print("# filter single time series from multiple time series")
multi_forecast_single = multi_forecast[multi_forecast['unique_id'] == 'ts_0']

print("# results comparison")
print("Single Time Series Forecast:")
print(single_forecast)

print("\nSame Time Series Forecast (with others):")
print(multi_forecast_single)

print("#  difference between forecasts")
difference = single_forecast.set_index('ds')['RandomForestRegressor'] - multi_forecast_single.set_index('ds')['RandomForestRegressor']
print("\nDifference between forecasts:")
print(difference)

Issue Severity

Medium: It is a significant difficulty but I can work around it.

mkrech added the bug label Nov 24, 2024

jmoralez (Member) commented

Hey. You're not "predicting" one or many series; you're training a model on different datasets, so it's expected to produce different results. If you're interested in training one model per series, you can use statsforecast.
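
To make the distinction concrete, here is a minimal sketch that uses only NumPy rather than MLForecast. It assumes a simple linear autoregressive model on three lags as a stand-in for the RandomForestRegressor in the report; the helper names (`make_series`, `lag_matrix`, `fit_global`, `forecast`) are hypothetical and exist only for this illustration. It fits one model on all series and one on `ts_0` alone, then compares forecasts for `ts_0`.

```python
import numpy as np

LAGS = 3  # number of lagged values used as features (mirrors lags=[1, 2, 3])

def make_series(n_series=3, n_steps=30, seed=42):
    """Synthetic series shaped like the ones in the reproduction script."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_steps)
    return {
        f"ts_{i}": np.sin(t / 5) + i + rng.normal(0, 0.1, n_steps)
        for i in range(n_series)
    }

def lag_matrix(y):
    """Return features [y[t-1], y[t-2], y[t-3], 1] and targets y[t]."""
    cols = [y[LAGS - k : len(y) - k] for k in range(1, LAGS + 1)]
    X = np.column_stack(cols + [np.ones(len(y) - LAGS)])
    return X, y[LAGS:]

def fit_global(series):
    """Fit one least-squares model on the pooled rows of all given series."""
    Xs, ys = zip(*(lag_matrix(y) for y in series.values()))
    coef, *_ = np.linalg.lstsq(np.vstack(Xs), np.concatenate(ys), rcond=None)
    return coef

def forecast(coef, y, h):
    """Recursive h-step forecast for one series with an already fitted model."""
    history = list(y[-LAGS:])
    preds = []
    for _ in range(h):
        # Most recent value first, followed by the intercept term.
        features = np.array(history[-1 : -LAGS - 1 : -1] + [1.0])
        pred = float(features @ coef)
        history.append(pred)
        preds.append(pred)
    return np.array(preds)

series = make_series()
coef_all = fit_global(series)                           # trained on 3 series
coef_single = fit_global({"ts_0": series["ts_0"]})      # trained on ts_0 only

# Same fitted model: forecasting ts_0 alone or alongside the other series.
alone = forecast(coef_all, series["ts_0"], 5)
together = {uid: forecast(coef_all, y, 5) for uid, y in series.items()}
same_model_matches = np.array_equal(alone, together["ts_0"])

# Different training data: a separately fitted model for ts_0.
retrained = forecast(coef_single, series["ts_0"], 5)
retrained_matches = np.allclose(alone, retrained)

print("same model, one vs. many series identical:", same_model_matches)
print("model refit on ts_0 only gives same forecast:", retrained_matches)
```

With one fitted model, the forecast for `ts_0` does not depend on how many other series are forecast at the same time. Refitting on a different dataset produces different coefficients, and therefore in general different forecasts, which is the situation in the reproduction script. In MLForecast, the analogous workflow would be to call `fit` once on the full data and then restrict the forecast to one series; as far as I know `predict` accepts an `ids` argument for this, but check the documentation for your version.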

mkrech (Author) commented Nov 25, 2024

Sorry, you are absolutely right. I can no longer reproduce the issue this way either.
Thank you very much for your support.
