[MLForecast, Lags] Global model. Predicting with lags, individual time series, multiple time series #455

mkrech · 2024-11-24T18:57:01Z

What happened + What you expected to happen

Thank you for your fantastic library.

I have a question (or possibly a bug report) about lags in a global model built with MLForecast.

My expectation is that, with a global model, the forecast for a given time series should be identical whether it is predicted on its own or together with other time series, as long as all features are provided correctly and consistently.

However, when I use lags, the forecast for a single time series differs from the forecast for that same series when it is predicted together with other time series.

Versions / Dependencies

mlforecast 0.15.0

Reproduction script

import pandas as pd
import numpy as np
from mlforecast import MLForecast
from sklearn.ensemble import RandomForestRegressor

print("#  Create data")
def create_data():
    np.random.seed(42)
    data = []
    for ts_id in range(3):  # three time series
        for t in range(30):
            data.append({
                'unique_id': f'ts_{ts_id}',
                'ds': t,
                'y': np.sin(t / 5) + ts_id + np.random.normal(0, 0.1)
            })
    return pd.DataFrame(data)

df = create_data()

print("# single time series")
df_single = df[df['unique_id'] == 'ts_0']

print("# lag features")
lags = [1, 2, 3]

print("# mlforecast model with RandomForestRegressor")
forecast = MLForecast(
    # fixed seed so repeated fits on the same data give the same model
    models=RandomForestRegressor(random_state=0),
    freq=1,
    lags=lags
)

print("# Prediction single time series")
forecast.fit(df_single, id_col='unique_id', time_col='ds', target_col='y')
single_forecast = forecast.predict(5)  # forecast 5 time steps

print("# prediction multiple time series")
forecast.fit(df, id_col='unique_id', time_col='ds', target_col='y')
multi_forecast = forecast.predict(5)

print("# filter single time series from multiple time series")
multi_forecast_single = multi_forecast[multi_forecast['unique_id'] == 'ts_0']

print("# results comparison")
print("Single Time Series Forecast:")
print(single_forecast)

print("\nSame Time Series Forecast (with others):")
print(multi_forecast_single)

print("#  difference between forecasts")
difference = single_forecast.set_index('ds')['RandomForestRegressor'] - multi_forecast_single.set_index('ds')['RandomForestRegressor']
print("\nDifference between forecasts:")
print(difference)

Issue Severity

Medium: It is a significant difficulty but I can work around it.

jmoralez · 2024-11-25T15:21:05Z

Hey. You're not "predicting" one or many series; you're training a model on different datasets, so it's expected to produce different results. If you're interested in training one model per series, you can use statsforecast.
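To make the point above concrete, here is a minimal sketch that swaps the random forest for a plain least-squares model on lags 1 to 3 (NumPy only, no mlforecast). The helper names `make_series`, `lag_matrix`, `fit` and `forecast` are made up for this illustration. Fitting on one series versus all three yields different coefficients, and therefore different forecasts for `ts_0`, while a model that has already been fitted always returns the same forecast for the same history.

```python
import numpy as np

def make_series(offset, rng, n=30):
    # Same shape as the data in the reproduction script: sine + offset + noise.
    t = np.arange(n)
    return np.sin(t / 5) + offset + rng.normal(0, 0.1, size=n)

def lag_matrix(y, lags):
    # Build [1, y[t-lag1], y[t-lag2], ...] rows and the matching targets y[t].
    max_lag = max(lags)
    cols = [y[max_lag - lag: len(y) - lag] for lag in lags]
    X = np.column_stack([np.ones(len(y) - max_lag)] + cols)
    return X, y[max_lag:]

def fit(series_list, lags):
    # "Global" fit: pool the lag rows of every series into one regression.
    Xs, ys = zip(*(lag_matrix(y, lags) for y in series_list))
    coef, *_ = np.linalg.lstsq(np.vstack(Xs), np.concatenate(ys), rcond=None)
    return coef

def forecast(coef, y, lags, h):
    # Recursive forecast: each prediction is fed back in as a future lag.
    history = list(y)
    preds = []
    for _ in range(h):
        feats = [1.0] + [history[-lag] for lag in lags]
        pred = float(np.dot(coef, feats))
        preds.append(pred)
        history.append(pred)
    return np.array(preds)

rng = np.random.default_rng(42)
series = [make_series(k, rng) for k in range(3)]
lags = [1, 2, 3]

coef_single = fit(series[:1], lags)   # trained on ts_0 only
coef_global = fit(series, lags)       # trained on ts_0, ts_1, ts_2

single_fc = forecast(coef_single, series[0], lags, 5)
global_fc = forecast(coef_global, series[0], lags, 5)
global_fc_again = forecast(coef_global, series[0], lags, 5)

print("coefficients differ:", not np.allclose(coef_single, coef_global))
print("same fitted model, same forecast:", np.array_equal(global_fc, global_fc_again))
```

The difference comes entirely from the training data, not from how many series are forecast at prediction time.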

mkrech · 2024-11-25T18:38:20Z

Sorry, you are absolutely right. I can no longer reproduce the issue this way either.
Thank you very much for your support.

mkrech added the bug label Nov 24, 2024

jmoralez added the awaiting response label Nov 25, 2024

mkrech closed this as completed Nov 25, 2024

github-actions bot removed the awaiting response label Nov 25, 2024
