
MLForecast and negative boosted tree predictions #457

Closed
koaning opened this issue Nov 25, 2024 · 5 comments
@koaning

koaning commented Nov 25, 2024

What happened + What you expected to happen

During the probabl livestream last week (YT link here, notebook here), I may have stumbled on a bug, so I figured I should report it.

The short story is that while the input dataset has no negative values, some of the predicted values are negative. For a linear model this could make sense, but for a boosted tree model it does not: tree models, after all, can only interpolate the training data. This became a talking point during this segment of the livestream.
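As an aside, the interpolation property of a single tree is easy to check with a quick sketch (my own illustration using scikit-learn's DecisionTreeRegressor, not code from the thread; as the discussion further down shows, this property does not carry over to boosting):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(seed=0)
X = rng.random((1_000, 4))
y = rng.random(1_000) * 100  # strictly non-negative target

tree = DecisionTreeRegressor(max_depth=3).fit(X, y)
preds = tree.predict(X)

# Each leaf predicts the mean of the training targets that fall into it,
# so a single tree's predictions are bounded by the training range.
assert preds.min() >= y.min()
assert preds.max() <= y.max()
```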

Possible cause

After diving a bit deeper, I may have found a good lead on the cause. My dataset has hourly data, but a few timeslots are missing: I am predicting the number of people leaving a subway station, and stations can be closed for a few hours during the day. Those rows simply do not appear in my original dataset. mlforecast did not give me any warnings about this, but when I passed the same dataset to TimeGPT I was prompted to use fill_gaps to make sure there are no missing rows.

When I apply fill_gaps to my data before passing it to MLForecast, the boosted tree model no longer produces negative predictions. This suggests to me that it might be good to throw a similar warning here. I am not completely aware of the Nixtla internals, so I might be missing an important detail, but since silent failures can be painful I figured I should at least write up this report.
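For readers unfamiliar with gap filling: the behaviour can be sketched in plain pandas for a single series (a minimal illustration of what a utility like fill_gaps does, not Nixtla's actual implementation; the ds/y column names follow the Nixtla convention):

```python
import pandas as pd

# Toy hourly series with a missing 03:00 slot (e.g. the station was closed).
df = pd.DataFrame({
    "ds": pd.to_datetime([
        "2024-01-01 00:00", "2024-01-01 01:00",
        "2024-01-01 02:00", "2024-01-01 04:00",
    ]),
    "y": [10.0, 12.0, 9.0, 11.0],
})

# Reindex onto a complete hourly grid; the missing slot becomes an explicit NaN row
# instead of silently being absent.
full_grid = pd.date_range(df["ds"].min(), df["ds"].max(), freq="h")
filled = df.set_index("ds").reindex(full_grid).rename_axis("ds").reset_index()

assert len(filled) == 5               # the 03:00 row now exists
assert filled["y"].isna().sum() == 1  # its value is an explicit NaN
```

For multiple series in long format you would apply the same reindexing per unique_id; this is essentially what the fill_gaps utility automates.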

Versions / Dependencies

mlforecast version 0.15.0

Reproduction script

I added a notebook link in the above description, as well as a YT link that shows the error. While reproduction could be useful, my current impression is that the main issue here is that a warning message is missing.

I figured I'd file this as a medium severity issue. Silent failures can make the whole stack crumble, but I have technically found a workaround.

Issue Severity

Medium: It is a significant difficulty but I can work around it.

@koaning koaning added the bug label Nov 25, 2024
@jmoralez
Member

Hey @koaning, thanks for raising this. Is there a place where I can download the data?

@koaning
Author

koaning commented Nov 26, 2024

The notebook links to this repository. It was originally found on Kaggle.

@jmoralez
Member

Thanks, sorry I missed that. I re-read the issue, and the claim that boosting cannot produce predictions outside the original target range isn't true. It holds for regular decision trees and random forests, whose predictions are averages of training targets, but boosting is an additive algorithm, so it can definitely produce values outside the original range. Here's an example:

import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.default_rng(seed=0)
X = rng.random((10_000, 4))  # features unrelated to the target
y = rng.choice([0, 1, 2], size=10_000, replace=True, p=[0.8, 0.1, 0.1])  # non-negative target
model = HistGradientBoostingRegressor().fit(X, y)
preds = model.predict(X)
assert y.min() == 0
assert preds.min() < 0  # additive boosting updates can overshoot below the training minimum

@koaning
Author

koaning commented Nov 26, 2024

d0h! @jmoralez yeah, you're right. Thanks for the example!

It might still be a good idea to warn folks about the fill_gaps utility. But I will leave it up to you to make a new issue for that or to rename this one.

@jmoralez
Member

I'll open a new issue for that. Thanks!
