migrate to coreforecast (#311)
jmoralez authored Mar 4, 2024
1 parent 02a2a85 commit eba4921
Showing 39 changed files with 2,465 additions and 2,118 deletions.
22 changes: 19 additions & 3 deletions .github/workflows/ci.yaml
@@ -108,6 +108,22 @@ jobs:
- name: Run forecast notebook
run: nbdev_test --path nbs/forecast.ipynb

efficiency-tests:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3

- uses: actions/setup-python@v4
with:
python-version: '3.10'
cache: 'pip'

- name: Install dependencies
run: pip install . pytest pytest-benchmark

- name: Run efficiency tests
run: pytest tests/test_pipeline.py --benchmark-group-by=func --benchmark-sort=fullname

performance-tests:
runs-on: ubuntu-latest
steps:
@@ -119,7 +135,7 @@ jobs:
cache: 'pip'

- name: Install dependencies
run: pip install ".[lag_transforms]" pytest pytest-benchmark
run: pip install . datasetsforecast lightgbm pytest

- name: Run performance tests
run: pytest --benchmark-group-by=func --benchmark-sort=fullname
- name: Run m4 performance tests
run: pytest tests/test_m4.py
109 changes: 42 additions & 67 deletions README.md
@@ -1,6 +1,6 @@
# mlforecast  
# mlforecast
[![Tweet](https://img.shields.io/twitter/url/http/shields.io.svg?style=social)](https://twitter.com/intent/tweet?text=Statistical%20Forecasting%20Algorithms%20by%20Nixtla%20&url=https://github.com/Nixtla/statsforecast&via=nixtlainc&hashtags=StatisticalModels,TimeSeries,Forecasting)
 [![Slack](https://img.shields.io/badge/Slack-4A154B?&logo=slack&logoColor=white.png)](https://join.slack.com/t/nixtlacommunity/shared_invite/zt-1pmhan9j5-F54XR20edHk0UtYAPcW4KQ)
[![Slack](https://img.shields.io/badge/Slack-4A154B?&logo=slack&logoColor=white.png)](https://join.slack.com/t/nixtlacommunity/shared_invite/zt-1pmhan9j5-F54XR20edHk0UtYAPcW4KQ)

<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

@@ -43,23 +43,6 @@ page](https://nixtla.github.io/mlforecast/docs/getting-started/install.html).

## Quick Start

**Minimal Example**

``` python
import lightgbm as lgb

from mlforecast import MLForecast
from sklearn.linear_model import LinearRegression

mlf = MLForecast(
models = [LinearRegression(), lgb.LGBMRegressor()],
lags=[1, 12],
freq = 'M'
)
mlf.fit(df)
mlf.predict(12)
```

**Get Started with this [quick
guide](https://nixtla.github.io/mlforecast/docs/getting-started/quick_start_local.html).**

@@ -78,17 +61,19 @@ for best practices.**

Current Python alternatives for machine learning models are slow,
inaccurate and don’t scale well. So we created a library that can be
used to forecast in production environments. `MLForecast` includes
efficient feature engineering to train any machine learning model (with
`fit` and `predict` methods such as
used to forecast in production environments.
[`MLForecast`](https://Nixtla.github.io/mlforecast/forecast.html#mlforecast)
includes efficient feature engineering to train any machine learning
model (with `fit` and `predict` methods such as
[`sklearn`](https://scikit-learn.org/stable/)) to fit millions of time
series.

## Features

- Fastest implementations of feature engineering for time series
forecasting in Python.
- Out-of-the-box compatibility with Spark, Dask, and Ray.
- Out-of-the-box compatibility with pandas, polars, spark, dask, and
ray.
- Probabilistic Forecasting with Conformal Prediction.
- Support for exogenous variables and static covariates.
- Familiar `sklearn` syntax: `.fit` and `.predict`.
@@ -162,53 +147,44 @@ series.head()

### Models

Next define your models. If you want to use the local interface this can
be any regressor that follows the scikit-learn API. For distributed
training there are `LGBMForecast` and `XGBForecast`.
Next define your models. These can be any regressor that follows the
scikit-learn API.

``` python
import lightgbm as lgb
import xgboost as xgb
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
```

``` python
models = [
lgb.LGBMRegressor(verbosity=-1),
xgb.XGBRegressor(),
RandomForestRegressor(random_state=0),
lgb.LGBMRegressor(random_state=0, verbosity=-1),
LinearRegression(),
]
```

### Forecast object

Now instantiate a `MLForecast` object with the models and the features
that you want to use. The features can be lags, transformations on the
lags and date features. The lag transformations are defined as
[numba](http://numba.pydata.org/) *jitted* functions that transform an
array, if they have additional arguments you can either supply a tuple
(`transform_func`, `arg1`, `arg2`, …) or define new functions fixing the
arguments. You can also define differences to apply to the series before
fitting that will be restored when predicting.
Now instantiate an
[`MLForecast`](https://Nixtla.github.io/mlforecast/forecast.html#mlforecast)
object with the models and the features that you want to use. The
features can be lags, transformations on the lags and date features. You
can also define transformations to apply to the target before fitting,
which will be restored when predicting.

``` python
from mlforecast import MLForecast
from mlforecast.lag_transforms import ExpandingMean, RollingMean
from mlforecast.target_transforms import Differences
from numba import njit
from window_ops.expanding import expanding_mean
from window_ops.rolling import rolling_mean


@njit
def rolling_mean_28(x):
return rolling_mean(x, window_size=28)

```

``` python
fcst = MLForecast(
models=models,
freq='D',
lags=[7, 14],
lag_transforms={
1: [expanding_mean],
7: [rolling_mean_28]
1: [ExpandingMean()],
7: [RollingMean(window_size=28)]
},
date_features=['dayofweek'],
target_transforms=[Differences([1])],
@@ -224,7 +200,7 @@ To compute the features and train the models call `fit` on your
fcst.fit(series)
```

MLForecast(models=[LGBMRegressor, XGBRegressor, RandomForestRegressor], freq=<Day>, lag_features=['lag7', 'lag14', 'expanding_mean_lag1', 'rolling_mean_28_lag7'], date_features=['dayofweek'], num_threads=1)
MLForecast(models=[LGBMRegressor, LinearRegression], freq=D, lag_features=['lag7', 'lag14', 'expanding_mean_lag1', 'rolling_mean_lag7_window_size28'], date_features=['dayofweek'], num_threads=1)

### Predicting

@@ -239,21 +215,21 @@ predictions

<div>

| | unique_id | ds | LGBMRegressor | XGBRegressor | RandomForestRegressor |
|-----|-----------|------------|---------------|--------------|-----------------------|
| 0 | id_00 | 2000-04-04 | 299.923771 | 309.664124 | 298.424164 |
| 1 | id_00 | 2000-04-05 | 365.424147 | 382.150085 | 365.816014 |
| 2 | id_00 | 2000-04-06 | 432.562441 | 453.373779 | 436.360620 |
| 3 | id_00 | 2000-04-07 | 495.628000 | 527.965149 | 503.670100 |
| 4 | id_00 | 2000-04-08 | 60.786223 | 75.762299 | 62.176080 |
| ... | ... | ... | ... | ... | ... |
| 275 | id_19 | 2000-03-23 | 36.266780 | 29.889120 | 34.799780 |
| 276 | id_19 | 2000-03-24 | 44.370984 | 34.968884 | 39.920982 |
| 277 | id_19 | 2000-03-25 | 50.746222 | 39.970238 | 46.196266 |
| 278 | id_19 | 2000-03-26 | 58.906524 | 45.125305 | 51.653060 |
| 279 | id_19 | 2000-03-27 | 63.073949 | 50.682716 | 56.845384 |

<p>280 rows × 5 columns</p>
| | unique_id | ds | LGBMRegressor | LinearRegression |
|-----|-----------|------------|---------------|------------------|
| 0 | id_00 | 2000-04-04 | 299.923771 | 311.432371 |
| 1 | id_00 | 2000-04-05 | 365.424147 | 379.466214 |
| 2 | id_00 | 2000-04-06 | 432.562441 | 460.234028 |
| 3 | id_00 | 2000-04-07 | 495.628000 | 524.278924 |
| 4 | id_00 | 2000-04-08 | 60.786223 | 79.828767 |
| ... | ... | ... | ... | ... |
| 275 | id_19 | 2000-03-23 | 36.266780 | 28.333215 |
| 276 | id_19 | 2000-03-24 | 44.370984 | 33.368228 |
| 277 | id_19 | 2000-03-25 | 50.746222 | 38.613001 |
| 278 | id_19 | 2000-03-26 | 58.906524 | 43.447398 |
| 279 | id_19 | 2000-03-27 | 63.073949 | 48.666783 |

<p>280 rows × 4 columns</p>
</div>

### Visualize results
@@ -264,7 +240,6 @@ from utilsforecast.plotting import plot_series

``` python
fig = plot_series(series, predictions, max_ids=4, plot_random=False)
fig.savefig('figs/index.png', bbox_inches='tight')
```

![](https://raw.githubusercontent.com/Nixtla/mlforecast/main/nbs/figs/index.png)
2 changes: 1 addition & 1 deletion action_files/lint
@@ -1,3 +1,3 @@
#!/usr/bin/env bash
ruff mlforecast || exit -1
ruff check mlforecast || exit -1
mypy mlforecast || exit -1
6 changes: 3 additions & 3 deletions environment.yml
@@ -2,7 +2,7 @@ name: mlforecast
channels:
- conda-forge
dependencies:
- coreforecast>=0.0.4
- coreforecast>=0.0.7
- dask<2023.1.1
- fsspec
- gitpython
@@ -21,7 +21,7 @@ dependencies:
- shap
- statsmodels
- window-ops
- xgboost
- py-xgboost-cpu
- pip:
- datasetsforecast
- duckdb<0.8
@@ -31,5 +31,5 @@ dependencies:
- polars
- ray<2.8
- triad==0.9.1
- utilsforecast>=0.0.24
- utilsforecast>=0.0.27
- xgboost_ray
8 changes: 5 additions & 3 deletions local_environment.yml
@@ -2,11 +2,13 @@ name: mlforecast
channels:
- conda-forge
dependencies:
- coreforecast>=0.0.4
- coreforecast>=0.0.7
- fsspec
- holidays<0.21
- lightgbm
- matplotlib
- nbformat
- nomkl
- numba
- pandas
- pip
@@ -16,9 +18,9 @@ dependencies:
- shap
- statsmodels
- window-ops
- xgboost
- py-xgboost-cpu
- pip:
- datasetsforecast
- nbdev
- polars
- utilsforecast>=0.0.24
- utilsforecast>=0.0.27
