Added more information to the tutorials.
MMenchero committed Feb 1, 2024
1 parent 686057e commit d34708d
Showing 4 changed files with 62 additions and 11 deletions.
Binary file modified man/figures/logo.png
25 changes: 20 additions & 5 deletions vignettes/anomaly-detection.Rmd
@@ -27,23 +27,38 @@ library(nixtlar)
```

## 1. Anomaly detection
Anomaly detection plays a crucial role in time series analysis and forecasting. Anomalies, also known as outliers, are unusual observations that don't follow the expected time series patterns. They can be caused by a variety of factors, including errors in the data collection process, unexpected events, or sudden changes in the patterns of the time series. Anomalies can provide critical information about a system, like a potential problem or malfunction. After identifying them, it is important to understand what caused them, and then decide whether to remove, replace, or keep them.

TimeGPT has a method for detecting anomalies, and users can call it from `nixtlar`. This vignette will explain how to do this. It assumes you have already set up your TimeGPT token. If you haven't done this, please read the [Get Started](https://nixtla.github.io/nixtlar/articles/anomaly-detection.html) vignette first.

## 2. Load data
For this vignette, we'll use the electricity dataset included in `nixtlar`, which contains the hourly prices of five different electricity markets.

```{r}
df <- nixtlar::electricity
head(df)
```

## 3. Detect anomalies
To detect anomalies, use `nixtlar::timegpt_anomaly_detection`. The key parameters of this method are:

- **df**: The data frame or tsibble with the time series data. It should include at least one column with the timestamps and one with the observations. The default names for these columns are `ds` and `y`; if your columns are named differently, specify their names.
- **id_col**: If the data contains multiple series, as in this case, specify the column that identifies them. If working with a single series, leave it at its default value (`NULL`).
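To make the expected shape concrete, here is a small hand-made data frame in the same long format as `nixtlar::electricity`: one row per series and timestamp. The two series `A` and `B` and all of their values are invented for illustration.

```{r}
# Toy data in long format: one row per (series, timestamp) pair.
# `unique_id` identifies the series, `ds` holds the timestamps and `y` the
# observations. All values are invented for illustration.
toy_df <- data.frame(
  unique_id = rep(c("A", "B"), each = 3),
  ds = rep(c("2024-01-01 00:00:00", "2024-01-01 01:00:00", "2024-01-01 02:00:00"), times = 2),
  y = c(10.5, 11.2, 9.8, 40.1, 42.3, 39.7)
)
toy_df
```

Because rows from several series share the same columns, `id_col` is what tells `nixtlar` which column separates them.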

```{r}
timegpt_anomalies <- nixtlar::timegpt_anomaly_detection(df, id_col = "unique_id")
head(timegpt_anomalies)
```

The `anomaly_detection` method from TimeGPT evaluates each observation and uses a prediction interval to determine whether it is an anomaly. By default, `nixtlar::timegpt_anomaly_detection` uses a 99% prediction interval. Observations that fall outside this interval are considered anomalies and have a value of 1 in the `anomaly` column (0 otherwise). To change the prediction interval, for example to 95%, use the argument `level=c(95)`. Keep in mind that only one level can be used, so when several values are given, `nixtlar::timegpt_anomaly_detection` will use the maximum.
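The interval rule can be sketched in base R. The columns `lo_99` and `hi_99` below are hypothetical placeholders for the interval bounds, not the column names `nixtlar` uses in its output, and the numbers are made up.

```{r}
# A minimal sketch of the interval rule, in base R.
# `lo_99` and `hi_99` are hypothetical columns holding the lower and upper
# bounds of a 99% prediction interval; they are NOT nixtlar's output names.
toy_intervals <- data.frame(
  y     = c(10, 12, 30, 11, 2),
  lo_99 = c(8, 9, 9, 8, 7),
  hi_99 = c(13, 14, 14, 13, 12)
)

# Flag an observation with 1 if it falls outside [lo_99, hi_99], 0 otherwise
toy_intervals$anomaly <- as.integer(
  toy_intervals$y < toy_intervals$lo_99 | toy_intervals$y > toy_intervals$hi_99
)
toy_intervals

# When several levels are requested, only the largest one is used
requested_levels <- c(90, 95)
max(requested_levels)
```

Here the third observation (above its upper bound) and the fifth (below its lower bound) are flagged, while the rest receive 0.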

## 4. Plot anomalies
`nixtlar` includes a function to plot the historical data and any output from `nixtlar::timegpt_forecast`, `nixtlar::timegpt_historic`, `nixtlar::timegpt_anomaly_detection` and `nixtlar::timegpt_cross_validation`. If you have long series, you can use `max_insample_length` to only plot the last N historical values (the forecast will always be plotted in full).

When using `nixtlar::timegpt_plot` with the output of `nixtlar::timegpt_anomaly_detection`, set `plot_anomalies=TRUE` to plot the anomalies.

```{r}
nixtlar::timegpt_plot(df, timegpt_anomalies, id_col = "unique_id", plot_anomalies = TRUE)
```

```{r, include=FALSE}
26 changes: 21 additions & 5 deletions vignettes/cross-validation.Rmd
@@ -1,5 +1,5 @@
---
title: "Cross-Validation"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Cross-Validation}
@@ -26,21 +26,37 @@ knitr::opts_chunk$set(
library(nixtlar)
```

## 1. Time series cross-validation
Cross-validation is a method for evaluating the performance of a forecasting model. Given a time series, it is carried out by defining a sliding window across the historical data and then predicting the period following it. The accuracy of the model is computed by averaging the accuracy across all the cross-validation windows. This method results in a better estimation of the model’s predictive abilities, since it considers multiple periods instead of just one, while respecting the sequential nature of the data.
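As a rough illustration of the averaging step, the sketch below computes the mean absolute error (MAE) within each window and then averages across windows. The data frame, its `cutoff` column (here, the index of the last training observation) and the choice of MAE as the accuracy metric are all assumptions made for this example.

```{r}
# Hypothetical cross-validation output: two windows of h = 2 steps each.
# The column names are an assumption, not nixtlar's documented output.
toy_cv <- data.frame(
  cutoff  = c(96, 96, 98, 98),
  y       = c(10, 12, 11, 13),
  TimeGPT = c(11, 12, 10, 15)
)

# Mean absolute error within each window (grouped by cutoff)
mae_by_window <- tapply(abs(toy_cv$y - toy_cv$TimeGPT), toy_cv$cutoff, mean)
mae_by_window

# Overall accuracy estimate: the average of the per-window errors
mean(mae_by_window)
```

The first window has an MAE of 0.5 and the second 1.5, so the overall estimate is 1.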

TimeGPT has a method for performing time series cross-validation, and users can call it from `nixtlar`. This vignette will explain how to do this. It assumes you have already set up your TimeGPT token. If you haven't done this, please read the [Get Started](https://nixtla.github.io/nixtlar/articles/anomaly-detection.html) vignette first.

## 2. Load data
For this vignette, we'll use the electricity dataset included in `nixtlar`, which contains the hourly prices of five different electricity markets.

```{r}
df <- nixtlar::electricity
head(df)
```

## 3. Perform time series cross-validation
To perform time series cross-validation using TimeGPT, use `nixtlar::timegpt_cross_validation`. The key parameters of this method are:

- **df**: The data frame or tsibble with the time series data. It should include at least one column with the timestamps and one with the observations. The default names for these columns are `ds` and `y`; if your columns are named differently, specify their names.
- **h**: The forecast horizon.
- **id_col**: If the data contains multiple series, as in this case, specify the column that identifies them. If working with a single series, leave it at its default value (`NULL`).
- **n_windows**: The number of windows to evaluate. The default value is 1.
- **step_size**: The number of time steps between consecutive cross-validation windows. The default value is `NULL`.
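The sketch below shows one way `h`, `n_windows` and `step_size` could determine where each window's training data ends. It assumes the last window ends at the final observation and that a `NULL` `step_size` behaves like `step_size = h`; both are assumptions, and `cv_cutoffs` is a hypothetical helper, not part of `nixtlar`.

```{r}
# Hypothetical helper: index of the last training observation for each
# window, oldest first. Assumes windows end at the last observation and are
# spaced step_size apart; a NULL step_size is treated as h (an assumption).
cv_cutoffs <- function(n, h, n_windows = 1, step_size = NULL) {
  if (is.null(step_size)) step_size <- h
  n - h - (rev(seq_len(n_windows)) - 1) * step_size
}

# A series of 100 observations, horizon 8, five windows
cutoffs <- cv_cutoffs(n = 100, h = 8, n_windows = 5)
cutoffs

# Training and test indices of the first (oldest) window
train_idx <- seq_len(cutoffs[1])
test_idx  <- cutoffs[1] + seq_len(8)
range(test_idx)
```

Under these assumptions the cutoffs are 60, 68, 76, 84 and 92, so the first window trains on observations 1 to 60 and forecasts 61 to 68, while the last window forecasts the final 8 observations.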

```{r}
timegpt_cv <- timegpt_cross_validation(df, h = 8, id_col = "unique_id", n_windows = 5)
head(timegpt_cv)
```

## 4. Plot cross-validation results
`nixtlar` includes a function to plot the historical data and any output from `nixtlar::timegpt_forecast`, `nixtlar::timegpt_historic`, `nixtlar::timegpt_anomaly_detection` and `nixtlar::timegpt_cross_validation`. If you have long series, you can use `max_insample_length` to only plot the last N historical values (the forecast will always be plotted in full).
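With several series, "the last N historical values" means the last N rows of each series. The hypothetical helper below illustrates that idea in base R; it is not how `nixtlar` implements `max_insample_length`, and it assumes the rows of each series are already sorted by time.

```{r}
# Hypothetical helper: keep the last n rows of every series.
# Assumes rows within each series are sorted by time.
last_n_per_series <- function(data, n, id_col = "unique_id") {
  pieces <- lapply(split(data, data[[id_col]]), function(d) utils::tail(d, n))
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# Two toy series with five observations each
toy_hist <- data.frame(
  unique_id = rep(c("A", "B"), each = 5),
  ds = rep(1:5, times = 2),
  y = c(1, 2, 3, 4, 5, 10, 20, 30, 40, 50)
)
last_n_per_series(toy_hist, n = 2)
```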

When using `nixtlar::timegpt_plot` with the output of `nixtlar::timegpt_cross_validation`, each cross-validation window is visually represented with vertical dashed lines. For any given pair of these lines, the data before the first line forms the training set. This set is then used to forecast the data between the two lines.

```{r}
timegpt_plot(df, timegpt_cv, id_col = "unique_id", max_insample_length = 200)
22 changes: 21 additions & 1 deletion vignettes/historical-forecast.Rmd
@@ -27,18 +27,38 @@ library(nixtlar)
```

## 1. TimeGPT Historical Forecast
When generating a forecast, sometimes you might be interested in forecasting the historical observations. These predictions, known as **fitted values**, can help you better understand and evaluate a model's performance over time.

TimeGPT has a method for generating fitted values, and users can call it from `nixtlar`. This vignette will explain how to do this. It assumes you have already set up your TimeGPT token. If you haven't done this, please read the [Get Started](https://nixtla.github.io/nixtlar/articles/anomaly-detection.html) vignette first.

## 2. Load data
For this vignette, we'll use the electricity dataset included in `nixtlar`, which contains the hourly prices of five different electricity markets.

```{r}
df <- nixtlar::electricity
head(df)
```

## 3. Forecast historical data
To generate a forecast for the historical data, use `nixtlar::timegpt_historic`. The key parameters of this method are:

- **df**: The data frame or tsibble with the time series data. It should include at least one column with the timestamps and one with the observations. The default names for these columns are `ds` and `y`; if your columns are named differently, specify their names.
- **id_col**: If the data contains multiple series, as in this case, specify the column that identifies them. If working with a single series, leave it at its default value (`NULL`).
- **level**: The confidence levels of the prediction intervals, in percent. The defaults are 80% and 95%.

```{r}
timegpt_fitted_values <- timegpt_historic(df, id_col = "unique_id", level = c(80, 95))
head(timegpt_fitted_values)
```

Notice that there are no fitted values for some of the initial observations. This is because TimeGPT requires a minimum number of values to generate reliable forecasts.

All the fitted values are generated using a rolling window, meaning that the fitted value for observation $T$ was generated using the first $T-1$ observations.
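The rolling scheme can be sketched with a placeholder model. Below, the mean of all previous observations stands in for TimeGPT, and `min_obs` is a hypothetical minimum history length that mimics why the first fitted values are missing; neither reflects how TimeGPT actually forecasts.

```{r}
# Sketch of rolling fitted values with a placeholder model (the mean of the
# previous observations), NOT TimeGPT. `min_obs` is a hypothetical minimum
# number of past observations required before a fitted value is produced.
rolling_fitted <- function(y, min_obs = 3) {
  fitted <- rep(NA_real_, length(y))
  for (t in seq_along(y)) {
    if (t - 1 >= min_obs) {
      # The fitted value for observation t uses only observations 1, ..., t - 1
      fitted[t] <- mean(y[seq_len(t - 1)])
    }
  }
  fitted
}

rolling_fitted(c(2, 4, 6, 8, 10, 12))
```

With `min_obs = 3`, the first three fitted values are `NA`, and the fitted value for the fourth observation is the mean of the first three (4), giving `NA NA NA 4 5 6`.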

### 3.1 Fitted values from `nixtlar::timegpt_forecast`
`nixtlar::timegpt_historic` is the dedicated function that calls TimeGPT's method for generating fitted values. However, you can also use `nixtlar::timegpt_forecast` with `add_history=TRUE`. This will return the fitted values for the historical data together with a forecast for the next $h$ observations.

## 4. Plot historical forecast
`nixtlar` includes a function to plot the historical data and any output from `nixtlar::timegpt_forecast`, `nixtlar::timegpt_historic`, `nixtlar::timegpt_anomaly_detection` and `nixtlar::timegpt_cross_validation`. If you have long series, you can use `max_insample_length` to only plot the last N historical values (the forecast will always be plotted in full).

```{r}
