Added more information to the tutorials.

Nixtla · Feb 1, 2024 · d34708d
1 parent 686057e

Showing 4 changed files with 62 additions and 11 deletions.
diff --git a/man/figures/logo.png b/man/figures/logo.png
diff --git a/vignettes/anomaly-detection.Rmd b/vignettes/anomaly-detection.Rmd
@@ -27,23 +27,38 @@ library(nixtlar)
 ```
 
 ## 1. Anomaly detection
-text 
+Anomaly detection plays a crucial role in time series analysis and forecasting. Anomalies, also known as outliers, are unusual observations that don't follow the expected time series patterns. They can be caused by a variety of factors, including errors in the data collection process, unexpected events, or sudden changes in the patterns of the time series. Anomalies can provide critical information about a system, like a potential problem or malfunction. After identifying them, it is important to understand what caused them, and then decide whether to remove, replace, or keep them.
+
+TimeGPT has a method for detecting anomalies, and users can call it from `nixtlar`. This vignette explains how to do this. It assumes you have already set up your TimeGPT token. If you haven't, please read the [Get Started](https://nixtla.github.io/nixtlar/articles/get-started.html) vignette first.
+
+## 2. Load data 
+For this vignette, we'll use the electricity dataset that is included in `nixtlar`, which contains hourly prices from five different electricity markets.
 
 ```{r}
 df <- nixtlar::electricity
 head(df)
 ```
 
+## 3. Detect anomalies 
+To detect anomalies, use `nixtlar::timegpt_anomaly_detection`. The key parameters of this method are: 
+
+- **df**: The data frame or tsibble with the time series data. It should include at least a column with the timestamps and a column with the observations. The default names for these columns are `ds` and `y`; if your columns are named differently, please specify their names.
+- **id_col**: If the data contains multiple series, as in this case, specify the column that identifies them. If working with a single series, keep the default (`NULL`).
+
 ```{r}
-timegpt_anomalies <- timegpt_anomaly_detection(df, id_col = "unique_id") 
+timegpt_anomalies <- nixtlar::timegpt_anomaly_detection(df, id_col = "unique_id") 
 head(timegpt_anomalies)
 ```
 
-## 2. Plot TimeGPT forecast 
-`nixtlar` includes a function to plot the historical data and any output from `timegpt_forecast`, `timegpt_historic`, `timegpt_anomaly_detection` and `timegpt_cross_validation`. If you have long series, you can use `max_insample_length` to only plot the last N historical values (the forecast will always be plotted in full). 
+The `anomaly_detection` method from TimeGPT evaluates each observation and uses a prediction interval to determine whether it is an anomaly. By default, `nixtlar::timegpt_anomaly_detection` uses a 99% prediction interval. Observations that fall outside this interval are considered anomalies and have a value of 1 in the `anomaly` column (0 otherwise). To change the prediction interval, for example to 95%, use the argument `level=c(95)`. Only a single level is used: if several values are given, `nixtlar::timegpt_anomaly_detection` uses the largest one.
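The interval rule described above can be sketched in base R. This is a minimal illustration, not TimeGPT's implementation: it assumes you already have a lower and an upper bound for each observation (the hypothetical columns `lo` and `hi`, filled with made-up values) and flags every observation that falls outside them. It also mimics the rule that only the largest of several levels is used.

```r
# Minimal sketch of interval-based anomaly flagging (not TimeGPT's internals).
# `lo` and `hi` are hypothetical prediction-interval bounds with made-up values.
obs <- data.frame(
  ds = 1:6,
  y  = c(10, 12, 30, 11, 2, 13),
  lo = c(8, 9, 9, 9, 8, 9),
  hi = c(14, 15, 15, 15, 14, 15)
)

# Observations outside [lo, hi] are anomalies (1); all others are not (0)
obs$anomaly <- as.integer(obs$y < obs$lo | obs$y > obs$hi)

# When several levels are given, only the largest one is kept
level <- c(90, 99)
effective_level <- max(level)

print(obs)
print(effective_level)
```

Here the third and fifth observations are flagged, because 30 lies above its upper bound and 2 lies below its lower bound.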
+
+## 4. Plot anomalies 
+`nixtlar` includes a function to plot the historical data and any output from `nixtlar::timegpt_forecast`, `nixtlar::timegpt_historic`, `nixtlar::timegpt_anomaly_detection` and `nixtlar::timegpt_cross_validation`. If you have long series, you can use `max_insample_length` to only plot the last N historical values (the forecast will always be plotted in full). 
+
+When using `nixtlar::timegpt_plot` with the output of `nixtlar::timegpt_anomaly_detection`, set `plot_anomalies=TRUE` to plot the anomalies. 
 
 ```{r}
-timegpt_plot(df, timegpt_anomalies, id_col = "unique_id", plot_anomalies = TRUE)
+nixtlar::timegpt_plot(df, timegpt_anomalies, id_col = "unique_id", plot_anomalies = TRUE)
 ```
 
 ```{r, include=FALSE}

diff --git a/vignettes/cross-validation.Rmd b/vignettes/cross-validation.Rmd
@@ -1,5 +1,5 @@
 ---
-title: "Cross Validation"
+title: "Cross-Validation"
 output: rmarkdown::html_vignette
 vignette: >
   %\VignetteIndexEntry{Cross Validation}
@@ -26,21 +26,37 @@ knitr::opts_chunk$set(
 library(nixtlar)
 ```
 
-## 1. Cross Validation 
-text 
+## 1. Time series cross-validation 
+Cross-validation is a method for evaluating the performance of a forecasting model. Given a time series, it is carried out by defining a sliding window across the historical data and then predicting the period following it. The accuracy of the model is computed by averaging the accuracy across all the cross-validation windows. This method results in a better estimation of the model’s predictive abilities, since it considers multiple periods instead of just one, while respecting the sequential nature of the data.
+
+TimeGPT has a method for performing time series cross-validation, and users can call it from `nixtlar`. This vignette explains how to do this. It assumes you have already set up your TimeGPT token. If you haven't, please read the [Get Started](https://nixtla.github.io/nixtlar/articles/get-started.html) vignette first.
+
+## 2. Load data 
+For this vignette, we'll use the electricity dataset that is included in `nixtlar`, which contains hourly prices from five different electricity markets.
 
 ```{r}
 df <- nixtlar::electricity
 head(df)
 ```
 
+## 3. Perform time series cross-validation
+To perform time series cross-validation using TimeGPT, use `nixtlar::timegpt_cross_validation`. The key parameters of this method are: 
+
+- **df**: The data frame or tsibble with the time series data. It should include at least a column with the timestamps and a column with the observations. The default names for these columns are `ds` and `y`; if your columns are named differently, please specify their names.
+- **h**: The forecast horizon, i.e. the number of steps to forecast in each window.
+- **id_col**: If the data contains multiple series, as in this case, specify the column that identifies them. If working with a single series, keep the default (`NULL`).
+- **n_windows**: The number of windows to evaluate. The default is 1.
+- **step_size**: The gap between consecutive cross-validation windows. The default is `NULL`.
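To make the roles of `h`, `n_windows`, and `step_size` more concrete, the sketch below computes the training and test ranges of each window for a single series of length `n`. It is a general illustration of the technique under stated assumptions, not a description of TimeGPT's internals: it assumes the last window ends at the final observation, that windows are `step_size` observations apart, that a `NULL` step size falls back to `h`, and that each training set contains all observations before its cutoff. The helper `cv_windows` is hypothetical.

```r
# Hypothetical helper: training and test index ranges for each
# cross-validation window of a single series with n observations.
cv_windows <- function(n, h, n_windows = 1, step_size = NULL) {
  # Assumption: a NULL step size means non-overlapping windows (step = h)
  if (is.null(step_size)) step_size <- h
  # Last training observation of each window, oldest window first;
  # the final window ends exactly at observation n
  cutoffs <- n - h - (rev(seq_len(n_windows)) - 1) * step_size
  data.frame(
    window     = seq_len(n_windows),
    train_end  = cutoffs,      # training set: observations 1..train_end
    test_start = cutoffs + 1,  # forecast the next h observations
    test_end   = cutoffs + h
  )
}

windows <- cv_windows(n = 100, h = 8, n_windows = 5)
print(windows)
```

With `n = 100`, `h = 8`, and five windows, the cutoffs fall at observations 60, 68, 76, 84, and 92, so the last window forecasts observations 93 to 100.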
+
 ```{r}
 timegpt_cv <- timegpt_cross_validation(df, h = 8, id_col = "unique_id", n_windows = 5)
 head(timegpt_cv)
 ```
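The introduction explains that the model's accuracy is obtained by averaging the accuracy across all cross-validation windows. A minimal base-R sketch of that computation follows, using the mean absolute error. It assumes the output has one row per forecast, with a `cutoff` column identifying the window, the observed value in `y`, and the forecast in a column named `TimeGPT`; check `names(timegpt_cv)` before relying on these names. The toy data frame stands in for real output.

```r
# Toy stand-in for cross-validation output: two windows (cutoffs), h = 2.
# The column names `cutoff`, `y`, and `TimeGPT` are assumptions.
cv <- data.frame(
  cutoff  = c("2023-01-01 10:00", "2023-01-01 10:00",
              "2023-01-01 12:00", "2023-01-01 12:00"),
  y       = c(50, 52, 55, 53),
  TimeGPT = c(49, 54, 55, 51)
)

# Mean absolute error within each window
cv$abs_err <- abs(cv$y - cv$TimeGPT)
mae_by_window <- aggregate(abs_err ~ cutoff, data = cv, FUN = mean)

# Overall accuracy: the average of the per-window errors
overall_mae <- mean(mae_by_window$abs_err)

print(mae_by_window)
print(overall_mae)
```

With several series, the same grouping also averages over all ids within each window; grouping by both `unique_id` and `cutoff` would give per-series results instead.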
 
-## 2. Plot TimeGPT forecast 
-`nixtlar` includes a function to plot the historical data and any output from `timegpt_forecast`, `timegpt_historic`, `timegpt_anomaly_detection` and `timegpt_cross_validation`. If you have long series, you can use `max_insample_length` to only plot the last N historical values (the forecast will always be plotted in full). 
+## 4. Plot cross-validation results 
+`nixtlar` includes a function to plot the historical data and any output from `nixtlar::timegpt_forecast`, `nixtlar::timegpt_historic`, `nixtlar::timegpt_anomaly_detection` and `nixtlar::timegpt_cross_validation`. If you have long series, you can use `max_insample_length` to only plot the last N historical values (the forecast will always be plotted in full). 
+
+When using `nixtlar::timegpt_plot` with the output of `nixtlar::timegpt_cross_validation`, each cross-validation window is visually represented with vertical dashed lines. For any given pair of these lines, the data before the first line forms the training set. This set is then used to forecast the data between the two lines. 
 
 ```{r}
 timegpt_plot(df, timegpt_cv, id_col = "unique_id", max_insample_length = 200)

diff --git a/vignettes/historical-forecast.Rmd b/vignettes/historical-forecast.Rmd
@@ -27,18 +27,38 @@ library(nixtlar)
 ```
 
 ## 1. TimeGPT Historical Forecast
+Sometimes you might be interested in forecasting the historical observations themselves. These predictions, known as **fitted values**, can help you understand and evaluate a model's performance over time.
+
+TimeGPT has a method for generating fitted values, and users can call it from `nixtlar`. This vignette explains how to do this. It assumes you have already set up your TimeGPT token. If you haven't, please read the [Get Started](https://nixtla.github.io/nixtlar/articles/get-started.html) vignette first.
+
+## 2. Load data 
+For this vignette, we'll use the electricity dataset that is included in `nixtlar`, which contains hourly prices from five different electricity markets.
 
 ```{r}
 df <- nixtlar::electricity
 head(df)
 ```
 
+## 3. Forecast historical data 
+To generate a forecast for the historical data, use `nixtlar::timegpt_historic`. The key parameters of this method are: 
+
+- **df**: The data frame or tsibble with the time series data. It should include at least a column with the timestamps and a column with the observations. The default names for these columns are `ds` and `y`; if your columns are named differently, please specify their names.
+- **id_col**: If the data contains multiple series, as in this case, specify the column that identifies them. If working with a single series, keep the default (`NULL`).
+- **level**: The confidence levels of the prediction intervals for the fitted values. The defaults are 80% and 95%.
+
 ```{r}
 timegpt_fitted_values <- timegpt_historic(df, id_col = "unique_id", level = c(80,95))
 head(timegpt_fitted_values)
 ```
 
-## 2. Plot TimeGPT forecast 
+Notice that there are no fitted values for some of the initial observations. This is because TimeGPT requires a minimum number of values to generate reliable forecasts. 
+
+All the fitted values are generated using a rolling window, meaning that the fitted value for observation $T$ was generated using the first $T-1$ observations. 
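The rolling-window scheme, together with the missing values at the start of the series, can be illustrated with a stand-in model. The sketch below uses the mean of all previous observations as the "model" and a hypothetical minimum history length `min_obs`; TimeGPT uses its own model and its own minimum, so only the indexing carries over: the fitted value for observation $T$ depends only on observations $1, \dots, T-1$.

```r
# Illustrative only: one-step-ahead fitted values from a rolling window,
# using the mean of past observations as a stand-in for TimeGPT.
rolling_fitted <- function(y, min_obs = 3) {
  fitted <- rep(NA_real_, length(y))
  for (t in seq_along(y)) {
    # Fitted value for observation t uses only observations 1..(t - 1);
    # with too little history, the value stays NA
    if (t - 1 >= min_obs) {
      fitted[t] <- mean(y[seq_len(t - 1)])
    }
  }
  fitted
}

y <- c(4, 6, 8, 10, 12)
print(rolling_fitted(y))
```

With `min_obs = 3`, the first three fitted values are `NA`, the fourth is the mean of 4, 6, and 8 (that is, 6), and the fifth is the mean of the first four observations (7).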
+
+### 3.1 Fitted values from `nixtlar::timegpt_forecast`
+`nixtlar::timegpt_historic` is the dedicated function that calls TimeGPT's method for generating fitted values. However, you can also use `nixtlar::timegpt_forecast` with `add_history=TRUE`. This returns the fitted values for the historical data together with a forecast for the next $h$ observations.
+
+## 4. Plot historical forecast 
 `nixtlar` includes a function to plot the historical data and any output from `timegpt_forecast`, `timegpt_historic`, `timegpt_anomaly_detection` and `timegpt_cross_validation`. If you have long series, you can use `max_insample_length` to only plot the last N historical values (the forecast will always be plotted in full). 
 
 ```{r}