content changes
Yibei990826 committed Nov 24, 2024
1 parent be9280b commit f246d93
Showing 1 changed file with 53 additions and 69 deletions.
122 changes: 53 additions & 69 deletions nbs/docs/getting-started/7_why_timegpt.ipynb
@@ -67,7 +67,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"In this notebook, we compare the performance of TimeGPT against three forecasting models: the classical model (ARIMA), the machine learning model (LGBRegressor), and the deep learning model (N-HiTS), using a subset of data from the M5 Forecasting competition. We want to highlight three top-rated benefits our users love about TimeGPT:\n",
"In this notebook, we compare the performance of TimeGPT against three forecasting models: the classical model (ARIMA), the machine learning model (LightGBM), and the deep learning model (N-HiTS), using a subset of data from the M5 Forecasting competition. We want to highlight three top-rated benefits our users love about TimeGPT:\n",
"\n",
"🎯 **Accuracy**: TimeGPT consistently outperforms traditional models by capturing complex patterns with precision.\n",
"\n",
@@ -346,7 +346,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Model Fitting (TimeGPT, ARIMA, LGBRegressor, N-HiTS)"
"## 2. Model Fitting (TimeGPT, ARIMA, LightGBM, N-HiTS)"
]
},
{
@@ -393,8 +393,8 @@
"data": {
"text/plain": [
"metric\n",
"rmse 592.586609\n",
"smape 0.049402\n",
"rmse 592.609313\n",
"smape 0.049404\n",
"Name: TimeGPT, dtype: float64"
]
},
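> **Editor's note (not part of the commit):** the TimeGPT forecasting cell itself is collapsed in this diff; only its evaluation output is visible above. For orientation, a minimal call of the kind evaluated here might look like the sketch below. It assumes the `nixtla` client package, a valid API key, and the `df_train` frame and 28-day horizon used elsewhere in this notebook.
>
> ```python
> from nixtla import NixtlaClient
>
> # Hypothetical sketch: forecast the next 28 days for every series in df_train.
> nixtla_client = NixtlaClient(api_key='YOUR_API_KEY')  # assumed credential
> fcst_timegpt = nixtla_client.forecast(
>     df=df_train,  # long-format frame with unique_id, ds, y columns
>     h=28,         # forecast horizon in days, matching the notebook's setup
>     freq='D',     # daily frequency
> )
> ```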
@@ -416,7 +416,7 @@
"metadata": {},
"source": [
"### 2.2 Classical Models (ARIMA):\n",
"Secondly, we applied ARIMA, a classical statistical model, to the same forecasting task. Here, ARIMA struggled to capture the data's intricate, non-linear patterns, resulting in comparatively lower accuracy."
"Next, we applied ARIMA, a traditional statistical model, to the same forecasting task. Classical models use historical trends and seasonality to make predictions by relying on linear assumptions. However, they struggled to capture the complex, non-linear patterns within the data, leading to lower accuracy compared to other approaches. Additionally, ARIMA was slower due to its iterative parameter estimation process, which becomes computationally intensive for larger datasets."
]
},
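> **Editor's note (not part of the commit):** the ARIMA code cell is collapsed in this diff. For reference, a typical per-series setup with statsforecast's AutoARIMA looks roughly like the sketch below; the weekly season length and the `df_train` / 28-day horizon are assumptions carried over from the rest of the notebook.
>
> ```python
> from statsforecast import StatsForecast
> from statsforecast.models import AutoARIMA
>
> # Hypothetical sketch: fit one AutoARIMA per series and forecast 28 days ahead.
> sf = StatsForecast(
>     models=[AutoARIMA(season_length=7)],  # weekly seasonality assumed for daily M5 data
>     freq='D',
> )
> fcst_arima = sf.forecast(df=df_train, h=28)
> ```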
{
@@ -496,9 +496,10 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.3 Machine Learning Models (LGBMRegressor)\n",
"### 2.3 Machine Learning Models (LightGBM)\n",
"\n",
"Thirdly, we used machine learning model (LGBRegressor) for the same task. While LGBRegressor can capture seasonality and patterns, it requires detailed feature engineering, careful tuning, and domain knowledge to optimize performance."
"Thirdly, we used a machine learning model, LightGBM, for the same forecasting task, implemented through the automated pipeline provided by our mlforecast library.\n",
"While LightGBM can capture seasonality and patterns, achieving the best performance often requires detailed feature engineering, careful hyperparameter tuning, and domain knowledge. You can try our mlforecast library to simplify this process and get started quickly!"
]
},
{
@@ -521,37 +522,27 @@
"outputs": [],
"source": [
"import optuna\n",
"from mlforecast.auto import AutoMLForecast, AutoLightGBM"
"from mlforecast.auto import AutoMLForecast, AutoLightGBM\n",
"\n",
"# Suppress Optuna's logging output\n",
"optuna.logging.set_verbosity(optuna.logging.ERROR)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"metric\n",
"rmse 687.773744\n",
"smape 0.051448\n",
"Name: AutoLightGBM, dtype: float64"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"optuna.logging.set_verbosity(optuna.logging.ERROR)\n",
"\n",
"# Initialize an automated forecasting pipeline using AutoMLForecast.\n",
"mlf = AutoMLForecast(\n",
" models=[AutoLightGBM()],\n",
" freq='D',\n",
" season_length=7,\n",
" season_length=7, \n",
" fit_config=lambda trial: {'static_features': ['unique_id']}\n",
")\n",
"\n",
"# Fit the model to the training dataset.\n",
"mlf.fit(\n",
" df=df_train.astype({'unique_id': 'category'}),\n",
" n_windows=1,\n",
@@ -650,7 +641,7 @@
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "8605f576022d436fa3fe0205ddb28c62",
"model_id": "0c26ef6fd57a4ea5abd154adb2f31030",
"version_major": 2,
"version_minor": 0
},
@@ -664,7 +655,7 @@
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "0d5c9a7a9d434d009d663962006251c4",
"model_id": "f35365801c1448c592757ae376217f50",
"version_major": 2,
"version_minor": 0
},
@@ -678,7 +669,7 @@
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "5d7a2e04d6c64bbea8c4d24e76aa1315",
"model_id": "bab6fb21d0b642a98c42dcc8310a5a42",
"version_major": 2,
"version_minor": 0
},
@@ -702,7 +693,7 @@
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "594b4e4727054e569fdee9d0cca86acd",
"model_id": "42edfccc1470410da181e96bcef803c6",
"version_major": 2,
"version_minor": 0
},
@@ -723,24 +714,17 @@
}
],
"source": [
"#| echo: true\n",
"#| eval: false\n",
"# Initialize the N-HiTS model.\n",
"models = [NHITS(h=28, \n",
" input_size=28, \n",
" max_steps=100)]\n",
"\n",
"# Fit the model using training data\n",
"nf = NeuralForecast(models=models, freq='D')\n",
"nf.fit(df=df_train)\n",
"fcst_nhits = nf.predict()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since this machine doesn’t have GPU, the result is trained using Google Colabs."
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -784,14 +768,14 @@
"\n",
"| **Model** | **RMSE** | **SMAPE** |\n",
"|------------------|----------|-----------|\n",
"| ARIMA | 1167.5 | 8.30% |\n",
"| LGBRegressor | 816.7 | 8.06% |\n",
"| N-HiTS | 748.6 | 6.06% |\n",
"| **TimeGPT** | **370.9**| **3.98%** |\n",
"\n",
"| ARIMA | 724.9 | 5.50% |\n",
"| LightGBM | 687.8 | 5.14% |\n",
"| N-HiTS | 605.0 | 5.34% |\n",
"| **TimeGPT** | **592.6**| **4.94%** |\n",
" \n",
"\n",
"#### Breakdown for Each Time-series\n",
"Followed below are the metrics for each individual time series groups. Our analysis shows that TimeGPT consistently outperforms the other models, achieving the best results for all but one group."
"Followed below are the metrics for each individual time series groups. TimeGPT consistently delivers accurate forecasts across all time series groups. In many cases, it performs as well as or better than data-specific models, showing its versatility and reliability across different datasets."
]
},
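> **Editor's note (not part of the commit):** the evaluation code that produces these tables is not shown in this hunk. A hedged sketch of how such per-series metrics can be computed with utilsforecast follows; the merged `evaluation_input` frame and its model column names are assumptions, not code from the commit.
>
> ```python
> from utilsforecast.evaluation import evaluate
> from utilsforecast.losses import rmse, smape
>
> # Hypothetical sketch: actuals joined with each model's forecasts, one column per model.
> # evaluation_input columns: unique_id, ds, y, TimeGPT, AutoARIMA, AutoLightGBM, NHITS
> evaluation_df = evaluate(
>     evaluation_input,
>     metrics=[rmse, smape],
> )
>
> # Aggregate across series to reproduce the overall comparison table.
> overall = evaluation_df.groupby('metric').mean(numeric_only=True)
> ```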
{
@@ -824,35 +808,35 @@
],
"source": [
"# | echo: false\n",
"# colors = [\n",
"# (\"#A9B9C3\", 0.5), # Grey-bluish color 1\n",
"# (\"#7A8D9D\", 0.5), # Grey-bluish color 2\n",
"# (\"#5B6D79\", 0.5), # Grey-bluish color 3\n",
"# ('#F95D6A', 0.75) # Green color for the last\n",
"# ]\n",
"colors = [\n",
" (\"#A9B9C3\", 0.5), # Grey-bluish color 1\n",
" (\"#7A8D9D\", 0.5), # Grey-bluish color 2\n",
" (\"#5B6D79\", 0.5), # Grey-bluish color 3\n",
" ('#F95D6A', 0.75) # Green color for the last\n",
"]\n",
"\n",
"\n",
"# # Filter evaluation data by metric and set unique_id as index\n",
"# rmse_df = evaluation_df[evaluation_df['metric'] == 'rmse'].set_index('unique_id')\n",
"# smape_df = evaluation_df[evaluation_df['metric'] == 'smape'].set_index('unique_id')\n",
"# Filter evaluation data by metric and set unique_id as index\n",
"rmse_df = evaluation_df[evaluation_df['metric'] == 'rmse'].set_index('unique_id')\n",
"smape_df = evaluation_df[evaluation_df['metric'] == 'smape'].set_index('unique_id')\n",
"\n",
"# # Plot function with custom colors and opacity\n",
"# def plot_metric(ax, df, title, ylabel):\n",
"# x = np.arange(len(df))\n",
"# bar_width = 0.2\n",
"# for i, (col, (color, alpha)) in enumerate(zip(df.columns[1:], colors)):\n",
"# ax.bar(x + i * bar_width, df[col], width=bar_width, label=col, color=color, alpha=alpha)\n",
"# ax.set(title=title, ylabel=ylabel, xticks=x + bar_width * (len(df.columns[1:]) - 1) / 2, xticklabels=df.index)\n",
"# ax.tick_params(axis='x', rotation=45)\n",
"# ax.legend()\n",
"# Plot function with custom colors and opacity\n",
"def plot_metric(ax, df, title, ylabel):\n",
" x = np.arange(len(df))\n",
" bar_width = 0.2\n",
" for i, (col, (color, alpha)) in enumerate(zip(df.columns[1:], colors)):\n",
" ax.bar(x + i * bar_width, df[col], width=bar_width, label=col, color=color, alpha=alpha)\n",
" ax.set(title=title, ylabel=ylabel, xticks=x + bar_width * (len(df.columns[1:]) - 1) / 2, xticklabels=df.index)\n",
" ax.tick_params(axis='x', rotation=45)\n",
" ax.legend()\n",
"\n",
"# # Generate side-by-side plots for RMSE and SMAPE\n",
"# fig, axes = plt.subplots(1, 2, figsize=(14, 6))\n",
"# plot_metric(axes[0], rmse_df, \"RMSE Comparison Across Models by Category\", \"RMSE\")\n",
"# plot_metric(axes[1], smape_df*100, \"%SMAPE Comparison Across Models by Category\", \"SMAPE\")\n",
"# Generate side-by-side plots for RMSE and SMAPE\n",
"fig, axes = plt.subplots(1, 2, figsize=(14, 6))\n",
"plot_metric(axes[0], rmse_df, \"RMSE Comparison Across Models by Category\", \"RMSE\")\n",
"plot_metric(axes[1], smape_df*100, \"%SMAPE Comparison Across Models by Category\", \"SMAPE\")\n",
"\n",
"# plt.tight_layout()\n",
"# plt.show()"
"plt.tight_layout()\n",
"plt.show()"
]
},
{
