Merge pull request #147 from deepset-ai/update-eval-with-haystack

Fix the cookbook video link and titles
deepset-ai · Oct 30, 2024 · 5441a5c
2 parents 0c19a83 + e275cd8
commit 5441a5c
Showing 1 changed file with 18 additions and 44 deletions.
diff --git a/notebooks/evaluating_ai_with_haystack.ipynb b/notebooks/evaluating_ai_with_haystack.ipynb
@@ -22,40 +22,7 @@
         "\n",
         "## 📺 Watch Along\n",
         "\n",
-        "<iframe width=\"560\" height=\"315\" src=\"https://www.youtube.com/live/Dy-n_yC3Cto\" title=\"Evaluating AI with Haystack\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen></iframe>"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {
-        "colab_type": "toc",
-        "id": "WI3_y1HNGiqQ"
-      },
-      "source": [
-        ">[Evaluating AI with Haystack](#scrollTo=uriHEO8pkgSo)\n",
-        "\n",
-        ">[Building your pipeline](#scrollTo=C_WUXQzEQWv8)\n",
-        "\n",
-        ">>[ARAGOG](#scrollTo=Dms5Ict6NGXq)\n",
-        "\n",
-        ">[Human Evaluation](#scrollTo=zTbmQzeXQY1F)\n",
-        "\n",
-        ">[Deciding on Metrics](#scrollTo=-U-QnCBqQcd6)\n",
-        "\n",
-        ">[Building an Evaluation Pipeline](#scrollTo=yLkAcM_5Qfat)\n",
-        "\n",
-        ">[Running Evaluation](#scrollTo=p76stWMQQmPD)\n",
-        "\n",
-        ">>>[Run the RAG Pipeline](#scrollTo=rUfQQzusXhgk)\n",
-        "\n",
-        ">>>[Run the Evaluation](#scrollTo=mfepD9HwXk4Q)\n",
-        "\n",
-        ">[Analyzing Results](#scrollTo=mC_mIqdMQqZG)\n",
-        "\n",
-        ">>[Evaluation Harness (Step 4, 5, and 6)](#scrollTo=OmkHqAsQZhFr)\n",
-        "\n",
-        ">[Evaluation Frameworks](#scrollTo=gKfrFf1CebJJ)\n",
-        "\n"
+        "<iframe width=\"560\" height=\"315\" src=\"https://www.youtube.com/embed/Dy-n_yC3Cto?si=LB0GdFP0VO-nJT-n\" title=\"YouTube video player\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen></iframe>"
       ]
     },
     {
@@ -91,7 +58,7 @@
         "id": "C_WUXQzEQWv8"
       },
       "source": [
-        "# 1. Building your pipeline"
+        "## 1. Building your pipeline"
       ]
     },
     {
@@ -100,7 +67,7 @@
         "id": "Dms5Ict6NGXq"
       },
       "source": [
-        "## ARAGOG\n",
+        "### ARAGOG\n",
         "\n",
         "This dataset is based on the paper [Advanced Retrieval Augmented Generation Output Grading (ARAGOG)](https://arxiv.org/pdf/2404.01037). It's a\n",
         "collection of papers from ArXiv covering topics around Transformers and Large Language Models, all in PDF format.\n",
@@ -113,7 +80,14 @@
         "- ground-truth answers\n",
         "- questions\n",
         "\n",
-        "Source: https://github.com/deepset-ai/haystack-evaluation/blob/main/datasets/README.md"
+        "Get the dataset [here](https://github.com/deepset-ai/haystack-evaluation/blob/main/datasets/README.md)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "### Indexing Pipeline"
       ]
     },
     {
@@ -276,7 +250,7 @@
         "embedding_model=\"sentence-transformers/all-MiniLM-L6-v2\"\n",
         "document_store = InMemoryDocumentStore()\n",
         "\n",
-        "files_path = \"/content/papers_for_questions\"\n",
+        "files_path = \"/content/papers_for_questions\" # <ENTER YOUR PATH HERE>\n",
         "pipeline = Pipeline()\n",
         "pipeline.add_component(\"converter\", PyPDFToDocument())\n",
         "pipeline.add_component(\"cleaner\", DocumentCleaner())\n",
@@ -412,7 +386,7 @@
         "id": "zTbmQzeXQY1F"
       },
       "source": [
-        "# 2. Human Evaluation"
+        "## 2. Human Evaluation"
       ]
     },
     {
@@ -543,7 +517,7 @@
         "id": "-U-QnCBqQcd6"
       },
       "source": [
-        "# 3. Deciding on Metrics\n",
+        "## 3. Deciding on Metrics\n",
         "\n",
         "* **Semantic Answer Similarity**: SASEvaluator compares the embedding of a generated answer against a ground-truth answer based on a common embedding model.\n",
         "* **ContextRelevanceEvaluator** will assess the relevancy of the retrieved context to answer the query question\n",
@@ -556,7 +530,7 @@
         "id": "yLkAcM_5Qfat"
       },
       "source": [
-        "# 4. Building an Evaluation Pipeline"
+        "## 4. Building an Evaluation Pipeline"
       ]
     },
     {
@@ -582,7 +556,7 @@
         "id": "p76stWMQQmPD"
       },
       "source": [
-        "# 5. Running Evaluation"
+        "## 5. Running Evaluation"
       ]
     },
     {
@@ -663,7 +637,7 @@
         "id": "mC_mIqdMQqZG"
       },
       "source": [
-        "# 6. Analyzing Results"
+        "## 6. Analyzing Results"
       ]
     },
     {
@@ -3488,7 +3462,7 @@
         "id": "gKfrFf1CebJJ"
       },
       "source": [
-        "# Evaluation Frameworks"
+        "## Evaluation Frameworks"
       ]
     },
     {