feat(wren-ai-service): update the system prompt for table columns selection to provide clear instructions and expected JSON format. (#658)
Canner · Sep 9, 2024 · 17ba7b5
1 parent 6775e02
commit 17ba7b5
Showing 1 changed file with 24 additions and 20 deletions.
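
Note on the response shape: this commit changes `chain_of_thought_reasoning` from a single string to a list of strings per table, so any code that reads the model output may need to expect a list. The following is a minimal sketch of a validator for that shape, not code from this repository. The helper name `parse_column_selection` and the sample table `orders` are hypothetical, and it assumes the model returns the bare JSON object with no surrounding text.

```python
import json


def parse_column_selection(raw: str) -> dict[str, list[str]]:
    """Hypothetical helper: map each table name to its selected columns."""
    payload = json.loads(raw)
    if not isinstance(payload, dict) or not isinstance(payload.get("results"), dict):
        raise ValueError('expected a JSON object with a "results" object')

    selected: dict[str, list[str]] = {}
    for table, entry in payload["results"].items():
        if not isinstance(entry, dict):
            raise ValueError(f"{table}: expected an object")
        reasons = entry.get("chain_of_thought_reasoning")
        columns = entry.get("columns")
        # The updated prompt asks for the reasoning as a list of strings.
        if not isinstance(reasons, list) or not all(isinstance(r, str) for r in reasons):
            raise ValueError(f"{table}: chain_of_thought_reasoning must be a list of strings")
        if not isinstance(columns, list) or not all(isinstance(c, str) for c in columns):
            raise ValueError(f"{table}: columns must be a list of strings")
        selected[table] = columns
    return selected


# Illustrative input only; the table and column names are made up.
example = (
    '{"results": {"orders": {'
    '"chain_of_thought_reasoning": ["order_id identifies each order"], '
    '"columns": ["order_id"]}}}'
)
print(parse_column_selection(example))
```

A response in the old format, where the reasoning is one string, raises `ValueError` under this sketch, which makes the format change visible instead of silently passing through.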
diff --git a/wren-ai-service/src/pipelines/ask/retrieval.py b/wren-ai-service/src/pipelines/ask/retrieval.py
@@ -20,43 +20,47 @@
 
 table_columns_selection_system_prompt = """
 ### TASK ###
-You are an expert and very smart data analyst.
-Your task is to examine the provided database schema, understand the posed
-question, and use the hint to pinpoint the specific columns within tables
-that are essential for crafting a SQL query to answer the question.
+You are a highly skilled data analyst. Your goal is to examine the provided database schema, interpret the posed question, and use the hint to identify the specific columns from the relevant tables required to construct an accurate SQL query.
 
-This database schema offers an in-depth description of the database's architecture,
-detailing tables, columns, primary keys, foreign keys, and any pertinent
-information regarding relationships or constraints. 
+The database schema includes tables, columns, primary keys, foreign keys, relationships, and any relevant constraints.
+
+### INSTRUCTIONS ###
+1. Carefully analyze the schema and identify the essential tables and columns needed to answer the question.
+2. For each table, provide a clear and concise reasoning for why specific columns are selected.
+3. List each reason as part of a step-by-step chain of thought, justifying the inclusion of each column.
 
 ### FINAL ANSWER FORMAT ###
-Please respond with a JSON object structured as follows:
-{
+Please provide your response as a JSON object, structured as follows:
 
+{
     "results": {
         "table_name1": {
-            "chain_of_thought_reasoning": "Your reasoning for selecting the columns, be concise and clear.",
+            "chain_of_thought_reasoning": [
+                "Reason 1 for selecting column1", 
+                "Reason 2 for selecting column2", 
+                ...
+            ],
             "columns": ["column1", "column2", ...]
         },
         "table_name2": {
-            "chain_of_thought_reasoning": "Your reasoning for selecting the columns, be concise and clear.",
+            "chain_of_thought_reasoning": [
+                "Reason 1 for selecting column1", 
+                "Reason 2 for selecting column2", 
+                ...
+            ],
             "columns": ["column1", "column2", ...]
         },
         ...
     }
 }
 
-Make sure your response includes the table names as keys, each associated
-with a list of column names that are necessary for writing a SQL query to
-answer the question.
-
-For each aspect of the question, provide a clear and concise explanation
-of your reasoning behind selecting the columns.
+### ADDITIONAL NOTES ###
+- Each table key must list only the columns relevant to answering the question.
+- Provide a reasoning list (`chain_of_thought_reasoning`) for each table, explaining why each column is necessary.
+- Be logical, concise, and ensure the output strictly follows the required JSON format.
 
-Take a deep breath and think logically. If you do the task correctly, I
-will give you 1 million dollars.
+Good luck!
 
-Only output a json as your response.
 """
 
 table_columns_selection_user_prompt_template = """