Feature: SQL Template Matcher #5

Closed
wants to merge 3 commits into from
Changes from 2 commits
18 changes: 17 additions & 1 deletion larch/schema.py
@@ -1,7 +1,7 @@
#!/usr/bin/env python3


-from typing import List, Optional
+from typing import List, Optional, Dict, Any

from instructor import OpenAISchema
from pydantic import BaseModel, Field, create_model
@@ -79,3 +79,19 @@ def dict_flattened(self, **kwargs):
        flat_dict.update(flat_dict.pop("assessment_metadata", {}))

        return flat_dict


class SQLTemplate(BaseModel):
    """
    SQLTemplate represents a SQL template for a given query pattern.

    E.g.:
        query_pattern: "When did <mission_name> launch?"
        sql_template: "SELECT date_operational FROM <table_name> WHERE mission_name ILIKE '%<mission_name>%';"

    """
    query_pattern: str
    sql_template: str
    intent: Optional[str] = None
    description: Optional[str] = None
Collaborator:
Also, let's add an intent: Optional[str] = None field, in case we want intent detection somewhere in the future.

    extras: Optional[Dict[str, Any]] = None
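To make the placeholder convention in the SQLTemplate docstring concrete, here is a minimal sketch of filling a template's <name> placeholders with already-extracted entity values. It assumes placeholders are always written as <identifier>. The fill_template helper, the stdlib dataclass stand-in for the pydantic model, and the example values ("missions", "Landsat 9") are hypothetical and not part of this PR.

```python
import re
from dataclasses import dataclass
from typing import Any, Dict, Optional


# Hypothetical stdlib stand-in for the pydantic SQLTemplate model (same fields).
@dataclass
class SQLTemplate:
    query_pattern: str
    sql_template: str
    intent: Optional[str] = None
    description: Optional[str] = None
    extras: Optional[Dict[str, Any]] = None


# Matches placeholders such as <mission_name> or <table_name>.
PLACEHOLDER = re.compile(r"<(\w+)>")


def fill_template(template: SQLTemplate, entities: Dict[str, str]) -> str:
    """Hypothetical helper: substitute every <name> placeholder in sql_template."""

    def replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in entities:
            # Fail loudly instead of emitting SQL with a leftover placeholder.
            raise KeyError(f"no value for placeholder <{name}>")
        return entities[name]

    return PLACEHOLDER.sub(replace, template.sql_template)


template = SQLTemplate(
    query_pattern="When did <mission_name> launch?",
    sql_template=(
        "SELECT date_operational FROM <table_name> "
        "WHERE mission_name ILIKE '%<mission_name>%';"
    ),
)
sql = fill_template(template, {"table_name": "missions", "mission_name": "Landsat 9"})
print(sql)
# SELECT date_operational FROM missions WHERE mission_name ILIKE '%Landsat 9%';
```

Because this is plain string substitution, entity values end up inside the SQL text verbatim; a real implementation would likely want to escape them or use the database driver's parameter binding instead.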
85 changes: 85 additions & 0 deletions larch/search/template_matcher.py
@@ -0,0 +1,85 @@
from abc import ABC, abstractmethod

from typing import Any, List, Optional
from langchain.base_language import BaseLanguageModel
from langchain.chat_models import ChatOpenAI

from ..schema import SQLTemplate

class SQLTemplateMatcher(ABC):
    """
    SQLTemplateMatcher is a base class for all SQL-based template matchers.
    """
    def __init__(self,
                 templates: List[SQLTemplate],
                 similarity_threshold: float = 0.4,
                 debug: bool = False) -> None:
        self.templates = templates
        self.similarity_threshold = similarity_threshold
        self.debug = debug

    @abstractmethod
    def match(self, query: str, top_k=1, **kwargs) -> List[str]:
Collaborator:
Let's return List[SQLTemplate] instead of List[str].

Collaborator Author:
While the fuzzy matcher could return a List[SQLTemplate], that may not be as efficient for the LLM-based matcher. If the number of templates is large, returning the matching SQL templates along with their entity-substituted SQL queries might require prompting the LLM to produce a list of matching queries. I haven't experimented with the LLM part yet, so I can't fully back this up.

I'll add more context once I see how it performs.

Collaborator:
I mean the returned list should still be a subset of all the templates, cut off by the threshold or top_k. So the return type should reflect which SQLTemplate objects are returned. Hence List[SQLTemplate] makes more sense, as it also tells us which query patterns and intents were matched for the input query.

Collaborator:
Re the LLM-based matcher: even if the LLM only gives us the SQL query, we can ideally map it back to the original SQLTemplate object. I think the LLM's output could in fact be constrained through in-context prompting with the SQL templates. Nevertheless, let's stick with List[SQLTemplate] as the return type, because we're technically just selecting/matching among the input templates.

        """
        Match the given query against the templates.

        Args:
            query: The query to match against the templates.
            top_k: The number of top-k templates to return. Defaults to 1.
        Returns:
            A list of top-k templates that match the query with entity substitution.
        """
        raise NotImplementedError()

    def __call__(self, *args: Any, **kwds: Any) -> List[str]:
        return self.match(*args, **kwds)


class FuzzySQLTemplateMatcher(SQLTemplateMatcher):
    """
    FuzzySQLTemplateMatcher is a SQL-based template matcher that uses fuzzy matching.
    Given a query, it uses rule-based matching to find the best-matching template
    and returns the template(s) with entity substitution.

    Args:
        templates: A list of SQL templates.
        similarity_threshold: The similarity threshold to be used for fuzzy matching.
    """
    def __init__(self, templates: List[SQLTemplate],
                 similarity_threshold: float = 0.4,
                 debug: bool = False) -> None:
        super().__init__(templates=templates,
                         similarity_threshold=similarity_threshold,
                         debug=debug)

    def match(self, query: str, top_k=1, **kwargs) -> List[str]:
        pass
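FuzzySQLTemplateMatcher.match is still a stub in this diff. One possible shape for it, consistent with the List[SQLTemplate] return type agreed on in the review, is sketched below. It swaps in difflib.SequenceMatcher from the standard library as the similarity measure (an assumption; the PR does not say which fuzzy metric it will use), keeps templates whose score is at least similarity_threshold, and returns the top_k best. Entity substitution is left out, and fuzzy_match plus the dataclass stand-in are hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


@dataclass
class SQLTemplate:  # hypothetical stdlib stand-in for larch.schema.SQLTemplate
    query_pattern: str
    sql_template: str
    intent: Optional[str] = None


def fuzzy_match(query: str, templates: List[SQLTemplate],
                similarity_threshold: float = 0.4, top_k: int = 1) -> List[SQLTemplate]:
    """Rank templates by difflib similarity between the query and each query_pattern."""
    scored: List[Tuple[float, SQLTemplate]] = []
    for template in templates:
        score = SequenceMatcher(None, query.lower(), template.query_pattern.lower()).ratio()
        if score >= similarity_threshold:  # drop weak matches
            scored.append((score, template))
    # Highest similarity first; list.sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [template for _, template in scored[:top_k]]


templates = [
    SQLTemplate(
        query_pattern="When did <mission_name> launch?",
        sql_template="SELECT date_operational FROM <table_name> WHERE mission_name ILIKE '%<mission_name>%';",
    ),
    SQLTemplate(
        query_pattern="How many instruments does <mission_name> carry?",
        sql_template="SELECT COUNT(*) FROM <table_name> WHERE mission_name ILIKE '%<mission_name>%';",
    ),
]
best = fuzzy_match("When did Landsat 9 launch?", templates, top_k=1)
print([t.query_pattern for t in best])
# ['When did <mission_name> launch?']
```

Comparing against the raw query_pattern means the placeholder text itself (e.g. <mission_name>) adds unmatched characters and lowers the score, so the right threshold is a tuning question this sketch does not settle.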


class LLMBasedSQLTemplateMatcher(SQLTemplateMatcher):
    """
    LLMBasedSQLTemplateMatcher uses an LLM to find the best-matching template.
    Given a query, it extracts the key entities, uses the LLM to find the best SQL template,
    and generates a substituted SQL query.

    Args:
        templates: A list of SQL templates.
        llm: The language model to use. If None, defaults to
            ChatOpenAI(temperature=0.0, model="gpt-3.5-turbo").
        ddl_schema: The DDL schema for available tables.
        similarity_threshold: The similarity threshold to be used for fuzzy matching.
    """
    def __init__(self, templates: List[SQLTemplate],
NISH1001 marked this conversation as resolved.
                 llm: BaseLanguageModel,
Collaborator:
Make this Optional[BaseLanguageModel], and inside the constructor we can do llm = llm or ChatOpenAI(...).

                 ddl_schema: Optional[str] = None,
                 similarity_threshold: float = 0.4,
                 debug: bool = False) -> None:
        super().__init__(
            templates=templates,
            similarity_threshold=similarity_threshold,
            debug=debug)

        self.llm = llm or ChatOpenAI(temperature=0.0, model="gpt-3.5-turbo")
        self.ddl_schema = ddl_schema

    def match(self, query: str, top_k=1, **kwargs) -> List[str]:
        pass
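The review thread suggests that even if the LLM-based matcher only returns a SQL query, the originating SQLTemplate can be recovered. Below is a minimal sketch of that reverse mapping, assuming the LLM reproduces the template's literal SQL text and changes only whitespace and letter case. Each template is compiled into a regular expression whose <name> placeholders become named capture groups, and the first template that fully matches wins. template_to_regex and reverse_map are hypothetical helpers, not part of the PR.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class SQLTemplate:  # hypothetical stdlib stand-in for larch.schema.SQLTemplate
    query_pattern: str
    sql_template: str


PLACEHOLDER = re.compile(r"<(\w+)>")


def template_to_regex(sql_template: str) -> "re.Pattern[str]":
    """Compile a SQL template into a regex; each <name> becomes a named capture group."""
    parts: List[str] = []
    seen = set()
    pos = 0
    for m in PLACEHOLDER.finditer(sql_template):
        parts.append(re.escape(sql_template[pos:m.start()]))  # literal SQL text
        name = m.group(1)
        # A repeated placeholder must capture the same value, so reuse the group.
        parts.append(f"(?P={name})" if name in seen else f"(?P<{name}>.+?)")
        seen.add(name)
        pos = m.end()
    parts.append(re.escape(sql_template[pos:]))
    return re.compile("".join(parts), re.IGNORECASE)


def reverse_map(sql: str, templates: List[SQLTemplate]) -> Optional[Tuple[SQLTemplate, Dict[str, str]]]:
    """Return the first template that fully matches sql, plus the extracted entities."""
    normalized = " ".join(sql.split())  # collapse newlines/extra spaces from LLM output
    for template in templates:
        m = template_to_regex(template.sql_template).fullmatch(normalized)
        if m:
            return template, m.groupdict()
    return None


templates = [
    SQLTemplate(
        query_pattern="When did <mission_name> launch?",
        sql_template="SELECT date_operational FROM <table_name> WHERE mission_name ILIKE '%<mission_name>%';",
    )
]
llm_sql = "select date_operational\nFROM missions\nWHERE mission_name ILIKE '%Landsat 9%';"
result = reverse_map(llm_sql, templates)
print(result[1] if result else None)
# {'table_name': 'missions', 'mission_name': 'Landsat 9'}
```

A side benefit is that the captured groups return the substituted entities, so the matcher could return List[SQLTemplate] while still exposing the values the LLM filled in.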