Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: SQL Template Matcher #5

Closed
wants to merge 3 commits into from

Conversation

anisbhsl
Copy link
Collaborator

@anisbhsl anisbhsl commented Jan 7, 2024

This is a feature PR for SQL based template matcher. Two downstream PRs:

  1. Fuzzy SQL Matcher (Tracked by Fuzzy SQL Template Matcher #3 )
  2. LLM based SQL Matcher (Tracked by LLM Based SQL Template Matcher #4)

will be branched off this PR.

@anisbhsl anisbhsl marked this pull request as draft January 7, 2024 19:12
@anisbhsl anisbhsl linked an issue Jan 7, 2024 that may be closed by this pull request
3 tasks
@NISH1001
Copy link
Collaborator

NISH1001 commented Jan 8, 2024

Reiterating: Let's break the PR (draft) into 2:

@NISH1001
Copy link
Collaborator

NISH1001 commented Jan 8, 2024

Possible use case:

  • Making the sql agent better by having the template matcher as primary tool?
  • Grounding the query-to-sql to reduce possible hallucinations in the SQL.

For LLM-based matcher, we can provide in-context (all-shot prompting?) list of templates (query+sql) and let the LLM give us scored output which we can use.

Usage

Possible usage could be

from larch import SQLTemplate
from larch.search import FuzzySQLTemplateMatcher

templates  = [
    SQLTemplate(
        query="What is the capital of Nepal?",
        sql="select * from country c where c.country=="Nepal";
]

matcher = FuzzySQLTemplateMatcher(templates=templates, ...)

query = "Capital of Nepal"
matched_templates = matcher(query=query, ...)

self.debug=debug

@abstractmethod
def match(self, query: str, top_k=1, **kwargs) -> List[str]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's return List[SQLTemplate] instead of List[str].

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the fuzzy matcher might be able to provide response as List[SQLTemplate], it may not be that efficient to do with LLM based matcher. If the number of templates is huge, getting the list of sql templates with pattern and entity substituted sql query might require prompting LLM to provide a list of matching queries. I've not experimented on LLM part so I can't fully support the statement above.

I'll put more context once I get to know how it performs.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean the return list should still be a subset of all the templates, cut-off by threshold or top_k. So, the correct return type should which SQLTemplate objects are returned. Hence List[SQLTemplate] makes more sense as it gives us idea about what sort of query and intents are also being matched for input query.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re: With llm-based, even if the llm just gives us the SQL query, we can ideally reverse map the original sqltemplate object as well. I think the result that LLM returns could infact be enforced by in-context prompting with SQL templates. Nevertheless, let's just stick with List[SQLTemplate] as return type because we're technically just selecting/matching the input templates.

"""
query_pattern: str
sql_template: str
description: Optional[str] = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also let's add intent: Optional[str] = None field as well in case we want the intent detection somewhere in future.

@anisbhsl
Copy link
Collaborator Author

anisbhsl commented Jan 8, 2024

Reiterating: Let's break the PR (draft) into 2:

Yeah the plan is to do the same as explained in the description above.

similarity_threshold: The similarity threshold to be used for fuzzy matching.
"""
def __init__(self, templates: List[SQLTemplate],
llm: BaseLanguageModel,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this Optional[BaseLanguageModel] and inside the constructor we can do llm = llm or ChatOpenAI(...)

@NISH1001
Copy link
Collaborator

NISH1001 commented Jan 8, 2024

@anisbhsl can you also run the code through pre-commit hooks please.
Here's how to do it:

pip install pre-commit

pre-commit install #installs the config from .pre-commit-config.yaml

Once done, you'll be able to run the hooks automatically every time you invoke git commit <...> command.

@NISH1001
Copy link
Collaborator

NISH1001 commented Jan 8, 2024

Another thought I have:

By making use of description, intent, query for SQLTemplate we can also possibly do RAG on top of templates to find the best matching candidate, and do the entity substitution via llm for matched query.

@NISH1001
Copy link
Collaborator

Closing this for time being as apparently it's already in develop for some rason.

@NISH1001 NISH1001 closed this Oct 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SQL Template Matcher
3 participants