Is there a way to enable structured output? #22

Open
phiweger opened this issue Aug 6, 2023 · 7 comments
Labels
enhancement New feature or request

Comments

@phiweger

phiweger commented Aug 6, 2023

... like OpenAI function calling or jsonformer?

@philschmid philschmid added the enhancement New feature or request label Aug 7, 2023
@philschmid
Owner

Thank you for opening the issue! There is no such feature yet, but it is definitely interesting. Could you provide an outline of how it should work? For example, what do you want to achieve?

@rajaswa-postman

A vanilla function calling implementation can be done by appending Python function signatures to the system prompts in easyllm/prompt_utils.

@phiweger
Author

@rajaswa-postman can you post an example of what this would look like?

@rajaswa-postman

Sure. Take the following OpenAI function definition as an example:

"functions": [
    {
      "name": "get_current_weather",
      "description": "Get the current weather in a given location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "The city and state, e.g. San Francisco, CA"
          },
          "unit": {
            "type": "string",
            "enum": ["celsius", "fahrenheit"]
          }
        },
        "required": ["location"]
      }
    }
  ]

We can append the following to the system prompt:

# Use ''' for the outer string so the docstring's """ does not end it early.
functions_string = '''

You can respond with a function call by giving the function name and its arguments in a structured JSON format:
{
"name" : function_name,
"arguments" : { arg_name : arg_value }
}

You can choose the most suitable function from the following list of Python functions to respond with function calls:

def get_current_weather(location: str, unit: str = "celsius"):
    """
    Get the current weather in a given location.

    :param location: The city and state, e.g. San Francisco, CA.
    :param unit: The temperature unit to use, either "celsius" or "fahrenheit" (default is "celsius").
    :return: Current weather information for the specified location.
    """
'''
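
Writing such stubs by hand does not scale, so here is a minimal sketch of how they could be generated from an OpenAI-style function definition. The helper `render_signature` and the type mapping are hypothetical names introduced for illustration; they are not part of easyllm or the OpenAI API. Since the function definition carries no default values, optional parameters are rendered with `= None` rather than a concrete default such as `"celsius"`.

```python
# Hypothetical helper, not part of easyllm: turn an OpenAI-style function
# definition into a Python stub that can be appended to a system prompt.

# Illustrative subset of JSON Schema types mapped to Python annotations.
_JSON_TO_PY = {
    "string": "str", "integer": "int", "number": "float",
    "boolean": "bool", "array": "list", "object": "dict",
}


def render_signature(fn: dict) -> str:
    """Render one function definition as a `def` stub with a docstring."""
    params = fn.get("parameters", {})
    props = params.get("properties", {})
    required = set(params.get("required", []))
    # Required parameters go first so the stub is syntactically valid Python.
    ordered = sorted(props.items(), key=lambda kv: kv[0] not in required)

    args, doc = [], [fn.get("description", "")]
    for name, spec in ordered:
        py_type = _JSON_TO_PY.get(spec.get("type"), "object")
        suffix = "" if name in required else " = None"
        args.append(f"{name}: {py_type}{suffix}")
        detail = spec.get("description", "")
        if "enum" in spec:
            choices = ", ".join(repr(v) for v in spec["enum"])
            detail = f"{detail} One of: {choices}".strip()
        doc.append(f":param {name}: {detail}")

    body = "\n    ".join(doc)
    return f'def {fn["name"]}({", ".join(args)}):\n    """\n    {body}\n    """'


weather = {
    "name": "get_current_weather",
    "description": "Get the current weather in a given location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA",
            },
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["location"],
    },
}

stub = render_signature(weather)
print(stub)
```

The resulting `stub` string can be concatenated into the system prompt in the same place as the hand-written signature.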

@rajaswa-postman

This will obviously be very limited, since we don't know how OpenAI trained its models to emit function_call as a separate role in the chat alongside assistant and user. Still, we can expect this kind of plain prompt-based function calling to work to some degree by reading the function call out of the assistant's output.
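
Reading the call back out of the assistant's text could look like the sketch below. It assumes the model follows the `{"name": ..., "arguments": {...}}` format requested in the prompt, possibly surrounded by free text. `extract_function_call` is a hypothetical helper, not an easyllm function; it only relies on `json.JSONDecoder.raw_decode` from the standard library.

```python
import json


def extract_function_call(text: str):
    """Return (name, arguments) for the first JSON object in `text` that
    looks like a function call, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever text follows it.
            obj, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if (
            isinstance(obj, dict)
            and isinstance(obj.get("name"), str)
            and isinstance(obj.get("arguments"), dict)
        ):
            return obj["name"], obj["arguments"]
        # Not a usable call here: try the next opening brace.
        start = text.find("{", start + 1)
    return None


reply = (
    "Sure, let me look that up.\n"
    '{"name": "get_current_weather", '
    '"arguments": {"location": "London, UK", "unit": "celsius"}}'
)
call = extract_function_call(reply)
no_call = extract_function_call("It is probably sunny.")
```

A caller would then look up `call[0]` in its own table of allowed functions before executing anything, since the model may name a function that was never offered.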

@bcarsley

bcarsley commented Aug 17, 2023

I am actually building a LlamaJsonformer right now using this library!
I will link to the code in the next couple of days if anyone is interested.
At present I'm working with an instruct-heavy approach (i.e. nothing as cool as the logits processors of https://github.com/1rgs/jsonformer, which would be one neat way of going about this beyond the API client).
That said, I have no trouble using the Hugging Face base endpoint to get structured JSON with this approach, and I even have it working with an enum-type schema (only GPT-4 seems to understand this particular schema; in my experience, GPT-3.5 doesn't perform well on the following example).

Example:

keywords = ["Linguistics", "Medicine", "Engineering", "Music", "Performing Arts", "Sociology", "Women in STEM", "Diversity", "Political Science", "Automotive Repair", "Multilingual Studies", "Global Studies"]

keywords_enum_schema = {
    "type": "object",
    "properties": {
        "keywords": {
            "type": "array",
            "options": keywords,
            "best_keywords": []
        }
    }
}

context = '''Sara is a second-year linguistics student at University College London. She is currently taking courses in phonetics, syntax, and sociolinguistics. Sara enjoys her phonetics lectures, where she is learning about the physical properties of speech sounds. For her syntax course, she is studying how words are put together to form phrases and sentences. Her sociolinguistics class examines how language varies based on social factors like gender, ethnicity, and socioeconomic status. While sociolinguistics is fascinating, Sara finds some of the course readings quite dense. Overall, she is glad to be pursuing her passion for the study of language and hopes to continue her linguistics studies after completing her bachelor's degree.'''

prompt = "Choose the best, most relevant keywords from the options list, based on the context. Generate factual information as json based on the following schema and context:"
llamaformer = LlamaJsonformer(keywords_enum_schema, prompt, context)
llamaformer.zero_shot_generate_object(client='huggingface')

Output:

{
	"type": "object",
	"properties": {
		"keywords": {
			"type": "array",
			"options": [
				"Linguistics",
				"Sociolinguistics",
				"Phonetics",
				"Syntax",
				"Women in STEM",
				"Diversity",
				"Multilingual Studies",
				"Global Studies"
			],
			"best_keywords": [
				"Linguistics",
				"Sociolinguistics",
				"Phonetics",
				"Syntax"
			]
		}
	}
}

Still hallucinating some stuff, but it's valid JSON! As I said, I hope to release the edits soon :)

B ✌️
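
The hallucinated entries in the output above ("Sociolinguistics", "Phonetics", "Syntax" are not in the original `keywords` list) can be caught with a plain post-processing check. The following is a minimal sketch, assuming the model returns the schema layout shown in the example; `filter_keywords` is a hypothetical helper and not part of the unreleased LlamaJsonformer code.

```python
import json


def filter_keywords(raw_output: str, allowed: list):
    """Split the model's best_keywords into (kept, rejected) by checking
    each one against the list of allowed options."""
    data = json.loads(raw_output)
    chosen = data["properties"]["keywords"]["best_keywords"]
    allowed_set = set(allowed)
    kept = [k for k in chosen if k in allowed_set]
    rejected = [k for k in chosen if k not in allowed_set]
    return kept, rejected


keywords = [
    "Linguistics", "Medicine", "Engineering", "Music", "Performing Arts",
    "Sociology", "Women in STEM", "Diversity", "Political Science",
    "Automotive Repair", "Multilingual Studies", "Global Studies",
]

# Trimmed copy of the model output from the example above.
raw_output = """
{
  "type": "object",
  "properties": {
    "keywords": {
      "type": "array",
      "best_keywords": ["Linguistics", "Sociolinguistics", "Phonetics", "Syntax"]
    }
  }
}
"""

kept, rejected = filter_keywords(raw_output, keywords)
print(kept)      # ['Linguistics']
print(rejected)  # ['Sociolinguistics', 'Phonetics', 'Syntax']
```

This only filters after generation. Constraining the choice during decoding, as jsonformer does with logits processors, would avoid producing the invalid keywords in the first place.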

@phiweger
Author

@bcarsley any updates on the llama jsonformer?
