Test / promptfoo Issues #872

Open · pl-shernandez opened this issue Nov 18, 2024 · 10 comments

@pl-shernandez commented Nov 18, 2024

Issue

I'm having some issues getting the test functionality to succeed, so I stripped it down to a very simple script. The issue occurs with any of the following: running npx genaiscript test add, right-clicking the script and choosing "Run Tests", or using the VS Code Test Explorer. My teammate is experiencing the same issue.

Script

I made this barebones script, which works just fine when run, and I know that the model is deployed.

genaiscript: add
prompting azure:gpt-4o (~245 tokens)

1 + 1 equals 2.

genaiscript: success

add.genaiscript.mjs

script({
  title: 'Simple Math Test',
  description: 'Validates that the model correctly calculates 1+1.',
  group: 'Basic Tests',
  temperature: 0,  
  model: 'azure:gpt-4o',
  maxTokens: 10, 
  tests: [
    {
      files: [],  
      rubrics: ['output correctly calculates 1+1 as 2'],
      facts: [`The model should return "2".`],
      asserts: [
        {
          type: 'equals',
          value: '2', 
        },
      ],
    },
  ],
});

$`What is 1 + 1?`;  

Errors

These are the errors I receive in the terminal when running via right-click on the script or via the Test Explorer:

❌ add Command failed with exit code 1: npx --yes '[email protected]' eval --config .genaiscript/tests/add.promptfoo.yaml --max-concurrency 1 --no-progress-bar --cache --verbose --output .genaiscript/tests/add.promptfoo.res.json http://127.0.0.1:15500/eval?evalId=eval-gtY-2024-11-18T19%3A59%3A59

and in the PromptFoo UI:
[screenshot]

Similarly, if I run npx genaiscript test add:
[screenshot]

Troubleshooting

I tried adding additional variables for the default models in my .env, but that didn't help:
AZURE_OPENAI_API_KEY=hidden
AZURE_OPENAI_API_ENDPOINT=hidden
GENAISCRIPT_DEFAULT_MODEL="azure:gpt-4o"
GENAISCRIPT_DEFAULT_SMALL_MODEL="azure:gpt-4o"
GENAISCRIPT_DEFAULT_VISION_MODEL="azure:gpt-4o"

Tried adding promptfoo as a dependency
npm install -d promptfoo@latest

Tried uninstalling and reinstalling the extension

Setup

genaiscript": "^1.76.0
vscode: Version: 1.95.3 (Universal)
mac os: 14.6.1

Going Further

Prior to this I tried to run some more complex prompts with tests, unsuccessfully, using an augmented version of the poem example: I received unterminated JSON errors even with 4096 tokens set. I felt I should ask for help on this simpler case first.

@pelikhan (Member)

I suspect we leak some logging into console.out that breaks the console.log output.

I will investigate.

@pelikhan (Member)

(The formatting is unfortunate in promptfoo, but there is no way to customize how it is rendered in the CLI.)

@pl-shernandez (Author)

Troubleshooting Experiment with Ollama

I tried another experiment this morning using Ollama locally: I updated add.genaiscript.mjs to use model: 'ollama:llama3.2' inside the script(...) call.

The script runs and succeeds with 🦙 locally serving the default model. The test-running approaches, though, now produce this message:
"Error: OpenAI API key is not set. Set the OPENAI_API_KEY environment variable or add apiKeyto the provider config.\n\nError: OpenAI API key is not set. Set the OPENAI_API_KEY environment variable or addapiKey to the provider config.\n at OpenAiChatCompletionProvider.callApi

Experimented with setting my .env file to a mix of

GENAISCRIPT_DEFAULT_MODEL="ollama:llama3.2"
GENAISCRIPT_DEFAULT_SMALL_MODEL="ollama:llama3.2"
GENAISCRIPT_DEFAULT_VISION_MODEL="ollama:llama3.2"

and adding placeholders, without success, though my understanding is those aren't needed for Ollama:

OPENAI_API_KEY=
OPENAI_API_ENDPOINT=
GENAISCRIPT_DEFAULT_MODEL="ollama:llama3.2"
GENAISCRIPT_DEFAULT_SMALL_MODEL="ollama:llama3.2"
GENAISCRIPT_DEFAULT_VISION_MODEL="ollama:llama3.2"

promptfoo really seems dead set on that OPENAI_API_KEY.

@pelikhan (Member)

The issue is that rubrics or facts are LLM-as-judge style assertions, and they require further configuration of the grading LLM. This is not well supported by genaiscript yet (oops). The promptfoo docs at https://www.promptfoo.dev/docs/configuration/expected-outputs/model-graded/llm-rubric/#overriding-the-llm-grader have more info.

To debug the promptfoo issues further, you can take a look at the .yml files dropped under .genaiscript/tests/*.yml. These are promptfoo configurations that are run directly by the promptfoo CLI, so all the docs at https://www.promptfoo.dev/docs/getting-started/ apply, since GenAIScript is merely launching the promptfoo CLI on them. This is usually how I debug promptfoo.
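For anyone who wants to experiment before a proper escape hatch exists, here is a minimal sketch of such a grader override added to the generated .genaiscript/tests/add.promptfoo.yaml, assuming the defaultTest.options.provider mechanism described in the linked promptfoo docs; the azureopenai:chat:gpt-4o-mini provider id is only an example deployment taken from later in this thread, not something genaiscript emits:

# hypothetical addition to the generated promptfoo yaml; everything genaiscript emitted stays as-is
# defaultTest.options.provider overrides the grader used for rubric/facts (LLM-as-judge) assertions,
# which otherwise defaults to an OpenAI provider and demands OPENAI_API_KEY
defaultTest:
  options:
    provider: azureopenai:chat:gpt-4o-mini  # example Azure deployment; needs AZURE_API_KEY / AZURE_API_HOST in the environment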

@pelikhan (Member)

I need to add an escape hatch to allow configuring the promptfoo grader.

@pelikhan (Member)

@pl-shernandez 1.76.2 fixes the JSON parsing issue, which was caused by a "pollution" of process.stdout when we pipe the JSON out. The issue of properly configuring promptfoo for rubric testing is still relevant. For example, I haven't seen how promptfoo handles running Azure with Microsoft Entra yet.

Maybe it's time to remove this dependency and run it all in genaiscript.

@pl-shernandez (Author) commented Nov 21, 2024

@pelikhan I spent some time working with this today to understand it better. I was able to call promptfoo via npx directly and supply an override grader as suggested, while using the yaml that genaiscript creates. I had to do two things when calling it directly to satisfy promptfoo:

1. Add these environment variables to .env for promptfoo's configuration:

AZURE_DEPLOYMENT_NAME= 
AZURE_API_KEY= 
AZURE_API_HOST= 

2. Add the --grader flag, remove the --cache flag, and use the auto-generated yaml config with no modifications:
npx --yes '[email protected]' eval --grader azureopenai:chat:gpt-4o-mini --config .genaiscript/tests/add.promptfoo.yaml --max-concurrency 1 --no-progress-bar --verbose --output .genaiscript/tests/add.promptfoo.res.json

For reference, this is the command that runs when right-clicking to run tests or when using the Test Explorer (the one I modified above):
npx --yes '[email protected]' eval --config .genaiscript/tests/add.promptfoo.yaml --max-concurrency 1 --no-progress-bar --cache --verbose --output .genaiscript/tests/add.promptfoo.res.json

@pelikhan (Member)

Thank you for the investigation. I'll be looking into the mods.

@pelikhan (Member)

Please try 1.77.3 and above. Cache is gone and the configuration in the generated .yml file is more precise. You'll still need the .env variables. I'll see about supporting them in GenAIScript as well so you don't have to duplicate them.
