Created multiple choice question/evaluation classes, with tests
1 parent d688ccc, commit 3b4fc47
Showing 13 changed files with 1,395 additions and 54 deletions.
104 changes: 104 additions & 0 deletions
...s/cassettes/TestLitQAEvaluation.test_grade[didnt-match-and-llm-has-innate-knowledge].yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,104 @@ | ||
interactions: | ||
- request: | ||
body: | ||
'{"messages": [{"content": "Given the following question and a proposed | ||
answer to the question, return the single-letter choice in the question that | ||
matches the proposed answer. If the proposed answer is blank or an empty string, | ||
or multiple options are matched, respond with ''0''.\n\nQuestion: Q: What is | ||
the meaning of life?\n\nOptions:\nA) -84\nB) Insufficient information to answer | ||
this question\nC) cheesecake\nD) 11\nE) 42\n\nProposed Answer: 14\n\nSingle | ||
Letter Answer:", "role": "user"}], "model": "gpt-4o"}' | ||
headers: | ||
accept: | ||
- application/json | ||
accept-encoding: | ||
- gzip, deflate | ||
connection: | ||
- keep-alive | ||
content-length: | ||
- "513" | ||
content-type: | ||
- application/json | ||
host: | ||
- api.openai.com | ||
user-agent: | ||
- AsyncOpenAI/Python 1.57.4 | ||
x-stainless-arch: | ||
- arm64 | ||
x-stainless-async: | ||
- async:asyncio | ||
x-stainless-lang: | ||
- python | ||
x-stainless-os: | ||
- MacOS | ||
x-stainless-package-version: | ||
- 1.57.4 | ||
x-stainless-raw-response: | ||
- "true" | ||
x-stainless-retry-count: | ||
- "1" | ||
x-stainless-runtime: | ||
- CPython | ||
x-stainless-runtime-version: | ||
- 3.12.7 | ||
method: POST | ||
uri: https://api.openai.com/v1/chat/completions | ||
response: | ||
body: | ||
string: !!binary | | ||
H4sIAAAAAAAAAwAAAP//jJJLa8MwEITv/hVC57goqamd3HpooPQBORVSilGktaNW1gpJoY+Q/15k | ||
u7FDU+jFh/l2xrNr7xNCqJJ0QajY8iAaq9Prav24NO7rNmMKH55ulvd8VaxXHu/ejaOT6MDNK4jw | ||
47oQ2FgNQaHpsHDAA8TUaX6ZZTnLi3kLGpSgo622Ic0wnbFZlrIiZVe9cYtKgKcL8pwQQsi+fcaK | ||
RsIHXRA2+VEa8J7XQBfHIUKoQx0Vyr1XPnAT6GSAAk0A07ZmY91BtfM81jI7rXv9cHyRxto63Pie | ||
H/VKGeW3pQPu0cRQH9DSlh4SQl7ahXYnHal12NhQBnwDEwOnrOjy6HDCEe1ZwMD12DSfnIkrJQSu | ||
tB9dhAoutiAH63A+vpMKRyAZLf27zLnsbnFl6v/ED0AIsAFkaR1IJU4XHsYcxB/sr7HjkdvC1H/6 | ||
AE1ZKVODs05137iyJc/nspBcTCuaHJJvAAAA//8DAGY5XevsAgAA | ||
headers: | ||
CF-Cache-Status: | ||
- DYNAMIC | ||
CF-RAY: | ||
- 8f39fde1cf88cf1b-SJC | ||
Connection: | ||
- keep-alive | ||
Content-Encoding: | ||
- gzip | ||
Content-Type: | ||
- application/json | ||
Date: | ||
- Tue, 17 Dec 2024 21:26:29 GMT | ||
Server: | ||
- cloudflare | ||
Transfer-Encoding: | ||
- chunked | ||
X-Content-Type-Options: | ||
- nosniff | ||
access-control-expose-headers: | ||
- X-Request-ID | ||
alt-svc: | ||
- h3=":443"; ma=86400 | ||
openai-organization: | ||
- future-house-xr4tdh | ||
openai-processing-ms: | ||
- "363" | ||
openai-version: | ||
- "2020-10-01" | ||
strict-transport-security: | ||
- max-age=31536000; includeSubDomains; preload | ||
x-ratelimit-limit-requests: | ||
- "10000" | ||
x-ratelimit-limit-tokens: | ||
- "30000000" | ||
x-ratelimit-remaining-requests: | ||
- "9999" | ||
x-ratelimit-remaining-tokens: | ||
- "29999874" | ||
x-ratelimit-reset-requests: | ||
- 6ms | ||
x-ratelimit-reset-tokens: | ||
- 0s | ||
x-request-id: | ||
- req_aff8daa48aa43d3df077f97da6136e5a | ||
status: | ||
code: 200 | ||
message: OK | ||
version: 1 |
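
The recorded response body above is stored as a `!!binary` scalar, and the `Content-Encoding: gzip` response header suggests it is base64 text wrapping gzip-compressed JSON. A minimal sketch of how such a body could be inspected by hand follows. The helper name `decode_cassette_body` is hypothetical and not part of this commit, and the sample payload is made up for the round trip; it is not the recorded response. If the cassette is loaded with a YAML parser instead, the `!!binary` value is typically already decoded to bytes, in which case only the gzip and JSON steps apply.

```python
import base64
import gzip
import json


def decode_cassette_body(b64_text: str) -> dict:
    """Decode a base64-encoded, gzip-compressed JSON body copied from a cassette.

    Hypothetical helper for manual inspection; assumes the body is gzip JSON.
    """
    # The binary scalar is wrapped across several lines in the YAML,
    # so strip every whitespace character before decoding.
    compact = "".join(b64_text.split())
    raw = base64.b64decode(compact)
    return json.loads(gzip.decompress(raw))


# Round trip with a made-up payload (NOT the recorded response above).
sample = {"choices": [{"message": {"content": "0"}}]}
encoded = base64.b64encode(gzip.compress(json.dumps(sample).encode())).decode()
# Re-wrap at 76 characters to mimic how the cassette stores the scalar.
wrapped = "\n".join(encoded[i:i + 76] for i in range(0, len(encoded), 76))
assert decode_cassette_body(wrapped) == sample
```

To inspect the real recording, the base64 lines of the `string: !!binary` field can be pasted into a string and passed to the helper. The decoded content is not reproduced here.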