Same model, same question - different answer if loaded directly vs as openai endpoint #2962
hugalafutro
started this conversation in General
Hi,
The model I use is L3-8B-Stheno-v3.2-abliterated.i1-Q4_K_M.gguf. The question I use is "How many R's are there in the word strawberry?"
(I know why most models fail it, and that it's not really a good test of an LLM, but disregard that.) If I load the model in the app directly, it counts 1 R most of the time, sometimes 2; I've never seen it count 3.
If I load the model in koboldcpp and connect GPT4All to koboldcpp's OpenAI endpoint, it counts 3 R's without fail, and often also spells the word out and explains its reasoning. (By "without fail" I mean I tried about 5 times with each method, which you might argue is not enough of a statistical sample, but we aren't doing statistics here; this is simple arithmetic/spelling most pre-schoolers can do.)
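For what it's worth, here's a minimal sketch of how to query the koboldcpp endpoint directly, bypassing GPT4All, so the two paths can be compared with the sampling settings pinned. It assumes koboldcpp's default local port 5001 and its OpenAI-compatible /v1/chat/completions route; the model name and sampling values below are placeholders to adjust for your setup:

```python
import requests

# Assumption: koboldcpp is running locally on its default port 5001 and
# exposes an OpenAI-compatible chat completions route. Adjust as needed.
URL = "http://localhost:5001/v1/chat/completions"

payload = {
    "model": "L3-8B-Stheno-v3.2-abliterated.i1-Q4_K_M.gguf",
    "messages": [
        {"role": "user",
         "content": "How many R's are there in the word strawberry?"}
    ],
    # Pin the sampling parameters so repeated runs are comparable.
    # These values are placeholders, not anyone's recommended settings.
    "temperature": 0.7,
    "max_tokens": 128,
}

# The same informal 5-trial check as described above.
for i in range(5):
    resp = requests.post(URL, json=payload, timeout=120)
    resp.raise_for_status()
    answer = resp.json()["choices"][0]["message"]["content"]
    print(f"run {i + 1}: {answer.strip()}")
```

Pinning the temperature and the exact prompt is deliberate: different front ends can apply different default sampling settings and chat templates to the same GGUF file, so fixing them makes the comparison between the two loading methods fair.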
Why? The backend shouldn't matter, no? The OpenAI endpoint is just another way to access the model. So why the difference?