Same model, same question - different answer if loaded directly vs as openai endpoint #2962
hugalafutro
started this conversation in General
Hi,
The model I use is L3-8B-Stheno-v3.2-abliterated.i1-Q4_K_M.gguf. The question I use is "How many R's are there in the word strawberry?"
(I know why most models fail it, and that it's not really a good test of an LLM, but disregard that.) If I load the model in the app directly, it counts 1 R most of the time, sometimes 2; I've never seen it count 3.
If I load the model in koboldcpp and connect GPT4All to koboldcpp's OpenAI endpoint, it counts 3 R's without fail, and often also spells the word out and explains its reasoning. (By "without fail" I mean I tried about 5 times with each method, which you might argue is not enough of a statistical sample, but we aren't doing statistics here; this is simple arithmetic/spelling most pre-schoolers can do.)
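For what it's worth, here's a minimal sketch of how to query the koboldcpp endpoint directly, bypassing GPT4All, so the two paths can be compared with the sampling settings pinned. It assumes koboldcpp's default local port 5001 and its OpenAI-compatible /v1/chat/completions route; the model name and sampling values below are placeholders to adjust for your setup:

```python
import requests

# Assumption: koboldcpp is running locally on its default port 5001 and
# exposes an OpenAI-compatible chat completions route. Adjust as needed.
URL = "http://localhost:5001/v1/chat/completions"

payload = {
    "model": "L3-8B-Stheno-v3.2-abliterated.i1-Q4_K_M.gguf",
    "messages": [
        {"role": "user",
         "content": "How many R's are there in the word strawberry?"}
    ],
    # Pin the sampling parameters so repeated runs are comparable.
    # These values are placeholders, not anyone's recommended settings.
    "temperature": 0.7,
    "max_tokens": 128,
}

# The same informal 5-trial check as described above.
for i in range(5):
    resp = requests.post(URL, json=payload, timeout=120)
    resp.raise_for_status()
    answer = resp.json()["choices"][0]["message"]["content"]
    print(f"run {i + 1}: {answer.strip()}")
```

Pinning the temperature and the exact prompt is deliberate: different front ends can apply different default sampling settings and chat templates to the same GGUF file, so fixing them makes the comparison between the two loading methods fair.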
Why? The backend shouldn't matter, no? The OpenAI endpoint is just another way to access the model. So why the difference?