Replies: 2 comments 1 reply
-
I'm trying it out on a monstrous (but old) bare-metal machine. I have one document in the source_documents folder: a copy of The Wizard of Oz downloaded from Project Gutenberg. It takes 50-65 seconds to answer the question "What was Dorothy's mission?" I have an RTX 4060 GPU with 16GB of VRAM and I'm running the XL model. The processors are two 12-core Xeon E5s and I have 256GB of RAM, so it's not the machine. One thing I'm seeing is that the GPU isn't involved in the query: both my CPUs are working hard, and while the GPU memory obviously has something loaded into it, the GPU isn't being used for processing. It's frustrating, since I'm watching videos where people do a straightforward install and it's zippy for them.
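The symptom above (VRAM occupied but inference running on CPU) often comes down to which device the library actually selected at runtime. A minimal sketch to check this, assuming a PyTorch-based stack like the tools discussed here — the variable names are illustrative, not the project's actual API:

```python
# Sketch: confirm whether the query path can actually use the GPU.
# If this prints "cpu" on a machine with a working RTX 4060, the model
# weights may have been loaded into VRAM once, while inference still
# runs on a CPU-only build of the library.
try:
    import torch
    has_cuda = torch.cuda.is_available()
except ImportError:
    # torch not installed at all -- fall back to CPU
    has_cuda = False

device = "cuda" if has_cuda else "cpu"
print(f"Selected device: {device}")
```

If this reports `cpu` despite a CUDA-capable card, a common cause is having the CPU-only wheel of PyTorch installed rather than the CUDA build.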
-
I'm running on a Xeon with a Quadro 5000 (16GB): instructor-xl + Llama-2-7B-Chat-GPTQ, and it takes under 20s per response. It takes the same amount of time whether my DB is 200KB or 1GB.
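The observation that latency is flat from a 200KB DB to a 1GB DB suggests generation, not retrieval, dominates the wall-clock time. A quick way to confirm is to time each stage of the query separately. This is a hedged sketch with stand-in functions (the `sleep` calls simulate work; real code would call the embedding model, vector store, and LLM):

```python
import time

def embed_query(q):
    # stand-in for the embedding model (e.g. instructor-xl)
    time.sleep(0.01)
    return [0.0] * 768

def retrieve(vec, k=4):
    # stand-in for the vector-store similarity lookup
    time.sleep(0.01)
    return ["chunk"] * k

def generate(context, q):
    # stand-in for the LLM call; typically dominates wall-clock time
    time.sleep(0.05)
    return "answer"

def timed(fn, *args):
    t0 = time.perf_counter()
    out = fn(*args)
    return out, time.perf_counter() - t0

q = "What was Dorothy's mission?"
vec, t_embed = timed(embed_query, q)
docs, t_ret = timed(retrieve, vec)
ans, t_gen = timed(generate, docs, q)
print(f"embed={t_embed:.2f}s retrieve={t_ret:.2f}s generate={t_gen:.2f}s")
```

If `generate` accounts for nearly all the time, growing or shrinking the document store won't change response latency much; the model and the device it runs on will.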
-
Hey!
I have a simple .txt document with subtitles and website links. It's about 200 lines, all very short and simple. Still, it takes about 50s-1m to get a response to a simple query on my M1 chip. I was wondering if other people have similar experiences.
For something this simple, I'm curious whether there's a more suitable model, or whether I simply need a much more powerful machine (VM) to cut that response time substantially.