Replies: 2 comments 1 reply
-
I'm trying it out on a monstrous (but old) bare-metal machine. I have one document in the source_documents folder: a copy of The Wizard of Oz downloaded from Project Gutenberg. It takes 50-65 seconds to answer the question "What was Dorothy's mission?" I have an RTX 4060 GPU with 16GB of VRAM and I'm running the XL model. The processors are two 12-core Xeon E5s and I have 256GB of RAM, so it's not the machine. One thing I'm seeing is that the GPU isn't involved in the query: both my CPUs are working hard, and while the GPU memory obviously has something loaded into it, the GPU isn't being used for processing. It's frustrating, since I'm watching videos where people do a straightforward install and it's zippy for them.
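The symptom above (VRAM occupied but inference running on CPU) often comes down to which device the library actually selected at runtime. A minimal sketch to check this, assuming a PyTorch-based stack like the tools discussed here — the variable names are illustrative, not the project's actual API:

```python
# Sketch: confirm whether the query path can actually use the GPU.
# If this prints "cpu" on a machine with a working RTX 4060, the model
# weights may have been loaded into VRAM once, while inference still
# runs on a CPU-only build of the library.
try:
    import torch
    has_cuda = torch.cuda.is_available()
except ImportError:
    # torch not installed at all -- fall back to CPU
    has_cuda = False

device = "cuda" if has_cuda else "cpu"
print(f"Selected device: {device}")
```

If this reports `cpu` despite a CUDA-capable card, a common cause is having the CPU-only wheel of PyTorch installed rather than the CUDA build.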
-
I'm running on a Xeon with a Quadro 5000 (16GB): instructor-xl + Llama-2-7B-Chat-GPTQ, and it takes under 20s per response. It takes the same amount of time whether my DB is 200KB or 1GB.
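The observation that latency is flat from a 200KB DB to a 1GB DB suggests generation, not retrieval, dominates the wall-clock time. A quick way to confirm is to time each stage of the query separately. This is a hedged sketch with stand-in functions (the `sleep` calls simulate work; real code would call the embedding model, vector store, and LLM):

```python
import time

def embed_query(q):
    # stand-in for the embedding model (e.g. instructor-xl)
    time.sleep(0.01)
    return [0.0] * 768

def retrieve(vec, k=4):
    # stand-in for the vector-store similarity lookup
    time.sleep(0.01)
    return ["chunk"] * k

def generate(context, q):
    # stand-in for the LLM call; typically dominates wall-clock time
    time.sleep(0.05)
    return "answer"

def timed(fn, *args):
    t0 = time.perf_counter()
    out = fn(*args)
    return out, time.perf_counter() - t0

q = "What was Dorothy's mission?"
vec, t_embed = timed(embed_query, q)
docs, t_ret = timed(retrieve, vec)
ans, t_gen = timed(generate, docs, q)
print(f"embed={t_embed:.2f}s retrieve={t_ret:.2f}s generate={t_gen:.2f}s")
```

If `generate` accounts for nearly all the time, growing or shrinking the document store won't change response latency much; the model and the device it runs on will.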
-
Hey!
I have a simple .txt document with subtitles and website links. It's about 200 lines, all very short and simple. Still, it takes about 50s-1m to get a response to a simple query on my M1 chip. I was wondering if other people have similar experiences.
For something this simple, I'm curious whether there's a more suitable model, or whether I simply need a much more powerful machine (VM) to cut that response time substantially.