Please add to quickstart "how to use" section #3
Comments
Hello Israel-Laguan, I can briefly try to explain the dilemma I'm in with gppm. When I started a few months ago, I hit on it at exactly the right time: there was no ready-made solution to get the power consumption of the P40 under control, but sasha0552 had just released nvidia-pstate, on which gppm is essentially based. I wanted a solution for very specific scenarios: enabling cheap computers with as many P40s as possible (they were very cheap at the time) to run a whole bunch of llama.cpp instances at the same time. The aim was to make this usable for teams, very small companies (like a handful of people or so), or agentic setups where inference runs almost continuously and there is little idle time.

When I introduced gppm on Reddit, many P40 users adopted it immediately, although they actually only needed much simpler functionality. This is exactly what sasha0552 recognized, and he published nvidia-pstated. nvidia-pstated does something I had already done here: a29a3ea. You observe whether the GPU is working (or wants to work) or not and then switch the performance state accordingly. I even experimented with switching the performance state very frequently, even between individual tokens. However, the whole thing didn't meet with much interest, which is why I didn't develop it any further. When there was no more feedback from users, I assumed that most of them had switched to nvidia-pstated in the meantime. For users with a small number of GPUs, or where the GPU does nothing 99% of the time, it probably makes sense because of its low complexity.

From that point on, I developed gppm only according to my personal needs. This has 1. led to the current architecture, which I find somewhat suboptimal, and 2. meant that I have put little effort into the documentation and have left half-finished things like the GUI in place without developing them further.
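The switching mechanism described above (observe whether the GPU is working, then pick a performance state) can be sketched roughly as follows. This is a minimal illustration, not gppm's or nvidia-pstated's actual code: the thresholds, the P-state numbers, and the `read_util`/`apply_pstate` callbacks are all assumptions. In practice `read_util` could come from NVML (e.g. pynvml's `nvmlDeviceGetUtilizationRates`) and `apply_pstate` could call into nvidia-pstate.

```python
import time

# Illustrative P-state values (assumptions, not taken from gppm):
HIGH_PSTATE = 0   # P0: full performance while the GPU has work
LOW_PSTATE = 8    # P8: low-power state while idle

def desired_pstate(gpu_util_percent: int) -> int:
    """Pick a performance state from the current GPU utilization."""
    return HIGH_PSTATE if gpu_util_percent > 0 else LOW_PSTATE

def monitor_loop(read_util, apply_pstate, interval_s: float = 0.5):
    """Poll utilization and switch the P-state only when it changes.

    read_util:    callable returning GPU utilization in percent
    apply_pstate: callable that sets the given P-state on the GPU
    """
    current = None
    while True:
        target = desired_pstate(read_util())
        if target != current:
            apply_pstate(target)
            current = target
        time.sleep(interval_s)
```

Polling like this is what makes very frequent switching (even between individual tokens) possible in principle, at the cost of a busy monitoring loop.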
But I'll be happy to make up for that if it helps; thanks again for the feedback. As far as the architecture of gppm is concerned, I can briefly outline the roadmap. Originally, gppm stood for "GPU Power and Performance Manager". However, I now also use it to launch and manage my entire stack, which is why I've renamed it: gppm is now a recursive acronym and stands for "gppm power process manager", just like GNU stands for "GNU's Not Unix". I don't like that so much, because I prefer a concrete tool for a concrete purpose. That's why I'm going to split this into two projects in the future, which can be used in combination or individually.
I think your solution is solid for the time being. Maybe you can try to leave the project without rough edges and then move on. The project is good; it just needs some polish.
First of all thanks for the great work!
Context
I was trying to use it with my new build (2 P40s on Ubuntu 24.04 / Pop!_OS), and it seems to run. I had to check the code and found an endpoint with some info:
Great, but I also found some errors. The endpoint named /gui was also not working.
Request
Besides the existing quickstart, it would be nice to have a couple of ways to help users use the program. Some things that I, as a user, would find useful:
A reference of available endpoints and examples of use
An example of using one single P40, guiding up to calling the llama.cpp instance with a query like "how many squares are on a chessboard?". I notice this is partially mentioned already; it would just be nice to have a "now query llama.cpp with a prompt" section or similar.
An example of using 2 or more P40s and maybe changing a config, then inspecting the change.
Just in case it is needed, how to disable/uninstall.
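For the "now query llama.cpp with a prompt" step requested above, a minimal sketch could look like this. It assumes a llama.cpp server instance is already running on its default address (127.0.0.1:8080); the `SERVER` address and the helper names are illustrative, and the payload targets llama.cpp's `/completion` endpoint.

```python
import json
from urllib import request

# Default address of a local llama.cpp server; adjust if gppm launches
# the instance elsewhere (assumption for this example).
SERVER = "http://127.0.0.1:8080"

def build_completion_request(prompt: str, n_predict: int = 64) -> dict:
    """JSON payload for llama.cpp's /completion endpoint."""
    return {"prompt": prompt, "n_predict": n_predict}

def ask(prompt: str) -> str:
    """POST the prompt to the server and return the generated text."""
    payload = json.dumps(build_completion_request(prompt)).encode()
    req = request.Request(
        f"{SERVER}/completion",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]

# Usage (with a llama.cpp server running):
#   print(ask("How many squares are on a chessboard?"))
```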
Other considerations
It worked fine. I don't think this is needed, but I'll share my PC specs:
CPU: AMD Ryzen 3 3200G
RAM: 8 GB (3200 MHz)
GPU(s): 2 Nvidia P40
OS: Pop!_OS 24.04