
Please add to quickstart "how to use" section #3

Open
Israel-Laguan opened this issue Sep 13, 2024 · 2 comments

Israel-Laguan commented Sep 13, 2024

First of all thanks for the great work!

Context

I was trying to use it with my new build (2 P40s) on Ubuntu 24.04 (Pop!_OS), and it seems to run. I had to check the code and found an endpoint with some info:

[two screenshots: endpoint output]

Great. I also found some errors:

[screenshot: error output]

Also, the endpoint named /gui was not working.

Request

Besides the existing quickstart, it would be nice to have a couple of things that help users get started with the program. Some things that I, as a user, would find useful:

  1. A cURL example to check that gppm is working, for example:

     curl http://localhost:5001/get_llamacpp_subprocesses

  2. A reference of the available endpoints and examples of their use.

  3. An example using one single P40, guided up to a call to the llama.cpp instance with a query like "how many squares are on a chessboard?". I noticed this is partially covered already; it would just be nice to have a "now query llama.cpp with a prompt" section or similar (see the sketch after this list).

  4. An example using 2 or more P40s, perhaps changing a config option and then inspecting the change.

  5. Just in case it is needed: how to disable/uninstall gppm.
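
For point 3, something like the following would do. This is only a minimal sketch, assuming a llama.cpp server instance launched by gppm listens on port 8080 (the real port depends on the gppm config); /completion is llama.cpp's built-in HTTP server endpoint:

```bash
# Port 8080 is an assumption here; use the port from your gppm config.
curl http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "How many squares are on a chessboard?", "n_predict": 128}'
```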

Other considerations

It worked fine, so this is probably not needed, but I'll share my PC specs anyway:
CPU: AMD Ryzen 3 3200G
RAM: 8 GB (3200 MHz)
GPU(s): 2x Nvidia P40
OS: Pop!_OS 24.04

@Israel-Laguan Israel-Laguan changed the title Please add to quickstart how to use it Please add to quickstart "how to use" section Sep 13, 2024
crashr (Owner) commented Sep 14, 2024

Hello Israel-Laguan,
thank you very much for that feedback!

I can briefly try to explain the dilemma I'm in with gppm. When I started a few months ago, I hit on it at exactly the right time. There was no ready-made solution to get the power consumption of P40s under control, but sasha0552 had just released nvidia-pstate, on which gppm is essentially based. I wanted a solution for very specific scenarios: to enable cheap computers with as many P40s as possible (they were very cheap at the time) to run a whole bunch of llama.cpp instances at the same time. The aim was to make this usable for teams, very small companies (like a handful of people or so), or agentic setups where inference runs almost continuously and there is little idle time. When I introduced gppm on Reddit, many P40 users adopted it immediately, although they actually only needed much simpler functionality. This is exactly what sasha0552 recognized, and he published nvidia-pstated.

nvidia-pstated does something that I had already done here: a29a3ea. You observe whether the GPU is working (or wants to work) or not, and then switch the performance state accordingly. I have even experimented with switching the performance state very frequently, even between individual tokens. However, the whole thing didn't meet with much interest, which is why I didn't develop it any further. When there was no more feedback from users, I assumed that most of them had switched to nvidia-pstated in the meantime. For users with a small number of GPUs, or where the GPU does nothing 99% of the time, it probably makes sense because of the low complexity. From that point on, I have only developed gppm according to my personal needs. This has 1. led to the current architecture, which I find somewhat suboptimal, and 2. also meant that I have put little effort into the documentation and have neither finished half-done things like the GUI nor removed them. But I'll be happy to make up for that if it helps. Thanks again for the feedback.
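
As a minimal sketch of that observe-and-switch loop (not gppm's actual implementation): GPU activity can be polled with nvidia-smi, and the performance state switched via sasha0552's nvidia-pstate. The nvidia-pstate invocation shown below is an assumption; check its README for the real interface.

```bash
#!/bin/bash
# Sketch only: poll GPU 0 and switch its performance state based on activity.
busy=0
while true; do
  # utilization.gpu is a standard nvidia-smi query field (percent, no units)
  util=$(nvidia-smi -i 0 --query-gpu=utilization.gpu --format=csv,noheader,nounits)
  if [ "$util" -gt 0 ] && [ "$busy" -eq 0 ]; then
    nvidia-pstate -i 0 --state 16   # assumed CLI: hand control back to the driver (full speed under load)
    busy=1
  elif [ "$util" -eq 0 ] && [ "$busy" -eq 1 ]; then
    nvidia-pstate -i 0 --state 8    # assumed CLI: force low-power pstate P8 while idle
    busy=0
  fi
  sleep 0.5
done
```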

As far as the architecture of gppm is concerned, I can briefly outline the roadmap. Originally, gppm stood for "GPU Power and Performance Manager". However, I now also use it to launch and manage my entire stack. That's why I've renamed it: gppm is now a recursive acronym and stands for "gppm power process manager", just like GNU stands for "GNU's Not Unix". I don't like that so much, because I prefer a concrete tool for a concrete purpose. That's why I'm going to split this into two projects in the future, which can be used in combination or individually.

Israel-Laguan (Author) commented
I think your solution is solid for the time being. Maybe you can try to leave the project without rough edges and move on, but the project is good. It just needs some polish.
