Fix KV cache error in wildguard.py #2

Open
comfzy wants to merge 1 commit into main

Conversation


@comfzy commented on Jul 1, 2024

Running on an NVIDIA GeForce RTX 3090, model initialization fails with:

ValueError: The model's max seq len (32768) is larger than the maximum number of tokens that can be stored in KV cache (30448). Try increasing `gpu_memory_utilization` or decreasing `max_model_len` when initializing the engine.

This change caps `max_model_len` so the model fits within the available KV cache.
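
For reference, both knobs the error message mentions are constructor arguments on vLLM's `LLM` class. A minimal sketch of the two options (the model id below is an assumption used only for illustration; wildguard.py defines its own `MODEL_NAME`):

```python
from vllm import LLM

MODEL_NAME = "allenai/wildguard"  # assumed model id, for illustration only

# Option 1: give vLLM a larger share of GPU memory for the KV cache
# (gpu_memory_utilization defaults to 0.9).
model = LLM(model=MODEL_NAME, gpu_memory_utilization=0.95)

# Option 2: cap the context length so a full-length sequence fits in the
# KV cache that is actually available (30448 tokens in the error above).
model = LLM(model=MODEL_NAME, max_model_len=30448)
```
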
@kavelrao (Collaborator) left a comment

Thanks for submitting this fix! I hadn't run it on a 3090 before, so I didn't encounter this.

But I wonder if there's a more general way to address it by detecting the memory limit of the device instead of special-casing the RTX 3090? I bet other consumer GPUs would have the same issue.

@kavelrao (Collaborator) left a comment

A couple of comments; let me know if you'd rather I just implement this fix instead of going back and forth. Thank you for your patience!

wildguard/wildguard.py

self.model = LLM(model=MODEL_NAME)
gpu_name = torch.cuda.get_device_name(0)
if gpu_name == 'NVIDIA GeForce RTX 3090':
    self.model = LLM(model=MODEL_NAME,max_model_len=30448)

Suggested change:
- self.model = LLM(model=MODEL_NAME,max_model_len=30448)
+ self.model = LLM(model=MODEL_NAME, max_model_len=30448)

Comment on lines +182 to +183
gpu_name = torch.cuda.get_device_name(0)
if gpu_name == 'NVIDIA GeForce RTX 3090':

Suggested change:
- gpu_name = torch.cuda.get_device_name(0)
- if gpu_name == 'NVIDIA GeForce RTX 3090':
+ if torch.cuda.get_device_properties(0).total_memory < 30e9:

And to make this work with `ephemeral_model=True`, you will also need to add a `max_model_len` parameter to `subprocess_inference_with_vllm` and pass it through to `create_and_inference_with_vllm`.
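
In case it's useful, here is a rough sketch of that plumbing. The signatures below are assumptions for illustration (the real `subprocess_inference_with_vllm` and `create_and_inference_with_vllm` in wildguard.py take different arguments); the point is just threading an optional `max_model_len` from the outer call down to the `LLM(...)` constructor:

```python
import multiprocessing as mp
from typing import Optional

MODEL_NAME = "allenai/wildguard"  # assumed model id, for illustration only

def create_and_inference_with_vllm(prompts, max_model_len: Optional[int], queue):
    """Build the vLLM engine inside the child process and push results back (signature assumed)."""
    from vllm import LLM, SamplingParams  # imported here so the parent process never touches CUDA
    kwargs = {"model": MODEL_NAME}
    if max_model_len is not None:
        kwargs["max_model_len"] = max_model_len  # cap the context so the KV cache fits on smaller GPUs
    llm = LLM(**kwargs)
    outputs = llm.generate(prompts, SamplingParams(temperature=0.0, max_tokens=128))
    queue.put([o.outputs[0].text for o in outputs])

def subprocess_inference_with_vllm(prompts, max_model_len: Optional[int] = None):
    """Run inference in a child process so GPU memory is fully released afterwards (ephemeral_model=True path)."""
    ctx = mp.get_context("spawn")
    queue = ctx.Queue()
    proc = ctx.Process(target=create_and_inference_with_vllm, args=(prompts, max_model_len, queue))
    proc.start()
    results = queue.get()
    proc.join()
    return results
```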
