huggingface / text-generation-inference Public

Issues: huggingface/text-generation-inference

131 Open 1,236 Closed

Issues list

TGI auto max token not working for Llama-3 models

#2787 opened Nov 29, 2024 by SeEngel

4 tasks

"sharded is not supported for AutoModel" Error When Deploying SageMaker Endpoint For Qwen 2.5 7B Trained via SageMaker

#2783 opened Nov 26, 2024 by jjbuck

2 of 4 tasks

Potential Qwen/Qwen2-VL-7B-Instruct issue

#2781 opened Nov 26, 2024 by maxjeblick

2 of 4 tasks

I encountered the same issue while using baichuan2-13B-chat.

#2780 opened Nov 26, 2024 by Lacacy

Triton Error [CUDA]

#2776 opened Nov 25, 2024 by paulcx

2 of 4 tasks

"RuntimeError: weight lm_head.weight does not exist" quantizing Llama-3.2-11B-Vision-Instruct

#2775 opened Nov 22, 2024 by akowalsk

2 of 4 tasks

Latest Docker Image failing for A40 GPU

#2763 opened Nov 20, 2024 by SMAntony

2 of 4 tasks

The same model, but different loading methods will result in very different inference speeds?

#2757 opened Nov 19, 2024 by hjs2027864933

2 of 4 tasks

Regression in 2.4.0: Input Validation errors return code 200 and do not return the error message

#2749 opened Nov 15, 2024 by leonarddls

2 of 4 tasks

On-The-Fly Quantization for Inference appears not to be working as per documentation.

#2748 opened Nov 15, 2024 by colin-byrneireland1

1 of 4 tasks

Different inference results and speed between /generate and OpenAI endpoint

#2747 opened Nov 14, 2024 by jegork

2 of 4 tasks

CUDA OutOfMemory even after warmup phase succeeded

#2744 opened Nov 13, 2024 by martinigoyanes

Support for Falcon-Mamba-7B

#2736 opened Nov 10, 2024 by mokeddembillel

1 of 2 tasks

In dev mode, server is stuck at Server started at unix:///tmp/text-generation-server-0

#2735 opened Nov 10, 2024 by mokeddembillel

2 of 4 tasks

Failed to build vllm in local install

#2734 opened Nov 9, 2024 by mokeddembillel

2 of 4 tasks

Bi-gram Repetition Penalty for the TGI configuration

#2731 opened Nov 7, 2024 by mertege

launch TGI with the argument --max-input-tokens smaller than sliding_window=4096 (got here max_input_tokens=16384)

#2730 opened Nov 7, 2024 by ashwincv0112

1 of 4 tasks

device-side assert triggered when trying to use LLaMA 3.2 Vision with grammar

#2729 opened Nov 6, 2024 by SokolAnn

2 of 4 tasks

TGI crashes while loading Qwen2-VL-7B-Instruct

#2728 opened Nov 6, 2024 by ktobah

2 of 4 tasks

Unable to load/run LoRA Adapters on llama - 7B

#2727 opened Nov 5, 2024 by kaushikmitr

Python client: Pydantic protected namespace "model_"

#2722 opened Nov 4, 2024 by Simon-Stone

4 tasks

FlashLlamaForCausalLM's using name dense for its mlp submodule causes error when using LoRA adapter

#2715 opened Nov 2, 2024 by sadra-barikbin

detokenize

#2705 opened Oct 29, 2024 by oroojlooy

CUDA Error: No kernel image is available for execution on the device

#2703 opened Oct 28, 2024 by shubhamgajbhiye1994

2 of 4 tasks

Is there a way to define "bad_words"?

#2700 opened Oct 28, 2024 by tonylek
