Ipex llm upstream #33

Status: Open. Wants to merge 42 commits into base: dev.

Commits (42):
c3e0fcf  Merge pull request #4927 from oobabooga/dev (oobabooga, Dec 15, 2023)
443be39  Merge pull request #4937 from oobabooga/dev (oobabooga, Dec 15, 2023)
7be0983  Merge pull request #4961 from oobabooga/dev (oobabooga, Dec 17, 2023)
b28020a  Merge pull request #4980 from oobabooga/dev (oobabooga, Dec 18, 2023)
781367b  Merge pull request #4988 from oobabooga/dev (oobabooga, Dec 19, 2023)
71eb744  Merge pull request #5002 from oobabooga/dev (oobabooga, Dec 19, 2023)
5b791ca  Merge pull request #5005 from oobabooga/dev (oobabooga, Dec 19, 2023)
c1f78db  Merge pull request #5011 from oobabooga/dev (oobabooga, Dec 20, 2023)
489f4a2  Merge pull request #5012 from oobabooga/dev (oobabooga, Dec 20, 2023)
11288d1  Merge pull request #5022 from oobabooga/dev (oobabooga, Dec 20, 2023)
4b25acf  Merge pull request #5039 from oobabooga/dev (oobabooga, Dec 21, 2023)
af87609  Merge pull request #5073 from oobabooga/dev (oobabooga, Dec 25, 2023)
19d1374  Merge pull request #5078 from oobabooga/dev (oobabooga, Dec 25, 2023)
3fd7073  Merge pull request #5100 from oobabooga/dev (oobabooga, Dec 27, 2023)
3e3a66e  Merge pull request #5132 from oobabooga/dev (oobabooga, Dec 31, 2023)
3f28925  Merge pull request #5152 from oobabooga/dev (oobabooga, Jan 2, 2024)
c54d1da  Merge pull request #5163 from oobabooga/dev (oobabooga, Jan 4, 2024)
8ea3f31  Merge pull request #5181 from oobabooga/dev (oobabooga, Jan 5, 2024)
e169993  Merge pull request #5195 from oobabooga/dev (oobabooga, Jan 7, 2024)
ad1ff53  Merge pull request #5199 from oobabooga/dev (oobabooga, Jan 7, 2024)
2dc8db8  Merge pull request #5220 from oobabooga/dev (oobabooga, Jan 10, 2024)
61e4bfe  Merge pull request #5253 from oobabooga/dev (oobabooga, Jan 14, 2024)
d8c3a5b  Merge pull request #5266 from oobabooga/dev (oobabooga, Jan 14, 2024)
1343aa3  Merge pull request #5347 from oobabooga/dev (oobabooga, Jan 22, 2024)
837bd88  Merge pull request #5348 from oobabooga/dev (oobabooga, Jan 22, 2024)
e7a760e  Merge pull request #5379 from oobabooga/dev (oobabooga, Jan 26, 2024)
4f3fdf1  Merge pull request #5404 from oobabooga/dev (oobabooga, Jan 30, 2024)
a329db0  Merge pull request #5452 from oobabooga/dev (oobabooga, Feb 6, 2024)
0f134bf  Merge pull request #5453 from oobabooga/dev (oobabooga, Feb 6, 2024)
dc6adef  Merge pull request #5496 from oobabooga/dev (oobabooga, Feb 14, 2024)
771c592  Merge pull request #5502 from oobabooga/dev (oobabooga, Feb 14, 2024)
dd46229  Merge pull request #5530 from oobabooga/dev (oobabooga, Feb 17, 2024)
7838075  Merge pull request #5534 from oobabooga/dev (oobabooga, Feb 17, 2024)
d6bb6e7  Merge pull request #5549 from oobabooga/dev (oobabooga, Feb 19, 2024)
ba85271  Merge pull request #5574 from oobabooga/dev (oobabooga, Feb 25, 2024)
60f3d87  Merge pull request #5617 from oobabooga/dev (oobabooga, Mar 3, 2024)
c53bafe  Add bigdl-llm loader to bigdl-upstream (#17) (chtanch, Mar 29, 2024)
d017a8f  Update style for upstream requests (#27) (hkvision, Apr 8, 2024)
58f4be0  Fix load_in_4bit and load_in_low_bit not taking effect in UI (#28) (hkvision, Apr 8, 2024)
7168290  Add dependency to requirements (#32) (hkvision, Apr 8, 2024)
babe8dd  Merge branch 'dev' into ipex-llm-upstream (hkvision, Apr 8, 2024)
1608c01  Remove some arguments (#35) (hkvision, Apr 9, 2024)
Changes from all commits:
7 changes: 7 additions & 0 deletions README.md
@@ -314,6 +314,13 @@ List of command-line flags
|-------------|-------------|
| `--hqq-backend` | Backend for the HQQ loader. Valid options: PYTORCH, PYTORCH_COMPILE, ATEN. |

#### IPEX-LLM

| Flag | Description |
|---------------------------------------|-------------|
| `--load-in-4bit` | Load the model to symmetric int4 precision with ipex-llm optimizations. |
| `--trust-remote-code` | Set `trust_remote_code=True` while loading the model. Necessary for some models. |

#### DeepSpeed

| Flag | Description |
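For reference, these flags are combined with the project's existing loader selection on the command line, e.g. `python server.py --loader ipex-llm --load-in-4bit --trust-remote-code`; this exact invocation is only an illustration and is not taken from the PR itself.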
5 changes: 5 additions & 0 deletions modules/loaders.py
@@ -148,6 +148,10 @@
        'hqq_backend',
        'trust_remote_code',
        'no_use_fast',
    ],
    'IPEX-LLM': [
        'load_in_4bit',
        'trust_remote_code',
    ]
})

@@ -203,6 +207,7 @@ def transformers_samplers():
    'AutoAWQ': transformers_samplers(),
    'QuIP#': transformers_samplers(),
    'HQQ': transformers_samplers(),
    'IPEX-LLM': transformers_samplers(),
    'ExLlamav2': {
        'temperature',
        'temperature_last',
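Because 'IPEX-LLM' is mapped to transformers_samplers(), the new loader exposes the same set of sampling and generation parameters in the UI as the stock Transformers loader, instead of declaring its own subset the way ExLlamav2 does.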
30 changes: 30 additions & 0 deletions modules/models.py
@@ -71,6 +71,7 @@ def load_model(model_name, loader=None):
        'AutoAWQ': AutoAWQ_loader,
        'QuIP#': QuipSharp_loader,
        'HQQ': HQQ_loader,
        'IPEX-LLM': ipex_llm_loader,
    }

    metadata = get_model_metadata(model_name)
@@ -376,6 +377,35 @@ def HQQ_loader(model_name):
    return model


def ipex_llm_loader(model_name):

    from ipex_llm.transformers import AutoModelForCausalLM, AutoModel, AutoModelForSeq2SeqLM

    path_to_model = Path(f'{shared.args.model_dir}/{model_name}')

    config = AutoConfig.from_pretrained(path_to_model, trust_remote_code=shared.args.trust_remote_code)

    if 'chatglm' in model_name.lower():
        LoaderClass = AutoModel
    else:
        if config.to_dict().get('is_encoder_decoder', False):
            LoaderClass = AutoModelForSeq2SeqLM
            shared.is_seq2seq = True
        else:
            LoaderClass = AutoModelForCausalLM

    model = LoaderClass.from_pretrained(
        path_to_model,
        load_in_4bit=shared.args.load_in_4bit,
        optimize_model=True,
        trust_remote_code=shared.args.trust_remote_code,
        use_cache=True)

    tokenizer = AutoTokenizer.from_pretrained(path_to_model, trust_remote_code=shared.args.trust_remote_code)

    return model, tokenizer


def get_max_memory_dict():
    max_memory = {}
    max_cpu_memory = shared.args.cpu_memory.strip() if shared.args.cpu_memory is not None else '99GiB'
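To make the loading path easier to follow, here is a minimal standalone sketch of what ipex_llm_loader above amounts to outside the web UI. The from_pretrained arguments mirror the diff; the model directory and prompt are placeholders, and the generation call assumes the usual Hugging Face interface:

from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

path_to_model = 'models/my-model'  # placeholder model directory

# Same arguments as ipex_llm_loader: symmetric int4 weights plus ipex-llm optimizations
model = AutoModelForCausalLM.from_pretrained(
    path_to_model,
    load_in_4bit=True,
    optimize_model=True,
    trust_remote_code=True,
    use_cache=True)

tokenizer = AutoTokenizer.from_pretrained(path_to_model, trust_remote_code=True)

# Standard Hugging Face generation interface
input_ids = tokenizer('What does IPEX-LLM do?', return_tensors='pt').input_ids
output_ids = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))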
8 changes: 7 additions & 1 deletion modules/shared.py
@@ -106,7 +106,7 @@

# bitsandbytes 4-bit
group = parser.add_argument_group('bitsandbytes 4-bit')
- group.add_argument('--load-in-4bit', action='store_true', help='Load the model with 4-bit precision (using bitsandbytes).')
+ group.add_argument('--load-in-4bit', action='store_true', help='Load the model with 4-bit precision (using bitsandbytes or ipex-llm).')

A review comment on this line:

"using bitsandbytes or ipex-llm" is a bit confusing. Should we tell the user in which case to use bitsandbytes and in which case ipex-llm?

Collaborator Author's reply:

There is no separate case to distinguish: the flag takes effect whichever of bitsandbytes or ipex-llm is loading the model.

group.add_argument('--use_double_quant', action='store_true', help='use_double_quant for 4-bit.')
group.add_argument('--compute_dtype', type=str, default='float16', help='compute dtype for 4-bit. Valid options: bfloat16, float16, float32.')
group.add_argument('--quant_type', type=str, default='nf4', help='quant_type for 4-bit. Valid options: nf4, fp4.')
@@ -165,6 +165,10 @@
group = parser.add_argument_group('HQQ')
group.add_argument('--hqq-backend', type=str, default='PYTORCH_COMPILE', help='Backend for the HQQ loader. Valid options: PYTORCH, PYTORCH_COMPILE, ATEN.')

# IPEX-LLM
# --load-in-4bit is the same as bitsandbytes 4-bit
# --trust-remote-code is the same as Transformers

# DeepSpeed
group = parser.add_argument_group('DeepSpeed')
group.add_argument('--deepspeed', action='store_true', help='Enable the use of DeepSpeed ZeRO-3 for inference via the Transformers integration.')
@@ -263,6 +267,8 @@ def fix_loader_name(name):
        return 'QuIP#'
    elif name in ['hqq']:
        return 'HQQ'
    elif name in ['IPEX-LLM', 'ipex-llm']:
        return 'IPEX-LLM'


def add_extension(name, last=False):
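As a quick illustration of the name normalization added above, both spellings resolve to the canonical loader name; this is a hypothetical check, not part of the diff, and assumes modules.shared can be imported standalone:

from modules.shared import fix_loader_name

# Both the lowercase CLI spelling and the display name map to 'IPEX-LLM'
assert fix_loader_name('ipex-llm') == 'IPEX-LLM'
assert fix_loader_name('IPEX-LLM') == 'IPEX-LLM'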
2 changes: 2 additions & 0 deletions modules/text_generation.py
@@ -132,6 +132,8 @@ def encode(prompt, add_special_tokens=True, add_bos_token=True, truncation_lengt
        return input_ids
    elif shared.args.deepspeed:
        return input_ids.to(device=local_rank)
    elif shared.args.loader == 'IPEX-LLM':
        return input_ids
    elif torch.backends.mps.is_available():
        device = torch.device('mps')
        return input_ids.to(device)
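The new branch mirrors the existing CPU and DeepSpeed cases: the input ids are returned without an explicit .to(device) call, presumably because ipex-llm manages device placement for its optimized models itself.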
2 changes: 1 addition & 1 deletion modules/ui_model_menu.py
@@ -112,7 +112,7 @@ def create_ui():

    with gr.Column():
        shared.gradio['load_in_8bit'] = gr.Checkbox(label="load-in-8bit", value=shared.args.load_in_8bit)
-       shared.gradio['load_in_4bit'] = gr.Checkbox(label="load-in-4bit", value=shared.args.load_in_4bit)
+       shared.gradio['load_in_4bit'] = gr.Checkbox(label="load-in-4bit", value=shared.args.load_in_4bit, info="Load the model with 4-bit precision.")
        shared.gradio['use_double_quant'] = gr.Checkbox(label="use_double_quant", value=shared.args.use_double_quant)
        shared.gradio['use_flash_attention_2'] = gr.Checkbox(label="use_flash_attention_2", value=shared.args.use_flash_attention_2, info='Set use_flash_attention_2=True while loading the model.')
        shared.gradio['auto_devices'] = gr.Checkbox(label="auto-devices", value=shared.args.auto_devices)
2 changes: 2 additions & 0 deletions requirements.txt
@@ -26,6 +26,8 @@ tensorboard
transformers==4.39.*
tqdm
wandb
py-cpuinfo
ipex-llm

# API
SpeechRecognition==3.10.0
2 changes: 2 additions & 0 deletions requirements_cpu_only.txt
@@ -24,6 +24,8 @@ tensorboard
transformers==4.39.*
tqdm
wandb
py-cpuinfo
ipex-llm

# API
SpeechRecognition==3.10.0
2 changes: 2 additions & 0 deletions requirements_cpu_only_noavx2.txt
@@ -24,6 +24,8 @@ tensorboard
transformers==4.39.*
tqdm
wandb
py-cpuinfo
ipex-llm

# API
SpeechRecognition==3.10.0
2 changes: 2 additions & 0 deletions requirements_noavx2.txt
@@ -26,6 +26,8 @@ tensorboard
transformers==4.39.*
tqdm
wandb
py-cpuinfo
ipex-llm

# API
SpeechRecognition==3.10.0
2 changes: 2 additions & 0 deletions requirements_nowheels.txt
@@ -24,6 +24,8 @@ tensorboard
transformers==4.39.*
tqdm
wandb
py-cpuinfo
ipex-llm

# API
SpeechRecognition==3.10.0