I am using this code to fine-tune Llama 3 70B on AWS GPUs. We use BitsAndBytesConfig to quantize the model weights and load them in 4-bit, but the output reports quant_storage_dtype = torch.bfloat16.
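For context, a BitsAndBytesConfig along these lines would produce that output (a minimal sketch; the nf4 quant type and the bfloat16 compute dtype are assumptions, not copied from the actual training script):

import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # weights are packed as 4-bit values
    bnb_4bit_quant_type="nf4",              # assumed quant type
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumed compute dtype for matmuls
    bnb_4bit_quant_storage=torch.bfloat16,  # container dtype for the packed 4-bit weights
)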
When I then try to merge the LoRA weights with the base model later:
import torch
from peft import AutoPeftModelForCausalLM

# Load the PEFT model on CPU
model = AutoPeftModelForCausalLM.from_pretrained(
    '/my-checkpoint-40',
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)

# Merge the LoRA adapters into the base model
merged_model = model.merge_and_unload()

# Double-check whether quantization is still in effect
for name, param in merged_model.named_parameters():
    print(name, param.dtype, param.shape)  # dtype and shape of each parameter
Each parameter is reported as torch.bfloat16. The safetensors of the fine-tuned model add up to 118 GB, while Llama 3 70B is 127 GB, so the fine-tuned model is only about 7% smaller.
Maybe a weird question, but: is this model quantized? Is it semi-quantized? Should I quantize it further to reduce the size even more? (I need a smaller model because of the GPUs I have.) The quant_storage_dtype = torch.bfloat16 confuses me a bit.
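If it is not actually quantized anymore, I assume I would re-quantize the merged checkpoint when loading it for inference, something like this (a minimal sketch; '/my-merged-model' is a placeholder path, not my real checkpoint):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# '/my-merged-model' stands in for wherever merged_model was saved
quantized = AutoModelForCausalLM.from_pretrained(
    '/my-merged-model',
    quantization_config=bnb_config,
    device_map="auto",
)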