DeepSeek model conversion issue #327

bao-xiaoyi · 2024-08-25T07:01:46Z

The model size changes a lot after conversion (30G -> 53G). Is something wrong?

jerryli1981 · 2024-08-27T03:26:21Z

Hello, please share the exact command you ran so we can try to reproduce it.

bao-xiaoyi · 2024-08-28T08:27:23Z

cd Pai-Megatron-Patch/toolkits/model_checkpoints_convertor/deepseek
bash hf2mcore_deepseek_v2_moe_convertor.sh \
A2.4B \
open_llm/DeepSeek-Coder-V2-Lite-Instruct \
output_model/deekseek-coder-V2-lite-instruction-mg \
2 \
1 \
4 \
false

aeeeeeep · 2024-08-30T13:51:13Z

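DeepSeek's released weights are bfloat16, but Hugging Face converts them to float32 on load, so the space they occupy doubles. To load them in the original dtype, change the loading code as follows; the original dtype is the torch_dtype value in config.json in the weights directory.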

        config = AutoConfig.from_pretrained(args.load)
        hf_model = AutoModelForCausalLM.from_pretrained(args.load, trust_remote_code=True, torch_dtype=config.torch_dtype)

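Alternatively, add the --bf16/--fp16 flag. In my experiments, though, the precision loss was not small, so I do not recommend using this flag to control the dtype.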
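The size argument above can be checked with a little arithmetic. The following is a minimal sketch using only the standard library, not part of the converter script: `read_torch_dtype` and `weight_bytes` are hypothetical helper names, the temporary directory stands in for a real weights directory, and the parameter count is illustrative rather than a real model size.

```python
import json
import tempfile
from pathlib import Path

# Bytes per element for the dtypes discussed in this thread.
BYTES_PER_ELEMENT = {"float32": 4, "bfloat16": 2, "float16": 2}


def read_torch_dtype(model_dir: str) -> str:
    """Return the torch_dtype string stored in <model_dir>/config.json."""
    text = (Path(model_dir) / "config.json").read_text(encoding="utf-8")
    return json.loads(text)["torch_dtype"]


def weight_bytes(num_params: int, dtype: str) -> int:
    """Raw bytes needed to store num_params values of the given dtype."""
    return num_params * BYTES_PER_ELEMENT[dtype]


if __name__ == "__main__":
    # Hypothetical weights directory containing a minimal config.json.
    with tempfile.TemporaryDirectory() as tmp:
        config_path = Path(tmp) / "config.json"
        config_path.write_text(json.dumps({"torch_dtype": "bfloat16"}), encoding="utf-8")
        stored = read_torch_dtype(tmp)

    num_params = 1_000_000  # illustrative count, not a real model size
    ratio = weight_bytes(num_params, "float32") / weight_bytes(num_params, stored)
    print(f"stored as {stored}; the same weights in float32 take {ratio:.0f}x the bytes")
```

This only accounts for raw tensor bytes. File-format overhead and any extra tensors written by the converter are outside what it models.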

bao-xiaoyi · 2024-09-02T01:58:19Z

Thanks for the answer, but why would loading in the original dtype cause precision loss?

aeeeeeep · 2024-09-02T02:22:12Z

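If you pass --bf16/--fp16, the code first loads the weights with the automatic conversion to fp32 and then calls model.bfloat16()/float16() during conversion, which loses precision. For the exact code, see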
https://github.com/aeeeeeep/Pai-Megatron-Patch/blob/ad0b25d217df8ae1d6b0f67d860c8edaf7863e14/toolkits/model_checkpoints_convertor/deepseek/hf2mcore_deepseek_v2_moe.py#L219-L225
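To get a feel for what a cast from fp32 down to a 16-bit type does to individual values, here is a small sketch that emulates the two formats on single scalars with the standard `struct` module. It is an assumption-laden illustration, not the converter's code: `to_bfloat16` and `to_float16` are hypothetical helpers, rounding is round-to-nearest-even, and it does not try to reproduce the loss measured in the thread.

```python
import struct


def to_bfloat16(x: float) -> float:
    """Round x to the nearest bfloat16 value (ties to even), returned as a Python float."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps the top 16 bits of the float32 pattern; round the dropped half.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (((bits + rounding_bias) >> 16) << 16) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def to_float16(x: float) -> float:
    """Round x to the nearest IEEE half-precision value, returned as a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


if __name__ == "__main__":
    # A value that starts out as bfloat16, as the released weights do.
    w = to_bfloat16(0.1)
    print("bf16 value:", w)
    print("bf16 -> fp32 -> bf16 unchanged:", to_bfloat16(w) == w)
    print("bf16 -> fp32 -> fp16 unchanged:", to_float16(w) == w)

    # float16 has a narrower exponent range than bfloat16, so very small
    # magnitudes that bfloat16 can hold are flushed to zero by the fp16 cast.
    tiny = to_bfloat16(1e-10)
    print("tiny bf16 value:", tiny, "-> as fp16:", to_float16(tiny))
```

In this scalar model the fp16 cast is where values change, because float16 cannot represent magnitudes that bfloat16 can. Whether that accounts for the loss reported above would need to be checked against the actual tensors.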

jerryli1981 · 2024-09-02T06:31:03Z

Hello, would it be convenient for you to get on DingTalk and add us there, so we can go over this issue together?

tzyodear · 2024-09-04T01:20:10Z

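I ran into the same thing when converting qwen2-7B: the HF weights are 15G and the Megatron weights are 36G. At first I thought it was because HF uses safetensor, but then I found that the converted Megatron weights are not contiguous and cannot be saved directly with safetensor. If I make them contiguous first, saving with torch.save also takes less space. So I suspect the cause may be that the converted Megatron weight tensors are not contiguous.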

aeeeeeep added a commit to aeeeeeep/Pai-Megatron-Patch that referenced this issue Aug 30, 2024

fix precision issues in convert script (see alibaba#327)

a70a23c

aeeeeeep added a commit to aeeeeeep/Pai-Megatron-Patch that referenced this issue Aug 30, 2024

fix precision issues in convert script (alibaba#327)

ad0b25d

jerryli1981 pushed a commit that referenced this issue Sep 4, 2024

fix precision issues in convert script (#327) (#331)

a1fa06f
