diff --git a/README.md b/README.md
index 0bbe317..3aa8f48 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-[**🇨🇳中文**](./README.md) | [**🌐English**](./README_EN.md) | [**📖文档/Docs**](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki) | [**❓提问/Issues**](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/issues) | [**💬讨论/Discussions**](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/discussions) | [**⚔️竞技场/Arena**](http://chinese-alpaca-arena.ymcui.com/)
+[**🇨🇳中文**](./README.md) | [**🌐English**](./README_EN.md) | [**📖文档/Docs**](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki) | [**❓提问/Issues**](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/issues) | [**💬讨论/Discussions**](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/discussions) | [**⚔️竞技场/Arena**](http://llm-arena.ymcui.com/)
@@ -13,7 +13,7 @@
-本项目基于Meta发布的可商用大模型[Llama-2](https://github.com/facebookresearch/llama)开发,是[中文LLaMA&Alpaca大模型](https://github.com/ymcui/Chinese-LLaMA-Alpaca)的第二期项目,开源了**中文LLaMA-2基座模型和Alpaca-2指令精调大模型**。这些模型**在原版Llama-2的基础上扩充并优化了中文词表**,使用了大规模中文数据进行增量预训练,进一步提升了中文基础语义和指令理解能力,相比一代相关模型获得了显著性能提升。相关模型**支持4K上下文并可通过NTK方法最高扩展至18K+。**
+本项目基于Meta发布的可商用大模型[Llama-2](https://github.com/facebookresearch/llama)开发,是[中文LLaMA&Alpaca大模型](https://github.com/ymcui/Chinese-LLaMA-Alpaca)的第二期项目,开源了**中文LLaMA-2基座模型和Alpaca-2指令精调大模型**。这些模型**在原版Llama-2的基础上扩充并优化了中文词表**,使用了大规模中文数据进行增量预训练,进一步提升了中文基础语义和指令理解能力,相比一代相关模型获得了显著性能提升。相关模型**支持FlashAttention-2训练**,**支持4K上下文并可通过NTK方法最高扩展至18K+。**
**本项目主要内容:**
@@ -21,18 +21,20 @@
- 🚀 开源了预训练脚本、指令精调脚本,用户可根据需要进一步训练模型
- 🚀 使用个人电脑的CPU/GPU快速在本地进行大模型量化和部署体验
- 🚀 支持[🤗transformers](https://github.com/huggingface/transformers), [llama.cpp](https://github.com/ggerganov/llama.cpp), [text-generation-webui](https://github.com/oobabooga/text-generation-webui), [LangChain](https://github.com/hwchase17/langchain), [privateGPT](https://github.com/imartinez/privateGPT), [vLLM](https://github.com/vllm-project/vllm)等LLaMA生态
-- 目前已开源的模型:Chinese-LLaMA-2-7B, Chinese-Alpaca-2-7B (更大的模型可先参考[一期项目](https://github.com/ymcui/Chinese-LLaMA-Alpaca))
+- 目前已开源的模型:Chinese-LLaMA-2(7B/13B), Chinese-Alpaca-2(7B/13B)(更大的模型可先参考[一期项目](https://github.com/ymcui/Chinese-LLaMA-Alpaca))
![](./pics/screencast.gif)
----
-[多模态中文LLaMA&Alpaca大模型](https://github.com/airaria/Visual-Chinese-LLaMA-Alpaca) | [多模态VLE](https://github.com/iflytek/VLE) | [中文MiniRBT](https://github.com/iflytek/MiniRBT) | [中文LERT](https://github.com/ymcui/LERT) | [中英文PERT](https://github.com/ymcui/PERT) | [中文MacBERT](https://github.com/ymcui/MacBERT) | [中文ELECTRA](https://github.com/ymcui/Chinese-ELECTRA) | [中文XLNet](https://github.com/ymcui/Chinese-XLNet) | [中文BERT](https://github.com/ymcui/Chinese-BERT-wwm) | [知识蒸馏工具TextBrewer](https://github.com/airaria/TextBrewer) | [模型裁剪工具TextPruner](https://github.com/airaria/TextPruner) | [蒸馏裁剪一体化GRAIN](https://github.com/airaria/GRAIN)
+[中文LLaMA&Alpaca大模型](https://github.com/ymcui/Chinese-LLaMA-Alpaca) | [多模态中文LLaMA&Alpaca大模型](https://github.com/airaria/Visual-Chinese-LLaMA-Alpaca) | [多模态VLE](https://github.com/iflytek/VLE) | [中文MiniRBT](https://github.com/iflytek/MiniRBT) | [中文LERT](https://github.com/ymcui/LERT) | [中英文PERT](https://github.com/ymcui/PERT) | [中文MacBERT](https://github.com/ymcui/MacBERT) | [中文ELECTRA](https://github.com/ymcui/Chinese-ELECTRA) | [中文XLNet](https://github.com/ymcui/Chinese-XLNet) | [中文BERT](https://github.com/ymcui/Chinese-BERT-wwm) | [知识蒸馏工具TextBrewer](https://github.com/airaria/TextBrewer) | [模型裁剪工具TextPruner](https://github.com/airaria/TextPruner) | [蒸馏裁剪一体化GRAIN](https://github.com/airaria/GRAIN)
## 新闻
-**[2023/08/02] 添加FlashAttention-2训练支持,基于vLLM的推理加速支持,提供长回复系统提示语模板等。详情查看[📚 v1.1版本发布日志](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/releases/tag/v1.1)**
+**[2023/08/14] 发布Chinese-LLaMA-2-13B和Chinese-Alpaca-2-13B,添加text-generation-webui/LangChain/privateGPT支持,添加CFG Sampling解码方法等。详情查看[📚 v2.0版本发布日志](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/releases/tag/v2.0)**
+
+[2023/08/02] 添加FlashAttention-2训练支持,基于vLLM的推理加速支持,提供长回复系统提示语模板等。详情查看[📚 v1.1版本发布日志](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/releases/tag/v1.1)
[2023/07/31] 正式发布Chinese-LLaMA-2-7B(基座模型),使用120G中文语料增量训练(与一代Plus系列相同);进一步通过5M条指令数据精调(相比一代略微增加),得到Chinese-Alpaca-2-7B(指令/chat模型)。详情查看[📚 v1.0版本发布日志](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/releases/tag/v1.0)
@@ -54,24 +56,24 @@
本项目推出了基于Llama-2的中文LLaMA-2以及Alpaca-2系列模型,相比[一期项目](https://github.com/ymcui/Chinese-LLaMA-Alpaca)其主要特点如下:
-**📖 经过优化的中文词表**
+#### 📖 经过优化的中文词表
-- 在[一期项目](https://github.com/ymcui/Chinese-LLaMA-Alpaca)中,我们针对一代LLaMA模型的32K词表扩展了中文字词(LLaMA:49953,Alpaca:49954),以期进一步提升模型对中文文本的编解码效率
-- 在本项目中,我们**重新设计了新词表**(大小:55296),进一步提升了中文字词的覆盖程度,同时统一了LLaMA/Alpaca的词表,避免了因混用词表带来的问题
+- 在[一期项目](https://github.com/ymcui/Chinese-LLaMA-Alpaca)中,我们针对一代LLaMA模型的32K词表扩展了中文字词(LLaMA:49953,Alpaca:49954)
+- 在本项目中,我们**重新设计了新词表**(大小:55296),进一步提升了中文字词的覆盖程度,同时统一了LLaMA/Alpaca的词表,避免了因混用词表带来的问题,以期进一步提升模型对中文文本的编解码效率
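下面是一个最小的示意代码(非本项目自带脚本;假设已安装 transformers 与 sentencepiece,且可从 Hugging Face 下载后文下载表中的 ziqingyang/chinese-llama-2-7b),用于查看二代词表大小并观察中文文本的切分粒度:

```python
from transformers import LlamaTokenizer

# 加载二代模型的分词器(二代 LLaMA/Alpaca 共用同一词表)
tokenizer = LlamaTokenizer.from_pretrained("ziqingyang/chinese-llama-2-7b")
print(len(tokenizer))  # 预期为 55296,即上文所述的新词表大小

# 观察中文文本的切分:相比原版 Llama-2 的 32K 词表,中文 token 数更少、编解码效率更高
text = "人工智能是计算机科学的一个分支。"
ids = tokenizer(text, add_special_tokens=False)["input_ids"]
print(len(ids), tokenizer.convert_ids_to_tokens(ids))
```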
-**⚡ 基于FlashAttention-2的高效注意力**
+#### ⚡ 基于FlashAttention-2的高效注意力
- [FlashAttention-2](https://github.com/Dao-AILab/flash-attention)是高效注意力机制的一种实现,相比其一代技术具有**更快的速度和更优化的显存占用**
- 当上下文长度更长时,为了避免显存爆炸式的增长,使用此类高效注意力技术尤为重要
- 本项目的所有模型均使用了FlashAttention-2技术进行训练
-**🚄 基于NTK的自适应上下文扩展技术**
+#### 🚄 基于NTK的自适应上下文扩展技术
- 在[一期项目](https://github.com/ymcui/Chinese-LLaMA-Alpaca)中,我们实现了[基于NTK的上下文扩展技术](https://github.com/ymcui/Chinese-LLaMA-Alpaca/pull/743),可在不继续训练模型的情况下支持更长的上下文
- 在上述基础上,我们进一步设计了**方便的自适应经验公式**,无需针对不同的上下文长度设置相应超参
- 本项目模型原生支持4K上下文,利用上述技术可扩展至12K,并最高支持扩展至18K+(精度有一定损失)
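下面给出一段示意代码(仅作说明,不代表本项目脚本的具体实现;假设安装了 transformers>=4.31 与 accelerate),展示如何利用 🤗transformers 内置的 `rope_scaling` 动态 NTK 选项在推理时扩展上下文;本项目的“自适应经验公式”请以上文链接的 PR 与官方 Wiki 为准:

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM, LlamaTokenizer

name = "ziqingyang/chinese-alpaca-2-7b"

# 动态 NTK 缩放:上下文超过原生长度(4K)时按比例调整 RoPE,factor=2.0 仅为示意取值
config = AutoConfig.from_pretrained(name)
config.rope_scaling = {"type": "dynamic", "factor": 2.0}

tokenizer = LlamaTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name, config=config, torch_dtype=torch.float16, device_map="auto"
)
# 此后即可在长于 4K 的输入上推理;可扩展到的长度与精度损失以上文说明为准
```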
-**🤖 简化的中英双语系统提示语**
+#### 🤖 简化的中英双语系统提示语
- 在[一期项目](https://github.com/ymcui/Chinese-LLaMA-Alpaca)中,中文Alpaca系列模型使用了[Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca)的指令模板和系统提示语
- 初步实验发现,Llama-2-Chat系列模型的默认系统提示语未能带来统计显著的性能提升,且其内容过于冗长
@@ -82,48 +84,59 @@
### 模型选择指引
-下面是中文LLaMA-2和Alpaca-2模型的基本对比以及建议使用场景。
+下面是中文LLaMA-2和Alpaca-2模型的基本对比以及建议使用场景。**如需和模型聊天交互,请选择Alpaca而不是LLaMA。**
| 对比项 | 中文LLaMA-2 | 中文Alpaca-2 |
-| :-------------------- | :----------------------------------------------------- | :----------------------------------------------------------- |
-| 训练方式 | 传统CLM | 指令精调 |
+| :-------------------- | :----------------------------------------------------: | :----------------------------------------------------------: |
| 模型类型 | **基座模型** | **指令/Chat模型(类ChatGPT)** |
+| 已开源大小 | 7B、13B | 7B、13B |
+| 训练类型 | Causal-LM (CLM) | 指令精调 |
+| 训练方式 | LoRA + 全量emb/lm-head | LoRA + 全量emb/lm-head |
+| 基于什么模型训练 | [原版Llama-2](https://github.com/facebookresearch/llama) | 中文LLaMA-2 |
| 训练语料 | 无标注通用语料 | 有标注指令数据 |
| 词表大小[1] | 55,296 | 55,296 |
-| 输入模板 | 不需要 | 需要套用特定模板[2],类似Llama-2-Chat |
+| 上下文长度[2] | 4K (12K-18K) | 4K (12K-18K) |
+| 输入模板 | 不需要 | 需要套用特定模板[3],类似Llama-2-Chat |
| 适用场景 | 文本续写:给定上文,让模型生成下文 | 指令理解:问答、写作、聊天、交互等 |
| 不适用场景 | 指令理解 、多轮聊天等 | 文本无限制自由生成 |
-[1] *本项目一代模型和二代模型的词表不同,请勿混用。二代LLaMA和Alpaca的词表相同。*
-[2] *Alpaca-2采用了Llama-2-chat系列模板(格式相同,提示语不同),而不是一代Alpaca的模板,请勿混用。*
+> [!NOTE]
+> [1] *本项目一代模型和二代模型的词表不同,请勿混用。二代LLaMA和Alpaca的词表相同。*
+> [2] *括号内表示基于NTK上下文扩展支持的最大长度。*
+> [3] *Alpaca-2采用了Llama-2-chat系列模板(格式相同,提示语不同),而不是一代Alpaca的模板,请勿混用。*
### 完整模型下载
以下是完整版模型,直接下载即可使用,无需其他合并步骤。推荐网络带宽充足的用户。
| 模型名称 | 类型 | 训练数据 | 大小 | 下载地址 |
-| :------------------------ | :------: | :------: | :----------------: | :----------------------------------------------------------: |
-| Chinese-LLaMA-2-7B | 基座模型 | 120G通用文本 | 13GB | [[百度网盘]](https://pan.baidu.com/s/1E5NI3nlQpx1j8z3eIzbIlg?pwd=n8k3)<br/>[[Google Drive]](https://drive.google.com/drive/folders/18pp4I-mvQxRA7b8vF9gP-2cH_ocnXVKh?usp=share_link)<br/>[[HuggingFace]](https://huggingface.co/ziqingyang/chinese-llama-2-7b) |
-| Chinese-Alpaca-2-7B | 指令模型 | 5M条指令 | 13GB | [[百度网盘]](https://pan.baidu.com/s/1wxx-CdgbMupXVRBcaN4Slw?pwd=kpn9)<br/>[[Google Drive]](https://drive.google.com/drive/folders/1JsJDVs7tE2y31PBNleBlDPsB7S0ZrY8d?usp=share_link)<br/>[[HuggingFace]](https://huggingface.co/ziqingyang/chinese-alpaca-2-7b) |
+| :------------------------ | :------: | :------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
+| Chinese-LLaMA-2-13B 🆕 | 基座模型 | 120GB通用文本 | 24.7 GB | [[百度]](https://pan.baidu.com/s/1T3RqEUSmyg6ZuBwMhwSmoQ?pwd=e9qy) [[Google]](https://drive.google.com/drive/folders/1YNa5qJ0x59OEOI7tNODxea-1YvMPoH05?usp=share_link) [[🤗HF]](https://huggingface.co/ziqingyang/chinese-llama-2-13b) |
+| Chinese-LLaMA-2-7B | 基座模型 | 120GB通用文本 | 12.9 GB | [[百度]](https://pan.baidu.com/s/1E5NI3nlQpx1j8z3eIzbIlg?pwd=n8k3) [[Google]](https://drive.google.com/drive/folders/18pp4I-mvQxRA7b8vF9gP-2cH_ocnXVKh?usp=share_link) [[🤗HF]](https://huggingface.co/ziqingyang/chinese-llama-2-7b) |
+| Chinese-Alpaca-2-13B 🆕 | 指令模型 | 5M条指令 | 24.7 GB | [[百度]](https://pan.baidu.com/s/1MT_Zlap1OtdYMgoBNTS3dg?pwd=9xja) [[Google]](https://drive.google.com/drive/folders/1MTsKlzR61xmbTR4hBWzQas_MOpUZsogN?usp=share_link) [[🤗HF]](https://huggingface.co/ziqingyang/chinese-alpaca-2-13b) |
+| Chinese-Alpaca-2-7B | 指令模型 | 5M条指令 | 12.9 GB | [[百度]](https://pan.baidu.com/s/1wxx-CdgbMupXVRBcaN4Slw?pwd=kpn9) [[Google]](https://drive.google.com/drive/folders/1JsJDVs7tE2y31PBNleBlDPsB7S0ZrY8d?usp=share_link) [[🤗HF]](https://huggingface.co/ziqingyang/chinese-alpaca-2-7b) |
### LoRA模型下载
-以下是LoRA模型,与上述完整模型一一对应。需要注意的是**LoRA模型无法直接使用**,必须按照教程与重构模型进行合并。推荐网络带宽不足,手头有原版Llama-2且需要轻量下载的用户。
+以下是LoRA模型(含emb/lm-head),与上述完整模型一一对应。需要注意的是**LoRA模型无法直接使用**,必须按照教程与重构模型进行合并。推荐网络带宽不足、手头有原版Llama-2且需要轻量下载的用户。
| 模型名称 | 类型 | 训练数据 | 重构模型 | 大小 | LoRA下载地址 |
| :------------------------ | :------: | :------: | :--------------------------------------------------------: | :----------------: | :----------------------------------------------------------: |
-| Chinese-LLaMA-2-LoRA-7B | 基座模型 | 120G通用文本 | [Llama-2-7B-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) | 1.1GB | [[百度网盘]](https://pan.baidu.com/s/1bmgqdyRh9E3a2uqOGyNqiQ?pwd=7kvq)<br/>[[Google Drive]](https://drive.google.com/file/d/1njJGSU_PRbzjYRNw5RSbC5-4fBOXTVY3/view?usp=share_link)<br/>[[HuggingFace]](https://huggingface.co/ziqingyang/chinese-llama-2-lora-7b) |
-| Chinese-Alpaca-2-LoRA-7B | 指令模型 | 5M条指令 | [Llama-2-7B-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) | 1.1GB | [[百度网盘]](https://pan.baidu.com/s/1g0olPxkB_rlZ9UUVfOnbcw?pwd=5e7w)<br/>[[Google Drive]](https://drive.google.com/file/d/1MzJL-ZIzdJW7MIcAiYIDIDJ5dlMi8Kkk/view?usp=share_link)<br/>[[HuggingFace]](https://huggingface.co/ziqingyang/chinese-alpaca-2-lora-7b) |
-
-由于LoRA模型无法单独使用,必须与原版Llama-2进行合并才能转为完整模型,以便进行模型推理、量化或者进一步训练。请选择以下方法对模型进行转换合并。
+| Chinese-LLaMA-2-LoRA-13B 🆕 | 基座模型 | 120GB通用文本 | [Llama-2-13B-hf](https://huggingface.co/meta-llama/Llama-2-13b-hf) | 1.5 GB | [[百度]](https://pan.baidu.com/s/1PFKTBn54GjAjzWeQISKruw?pwd=we6s) [[Google]](https://drive.google.com/file/d/10Z_k9A9N9D_6RHrMTmbHQRCuI6s1iMb1/view?usp=share_link) [[🤗HF]](https://huggingface.co/ziqingyang/chinese-llama-2-lora-13b) |
+| Chinese-LLaMA-2-LoRA-7B | 基座模型 | 120GB通用文本 | [Llama-2-7B-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) | 1.1 GB | [[百度]](https://pan.baidu.com/s/1bmgqdyRh9E3a2uqOGyNqiQ?pwd=7kvq) [[Google]](https://drive.google.com/file/d/1njJGSU_PRbzjYRNw5RSbC5-4fBOXTVY3/view?usp=share_link) [[🤗HF]](https://huggingface.co/ziqingyang/chinese-llama-2-lora-7b) |
+| Chinese-Alpaca-2-LoRA-13B 🆕 | 指令模型 | 5M条指令 | [Llama-2-13B-hf](https://huggingface.co/meta-llama/Llama-2-13b-hf) | 1.5 GB | [[百度]](https://pan.baidu.com/s/1Y5giIXOUUzI4Na6JOcviVA?pwd=tc2j) [[Google]](https://drive.google.com/file/d/1z2FIInsYJBTXipgztc-Mv7kkeqscx442/view?usp=share_link) [[🤗HF]](https://huggingface.co/ziqingyang/chinese-alpaca-2-lora-13b) |
+| Chinese-Alpaca-2-LoRA-7B | 指令模型 | 5M条指令 | [Llama-2-7B-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) | 1.1 GB | [[百度]](https://pan.baidu.com/s/1g0olPxkB_rlZ9UUVfOnbcw?pwd=5e7w) [[Google]](https://drive.google.com/file/d/1MzJL-ZIzdJW7MIcAiYIDIDJ5dlMi8Kkk/view?usp=share_link) [[🤗HF]](https://huggingface.co/ziqingyang/chinese-alpaca-2-lora-7b) |
-- [**在线转换**](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/online_conversion_zh):Colab用户可利用本项目提供的notebook进行在线转换并量化模型
-- [**手动转换**](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/manual_conversion_zh):离线方式转换,生成不同格式的模型,以便进行量化或进一步精调
+> [!IMPORTANT]
+> LoRA模型无法单独使用,必须与原版Llama-2进行合并才能转为完整模型。请通过以下方法对模型进行合并。
+>
+> - [**在线转换**](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/online_conversion_zh):Colab用户可利用本项目提供的notebook进行在线转换并量化模型
+> - [**手动转换**](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/manual_conversion_zh):离线方式转换,生成不同格式的模型,以便进行量化或进一步精调
## 推理与部署
-本项目中的相关模型主要支持以下量化、推理和部署方式。
+本项目中的相关模型主要支持以下量化、推理和部署方式,具体内容请参考对应教程。
| 工具 | 特点 | CPU | GPU | 量化 | GUI | API | vLLM | 教程 |
| :----------------------------------------------------------- | ---------------------------- | :--: | :--: | :--: | :--: | :--: | :--: | :----------------------------------------------------------: |
@@ -131,49 +144,61 @@
| [**🤗Transformers**](https://github.com/huggingface/transformers) | 原生transformers推理接口 | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | [link](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/inference_with_transformers_zh) |
| [**Colab Demo**](https://colab.research.google.com/drive/1yu0eZ3a66by8Zqm883LLtRQrguBAb9MR?usp=sharing) | 在Colab中启动交互界面 | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | [link](https://colab.research.google.com/drive/1yu0eZ3a66by8Zqm883LLtRQrguBAb9MR?usp=sharing) |
| [**仿OpenAI API调用**](https://platform.openai.com/docs/api-reference) | 仿OpenAI API接口的服务器Demo | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | [link](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/api_calls_zh) |
-| [**text-generation-webui**](https://github.com/oobabooga/text-generation-webui) | 前端Web UI界面的部署方式 | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | [link](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/text-generation-webui_zh) |
+| [**text-generation-webui**](https://github.com/oobabooga/text-generation-webui) | 前端Web UI界面的部署方式 | ✅ | ✅ | ✅ | ✅ | ✅† | ❌ | [link](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/text-generation-webui_zh) |
| [**LangChain**](https://github.com/hwchase17/langchain) | 适合二次开发的大模型应用开源框架 | ✅† | ✅ | ✅† | ❌ | ❌ | ❌ | [link](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/langchain_zh) |
| [**privateGPT**](https://github.com/imartinez/privateGPT) | 基于LangChain的多文档本地问答框架 | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | [link](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/privategpt_zh) |
-†: LangChain框架支持,但教程中未实现;详细说明请参考LangChain官方文档。
+> [!NOTE]
+> † 工具支持该特性,但教程中未实现;详细说明请参考对应官方文档。
## 系统效果
+为了评测相关模型的效果,本项目分别进行了生成效果评测和客观效果评测(NLU类),从不同角度对大模型进行评估。需要注意的是,综合评估大模型能力仍然是亟待解决的重要课题,单个数据集的结果并不能综合评估模型性能。推荐用户在自己关注的任务上进行测试,选择适配相关任务的模型。
+
### 生成效果评测
为了更加直观地了解模型的生成效果,本项目仿照[Fastchat Chatbot Arena](https://chat.lmsys.org/?arena)推出了模型在线对战平台,可浏览和评测模型回复质量。对战平台提供了胜率、Elo评分等评测指标,并且可以查看两两模型的对战胜率等结果。题库来自于[一期项目人工制作的200题](https://github.com/ymcui/Chinese-LLaMA-Alpaca/tree/main/examples/f16-p7b-p13b-33b),以及在此基础上额外增加的题目。生成回复具有随机性,受解码超参、随机种子等因素影响,因此相关评测并非绝对严谨,结果仅供参考,欢迎自行体验。部分生成样例请查看[examples目录](./examples)。
-测试模型包括:
-
-- [**一期模型**](https://github.com/ymcui/Chinese-LLaMA-Alpaca):Chinese-Alpaca-Pro系列(7B/13B/33B)、Chinese-Alpaca-Plus系列(7B/13B/33B)
-- [**二期模型(本项目)**](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2):Chinese-Alpaca-2(7B)
+**⚔️ 模型竞技场:[http://llm-arena.ymcui.com](http://llm-arena.ymcui.com/)**
-**📊 模型在线对战**:[http://chinese-alpaca-arena.ymcui.com](http://chinese-alpaca-arena.ymcui.com/)
+| 系统 | 对战胜率(无平局) ↓ | Elo评分 |
+| ------------------------------------------------------------ | :----------------: | :-----: |
+| [Alpaca-Pro-33B](https://github.com/ymcui/Chinese-LLaMA-Alpaca) | 68.98% | 1584.23 |
+| [Alpaca-Pro-7B](https://github.com/ymcui/Chinese-LLaMA-Alpaca) | 66.38% | 1626.87 |
+| **Alpaca-2-7B** | 66.24% | 1541.09 |
+| [Alpaca-Pro-13B](https://github.com/ymcui/Chinese-LLaMA-Alpaca) | 65.94% | 1518.04 |
+| [Alpaca-Plus-33B](https://github.com/ymcui/Chinese-LLaMA-Alpaca) | 34.09% | 1475.68 |
+| [Alpaca-Plus-13B](https://github.com/ymcui/Chinese-LLaMA-Alpaca) | 25.79% | 1411.07 |
+| [Alpaca-Plus-7B](https://github.com/ymcui/Chinese-LLaMA-Alpaca) | 22.13% | 1343.01 |
-### 客观效果评测
+> [!NOTE]
+> 以上结果截至2023年8月11日。最新结果请进入[**⚔️竞技场**](http://llm-arena.ymcui.com/)进行查看。
-本项目还在“NLU”类客观评测集合上对相关模型进行了测试。这类评测的结果不具有主观性,只需要输出给定标签(需要设计标签mapping策略),因此可以评测大模型的部分NLU能力。本项目在[C-Eval评测数据集](https://cevalbenchmark.com)上测试了相关模型效果,其中验证集包含1.3K个选择题,测试集包含12.3K个选择题,涵盖52个学科。从以下结果可以看出本项目推出的模型相比一期模型具有显著性能优势,甚至在大部分指标上超越了之前的Plus-13B系列模型。
-LLaMA系列模型之间对比:
+### 客观效果评测:C-Eval
-| 模型 | Valid (zero-shot) | Valid (5-shot) | Test (zero-shot) | Test (5-shot) |
-| ---------------------- | :---------------: | :------------: | :--------------: | :-----------: |
-| **Chinese-LLaMA-2-7B** | **28.2** | **36.0** | **30.3** | **34.2** |
-| Chinese-LLaMA-Plus-13B | 27.3 | 34.0 | 27.8 | 33.3 |
-| Chinese-LLaMA-Plus-7B | 27.3 | 28.3 | 26.9 | 28.4 |
+[C-Eval](https://cevalbenchmark.com)是一个全面的中文基础模型评估套件,其中验证集包含1.3K个选择题,测试集包含12.3K个选择题,涵盖52个学科,题目类型为选择题。实验结果以“zero-shot / 5-shot”进行呈现。C-Eval推理代码请参考本项目 [📖GitHub Wiki](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/ceval_zh)
-Alpaca系列模型之间对比:
+| LLaMA Models | Valid | Test | Alpaca Models | Valid | Test |
+| ----------------------- | :---------: | :---------: | ------------------------ | :---------: | :---------: |
+| **Chinese-LLaMA-2-13B** | 40.6 / 42.7 | 38.0 / 41.6 | **Chinese-Alpaca-2-13B** | 44.3 / 45.9 | 42.6 / 44.0 |
+| **Chinese-LLaMA-2-7B** | 28.2 / 36.0 | 30.3 / 34.2 | **Chinese-Alpaca-2-7B** | 41.3 / 42.9 | 40.3 / 39.5 |
+| Chinese-LLaMA-Plus-33B | 37.4 / 40.0 | 35.7 / 38.3 | Chinese-Alpaca-Plus-33B | 46.5 / 46.3 | 44.9 / 43.5 |
+| Chinese-LLaMA-Plus-13B | 27.3 / 34.0 | 27.8 / 33.3 | Chinese-Alpaca-Plus-13B | 43.3 / 42.4 | 41.5 / 39.9 |
+| Chinese-LLaMA-Plus-7B | 27.3 / 28.3 | 26.9 / 28.4 | Chinese-Alpaca-Plus-7B | 36.7 / 32.9 | 36.4 / 32.3 |
-| 模型 | Valid (zero-shot) | Valid (5-shot) | Test (zero-shot) | Test (5-shot) |
-| ----------------------- | :---------------: | :------------: | :--------------: | :-----------: |
-| **Chinese-Alpaca-2-7B** | 41.3 | **42.9** | 40.3 | 39.5 |
-| Chinese-Alpaca-Plus-13B | **43.3** | 42.4 | **41.5** | **39.9** |
-| Chinese-Alpaca-Plus-7B | 36.7 | 32.9 | 36.4 | 32.3 |
+### 客观效果评测:CMMLU
-需要注意的是,综合评估大模型能力仍然是亟待解决的重要课题,单个数据集的结果并不能综合评估模型性能。合理辩证地看待大模型相关评测结果有助于大模型技术的良性发展。推荐用户在自己关注的任务上进行测试,选择适配相关任务的模型。
+[CMMLU](https://github.com/haonan-li/CMMLU)是另一个综合性中文评测数据集,专门用于评估语言模型在中文语境下的知识和推理能力,涵盖了从基础学科到高级专业水平的67个主题,共计11.5K个测试样例,题目类型为选择题。CMMLU推理代码请参考本项目 [📖GitHub Wiki](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/cmmlu_zh)
-C-Eval推理代码请参考本项目 >>> [📚 GitHub Wiki](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/ceval_zh)
+| LLaMA Models | Test (0/few-shot) | Alpaca Models | Test (0/few-shot) |
+| ----------------------- | :---------------: | ------------------------ | :---------------: |
+| **Chinese-LLaMA-2-13B** | 38.9 / 42.5 | **Chinese-Alpaca-2-13B** | 43.2 / 45.5 |
+| **Chinese-LLaMA-2-7B** | 27.9 / 34.1 | **Chinese-Alpaca-2-7B** | 40.0 / 41.8 |
+| Chinese-LLaMA-Plus-33B | 35.2 / 38.8 | Chinese-Alpaca-Plus-33B | 46.6 / 45.3 |
+| Chinese-LLaMA-Plus-13B | 29.6 / 34.0 | Chinese-Alpaca-Plus-13B | 40.6 / 39.9 |
+| Chinese-LLaMA-Plus-7B | 25.4 / 26.3 | Chinese-Alpaca-Plus-7B | 36.8 / 32.6 |
### 量化效果评测
@@ -181,30 +206,37 @@ C-Eval推理代码请参考本项目 >>> [📚 GitHub Wiki](https://github.com/y
| 精度 | 模型大小 | PPL | C-Eval |
| :-------- | :------: | :----: | :---------: |
-| FP16 | 12.9 GB | 8.1797 | 28.2 / 36.0 |
-| 8-bit量化 | 6.8 GB | 8.2884 | 26.8 / 35.4 |
-| 4-bit量化 | 3.7 GB | 8.8581 | 25.5 / 32.8 |
+| FP16 | 12.9 GB | 9.373 | 28.2 / 36.0 |
+| 8-bit量化 | 6.8 GB | 9.476 | 26.8 / 35.4 |
+| 4-bit量化 | 3.7 GB | 10.132 | 25.5 / 32.8 |
-特别地,以下是在llama.cpp下不同量化方法的评测数据,供用户参考,速度以ms/tok计。具体细节见[Wiki](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/llamacpp_zh#关于量化方法选择及推理速度)。
+特别地,以下是在llama.cpp下不同量化方法的评测数据,供用户参考,速度以ms/tok计,测试设备为M1 Max。具体细节见[📖GitHub Wiki](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/llamacpp_zh#关于量化方法选择及推理速度)
-| | F16 | Q4_0 | Q4_1 | Q4_K | Q5_0 | Q5_1 | Q5_K | Q6_K | Q8_0 |
-| --------- | -----: | ----: | ----: | ----: | ----: | ----: | ----: | ----: | ----: |
-| PPL | 8.640 | 8.987 | 9.175 | 8.836 | 8.730 | 8.776 | 8.707 | 8.671 | 8.640 |
-| Size | 12.91G | 3.69G | 4.08G | 3.92G | 4.47G | 4.86G | 4.59G | 5.30G | 6.81G |
-| CPU Speed | 117 | 39 | 44 | 43 | 48 | 51 | 50 | 54 | 65 |
-| GPU Speed | 53 | 17 | 18 | 20 | n/a | n/a | 25 | 26 | n/a |
+| llama.cpp | F16 | Q2_K | Q3_K | Q4_0 | Q4_1 | Q4_K | Q5_0 | Q5_1 | Q5_K | Q6_K | Q8_0 |
+| --------- | -----: | -----: | ----: | ----: | ----: | ----: | ----: | ----: | ----: | ----: | ----: |
+| PPL | 9.128 | 13.640 | 9.910 | 9.476 | 9.576 | 9.257 | 9.156 | 9.213 | 9.141 | 9.143 | 9.129 |
+| Size | 12.91G | 2.77G | 3.17G | 3.69G | 4.08G | 3.92G | 4.47G | 4.86G | 4.59G | 5.30G | 6.81G |
+| CPU Speed | 117 | 42 | 51 | 39 | 44 | 43 | 48 | 51 | 50 | 54 | 65 |
+| GPU Speed | 53 | 19 | 21 | 17 | 18 | 20 | x | x | 25 | 26 | x |
## 训练与精调
-预训练(中文LLaMA-2训练)和指令精调(中文Alpaca-2训练)相关内容请参考对应Wiki。
+#### 预训练
-- **预训练**:代码参考了🤗transformers中的[run_clm.py](https://github.com/huggingface/transformers/blob/main/examples/pytorch/language-modeling/run_clm.py),使用方法见[预训练脚本Wiki](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/pt_scripts_zh)
-- **指令精调**:代码参考了[Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca)项目中数据集处理的相关部分,使用方法见[指令精调脚本Wiki](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/sft_scripts_zh)
+- 在原版Llama-2的基础上,利用大规模无标注数据进行增量训练,得到Chinese-LLaMA-2系列基座模型
+- 训练数据采用了与一期项目中Plus版本模型一致的数据,总量约120G纯文本文件
+- 训练代码参考了🤗transformers中的[run_clm.py](https://github.com/huggingface/transformers/blob/main/examples/pytorch/language-modeling/run_clm.py),使用方法见[📖预训练脚本Wiki](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/pt_scripts_zh)
+
+#### 指令精调
+
+- 在Chinese-LLaMA-2的基础上,利用有标注指令数据进行进一步精调,得到Chinese-Alpaca-2系列模型
+- 训练数据采用了一期项目中Pro版本模型所使用的指令数据,总量约500万条(相比一期略有增加)
+- 训练代码参考了[Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca)项目中数据集处理的相关部分,使用方法见[📖指令精调脚本Wiki](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/sft_scripts_zh)
## 常见问题
-请在提Issue前务必先查看FAQ中是否已存在解决方案。
+请在提Issue前务必先查看FAQ中是否已存在解决方案。具体问题和解答请参考本项目 [📖GitHub Wiki](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/faq_zh)
```
问题1:本项目和一期项目的区别?
@@ -212,10 +244,9 @@ C-Eval推理代码请参考本项目 >>> [📚 GitHub Wiki](https://github.com/y
问题3:接受第三方Pull Request吗?
问题4:为什么不对模型做全量预训练而是用LoRA?
问题5:二代模型支不支持某些支持一代LLaMA的工具?
+问题6:Chinese-Alpaca-2是Llama-2-Chat训练得到的吗?
```
-具体问题和解答请参考本项目 >>> [📚 GitHub Wiki](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/faq_zh)
-
## 引用
@@ -253,7 +284,7 @@ C-Eval推理代码请参考本项目 >>> [📚 GitHub Wiki](https://github.com/y
- 可能会产生不可预测的有害内容以及不符合人类偏好和价值观的内容
- 由于算力和数据问题,相关模型的训练并不充分,中文理解能力有待进一步提升
-- 暂时没有在线可互动的demo(注:用户仍然可以自行在本地部署)
+- 暂时没有在线可互动的demo(注:用户仍然可以自行在本地部署和体验)
diff --git a/README_EN.md b/README_EN.md
index 29dea92..60a3a27 100644
--- a/README_EN.md
+++ b/README_EN.md
@@ -21,17 +21,19 @@ This project is based on the Llama-2, released by Meta, and it is the second gen
- 🚀 Open-sourced the pre-training and instruction finetuning (SFT) scripts for further tuning on user's data
- 🚀 Quickly deploy and experience the quantized LLMs on CPU/GPU of personal PC
- 🚀 Support for LLaMA ecosystems like [🤗transformers](https://github.com/huggingface/transformers), [llama.cpp](https://github.com/ggerganov/llama.cpp), [text-generation-webui](https://github.com/oobabooga/text-generation-webui), [LangChain](https://github.com/hwchase17/langchain), [privateGPT](https://github.com/imartinez/privateGPT), [vLLM](https://github.com/vllm-project/vllm) etc.
-- The currently open-source models are Chinese-LLaMA-2-7B and Chinese-Alpaca-2-7B (check our [first-gen project](https://github.com/ymcui/Chinese-LLaMA-Alpaca) for more models).
+- The currently open-source models are Chinese-LLaMA-2 (7B/13B) and Chinese-Alpaca-2 (7B/13B) (check our [first-gen project](https://github.com/ymcui/Chinese-LLaMA-Alpaca) for more models).
![](./pics/screencast.gif)
----
- [Visual Chinese-LLaMA-Alpaca](https://github.com/airaria/Visual-Chinese-LLaMA-Alpaca) | [Multi-modal VLE](https://github.com/iflytek/VLE) | [Chinese MiniRBT](https://github.com/iflytek/MiniRBT) | [Chinese LERT](https://github.com/ymcui/LERT) | [Chinese-English PERT](https://github.com/ymcui/PERT) | [Chinese MacBERT](https://github.com/ymcui/MacBERT) | [Chinese ELECTRA](https://github.com/ymcui/Chinese-ELECTRA) | [Chinese XLNet](https://github.com/ymcui/Chinese-XLNet) | [Chinese BERT](https://github.com/ymcui/Chinese-BERT-wwm) | [Knowledge distillation tool TextBrewer](https://github.com/airaria/TextBrewer) | [Model pruning tool TextPruner](https://github.com/airaria/TextPruner)
+[Chinese LLaMA&Alpaca LLMs](https://github.com/ymcui/Chinese-LLaMA-Alpaca) | [Visual Chinese-LLaMA-Alpaca](https://github.com/airaria/Visual-Chinese-LLaMA-Alpaca) | [Multi-modal VLE](https://github.com/iflytek/VLE) | [Chinese MiniRBT](https://github.com/iflytek/MiniRBT) | [Chinese LERT](https://github.com/ymcui/LERT) | [Chinese-English PERT](https://github.com/ymcui/PERT) | [Chinese MacBERT](https://github.com/ymcui/MacBERT) | [Chinese ELECTRA](https://github.com/ymcui/Chinese-ELECTRA) | [Chinese XLNet](https://github.com/ymcui/Chinese-XLNet) | [Chinese BERT](https://github.com/ymcui/Chinese-BERT-wwm) | [Knowledge distillation tool TextBrewer](https://github.com/airaria/TextBrewer) | [Model pruning tool TextPruner](https://github.com/airaria/TextPruner)
## News
-**[Aug 02, 2023] Add FlashAttention-2 training support, vLLM-based inference acceleration support, a new system prompt that generates longer response, etc. For details, see [📚 v1.1 release note](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/releases/tag/v1.1)**
+**[Aug 14, 2023] Release Chinese-LLaMA-2-13B and Chinese-Alpaca-2-13B. Add text-generation-webui/LangChain/privateGPT support. Add CFG sampling, etc. For details, see [📚 v2.0 release note](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/releases/tag/v2.0)**
+
+[Aug 02, 2023] Add FlashAttention-2 training support, vLLM-based inference acceleration support, a new system prompt that generates longer response, etc. For details, see [📚 v1.1 release note](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/releases/tag/v1.1)
[July 31, 2023] Release Chinese-LLaMA-2-7B (base model), trained with 120GB Chinese data. It was further fine-tuned using 5M instruction data, resulting in the Chinese-Alpaca-2-7B (instruction/chat model). For details, see [📚 v1.0 release notes](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/releases/tag/v1.0)
@@ -77,29 +79,37 @@ This project launches the Chinese LLaMA-2 and Alpaca-2 models based on Llama-2.
### Model Selection Guide
-Below is a basic comparison between the Chinese LLaMA-2 and Alpaca-2 models, as well as recommended use cases.
-
-| Comparison | Chinese LLaMA-2 | Chinese Alpaca-2 |
-| :---------------------------- | :----------------------------------------------------------- | :----------------------------------------------------------- |
-| Training Method | Traditional CLM | Instruction fine-tuning |
-| Model Type | **Base Model** | **Instruction/Chat Model (like ChatGPT)** |
-| Training Corpus | Unlabeled general corpus | Labeled instruction data |
-| Vocabulary Size[1] | 55,296 | 55,296 |
-| Input Template | Not required | Requires specific templates[2] |
+Below is a basic comparison between the Chinese LLaMA-2 and Alpaca-2 models, as well as recommended use cases. **Use Alpaca for ChatGPT-like interaction.**
+
+| Comparison | Chinese LLaMA-2 | Chinese Alpaca-2 |
+| :---------------------------- | :----------------------------------------------------------: | :----------------------------------------------------------: |
+| Model Type | **Base Model** | **Instruction/Chat Model (like ChatGPT)** |
+| Released Sizes | 7B, 13B | 7B, 13B |
+| Training Method | Causal-LM (CLM) | Instruction fine-tuning |
+| Training Parts | LoRA + emb/lm-head | LoRA + emb/lm-head |
+| Trained on | [Original Llama-2](https://github.com/facebookresearch/llama) | Chinese LLaMA-2 |
+| Training Corpus | Unlabeled general corpus | Labeled instruction data |
+| Vocabulary Size[1] | 55,296 | 55,296 |
+| Context Size[2] | 4K (12K-18K) | 4K (12K-18K) |
+| Input Template | Not required | Requires specific templates[3] |
| Suitable Scenarios | Text continuation: Given the context, the model generates the following text | Instruction understanding: Q&A, writing, chatting, interaction, etc. |
-| Unsuitable Scenarios | Instruction understanding, multi-turn chat, etc. | Unrestricted text generation |
+| Unsuitable Scenarios | Instruction understanding, multi-turn chat, etc. | Unrestricted text generation |
-[1] *The vocabulary of the first and second generation models in this project are different, do not mix them. The vocabularies of the second generation LLaMA and Alpaca are the same.*
-[2] *Alpaca-2 uses the Llama-2-chat series templates (different prompts), not the templates of the first-generation Alpaca, do not mix them.*
+> [!NOTE]
+> [1] *The vocabularies of the first- and second-generation models in this project are different; do not mix them. The second-generation LLaMA and Alpaca share the same vocabulary.*
+> [2] *The values in brackets are the maximum context lengths supported via NTK-based extension.*
+> [3] *Alpaca-2 uses the Llama-2-chat series templates (same format, different prompts), not the first-generation Alpaca template; do not mix them.*
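To make footnote [3] concrete, here is an illustrative sketch (not an official project snippet) of how a single-turn prompt in the Llama-2-chat format can be assembled in Python; the system prompt below is only a placeholder, and the exact wording used by Chinese-Alpaca-2 should be taken from the project wiki:

```python
def build_alpaca2_prompt(instruction: str, system_prompt: str) -> str:
    """Assemble a single-turn prompt in the Llama-2-chat template format.

    Chinese-Alpaca-2 follows the same [INST] / <<SYS>> layout as Llama-2-Chat,
    but with its own simplified bilingual system prompt (see the project wiki).
    """
    return f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{instruction} [/INST]"

# Placeholder system prompt for illustration only; replace it with the official one.
print(build_alpaca2_prompt("请介绍一下自然语言处理。", "You are a helpful assistant."))
```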
### Full Model Download
Below are the full models, which can be used directly afterwards, without additional merging steps. Recommended for users with sufficient network bandwidth.
-| Model Name | Type | Training Data | Size | Download Link |
-| :------------------ | :---------------: | :---------------: | :--: | :----------------------------------------------------------: |
-| Chinese-LLaMA-2-7B | Base Model | 120G General Text | 13GB | [[Baidu Disk]](https://pan.baidu.com/s/1E5NI3nlQpx1j8z3eIzbIlg?pwd=n8k3)<br/>[[Google Drive]](https://drive.google.com/drive/folders/18pp4I-mvQxRA7b8vF9gP-2cH_ocnXVKh?usp=share_link)<br/>[[HuggingFace]](https://huggingface.co/ziqingyang/chinese-llama-2-7b) |
-| Chinese-Alpaca-2-7B | Instruction Model | 5M Instructions | 13GB | [[Baidu Disk]](https://pan.baidu.com/s/1wxx-CdgbMupXVRBcaN4Slw?pwd=kpn9)<br/>[[Google Drive]](https://drive.google.com/drive/folders/1JsJDVs7tE2y31PBNleBlDPsB7S0ZrY8d?usp=share_link)<br/>[[HuggingFace]](https://huggingface.co/ziqingyang/chinese-alpaca-2-7b) |
+| Model Name | Type | Training Data | Size | Download Link |
+| :-------------------- | :---------------: | :---------------: | :-----: | :----------------------------------------------------------: |
+| Chinese-LLaMA-2-13B 🆕 | Base model | 120G General Text | 24.7 GB | [[Baidu]](https://pan.baidu.com/s/1T3RqEUSmyg6ZuBwMhwSmoQ?pwd=e9qy) [[Google]](https://drive.google.com/drive/folders/1YNa5qJ0x59OEOI7tNODxea-1YvMPoH05?usp=share_link) [[🤗HF]](https://huggingface.co/ziqingyang/chinese-llama-2-13b) |
+| Chinese-LLaMA-2-7B | Base model | 120G General Text | 12.9 GB | [[Baidu]](https://pan.baidu.com/s/1E5NI3nlQpx1j8z3eIzbIlg?pwd=n8k3) [[Google]](https://drive.google.com/drive/folders/18pp4I-mvQxRA7b8vF9gP-2cH_ocnXVKh?usp=share_link) [[🤗HF]](https://huggingface.co/ziqingyang/chinese-llama-2-7b) |
+| Chinese-Alpaca-2-13B 🆕 | Chat Model | 5M Instructions | 24.7 GB | [[Baidu]](https://pan.baidu.com/s/1MT_Zlap1OtdYMgoBNTS3dg?pwd=9xja) [[Google]](https://drive.google.com/drive/folders/1MTsKlzR61xmbTR4hBWzQas_MOpUZsogN?usp=share_link) [[🤗HF]](https://huggingface.co/ziqingyang/chinese-alpaca-2-13b) |
+| Chinese-Alpaca-2-7B | Chat Model | 5M Instructions | 12.9 GB | [[Baidu]](https://pan.baidu.com/s/1wxx-CdgbMupXVRBcaN4Slw?pwd=kpn9) [[Google]](https://drive.google.com/drive/folders/1JsJDVs7tE2y31PBNleBlDPsB7S0ZrY8d?usp=share_link) [[🤗HF]](https://huggingface.co/ziqingyang/chinese-alpaca-2-7b) |
### LoRA Model Download
@@ -107,13 +117,16 @@ Below are the LoRA models, **which cannot be used directly and must be merged wi
| Model Name | Type | Training Data | Refactored Model | Size | LoRA Download Link |
| :----------------------- | :---------------: | :---------------: | :----------------------------------------------------------: | :---: | :----------------------------------------------------------: |
-| Chinese-LLaMA-2-7B-LoRA | Base Model | 120G General Text | [Llama-2-7B-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) | 1.1GB | [[Baidu Disk]](https://pan.baidu.com/s/1bmgqdyRh9E3a2uqOGyNqiQ?pwd=7kvq)<br/>[[Google Drive]](https://drive.google.com/file/d/1njJGSU_PRbzjYRNw5RSbC5-4fBOXTVY3/view?usp=share_link)<br/>[[HuggingFace]](https://huggingface.co/ziqingyang/chinese-llama-2-lora-7b) |
-| Chinese-Alpaca-2-7B-LoRA | Instruction Model | 5M Instructions | [Llama-2-7B-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) | 1.1GB | [[Baidu Disk]](https://pan.baidu.com/s/1g0olPxkB_rlZ9UUVfOnbcw?pwd=5e7w)<br/>[[Google Drive]](https://drive.google.com/file/d/1MzJL-ZIzdJW7MIcAiYIDIDJ5dlMi8Kkk/view?usp=share_link)<br/>[[HuggingFace]](https://huggingface.co/ziqingyang/chinese-alpaca-2-lora-7b) |
+| Chinese-LLaMA-2-LoRA-13B 🆕 | Base model | 120G General Text | [Llama-2-13B-hf](https://huggingface.co/meta-llama/Llama-2-13b-hf) | 1.5 GB | [[Baidu]](https://pan.baidu.com/s/1PFKTBn54GjAjzWeQISKruw?pwd=we6s) [[Google]](https://drive.google.com/file/d/10Z_k9A9N9D_6RHrMTmbHQRCuI6s1iMb1/view?usp=share_link) [[🤗HF]](https://huggingface.co/ziqingyang/chinese-llama-2-lora-13b) |
+| Chinese-LLaMA-2-LoRA-7B | Base model | 120G General Text | [Llama-2-7B-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) | 1.1 GB | [[Baidu]](https://pan.baidu.com/s/1bmgqdyRh9E3a2uqOGyNqiQ?pwd=7kvq) [[Google]](https://drive.google.com/file/d/1njJGSU_PRbzjYRNw5RSbC5-4fBOXTVY3/view?usp=share_link) [[🤗HF]](https://huggingface.co/ziqingyang/chinese-llama-2-lora-7b) |
+| Chinese-Alpaca-2-LoRA-13B 🆕 | Chat Model | 5M Instructions | [Llama-2-13B-hf](https://huggingface.co/meta-llama/Llama-2-13b-hf) | 1.5 GB | [[Baidu]](https://pan.baidu.com/s/1Y5giIXOUUzI4Na6JOcviVA?pwd=tc2j) [[Google]](https://drive.google.com/file/d/1z2FIInsYJBTXipgztc-Mv7kkeqscx442/view?usp=share_link) [[🤗HF]](https://huggingface.co/ziqingyang/chinese-alpaca-2-lora-13b) |
+| Chinese-Alpaca-2-LoRA-7B | Chat Model | 5M Instructions | [Llama-2-7B-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) | 1.1 GB | [[Baidu]](https://pan.baidu.com/s/1g0olPxkB_rlZ9UUVfOnbcw?pwd=5e7w) [[Google]](https://drive.google.com/file/d/1MzJL-ZIzdJW7MIcAiYIDIDJ5dlMi8Kkk/view?usp=share_link) [[🤗HF]](https://huggingface.co/ziqingyang/chinese-alpaca-2-lora-7b) |
-As the LoRA models cannot be used separately, they must be merged with the original Llama-2 to form a complete model for model inference, quantization, or further training. Please choose one of the following methods to merge these models.
-
-- [**Online Conversion**](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/online_conversion_en): Colab users can use the notebook provided by this project for online conversion and model quantization
-- [**Manual Conversion**](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/manual_conversion_en): Offline method of conversion, generating different formats of models for quantization or further fine-tuning
+> [!IMPORTANT]
+> As the LoRA models cannot be used separately, they must be merged with the original Llama-2 to form a complete model for model inference, quantization, or further training. Please choose one of the following methods to merge these models.
+>
+> - [**Online Conversion**](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/online_conversion_en): Colab users can use the notebook provided by this project for online conversion and model quantization
+> - [**Manual Conversion**](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/manual_conversion_en): Offline method of conversion, generating different formats of models for quantization or further fine-tuning
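The project's conversion notebook and scripts linked above are the recommended way to merge. For readers who just want a rough idea of what the merge involves, the sketch below uses the generic 🤗PEFT API, under the assumption that the LoRA package also ships the expanded 55,296-token tokenizer together with the full emb/lm-head weights:

```python
import torch
from peft import PeftModel
from transformers import LlamaForCausalLM, LlamaTokenizer

base = "meta-llama/Llama-2-7b-hf"            # original Llama-2 (the "Refactored Model" column)
lora = "ziqingyang/chinese-llama-2-lora-7b"  # LoRA package incl. emb/lm-head

tokenizer = LlamaTokenizer.from_pretrained(lora)  # expanded 55,296-token vocabulary
model = LlamaForCausalLM.from_pretrained(base, torch_dtype=torch.float16)
model.resize_token_embeddings(len(tokenizer))     # match the new vocabulary size first

model = PeftModel.from_pretrained(model, lora)    # attach LoRA plus the saved emb/lm-head
model = model.merge_and_unload()                  # fold the adapter into the base weights

model.save_pretrained("chinese-llama-2-7b-merged")
tokenizer.save_pretrained("chinese-llama-2-7b-merged")
```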
## Inference and Deployment
@@ -125,11 +138,12 @@ The models in this project mainly support the following quantization, inference,
| [**🤗Transformers**](https://github.com/huggingface/transformers) | Native transformers inference interface | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | [link](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/inference_with_transformers_en) |
| [**Colab Demo**](https://colab.research.google.com/drive/1yu0eZ3a66by8Zqm883LLtRQrguBAb9MR?usp=sharing) | Running a Gradio web demo in Colab | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | [link](https://colab.research.google.com/drive/1yu0eZ3a66by8Zqm883LLtRQrguBAb9MR?usp=sharing) |
| [**OpenAI API Calls**](https://platform.openai.com/docs/api-reference) | A server that implements OpenAI API | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | [link](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/api_calls_en) |
-| [**text-generation-webui**](https://github.com/oobabooga/text-generation-webui) | A tool for deploying model as a web UI | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | [link](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/text-generation-webui_en) |
+| [**text-generation-webui**](https://github.com/oobabooga/text-generation-webui) | A tool for deploying model as a web UI | ✅ | ✅ | ✅ | ✅ | ✅† | ❌ | [link](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/text-generation-webui_en) |
| [**LangChain**](https://github.com/hwchase17/langchain) | LLM application development framework, suitable for secondary development | ✅† | ✅ | ✅† | ❌ | ❌ | ❌ | [link](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/langchain_en) |
| [**privateGPT**](https://github.com/imartinez/privateGPT) | LangChain-based multi-document QA framework | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | [link](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/privategpt_en) |
-†: Supported by LangChain, but not implemented in the tutorial. Please refer to the official LangChain Documentation for details.
+> [!NOTE]
+> †: Supported by this tool, but not implemented in the tutorial. Please refer to the official documentation for details.
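As a minimal starting point before the tutorials above, the following sketch (illustrative only; assumes transformers, accelerate and a machine with enough memory) shows plain 🤗Transformers generation with the chat model, wrapping the user message in the Llama-2-chat style template:

```python
import torch
from transformers import AutoModelForCausalLM, LlamaTokenizer

name = "ziqingyang/chinese-alpaca-2-7b"
tokenizer = LlamaTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float16, device_map="auto")

# Alpaca-2 expects the Llama-2-chat style template; decoding parameters here are only examples
prompt = "[INST] 请用三句话介绍一下大语言模型。 [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```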
## System Performance
@@ -137,36 +151,45 @@ The models in this project mainly support the following quantization, inference,
In order to intuitively understand the generation performance of the model, this project has launched an online model arena platform imitating [Fastchat Chatbot Arena](https://chat.lmsys.org/?arena), where you can browse and evaluate the quality of model responses. The arena platform provides evaluation indicators such as win rate and Elo score, and you can view the win rate of battles between two models. The question bank comes from [200 questions manually created in the first-generation project](https://github.com/ymcui/Chinese-LLaMA-Alpaca/tree/main/examples/f16-p7b-p13b-33b), and additional questions added on this basis. Generated replies are subject to randomness and are influenced by decoding hyperparameters, random seeds, etc., so the related evaluations are not absolutely rigorous. The results are only for reference, and you are welcome to experience it yourself. Please see the [examples directory](./examples) for some generated examples.
-Tested models include:
-
-- [**First-gen model**](https://github.com/ymcui/Chinese-LLaMA-Alpaca): Chinese-Alpaca-Pro series (7B/13B/33B), Chinese-Alpaca-Plus series (7B/13B/33B)
-- [**Second-gen model (this project)**](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2): Chinese-Alpaca-2 (7B)
+**⚔️ Online Chatbot Arena: [http://llm-arena.ymcui.com](http://llm-arena.ymcui.com/)**
-**📊 Online ChatBot Arena**: [http://chinese-alpaca-arena.ymcui.com](http://chinese-alpaca-arena.ymcui.com/)
+| System | Win Rate (no tie)↓ | Elo Rating |
+| ------------------------------------------------------------ | :----------------: | :--------: |
+| [Alpaca-Pro-33B](https://github.com/ymcui/Chinese-LLaMA-Alpaca) | 68.98% | 1584.23 |
+| [Alpaca-Pro-7B](https://github.com/ymcui/Chinese-LLaMA-Alpaca) | 66.38% | 1626.87 |
+| **Alpaca-2-7B** | 66.24% | 1541.09 |
+| [Alpaca-Pro-13B](https://github.com/ymcui/Chinese-LLaMA-Alpaca) | 65.94% | 1518.04 |
+| [Alpaca-Plus-33B](https://github.com/ymcui/Chinese-LLaMA-Alpaca) | 34.09% | 1475.68 |
+| [Alpaca-Plus-13B](https://github.com/ymcui/Chinese-LLaMA-Alpaca) | 25.79% | 1411.07 |
+| [Alpaca-Plus-7B](https://github.com/ymcui/Chinese-LLaMA-Alpaca) | 22.13% | 1343.01 |
-### NLU Performance Evaluation
+> [!NOTE]
+> Results are as of August 11, 2023. For the latest results, see [**⚔️Arena**](http://llm-arena.ymcui.com/).
-This project also tested related models on the NLU datasets. The results of this type of evaluation are objective and only require the output of given labels, so they can provide insights into the capabilities of large models from another perspective. In the recently launched [C-Eval dataset](https://cevalbenchmark.com/), this project tested the performance of the relevant models. The test set contains 12.3K multiple-choice questions covering 52 subjects. The following are the evaluation results (average) of some models on the validation and test sets.
+### NLU Performance Evaluation: C-Eval
-Comparisons between LLaMA models:
+[C-Eval](https://cevalbenchmark.com/) is a comprehensive Chinese foundation model evaluation suite. Its validation set contains 1.3K multiple-choice questions and its test set contains 12.3K, covering 52 subjects. The experimental results are presented in the format of “zero-shot / 5-shot”. For C-Eval inference code, please refer to this project's [📖GitHub Wiki](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/ceval_en).
-| Model | Valid (zero-shot) | Valid (5-shot) | Test (zero-shot) | Test (5-shot) |
-| ---------------------- | :---------------: | :------------: | :--------------: | :-----------: |
-| **Chinese-LLaMA-2-7B** | **28.2** | **36.0** | **30.3** | **34.2** |
-| Chinese-LLaMA-Plus-13B | 27.3 | 34.0 | 27.8 | 33.3 |
-| Chinese-LLaMA-Plus-7B | 27.3 | 28.3 | 26.9 | 28.4 |
+| LLaMA Models | Valid | Test | Alpaca Models | Valid | Test |
+| ----------------------- | :---------: | :---------: | ------------------------ | :---------: | :---------: |
+| **Chinese-LLaMA-2-13B** | 40.6 / 42.7 | 38.0 / 41.6 | **Chinese-Alpaca-2-13B** | 44.3 / 45.9 | 42.6 / 44.0 |
+| **Chinese-LLaMA-2-7B** | 28.2 / 36.0 | 30.3 / 34.2 | **Chinese-Alpaca-2-7B** | 41.3 / 42.9 | 40.3 / 39.5 |
+| Chinese-LLaMA-Plus-33B | 37.4 / 40.0 | 35.7 / 38.3 | Chinese-Alpaca-Plus-33B | 46.5 / 46.3 | 44.9 / 43.5 |
+| Chinese-LLaMA-Plus-13B | 27.3 / 34.0 | 27.8 / 33.3 | Chinese-Alpaca-Plus-13B | 43.3 / 42.4 | 41.5 / 39.9 |
+| Chinese-LLaMA-Plus-7B | 27.3 / 28.3 | 26.9 / 28.4 | Chinese-Alpaca-Plus-7B | 36.7 / 32.9 | 36.4 / 32.3 |
-Comparisons between Alpaca models:
+### NLU Performance Evaluation: CMMLU
-| Model | Valid (zero-shot) | Valid (5-shot) | Test (zero-shot) | Test (5-shot) |
-| ----------------------- | :---------------: | :------------: | :--------------: | :-----------: |
-| **Chinese-Alpaca-2-7B** | 41.3 | **42.9** | 40.3 | 39.5 |
-| Chinese-Alpaca-Plus-13B | **43.3** | 42.4 | **41.5** | **39.9** |
-| Chinese-Alpaca-Plus-7B | 36.7 | 32.9 | 36.4 | 32.3 |
+[CMMLU](https://github.com/haonan-li/CMMLU) is another comprehensive Chinese evaluation dataset, specifically designed to evaluate the knowledge and reasoning abilities of language models in a Chinese context. It covers 67 topics ranging from basic subjects to advanced professional levels, with a total of 11.5K test cases. The type of questions is multiple-choice. For CMMLU inference code, please refer to this project's [📖GitHub Wiki](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/cmmlu_en).
-It is important to note that the comprehensive assessment of the capabilities of large models is still an urgent and significant topic to address. It is beneficial to approach the various evaluation results of large models in a rational and balanced manner to promote the healthy development of large-scale model technology. It is recommended for users to conduct tests on their own tasks and choose models that are suitable for the relevant tasks.
+| LLaMA Models | Test (0/few-shot) | Alpaca Models | Test (0/few-shot) |
+| ----------------------- | :---------------: | ------------------------ | :---------------: |
+| **Chinese-LLaMA-2-13B** | 38.9 / 42.5 | **Chinese-Alpaca-2-13B** | 43.2 / 45.5 |
+| **Chinese-LLaMA-2-7B** | 27.9 / 34.1 | **Chinese-Alpaca-2-7B** | 40.0 / 41.8 |
+| Chinese-LLaMA-Plus-33B | 35.2 / 38.8 | Chinese-Alpaca-Plus-33B | 46.6 / 45.3 |
+| Chinese-LLaMA-Plus-13B | 29.6 / 34.0 | Chinese-Alpaca-Plus-13B | 40.6 / 39.9 |
+| Chinese-LLaMA-Plus-7B | 25.4 / 26.3 | Chinese-Alpaca-Plus-7B | 36.8 / 32.6 |
-For C-Eval inference code, please refer to >>> [📚 GitHub Wiki](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/ceval_en)
### Quantization Evaluation
@@ -174,18 +197,18 @@ To understand the quality loss brought by quantization, taking Chinese-LLaMA-2-7
| Precision | Model Size | PPL | C-Eval |
| :-------- | :--------: | :----: | :---------: |
-| FP16 | 12.9 GB | 8.1797 | 28.2 / 36.0 |
-| 8-bit | 6.8 GB | 8.2884 | 26.8 / 35.4 |
-| 4-bit | 3.7 GB | 8.8581 | 25.5 / 32.8 |
+| FP16 | 12.9 GB | 9.373 | 28.2 / 36.0 |
+| 8-bit | 6.8 GB | 9.476 | 26.8 / 35.4 |
+| 4-bit | 3.7 GB | 10.132 | 25.5 / 32.8 |
Specifically, the following are benchmarks for different quantization methods in llama.cpp, measured on an M1 Max; speed is reported in ms/tok. For details, see our [Wiki](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/llamacpp_en#quantization-method-and-inference-speed).
-| | F16 | Q4_0 | Q4_1 | Q4_K | Q5_0 | Q5_1 | Q5_K | Q6_K | Q8_0 |
-| --------- | -----: | ----: | ----: | ----: | ----: | ----: | ----: | ----: | ----: |
-| PPL | 8.640 | 8.987 | 9.175 | 8.836 | 8.730 | 8.776 | 8.707 | 8.671 | 8.640 |
-| Size | 12.91G | 3.69G | 4.08G | 3.92G | 4.47G | 4.86G | 4.59G | 5.30G | 6.81G |
-| CPU Speed | 117 | 39 | 44 | 43 | 48 | 51 | 50 | 54 | 65 |
-| GPU Speed | 53 | 17 | 18 | 20 | n/a | n/a | 25 | 26 | n/a |
+| llama.cpp | F16 | Q2_K | Q3_K | Q4_0 | Q4_1 | Q4_K | Q5_0 | Q5_1 | Q5_K | Q6_K | Q8_0 |
+| --------- | -----: | -----: | ----: | ----: | ----: | ----: | ----: | ----: | ----: | ----: | ----: |
+| PPL | 9.128 | 13.640 | 9.910 | 9.476 | 9.576 | 9.257 | 9.156 | 9.213 | 9.141 | 9.143 | 9.129 |
+| Size | 12.91G | 2.77G | 3.17G | 3.69G | 4.08G | 3.92G | 4.47G | 4.86G | 4.59G | 5.30G | 6.81G |
+| CPU Speed | 117 | 42 | 51 | 39 | 44 | 43 | 48 | 51 | 50 | 54 | 65 |
+| GPU Speed | 53 | 19 | 21 | 17 | 18 | 20 | x | x | 25 | 26 | x |
## Training and Fine-tuning
@@ -204,6 +227,7 @@ Question 2: Can the model be commercialized?
Question 3: Do you accept third-party Pull Requests?
Question 4: Why not perform full pre-training but use LoRA instead?
Question 5: Does Llama-2 series support tools that support the first-gen LLaMA?
+Question 6: Is Chinese-Alpaca-2 trained from Llama-2-Chat?
```
For specific questions and answers, please refer to the project >>> [📚 GitHub Wiki](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/faq_en)