InternLM2-1.8B is the 1.8 billion parameter version of the second-generation InternLM series. To facilitate usage and research, the InternLM2-1.8B release comprises three open-source models:
- InternLM2-1.8B: A high-quality foundation model with high adaptation flexibility, which serves as a good starting point for downstream deep adaptation.
- InternLM2-Chat-1.8B-SFT: Chat model after supervised fine-tuning (SFT) on InternLM2-1.8B.
- InternLM2-Chat-1.8B: Further aligned on top of InternLM2-Chat-1.8B-SFT through online RLHF. InternLM2-Chat-1.8B offers better instruction following, a better chat experience, and better function calling, and is recommended for downstream applications.
The base model of InternLM2 has the following technical features:
- Effective support for ultra-long contexts of up to 200,000 characters: The model achieves near-perfect "needle in a haystack" retrieval over inputs of 200,000 characters, and leads open-source models on long-text tasks such as LongBench and L-Eval.
- Comprehensive performance enhancement: Compared to the previous generation model, it shows significant improvements in various capabilities, including reasoning, mathematics, and coding.
Model | Transformers(HF) | ModelScope(HF) | OpenXLab(HF) | OpenXLab(Origin) | Release Date |
---|---|---|---|---|---|
InternLM2-1.8B | 🤗internlm2-1.8b | internlm2-1.8b | | | 2024-01-31 |
InternLM2-Chat-1.8B-SFT | 🤗internlm2-chat-1.8b-sft | internlm2-chat-1.8b-sft | | | 2024-01-31 |
InternLM2-Chat-1.8B | 🤗internlm2-chat-1.8b | internlm2-chat-1.8b | | | 2024-02-19 |
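
As a quick reference, the snippet below is a minimal sketch of loading InternLM2-Chat-1.8B with Hugging Face Transformers. The repository id `internlm/internlm2-chat-1_8b` and the generation settings are assumptions here; check the corresponding model page for the exact id and the recommended usage.

```python
# Minimal sketch: load InternLM2-Chat-1.8B via Hugging Face Transformers and run
# a single greedy generation. The repo id below is an assumption; verify it on the Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "internlm/internlm2-chat-1_8b"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # fp16 keeps the 1.8B model small enough for a single consumer GPU
    trust_remote_code=True,     # InternLM2 ships custom modeling code on the Hub
    device_map="auto",          # requires the accelerate package
).eval()

prompt = "Introduce the InternLM2 model series in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)
# Strip the prompt tokens and decode only the newly generated text.
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

For multi-turn chat and function calling, refer to the usage examples on the model page; the plain `generate()` call above sticks to the standard Transformers API.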
We have evaluated InternLM2 on several important benchmarks using the open-source evaluation tool OpenCompass. Some of the evaluation results are shown in the table below. You are welcome to visit the OpenCompass Leaderboard for more evaluation results.
Dataset\Models | InternLM2-1.8B | InternLM2-Chat-1.8B-SFT | InternLM2-Chat-1.8B | InternLM2-7B | InternLM2-Chat-7B |
---|---|---|---|---|---|
MMLU | 46.9 | 47.1 | 44.1 | 65.8 | 63.7 |
AGIEval | 33.4 | 38.8 | 34.6 | 49.9 | 47.2 |
BBH | 37.5 | 35.2 | 34.3 | 65.0 | 61.2 |
GSM8K | 31.2 | 39.7 | 34.3 | 70.8 | 70.7 |
MATH | 5.6 | 11.8 | 10.7 | 20.2 | 23.0 |
HumanEval | 25.0 | 32.9 | 29.3 | 43.3 | 59.8 |
MBPP(Sanitized) | 22.2 | 23.2 | 27.0 | 51.8 | 51.4 |
- The evaluation results were obtained from OpenCompass, and the evaluation configuration can be found in the configuration files provided by OpenCompass.
- Evaluation numbers may differ between OpenCompass versions as the tool iterates, so please refer to the latest results from OpenCompass.
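
To reproduce numbers of this kind locally, an evaluation can be launched through OpenCompass's `run.py` entry point. The sketch below shells out to that entry point from Python; the model and dataset aliases are assumptions and should be checked against the configuration files shipped with your OpenCompass version.

```python
# Rough sketch: launch an OpenCompass evaluation from the root of the OpenCompass repo.
# The aliases "hf_internlm2_1_8b", "mmlu_gen" and "gsm8k_gen" are assumptions; browse the
# configuration files provided by OpenCompass to find the exact names in your version.
import subprocess

subprocess.run(
    [
        "python", "run.py",               # OpenCompass command-line entry point
        "--models", "hf_internlm2_1_8b",  # assumed alias for the InternLM2-1.8B base model
        "--datasets", "mmlu_gen", "gsm8k_gen",
    ],
    check=True,  # raise if the evaluation process exits with an error
)
```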