
[LLM-IE] Add qwen2 to Taskflow #9681

Open
wants to merge 9 commits into develop

Conversation

Fantasy-02

PR types

New features

PR changes

APIs

Description

add qwen2 to Taskflow


paddle-bot bot commented Dec 24, 2024

Thanks for your contribution!


codecov bot commented Dec 24, 2024

Codecov Report

Attention: Patch coverage is 2.47027% with 1066 lines in your changes missing coverage. Please review.

Project coverage is 52.31%. Comparing base (c9cfa99) to head (1458fe8).
Report is 164 commits behind head on develop.

Current head 1458fe8 differs from pull request most recent head 2f056bf

Please upload reports for the commit 2f056bf to get more accurate results.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| paddlenlp/taskflow/predictor.py | 0.00% | 888 Missing ⚠️ |
| paddlenlp/taskflow/information_extraction.py | 13.00% | 87 Missing ⚠️ |
| paddlenlp/taskflow/export_model.py | 17.85% | 46 Missing ⚠️ |
| paddlenlp/taskflow/task.py | 6.38% | 44 Missing ⚠️ |
| paddlenlp/taskflow/text2text_generation.py | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9681      +/-   ##
===========================================
- Coverage    52.85%   52.31%   -0.55%     
===========================================
  Files          676      720      +44     
  Lines       107827   113349    +5522     
===========================================
+ Hits         56990    59293    +2303     
- Misses       50837    54056    +3219     


@ZHUI ZHUI changed the title add qwen2 to Taskflow [LLM-IE] Add qwen2 to Taskflow Dec 24, 2024
@@ -314,6 +314,17 @@
},
"information_extraction": {
"models": {
"llama": {"task_class": QwenIETask, "hidden_size": 768, "task_flag": "information_extraction-llama"},
Collaborator

Is there actually a Llama model here?

Author

No, this was something I added for my own testing at the time; it can be deleted.

"llama": {"task_class": QwenIETask, "hidden_size": 768, "task_flag": "information_extraction-llama"},
"qwen-1.5b": {
"task_class": QwenIETask,
"hidden_size": 768,
Collaborator

These hidden_size values look wrong, don't they? @wawltor zeyang, please take a look: is this hidden_size parameter actually used?

Author

The hidden_size parameter is not used.
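Since the thread above concludes that `hidden_size` is unused for `QwenIETask`, the registry entry could drop it. A minimal sketch of that shape (the dict keys follow the diff; `QwenIETask` is stubbed here purely for illustration, and the dict name is hypothetical):

```python
# Stand-in for paddlenlp.taskflow's QwenIETask, stubbed so the
# registry shape can be shown in isolation.
class QwenIETask:
    pass


# Registry entry following the diff, with the unused hidden_size key dropped.
INFORMATION_EXTRACTION_MODELS = {
    "qwen-1.5b": {
        "task_class": QwenIETask,
        "task_flag": "information_extraction-qwen-1.5b",
    },
}
```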

@@ -314,6 +314,17 @@
},
"information_extraction": {
"models": {
"llama": {"task_class": QwenIETask, "hidden_size": 768, "task_flag": "information_extraction-llama"},
"qwen-1.5b": {
Collaborator

Consider whether to rename this, e.g. ie-qwen-1.5b or something similar. @wawltor

@@ -1,252 +0,0 @@
# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
Collaborator

Is this file no longer needed?

Author

Correct.

self._temperature = kwargs.get("temperature", 1.0)
self._decode_strategy = kwargs.get("decode_strategy", "sampling")
self._num_return_sequences = kwargs.get("num_return_sequences", 1)
self.prompt = """你是一个阅读理解专家,请提取所给句子与问题,提取实体。请注意,如果存在实体,则一定在原句中逐字出现,请输出对应实体的原文,不要进行额外修改;如果无法提取,请输出“无相应实体”。
Collaborator

Make this a module-level constant in uppercase, placed outside the class definition:

QWEN_IE_PROMPT = """xxx"""
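A sketch of the suggested refactor, using the attribute defaults shown earlier in the diff (the class body here is a stub of `QwenIETask`, not the actual implementation):

```python
# Module-level prompt constant, uppercase, outside the class as suggested.
QWEN_IE_PROMPT = (
    "你是一个阅读理解专家,请提取所给句子与问题,提取实体。"
    "请注意,如果存在实体,则一定在原句中逐字出现,请输出对应实体的原文,"
    "不要进行额外修改;如果无法提取,请输出“无相应实体”。"
)


class QwenIETask:
    def __init__(self, **kwargs):
        # Defaults taken from the diff context above.
        self._temperature = kwargs.get("temperature", 1.0)
        self._decode_strategy = kwargs.get("decode_strategy", "sampling")
        self._num_return_sequences = kwargs.get("num_return_sequences", 1)
        self.prompt = QWEN_IE_PROMPT  # reference the shared constant
```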

result_list = self._single_stage_predict(examples)
print('after single stage predict:',result_list)

if not node.parent_relations:
Collaborator

Which branch is taken here? Or can both branches of parent_relations be reached?

Author

Both branches can be reached.

llm/ie/README.md Outdated
@@ -0,0 +1,381 @@
# 通用信息抽取 UIE(Universal Information Extraction)
Collaborator

Suggested change
# 通用信息抽取 UIE(Universal Information Extraction)
# 大模型信息抽取 LLM-IE(Large Language Model Information Extraction)

llm/ie/README.md Outdated
Comment on lines 156 to 164
| `uie-base` (默认)| 12-layers, 768-hidden, 12-heads | 中文 |
| `uie-base-en` | 12-layers, 768-hidden, 12-heads | 英文 |
| `uie-medical-base` | 12-layers, 768-hidden, 12-heads | 中文 |
| `uie-medium`| 6-layers, 768-hidden, 12-heads | 中文 |
| `uie-mini`| 6-layers, 384-hidden, 12-heads | 中文 |
| `uie-micro`| 4-layers, 384-hidden, 12-heads | 中文 |
| `uie-nano`| 4-layers, 312-hidden, 12-heads | 中文 |
| `uie-m-large`| 24-layers, 1024-hidden, 16-heads | 中、英文 |
| `uie-m-base`| 12-layers, 768-hidden, 12-heads | 中、英文 | -->
Collaborator

Replace these with the qwen models.

llm/ie/README.md Outdated
```

* `schema`:定义任务抽取目标,可参考开箱即用中不同任务的调用示例进行配置。
* `schema_lang`:设置 schema 的语言,默认为`zh`, 可选有`zh`和`en`。因为中英 schema 的构造有所不同,因此需要指定 schema 的语言。该参数只对`uie-m-base`和`uie-m-large`模型有效。
Collaborator

Remove the options that are not yet supported for now.

llm/ie/README.md Outdated
* `schema_lang`:设置 schema 的语言,默认为`zh`, 可选有`zh`和`en`。因为中英 schema 的构造有所不同,因此需要指定 schema 的语言。该参数只对`uie-m-base`和`uie-m-large`模型有效。
* `batch_size`:批处理大小,请结合机器情况进行调整,默认为1。
* `model`:选择任务使用的模型,默认为`qwen-0.5b`,可选有`qwen-0.5b`, `qwen-1.5b`。
* `precision`:选择模型精度,默认为`fp32`,可选有`fp16`和`fp32`。`fp16`推理速度更快,支持 GPU 和 NPU 硬件环境。如果选择`fp16`,在 GPU 硬件环境下,请先确保机器正确安装 NVIDIA 相关驱动和基础软件,**确保 CUDA>=11.2,cuDNN>=8.1.1**,初次使用需按照提示安装相关依赖。其次,需要确保 GPU 设备的 CUDA 计算能力(CUDA Compute Capability)大于7.0,典型的设备包括 V100、T4、A10、A100、GTX 20系列和30系列显卡等。更多关于 CUDA Compute Capability 和精度支持情况请参考 NVIDIA 文档:[GPU 硬件与支持精度对照表](https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-840-ea/support-matrix/index.html#hardware-precision-matrix)。
Collaborator

What about bf16 support?
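Putting the documented parameters together, usage might look like the sketch below. The model names `qwen-0.5b`/`qwen-1.5b` come from this PR and may still be renamed per the review, so treat them as assumptions; the `paddlenlp` import is deferred since it is a heavy dependency that downloads model weights on first use:

```python
# Extraction targets ("schema"): here time, contestant, and event name,
# as in the PaddleNLP information-extraction examples.
SCHEMA = ["时间", "选手", "赛事名称"]


def build_ie_taskflow(schema, model="qwen-0.5b", batch_size=1, precision="fp32"):
    """Construct the information-extraction Taskflow pipeline.

    Requires paddlenlp; the qwen model names are from this PR and may
    change before merge.
    """
    from paddlenlp import Taskflow  # deferred: heavy dependency

    return Taskflow(
        "information_extraction",
        schema=schema,
        model=model,
        batch_size=batch_size,
        precision=precision,
    )


# ie = build_ie_taskflow(SCHEMA)
# ie("2月8日上午北京冬奥会自由式滑雪女子大跳台决赛中中国选手谷爱凌以188.25分获得金牌!")
```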


3 participants