[Badcase]: Qwen2.5-7B-Instruct Chinese-to-German and Chinese-to-Italian translation does not follow instructions, code switch #1097

Open
4 tasks done
pio57019 opened this issue Nov 23, 2024 · 0 comments
Labels
enhancement New feature or request

pio57019 commented Nov 23, 2024

Model Series

Qwen2.5

What are the models used?

Qwen2.5-7B-Instruct

What is the scenario where the problem happened?

Translation

Is this badcase known and can it be solved using available techniques?

  • I have followed the GitHub README.
  • I have checked the Qwen documentation and cannot find a solution there.
  • I have checked the documentation of the related framework and cannot find useful information.
  • I have searched the issues and there is not a similar one.

Information about environment

OS: Windows 10
Python version: Python 3.11
GPU: 2080 Ti
NVIDIA driver: NVIDIA-SMI 560.94 Driver Version: 560.94
CUDA compiler: 12.6

Description

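I tested Qwen2.5-7B-Instruct translation with the code below to see whether it is as impressive as you say. It failed to translate some of the inputs. The texts below were picked at random from a few games for testing.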


German:

    它长得人模人样,却又完全不像是人。
    你men不li开……强dao! xiao偷! sha了你!
    秋娜
	在岩石的阴影处,你发现了一具人类的尸体。似乎是被那些怪物杀害掉的。\n好象死后经过了很久,尸体已经彻底白骨化了。\n周围虽然几乎没有留下什么行李,不过在旁边的地面上有一本笔记。\n旅人的遗体也该回收起来好好安葬吧?\n你用变得破烂不堪的斗篷包起了遗骨。\n把遗骨带去神殿,为死者祈求冥福吧。
	却在中途无路可走了。
	金色猫
	里面一片漆黑,看不到尽头。
	还是快点回家吧。
	这是一个小型空洞。有许多石笋在壁檐交错纵横着。
	嘻ー嘻ー嘿嘿!嗨xiao子,lai这里gan什么。
	身后的通道里却传出了惨叫声。
	你用手触摸了石柱的表面。\n本应是漆黑的石柱,却像是在被青白色的光芒照射般,散发出异质的光泽。\n石柱的岩体上刻满了文字。\n应该是某种古代文字。\n文字不但多有磨损,其中的表音文字与表意文字也是交错混杂,
	阿格迪乌啊,愿您永世昌盛
	通道似乎一路向里延伸着……
	周围虽然几乎没有留下什么行李,

Italian:

	你总算把蝙蝠群赶跑了。
	1的
	默认内容
	认内容
	的默  认内容
	看来你已经在地面下待了很长时间。
	滚石的势头停住了。
	而是你刚才遭遇过的异型怪物。
	你men不li开……强dao! xiao偷! sha了你!
	秋娜
	地下水脉的川流从高处汇至湖面,形成了瀑布。
	不能……看……
	不能……碰……
	汇入了洞外森林中的小溪谷。
	几天前被闪电偶然击中,烧得只剩灰烬的树下,
	突然,手腕剧痛起来。

... there are many more, so I stopped testing here.
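
With this many test strings, failures like the ones listed above could be flagged automatically instead of by eye. The following is a minimal sketch, not part of the original report: it measures how much of a model output still consists of CJK ideographs. The names `cjk_ratio` and `looks_untranslated`, the Unicode ranges, and the threshold are all assumptions chosen for illustration.

```python
import re

# CJK Unified Ideographs (basic block plus Extension A). Treating these
# ranges as "leftover Chinese" is an assumption of this sketch.
_CJK_PATTERN = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff]")


def cjk_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters that are CJK ideographs."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(1 for c in chars if _CJK_PATTERN.match(c)) / len(chars)


def looks_untranslated(output: str, threshold: float = 0.0) -> bool:
    """Flag an output whose CJK share exceeds the threshold (hypothetical cutoff)."""
    return cjk_ratio(output) > threshold
```

Because the system prompt in this report tells the model to return the original text when it cannot translate, a fully Chinese output is not necessarily an instruction-following failure. A flagged output therefore still needs a manual look.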


The code is:

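import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_cache_dir = r"C:\model_cache"
model_name = "Qwen/Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name, cache_dir=model_cache_dir)

# Load the weights in 8-bit via bitsandbytes
quantization_config = BitsAndBytesConfig(
    load_in_8bit=True
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    cache_dir=model_cache_dir,
    quantization_config=quantization_config,  # pass the quantization settings through the newer quantization_config argument
    device_map='auto'
)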

.................

                        device = torch.device("cuda:0")

                        messages = [
                            {"role": "system", "content": "You are a translator who can only translate the content I provide and won't have other chatting ideas.Only the translation content will be returned, no prompt.If it cannot be translated or there are translation errors, please return the original content intact without changing it."},  
                            {"role": "user", "content": f"Translate into { input_str }: {input_text}"}  

                        ]
                        text = tokenizer.apply_chat_template(
                            messages,
                            tokenize=False,
                            add_generation_prompt=True
                        )
                        inputs = tokenizer([text], return_tensors="pt").to(device)

                        outputs = model.generate(
                            **inputs,
                            max_new_tokens=512
                            #,max_length=10000
                        )

                        decoded_output = tokenizer.decode(outputs[0])
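
One detail worth checking in the snippet above: for decoder-only models, `model.generate` returns the prompt tokens followed by the newly generated ones, so decoding `outputs[0]` in full also returns the system prompt, the user message, and the chat-template special tokens. The sketch below shows the slicing step on plain token-id lists. The helper name `strip_prompt_tokens` is hypothetical, and whether this accounts for any of the reported failures is not established here.

```python
from typing import List, Sequence


def strip_prompt_tokens(output_ids: Sequence[int], prompt_length: int) -> List[int]:
    """Drop the first prompt_length token ids, keeping only newly generated ones."""
    if prompt_length < 0 or prompt_length > len(output_ids):
        raise ValueError("prompt_length must be between 0 and len(output_ids)")
    return list(output_ids[prompt_length:])


# Intended use with the snippet above (not executed here):
#   prompt_length = inputs["input_ids"].shape[1]
#   new_ids = outputs[0][prompt_length:]
#   decoded_output = tokenizer.decode(new_ids, skip_special_tokens=True)
```

Passing `skip_special_tokens=True` to `tokenizer.decode` also drops the chat-template markers, so `decoded_output` contains only the model's reply.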

My question is: are there any plans to improve language translation?

jklj077 added the enhancement (New feature or request) label Nov 25, 2024
jklj077 changed the title from "[Badcase]:" to "[Badcase]: Qwen2.5-7B-Instruct Chinese-to-German and Chinese-to-Italian translation does not follow instructions, code switch" Nov 29, 2024