Translation from Simplified Chinese to Eng successful but incorrect or unreliable if run multiple times #273

Open
cavadoge opened this issue Aug 26, 2024 · 0 comments


  • deep_translator version: 1.11.4
  • Python version: 3.11.5
  • Operating System: macOS Ventura 13.5.1 (22G90)

Description

I tested batch translation of a column of product names from Simplified Chinese to English. The code below ran without raising any errors. However, after manually checking samples of the resulting translations (exported to .csv and .xlsx), I found that many rows were translated incorrectly.

Batch size: over 5000 rows in the df['Product_Name'] column (see the code below).
No N/A or missing values in the original column.
No N/A or missing values in the resulting df['Product_Name_Eng'] column.

Example of a wrong translation:
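"life space益生菌大人调理肠胃肠道双歧杆菌元免疫力提旗舰店正品" has been translated as "[Second sale] wlab makeup primer, primer, invisible pores, flagship store, genuine product, valid until 24/06" on the corresponding row. The output reads like the translation of a different product name altogether, not a poor translation of this one.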

Not all are wrong; I'd say most are correct (I stopped checking manually after a while).
Example of a successful translation:
"康萃乐儿童益生菌宝宝婴幼儿调理肠胃鼠李糖乳杆菌lgg冲剂30袋" has been translated as "Kangcuile children's probiotics baby infant gastrointestinal conditioning Lactobacillus rhamnosus LGG granules 30 bags" on the corresponding row.

What I did

I ran the following in JupyterLab 3.6.3:

import pandas as pd
from deep_translator import GoogleTranslator
from concurrent.futures import ThreadPoolExecutor

df = pd.read_csv('test_translation.csv')
translator = GoogleTranslator(source='chinese (simplified)', target='english')

def translate_text(text):
    # Translate a single string using the shared translator instance.
    try:
        return translator.translate(text)
    except Exception:
        return text  # Return the original text if translation fails

def batch_translate_texts(texts):
    with ThreadPoolExecutor(max_workers=10) as executor:
        translated_texts = list(executor.map(translate_text, texts))
    return translated_texts

product_names = df['Product_Name'].astype(str).tolist()
translated_names = batch_translate_texts(product_names)

df['Product_Name_Eng'] = translated_names

I then re-ran the same code on a smaller batch of 600+ rows. This time, the string that was translated incorrectly in my example above was translated correctly.

The same string, translated correctly this time:
"life space益生菌大人调理肠胃肠道双歧杆菌元免疫力提旗舰店正品" has been translated as "Life space probiotics for adults regulate gastrointestinal tract Bifidobacterium Yuan immunity enhance flagship store authentic", which is an acceptable automatic translation for my project.
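
To measure how often two runs disagree without checking rows by hand, something like the following could be used. It is only a sketch: it assumes both runs are kept as lists aligned with the same input list, and `find_mismatches` is a hypothetical helper name, not part of deep_translator.

```python
def find_mismatches(sources, run_a, run_b):
    """Return (row index, source text, output A, output B) for rows where two runs differ."""
    if not (len(sources) == len(run_a) == len(run_b)):
        raise ValueError("all three lists must have the same length")
    return [
        (i, src, a, b)
        for i, (src, a, b) in enumerate(zip(sources, run_a, run_b))
        if a != b
    ]
```

Small wording differences between runs may be harmless, so the rows this returns would still need a manual look. It only narrows down where to look.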

I need help understanding how to ensure reliable translation results.
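
One variation that could be tried is giving each worker thread its own translator instead of sharing the single `translator` instance across all 10 threads. This is a minimal sketch under an unconfirmed assumption: that a shared instance is not safe to call concurrently. I have not verified this against the library's documentation. The `factory` parameter and the `make_translator` helper are hypothetical names added for illustration; only `GoogleTranslator(source=..., target=...)` and `.translate(text)` come from the code above.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_thread_state = threading.local()  # holds one translator per worker thread

def make_translator():
    # Imported lazily; same constructor call as in the original code.
    from deep_translator import GoogleTranslator
    return GoogleTranslator(source='chinese (simplified)', target='english')

def translate_text(text, factory=make_translator):
    # Create the translator the first time this thread needs one.
    if not hasattr(_thread_state, 'translator'):
        _thread_state.translator = factory()
    try:
        return _thread_state.translator.translate(text)
    except Exception:
        return None  # Mark failures explicitly instead of returning the input

def batch_translate_texts(texts, factory=make_translator, max_workers=10):
    # executor.map returns results in the same order as the inputs.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(lambda t: translate_text(t, factory), texts))
```

Returning `None` on failure is a deliberate change from the original fallback: failed rows then show up as missing values and are not silently left in Chinese. Whether this removes the mismatched rows is untested.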
Thank you
