fast-langdetect provides ultra-fast and highly accurate language detection based on FastText, a library developed by Facebook. This package is 80x faster than traditional methods and offers 95% accuracy.
It supports Python versions 3.9 to 3.12.
This project builds upon zafercavdar/fasttext-langdetect with enhancements in packaging.
For more information on the underlying FastText model, refer to the official documentation: FastText Language Identification.
Note
This library requires over 200MB of memory when low-memory mode is disabled (i.e. `detect(..., low_memory=False)`, which loads the larger model).
To install fast-langdetect, you can use either pip or pdm:
pip install fast-langdetect
pdm add fast-langdetect
For optimal performance and accuracy in language detection, use `detect(text, low_memory=False)` to load the larger model. The model will be downloaded to the `/tmp/fasttext-langdetect` directory upon first use.
from fast_langdetect import detect, detect_multilingual
# Single language detection
print(detect("Hello, world!"))
# Output: {'lang': 'en', 'score': 0.1520957201719284}
print(detect("Привет, мир!")["lang"])
# Output: ru
# Multi-language detection
print(detect_multilingual("Hello, world!你好世界!Привет, мир!"))
# Output: [
# {'lang': 'ru', 'score': 0.39008623361587524},
# {'lang': 'zh', 'score': 0.18235979974269867},
# ]
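Since `detect_multilingual` returns a list of candidates, picking the single most likely language is a one-liner. The sketch below processes a result list shaped like the output above (the values are illustrative, not a real detection run):

```python
# Candidate list in the shape returned by detect_multilingual:
# each entry is a dict with 'lang' and 'score' keys.
results = [
    {"lang": "ru", "score": 0.390},
    {"lang": "zh", "score": 0.182},
]

# Select the candidate with the highest confidence score.
top = max(results, key=lambda r: r["score"])
print(top["lang"])  # -> ru
```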
from fast_langdetect import detect_language
# Single language detection
print(detect_language("Hello, world!"))
# Output: EN
print(detect_language("Привет, мир!"))
# Output: RU
print(detect_language("你好,世界!"))
# Output: ZH
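FastText's underlying `predict` operates on a single line of text, so it can help to flatten multi-line input before detection (an assumption based on FastText's API; recent fast-langdetect versions may normalize input internally). A simple pre-processing helper:

```python
def normalize(text: str) -> str:
    """Collapse newlines and runs of whitespace into single spaces."""
    return " ".join(text.split())

sample = "Hello,\nworld!"
print(normalize(sample))  # -> Hello, world!
```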
For text splitting based on language, please refer to the split-lang repository.
For detailed benchmark results, refer to zafercavdar/fasttext-langdetect#benchmark.