Text_LanguageDetect

PHP library to identify human languages from text samples. Returns confidence scores for each.

Installation

PEAR

$ pear install Text_LanguageDetect

Composer

$ composer require pear/text_languagedetect

Usage

Also see the examples in the docs/ directory and the official documentation.

Language detection

Simple language detection:

<?php
require_once 'Text/LanguageDetect.php';

$text = 'Was wäre, wenn ich Ihnen das jetzt sagen würde?';

$ld = new Text_LanguageDetect();
$language = $ld->detectSimple($text);

echo $language;
//output: german

Show the three most probable languages with their confidence score:

<?php
require_once 'Text/LanguageDetect.php';

$text = 'Was wäre, wenn ich Ihnen das jetzt sagen würde?';

$ld = new Text_LanguageDetect();
//3 most probable languages
$results = $ld->detect($text, 3);

foreach ($results as $language => $confidence) {
    echo $language . ': ' . number_format($confidence, 2) . "\n";
}

//output:
//german: 0.35
//dutch: 0.25
//swedish: 0.20
?>

Language code

Instead of returning the full language name, ISO 639-2 two and three letter codes can be returned:

<?php
require_once 'Text/LanguageDetect.php';
$ld = new Text_LanguageDetect();

//will output the ISO 639-1 two-letter language code
// "de"
$ld->setNameMode(2);
echo $ld->detectSimple('Das ist ein kleiner Text') . "\n";

//will output the ISO 639-2 three-letter language code
// "deu"
$ld->setNameMode(3);
echo $ld->detectSimple('Das ist ein kleiner Text') . "\n";
?>

Supported languages

albanian
arabic
azeri
bengali
bulgarian
cebuano
croatian
czech
danish
dutch
english
estonian
farsi
finnish
french
german
hausa
hawaiian
hindi
hungarian
icelandic
indonesian
italian
kazakh
kyrgyz
latin
latvian
lithuanian
macedonian
mongolian
nepali
norwegian
pashto
pidgin
polish
portuguese
romanian
russian
serbian
slovak
slovene
somali
spanish
swahili
swedish
tagalog
turkish
ukrainian
urdu
uzbek
vietnamese
welsh

Links

Homepage: http://pear.php.net/package/Text_LanguageDetect
Bug tracker: http://pear.php.net/bugs/search.php?cmd=display&package_name[]=Text_LanguageDetect
Documentation: http://pear.php.net/package/Text_LanguageDetect/docs
Unit test status: https://travis-ci.org/pear/Text_LanguageDetect

Notes

Where are the data from?

I don't recall where I got the original data set. It's just the frequencies of 3-letter combinations in each supported language. It could be generated from a few random wikipedia pages from each language.

Name		Name	Last commit message	Last commit date
Latest commit History 108 Commits
Text		Text
data		data
docs		docs
tests		tests
.gitignore		.gitignore
.travis.yml		.travis.yml
README.rst		README.rst
composer.json		composer.json
phpcs.xml		phpcs.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Text_LanguageDetect

Installation

PEAR

Composer

Usage

Language detection

Language code

Supported languages

Links

Notes

About

Releases

Packages

Contributors 3

Languages

pear/Text_LanguageDetect

Folders and files

Latest commit

History

Repository files navigation

Text_LanguageDetect

Installation

PEAR

Composer

Usage

Language detection

Language code

Supported languages

Links

Notes

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages