This repo provides a clean implementation of a language detection system in TensorFlow 2, following best practices. The following 21 languages are supported:
- Bulgarian
- Czech
- Danish
- Dutch
- English (Of course)
- Estonian
- Finnish
- French
- German
- Greek
- Hungarian
- Italian
- Latvian
- Lithuanian
- Polish
- Portuguese
- Romanian
- Slovak
- Slovenian
- Spanish
- Swedish
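The classifier distinguishes the 21 languages above. As a minimal sketch, mapping an argmax'd model output index back to a language name could look like the following (the label ordering here is an assumption for illustration, not taken from the repo; the real order depends on how the labels were encoded during training):

```python
# Hypothetical label list: the actual index order depends on how the
# label encoder was fit during training in this repo.
LANGUAGES = [
    "Bulgarian", "Czech", "Danish", "Dutch", "English", "Estonian",
    "Finnish", "French", "German", "Greek", "Hungarian", "Italian",
    "Latvian", "Lithuanian", "Polish", "Portuguese", "Romanian",
    "Slovak", "Slovenian", "Spanish", "Swedish",
]

def index_to_language(class_index: int) -> str:
    """Map a predicted class index (e.g. argmax of model output) to a label."""
    return LANGUAGES[class_index]
```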
```bash
# TensorFlow CPU
conda activate <your-env>   # activate the conda environment you use for TensorFlow
pip install -r requirements.txt
```
NOTE: Each model requires its respective tokenizer to work; kindly download models along with their tokenizers.
```bash
# Model (append ?raw=true so wget fetches the file itself, not the GitHub HTML page)
wget -O model.h5 "https://github.com/saahiluppal/langdet/blob/master/model.h5?raw=true"
# Tokenizer
wget -O tokenizer.json "https://github.com/saahiluppal/langdet/blob/master/tokenizer.json?raw=true"
```
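A common pitfall is ending up with a GitHub HTML page instead of the actual files. A small sanity check, sketched here with the stdlib only (filenames match the wget commands above; the HDF5 magic bytes are a documented format constant):

```python
import json

def looks_like_hdf5(path: str) -> bool:
    """True if the file starts with the HDF5 magic bytes (valid .h5 file)."""
    with open(path, "rb") as f:
        return f.read(8) == b"\x89HDF\r\n\x1a\n"

def looks_like_tokenizer(path: str) -> bool:
    """True if the file parses as a JSON object (as tokenizer.json should)."""
    with open(path) as f:
        return isinstance(json.load(f), dict)
```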
Not sure which model to use? You can find information about the models here.
```bash
# Wanna detect language? (we recommend using more than 5 words for better accuracy)
# File dependencies soon to be added
python detect.py
```
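Conceptually, detection takes raw text through the tokenizer and into the model: convert words to integer ids, pad to a fixed length, then predict. A stdlib-only sketch of those first two steps with a toy vocabulary (the real ids come from tokenizer.json; note that Keras `pad_sequences` pre-pads by default, while post-padding is shown here for simplicity):

```python
# Toy vocabulary standing in for the word_index inside tokenizer.json.
word_index = {"this": 1, "is": 2, "a": 3, "sentence": 4}

def text_to_sequence(text, vocab, oov=0):
    """Map each word to its integer id; unknown words get the OOV id."""
    return [vocab.get(word, oov) for word in text.lower().split()]

def pad(seq, maxlen, value=0):
    """Truncate or post-pad a sequence to a fixed length."""
    seq = seq[:maxlen]
    return seq + [value] * (maxlen - len(seq))

seq = pad(text_to_sequence("This is a sentence", word_index), maxlen=6)
# seq is now a fixed-length integer vector the model can consume.
```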
```bash
# Training a custom model (we recommend adjusting the code to better suit your needs)
python manual_tokens.py
# Jupyter notebook for the same
jupyter notebook manual_tokens.ipynb
```
```bash
# Wanna preprocess downloaded data for custom use?
python extraction.py
```
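The Europarl corpus ships one file of sentences per language, so preprocessing essentially means labelling each line with its language to build (text, label) training pairs. A minimal sketch of that idea (the dict below is illustrative sample data, not the repo's actual extraction code):

```python
# Illustrative stand-in for per-language Europarl files: language code -> lines.
samples = {
    "en": ["the session is resumed"],
    "fr": ["la session est reprise"],
}

# Flatten into (text, label) pairs suitable for supervised training.
labeled = [
    (text, lang)
    for lang, lines in samples.items()
    for text in lines
]
```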
I used the dataset from the European Parliament Parallel Corpus, which can be found here. The full dataset is large (1.5 GB unextracted), so you might want to use the smaller preprocessed dataset, which can be found here.