Natural Language Processing for Text Classification and Machine Learning (Python)

MazenSalama/NLP_Text_Classification_With_ML

NLP_Text_Classification_With_ML

Natural Language Processing for Text Classification and Machine Learning

Text classification is a supervised machine learning task in which a labelled dataset, containing text documents and their labels, is used to train a classifier.

Steps:

1. Dataset Preparation:

The dataset preparation step loads a dataset and performs basic pre-processing. The dataset is then split into training and validation sets.
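The preparation step can be sketched in plain Python as follows. This is a minimal illustration, not the repository's own code: the helper names `preprocess` and `train_valid_split`, the toy dataset, the 25% validation fraction, and the fixed seed are all assumptions made for the example.

```python
import random

def preprocess(text):
    """Basic pre-processing: lowercase the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())

def train_valid_split(texts, labels, valid_fraction=0.25, seed=42):
    """Shuffle paired texts and labels, then split them into train and validation sets."""
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    indices = list(range(len(texts)))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_valid = int(round(len(indices) * valid_fraction))
    valid_idx, train_idx = indices[:n_valid], indices[n_valid:]
    train = ([texts[i] for i in train_idx], [labels[i] for i in train_idx])
    valid = ([texts[i] for i in valid_idx], [labels[i] for i in valid_idx])
    return train, valid

# Hypothetical toy dataset (not from the repository).
raw_texts = [
    "Great  movie, loved it",
    "Terrible plot",
    "Wonderful ACTING",
    "Awful pacing",
    "Loved the soundtrack",
    "Hated the ending",
    "Brilliant direction",
    "Boring and slow",
]
raw_labels = ["pos", "neg", "pos", "neg", "pos", "neg", "pos", "neg"]

clean_texts = [preprocess(t) for t in raw_texts]
(train_x, train_y), (valid_x, valid_y) = train_valid_split(clean_texts, raw_labels)
print(len(train_x), len(valid_x))  # 6 2
```

Shuffling indices rather than the texts themselves keeps each document paired with its label. Libraries such as scikit-learn offer an equivalent helper, which may be preferable in practice.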

2. Feature Engineering

In this step, raw text data is transformed into feature vectors, and new features are created from the existing dataset. We implement the following ideas to obtain relevant features from our dataset.

2.1 Count Vectors as features

2.2 TF-IDF Vectors as features

-Word level

-N-Gram level

-Character level

2.3 Word Embeddings as features

2.4 Text / NLP based features

2.5 Topic Models as features
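As a rough illustration of items 2.1 and 2.2 above, the sketch below builds count vectors and TF-IDF vectors at the word, n-gram, and character level using only the standard library. The function names and the three-document corpus are hypothetical. The weighting used is the plain textbook form, term frequency times `log(N / df)`; libraries commonly apply smoothed or normalised variants, so their numbers will differ.

```python
import math
from collections import Counter

def word_tokens(text):
    """Word-level tokens."""
    return text.lower().split()

def word_ngrams(text, n=2):
    """Word n-grams (bigrams by default)."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def char_ngrams(text, n=3):
    """Character n-grams (trigrams by default), spaces included."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]

def count_vectors(docs, tokenizer):
    """Return (vocabulary, one count vector per document)."""
    counts = [Counter(tokenizer(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[t] for t in vocab] for c in counts]

def tfidf_vectors(docs, tokenizer):
    """Return (vocabulary, one TF-IDF vector per document), unsmoothed."""
    vocab, counts = count_vectors(docs, tokenizer)
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    vectors = []
    for row in counts:
        total = sum(row) or 1  # avoid dividing by zero for empty documents
        vectors.append([(c / total) * w for c, w in zip(row, idf)])
    return vocab, vectors

# Hypothetical toy corpus.
docs = ["the cat sat", "the dog sat", "the cat ran"]

word_vocab, word_counts = count_vectors(docs, word_tokens)
_, word_tfidf = tfidf_vectors(docs, word_tokens)
ngram_vocab, ngram_tfidf = tfidf_vectors(docs, word_ngrams)
char_vocab, char_tfidf = tfidf_vectors(docs, char_ngrams)

print(word_vocab)      # ['cat', 'dog', 'ran', 'sat', 'the']
print(word_counts[0])  # [1, 0, 0, 1, 1]
```

A term that occurs in every document, such as "the" here, gets an IDF of zero and so contributes nothing, while a term unique to one document gets the largest weight.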

3. Model Building & Training:

In the model building step, a machine learning model is trained on a labelled dataset.

-Naive Bayes Classifier

-Linear Classifier

-Support Vector Machine

-Bagging Models

-Boosting Models

-Shallow Neural Networks

-Deep Neural Networks

-Convolutional Neural Network (CNN)

-Long Short-Term Memory (LSTM)

-Gated Recurrent Unit (GRU)

-Bidirectional RNN

-Recurrent Convolutional Neural Network (RCNN)

-Other Variants of Deep Neural Networks
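To make the training step concrete, here is a small from-scratch sketch of the first model in the list, a Naive Bayes classifier. It assumes the multinomial variant over word counts with add-one (Laplace) smoothing; the class name and the four training sentences are invented for illustration and are not taken from the repository.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word counts per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / n_docs)
            total = sum(self.word_counts[label].values())
            for word in text.lower().split():
                if word not in self.vocab:
                    continue  # ignore words never seen during training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical labelled training data.
train_texts = [
    "great movie loved it",
    "wonderful acting great plot",
    "terrible movie hated it",
    "awful plot terrible acting",
]
train_labels = ["pos", "pos", "neg", "neg"]

model = NaiveBayesTextClassifier().fit(train_texts, train_labels)
print(model.predict("great wonderful movie"))  # pos
print(model.predict("awful terrible"))         # neg
```

Working in log space avoids numeric underflow when many word probabilities are multiplied together, and the smoothing keeps a word that is unseen in one class from driving that class's probability to zero.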

The diagnostic measures covered are:

  1. accuracy: proportion of test results that are correct

  2. sensitivity: proportion of true +ve identified

  3. specificity: proportion of true -ve identified

  4. positive likelihood: increased probability of true +ve if test +ve

  5. negative likelihood: reduced probability of true +ve if test -ve

  6. false positive rate: proportion of false +ves among true -ve cases

  7. false negative rate: proportion of false -ves among true +ve cases

  8. positive predictive value: chance of true +ve if test +ve

  9. negative predictive value: chance of true -ve if test -ve

  10. precision = positive predictive value

  11. recall = sensitivity

  12. f1 = (2 * precision * recall) / (precision + recall)
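The measures listed above can all be derived from the four cells of a binary confusion matrix, as sketched below. The function name and the example counts are hypothetical. The two likelihood measures are assumed here to mean the standard likelihood ratios, sensitivity / (1 - specificity) and (1 - sensitivity) / specificity. Zero denominators are not handled.

```python
def diagnostic_measures(tp, fp, fn, tn):
    """Compute the listed measures from confusion-matrix counts.

    tp, fp, fn, tn: true positives, false positives, false negatives, true negatives.
    """
    sensitivity = tp / (tp + fn)  # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)    # also called positive predictive value
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Assumed to be the standard likelihood ratios.
        "positive_likelihood": sensitivity / (1 - specificity),
        "negative_likelihood": (1 - sensitivity) / specificity,
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
        "positive_predictive_value": precision,
        "negative_predictive_value": tn / (tn + fn),
        "precision": precision,
        "recall": sensitivity,
        "f1": (2 * precision * sensitivity) / (precision + sensitivity),
    }

# Hypothetical counts from a validation set of 100 documents.
measures = diagnostic_measures(tp=40, fp=10, fn=20, tn=30)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

With these counts the accuracy is 0.7, the precision is 0.8, and the recall is 40/60, which gives an f1 of 80/110, or about 0.727.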

4. Improve Performance of Text Classifier:

We will use several different techniques to improve the performance of the text classifiers.

5. Machine learning classifiers interpretability

Using ELI5
