This is a comparison study of three NLP models (Logistic Regression, RNN, BERT).
The primary study material is AWS Machine Learning University on YouTube. The original data comes from the Amazon Customer Reviews Dataset (https://s3.amazonaws.com/amazon-reviews-pds/readme.html).
Key Steps (Logistic Regression):
- Clean the text and remove stop words using the stopwords list in the NLTK library
- Use TF-IDF to vectorize the text into vectors of length 750
- Build and train a two-layer Logistic Regression model
- Predict with test data
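
The logistic regression steps above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the project's actual code: `STOP_WORDS` is a small hypothetical stand-in for the NLTK stopwords list, the four reviews are made up, and the classifier is a plain single-layer logistic regression rather than the two-layer model described above. All function names are assumptions chosen for this sketch.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for the NLTK English stopwords list.
STOP_WORDS = {"a", "an", "the", "is", "it", "this", "and", "was", "i", "to", "of"}

def clean(text):
    """Lowercase, keep alphabetic tokens only, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def fit_tfidf(docs, max_features=750):
    """Keep the most frequent terms and compute a smoothed IDF for each."""
    counts = Counter(t for d in docs for t in d)
    kept = sorted(counts, key=lambda t: (-counts[t], t))[:max_features]
    vocab = {t: i for i, t in enumerate(sorted(kept))}
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log((1 + len(docs)) / (1 + df[t])) + 1.0 for t in vocab}
    return vocab, idf

def transform(docs, vocab, idf):
    """Turn token lists into L2-normalised TF-IDF rows using the fitted IDF."""
    rows = []
    for d in docs:
        row = [0.0] * len(vocab)
        for term, count in Counter(d).items():
            if term in vocab:
                row[vocab[term]] = count * idf[term]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])
    return rows

def predict_proba(w, b, x):
    """Sigmoid of the linear score w . x + b."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=200):
    """Single-layer logistic regression trained with batch gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            grad_w = [g + err * v for g, v in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

# Made-up reviews for illustration only (1 = positive, 0 = negative).
train_texts = [
    "This product is great and works well",
    "Terrible quality, it broke fast",
    "Great value, works well",
    "Broke quickly, terrible product",
]
train_labels = [1, 0, 1, 0]

train_docs = [clean(t) for t in train_texts]
vocab, idf = fit_tfidf(train_docs, max_features=750)
X_train = transform(train_docs, vocab, idf)
w, b = train_logreg(X_train, train_labels)

# Predict on unseen text, reusing the vocabulary and IDF fitted on the training data.
X_test = transform([clean("works great"), clean("terrible, it broke")], vocab, idf)
prob_pos, prob_neg = (predict_proba(w, b, x) for x in X_test)
```

The test rows are built with the vocabulary and IDF learned from the training set, so no information leaks from the test data into the features. With a real dataset the vector length is capped at 750 by `max_features`; here the toy vocabulary is much smaller.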
Key Steps (RNN):
- Use pretrained GloVe word vectors for the word embeddings
- Build and train a 2-layer RNN model
- Predict with test data
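
To make the RNN steps above more concrete, here is a small forward-pass sketch in plain Python. It only illustrates the data flow (GloVe lookup, two stacked tanh RNN layers, sigmoid output); training is omitted, the weights are random, and the three 4-dimensional "GloVe" vectors are invented for the example. The function names and layer sizes are assumptions, not the project's actual configuration.

```python
import io
import math
import random

def load_glove(handle):
    """Parse GloVe's text format: a token followed by its vector on each line."""
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) > 1:
            vectors[parts[0]] = [float(v) for v in parts[1:]]
    return vectors

def init_layer(n_in, n_hidden, rng):
    """Random weights for one RNN layer: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    s = 1.0 / math.sqrt(n_hidden)
    return {
        "W_x": [[rng.uniform(-s, s) for _ in range(n_in)] for _ in range(n_hidden)],
        "W_h": [[rng.uniform(-s, s) for _ in range(n_hidden)] for _ in range(n_hidden)],
        "b": [0.0] * n_hidden,
    }

def run_layer(layer, inputs):
    """Run one layer over a sequence and return the hidden state at every step."""
    h = [0.0] * len(layer["b"])
    outputs = []
    for x in inputs:
        h = [
            math.tanh(
                sum(w * v for w, v in zip(wx, x))
                + sum(w * v for w, v in zip(wh, h))
                + bias
            )
            for wx, wh, bias in zip(layer["W_x"], layer["W_h"], layer["b"])
        ]
        outputs.append(h)
    return outputs

def predict(tokens, glove, layers, w_out, b_out):
    """Embed the tokens, pass them through the stacked layers, classify the last state."""
    dim = len(next(iter(glove.values())))
    seq = [glove.get(t, [0.0] * dim) for t in tokens]  # unknown words map to zeros
    for layer in layers:
        seq = run_layer(layer, seq)  # each layer consumes the previous layer's states
    z = sum(w * v for w, v in zip(w_out, seq[-1])) + b_out
    return 1.0 / (1.0 + math.exp(-z))

# Invented 4-dimensional vectors in GloVe's file format, for illustration only.
toy_glove_file = io.StringIO(
    "great 0.1 0.8 -0.2 0.4\n"
    "terrible -0.7 -0.3 0.5 0.1\n"
    "product 0.0 0.2 0.1 -0.1\n"
)

rng = random.Random(0)
glove = load_glove(toy_glove_file)
layers = [init_layer(4, 8, rng), init_layer(8, 8, rng)]  # 2 stacked RNN layers
w_out = [rng.uniform(-0.5, 0.5) for _ in range(8)]
prob = predict(["great", "product"], glove, layers, w_out, 0.0)
```

In a real setup the same lookup would fill the embedding matrix of a deep learning framework, and the layer weights would be learned rather than sampled. Note that `predict` expects at least one token.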
Key Steps (BERT):
- Load the pretrained BERT model and get its tokenizer
- Fine-tune BERT
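
As a rough illustration of what the tokenizer step produces before fine-tuning, the sketch below implements a simplified WordPiece-style encoder in plain Python. It is not the real BERT tokenizer: the ten-entry vocabulary is hypothetical, text is split on whitespace only (no punctuation handling), and `max_len=12` is an arbitrary choice for the example.

```python
# Hypothetical toy vocabulary; a real BERT vocabulary has roughly 30,000 entries.
TOY_VOCAB = {
    tok: i
    for i, tok in enumerate(
        ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "the", "play", "##ing", "##ed", "was", "great"]
    )
}

def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into sub-word pieces."""
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a ## prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matches, so the whole word becomes unknown
        pieces.append(piece)
        start = end
    return pieces

def encode(text, vocab, max_len=12):
    """Return tokens, padded input ids, and the attention mask for one sentence."""
    tokens = ["[CLS]"]
    for word in text.lower().split():
        tokens.extend(wordpiece(word, vocab))
    tokens = tokens[: max_len - 1] + ["[SEP]"]  # truncate, then close the sequence
    ids = [vocab[t] for t in tokens]
    pad = max_len - len(ids)
    return tokens, ids + [vocab["[PAD]"]] * pad, [1] * len(ids) + [0] * pad

tokens, input_ids, attention_mask = encode("The playing was great", TOY_VOCAB)
```

The padded ids and the attention mask are the kind of inputs a BERT classifier consumes during fine-tuning; the mask marks which positions hold real tokens and which are padding.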