State-of-the-art NLP principles and methods for toxic comment classification
The links below open the notebooks via nbviewer, because GitHub sometimes fails to render .ipynb files properly.
- Basic NLP Preprocessing Techniques (Python code snippets)
- Data Exploration - Quora Kaggle Competition
- Custom Keras NN Model
- ELMo via TensorFlow Hub showcase for the Quora dataset
  - Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer: "Deep contextualized word representations". arXiv preprint arXiv:1802.05365, 2018.
- BERT via TensorFlow Hub showcase for the Quora dataset
  - Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova: "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". 2018.
- XLNet with spaCy-PyTorch-Transformers showcase for the Quora dataset
  - Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le: "XLNet: Generalized Autoregressive Pretraining for Language Understanding". arXiv preprint arXiv:1906.08237, 2019.
- BERT with spaCy-PyTorch-Transformers showcase for the Quora dataset
  - Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova: "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". 2018.
- https://github.com/explosion/spacy-pytorch-transformers (spaCy pipelines for BERT, XLNet and GPT-2)
- How to fine-tune BERT
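The preprocessing notebook listed at the top of this README is not reproduced here, so what follows is only a minimal sketch of the kind of steps such a notebook typically covers before comments reach a model like the custom Keras network: lowercasing, stripping URLs and punctuation, whitespace tokenization, stopword removal, and mapping tokens to fixed-length integer sequences. It uses only the Python standard library. The stopword list, the reserved ids (0 for padding, 1 for out-of-vocabulary tokens), and all function names are hypothetical choices for illustration, not taken from the notebooks.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only;
# real pipelines usually take a fuller list from nltk or spaCy.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "you", "your"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_ALNUM_RE = re.compile(r"[^a-z0-9'\s]")  # ASCII-only: keeps letters, digits, apostrophes
WHITESPACE_RE = re.compile(r"\s+")

PAD_ID = 0  # hypothetical reserved id for padding
OOV_ID = 1  # hypothetical reserved id for out-of-vocabulary tokens


def clean_text(text: str) -> str:
    """Lowercase, drop URLs, replace punctuation with spaces, collapse whitespace."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = NON_ALNUM_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()


def tokenize(text: str, remove_stopwords: bool = True) -> list[str]:
    """Whitespace tokenization of the cleaned text, optionally without stopwords."""
    tokens = clean_text(text).split()
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens


def build_vocab(docs: list[list[str]], max_size: int = 20000) -> dict[str, int]:
    """Assign ids to the max_size most frequent tokens, after the reserved ids."""
    counts = Counter(tok for doc in docs for tok in doc)
    vocab = {"<pad>": PAD_ID, "<oov>": OOV_ID}
    for tok, _ in counts.most_common(max_size):
        vocab[tok] = len(vocab)
    return vocab


def encode(tokens: list[str], vocab: dict[str, int], max_len: int) -> list[int]:
    """Map tokens to ids, truncate to max_len, then pad at the end with PAD_ID."""
    ids = [vocab.get(t, OOV_ID) for t in tokens][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


if __name__ == "__main__":
    comments = ["You are an IDIOT!!! see http://spam.example", "Thanks, great answer."]
    docs = [tokenize(c) for c in comments]
    print(docs)  # [['idiot', 'see'], ['thanks', 'great', 'answer']]
    vocab = build_vocab(docs)
    print(encode(docs[0], vocab, max_len=4))  # [2, 3, 0, 0]
```

This sketch pads and truncates at the end of each sequence, whereas Keras's `pad_sequences` pads and truncates at the front by default; either convention works as long as training and inference use the same one. Transformer models such as BERT and XLNet ship their own subword tokenizers, so aggressive manual cleaning like this is usually less important for them than for classic embedding-based models.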
NLP: NLTK book and nlp.stanford 1 / nlp.stanford 2
- http://www.arxiv-sanity.com/ (AI papers in general)
- https://www.aclweb.org/anthology (papers on the study of computational linguistics and NLP)
- https://github.com/sebastianruder/NLP-progress/blob/master/english/sentiment_analysis.md
- https://github.com/keon/awesome-nlp#research-summaries-and-trends
Libraries for working with human languages.
- General
- gensim - Topic Modeling for Humans.
- langid.py - Stand-alone language identification system.
- nltk - A leading platform for building Python programs to work with human language data.
- pattern - A web mining module for Python.
- polyglot - Natural language pipeline supporting hundreds of languages.
- pytext - A natural language modeling framework based on PyTorch.
- PyTorch-NLP - A toolkit enabling rapid deep learning NLP prototyping for research.
- spacy - A library for industrial-strength natural language processing in Python and Cython.
- stanfordnlp - The Stanford NLP Group's official Python library, supporting 50+ languages.
- flair - A library for state-of-the-art NLP, developed by Zalando Research.
- fastai - A well-documented deep learning library with support for transfer learning in NLP.