manuelsh/text-classification-tutorial

Text classification with PyTorch and torchtext

This notebook shows how to use the torchtext and PyTorch libraries to retrieve a dataset and build a simple RNN model that classifies text.

It is based on the TREC-6 dataset, which consists of 5,952 questions written in English, classified into the following categories according to the type of answer they expect:

  • HUM: Human
  • DESC: Description
  • ABBR: Abbreviation
  • LOC: Location
  • NUM: Number
  • ENTY: Entity
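
To train a classifier, these category codes need to be turned into integer class indices. The snippet below is a minimal sketch of one way to do that in plain Python. The names `TREC6_LABELS`, `label_to_index` and `encode_labels` are hypothetical, and the index order simply follows the list above; the notebook may assign indices differently.

```python
# Category codes and descriptions as listed above.
# The index order is an assumption made for illustration only.
TREC6_LABELS = {
    "HUM": "Human",
    "DESC": "Description",
    "ABBR": "Abbreviation",
    "LOC": "Location",
    "NUM": "Number",
    "ENTY": "Entity",
}

# Map each code to an integer class index, and back again.
label_to_index = {code: i for i, code in enumerate(TREC6_LABELS)}
index_to_label = {i: code for code, i in label_to_index.items()}


def encode_labels(codes):
    """Convert a list of category codes into integer class indices."""
    return [label_to_index[code] for code in codes]


print(encode_labels(["LOC", "NUM", "HUM"]))  # [3, 4, 0]
```

Integer targets in this form are what `nn.CrossEntropyLoss` expects for a six-class problem.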

Further exercises

Try improving the performance of the model by:

  • Adding more complexity (more RNN layers, other layer types)
  • Adding regularisation (L1, L2, dropout)
  • Making the model a bidirectional RNN
  • Using pretrained embeddings such as word2vec or GloVe. Note that you can load them with nn.Embedding.from_pretrained(...)
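
As a starting point, the suggestions above could be combined roughly as follows. This is a sketch under stated assumptions, not the model from the notebook: the class name `BiRNNClassifier`, the layer sizes and the hyperparameters are hypothetical, and the "pretrained" vectors are random stand-ins for real word2vec or GloVe weights. L2 regularisation is applied through the optimizer's `weight_decay` argument.

```python
import torch
import torch.nn as nn


class BiRNNClassifier(nn.Module):
    """Illustrative bidirectional, multi-layer RNN text classifier."""

    def __init__(self, pretrained_vectors, hidden_size=64, num_layers=2,
                 num_classes=6, dropout=0.3):
        super().__init__()
        # Initialise the embedding layer from a (vocab_size, emb_dim) tensor.
        # freeze=False allows the vectors to be fine-tuned during training.
        self.embedding = nn.Embedding.from_pretrained(pretrained_vectors,
                                                      freeze=False)
        self.rnn = nn.RNN(
            input_size=pretrained_vectors.size(1),
            hidden_size=hidden_size,
            num_layers=num_layers,   # extra complexity: stacked RNN layers
            bidirectional=True,      # read the question in both directions
            dropout=dropout,         # dropout between stacked RNN layers
            batch_first=True,
        )
        self.dropout = nn.Dropout(dropout)
        # Forward and backward final states are concatenated: 2 * hidden_size.
        self.fc = nn.Linear(2 * hidden_size, num_classes)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)   # (batch, seq_len, emb_dim)
        _, hidden = self.rnn(embedded)         # (num_layers * 2, batch, hidden)
        # hidden[-2] and hidden[-1] are the last layer's forward and
        # backward final states respectively.
        final_state = torch.cat([hidden[-2], hidden[-1]], dim=1)
        return self.fc(self.dropout(final_state))


# Random stand-in for pretrained vectors: 1000 words, 50 dimensions (assumed).
vectors = torch.randn(1000, 50)
model = BiRNNClassifier(vectors)

# weight_decay adds an L2 penalty on the parameters.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)

# A dummy batch of 8 questions, each 12 token ids long.
batch = torch.randint(0, 1000, (8, 12))
logits = model(batch)
print(logits.shape)  # torch.Size([8, 6])
```

For simplicity the sketch ignores padding; with variable-length questions, the final states would normally be taken from packed sequences (`torch.nn.utils.rnn.pack_padded_sequence`) so that padding tokens do not affect the result.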
