
Introduction to Generative AI and Large Language Models (LLMs)

Week 1 is the most theory-heavy week of the course. You can find the lecture slides here: Week 1 Slides.

Research tokenizers and add a section to your final report that reflects on the following questions:

What are tokenizers?
Why are they important for language modeling and LLMs?
Which tokenization algorithms exist, which are the most popular, and why?

Some references:

Neural Machine Translation of Rare Words with Subword Units: https://arxiv.org/abs/1508.07909
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing: https://arxiv.org/abs/1808.06226
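
As a starting point for your research, here is a minimal sketch of the merge-learning loop behind byte-pair encoding (BPE), the subword method described in the first reference. It is an illustration under stated assumptions, not the authors' implementation: the function names (`count_pairs`, `merge_pair`, `learn_bpe`), the toy word counts, and the tie-breaking rule (the first most-frequent pair encountered wins) are all chosen here for the example.

```python
from collections import Counter


def count_pairs(vocab):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Rewrite every word so that each occurrence of `pair` becomes one symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merges from a {word: count} mapping."""
    # Each word starts as a tuple of characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(vocab)
        if not pairs:
            break  # every word is already a single symbol
        # Assumption: ties go to the first most-frequent pair encountered.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        vocab = merge_pair(best, vocab)
    return merges, vocab


# Toy word counts, purely illustrative.
toy_corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges, vocab = learn_bpe(toy_corpus, num_merges=3)
print(merges)
print(vocab)
```

On this toy corpus the first merges build up the shared suffix of "newest" and "widest", which shows why frequent character sequences end up as single tokens while rare words stay split into smaller pieces. Production tokenizers add details this sketch omits, such as byte-level fallback and pre-tokenization rules.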