# MoE-Stability

Steps:

- Implement basic MoE architecture (a minimal sketch follows this list)
- Implement a simple training loop (the current code works with the Lab 2 code)
- Find a dataset and tokenizer (English Wikipedia and a Hugging Face BPE tokenizer? See the sketch after this list) <- Partially here
- Create training and eval code
- Create config setup
- TensorBoard monitoring
- Create the Fedus load-balance loss (sketched at the end of this README)
- Add evaluation metrics
- Train and benchmark for testing (this is where I need to be in 2-3 weeks) <- You are here
- Verify results against the papers?
- Create additional loss terms
- Create norm and projected router
- Expert Choice routing? (sketched after this list)
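
For the first step, here is a minimal sketch of what the basic MoE layer could look like, assuming PyTorch and token-choice top-k routing; the names (`Expert`, `MoELayer`) are placeholders, not the actual code in this repo:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Expert(nn.Module):
    """A standard transformer feed-forward block used as one expert."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x):
        return self.net(x)

class MoELayer(nn.Module):
    """Token-choice MoE layer with top-k routing."""
    def __init__(self, d_model: int, d_ff: int, num_experts: int, k: int = 1):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(Expert(d_model, d_ff) for _ in range(num_experts))
        self.k = k

    def forward(self, x):
        # x: (batch, seq, d_model) -> flatten tokens for routing
        tokens = x.reshape(-1, x.size(-1))
        logits = self.router(tokens)            # (tokens, experts)
        probs = F.softmax(logits, dim=-1)
        gate, idx = probs.topk(self.k, dim=-1)  # top-k experts per token
        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            for slot in range(self.k):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += gate[mask, slot, None] * expert(tokens[mask])
        return out.reshape_as(x), probs, idx
```

Returning `probs` and `idx` alongside the output makes it easy to compute the auxiliary load-balance loss later.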
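
For the dataset/tokenizer step, a rough sketch of pulling the preprocessed English Wikipedia dump and training a BPE tokenizer, assuming the Hugging Face `datasets` and `tokenizers` libraries (the `20220301.en` config is the 2022 preprocessed snapshot):

```python
from datasets import load_dataset
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Preprocessed 2022 English Wikipedia snapshot
wiki = load_dataset("wikipedia", "20220301.en", split="train")

# Train a BPE tokenizer from the raw article text
tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()
trainer = trainers.BpeTrainer(
    vocab_size=30_000,
    special_tokens=["[UNK]", "[PAD]", "[CLS]", "[SEP]", "[MASK]"],
)
tokenizer.train_from_iterator((row["text"] for row in wiki), trainer=trainer)
```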
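
And for the Expert Choice question mark at the end of the list, a sketch of the core idea from Zhou et al. (2022): instead of tokens picking experts, each expert picks its top-c tokens, which balances load by construction. `expert_choice_route` is a hypothetical helper:

```python
import torch
import torch.nn.functional as F

def expert_choice_route(tokens, router, num_experts: int, capacity_factor: float = 1.0):
    """Expert-choice routing: each expert selects its top-c tokens."""
    n = tokens.size(0)
    c = max(1, int(capacity_factor * n / num_experts))  # per-expert token budget
    scores = F.softmax(router(tokens), dim=-1)          # (tokens, experts)
    gate, idx = scores.t().topk(c, dim=-1)              # (experts, c): weights + token ids
    return gate, idx
```

Each expert `e` would then process `tokens[idx[e]]`, with the results scattered back weighted by `gate[e]`.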

There is a chance I need to start from, say, BERT-base and then duplicate the FF layer and add noise to create the "experts" (sketched below); this may affect the results.
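
A minimal sketch of that duplication, assuming the Hugging Face `transformers` BERT layer layout (`layer.intermediate`, `layer.output.dense`); `ffn_to_experts` is a hypothetical helper:

```python
import copy
import torch
from transformers import BertModel

def ffn_to_experts(bert_layer, num_experts: int, noise_std: float = 0.01):
    """Clone one BERT layer's FFN into several 'experts', each perturbed with noise."""
    experts = []
    for _ in range(num_experts):
        expert = copy.deepcopy(torch.nn.Sequential(
            bert_layer.intermediate,  # Linear(d_model -> d_ff) + GELU
            bert_layer.output.dense,  # Linear(d_ff -> d_model)
        ))
        with torch.no_grad():
            for p in expert.parameters():
                p.add_(noise_std * torch.randn_like(p))  # break symmetry between experts
        experts.append(expert)
    return torch.nn.ModuleList(experts)

bert = BertModel.from_pretrained("bert-base-uncased")
experts = ffn_to_experts(bert.encoder.layer[0], num_experts=8)
```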

I have now set up my model to work with the Hugging Face Trainer. I may have to change the output to type ModelOutput, which may also mean loss is calculated internally (it is, but I can change that); a sketch of that plumbing follows. Next I need to create a Hugging Face training script and test it, using the BERT tokenizer and the preprocessed Wikipedia dump (I think they have a ~2022 snapshot). I will test TensorBoard reporting right after this.
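
A sketch of the `ModelOutput` plumbing, assuming a masked-LM setup: when `labels` are passed, the Trainer reads `.loss` straight off the returned output (this is the "loss is calculated internally" behavior). The `backbone` and class name here are placeholders:

```python
import torch
from transformers.modeling_outputs import MaskedLMOutput

class MoEForMaskedLM(torch.nn.Module):
    """Wrapper whose forward returns a ModelOutput; the HF Trainer reads .loss."""
    def __init__(self, backbone, vocab_size: int, d_model: int):
        super().__init__()
        self.backbone = backbone  # hypothetical MoE encoder
        self.lm_head = torch.nn.Linear(d_model, vocab_size)

    def forward(self, input_ids=None, attention_mask=None, labels=None, **kwargs):
        hidden = self.backbone(input_ids, attention_mask)  # (batch, seq, d_model)
        logits = self.lm_head(hidden)
        loss = None
        if labels is not None:
            # Trainer expects the model to compute its own loss when labels are passed
            loss = torch.nn.functional.cross_entropy(
                logits.view(-1, logits.size(-1)), labels.view(-1), ignore_index=-100
            )
        return MaskedLMOutput(loss=loss, logits=logits)
```

TensorBoard reporting then only needs `report_to="tensorboard"` in `TrainingArguments`.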

Once I am done with this, I will be at the step of creating the loss terms and monitoring.

By the end of spring break I need to:

- Fedus (Shazeer) loss
- load-balance loss (sketched below)
- evaluation metrics
- test code on a small batch <- here
- test eval
- test loss
- test trainer
- test monitoring
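
For reference while building the loss terms, a sketch of the Switch Transformer load-balance loss from Fedus et al. (2021): `alpha * N * sum_i f_i * P_i`, where `f_i` is the fraction of tokens whose top-1 expert is `i` and `P_i` is the mean router probability for expert `i`. This pairs with the `probs`/`idx` returned by the `MoELayer` sketch above:

```python
import torch

def load_balance_loss(router_probs, expert_idx, num_experts: int, alpha: float = 0.01):
    """Switch Transformer auxiliary loss: alpha * N * sum_i f_i * P_i."""
    # f_i: fraction of tokens whose top-1 expert is i (discrete, non-differentiable)
    top1 = expert_idx[:, 0] if expert_idx.dim() > 1 else expert_idx
    f = torch.bincount(top1, minlength=num_experts).float() / top1.numel()
    # P_i: mean softmax probability assigned to expert i over all tokens
    p = router_probs.mean(dim=0)
    return alpha * num_experts * torch.sum(f * p)
```

Note that `f` comes from the discrete assignment and carries no gradient; the gradient flows through `P`, as in the paper.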