I'm wondering whether this method should (theoretically) work with encoder-decoder models. Have you tried training such models with the code from this repository? I'm interested in using this approach with the T5 model.
Hi Adam, I've done some experiments on BART with LT-SFT and I can confirm that it works, so I'm pretty sure T5 should work as well. I think you should be able to use LotteryTicketSparseFineTuner without modification, although the boilerplate code in the example scripts will likely require some adjustment for generative models. It's important to note that, as with BERT-style models, you should generally decouple the input and output embedding matrices and freeze the output embeddings to achieve good performance.
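To make the "decouple and freeze" step concrete, here is a minimal PyTorch sketch on a toy module. It is not code from this repository: `TinyTiedModel` and `untie_and_freeze_output` are hypothetical names invented for illustration, and a real BART or T5 checkpoint would need the equivalent change applied to its own embedding and output-projection attributes.

```python
import torch
from torch import nn


class TinyTiedModel(nn.Module):
    """Hypothetical toy stand-in for a model with tied input/output embeddings."""

    def __init__(self, vocab_size: int = 10, dim: int = 4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.lm_head = nn.Linear(dim, vocab_size, bias=False)
        # Tie the weights: both modules share one parameter tensor.
        self.lm_head.weight = self.embed.weight


def untie_and_freeze_output(model: TinyTiedModel) -> TinyTiedModel:
    """Give the output projection its own frozen copy of the embedding matrix."""
    untied = model.embed.weight.detach().clone()
    # requires_grad=False keeps the output embeddings fixed during fine-tuning.
    model.lm_head.weight = nn.Parameter(untied, requires_grad=False)
    return model


model = untie_and_freeze_output(TinyTiedModel())

# The two matrices now live in separate storage but start with equal values.
shares_storage = model.lm_head.weight.data_ptr() == model.embed.weight.data_ptr()
same_values = torch.equal(model.lm_head.weight, model.embed.weight)
```

After this step only the input embeddings (and the rest of the network) remain trainable, so any sparse update touches them without changing the output projection.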
@AlanAnsell thank you for the quick reply. Could you share the scripts from your BART experiments? They would be a great starting point for further experimentation and for adapting the method to the T5 architecture.
Unfortunately I can't share those experiments with you right now, but I expect the adaptation shouldn't be too difficult. For BART, for example, I replaced DataCollatorForLanguageModeling with the DataCollatorForDenoisingTasks I found here: https://github.com/morganmcg1/rotobart/blob/main/data_collator.py
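For T5, the analogous change would be a collator that produces span-corruption examples rather than BART-style denoising ones. The sketch below only illustrates the input/target format with T5's `<extra_id_N>` sentinel tokens. It is neither the linked collator nor code from this repository: `span_corrupt` is a hypothetical helper, and the spans are passed in explicitly instead of being sampled at random as a real collator would do.

```python
def span_corrupt(tokens, spans):
    """Replace each span with a sentinel and build the matching target.

    `spans` must be sorted, non-overlapping (start, end) pairs, end exclusive.
    """
    inputs, targets = [], []
    cursor = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inputs.extend(tokens[cursor:start])  # keep text before the span
        inputs.append(sentinel)              # mask the span in the input
        targets.append(sentinel)             # the target names the sentinel...
        targets.extend(tokens[start:end])    # ...followed by the masked tokens
        cursor = end
    inputs.extend(tokens[cursor:])           # keep text after the last span
    targets.append(f"<extra_id_{len(spans)}>")  # closing sentinel
    return inputs, targets


tokens = "The cute dog walks in the park".split()
inputs, targets = span_corrupt(tokens, [(1, 3), (5, 6)])
```

Here `inputs` is what the encoder would see and `targets` is what the decoder would be trained to produce, once both are converted to token ids.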