Implement Knowledge distillation by Functional Mapping #121

manncodes · 2021-09-02T07:12:34Z

The paper focuses on two important aspects of knowledge distillation: consistency and patience.
In function matching, the authors argue that knowledge distillation shouldn't just be about matching the teacher's predictions on the target data; you should also try to increase the support of the data distribution. To do that they use mixup augmentation: you can use out-of-domain data, or interpolate between data points with mixup, so that the student matches the teacher's function across a wider part of the input space. For consistency, the teacher and the student should see the same view of each sample.
Another component of the knowledge distillation training recipe is patience: distillation benefits from long training schedules.
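To make the idea a bit more concrete, here is a rough sketch of what one function-matching loss computation could look like. It is not taken from the paper's code: the helper names (`softmax`, `mixup`, `function_matching_loss`), the `alpha` and `temperature` defaults, and the toy linear teacher/student are all assumptions made for illustration, and it is written in NumPy only to stay framework-agnostic. The point it tries to show is that both models are evaluated on the same mixed-up view and the student is trained to match the teacher's output distribution on it.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Row-wise softmax of logits / temperature, computed stably."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def mixup(x, rng, alpha=1.0):
    """Blend each sample with a randomly chosen partner from the same batch."""
    lam = rng.beta(alpha, alpha)      # one mixing coefficient for the whole batch
    perm = rng.permutation(len(x))    # partner index for every sample
    return lam * x + (1.0 - lam) * x[perm]


def function_matching_loss(teacher_fn, student_fn, x, rng, temperature=1.0):
    """Mean KL(teacher || student) on one shared mixup view of the batch."""
    x_mixed = mixup(x, rng)           # the same view is fed to both models
    p_t = softmax(teacher_fn(x_mixed), temperature)
    p_s = softmax(student_fn(x_mixed), temperature)
    eps = 1e-12                       # avoids log(0)
    kl = np.sum(p_t * (np.log(p_t + eps) - np.log(p_s + eps)), axis=-1)
    return kl.mean()


# Toy usage: hypothetical linear "teacher" and "student" on random data.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))
w_teacher = rng.normal(size=(4, 3))
w_student = rng.normal(size=(4, 3))


def teacher(inp):
    return inp @ w_teacher


def student(inp):
    return inp @ w_student


loss = function_matching_loss(teacher, student, x, rng)
self_loss = function_matching_loss(teacher, teacher, x, rng)
```

In this sketch no ground-truth labels are used, so the batch `x` could just as well be out-of-domain data. The "patience" part would then just be running this loss in the training loop for many more epochs than usual.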
Results:

Het-Shah · 2021-09-02T08:00:17Z

Hi @manncodes! Thank you for opening this issue. Do you want to take it up?
