The paper focuses on two important aspects of knowledge distillation: consistency and patience.
For function matching (consistency), the authors argue that knowledge distillation should not just be about matching the teacher's predictions on the target data; the student should match the teacher over a broader support of the data distribution. To do this they use mixup augmentation: by interpolating between data points (and optionally mixing in out-of-domain data), the student is trained to match the teacher's function on many "in-between" views of the samples, with both teacher and student seeing exactly the same augmented input.
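A minimal sketch of this idea in PyTorch is shown below, assuming `teacher` and `student` are modules that return logits; the function name, `alpha`, and `temperature` are illustrative choices, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def mixup_distillation_loss(teacher, student, images, alpha=0.2, temperature=1.0):
    """KL divergence between teacher and student on the *same* mixup view of a batch."""
    # Sample a mixing coefficient and interpolate between pairs of images.
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(images.size(0), device=images.device)
    mixed = lam * images + (1.0 - lam) * images[perm]

    with torch.no_grad():  # the teacher is frozen
        teacher_probs = F.softmax(teacher(mixed) / temperature, dim=-1)

    student_log_probs = F.log_softmax(student(mixed) / temperature, dim=-1)

    # Consistency: both networks see exactly the same mixed image, so the
    # student matches the teacher's function rather than hard labels.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature ** 2
```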
The other component of the knowledge-distillation training recipe is patience: distillation benefits from unusually long training schedules, so the student should be trained for far more epochs than is typical for supervised training.
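As a rough sketch of the "patience" part, the same distillation loss can simply be run under a long cosine-decay schedule; the epoch budget, learning rate, and stand-in student below are placeholders, not the paper's values.

```python
import torch
import torch.nn as nn

student = nn.Linear(3 * 224 * 224, 1000)   # stand-in for a real student network
EPOCHS = 1000                              # far longer than a typical supervised run

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-3, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS)

for epoch in range(EPOCHS):
    # ... one epoch of mixup function matching (see the sketch above) ...
    scheduler.step()                       # decay the LR slowly over the whole run
```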