
During model distillation, what is the specific purpose of using the soft labels (probability distributions) produced by the teacher model instead of the hard ground-truth labels for the student model?



In model distillation, the primary purpose of using soft labels (the probability distributions generated by the teacher model's output layer) instead of hard ground-truth labels is to pass on the relational information the teacher has learned, that is, how similar it judges each class to be to every other class. A hard label provides only the single correct category, such as assigning a label of 1 to a cat and 0 to a dog ...
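The contrast between the two kinds of label can be made concrete with a small sketch. Everything in it is an illustrative assumption rather than part of the answer above: the three class names, the logit values, and the use of a softmax temperature `T` (a common distillation device that the text here does not mention). It uses only the Python standard library.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw logits into a probability distribution.

    A temperature above 1 flattens the distribution, which makes the
    relative ranking of the non-target classes easier to see.
    """
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, predicted):
    """Cross-entropy between a target distribution and a prediction."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)

# Hypothetical setup: one cat image, three classes.
classes = ["cat", "dog", "truck"]
teacher_logits = [5.0, 3.0, -2.0]  # assumed teacher outputs
student_logits = [2.0, 1.0, 0.5]   # assumed student outputs
T = 2.0                            # assumed temperature

# Hard label: only the correct class carries any signal.
hard_label = [1.0, 0.0, 0.0]

# Soft label: the teacher also says "dog" is far more plausible than "truck".
soft_label = softmax(teacher_logits, T)

# Loss against the hard label uses the student's ordinary probabilities;
# loss against the soft label uses the same temperature as the teacher.
hard_loss = cross_entropy(hard_label, softmax(student_logits))
soft_loss = cross_entropy(soft_label, softmax(student_logits, T))

for name, hard, soft in zip(classes, hard_label, soft_label):
    print(f"{name:>5}: hard={hard:.1f} soft={soft:.3f}")
```

In the hard label, dog and truck are indistinguishable (both 0), whereas the soft label ranks dog well above truck. That ranking is the extra signal the student receives when it is trained against the teacher's distribution.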



