In model distillation, the primary purpose of using soft labels—which are the probability distributions generated by the teacher model’s output layer—instead of hard ground-truth labels is to capture the rich relational information embedded within the teacher's internal logic. A hard label provides only a single correct category, such as assigning a label of 1 to a cat and 0 to a dog ....
Log in to view the answer