Frustratingly Easy Knowledge Distillation via Attentive Similarity Matching
Main Authors:
Format: Conference Proceeding
Language: English
Subjects:
Online Access: Request full text
Summary: Knowledge distillation is an effective approach for transferring knowledge from a large teacher network to a small student network, allowing the student to run on resource-limited mobile devices. Most previous works manually select paired intermediate layers of the teacher and student networks and align their features via dimension reduction. This kind of approach can suffer from information loss and insufficient layer-wise alignment, which limit knowledge transferability. In this paper, we propose a simple and effective knowledge distillation method named attentive similarity matching (ASM). ASM first concatenates the teacher's intermediate features with the student's to enhance the similarity representation of all the student's layers, without any dimension reduction, and then aligns all cross-layer enhanced similarities in an attentively weighted manner for semantic calibration. Image classification experiments on three popular datasets show the effectiveness of the proposed method compared with prior approaches.
ISSN: 2831-7475
DOI: 10.1109/ICPR56361.2022.9956410
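Based only on the summary above, the sketch below illustrates one plausible reading of the ASM idea: batch-wise similarity matrices are built from concatenated (student, teacher) features without dimension reduction, and the cross-layer discrepancies are combined with softmax attention weights. All names, the exact concatenation axis, and the attention form are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


def batch_similarity(feat: torch.Tensor) -> torch.Tensor:
    """Row-normalized batch Gram matrix (B x B) of a feature map (B, C, H, W)."""
    f = feat.flatten(1)                      # (B, C*H*W); no dimension reduction
    return F.normalize(f @ f.t(), p=2, dim=1)


def asm_loss(student_feats, teacher_feats, tau: float = 1.0) -> torch.Tensor:
    """Attentively weighted cross-layer alignment of similarity matrices (sketch).

    student_feats / teacher_feats: lists of intermediate feature maps computed on
    the same mini-batch. For every (student layer, teacher layer) pair, the
    flattened features are concatenated to form an enhanced similarity
    representation, which is matched against the teacher layer's own similarity;
    the per-pair losses are combined with softmax attention weights so that
    better-matching teacher layers contribute more.
    """
    total = 0.0
    for s_feat in student_feats:
        s_flat = s_feat.flatten(1)
        pair_losses = []
        for t_feat in teacher_feats:
            t_flat = t_feat.flatten(1)
            # "Enhanced" similarity from the concatenated student/teacher features.
            joint = torch.cat([s_flat, t_flat], dim=1)
            g_joint = F.normalize(joint @ joint.t(), p=2, dim=1)
            g_teacher = batch_similarity(t_feat)
            pair_losses.append(F.mse_loss(g_joint, g_teacher))
        pair_losses = torch.stack(pair_losses)           # (num_teacher_layers,)
        # Attention over teacher layers: smaller discrepancy -> larger weight.
        weights = F.softmax(-pair_losses.detach() / tau, dim=0)
        total = total + (weights * pair_losses).sum()
    return total
```

In a training loop, such a term would typically be added to the usual cross-entropy (and possibly logit-distillation) loss of the student; the temperature tau here is a hypothetical knob controlling how sharply the attention concentrates on the closest teacher layer.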