
Frustratingly Easy Knowledge Distillation via Attentive Similarity Matching


Bibliographic Details
Main Authors: Chen, Dingyao, Tan, Huibin, Lan, Long, Zhang, Xiang, Liang, Tianyi, Luo, Zhigang
Format: Conference Proceeding
Language: English
Subjects:
Online Access: Request full text
Description
Summary: Knowledge distillation is an effective approach to transferring knowledge from a large teacher network to its small proxy student, thereby letting the student run on resource-limited mobile devices. Most previous works manually select paired intermediate layers of the teacher and student networks and align their pertinent features via dimension reduction. This sort of approach may suffer from information loss and insufficient layer-wise alignment, which limit knowledge transferability. In this paper, we propose a simple and effective knowledge distillation method named attentive similarity matching (ASM). ASM first concatenates the teacher's intermediate features with the student's, without dimension reduction, to enhance the similarity representation of all the student's layers, and then aligns all cross-layer enhanced similarities in an attentively weighted manner for semantic calibration. Experiments on image classification over three popular datasets show the effectiveness of the proposed method compared with prior approaches.
ISSN: 2831-7475
DOI: 10.1109/ICPR56361.2022.9956410
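
The abstract describes the ASM pipeline only at a high level. The following is a minimal, hypothetical PyTorch sketch of that idea: batch-level similarity (Gram) matrices are computed from concatenated student-teacher features and matched against the teacher's similarities across all layer pairs, with attention weights over the pairwise discrepancies. The function names, the per-sample concatenation, the MSE similarity loss, and the softmax-over-negative-losses weighting are all assumptions made for illustration, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def batch_similarity(feat):
    # Flatten each sample, L2-normalize, and build a B x B cosine-similarity (Gram) matrix.
    f = F.normalize(feat.flatten(start_dim=1), dim=1)
    return f @ f.t()

def asm_loss_sketch(teacher_feats, student_feats, temperature=1.0):
    # teacher_feats / student_feats: lists of intermediate feature tensors,
    # each shaped (B, C, H, W) or (B, D), all computed from the same batch of B samples.
    per_student_losses = []
    for s in student_feats:
        layer_losses = []
        for t in teacher_feats:
            # Concatenate student and teacher features per sample (no dimension
            # reduction), then compare batch-level similarities with the teacher's.
            enhanced = torch.cat([s.flatten(start_dim=1), t.flatten(start_dim=1)], dim=1)
            layer_losses.append(F.mse_loss(batch_similarity(enhanced), batch_similarity(t)))
        layer_losses = torch.stack(layer_losses)
        # Attentive weighting across teacher layers: smaller discrepancy -> larger weight.
        weights = F.softmax(-layer_losses.detach() / temperature, dim=0)
        per_student_losses.append((weights * layer_losses).sum())
    return torch.stack(per_student_losses).mean()

In training, a term of this form would typically be added to the student's usual cross-entropy (and/or logit-distillation) loss with a balancing coefficient; the exact weighting scheme and loss combination used in the paper may differ.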