DDDG: A dual bi-directional knowledge distillation method with generative self-supervised pre-training and its hardware implementation on SoC for ECG
Published in: Expert Systems with Applications, 2024-06, Vol. 244, p. 122969, Article 122969
Main Authors:
Format: Article
Language: English
Summary: Growth in computing power and data volume has boosted the development of deep learning. However, limited computational resources and the high cost of data labeling are two main obstacles to deploying algorithms in many applications. Therefore, a novel method named Dual Distillation Double Gains (DDDG) is proposed: a dual bi-directional knowledge distillation (KD) method with generative self-supervised pre-training. In a self-supervised manner, models are pre-trained with unlabeled data. KD transfers knowledge from a large model to a lightweight one that is better suited to deployment on portable/mobile devices. Based on the teacher–student structure, a reconstructing teacher and a classifying teacher are pre-trained in advance. The reconstructing teacher distills feature-based knowledge to the student during the pretext task. The second distillation occurs during fine-tuning, where the classifying teacher mentors the student with response-based knowledge. Both distillations are bi-directional and therefore also reinforce the teacher models in reverse. In experiments, the F1 score of the student network on two datasets is improved by 8.69% and 9.26%, respectively; the corresponding improvements for the teacher are 4.82% and 8.33%. In addition, DDDG outperforms other state-of-the-art algorithms by 5.25% and 2.06% in F1. For practical applications, DDDG is deployed to a system-on-a-chip (SoC) in a heterogeneous manner. Using the ARM cores and the FPGA fabric together, the designed system accelerates DDDG by 4.09 times compared with a pure software deployment on the same SoC. Such efficient model deployment on heterogeneous systems is promising for practical applications.
Highlights:
• Knowledge distillation and generative self-supervised learning are incorporated.
• Dual distillations take place in both the pre-training and the fine-tuning stage.
• Bi-directional knowledge distillation enhances the teacher models in reverse.
• Ultra-lightweight, well-performing student models are obtained.
• Models are deployed heterogeneously on resource-limited devices for real-time inference.
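To make the distillation scheme described in the summary more concrete, below is a minimal sketch of the two loss terms it mentions: a feature-based term for the reconstruction pretext task and a response-based term for fine-tuning, each applied in both directions. It assumes a PyTorch setting; the function names, the temperature value, and the specific choices of MSE and symmetric KL divergence are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch of bi-directional distillation losses (not the paper's exact formulation).
import torch
import torch.nn.functional as F

def feature_kd_loss(teacher_feat: torch.Tensor, student_feat: torch.Tensor) -> torch.Tensor:
    """Feature-based distillation during generative pre-training.

    A single MSE term between intermediate representations; because gradients
    flow into both networks, the teacher is also refined "in reverse".
    """
    return F.mse_loss(student_feat, teacher_feat)

def response_kd_loss(teacher_logits: torch.Tensor,
                     student_logits: torch.Tensor,
                     temperature: float = 4.0) -> torch.Tensor:
    """Response-based distillation during fine-tuning.

    Symmetric KL divergence between temperature-softened class distributions,
    so knowledge is transferred in both directions (teacher -> student and
    student -> teacher).
    """
    log_p_teacher = F.log_softmax(teacher_logits / temperature, dim=-1)
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    kl_ts = F.kl_div(log_p_student, log_p_teacher, log_target=True, reduction="batchmean")
    kl_st = F.kl_div(log_p_teacher, log_p_student, log_target=True, reduction="batchmean")
    return (kl_ts + kl_st) * (temperature ** 2)
```

In a full training loop these terms would be added to the respective task losses (reconstruction during pre-training, classification during fine-tuning), with separate optimizers updating both networks so that the teachers also benefit from the student, as the summary describes.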
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2023.122969