Mitigating carbon footprint for knowledge distillation based deep learning model compression

Bibliographic Details
Published in:	PloS one 2023-05, Vol.18 (5), p.e0285668-e0285668
Main Authors:	Rafat, Kazi; Islam, Sadia; Mahfug, Abdullah Al; Hossain, Md Ismail; Rahman, Fuad; Momen, Sifat; Rahman, Shafin; Mohammed, Nabeel
Format:	Article
Language:	English
Subjects:	Accuracy; Analysis; Artificial intelligence; Benchmarking; Biology and Life Sciences; Carbon Dioxide; Carbon dioxide emissions; Carbon Footprint; Classification; Climate change; Compression; Computer and Information Sciences; Costs; Datasets; Deep Learning; Distillation; Ecological footprint; Edge computing; Emissions; Energy consumption; Energy costs; Engineering and Technology; Environment models; Equivalence; Footprint analysis; Forecasts and trends; Image classification; Information management; Investigations; Knowledge; Life on Earth; Lightweight; Machine learning; Mobile computing; Model accuracy; Modelling; Object recognition; People and Places; Performance measurement; Physical Phenomena; Physical Sciences; Research and Analysis Methods; Social Sciences; Stochasticity; Teachers; Tuning; VOCs; Volatile organic compounds

Description
Summary:	Deep learning techniques have recently demonstrated remarkable success in numerous domains. The success of these models is typically measured by performance metrics such as accuracy and mean average precision (mAP). High performance is highly valued, but it frequently comes at the expense of substantial energy costs and carbon emissions during model building. Massive emission of CO2 has a deleterious impact on life on earth and is a serious ethical concern that is largely ignored in deep learning research. In this article, we focus on environmental costs and the means of mitigating the carbon footprint of deep learning models, particularly models created using knowledge distillation (KD). Deep learning models typically contain a large number of parameters, resulting in a 'heavy' model. A heavy model scores high on performance metrics but is incompatible with mobile and edge computing devices. Model compression techniques such as knowledge distillation enable the creation of lightweight, deployable models for these low-resource devices. KD generates lighter models that typically perform with slightly lower accuracy than the heavier teacher model (teacher accuracy on CIFAR 10, CIFAR 100, and Tiny ImageNet is 95.04%, 76.03%, and 63.39%; KD accuracy is 91.78%, 69.7%, and 60.49%). Although the distillation process makes models deployable on low-resource devices, it was found to consume an exorbitant amount of energy and have a substantial carbon footprint (15.8, 17.9, and 13.5 times more carbon than the corresponding teacher model). The enormous environmental cost is primarily attributable to tuning the temperature hyperparameter (τ). In this article, we propose measuring the environmental costs of deep learning work (in terms of GFLOPS in millions, energy consumption in kWh, and CO2 equivalent in grams).
In order to create lightweight models with low environmental costs, we propose a straightforward yet effective method for selecting a hyperparameter (τ) using a stochastic approach for each training batch fed into the models. We applied knowledge distillation (including its data-free variant) to problems involving image classification and object detection. To evaluate the robustness of our method, we ran experiments on various datasets (CIFAR 10, CIFAR 100, Tiny ImageNet, and PASCAL VOC) and models (ResNet18, MobileNetV2, Wrn
ISSN:	1932-6203
DOI:	10.1371/journal.pone.0285668
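
The summary describes choosing the temperature τ stochastically for each training batch, so that no separate tuning runs over τ are needed. The following is a minimal sketch of that idea, not the authors' implementation. The uniform sampling range, the helper names (`softmax`, `kd_loss`, `sample_tau`), and the toy logits are all assumptions made for illustration; the τ² scaling of the loss is a common convention in knowledge distillation and is not confirmed by this record.

```python
import math
import random


def softmax(logits, tau):
    """Temperature-scaled softmax; a larger tau gives a softer distribution."""
    scaled = [z / tau for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, tau):
    """KL(teacher || student) on softened outputs, scaled by tau**2.

    The tau**2 factor is a common convention, assumed here.
    """
    p = softmax(teacher_logits, tau)
    q = softmax(student_logits, tau)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return tau * tau * kl


def sample_tau(rng, low=1.0, high=20.0):
    """Hypothetical sampling rule: draw tau uniformly from [low, high]."""
    return rng.uniform(low, high)


# Toy "batches" of (student_logits, teacher_logits); purely illustrative.
batches = [
    ([2.0, 0.5, -1.0], [3.0, 0.2, -2.0]),
    ([0.1, 0.3, 0.2], [1.5, -0.5, 0.0]),
]

rng = random.Random(0)
losses = []
for student_logits, teacher_logits in batches:
    tau = sample_tau(rng)  # a fresh temperature for every batch
    losses.append(kd_loss(student_logits, teacher_logits, tau))
```

In a real training loop the per-batch loss would be backpropagated through the student network; the point of the sketch is only that τ is drawn anew for each batch and no separate tuning run per candidate τ is involved.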