
FMGNet: An efficient feature-multiplex group network for real-time vision task

Bibliographic Details
Published in: Pattern Recognition, December 2024, Vol. 156, Article 110698
Main Authors: Zhang, Hao, Ma, Yongqiang, Zhang, Kaipeng, Zheng, Nanning, Lai, Shenqi
Format: Article
Language: English
Description
Summary: Lightweight network design is crucial for balancing speed and accuracy in computer vision tasks on resource-limited mobile platforms. Widely adopted models such as EfficientNet and RegNet have achieved significant success by integrating key elements like Pointwise Convolutions (PWConvs) and Squeeze-and-Excitation (SE) blocks. However, the output features of a PWConv closely resemble its input, particularly when no activation function follows it. This similarity and redundancy waste computation and slow down inference. To address these issues, we propose an efficient lightweight network called the Efficient Feature-Multiplex Group Network (FMGNet). FMGNet is composed of two key components: the Cross-layer Feature-multiplex Group (CFG) block and the CFG-aligned Cross-layer Attention (CCA) block. The CFG block enables more compact feature learning with fewer parameters by multiplexing (reusing) the input features of the PWConv. The CCA block leverages the pre-modified features derived from the CFG block's PWConv, so that channel attention can be modeled simultaneously and in parallel with it. Extensive experiments across various tasks, including image classification (ImageNet), object detection (PASCAL VOC), human pose estimation (MPII), person re-identification (Market-1501, DukeMTMC-ReID, CUHK03-NP), and semantic segmentation (Cityscapes), indicate that FMGNet achieves performance comparable to state-of-the-art lightweight convolutional neural networks while offering faster inference. FMGNet even surpasses recent transformer-based models such as SwiftFormer and EfficientFormerV2, achieving better results with lower inference latency.

• We propose a novel CFG block that effectively exploits the feature redundancy between the input and output of a PWConv that has no accompanying activation function. We also introduce a more efficient attention mechanism, the CCA block, which is compatible with the CFG block.
• We build a more efficient lightweight network, FMGNet, based on the CFG and CCA blocks. FMGNet achieves performance comparable to state-of-the-art models in image classification and even outperforms transformer-based models with lower latency.
• Extensive experiments on a wide range of tasks, including object detection, human pose estimation, person re-identification, and semantic segmentation, also indicate that our network is an excellent backbone.
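The summary does not give layer-level details of the CFG or CCA blocks, so the following is only a hypothetical Python sketch of the ideas it states, not the authors' implementation. It assumes a pointwise convolution is a per-pixel linear map over channels (a 1x1 convolution with no activation), models "feature multiplexing" as concatenating the PWConv input with a narrower PWConv output, and stands in for CCA with a simplified SE-style gate (one linear layer plus sigmoid) computed from the block input so it can run in parallel with the PWConv. All function names (`pwconv`, `mean_cosine`, `channel_gate`, `multiplex_block`) and the concatenation design are illustrative assumptions.

```python
import math
import random


def pwconv(x, w):
    """Pointwise (1x1) convolution with no activation.

    x: feature map as a list of pixels, each a list of C_in channel values.
    w: weight matrix with C_out rows of C_in weights each.
    Returns a feature map with C_out channels per pixel.
    """
    return [[sum(wij * xj for wij, xj in zip(row, px)) for row in w] for px in x]


def mean_cosine(x, y):
    """Mean per-pixel cosine similarity between two maps with equal channel counts.
    One way to quantify how closely a PWConv output resembles its input."""
    def cos(a, b):
        dot = sum(p * q for p, q in zip(a, b))
        na = math.sqrt(sum(p * p for p in a))
        nb = math.sqrt(sum(q * q for q in b))
        return dot / (na * nb)
    return sum(cos(a, b) for a, b in zip(x, y)) / len(x)


def channel_gate(x, w_gate):
    """Simplified SE-style gate (hypothetical stand-in for CCA): global average
    pooling over pixels, one linear layer, then a sigmoid per output channel."""
    n = len(x)
    pooled = [sum(px[c] for px in x) / n for c in range(len(x[0]))]
    logits = [sum(wij * pj for wij, pj in zip(row, pooled)) for row in w_gate]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def multiplex_block(x, w_half, w_gate):
    """Hypothetical feature-multiplex block: the input channels are reused
    (concatenated) rather than recomputed, a narrower PWConv adds the new
    channels, and the gate depends only on x, so it can be computed in
    parallel with the PWConv before scaling the concatenated output."""
    y = pwconv(x, w_half)
    gate = channel_gate(x, w_gate)
    return [[g * v for g, v in zip(gate, px + py)] for px, py in zip(x, y)]


# Usage: 4 pixels, 8 input channels, 16 output channels.
random.seed(0)
c_in, c_out, pixels = 8, 16, 4
x = [[random.gauss(0, 1) for _ in range(c_in)] for _ in range(pixels)]
w_half = [[random.gauss(0, 0.3) for _ in range(c_in)] for _ in range(c_out - c_in)]
w_gate = [[random.gauss(0, 0.3) for _ in range(c_in)] for _ in range(c_out)]
out = multiplex_block(x, w_half, w_gate)
print(len(out), len(out[0]))  # 4 16
# PWConv weights: full 8->16 map vs. multiplexed 8->8 map
print(c_out * c_in, (c_out - c_in) * c_in)  # 128 64
```

Under these assumptions, reusing the input as half of the output channels halves the PWConv weight count (gate weights excluded) for the same output width, which is one plausible reading of "more compact feature learning with fewer parameters"; the paper's actual CFG grouping may differ.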
ISSN:0031-3203
DOI:10.1016/j.patcog.2024.110698