EMPViT: Efficient multi-path vision transformer for security risks detection in power distribution network

Bibliographic Details
Published in: Neurocomputing (Amsterdam) 2025-02, Vol.617, p.128967, Article 128967
Main Authors: Li, Pan; Yuan, Xiaofang; Xu, Haozhi; Wang, Jinlei; Wang, Yaonan
Format: Article
Language: English
Description
Summary: To maintain the safe operation of power distribution network (PDN) equipment, it is important to identify security risks accurately and promptly. However, conventional drone-based object detection methods are hampered by noise and by similar features among risk targets, as well as by the limited computing resources of unmanned aerial vehicles (UAVs). To address these challenges, an efficient embedding-based multi-path fusion architecture is proposed. The architecture uses a re-parameterized depthwise block to embed local context information at different scales, enhancing the extraction of tiny features while preserving inference speed. In addition, a coordinated self-attention module is proposed to reduce computational complexity while maintaining the ability to capture global information. By fusing fine and coarse feature representations at low computational cost, this module learns efficiently from both local and global image features. The goal is an efficient multi-path vision transformer (EMPViT) architecture that balances accuracy and efficiency. EMPViT has been evaluated on two drone image datasets, where it outperforms the compared architectures. Specifically, on the Drone-PDN dataset, EMPViT-S improves detection mAP by 1.2% and speeds up inference by a factor of 1.24 on average. On the VisDrone-DET2019 dataset it achieves a comparable gain: a 1.3% improvement in detection performance and a 1.2-fold speedup on average.
• To enhance the extraction of small-scale features while maintaining inference speed, we employ reparameterization to separate token and channel mixers, which integrate re-parameterized depth blocks into multi-scale structures.
• To reduce computational complexity, we have introduced a coordinated self-attention module that utilizes element-wise multiplication for self-attention calculations, instead of dot-product self-attention with high complexity.
• The EMPViT effectively combines local and global features without requiring extensive computation, enabling it to learn fine and coarse features of images and strike a more optimal balance between accuracy and efficiency.
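The record does not describe the internals of the re-parameterized depthwise block, so the following is only a generic illustration of structural re-parameterization as it is commonly done for convolutional branches, not the authors' implementation. It assumes a hypothetical training-time block with three parallel depthwise branches (3x3, 1x1, and identity) and no normalization layers, and shows that they can be folded into a single 3x3 depthwise kernel for inference. All function and variable names are invented for this sketch.

```python
import random

def depthwise_conv2d(x, kernels):
    """Stride-1 depthwise cross-correlation with zero 'same' padding.
    x: list of C channels, each a list of H rows of W floats.
    kernels: list of C square k x k kernels (k odd), one per channel."""
    out = []
    for chan, ker in zip(x, kernels):
        h, w, k = len(chan), len(chan[0]), len(ker)
        p = k // 2
        res = [[0.0] * w for _ in range(h)]
        for r in range(h):
            for c in range(w):
                acc = 0.0
                for i in range(k):
                    for j in range(k):
                        rr, cc = r + i - p, c + j - p
                        if 0 <= rr < h and 0 <= cc < w:  # zero padding outside
                            acc += ker[i][j] * chan[rr][cc]
                res[r][c] = acc
        out.append(res)
    return out

def fuse_branches(k3, k1):
    """Fold a 1x1 depthwise branch and an identity shortcut into the
    center tap of each 3x3 depthwise kernel."""
    fused = [[row[:] for row in ker] for ker in k3]
    for ker, one in zip(fused, k1):
        ker[1][1] += one[0][0] + 1.0  # 1x1 weight plus identity
    return fused

def max_abs_diff(a, b):
    return max(abs(u - v)
               for ca, cb in zip(a, b)
               for ra, rb in zip(ca, cb)
               for u, v in zip(ra, rb))

rng = random.Random(0)
C, H, W = 3, 6, 6
x = [[[rng.gauss(0, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
k3 = [[[rng.gauss(0, 1) for _ in range(3)] for _ in range(3)] for _ in range(C)]
k1 = [[[rng.gauss(0, 1)]] for _ in range(C)]

# Training-time form: three parallel branches summed element-wise.
y3 = depthwise_conv2d(x, k3)
y1 = depthwise_conv2d(x, k1)
multi_branch = [[[a + b + c for a, b, c in zip(r3, r1, rx)]
                 for r3, r1, rx in zip(c3, c1, cx)]
                for c3, c1, cx in zip(y3, y1, x)]

# Inference-time form: one depthwise convolution with the fused kernel.
single_branch = depthwise_conv2d(x, fuse_branches(k3, k1))
diff = max_abs_diff(multi_branch, single_branch)
print(diff < 1e-9)
```

Because convolution is linear, the fused kernel reproduces the multi-branch output up to floating-point rounding, which is why this kind of block can add capacity during training without slowing inference. Real blocks usually include batch normalization in each branch, which would also have to be folded into the kernel and bias before fusing.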
ISSN: 0925-2312
DOI: 10.1016/j.neucom.2024.128967