Loading…
An enhanced SSD with feature fusion and visual reasoning for object detection
Single Shot Multibox Detector (SSD) is one of the top performing object detection algorithms in terms of both accuracy and speed. SSD achieves impressive performance on various datasets by using different output layers for object detection. However, each layer in the feature pyramid is used independ...
Saved in:
Published in: | Neural computing & applications 2019-10, Vol.31 (10), p.6549-6558 |
---|---|
Main Authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites Items that cite this one |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Single Shot Multibox Detector (SSD) is one of the top performing object detection algorithms in terms of both accuracy and speed. SSD achieves impressive performance on various datasets by using different output layers for object detection. However, each layer in the feature pyramid is used independently, and SSD considers only the fine-grained details of the objects but ignores the context surrounding objects. In this paper, we proposed an enhanced SSD, called ESSD, that improved the performance of the conventional SSD by fusing feature maps of different output layers, instead of growing layers close to the input data. Our method used two-way transfer of feature information and feature fusion to enhance the network. To assist further with object detection, we proposed a visual reasoning method that utilized fully the relationships between objects instead of using only the features of the objects themselves. This addition of visual reasoning proved very effective for detecting objects that are too small or have small features. To evaluate the proposed ESSD, we trained the model with VOC2007 and VOC2012 training sets and evaluated the performance on the Pascal VOC2007 test set. For
300
×
300
input, ESSD achieved 79.2% mean average precision (mAP) at 52.0 frames per second (FPS), and for
512
×
512
input, this approach achieved 82.4% mAP at 18.6 FPS. These results demonstrated that our proposed method can achieve state-of-the-art mAP, which is a better result than provided by the conventional SSD and other advanced detectors. |
---|---|
ISSN: | 0941-0643 1433-3058 |
DOI: | 10.1007/s00521-018-3486-1 |