
Multi-modal Data Analysis and Fusion for Robust Object Detection in 2D/3D Sensing

Bibliographic Details
Main Authors: Schierl, Jonathan; Graehling, Quinn; Aspiras, Theus; Asari, Vijay; Van Rynbach, Andre; Rabb, Dave
Format: Conference Proceeding
Language: English
Description
Summary: Multi-modal data is useful for complex imaging scenarios because each modality contributes information the others do not provide, yet meaningful comparisons of different modalities for object detection are lacking. In our work, we propose three contributions: (1) release of a multi-modal, ground-based small object detection dataset; (2) a performance comparison of 2D and 3D imaging modalities using state-of-the-art algorithms; and (3) a multi-modal fusion framework for 2D/3D sensing. The new dataset encompasses various small objects for detection in EO, IR, and LiDAR modalities, and the labeled data has comparable resolutions across modalities to allow fair performance analysis. The modality comparison uses advanced deep learning algorithms, Mask R-CNN for 2D imaging and PointNet++ for 3D imaging, configured with similar parameter counts; the results are analyzed for specific instances where each modality performs best. To exploit the complementary strengths of the modalities, we developed a fusion strategy that combines detection networks operating on different modalities into a single detection output for accurate object detection and region segmentation. The fusion strategy uses the networks listed above as backbones to obtain a confidence score from each modality, and the framework then selects the modality on which to base the object detection according to those confidences. The proposed fusion method is evaluated on the multi-modal dataset for object detection and segmentation, and it shows superior performance compared to single-modality algorithms.
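
The abstract does not spell out how the confidence-based selection is implemented. As a minimal Python sketch, assuming hypothetical Detection records produced by the Mask R-CNN and PointNet++ backbones and a simplified per-class matching (the paper's actual matching scheme is not given in the abstract), the winner-take-all fusion could look like this:

# Sketch of confidence-based modality selection (late fusion).
# "Detection" and "fuse_by_confidence" are illustrative assumptions,
# not the authors' implementation.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Detection:
    label: str          # object class predicted by a backbone network
    confidence: float   # backbone's confidence score for this detection
    region: object      # 2D mask (Mask R-CNN) or 3D point subset (PointNet++)
    modality: str       # "EO", "IR", or "LiDAR"

def fuse_by_confidence(per_modality: Dict[str, List[Detection]]) -> List[Detection]:
    """For each object class, keep the single detection whose backbone
    reported the highest confidence across all modalities."""
    best: Dict[str, Detection] = {}
    for detections in per_modality.values():
        for det in detections:
            current = best.get(det.label)
            if current is None or det.confidence > current.confidence:
                best[det.label] = det
    return list(best.values())

# Example with hypothetical backbone outputs: the EO detection wins (0.91 > 0.78).
fused = fuse_by_confidence({
    "EO": [Detection("car", 0.91, region=None, modality="EO")],
    "LiDAR": [Detection("car", 0.78, region=None, modality="LiDAR")],
})

This hard, winner-take-all choice mirrors the abstract's description of basing the detection on whichever modality is most confident; a soft variant that weights regions by confidence would also be possible, but it is not what the abstract describes.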
ISSN: 2332-5615
DOI: 10.1109/AIPR50011.2020.9425039