
A Comparison Study: Building Identification in Aerial Imagery Using Multi-Task Learning

Bibliographic Details
Main Authors: Proscovia, N., Grobler, T. L.
Format: Conference Proceeding
Language: English
Description
Summary: The increasing availability of remote sensing images has made it possible to track the impact of human activities on their environments at a global scale. CNN-based encoder-decoder models achieve noteworthy results on semantic segmentation tasks but often struggle to identify border pixels correctly. To detect buildings in an image correctly, however, it is equally important to achieve highly accurate results on both the segmentation task and the border identification task. In this paper, we carry out experiments on four state-of-the-art architectures, namely FCN-8s, U-Net, DeepLabV3+ and the Pyramid Scene Parsing Network (PSPNet), and compare them quantitatively and qualitatively when they are used to perform building segmentation and boundary detection. We also estimate the impact on performance of performing these two tasks simultaneously. In particular, we perform multi-task learning by adding to the decoder networks a second prediction branch whose primary purpose is boundary prediction. We evaluate the performance of our models on the Inria aerial image labeling dataset. The qualitative and quantitative experimental results show that, on average, the multi-task models perform better than the baseline models. We also evaluate the models using the boundary error rate at various boundary thicknesses. The multi-task U-Net model performs best, achieving an accuracy of 95.91%. Moreover, the multi-task DeepLabV3+ model gives the best qualitative results, while the multi-task PSPNet model offers the best trade-off between accuracy and computational resources: it achieves an accuracy of 95.86% and requires fewer parameters than U-Net.
ISSN: 2153-7003
DOI: 10.1109/IGARSS52108.2023.10282906
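The summary reports accuracy, a boundary error rate measured at several boundary thicknesses, and a multi-task setup in which a second decoder branch predicts boundaries. It does not define these quantities, so the sketch below only illustrates one plausible reading, under stated assumptions. Accuracy is taken to be per-pixel accuracy. The boundary band of thickness t is every pixel within Chebyshev distance t of a change in the reference label. The boundary error rate is the share of misclassified pixels inside that band. The multi-task objective is a weighted sum of per-pixel binary cross-entropy for the segmentation and boundary branches. All function names and the `boundary_weight` parameter are hypothetical and are not taken from the paper. Plain Python lists stand in for image tensors so the example runs without a deep-learning framework.

```python
import math
from typing import List

# Hypothetical helpers (not from the paper): nested lists stand in for H x W label maps.
Grid = List[List[int]]  # 1 = building (or boundary) pixel, 0 = background


def boundary_band(truth: Grid, thickness: int) -> List[List[bool]]:
    """Mark pixels within `thickness` pixels (Chebyshev distance) of a label change in `truth`."""
    if thickness < 1:
        raise ValueError("thickness must be at least 1")
    h, w = len(truth), len(truth[0])
    band = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # In the band if any pixel in the (2t+1) x (2t+1) window carries a different label.
            band[y][x] = any(
                truth[ny][nx] != truth[y][x]
                for ny in range(max(0, y - thickness), min(h, y + thickness + 1))
                for nx in range(max(0, x - thickness), min(w, x + thickness + 1))
            )
    return band


def pixel_accuracy(pred: Grid, truth: Grid) -> float:
    """Fraction of all pixels whose predicted label matches the reference label."""
    total = sum(len(row) for row in truth)
    correct = sum(p == t for pr, tr in zip(pred, truth) for p, t in zip(pr, tr))
    return correct / total


def boundary_error_rate(pred: Grid, truth: Grid, thickness: int) -> float:
    """Fraction of misclassified pixels inside the boundary band of the given thickness."""
    band = boundary_band(truth, thickness)
    in_band = errors = 0
    for y, row in enumerate(band):
        for x, inside in enumerate(row):
            if inside:
                in_band += 1
                errors += pred[y][x] != truth[y][x]
    return errors / in_band if in_band else 0.0


def bce(probs: List[float], targets: List[int], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over flattened per-pixel probabilities."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(probs)


def multitask_loss(seg_p: List[float], seg_t: List[int],
                   bnd_p: List[float], bnd_t: List[int],
                   boundary_weight: float = 1.0) -> float:
    """Assumed objective: segmentation-branch loss plus weighted boundary-branch loss."""
    return bce(seg_p, seg_t) + boundary_weight * bce(bnd_p, bnd_t)


# Toy example: a 6 x 6 tile with a single 2 x 2 building in the centre.
truth = [[0] * 6 for _ in range(6)]
for y in (2, 3):
    for x in (2, 3):
        truth[y][x] = 1
pred = [row[:] for row in truth]
pred[2][2] = 0  # missed building pixel on the building's edge
pred[0][0] = 1  # false positive far from the building

print(pixel_accuracy(pred, truth))          # 34/36, about 0.944
print(boundary_error_rate(pred, truth, 1))  # 1 error in a 16-pixel band: 0.0625
print(boundary_error_rate(pred, truth, 2))  # band now covers all 36 pixels: 2/36, about 0.056
```

In the toy example, widening the band from 1 to 2 pixels pulls the distant false positive into the measurement. Because of this effect, a boundary error rate is only meaningful when reported together with the thickness at which it was computed.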