
Multi-modal fusion of satellite and street-view images for urban village classification based on a dual-branch deep neural network


Bibliographic Details
Published in: International journal of applied earth observation and geoinformation, 2022-05, Vol. 109, p. 102794, Article 102794
Main Authors: Chen, Boan; Feng, Quanlong; Niu, Bowen; Yan, Fengqin; Gao, Bingbo; Yang, Jianyu; Gong, Jianhua; Liu, Jiantao
Format: Article
Language: English
Description
Summary:
• A novel dual-branch deep neural network is proposed for urban village classification.
• Multi-modal satellite and street-view images are integrated to improve performance.
• The proposed model achieves an overall accuracy of 92.61% in the Jing-Jin-Ji region of China.
• The dataset, Satellite & Street-view images for Urban Village classification (S2UV), has been released publicly.

With rapid urbanization in China, numerous urban villages have appeared, surrounded by newly built urban blocks. Because of their high population density, poor hygiene, chaotic waste discharge, and inadequate public facilities, urban villages have many negative impacts on both the urban environment and urban management. This study proposes a dual-branch deep learning model that fuses multi-modal satellite and street-view data to detect urban villages in Beijing, Tianjin and Shijiazhuang, the core cities of the Jing-Jin-Ji region of China. The model consists of a satellite branch, a street-view branch and a gated-fusion module. For the satellite branch, a Trans-MDCNN (multi-scale dilated convolutional neural network) is proposed to learn multi-level local features and global contextual features from high-resolution satellite imagery; for the street-view branch, an MVRAN (multi-view recurrent attention network) is constructed to learn and fuse multi-angle features from street-view images. The gated-fusion module is designed to aggregate the important features of the two branches. Experimental results indicate that the proposed model performs well, with an overall accuracy (OA) of 92.61%. An ablation study shows that, compared with satellite data alone, integrating street-view images can increase the OA by about 2%. In addition, 1-D feature fusion outperforms its 2-D counterpart and the classic feature concatenation method. The proposed model also outperforms other deep learning models. Finally, the dataset of this study, S2UV (Satellite & Street-view images for Urban Village classification), is publicly available: https://doi.org/10.11922/sciencedb.01410.
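
The summary does not give the equations of the gated-fusion module, so the following is only a minimal sketch of one common form of gated fusion for two 1-D feature vectors, not the authors' implementation. It assumes a per-element sigmoid gate computed from the concatenated features; the names `gated_fusion`, `sat_feat`, `sv_feat`, `weights` and `bias` are hypothetical and chosen for illustration.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_fusion(sat_feat, sv_feat, weights, bias):
    """Fuse two equal-length 1-D feature vectors with a learned gate.

    Assumed (hypothetical) formulation, not taken from the paper:
        g     = sigmoid(W @ [sat_feat; sv_feat] + b)
        fused = g * sat_feat + (1 - g) * sv_feat
    `weights` has one row per output element, each row of length 2 * d.
    """
    if len(sat_feat) != len(sv_feat):
        raise ValueError("both branches must produce vectors of equal length")
    concat = list(sat_feat) + list(sv_feat)
    fused = []
    for i, (s, v) in enumerate(zip(sat_feat, sv_feat)):
        # Gate value in (0, 1) deciding how much of each branch to keep.
        z = sum(w * x for w, x in zip(weights[i], concat)) + bias[i]
        g = sigmoid(z)
        fused.append(g * s + (1.0 - g) * v)
    return fused


# With zero weights and bias the gate is 0.5, so the output is the plain
# average of the two branches; training would move the gate away from 0.5.
sat = [1.0, 2.0]   # stand-in for satellite-branch features
sv = [3.0, 6.0]    # stand-in for street-view-branch features
zero_w = [[0.0] * 4 for _ in range(2)]
print(gated_fusion(sat, sv, zero_w, [0.0, 0.0]))
```

In contrast to plain concatenation, a gate of this kind lets the network weight each branch element by element, which is one plausible reading of "aggregate the important features" in the summary.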
ISSN: 1569-8432, 1872-826X
DOI: 10.1016/j.jag.2022.102794