
Learning Visual Representation Clusters for Cross-View Geo-Location

Bibliographic Details
Published in: IEEE Geoscience and Remote Sensing Letters, 2023, Vol. 20, p. 1-5
Main Authors: Song, Haoshuai; Wang, Zhen; Lei, Yi; Shi, Dianxi; Tong, Xiaochong; Lei, Yaxian; Qiu, Chunping
Format: Article
Language: English
Description
Summary: Cross-view geo-location is a crucial research field that determines geographic location from images taken from different viewpoints. It is often studied as a retrieval task in which the query images have unknown locations and the database contains geo-tagged images from a different platform. Learning image representations with neural networks is an important step, and one typical training method uses a classification loss in which cross-view images of the same location are treated as the same category. However, existing methods focus only on pushing apart the representations of different categories while ignoring the intracategory representation distances between samples from different platforms. Because controlling the intracategory distance can guide the model to extract compact, category-sharing representations from cross-view images, we propose a categorized cluster loss to learn separate and compact representation clusters. The categorized cluster loss supervises the network to learn invariant information from samples of different platforms by constraining both the intercategory and intracategory feature distances. We also design a category-view-stratified sampling strategy, which samples inputs that are balanced in terms of both category and view in each batch during learning. We implemented our approach with a lightweight OSNet-based network and achieved higher accuracy with fewer parameters than most state-of-the-art (SOTA) methods on a typical and challenging cross-view geo-location dataset.
ISSN: 1545-598X, 1558-0571
DOI: 10.1109/LGRS.2023.3326005
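
Illustrative sketch: the summary names two ideas, a batch sampler balanced over both category and view, and a loss that constrains intracategory and intercategory feature distances. The record does not give the formulas, so the Python below is only a rough, generic illustration under stated assumptions and not the authors' implementation. The function names `stratified_batch` and `cluster_loss`, the `(item, category, view)` sample layout, the `margin` parameter, and the centroid-based form of the loss are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_batch(samples, categories_per_batch, per_view, rng=random):
    """Draw one batch balanced over both category and view.

    samples: iterable of (item, category, view) triples (assumed layout).
    Each chosen category contributes `per_view` triples for every view
    that the category actually has in `samples`.
    """
    by_cat_view = defaultdict(lambda: defaultdict(list))
    for item, category, view in samples:
        by_cat_view[category][view].append((item, category, view))

    batch = []
    for category in rng.sample(sorted(by_cat_view), categories_per_batch):
        for view in sorted(by_cat_view[category]):
            pool = by_cat_view[category][view]
            # Sample with replacement so a sparse view can still fill its quota.
            batch.extend(rng.choices(pool, k=per_view))
    return batch


def cluster_loss(features, labels, margin=1.0):
    """Generic centroid-based stand-in, NOT the paper's formula.

    intra: mean squared distance of each feature to its category centroid.
    inter: mean hinge penalty on centroid pairs closer than `margin`.
    """
    groups = defaultdict(list)
    for feat, label in zip(features, labels):
        groups[label].append(feat)
    centroids = {
        label: [sum(col) / len(feats) for col in zip(*feats)]
        for label, feats in groups.items()
    }

    def sqdist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    intra = sum(
        sqdist(feat, centroids[label]) for feat, label in zip(features, labels)
    ) / len(features)

    keys = sorted(centroids)
    pairs = [(a, b) for i, a in enumerate(keys) for b in keys[i + 1:]]
    inter = sum(
        max(0.0, margin - sqdist(centroids[a], centroids[b]) ** 0.5)
        for a, b in pairs
    )
    inter = inter / len(pairs) if pairs else 0.0
    return intra + inter
```

In a training loop one might draw a batch with `stratified_batch`, embed the items with the network, and pass the embeddings and category labels to a differentiable version of such a loss; the pure-Python version here only shows the arithmetic.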