Learning Visual Representation Clusters for Cross-View Geo-Location
Published in: | IEEE Geoscience and Remote Sensing Letters, 2023, Vol. 20, p. 1-5 |
---|---|
Main Authors: | , , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Cross-view geo-location is a crucial research field that determines geographic location from images taken from different viewpoints. It is often studied as a retrieval task, in which the query images have unknown locations and the database contains geo-tagged images from a different platform. Learning image representations with neural networks is an important step, and one typical training method uses a classification loss in which cross-view images of the same location are treated as the same category. However, existing methods focus only on pushing apart the representations of different categories and ignore the intracategory distances between samples from different platforms. Because controlling the intracategory distance can guide the model to extract compact, category-sharing representations from cross-view images, we propose a categorized cluster loss that learns separate and compact representation clusters. The categorized cluster loss supervises the network to learn invariant information from samples of different platforms by constraining both the intercategory and intracategory feature distances. We also design a category-view-stratified sampling strategy, which draws inputs that are balanced in terms of both category and view in each batch during training. We implemented our approach with a lightweight OSNet-based network and, on a typical and challenging cross-view geo-location dataset, achieved higher accuracy with fewer parameters than most state-of-the-art (SOTA) methods. |
---|---|
ISSN: | 1545-598X 1558-0571 |
DOI: | 10.1109/LGRS.2023.3326005 |
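
The summary describes category-view-stratified sampling only at a high level: each batch is balanced in both category and view. The following is a minimal sketch of one way such a sampler could look, not the authors' implementation. The function name `stratified_batches`, the parameters `categories_per_batch` and `per_view`, the view names, and the choice to sample with replacement when a view has too few images are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_batches(samples, categories_per_batch, per_view, seed=0):
    """Group (sample_id, category, view) triples into balanced batches.

    Hypothetical helper: every batch holds `categories_per_batch` categories
    and, for each of them, exactly `per_view` samples from every view.
    """
    rng = random.Random(seed)

    # Index the sample ids by category, then by view.
    by_cat_view = defaultdict(lambda: defaultdict(list))
    for sample_id, category, view in samples:
        by_cat_view[category][view].append(sample_id)

    categories = sorted(by_cat_view)
    rng.shuffle(categories)

    batches = []
    # Leftover categories that cannot fill a whole batch are dropped.
    last_start = len(categories) - categories_per_batch
    for start in range(0, last_start + 1, categories_per_batch):
        batch = []
        for category in categories[start:start + categories_per_batch]:
            for view in sorted(by_cat_view[category]):
                pool = by_cat_view[category][view]
                if len(pool) >= per_view:
                    picks = rng.sample(pool, per_view)
                else:
                    # Assumption: repeat images when a view is too small.
                    picks = rng.choices(pool, k=per_view)
                batch.extend((sid, category, view) for sid in picks)
        batches.append(batch)
    return batches


# Toy data: 6 locations, 3 drone images and 1 satellite image each.
views = {"drone": 3, "satellite": 1}
samples = [
    (f"loc{c}-{v}-{i}", c, v)
    for c in range(6)
    for v, n in views.items()
    for i in range(n)
]
batches = stratified_batches(samples, categories_per_batch=2, per_view=2)
```

With this toy data, each batch contains two locations and two images per view for each location, so a loss that constrains intracategory distances always sees both platforms for every category in the batch.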