Loading…
Improving First-stage Retrieval of Point-of-interest Search by Pre-training Models
Point-of-interest (POI) search is important for location-based services, such as navigation and online ride-hailing service. The goal of POI search is to find the most relevant destinations from a large-scale POI database given a text query. To improve the effectiveness and efficiency of POI search,...
Saved in:
Published in: | ACM transactions on information systems 2023-12, Vol.42 (3), p.1-27, Article 74 |
---|---|
Main Authors: | , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites Items that cite this one |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Point-of-interest (POI) search is important for location-based services, such as navigation and online ride-hailing service. The goal of POI search is to find the most relevant destinations from a large-scale POI database given a text query. To improve the effectiveness and efficiency of POI search, most existing approaches are based on a multi-stage pipeline that consists of an efficiency-oriented retrieval stage and one or more effectiveness-oriented re-rank stages. In this article, we focus on the first efficiency-oriented retrieval stage of the POI search. We first identify the limitations of existing first-stage POI retrieval models in capturing the semantic-geography relationship and modeling the fine-grained geographical context information. Then, we propose a Geo-Enhanced Dense Retrieval framework for POI search to alleviate the above problems. Specifically, the proposed framework leverages the capacity of pre-trained language models (e.g., BERT) and designs a pre-training approach to better model the semantic match between the query prefix and POIs. With the POI collection, we first perform a token-level pre-training task based on a geographical-sensitive masked language prediction and design two retrieval-oriented pre-training tasks that link the address of each POI to its name and geo-location. With the user behavior logs collected from an online POI search system, we design two additional pre-training tasks based on users’ query reformulation behavior and the transitions between POIs. We also utilize a late-interaction network structure to model the fine-grained interactions between the text and geographical context information within an acceptable query latency. Extensive experiments on the real-world datasets collected from the Didichuxing application demonstrate that the proposed framework can achieve superior retrieval performance over existing first-stage POI retrieval methods. |
---|---|
ISSN: | 1046-8188 1558-2868 |
DOI: | 10.1145/3631937 |