
Parallel grid-based density peak clustering of big trajectory data

Bibliographic Details
Published in: Applied Intelligence (Dordrecht, Netherlands), 2022-12, Vol. 52 (15), p. 17042-17057
Main Authors: Niu, Xinzheng; Zheng, Yunhong; Fournier-Viger, Philippe; Wang, Bing
Format: Article
Language: English
Description
Summary: With the widespread adoption of data-intensive applications such as navigation systems for mobile devices and unmanned vehicles, analyzing trajectory data has become a key research area. One of the main tasks is trajectory clustering, which consists of automatically grouping similar trajectories into clusters. Density Peak Clustering (DPC) is widely used for this task because it is fast and requires few manually set parameters. However, its performance does not scale well to large datasets. To address this issue, this paper proposes an efficient parallel trajectory clustering algorithm named Tra-PDPC (Trajectory-Parallel DPC). It proceeds in three steps: trajectory division and partition, trajectory similarity calculation, and clustering. All three steps are designed to run in a distributed fashion using the Spark programming model. For the first step, a scheme is proposed that divides trajectories into sub-trajectories based on local grid area density. Then, a combined similarity measure based on Euclidean space and grid space is defined for computing sub-trajectory similarity. Finally, a variant of DPC is applied, which dramatically improves clustering speed. Experiments on multiple large, realistic trajectory datasets demonstrate that the proposed Tra-PDPC algorithm considerably decreases runtime while maintaining high accuracy.
ISSN: 0924-669X, 1573-7497
DOI: 10.1007/s10489-021-02757-w
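
For readers unfamiliar with the baseline mentioned in the summary, the following is a minimal sketch of plain, single-machine Density Peak Clustering on 2-D points. It is not the Tra-PDPC algorithm: it has no grid partitioning, no sub-trajectory similarity measure, and no Spark parallelism. The function name `density_peak_clustering` and the parameters `d_c` (cutoff distance) and `n_clusters` are assumptions chosen for illustration. The all-pairs distance matrix it builds hints at why the unmodified method scales poorly with dataset size.

```python
import math


def density_peak_clustering(points, d_c, n_clusters):
    """Plain DPC on a list of 2-D points; returns one cluster label per point.

    Illustrative sketch only; names and parameters are hypothetical.
    """
    n = len(points)
    # All-pairs Euclidean distances: quadratic in the number of points.
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density rho: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    # Visit points by decreasing density (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    # delta: distance to the nearest point that comes earlier in `order`
    # (i.e. has higher density); parent remembers which point that is.
    delta = [0.0] * n
    parent = [-1] * n
    delta[order[0]] = max(dist[order[0]])  # convention for the densest point
    for rank in range(1, n):
        i = order[rank]
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j

    # Centers: the densest point plus the points with the largest rho * delta.
    rest = sorted(order[1:], key=lambda i: rho[i] * delta[i], reverse=True)
    centers = [order[0]] + rest[:n_clusters - 1]

    labels = [-1] * n
    for label, i in enumerate(centers):
        labels[i] = label
    # Every remaining point inherits the label of its higher-density parent,
    # which has already been labeled because it appears earlier in `order`.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels


# Usage: two well-separated groups of five points each.
blob_a = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
blob_b = [(10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
labels = density_peak_clustering(blob_a + blob_b, d_c=1.5, n_clusters=2)
```

In this toy input, each group should receive a single label distinct from the other group's. A real trajectory workload would first need a trajectory-to-trajectory similarity measure in place of the point distance used here.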