Accelerating Point Clouds Classification in Dynamic Graph CNN with GPU Tensor Core
Main Authors: | , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
Summary: | Point clouds play a crucial role in fields such as robotics, 3D modeling, and autonomous driving. DGCNN, a representative model in this domain, outperforms classic models such as PointNet++. Such models are computationally demanding and encounter notable performance bottlenecks, so accelerating them has become an important problem. However, because point cloud tasks differ inherently from conventional convolution tasks, algorithms and architectural features designed for convolution, such as Tensor Core (TC), are frequently ill-suited to point clouds. Some studies have reduced DGCNN model complexity to improve computational efficiency, but research specifically tailored to TC acceleration remains limited. We therefore present an encapsulation module that bridges the model's Torch-level computation with components of the CUDA backend, introducing TC into DGCNN. The module dynamically adjusts parameters and computation flow based on the input data and transforms batch computations into streaming computations. Additionally, we propose a new computational organization that restructures two critical computation steps in DGCNN: the Get Graph Feature operation and the Conv operation. This restructuring significantly speeds up both operations while reducing the memory redundancy of the intermediate tensor data. In our experiments, the modifications achieve an average speedup of 1.80X for the Get Graph Feature operation and 1.46X for the Conv operation without compromising result accuracy. |
---|---|
ISSN: | 2690-5965 |
DOI: | 10.1109/ICPADS60453.2023.00240 |
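
For readers unfamiliar with the Get Graph Feature operation named in the summary, the following is a minimal, dependency-free sketch of the edge-feature construction commonly used in DGCNN: for each point, find its k nearest neighbors and concatenate the neighbor offset with the center point's own feature. It is not the paper's Tensor Core implementation. The function names, the brute-force neighbor search, and the sample points are illustrative assumptions.

```python
from math import dist


def knn_indices(points, k):
    """For each point, return the indices of its k nearest points.

    Brute-force search; the point itself is included because its
    distance to itself is zero.
    """
    result = []
    for p in points:
        order = sorted(range(len(points)), key=lambda j: dist(p, points[j]))
        result.append(order[:k])
    return result


def get_graph_feature(points, k):
    """Build edge features [x_j - x_i, x_i] for each point i and neighbor j.

    Returns a nested list of shape (N, k, 2 * C), where C is the
    number of coordinates per point.
    """
    neighbors = knn_indices(points, k)
    features = []
    for i, idx in enumerate(neighbors):
        center = points[i]
        rows = []
        for j in idx:
            # Offset from the center point to its neighbor.
            diff = [a - b for a, b in zip(points[j], center)]
            # The center feature is copied into every one of the k rows.
            rows.append(diff + list(center))
        features.append(rows)
    return features


# Hypothetical sample input: four 3D points.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (5.0, 5.0, 5.0)]
feats = get_graph_feature(points, k=2)
```

In this sketch each center feature is duplicated k times in the output, which is one plausible reading of the intermediate memory redundancy the summary mentions. The record itself does not say how the authors' restructuring avoids it.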