Case Study: Optimization Methods with TVM Hybrid-OP on RISC-V Packed SIMD

Bibliographic Details
Published in: IEEE Access, 2024-01, Vol. 12, p. 1-1
Main Authors: Yu, Meng-Shiun, Yuan, Chuan-Yue, Chen, Tai-Liang, Lee, Jenq-Kuen
Format: Article
Language:English
Summary: In recent years, considerable research has focused on the use of custom hardware to accelerate deep learning on edge devices. However, the end-to-end flow of deep learning includes preprocessing and postprocessing. Deep learning hardware accelerators cannot accelerate these operations, which consequently become a performance bottleneck in the execution flow. In this study, we propose optimization methods to improve preprocessing and postprocessing on edge devices. For this purpose, we adopt the Tensor Virtual Machine (TVM), an end-to-end machine learning compiler framework. TVM provides hybrid script, a front-end language that allows users to write programs for preprocessing and postprocessing. We propose rewriting strategies that improve the performance of operators written in hybrid script through the RISC-V Packed SIMD extension (P extension). RISC-V is an open instruction set architecture (ISA) that provides base instructions and many extensions for different use cases. The P extension defines subword single-instruction multiple-data (SIMD) instructions that allow complex computations to be performed efficiently on edge devices. In this study, we design custom instructions based on the RISC-V P extension for our rewriting strategies to accelerate deep learning operations. Experimental results indicate that our methods improve performance by a factor of 1.28 to 15.29.
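The subword SIMD idea behind the P extension can be illustrated with a minimal emulation: one 32-bit register holds two independent 16-bit lanes, and a single packed instruction (such as the P extension's 16-bit packed add) operates on both lanes at once. The sketch below models this in plain Python; `pack16` and `add16` are hypothetical helper names for illustration, not real P-extension intrinsics or TVM APIs.

```python
def pack16(hi: int, lo: int) -> int:
    """Pack two 16-bit values into one 32-bit word (high lane, low lane)."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)

def add16(a: int, b: int) -> int:
    """Emulate a packed 16-bit add: each lane is summed independently,
    wrapping at 16 bits, with no carry between lanes."""
    lo = (a + b) & 0xFFFF                  # low lane
    hi = ((a >> 16) + (b >> 16)) & 0xFFFF  # high lane
    return (hi << 16) | lo

# Two pairs of 16-bit pixel values summed by one packed operation:
x = pack16(100, 200)
y = pack16(7, 3)
r = add16(x, y)
assert r == pack16(107, 203)
```

A scalar loop over 16-bit data would need one add per element; the packed form halves the instruction count on 32-bit registers, which is the source of the speedups the paper reports for preprocessing and postprocessing operators.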
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2024.3397195