Dense or Sparse: Elastic SPMM Implementation for Optimal Big-Data Processing
Published in: IEEE Transactions on Big Data, 2023-04, Vol. 9 (2), p. 637-652
Main Authors:
Format: Article
Language: English
Summary: Many real-world graph datasets can be represented in a sparse matrix format, and they are widely used for various big-data applications. The multiplication of two sparse matrices (SPMM) is a major kernel in various machine learning algorithms that operate on sparsely represented datasets. Apache Spark, a general-purpose big-data processing engine, includes the SPMM operation in its linear algebra package. The default Spark SPMM implementation, however, always converts the right sparse matrix to a dense format before performing the multiplication, which can incur significant performance overhead in diverse SPMM scenarios. To address this limitation, we describe an SPMM implementation that keeps the right matrix in Compressed Sparse Column (CSC) format and propose an SPMM task latency prediction model based on a Deep Neural Network (DNN) architecture. Using the latency prediction model, we implement an elastic SPMM implementation recommendation service, which we name DoS (Dense or Sparse). DoS recommends the optimal SPMM implementation: either transforming the right matrix to a dense format or keeping it sparse during the multiplication. Evaluation of the proposed system on a real-world graph reveals that the service can improve the SPMM latency of the default Spark implementation by 2.2 times while shortening the overall execution time.
ISSN: 2332-7790
DOI: 10.1109/TBDATA.2022.3199197
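
The summary is concrete enough to sketch both ingredients it describes. Below is a minimal, self-contained Scala sketch, not the authors' code: a Gustavson-style SPMM kernel that keeps the right matrix in CSC form (so it is never converted to dense), and a toy cost-based chooser standing in for the paper's DNN latency predictor. The CSCMatrix class, the kernel structure, and the cost formulas are illustrative assumptions, not APIs from Spark or the paper.

```scala
import scala.collection.mutable.ArrayBuffer

/** Right-hand matrix kept in Compressed Sparse Column (CSC) form: the
  * nonzeros of column j are values(colPtr(j) until colPtr(j + 1)) at the
  * row indices rowIdx(colPtr(j) until colPtr(j + 1)). */
final case class CSCMatrix(
    numRows: Int,
    numCols: Int,
    colPtr: Array[Int],   // length numCols + 1
    rowIdx: Array[Int],   // length nnz
    values: Array[Double] // length nnz
) {
  def nnz: Long = values.length.toLong
}

object SparseSpmm {

  /** Gustavson-style SPMM, C = A * B, with both operands in CSC. Column j of
    * C is accumulated as sum_k B(k, j) * A(:, k), so B is never densified. */
  def multiply(a: CSCMatrix, b: CSCMatrix): CSCMatrix = {
    require(a.numCols == b.numRows, "dimension mismatch")
    val colPtr = ArrayBuffer(0)
    val rowIdx = ArrayBuffer.empty[Int]
    val values = ArrayBuffer.empty[Double]
    val acc = new Array[Double](a.numRows)    // dense accumulator, one column of C
    val touched = new Array[Boolean](a.numRows)
    var j = 0
    while (j < b.numCols) {
      var p = b.colPtr(j)
      while (p < b.colPtr(j + 1)) {           // each nonzero B(k, j)
        val k = b.rowIdx(p); val bkj = b.values(p)
        var q = a.colPtr(k)
        while (q < a.colPtr(k + 1)) {         // scatter bkj * A(:, k) into acc
          val i = a.rowIdx(q)
          acc(i) += a.values(q) * bkj
          touched(i) = true
          q += 1
        }
        p += 1
      }
      var i = 0                               // flush accumulator into column j of C
      while (i < a.numRows) {
        if (touched(i)) {
          rowIdx += i; values += acc(i)
          acc(i) = 0.0; touched(i) = false
        }
        i += 1
      }
      colPtr += rowIdx.length
      j += 1
    }
    CSCMatrix(a.numRows, b.numCols, colPtr.toArray, rowIdx.toArray, values.toArray)
  }
}

object DoS {
  // Toy analytic cost proxies standing in for the paper's DNN latency model.
  // Dense path: materialize B (k x n) densely, then sweep every nonzero of A
  // across a length-n dense row. Sparse path: expected Gustavson flops under
  // a uniform nonzero distribution.
  def denseCost(k: Int, n: Int, nnzA: Long): Double =
    k.toDouble * n + nnzA.toDouble * n

  def sparseCost(k: Int, nnzA: Long, nnzB: Long): Double =
    nnzA.toDouble * nnzB / math.max(k, 1)

  /** Recommend keeping the right matrix in CSC when the sparse path is
    * predicted to be cheaper (k = inner dimension, n = columns of B). */
  def keepSparse(k: Int, n: Int, nnzA: Long, nnzB: Long): Boolean =
    sparseCost(k, nnzA, nnzB) < denseCost(k, n, nnzA)
}
```

The per-column dense accumulator is the standard trick that makes Gustavson's algorithm cheap per output column; the paper's contribution is not the kernel itself but learning, per SPMM task, when this keep-sparse path actually beats Spark's densify-then-multiply default.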