Loading…

SQL/MapReduce: a practical approach to self-describing, polymorphic, and parallelizable user-defined functions

A user-defined function (UDF) is a powerful database feature that allows users to customize database functionality. Though useful, present UDFs have numerous limitations, including install-time specification of input and output schema and poor ability to parallelize execution. We present a new appro...

Full description

Saved in:

Bibliographic Details
Published in:	Proceedings of the VLDB Endowment 2009-08, Vol.2 (2), p.1402-1413
Main Authors:	Friedman, Eric, Pawlowski, Peter, Cieslewicz, John
Format:	Article
Language:	English
Citations:	Items that this one cites
Online Access:	Get full text
Tags:	Add Tag No Tags, Be the first to tag this record!

Description
Summary:	A user-defined function (UDF) is a powerful database feature that allows users to customize database functionality. Though useful, present UDFs have numerous limitations, including install-time specification of input and output schema and poor ability to parallelize execution. We present a new approach to implementing a UDF, which we call SQL/MapReduce (SQL/MR), that overcomes many of these limitations. We leverage ideas from the MapReduce programming paradigm to provide users with a straightforward API through which they can implement a UDF in the language of their choice. Moreover, our approach allows maximum flexibility as the output schema of the UDF is specified by the function itself at query plan-time . This means that a SQL/MR function is polymorphic. It can process arbitrary input because its behavior as well as output schema are dynamically determined by information available at query plan-time, such as the function's input schema and arbitrary user-provided parameters. This also increases reusability as the same SQL/MR function can be used on inputs with many different schemas or with different user-specified parameters. In this paper we describe the motivation for this new approach to UDFs as well as the implementation within Aster Data Systems' n Cluster database. We demonstrate that in the context of massively parallel, shared-nothing database systems, this model of computation facilitates highly scalable computation within the database. We also include examples of new applications that take advantage of this novel UDF framework.
ISSN:	2150-8097 2150-8097
DOI:	10.14778/1687553.1687567