Loading…
SQL/MapReduce: a practical approach to self-describing, polymorphic, and parallelizable user-defined functions
A user-defined function (UDF) is a powerful database feature that allows users to customize database functionality. Though useful, present UDFs have numerous limitations, including install-time specification of input and output schema and poor ability to parallelize execution. We present a new appro...
Saved in:
Published in: | Proceedings of the VLDB Endowment 2009-08, Vol.2 (2), p.1402-1413 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Citations: | Items that this one cites |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | A user-defined function (UDF) is a powerful database feature that allows users to customize database functionality. Though useful, present UDFs have numerous limitations, including
install-time
specification of input and output schema and poor ability to parallelize execution. We present a new approach to implementing a UDF, which we call SQL/MapReduce (SQL/MR), that overcomes many of these limitations. We leverage ideas from the MapReduce programming paradigm to provide users with a straightforward API through which they can implement a UDF in the language of their choice. Moreover, our approach allows maximum flexibility as the output schema of the UDF is specified by the function itself at
query plan-time
. This means that a SQL/MR function is polymorphic. It can process arbitrary input because its behavior as well as output schema are dynamically determined by information available at query plan-time, such as the function's input schema and arbitrary user-provided parameters. This also increases reusability as the same SQL/MR function can be used on inputs with many different schemas or with different user-specified parameters.
In this paper we describe the motivation for this new approach to UDFs as well as the implementation within Aster Data Systems'
n
Cluster database. We demonstrate that in the context of massively parallel, shared-nothing database systems, this model of computation facilitates highly scalable computation within the database. We also include examples of new applications that take advantage of this novel UDF framework. |
---|---|
ISSN: | 2150-8097 2150-8097 |
DOI: | 10.14778/1687553.1687567 |