Loading…
A Parallel Butterfly Algorithm
The butterfly algorithm is a fast algorithm which approximately evaluates a discrete analogue of the integral transform $\int_{\mathbb{R}^d} K(x,y) g(y) dy$ at large numbers of target points when the kernel, $K(x,y)$, is approximately low-rank when restricted to subdomains satisfying a certain simpl...
Saved in:
Published in: | SIAM journal on scientific computing 2014-01, Vol.36 (1), p.C49-C65 |
---|---|
Main Authors: | , , , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites Items that cite this one |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | The butterfly algorithm is a fast algorithm which approximately evaluates a discrete analogue of the integral transform $\int_{\mathbb{R}^d} K(x,y) g(y) dy$ at large numbers of target points when the kernel, $K(x,y)$, is approximately low-rank when restricted to subdomains satisfying a certain simple geometric condition. In $d$ dimensions with $O(N^d)$ quasi-uniformly distributed source and target points, when each appropriate submatrix of $K$ is approximately rank-$r$, the running time of the algorithm is at most $O(r^2 N^d \log N)$. A parallelization of the butterfly algorithm is introduced which, assuming a message latency of $\alpha$ and per-process inverse bandwidth of $\beta$, executes in at most $O(r^2 \frac{N^d}{p} \log N + (\beta r\frac{N^d}{p}+\alpha)\log p)$ time using $p$ processes. This parallel algorithm was then instantiated in the form of the open-source \textttDistButterfly library for the special case where $K(x,y)=\exp(i \Phi(x,y))$, where $\Phi(x,y)$ is a black-box, sufficiently smooth, real-valued phase function. Experiments on Blue Gene/Q demonstrate impressive strong-scaling results for important classes of phase functions. Using quasi-uniform sources, hyperbolic Radon transforms, and an analogue of a three-dimensional generalized Radon transform were, respectively, observed to strong-scale from 1-node/16-cores up to 1024-nodes/16,384-cores with greater than 90% and 82% efficiency, respectively. [PUBLICATION ABSTRACT] |
---|---|
ISSN: | 1064-8275 1095-7197 |
DOI: | 10.1137/130921544 |