Abstract #2630

nuFFTW: A Parallel Auto-Tuning Library for Performance Optimization of the NuFFT

Mark Murphy¹, Michal Zarrouk², Kurt Keutzer², Michael Lustig²

¹Google, Mountain View, CA, United States; ²EECS, UC Berkeley, Berkeley, CA, United States

We present a fast, auto-tuned, gridding-based non-uniform FFT (nuFFT) library with parallel implementations on CPUs and GPUs for reconstructing images from non-Cartesian data. The effect of the nuFFT implementation and its parameter selection on runtime is non-trivial. Our auto-tuning approach empirically selects an optimal implementation for each trajectory by searching over algorithms and parameters, and saves the selection for future reconstructions (e.g., parallel imaging). We show that the optimal implementation also depends on the target platform and on the sampling pattern itself. We also present a heuristic for near-optimal selection when exhaustive search is prohibitively expensive.
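The empirical selection described above can be illustrated with a minimal Python sketch. It is not the nuFFTW API: the function names (`autotune`, `trajectory_key`, `wall_clock`), the in-memory `cache` dictionary, and the example parameters `oversampling` and `kernel_width` are all assumptions made for illustration. The sketch measures each candidate configuration, keeps the fastest, and stores the result under a key derived from the trajectory so that later reconstructions on the same trajectory skip the search.

```python
import hashlib
import itertools
import json
import time


def trajectory_key(samples):
    """Stable cache key for a trajectory given as a sequence of coordinate tuples."""
    payload = json.dumps([list(p) for p in samples]).encode()
    return hashlib.sha256(payload).hexdigest()


def wall_clock(run):
    """Wrap a hypothetical run(samples, **config) callable into a timing function."""
    def measure(samples, config):
        start = time.perf_counter()
        run(samples, **config)
        return time.perf_counter() - start
    return measure


def autotune(measure, samples, search_space, cache, repeats=3):
    """Exhaustively measure every configuration and remember the cheapest one.

    measure(samples, config) returns a cost (e.g. seconds); search_space maps
    hypothetical parameter names to lists of candidate values.
    """
    key = trajectory_key(samples)
    if key in cache:  # a selection was already saved for this trajectory
        return cache[key]

    names = sorted(search_space)
    best_config, best_cost = None, float("inf")
    for values in itertools.product(*(search_space[n] for n in names)):
        config = dict(zip(names, values))
        # Take the minimum over a few repeats to reduce timing noise.
        cost = min(measure(samples, config) for _ in range(repeats))
        if cost < best_cost:
            best_config, best_cost = config, cost

    cache[key] = best_config
    return best_config
```

In practice `measure` would be built with `wall_clock` around a real nuFFT call, and the cache would be written to disk so the selection survives between runs. Because the search is exhaustive, its cost grows with the product of the candidate list sizes, which is the situation in which a cheaper heuristic selection becomes attractive.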

Keywords

able acceleration account accuracy achieved achieves afforded aliasing amplitude architectural architecture arithmetic audience auto available bandwidth become benchmark beneficial best calculations chosen complete computation cones consequently convolution core currently define degree depend depends direct dynamic effective empirical empirically engineers equivalent error example exhaustive existing expensive external extremes fast faster flexibility full future generally generations grid gridded heuristic implementation implemented impractical infeasible influence installation isotropic kaiser kernel larger libraries library many mark matrices matrix measure measured memory micro minimized moreover mountain near none numerical optimal optimization optimized oversampling pairs parallel partial particularly pattern performance physics platform precomputed prime processing processor prohibitively propose proven ranging receivers reconstructing reconstruction reconstructions rely representing requires resolution runtime sample sampled sampling satisfy saves search searching selected selection selections selects simplified since space sparse spatial spectrum speed substantially suggested system target thesis throughput trajectories trajectory trans transfer trivial tuned tuning underlying uniform usage variety various view