Mark Murphy1, Michal Zarrouk2, Kurt Keutzer2, Michael Lustig2
1Google, Mountain View, CA, United States; 2EECS, UC Berkeley, Berkeley, CA, United States
We present a fast, autotuned, gridding-based non-uniform FFT (nuFFT) library with parallel implementations on CPUs and GPUs for reconstruction from non-Cartesian data. The choice of nuFFT implementation and its parameters affects runtime in non-trivial ways. Our auto-tuning approach empirically selects an optimal implementation for each trajectory by searching over algorithms and parameters, and saves the result for reuse in future reconstructions (e.g., parallel imaging). We show that the optimal implementation also depends on the target platform and on the sampling pattern itself. We also present a heuristic for near-optimal selection when exhaustive search is prohibitively expensive.
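The empirical selection described above can be illustrated with a minimal sketch. It is not the library's actual interface: the names `autotune`, `get_plan`, `run`, and `search_space`, and the example parameters `kernel_width` and `algorithm`, are all hypothetical, and an in-memory dictionary stands in for whatever persistent storage the library uses. The sketch times every combination in a small search space and remembers the fastest one per trajectory.

```python
import itertools
import time


def autotune(run, search_space, repeats=3, timer=time.perf_counter):
    """Time run(**params) for every parameter combination; return the fastest.

    `run` is a hypothetical callable that executes one nuFFT with the given
    parameters; `search_space` maps each parameter name to candidate values.
    """
    names = sorted(search_space)
    best_params, best_time = None, float("inf")
    for values in itertools.product(*(search_space[n] for n in names)):
        params = dict(zip(names, values))
        # Keep the minimum over several repeats to reduce timing noise.
        elapsed = float("inf")
        for _ in range(repeats):
            start = timer()
            run(**params)
            elapsed = min(elapsed, timer() - start)
        if elapsed < best_time:
            best_params, best_time = params, elapsed
    return best_params, best_time


# In-memory stand-in for a saved plan, keyed by a trajectory identifier.
_plan_cache = {}


def get_plan(trajectory_key, run, search_space, **kwargs):
    """Return the tuned parameters for a trajectory, tuning only on first use."""
    if trajectory_key not in _plan_cache:
        _plan_cache[trajectory_key] = autotune(run, search_space, **kwargs)[0]
    return _plan_cache[trajectory_key]
```

Under these assumptions, a call such as `get_plan("spiral-256", run, {"kernel_width": [4, 6], "algorithm": ["binned", "scatter"]})` performs the exhaustive search once and returns the cached choice afterwards. Because the search cost grows with the product of all candidate counts, a heuristic that prunes the search space becomes attractive when the space is large.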