Abstract #2551

High-Performance Gridding on Modern X86-Based Multi-Core Systems for 3D Non-Cartesian MRI

Dhiraj D. Kalamkar1, Joshua D. Trzasko2, Srinivas Sridharan1, Mikhail Smelyanskiy3, Daehyun Kim3, Yunhong Shu4, Matt A. Bernstein4, Bharat Kaul1, Pradeep Dubey3, Armando Manduca2

1Parallel Computing Lab, Intel Labs, Bangalore, KA, India; 2Mayo Clinic, Rochester, MN, United States; 3Parallel Computing Lab, Intel Labs, Santa Clara, CA, United States; 4Department of Radiology, Mayo Clinic, Rochester, MN, United States

With the increasing use of higher-resolution acquisitions, more receiver channels, and iterative reconstruction strategies, the ability to quickly and accurately transform an image to and from k-space, operations known as reverse gridding and gridding, respectively, is crucial for non-Cartesian MRI applications. In practice, both of these operations are typically realized via the non-uniform fast Fourier transform (NUFFT). In this work, we propose a novel preprocessing and parallelization strategy for both the forward and adjoint NUFFT, targeted at x86 architectures. We demonstrate that this implementation strategy, which is based on variable-size geometric partitioning, a barrier-free task queue, and selective privatization, is substantially faster than contemporary x86 implementations and computationally competitive with state-of-the-art GPU implementations.
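
To make the gridding operation concrete, below is a minimal C++/OpenMP sketch of the adjoint-NUFFT interpolation step, in which non-Cartesian k-space samples are spread onto an oversampled Cartesian grid by a small separable convolution window. This is an illustrative baseline only, not the implementation described in this work: it uses a simple Gaussian window in place of the usual Kaiser-Bessel kernel, and plain per-thread grid privatization with a final reduction in place of the variable-size geometric partitioning, barrier-free task queue, and selective privatization proposed here. All parameter names and values (N, hw, sigma) are assumptions for illustration.

#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<float>;

// Spread non-Cartesian samples onto an N x N oversampled grid.
// kx, ky are sample coordinates already scaled to grid units; data holds the
// complex k-space values. hw is the kernel half-width in grid cells and sigma
// the width of the (illustrative) Gaussian window.
std::vector<cplx> grid2d(const std::vector<float>& kx,
                         const std::vector<float>& ky,
                         const std::vector<cplx>& data,
                         int N, int hw = 2, float sigma = 0.8f)
{
    std::vector<cplx> out(static_cast<std::size_t>(N) * N, cplx(0.0f, 0.0f));

    #pragma omp parallel
    {
        // Full per-thread privatization: each thread accumulates into its own
        // copy of the grid, so no write races occur during the spreading loop.
        std::vector<cplx> priv(out.size(), cplx(0.0f, 0.0f));

        #pragma omp for nowait
        for (long i = 0; i < static_cast<long>(data.size()); ++i) {
            const int cx = static_cast<int>(std::floor(kx[i]));
            const int cy = static_cast<int>(std::floor(ky[i]));
            for (int dy = -hw; dy <= hw; ++dy) {
                const int gy = cy + dy;
                if (gy < 0 || gy >= N) continue;
                const float ry = (static_cast<float>(gy) - ky[i]) / sigma;
                const float wy = std::exp(-0.5f * ry * ry);
                for (int dx = -hw; dx <= hw; ++dx) {
                    const int gx = cx + dx;
                    if (gx < 0 || gx >= N) continue;
                    const float rx = (static_cast<float>(gx) - kx[i]) / sigma;
                    const float w = wy * std::exp(-0.5f * rx * rx);
                    priv[static_cast<std::size_t>(gy) * N + gx] += w * data[i];
                }
            }
        }

        // Reduce the private grids into the shared output grid.
        #pragma omp critical
        for (std::size_t j = 0; j < out.size(); ++j) out[j] += priv[j];
    }
    return out;  // An inverse FFT plus deapodization then yields the image.
}

The per-thread private grids sidestep write races at the cost of memory proportional to the thread count times the grid size; the selective privatization and geometric partitioning described in the abstract are aimed precisely at avoiding that overhead on large 3D grids.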
