Welcome to PyNUFFT’s User Manual!

Overview

The Python Non-uniform fast Fourier transform (PyNUFFT)

Purpose

The PyNUFFT user manual documents the Python non-uniform fast Fourier transform, a Python package for non-uniform fast Fourier transform.

PyNUFFT was created for practical purposes in industry and in research.

If you find PyNUFFT useful, please cite:

Lin, Jyh-Miin. “Python Non-Uniform Fast Fourier Transform (PyNUFFT): An Accelerated Non-Cartesian MRI Package on a Heterogeneous Platform (CPU/GPU).” Journal of Imaging 4.3 (2018): 51.

and

J.-M. Lin and H.-W. Chung, Pynufft: python non-uniform fast Fourier transform for MRI Building Bridges in Medical Sciences 2017, St John’s College, CB2 1TP Cambridge, UK

Users of PyNUFFT should be familiar with discrete Fourier transform (DFT).

The min-max interpolator

  • PyNUFFT translates the min-max interpolator to Python. The min-max interpolator is described in the literature:

Fessler JA, Sutton BP. Nonuniform fast Fourier transforms using min-max interpolation. IEEE Trans Signal Process 2003;51(2):560-574.

Current status of PyNUFFT

  • The current PyNUFFT offers NUFFT() or Reikan/PyCUDA/PyOpenCL (NUFFT(device), device is in helper.device_list()).

Switch between CPU and GPU by selecting device = pynufft.helper.device_list[0] (0 is the first device in the system)

  • LGPLv3 and AGPL (for web service)

_images/speed_accuracy_comparisons.png

Multi-dimensional NUFFT

Multi-dimensional transforms are supported by PyNUFFT.

The dimensionality of an imaging reconstruction problem is revealed as the number of axes of the Nd (or Kd, Jd) tuples.

For example, Nd = (256,256) indicates a 2D imaging reconstruction problem, in which the sizes of the x-y axes are 256 and 256, respectively.

Normally, the matrix size of the k-space is twice the size of Nd. For example, Kd = (512,512) is appropriate for the above Nd = (256,256) problem.

In batch mode, the ‘batch’ argument controls the number of channels. This will not affect the dimensionality of the image reconstruction problem. The batch model will be detailed in the ‘batched NUFFT’ section.

Fig. 1 illustrates the variables for 1D, 2D, 3D NUFFT.

_images/configuration_nufft.png

Fig. 1 Configuration of 1D, 2D, and 3D NUFFT. (A) 1D NUFFT: om is a numpy.array of the shape (M,1). M is the number of non-Cartesian points. Nd = (8, ) is the image domain grid size and Kd = (16, ) is the oversampled grid size. Jd = (6, ) is the interpolator size. (B) 2D NUFFT: om is a numpy.array of the shape (M,2). M is the number of non-Cartesian points. Nd = (8, 8 ) is the image domain grid size and Kd = (16, 16 ) is the oversampled grid size. Jd = (6, 6 ) is the interpolator size. (C) 3D NUFFT: om is a numpy.array of the shape (M,3). M is the number of non-Cartesian points. Nd = (8, 8, 8 ) is the image domain grid size and Kd = (16, 16, 16 ) is the oversampled grid size. Jd = (6, 6, 6 ) is the interpolator size.

CPU and GPU (HSA)

The PyNUFFT ran originally on Numpy/Scipy. Unfortunately the default Numpy/Scipy is most efficient on a single CPU core.

Later it was ported to PyCUDA and PyOpenCL, which allows us to leverage the speed of multi-core CPU and GPU.

Mixing NUFFTs with CPU/GPU is possible but has no warranty.

The class methods are listed in Table 1

Table 1 Methods implemented in NUFFT

method name

NUFFT()

NUFFT(helper.device_list[0])

References

__init__()

Constructor

plan()

Planning the instance

forward()

Forward NUFFT A

adjoint()

Adjoint NUFFT A^H

offload()

×

Offload the NUFFT_hsa() to device.

x2xx()

Apply the scaling factor

xx2k()

Oversampled FFT

k2y()

Interpolation

k2vec()

×

Reshape the k-space to the vector

vec2y()

×

Multiply the vector to generate the data

vec2k()

×

Reshape the vector to k-space

y2vec()

×

Multiply the data to get the vector

y2k()

Adjoint of k2y()

k2xx()

Inverse FFT (excessive parts are cropped)

xx2x()

Apply the scaling factor

_precompute

Apply the scaling factor

Parameters of PyNUFFT

Below we summarize the required variables in Table 2

Table 2 Parameters of the plan() method

Parameter

NUFFT

NUFFT(helper.device_list()[0])

References

om (Numpy Array)

Non-Cartesian coordinates (M, dim)

Nd (tuple)

Size of the image grid

Kd (tuple)

Size of the oversampled Fourier grid

Jd (tuple)

Size of the interpolator

ft_axes (tuple)

optional

optional

FFT on the given axes (default = None (all axes))

radix (int)

×

optional

radix (default = 1)

Installation

System requirements

CPU

Each PyNUFFT instance is designed to be executed on a single node. PyNUFFT has no built-in distributed computing on multiple nodes, but the users can design their own one.

For example, users may install PyNUFFT on multiple nodes and control them through the network.

Multple NUFFT instances on a single node provided that the total memory is sufficient to keep all instances.

We recommend one or more modern x86_64 processors on a single node. Successful stories include Intel® Core™ i7-6700HQ Processor, Intel® Xeon® W-2125, Intel® Core™ i9-7900X.

Memory

A general instruction is that the memory should be sufficient for computing a single NUFFT object, which is dependent on the type of problem.

A single 2D problem of 256 × 256 matrix can be computed on a system with 8GB memory.

For 3D NUFFT, it is not uncommon for a single NUFFT object to consume more than 200GB memory.

GPU

Each PyNUFFT instance is initiated on a single GPU. An instance cannot be distributed across multiple GPUs. (PyNUFFT does not use cuFFT.)

However, multiple NUFFT_hsa instances may be initiated both on a single GPU and on multiple GPUs, but the performance may be impacted (limited by the memory, PCI-E bus or GPU cores).

To use GPU, a recent NVIDIA’s GPU (after Maxwell, Pascal) with the recent drivers should be working properly.

The newest nvidia-driver versions of 510.54 or later is recommended. Earlier versions may work but please be informed that Nvidia may discontinue support for outdated drivers.

A general rule is that the memory on the GPU has to be sufficient for computing the NUFFT problem. Successful stories include NVIDIA Geforce GTX 965m/GTX 1070 maxQ/1060 6GB, NVIDIA Titan V100 (Amazon Web Services), NVIDIA Titan X Pascal, and NVIDIA Quadro P6000.

Operating System

Ubunut 16.04 - 18.04 are recommended.

Windows 10 has been tested but it requires Microsoft Studio 2015 community. Please refer to the following special topic about the installation under Windows 10.

Software

Python

Users must be familiar with Python and its pip packaging system. Python 3.9 - 3.10 are currently supported (Python 2 has been discontinued).

To run the NUFFT, the basic CPython, Numpy and Scipy packages must be available on the system. IronPython is compatible with CPython so ipython might be useful.

PyNUFFT can be installed through the pip command. Optionally, users can clone the github repository and build the package from the local folder.

Compiler

NUFFT class does not require a compiler.

However, NUFFT relies on the JIT (just-in-time) compilation mechanism of Reikna/PyCUDA/PyOpenCL. The supporting compiler may be:

  • gcc-11.2.1

  • Microsoft (R) Visual Studio 2022 community edition.

(Please refer to the following section: special topic: Installation under Windows 10).

To accelerate the code on the graphic processing unit (GPU), Reikna, PyCUDA, PyOpencl must be available. Please refer the following special topic: Installation of OpenCL.

CUDA programming skills are not strictly needed. However, it may be helpful if users understand GPU programming.

General Installation

Continuum’s Anaconda environment should provide all the above packages.

Installation using pip

Install pynufft by using the pip command:

$ pip install pynufft

Installation from Git Repository

git is a version control program, which allows you to clone the latest code base from the pynufft repository:

$ git clone https://github.com/jyhmiinlin/pynufft
$ cd pynufft
$ python setup.py install --user

Uninstall pynufft

Simply use “pip uninstall” to remove pynufft:

$ pip uninstall pynufft

Test whether the installation is successful (deprecated)

Special topics

Installation under Linux

Ubuntu 18.04 - 20.04

A successful example is Ubuntu.

The working kernel shown in uname -a is 4.15.0-42-generic #45-Ubuntu.

The Nvidia related packages are: Nvidia-driver Version: 410.78 CUDA Version: 10.0

Gentoo

PyNUFF works with the package in Gentoo system: dev-util/nvidia-cuda-toolkit-11.6.1 dev-util/intel-ocl-sdk-4.4.0.117-r1, and x11-drivers/nvidia-drivers-510.54.

Installation under Windows 10

PyNUFFT has been tested under the Windows 10 home edition, with Anaconda3-2021.11 64-bit, PyCUDA 2021.1 from official pip, Microsoft Visual Studio 2022 Community, and CUDA 11.6.

Pre-requisites

  • A NVIDIA GPU and a clean Windows 10

  • Install the Microsoft Visual Studio 2022 Community (check cl.exe, see troubleshooting).

  • Install CUDA-11.6 and the related driver.

Now open command prompt cmd, type

nvcc -V

You will see

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Tue_Jun_12_23:09_12_Central_Daylight_time_2018
Cuda compilation tool, release 9.2, V9.2.148
  • Troubleshotting: Add the environmental variable path.

If the system cannot find cl.exe when you type cl:

C:\Users\User>cl
`cl` is not recognized as an internal or external command,
operable program or batch file.

this error is due to the fact that Visual Studio has not been added to the system path, so the system cannot find cl.

Add Path “C:\Program Files\Microsoft Visual Studio\2022\Community\VC\ToolsMSVC14.31.31103\bin\Hostx64x64” to system variables. (see troubleshotting)

Once Visual Studio has been added to the system, open Windows cmd and it should find cl.exe

C:\Users\User>cl
Microsoft (R) C/C++ Optimizing Compiler Version 19.0024215.1 for x86
Copyright (C) Microsoft Corpooration. All rights reserved.

usage: cl [ option... ] filename... [ /link linkoption... ]

without the earlier error message. If the error persists, the path must be modified again.

Installation of Anaconda3

  • Now install Anaconda3. I downloaded Anaconda3-2021.11 64-bit. Once this is done you can follow the general installation procedure as described above.

Installation of PyCUDA, Reikna and PyNUFFT

  • Open Anaconda3 command prompt, type:

    pip install pycuda
    pip install reikna
    pip install pynufft
    
  • Test PyNUFFT:

    python
    from pynufft import tests
    tests.test_init()
    

Installation of OpenCL

OpenCL is one of the backends that PyNUFFT supports. Up to the present, PyNUFFT has used OpenCL-1.2. One missing feature of OpenCL-1.2 is atomicAdd for the array with floating point numbers. PyNUFFT makes use of atomic_cmpxchg (compare and exchange) to implement the atomic_add_float subroutine, which can be seen in the pynufft.src.re_subroutine.atomic_add. This code has appeared in many resources, e.g. http://simpleopencl.blogspot.com/2013/05/atomic-operations-and-floats-in-opencl.html and https://github.com/clMathLibraries/clSPARSE/blob/master/src/library/kernels/csrmv_adaptive.cl.

Note that the OpenCL standard is still evolving and all of the OpenCL supports may change quickly. Old sdk may not work with the newest Intel chipsets. Please try different versions of the hardware and software.

The current compiler version is gcc version 7.3.0. Other compilers may be used on the target system but I haven’t tested any of them.

Intel OpenCL

Intel HD graphics after the Skylake generation usually support OpenCL as long as the suitable intel-sdk is installed.

One OpenCL example is a Gigabyte aero 15 Gentoo Linux on a machine with Intel Corporation HD Graphics 530 (rev 06). The dev-util/intel-ocl-sdk-4.4.0.117-r1 installed the intel_sdk_for_ocl_applications_2014_ubuntu_4.4.0.117_x64.tgz opencl package. The clinfo command shows

Platform Name                                   Intel(R) OpenCL
Number of devices                                 1
Device Name                                     Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
Device Vendor                                   Intel(R) Corporation
Device Vendor ID                                0x8086
Device Version                                  OpenCL 1.2 (Build 117)
Driver Version                                  1.2.0.117
Device OpenCL C Version                         OpenCL C 1.2
Device Type                                     CPU
Device Profile                                  FULL_PROFILE
Device Available                                Yes
Compiler Available                              Yes
Linker Available                                Yes
Max compute units                               8
Max clock frequency                             2600MHz
Device Partition                                (core)
  Max number of sub-devices                     8
  Supported partition types                     by counts, equally, by names (Intel)
  Supported affinity domains                    (n/a)
Max work item dimensions                        3
Max work item sizes                             8192x8192x8192
Max work group size                             8192
Preferred work group size multiple              128
Preferred / native vector sizes
  char                                                 1 / 32
  short                                                1 / 16
  int                                                  1 / 8
  long                                                 1 / 4
  half                                                 0 / 0        (n/a)
  float                                                1 / 8
  double                                               1 / 4        (cl_khr_fp64)
Half-precision Floating-point support           (n/a)
Single-precision Floating-point support         (core)
  Denormals                                     Yes
  Infinity and NANs                             Yes
  Round to nearest                              Yes
  Round to zero                                 No
  Round to infinity                             No
  IEEE754-2008 fused multiply-add               No
  Support is emulated in software               No
  Correctly-rounded divide and sqrt operations  No
Double-precision Floating-point support         (cl_khr_fp64)
  Denormals                                     Yes
  Infinity and NANs                             Yes
  Round to nearest                              Yes
  Round to zero                                 Yes
  Round to infinity                             Yes
  IEEE754-2008 fused multiply-add               Yes
  Support is emulated in software               No
Address bits                                    64, Little-Endian
Global memory size                              33613447168 (31.3GiB)
Error Correction support                        No
Max memory allocation                           8403361792 (7.826GiB)
Unified memory for Host and Device              Yes
Minimum alignment for any data type             128 bytes
Alignment of base address                       1024 bits (128 bytes)
Global Memory cache type                        Read/Write
Global Memory cache size                        262144 (256KiB)
Global Memory cache line size                   64 bytes
Image support                                   Yes
  Max number of samplers per kernel             480
  Max size for 1D images from buffer            525210112 pixels
  Max 1D or 2D image array size                 2048 images
  Max 2D image size                             16384x16384 pixels
  Max 3D image size                             2048x2048x2048 pixels
  Max number of read image args                 480
  Max number of write image args                480
Local memory type                               Global
Local memory size                               32768 (32KiB)
Max number of constant args                     480
Max constant buffer size                        131072 (128KiB)
Max size of kernel argument                     3840 (3.75KiB)
Queue properties
  Out-of-order execution                        Yes
  Profiling                                     Yes
  Local thread execution (Intel)                Yes
Prefer user sync for interop                    No
Profiling timer resolution                      1ns
Execution capabilities
  Run OpenCL kernels                            Yes
  Run native kernels                            Yes
  SPIR versions                                 1.2
printf() buffer size                            1048576 (1024KiB)
Built-in kernels                                (n/a)
Device Extensions                               cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_spir cl_intel_exec_by_local_thread cl_khr_depth_images cl_khr_3d_image_writes cl_khr_fp64

Pure CPU system without Intel HD graphics may require the newest Intel SDK for OpenCL https://software.intel.com/en-us/intel-opencl and https://software.intel.com/en-us/articles/opencl-drivers. One pure CPU system with Intel i7 7900X can make use of Intel Studio 2019.

Nvidia OpenCL

NVIDIA also supports OpenCL 1.2. A successful installation made use of nvidia-driver 417.18 and CUDA-SDK-9.2.88 and gcc 7.3.0. clinfo shows

  Platform Name                                   NVIDIA CUDA
Number of devices                                 1
  Device Name                                     GeForce GTX 1060
  Device Vendor                                   NVIDIA Corporation
  Device Vendor ID                                0x10de
  Device Version                                  OpenCL 1.2 CUDA
  Driver Version                                  415.18
  Device OpenCL C Version                         OpenCL C 1.2
  Device Type                                     GPU
  Device Topology (NV)                            PCI-E, 01:00.0
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               10
  Max clock frequency                             1670MHz
  Compute Capability (NV)                         6.1
  Device Partition                                (core)
    Max number of sub-devices                     1
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x64
  Max work group size                             1024
  Preferred work group size multiple              32
  Warp size (NV)                                  32
  Preferred / native vector sizes
    char                                                 1 / 1
    short                                                1 / 1
    int                                                  1 / 1
    long                                                 1 / 1
    half                                                 0 / 0        (n/a)
    float                                                1 / 1
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              6373572608 (5.936GiB)
  Error Correction support                        No
  Max memory allocation                           1593393152 (1.484GiB)
  Unified memory for Host and Device              No
  Integrated memory (NV)                          No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       4096 bits (512 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        163840 (160KiB)
  Global Memory cache line size                   128 bytes
  Image support                                   Yes
    Max number of samplers per kernel             32
    Max size for 1D images from buffer            134217728 pixels
    Max 1D or 2D image array size                 2048 images
    Max 2D image size                             16384x32768 pixels
    Max 3D image size                             16384x16384x16384 pixels
    Max number of read image args                 256
    Max number of write image args                16
  Local memory type                               Local
  Local memory size                               49152 (48KiB)
  Registers per block (NV)                        65536
  Max number of constant args                     9
  Max constant buffer size                        65536 (64KiB)
  Max size of kernel argument                     4352 (4.25KiB)
  Queue properties
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Prefer user sync for interop                    No
  Profiling timer resolution                      1000ns
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Kernel execution timeout (NV)                 No
  Concurrent copy and kernel execution (NV)       Yes
    Number of async copy engines                  2
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                (n/a)
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer

AMD GPU

AMD has a very good support for OpenCL. See AMDGPU-PRO driver. AMD usually performs very well for fp64 (double precision PyNUFFT is available on request).

Open-source Intel Compute OpenCL (Intel-NEO and Intel SDK)

Try to install Intel’s proprietary OpenCL sdk.

Tutorial

Basic use of PyNUFFT

This section navigates you through the basic use of PyNUFFT.

Initiating a PyNUFFT object

We can initiate a PyNUFFT by importing the NUFFT object:

# import NUFFT class
from pynufft import NUFFT

# Initiate the NufftObj object
NufftObj = NUFFT()

The NufftObj object has been created but at this point it is still empty.

Now we have to plan the NufftObj by calling the plan() method. The plan() method takes the input variables and plans for the object. Now we can plan for the NufftObj object given the non-Cartesian coordinates (om).

In the following code we have 100 random samples spreading across the 2D plane.

# generating 2D random coordinates
import numpy
om = numpy.random.randn(100, 2)

Plan for the NUFFT object

Now we call:

NufftObj.plan(om, Nd, Kd, Jd)

See :py:class: pynufft.NUFFT

Forward NUFFT

The forward NUFFT transforms the image into non-Cartesian samples.

y = NufftObj.forward(image)

The image.shape is equal to Nd. The returned y has a shape which is equal to (M, )

See pynufft.NUFFT.forward()

Adjoint NUFFT

The adjoint NUFFT transforms the non-Cartesian samples into the image

x2 = NufftObj.adjoint(y)

y has a shape which is equal to (M, ). The returned image.shape is equal to Nd. Note that the result is divided by a scaling factor numpy.prod(NufftObj.Kd), which is equal to the default normalization factor of numpy.fft.ifft.

The 1D example

Import pynufft module

In python environment, import pynufft module:

from pynufft import NUFFT

Create a pynufft object NufftObj:

NufftObj = NUFFT()

Planning

The M locations of the non-uniform samples (om) must be provided:

import numpy
om = numpy.random.randn(1512,1)
# om is an M x 1 ndarray: locations of M points. *om* is normalized between [-pi, pi]
# Here M = 1512

In addition, the size of time series (Nd), oversampled grid (Kd), and interpolatro size (Jd) are:

Nd = (256,)
Kd = (512,)
Jd = (6,)

Now provide NufftObj with these parameters:

NufftObj.plan(om, Nd, Kd, Jd)

Forward transform

Now NufftObj has been prepared and is ready for computations. Continue with an example.:

import numpy
import matplotlib.pyplot as pyplot
time_data = numpy.zeros(256, )
time_data[96:128+32] = 1.0
pyplot.plot(time_data)
pyplot.ylim(-1,2)
pyplot.show()

This generates a time series Fig. 2.

_images/box_function.png

Fig. 2 A box function time series

NufftObj transform the time_data to non-Cartesian locations:

nufft_freq_data =NufftObj.forward(time_data)
pyplot.plot(om,nufft_freq_data.real,'.', label='real')
pyplot.plot(om,nufft_freq_data.imag,'r.', label='imag')
pyplot.legend()
pyplot.show()

This displays the non-Cartesian spectrum Fig. 3.

_images/non_Cartesian_spectrum.png

Fig. 3 Non-Cartesian spectrum of box function in Fig. 2. Note the non-uniform density.

Signal restoration through “solve()”

The signal can be solved by the solve() method

restore_time = NufftObj.solve(nufft_freq_data,'cg', maxiter=30)
restore_time2 = NufftObj.solve(nufft_freq_data,'L1TVOLS', maxiter=30,rho=1)

Now display the restored signals:

im1,=pyplot.plot(numpy.abs(time_data),'r',label='original signal')
im3,=pyplot.plot(numpy.abs(restore_time2),'k--',label='L1TVOLS')
im4,=pyplot.plot(numpy.abs(restore_time),'r:',label='conjugate_gradient_method')
pyplot.legend([im1, im3,im4])
pyplot.show()
_images/script_1D_solve.png

Fig. 4 Signals restored by “solve()”. L1TVOLS and L1TVOLS are close to Fig. 2, whereas cg is subject to distortion.

The complete code is:

The 2D example

Import pynufft module

In python environment, import pynufft module and other packages:

import numpy
import scipy.misc
import matplotlib.pyplot

from pynufft import NUFFT

Loading the X-Y locations(“om”)

It requires the x-y coordinates of M points to plan NufftObj.

A 2D trajectory from my PROPELLER MRI research is provided in the pynufft package.:

import pkg_resources
DATA_PATH = pkg_resources.resource_filename('pynufft', './src/data/')
om = numpy.load(DATA_PATH+'om2D.npz')['arr_0']

The M locations of non-uniform samples (om) forms an M x 2 numpy.ndarray

print(om)

[[-3.12932086  0.28225246]
[-3.1047771   0.28225246]
[-3.08023357  0.28225246]
 ....
[-2.99815702  0.76063216]
[-3.02239823  0.76447165]
[-3.04663992  0.76831114]]

You can see the 2D M locations by plotting x versus y:

matplotlib.pyplot.plot(om[::10,0],om[::10,1],'o')
matplotlib.pyplot.title('non-uniform coordinates')
matplotlib.pyplot.xlabel('axis 0')
matplotlib.pyplot.ylabel('axis 1')
matplotlib.pyplot.show()

As can be seen in Fig. 5:

_images/propeller_trajectory.png

Fig. 5 The 2D PROPELLER trajectory of M points.

Planning Create a pynufft object NufftObj:

NufftObj = NUFFT()

Provided om, the size of time series (Nd), oversampled grid (Kd), and interpolatro size (Jd)

Nd = (256, 256)  # image size
print('setting image dimension Nd...', Nd)
Kd = (512, 512)  # k-space size
print('setting spectrum dimension Kd...', Kd)
Jd = (6, 6)  # interpolation size
print('setting interpolation size Jd...', Jd)

Now we can plan NufftObj with these parameters:

NufftObj.plan(om, Nd, Kd, Jd)

Forward transform

Now NufftObj has been prepared and is ready for computations. We continue with an example.:

image = scipy.misc.ascent()[::2, ::2]
image=image/numpy.max(image[...])

print('loading image...')

matplotlib.pyplot.imshow(image.real, cmap=matplotlib.cm.gray)
matplotlib.pyplot.show()

This displays the image Fig. 6.

_images/2d_example_image.png

Fig. 6 The 2D image from scipy.misc.ascent()

NufftObj transform the time_data to non-Cartesian locations:

y = NufftObj.forward(image)

Image restoration with solve():

The image can be restored from non-Cartesian samples y:

image0 = NufftObj.solve(y, solver='cg',maxiter=50)
image3 = NufftObj.solve(y, solver='L1TVOLS',maxiter=50,rho=0.1)

image2 = NufftObj.adjoint(y ) # adjoint


matplotlib.pyplot.subplot(1,3,1)
matplotlib.pyplot.title('Restored image (cg)')
matplotlib.pyplot.imshow(image0.real, cmap=matplotlib.cm.gray, norm=matplotlib.colors.Normalize(vmin=0.0, vmax=1))


matplotlib.pyplot.subplot(1,3,2)
matplotlib.pyplot.imshow(image2.real, cmap=matplotlib.cm.gray, norm=matplotlib.colors.Normalize(vmin=0.0, vmax=5))
matplotlib.pyplot.title('Adjoint transform')


matplotlib.pyplot.subplot(1,3,3)
matplotlib.pyplot.title('L1TV OLS')
matplotlib.pyplot.imshow(image3.real, cmap=matplotlib.cm.gray, norm=matplotlib.colors.Normalize(vmin=0.0, vmax=1))

matplotlib.pyplot.show()
_images/2D_restoration.png

Fig. 7 Image restoration through solve() ‘cg’, ‘L1TVOLS’, ‘L1TVLAD’ and adjoint().

The spectrum of the restored image:

_images/2D_spectrum.png

Fig. 8 The spectrum of the restored image solved by cg.

Manual

NUFFT

The NUFFT class

The NUFFT class encapsulates the NUFFT function using the Numpy/Scipy environment. This allows portability so the NUFFT() can be easily ported to Windows and Linux. Users can install their favourite Python distribution. So far, I have tested Anaconda, intel-python, intel-numpy and open-source python.

However, the performance of NUFFT is therefore impacted by the Numpy implementation.

Defining the equispaced to non-Cartesian transform as operator A, the NUFFT class provides the following methods:

  • forward() method computes the single forward operation A.

  • adjoint() method computes the scaled adjoint operation A^H. The result is divided by a scaling factor of numpy.prod(Kd).

  • selfadjoint() method computes the single selfadjoint operation A^H A.

  • solve() method links many solvers in pynufft.linalg.solver,

    which is based on the solvers of scipy.sparse.linalg.cg, scipy.sparse.linalg. ‘lsqr’, ‘dc’, ‘bicg’, ‘bicgstab’, ‘cg’, ‘gmres’, ‘lgmres’

Attributes

  • NUFFT.ndims: the dimension

  • NUFFT.ft_axes: the axes where the FFT takes place

  • NUFFT.Nd: Tuple, the dimensions of the image

  • NUFFT.Kd: Tuple, the dimensions of the oversampled k-space

The life-cycle of an NUFFT object

NUFFT employs the plan-execution two-stage model. This can maximize the runtime speed, at the cost of the extra precomputation times and extra memory.

Instantiating an NUFFT instance defines only some instance attributes. Instance attributes will be replaced later once the method plan() is called.

Then the plan() method will call the helper.plan() function, which constructs the scaling factor and the interpolator. The interpolator is precomputed and stored in the Compressed Sparse Row (CSR) format. See scipy.sparse.csr_matrix and CSR in Wikipedia.

Once the object has been planned, the forward() and adjoint() methods reuse the saved scaling factors and interpolators.

NUFFT(device)

The NUFFT(device) class

NUFFT(device) computes NUFFT on the CUDA/OpenCL device.

Defining the equispaced to non-Cartesian transform as operator A, the NUFFT(device) class provides the following methods:

  • forward() method computes the single forward operation A.

  • adjoint() method computes the scaled adjoint operation A^H. The result is divided by the scaling factor of numpy.prod(Kd).

  • selfadjoint() method computes the single selfadjoint operation A^H A.

  • solve() method now only offers ‘cg’, ‘dc’, ‘L1TVOLS’

Attributes

  • NUFFT.ndims: the dimension

  • NUFFT.ft_axes: the axes where the FFT takes place

  • NUFFT.Nd: Tuple, the dimensions of the image

  • NUFFT.Kd: Tuple, the dimensions of oversampled k-space

Acceleration on PyCUDA/PyOpenCL

The NUFFT_hsa was designed to accelerate the NUFFT function on the multi-core CPU and GPU, using PyOpenCL and PyCUDA backends. This was made possible by using Reikna meta-package.

If multiple NUFFT(device) objects are created with the PyCUDA backend, each call can be executed only after the context has ‘popped up’. This is achieved by the decorator function push_cuda_context(): calling NUFFT(device) methods will trigger the decorator and get the context popped up. However, PyOpenCL has no such restriction and the call will automatically bypass the decorator for the NUFFT(device) with the PyOpenCL backend.

Different objects can be constructed on different PyCUDA and PyOpenCL backends.

The life-cycle of the PyNUFFT(device) object

NUFFT(device) employs the plan-execution two-stage model. This can maximize the runtime speed, at the cost of the extra precomputation times and extra memory.

Instantiating an NUFFT(device) instance also initiates the context and defines some instance attributes. The context is linked to the accelerator and the kernels are compiled on the chosen device. Instance attributes will be replaced later when the method plan() is called.

Then the plan() method calls the helper.plan() function, which constructs the scaling factor and the interpolator. The interpolator is precomputed and stored as multiple 1D ELLpack (ELL) sparse matrices. Each ELL matrix preserves the sparse matrix as the data and column indices. Multi-dimensional interpolators are implemented as a concatenated multiple 1D ELL sparse matrices. The actual data and column indices are inferred from the meshindex. At the end of the plan() method, the offload() method transfers the precomputed arrays to the accelerator.

The run-time computations reuse the saved scaling factors and interpolators.

Multiple NUFFT instances

Multiple NUFFT instances (of mixed types) can coexist at the same time. Each instance has its own memory. However, multiple instances may be penalized by hardware and reduced runtime speed.

Multiple instances can be planned after all the instances have been created. Alternatively, each instance can be planned immediately after being created.

Note that multiple PyCUDA instances were made possible since 2019.1.1, by introducing the push_cuda_context() decorating function. Versions earlier than 2019.1.1 do not support multiple PyCUDA backends. This is because every call to a CUDA context will require the current context to pop up to the top of the stack of the contexts.

PyOpenCL does not have the context pop-up issue but please always use the newest version.

Multiple NUFFT instances

Multiple instances can be planned after all the instances have been created:

# Create the first NUFFT
NufftObj1 = NUFFT()
# Create the second NUFFT
NufftObj2 = NUFFT()
# Plan the first instance
NufftObj1.plan(om1, Nd, Kd, Jd)
NufftObj2.plan(om2, Nd, Kd, Jd)

or each instance can be planned once it has been created:

# Create the first NUFFT
NufftObj1 = NUFFT()
NufftObj1.plan(om1, Nd, Kd, Jd)

# Create the second NUFFT
NufftObj2 = NUFFT()
NufftObj2.plan(om2, Nd, Kd, Jd)

y1 = NufftObj1.forward(x)
y2 = NufftObj2.forward(x)

Multiple NUFFT_hsa instances

Like NUFFT, each instance can be planned immediately after being created:

from pynufft import NUFFT, helper
# Create the first NUFFT(device)
NufftObj1 = NUFFT(helper.device_list()[0])
NufftObj1.plan(om1, Nd, Kd, Jd)

# Create the second NUFFT(device)
NufftObj2 = NUFFT(helper.device_list()[0])
NufftObj2.plan(om2, Nd, Kd, Jd)

y1 = NufftObj1.forward(x)
y2 = NufftObj2.forward(x)

Multiprocessing (experimental)

The multiprocessing module is the built-in parallel method of C-Python. PyNUFFT may (experimentally) work together with the multiprocessing module of Python.

The mutliprocessing module relies on pickle to serialize the data, whereas the PyCUDA and PyOpenCL contexts are “unpicklable”. Thus, I found that multiprocessing for PyNUFFT must fulfil the following conditions: (1) each NUFFT instance should be created and then executed in a separate process; (2) any CUDA/PyOpenCL related object cannot be sent or planned in advance, and (3) taskset should be used to assign a process to a specified CPU core.

It is the user’s responsibility to take care of the hardware (total memory and IO).

One example of working with multiprocessing (mixed CUDA and OpenCL backends) is as follows. In this example, an “atomic_NUFFT” class is created as a high-level wrapper for the creation and execution of NUFFT. This example has only been tested in Linux because parallel computing is highly platform dependent.

"""
An example of multiprocessing with NUFFT using PyCUDA backend, 
wrapped inside the atomic_NUFFT wrapper class.  
The two processes are running on two CPU cores.
Each process creates one NUFFT and offloads the computations to GPU.
nvidia-smi confirms that two python programs are using the GPU.
"""

import numpy
from pynufft import NUFFT, helper
import scipy.misc
import matplotlib.pyplot
import multiprocessing
import os 
    


class atomic_NUFFT:
    def __init__(self, om, Nd, Kd, Jd, device_indx):
        """
        This caches the parameters only.
        Any other GPU related stuffs are carried out in run()
        """
        self.om = om
        self.Nd = Nd
        self.Kd = Kd
        self.Jd = Jd
#         self.API = API
        self.device_indx = device_indx
        
    def run(self, x, cpu_cores):
        """
        In this method, the NUFFT are created and executed on a fixed CPU core.
        """
        pid= os.getpid()
        print('pid=', pid)
        os.system("taskset -p -c %d-%d %d" % (cpu_cores[0], cpu_cores[1], pid))
        """
        Control the CPU affinity. Otherwise the process on one core can be switched to another core.
        """

        # create NUFFT
#         NUFFT = NUFFT(self.API, )
        
        # plan the NUFFT
        
        device_list = helper.device_list()  
        self.NUFFT = NUFFT(device_list[self.device_indx])
        self.NUFFT.plan(self.om, self.Nd, self.Kd, self.Jd)
        # send the image to device
        gx = self.NUFFT.to_device(x)
        
        # carry out 10000 forward transform
        for pp in range(0, 10000):
            gy = self.NUFFT._forward_device(gx)

        # return the object
        return gy.get()

Nd = (256,256)
Kd = (512,512)
Jd = (6,6)
om = numpy.random.randn(35636, 2) 
x = scipy.misc.ascent()[::2,::2]
om1 = om[om[:,0]>0, :]
om2 = om[om[:,0]<=0, :]

# create pool
pool = multiprocessing.Pool(2)

# create the list to receive the return values
results = []

# Now enter the first process
# This is the standard multiprocessing Pool
D = atomic_NUFFT(om1, Nd, Kd, Jd, 0)
# async won't obstruct the next line of code
result = pool.apply_async(D.run, args = (x, (0,3)))
# the result is appended
results.append(result)

# Now enter the second process
# This is the standard multiprocessing Pool
D = atomic_NUFFT(om2, Nd, Kd, Jd, 0)
# Non-obstructive
result = pool.apply_async(D.run, args = (x, (4,7)))
results.append(result)

# closing the pool 
pool.close()
pool.join()

# results are appended
# Now print the outputs
result1 = results[0].get()
result2 = results[1].get()

# check CPU results

NUFFT_cpu1 = NUFFT()
NUFFT_cpu1.plan(om1, Nd, Kd, Jd)
y1 = NUFFT_cpu1.forward(x)
print('norm = ', numpy.linalg.norm(y1 - result1) / numpy.linalg.norm(y1))

NUFFT_cpu2 = NUFFT()
NUFFT_cpu2.plan(om2, Nd, Kd, Jd)
y2 = NUFFT_cpu2.forward(x)
print('norm = ', numpy.linalg.norm(y2 - result2) / numpy.linalg.norm(y2))

k-Space trajectories (om)

Cartesian k-space

This section aims to provide a good example to show that NUFFT can be used to compute many different trajectories, including the Cartesian k-space.

However, Cartesian k-space should mostly be computed by FFT and this section is provided only for testing.

In the example, we generate a PyNUFFT object and make a plan using Cartesian k-space, followed by the NUFFT transform.

The data is created by NUFFT but on the Cartesian grid.

Last the Cartesian data are transformed back to image by IFFT (with two ifftshifts before and after ifftn):

# Generating trajectories for Cartesian k-space
import numpy
import matplotlib.pyplot
matplotlib.pyplot.gray()

def fake_Cartesian(Nd):
    dim = len(Nd) # dimension
    M = numpy.prod(Nd)
    om = numpy.zeros((M, dim), dtype = numpy.float)
    grid = numpy.indices(Nd)
    for dimid in range(0, dim):
        om[:, dimid] = (grid[dimid].ravel() *2/ Nd[dimid] - 1.0)*numpy.pi
    return om

import scipy.misc

from pynufft import NUFFT


Nd = (256,256)
Kd = (512,512)
Jd = (6,6)

image = scipy.misc.ascent()[::2,::2]
om = fake_Cartesian(Nd)


print('Number of samples (M) = ', om.shape[0])
print('Dimension = ', om.shape[1])
print('Nd = ', Nd)
print('Kd = ', Kd)
print('Jd = ', Jd)

NufftObj = NUFFT()
NufftObj.plan(om, Nd, Kd, Jd)
y = NufftObj.forward(image)

y2 = y.reshape(Nd, order='C')
x2 = numpy.fft.ifftshift(numpy.fft.ifftn(numpy.fft.ifftshift(y2)))
matplotlib.pyplot.subplot(1,3,1)
matplotlib.pyplot.imshow(image.real, vmin = 0, vmax = 255)
matplotlib.pyplot.title('Original image')
matplotlib.pyplot.subplot(1,3,2)
matplotlib.pyplot.imshow(x2.real, vmin = 0, vmax = 255)
matplotlib.pyplot.title('Restored image')
matplotlib.pyplot.subplot(1,3,3)
matplotlib.pyplot.imshow(abs(image - x2), vmin = 0, vmax = 255)
matplotlib.pyplot.title('Difference map')
matplotlib.pyplot.show()

As you can see, the resulting images (Fig. 9) confirm that NUFFT + IFFT can restore the original image.

_images/fake_Cartesian.png

Fig. 9 A Cartesian example generates the contrived Cartesian data using NUFFT, followed by IFFT.

Radial k-space

We can generate the radial spokes on the 2D plane. Each radial spoke spans the range of [-\pi, \pi] at the angle \theta and each point is fully determined by the polar coordinate (R, \theta). See Fig. 10 for more information.

_images/radial_spoke.png

Fig. 10 Illustration of five radial spokes. Each point of the spoke can be described by the polar coordinate (R, \theta), which can be transformed to Cartesian coordinates (R cos(\theta), R sin(\theta)).

The following code generates 360 radial spokes:

# generating 2D radial coordinates
import numpy

spoke_range = (numpy.arange(0, 512) - 256.0 )* numpy.pi/ 256  # normalized between -pi and pi
M = 512*360
om = numpy.empty((M,2), dtype = numpy.float32)


for angle in range(0, 360):
   radian = angle * 2 * numpy.pi/ 360.0
   spoke_x =  spoke_range * numpy.cos(radian)
   spoke_y =  spoke_range * numpy.sin(radian)
   om[512*angle : 512*(angle + 1) ,0] = spoke_x
   om[512*angle : 512*(angle + 1) ,1] = spoke_y


import matplotlib.pyplot
matplotlib.pyplot.plot(om[:,0], om[:,1],'.')
matplotlib.pyplot.show()

API documentation

NUFFT class

class pynufft.nufft.__init__.NUFFT(device_indx=None, legacy=None)

A super class of cpu and gpu NUFFT functions.

Note: NUFFT does not inherit NUFFT_cpu (deprecated) and NUFFT_hsa (deprecated).

__init__(device_indx=None, legacy=None)

Constructor.

Parameters

None (Python NoneType) –

Returns

NUFFT: the pynufft.NUFFT instance

Return type

NUFFT: the pynufft.NUFFT class

Example

>>> from pynufft import NUFFT
>>> NufftObj = NUFFT()

or

>>> from pynufft import NUFFT, helper
>>> device = helper.device_list()[0]
>>> NufftObj = NUFFT(device) # for first acceleration device in the system
__weakref__

list of weak references to the object (if defined)

_adjoint_cpu(y)

Adjoint NUFFT on CPU

Parameters

y – The input numpy array, with the size of (M,)

Type

numpy array with the dtype of numpy.complex64

Returns

x: The output numpy array, with the size of Nd or Nd

Return type

numpy array with the dtype of numpy.complex64

_adjoint_device(gy)

Adjoint NUFFT on the heterogeneous device

Parameters

gy – The input gpu array, with size=(M,)

Type

reikna gpu array with dtype =numpy.complex64

Returns

gx: The output gpu array, with size=Nd

Return type

reikna gpu array with dtype =numpy.complex64

_adjoint_legacy(gy)

Adjoint NUFFT on the heterogeneous device

Parameters

gy – The input gpu array, with size=(M,)

Type

reikna gpu array with dtype =numpy.complex64

Returns

gx: The output gpu array, with size=Nd

Return type

reikna gpu array with dtype =numpy.complex64

_forward_cpu(x)

Forward NUFFT on CPU

Parameters

x – The input numpy array, with the size of Nd

Type

numpy array with the dtype of numpy.complex64

Returns

y: The output numpy array, with the size of (M,)

Return type

numpy array with the dtype of numpy.complex64

_forward_device(gx)

Forward NUFFT on the heterogeneous device

Parameters

gx (reikna gpu array with dtype = numpy.complex64) – The input gpu array, with size = Nd

Returns

gy: The output gpu array, with size = (M,)

Return type

reikna gpu array with dtype = numpy.complex64

_forward_legacy(gx)

Forward NUFFT on the heterogeneous device

Parameters

gx (reikna gpu array with dtype = numpy.complex64) – The input gpu array, with size = Nd

Returns

gy: The output gpu array, with size = (M,)

Return type

reikna gpu array with dtype = numpy.complex64

_init__cpu()

Constructor.

Parameters

None (Python NoneType) –

Returns

NUFFT: the pynufft_hsa.NUFFT instance

Return type

NUFFT: the pynufft_hsa.NUFFT class

Example

>>> from pynufft import NUFFT
>>> NufftObj = NUFFT()
_init__device(device_indx=None)

Constructor.

Parameters
  • API (string) – The API for the heterogeneous system. API=’cuda’ or API=’ocl’

  • platform_number (integer) – The number of the platform found by the API.

  • device_number (integer) – The number of the device found on the platform.

  • verbosity (integer) – Defines the verbosity level, default value is 0

Returns

0

Return type

int, float

Example

>>> from pynufft import NUFFT
>>> NufftObj = NUFFT_device(API='cuda', platform_number=0,
                                 device_number=0, verbosity=0)
_k2xx_cpu(k)
Private: the inverse FFT and image cropping (which is the reverse of

_xx2k() method)

_k2xx_device(k)

Private: the inverse FFT and image cropping (which is the reverse of _xx2k() method)

_k2xx_one2one_cpu(k)
Private: the inverse FFT and image cropping

(which is the reverse of _xx2k() method)

_k2y2k_cpu(k)
Private: the integrated interpolation-gridding by the Sparse

Matrix-Vector Multiplication

_k2y_cpu(k)

Private: interpolation by the Sparse Matrix-Vector Multiplication

_k2y_device(k)

Private: interpolation by the Sparse Matrix-Vector Multiplication

_k2y_legacy(k)

Private: interpolation by the Sparse Matrix-Vector Multiplication

_offload_device()

self.offload():

Off-load NUFFT to the opencl or cuda device(s)

Parameters
  • API (string) – define the device type, which can be ‘cuda’ or ‘ocl’

  • platform_number (int) – define which platform to be used. The default platform_number = 0.

  • device_number (int) – define which device to be used. The default device_number = 0.

Returns

self: instance

_offload_legacy()

self.offload():

Off-load NUFFT to the opencl or cuda device(s)

Parameters
  • API (string) – define the device type, which can be ‘cuda’ or ‘ocl’

  • platform_number (int) – define which platform to be used. The default platform_number = 0.

  • device_number (int) – define which device to be used. The default device_number = 0.

Returns

self: instance

_plan_cpu(om, Nd, Kd, Jd, ft_axes=None)

Plan the NUFFT object with the geometry provided.

Parameters
  • om (numpy.float array, matrix size = M * ndims) – The M off-grid locates in the frequency domain, which is normalized between [-pi, pi]

  • Nd (tuple, ndims integer elements.) –

    The matrix size of the equispaced image. Example: Nd=(256,256) for a 2D image;

    Nd = (128,128,128) for a 3D image

  • Kd (tuple, ndims integer elements.) –

    The matrix size of the oversampled frequency grid. Example: Kd=(512,512) for 2D image;

    Kd = (256,256,256) for a 3D image

  • Jd (tuple, ndims integer elements.) –

    The interpolator size. Example: Jd=(6,6) for 2D image;

    Jd = (6,6,6) for a 3D image

  • ft_axes (None, or tuple with optional integer elements.) – (Optional) The axes for Fourier transform. The default is all axes if ‘None’ is given.

  • batch – (Optional) Batch mode. If the batch is provided, the last appended axis is the number of identical NUFFT to be transformed. The default is ‘None’.

Returns

0

Return type

int, float

Variables
  • Nd – initial value: Nd

  • Kd – initial value: Kd

  • Jd – initial value: Jd

  • ft_axes – initial value: None

Example

>>> from pynufft import NUFFT
>>> NufftObj = NUFFT()
>>> NufftObj.plan(om, Nd, Kd, Jd)

or

>>> NufftObj.plan(om, Nd, Kd, Jd, ft_axes)
_plan_device(om, Nd, Kd, Jd, ft_axes=None, radix=None)

Design the multi-coil or single-coil memory reduced interpolator.

Parameters
  • om (numpy.float array, matrix size = (M, ndims)) – The M off-grid locations in the frequency domain. Normalized between [-pi, pi]

  • Nd (tuple, ndims integer elements.) –

    The matrix size of equispaced image. Example: Nd=(256, 256) for a 2D image;

    Nd = (128, 128, 128) for a 3D image

  • Kd (tuple, ndims integer elements.) – The matrix size of the oversampled frequency grid. Example: Kd=(512,512) for 2D image; Kd = (256,256,256) for a 3D image

  • Jd (tuple, ndims integer elements.) – The interpolator size. Example: Jd=(6,6) for 2D image; Jd = (6,6,6) for a 3D image

  • ft_axes (tuple, selected axes to be transformed.) – The dimensions to be transformed by FFT. Example: ft_axes = (0, 1) for 2D, ft_axes = (0, 1, 2) for 3D; ft_axes = None for all dimensions.

  • radix – expert mode. If provided, the shape is Nd. The last axis is the number of parallel coils.

Returns

0

Return type

int, float

Example

>>> import pynufft
>>> device=pynufft.helper.device_list()[0]
>>> NufftObj = pynufft.NUFFT(device)
>>> NufftObj.plan(om, Nd, Kd, Jd)
_plan_legacy(om, Nd, Kd, Jd, ft_axes=None)

Design the min-max interpolator.

Parameters
  • om (numpy.float array, matrix size = M * ndims) – The M off-grid locations in the frequency domain. Normalized between [-pi, pi]

  • Nd (tuple, ndims integer elements.) – The matrix size of equispaced image. Example: Nd=(256,256) for a 2D image; Nd = (128,128,128) for a 3D image

  • Kd (tuple, ndims integer elements.) – The matrix size of the oversampled frequency grid. Example: Kd=(512,512) for 2D image; Kd = (256,256,256) for a 3D image

  • Jd (tuple, ndims integer elements.) – The interpolator size. Example: Jd=(6,6) for 2D image; Jd = (6,6,6) for a 3D image

Returns

0

Return type

int, float

Example

>>> import pynufft
>>> NufftObj = pynufft.NUFFT_cpu()
>>> NufftObj.plan(om, Nd, Kd, Jd) 
_precompute_sp_cpu()
Private: Precompute adjoint (gridding) and Toepitz interpolation

matrix.

Parameters

None (Python Nonetype) –

Returns

self: instance

_selfadjoint_cpu(x)

selfadjoint NUFFT on CPU

Parameters

x – The input numpy array, with size=Nd

Type

numpy array with dtype =numpy.complex64

Returns

x: The output numpy array, with size=Nd

Return type

numpy array with dtype =numpy.complex64

_selfadjoint_device(gx)

selfadjoint NUFFT on the heterogeneous device

Parameters

gx (reikna gpu array with dtype =numpy.complex64) – The input gpu array, with size=Nd

Returns

gx: The output gpu array, with size=Nd

Return type

reikna gpu array with dtype =numpy.complex64

_selfadjoint_legacy(gx)

selfadjoint NUFFT on the heterogeneous device

Parameters

gx (reikna gpu array with dtype =numpy.complex64) – The input gpu array, with size=Nd

Returns

gx: The output gpu array, with size=Nd

Return type

reikna gpu array with dtype =numpy.complex64

_solve_cpu(y, solver=None, *args, **kwargs)

Solve NUFFT. :param y: data, numpy.complex64. The shape = (M,) or (M, batch) :param solver: ‘cg’, ‘L1TVOLS’, ‘lsmr’, ‘lsqr’, ‘dc’, ‘bicg’,

‘bicgstab’, ‘cg’, ‘gmres’,’lgmres’

Parameters

maxiter (int) – the number of iterations

Returns

numpy array with size. The shape = Nd (‘L1TVOLS’) or Nd (‘lsmr’, ‘lsqr’, ‘dc’,’bicg’,’bicgstab’,’cg’, ‘gmres’,’lgmres’)

_solve_device(gy, solver=None, *args, **kwargs)

The solver of NUFFT

Parameters
  • gy (reikna array, dtype = numpy.complex64) – data, reikna array, (M,) size

  • solver (string) – could be ‘cg’, ‘L1TVOLS’, ‘L1TVLAD’

  • maxiter (int) – the number of iterations

Returns

reikna array with size Nd

_solve_legacy(gy, solver=None, *args, **kwargs)

The solver of NUFFT

Parameters
  • gy (reikna array, dtype = numpy.complex64) – data, reikna array, (M,) size

  • solver (string) – could be ‘cg’, ‘L1TVOLS’, ‘L1TVLAD’

  • maxiter (int) – the number of iterations

Returns

reikna array with size Nd

_vec2k_cpu(k_vec)

Sorting the vector to k-spectrum Kd array

_vec2y_cpu(k_vec)

gridding:

_x2xx_cpu(x)

Private: Scaling on CPU Inplace multiplication of self.x_Nd by the scaling factor self.sn.

_xx2k_cpu(xx)

Private: oversampled FFT on CPU

Firstly, zeroing the self.k_Kd array Second, copy self.x_Nd array to self.k_Kd array by cSelect Third, inplace FFT

_xx2k_device(xx)

Private: oversampled FFT on the heterogeneous device

First, zeroing the self.k_Kd array Second, copy self.x_Nd array to self.k_Kd array by cSelect Third, inplace FFT

_xx2k_one2one_cpu(xx)

Private: oversampled FFT on CPU

First, zeroing the self.k_Kd array Second, copy self.x_Nd array to self.k_Kd array by cSelect Third, inplace FFT

_xx2x_cpu(xx)

Private: rescaling, which is identical to the _x2xx() method

_y2k_cpu(y)

Private: gridding by the Sparse Matrix-Vector Multiplication

_y2k_device(y)

Private: gridding by the Sparse Matrix-Vector Multiplication Atomic_twosum together provide better accuracy than generic atomic_add. See: ocl_add and cuda_add code-strings in atomic_add(), inside the re_subroutine.py.

_y2k_legacy(y)

Private: gridding by the Sparse Matrix-Vector Multiplication

_y2vec_cpu(y)

regridding non-uniform data (unsorted vector)

adjoint(*args, **kwargs)

Adjoint NUFFT (host code)

Parameters

y – The input numpy array, with the size of (M,)

Type

numpy array with the dtype of numpy.complex64

Returns

x: The output numpy array, with the size of Nd or Nd

Return type

numpy array with the dtype of numpy.complex64

forward(*args, **kwargs)

Forward NUFFT (host code)

Parameters

x – The input numpy array, with the size of Nd

Type

numpy array with the dtype of numpy.complex64

Returns

y: The output numpy array, with the size of (M,)

Return type

numpy array with the dtype of numpy.complex64

plan(*args, **kwargs)

Plan the NUFFT object with the geometry provided.

Parameters
  • om (numpy.float array, matrix size = M * ndims) – The M off-grid locates in the frequency domain, which is normalized between [-pi, pi]

  • Nd (tuple, ndims integer elements.) –

    The matrix size of the equispaced image. Example: Nd=(256,256) for a 2D image;

    Nd = (128,128,128) for a 3D image

  • Kd (tuple, ndims integer elements.) –

    The matrix size of the oversampled frequency grid. Example: Kd=(512,512) for 2D image;

    Kd = (256,256,256) for a 3D image

  • Jd (tuple, ndims integer elements.) –

    The interpolator size. Example: Jd=(6,6) for 2D image;

    Jd = (6,6,6) for a 3D image

  • ft_axes (None, or tuple with optional integer elements.) – (Optional) The axes for Fourier transform. The default is all axes if ‘None’ is given.

Returns

0

Return type

int, float

Variables
  • Nd – initial value: Nd

  • Kd – initial value: Kd

  • Jd – initial value: Jd

  • ft_axes – initial value: None

Example

>>> from pynufft import NUFFT
>>> NufftObj = NUFFT()
>>> NufftObj.plan(om, Nd, Kd, Jd)

or

>>> NufftObj.plan(om, Nd, Kd, Jd, ft_axes)
selfadjoint(*args, **kwargs)

selfadjoint NUFFT (host code)

Parameters

x – The input numpy array, with size=Nd

Type

numpy array with dtype =numpy.complex64

Returns

x: The output numpy array, with size=Nd

Return type

numpy array with dtype =numpy.complex64

solve(*args, **kwargs)

Solve NUFFT (host code) :param y: data, numpy.complex64. The shape = (M,) :param solver: ‘cg’, ‘L1TVOLS’, ‘lsmr’, ‘lsqr’, ‘dc’, ‘bicg’,

‘bicgstab’, ‘cg’, ‘gmres’,’lgmres’

Parameters

maxiter (int) – the number of iterations

Returns

numpy array with size Nd.

class pynufft.nufft.__init__.NUFFT_cupy

A tentative torch interface

Jd

initial value: ()

__init__()
__weakref__

list of weak references to the object (if defined)

debug

initial value: 0

plan(om, Nd, Kd, Jd)

Plan the NUFFT object with the geometry provided.

Parameters
  • om (numpy.float array, matrix size = M * ndims) – The M off-grid locates in the frequency domain, which is normalized between [-pi, pi]

  • Nd (tuple, ndims integer elements.) –

    The matrix size of the equispaced image. Example: Nd=(256,256) for a 2D image;

    Nd = (128,128,128) for a 3D image

  • Kd (tuple, ndims integer elements.) –

    The matrix size of the oversampled frequency grid. Example: Kd=(512,512) for 2D image;

    Kd = (256,256,256) for a 3D image

  • Jd (tuple, ndims integer elements.) –

    The interpolator size. Example: Jd=(6,6) for 2D image;

    Jd = (6,6,6) for a 3D image

  • ft_axes (None, or tuple with optional integer elements.) – (Optional) The axes for Fourier transform. The default is all axes if ‘None’ is given.

  • batch – (Optional) Batch mode. If the batch is provided, the last appended axis is the number of identical NUFFT to be transformed. The default is ‘None’.

Returns

0

Return type

int, float

Variables
  • Nd – initial value: Nd

  • Kd – initial value: Kd

  • Jd – initial value: Jd

  • ft_axes – initial value: None

Example

>>> from pynufft import NUFFT
>>> NufftObj = NUFFT()
>>> NufftObj.plan(om, Nd, Kd, Jd)

or

>>> NufftObj.plan(om, Nd, Kd, Jd, ft_axes)
class pynufft.nufft.__init__.NUFFT_tf_eager

A tentative torch interface

Jd

initial value: ()

__init__()
__weakref__

list of weak references to the object (if defined)

debug

initial value: 0

plan(om, Nd, Kd, Jd)

Plan the NUFFT object with the geometry provided.

Parameters
  • om (numpy.float array, matrix size = M * ndims) – The M off-grid locates in the frequency domain, which is normalized between [-pi, pi]

  • Nd (tuple, ndims integer elements.) –

    The matrix size of the equispaced image. Example: Nd=(256,256) for a 2D image;

    Nd = (128,128,128) for a 3D image

  • Kd (tuple, ndims integer elements.) –

    The matrix size of the oversampled frequency grid. Example: Kd=(512,512) for 2D image;

    Kd = (256,256,256) for a 3D image

  • Jd (tuple, ndims integer elements.) –

    The interpolator size. Example: Jd=(6,6) for 2D image;

    Jd = (6,6,6) for a 3D image

  • ft_axes (None, or tuple with optional integer elements.) – (Optional) The axes for Fourier transform. The default is all axes if ‘None’ is given.

  • batch – (Optional) Batch mode. If the batch is provided, the last appended axis is the number of identical NUFFT to be transformed. The default is ‘None’.

Returns

0

Return type

int, float

Variables
  • Nd – initial value: Nd

  • Kd – initial value: Kd

  • Jd – initial value: Jd

  • ft_axes – initial value: None

Example

>>> from pynufft import NUFFT
>>> NufftObj = NUFFT()
>>> NufftObj.plan(om, Nd, Kd, Jd)

or

>>> NufftObj.plan(om, Nd, Kd, Jd, ft_axes)
class pynufft.nufft.__init__.NUFFT_torch

A tentative torch interface

Jd

initial value: ()

__init__()
__weakref__

list of weak references to the object (if defined)

debug

initial value: 0

plan(om, Nd, Kd, Jd)

Plan the NUFFT object with the geometry provided.

Parameters
  • om (numpy.float array, matrix size = M * ndims) – The M off-grid locates in the frequency domain, which is normalized between [-pi, pi]

  • Nd (tuple, ndims integer elements.) –

    The matrix size of the equispaced image. Example: Nd=(256,256) for a 2D image;

    Nd = (128,128,128) for a 3D image

  • Kd (tuple, ndims integer elements.) –

    The matrix size of the oversampled frequency grid. Example: Kd=(512,512) for 2D image;

    Kd = (256,256,256) for a 3D image

  • Jd (tuple, ndims integer elements.) –

    The interpolator size. Example: Jd=(6,6) for 2D image;

    Jd = (6,6,6) for a 3D image

  • ft_axes (None, or tuple with optional integer elements.) – (Optional) The axes for Fourier transform. The default is all axes if ‘None’ is given.

  • batch – (Optional) Batch mode. If the batch is provided, the last appended axis is the number of identical NUFFT to be transformed. The default is ‘None’.

Returns

0

Return type

int, float

Variables
  • Nd – initial value: Nd

  • Kd – initial value: Kd

  • Jd – initial value: Jd

  • ft_axes – initial value: None

Example

>>> from pynufft import NUFFT
>>> NufftObj = NUFFT()
>>> NufftObj.plan(om, Nd, Kd, Jd)

or

>>> NufftObj.plan(om, Nd, Kd, Jd, ft_axes)
sparse_mx_to_torch_sparse_tensor(sparse_mx)

Convert a scipy sparse matrix to a torch sparse tensor.

https://github.com/DSE-MSU/DeepRobust

pynufft.nufft.__init__.push_cuda_context(hsa_method)

Decorator: Push cude context to the top of the stack for current use Add @push_cuda_context before the methods of NUFFT_device()

CPU solvers

pynufft.linalg.solve_cpu.L1TVOLS(nufft, y, maxiter, rho)

L1-total variation regularized ordinary least square

pynufft.linalg.solve_cpu._create_kspace_sampling_density(nufft)

Compute k-space sampling density

pynufft.linalg.solve_cpu._pipe_density(nufft, maxiter)

Create the density function by iterative solution Generate pHp matrix

pynufft.linalg.solve_cpu.cDiff(x, d_indx)

Compute image gradient, which needs the results of indxmap_diff(Nd) :param x: The image array :param d_indx: The index of the shifted image :type x: numpy.float array, matrix size = Nd :type d_indx: int32 :returns: x_diff: Image gradient determined by d_indx :rtype: x_diff: numpy.complex64

pynufft.linalg.solve_cpu.solve(nufft, y, solver=None, *args, **kwargs)

Solve NUFFT. The current version supports solvers = ‘cg’ or ‘L1TVOLS’ or ‘L1TVLAD’.

Parameters
  • nufft – NUFFT_cpu object

  • y – (M,) array, non-uniform data

Returns

x: image

HSA solvers

pynufft.linalg.solve_hsa.L1TVLAD(nufft, gy, maxiter, rho)

(testing) L1-total variation regularized least absolute deviation

pynufft.linalg.solve_hsa.L1TVOLS(nufft, gy, maxiter, rho)

L1-total variation regularized ordinary least square

pynufft.linalg.solve_hsa._create_kspace_sampling_density(nufft)

(stable) Compute k-space sampling density from the nufft object

pynufft.linalg.solve_hsa._pipe_density(nufft, maxiter)

Private: create the density function in the data space by a iterative solution Pipe et al. 1999

pynufft.linalg.solve_hsa.cDiff(x, d_indx)

(stable) Compute image gradient Work with indxmap_diff(Nd). …

pynufft.linalg.solve_hsa.solve(nufft, gy, solver=None, maxiter=30, *args, **kwargs)

The solve function of NUFFT_hsa. The current version supports solvers = ‘cg’ or ‘L1TVOLS’.

Parameters
  • nufft – NUFFT_hsa object

  • y (numpy.complex64 reikna array) – (M,) array, non-uniform data. If batch is provided, ‘cg’ and ‘L1TVOLS’ returns different image shape.

Returns

x: Nd image. L1TVOLS always returns Nd. ‘cg’ returns Nd.

Return type

x: reikna array, complex64.

NUDFT class

class pynufft.linalg.nudft_cpu.NUDFT

The non-uniform DFT operator

__init__()

Constructor.

Parameters

None (Python NoneType) –

Returns

NUFFT: the pynufft.NUDFT instance

Return type

NUFFT: the pynufft.NUFFT class

Example

>>> from pynufft import NUDFT
>>> NufftObj = NUDFT()
__weakref__

list of weak references to the object (if defined)

debug

initial value: 0

Helper functions

class pynufft.src._helper.helper.ELL(elldata, ellcol)

ELL is slow on a single core CPU

__init__(elldata, ellcol)
__weakref__

list of weak references to the object (if defined)

pynufft.src._helper.helper.OMEGA_k(J, K, omd, Kd, dimid, dd, ft_flag)

Compute the index of k-space k_indx

pynufft.src._helper.helper.QR_process(om, N, J, K, sn)

1D QR method for generating min-max interpolator

Parameters
  • om (numpy.float32) – non-Cartesian coordinate. shape = (M, dims)

  • N (int) – length

  • J (int) – size of interpolator

  • K (int) – size of oversampled grid

  • sn (numpy.float32 shape = (N,)) – scaling factor as a length-N vector

class pynufft.src._helper.helper.Tensor_sn(snd, radix)

Not implemented:

__init__(snd, radix)
__weakref__

list of weak references to the object (if defined)

pynufft.src._helper.helper.block_outer_prod(x1, x2)

Multiply x1 (J1 x M) and x2 (J2xM) and extend the dimensions to 3D (J1xJ2xM)

pynufft.src._helper.helper.block_outer_sum(x1, x2)

Update the new index after adding a new axis

pynufft.src._helper.helper.block_outer_sum0(x1, x2)

Multiply x1 (J1 x M) and x2 (J2xM) and extend the dimensions to 3D (J1xJ2xM)

pynufft.src._helper.helper.cat_snd(snd)
Parameters

snd (tuple) – tuple of input 1D vectors

Returns

tensor_sn: vector of concatenated scaling factor, shape = (numpy.sum(Nd), )

Return type

tensor_sn: numpy.float32

pynufft.src._helper.helper.create_laplacian_kernel(nufft)

Create the multi-dimensional laplacian kernel in k-space

Parameters

nufft – the NUFFT object

Returns

uker: the multi-dimensional laplacian kernel in k-space (no fft shift used)

Return type

numpy ndarray

pynufft.src._helper.helper.create_partialELL(ud, kd, Jd, M)
Parameters
  • ud (tuple of numpy.complex64 arrays) – tuple of all 1D interpolators

  • kd (tuple of numpy.int32 arrays) – tuple of all 1D indices

  • Jd (tuple of int32) – tuple of interpolation sizes

  • M (int) – number of samples

Returns

partialELL:

Return type

partialELL: pELL instance

pynufft.src._helper.helper.crop_slice_ind(Nd)

(Deprecated in v.0.3.4) Return the “slice” of Nd size to index multi-dimensional array. “Slice” functions as the index of the array. This function is superseded by preindex_copy(), which avoid run-time indexing.

pynufft.src._helper.helper.device_list()

device_list() returns available devices for acceleration as a tuple. If no device is available, it returns an empty tuple.

pynufft.src._helper.helper.diagnose(verbosity=0)

Diagnosis function Find available devices when NUFFT.offload() fails.

pynufft.src._helper.helper.get_sn(J, K, N)

Compute the 1D scaling factor for the given J, K, N

Parameters
  • J (int) – size of interpolator

  • K (int) – size of oversampled grid

  • N (int) – length

Returns

sn: scaling factor as a length-N vector

Rtype sn

numpy.float32 shape = (N,)

pynufft.src._helper.helper.indxmap_diff(Nd)

Preindixing for rapid image gradient. Diff(x) = x.flat[d_indx[0]] - x.flat Diff_t(x) = x.flat[dt_indx[0]] - x.flat

Parameters

Nd (tuple with integers) – the dimension of the image

Returns

d_indx: image gradient

Returns

dt_indx: the transpose of the image gradient

Return type

d_indx: lists with numpy ndarray

Return type

dt_indx: lists with numpy ndarray

pynufft.src._helper.helper.kaiser_bessel_ft(u, J, alpha, kb_m, d)

Interpolation weight for given J/alpha/kb-m

pynufft.src._helper.helper.kronecker_scale(snd)

Compute the Kronecker product of the scaling factor.

Parameters
  • snd (tuple of 1D numpy.array) – Tuple of 1D scaling factors

  • dd – Number of dimensions

Returns

sn: The multi-dimensional Kronecker of the scaling factors

Return type

Nd array

pynufft.src._helper.helper.nufft_T(N, J, K, alpha, beta)

Equation (29) and (26) in Fessler and Sutton 2003. Create the overlapping matrix CSSC (diagonal dominant matrix) of J points, then find the pseudo-inverse of CSSC

pynufft.src._helper.helper.nufft_alpha_kb_fit(N, J, K)

Find parameters alpha and beta for scaling factor st[‘sn’] The alpha is hardwired as [1,0,0…] when J = 1 (uniform scaling factor)

Parameters
  • N (int) – size of image

  • J (int) – size of interpolator

  • K (int) – size of oversampled k-space

Returns

alphas:

Returns

beta:

Return type

alphas: list of float

Return type

beta:

pynufft.src._helper.helper.nufft_offset(om, J, K)

For every om point (outside regular grids), find the nearest central grid (from Kd dimension)

pynufft.src._helper.helper.nufft_r(om, N, J, K, alpha, beta)

Equation (30) of Fessler & Sutton’s paper

pynufft.src._helper.helper.nufft_scale1(N, K, alpha, beta, Nmid)

Calculate image space scaling factor

pynufft.src._helper.helper.outer_sum(xx, yy)

Superseded by numpy.add.outer() function

class pynufft.src._helper.helper.pELL(M, Jd, curr_sumJd, meshindex, kindx, udata)

class pELL: partial ELL format

__init__(M, Jd, curr_sumJd, meshindex, kindx, udata)

Constructor

Parameters
  • M (int) – Number of samples

  • Jd (tuple of int) – Interpolator size

  • curr_sumJd (tuple of int) – Summation of Jd[0:d-1], for fast shift computing

  • meshindex (numpy.uint32, shape = (numpy.prod(Jd), dd)) – The tensor indices to all interpolation points

  • kindx (numpy.uint32, shape = (M, numpy.sum(Jd))) – Premixed k-indices to be combined

  • udata (numpy.complex64, shape = (M, numpy.sum(Jd))) – Premixed interpolation data values

Returns

pELL: partial ELLpack class with the given values

Return type

pELL: partial ELLpack class

__weakref__

list of weak references to the object (if defined)

pynufft.src._helper.helper.plan(om, Nd, Kd, Jd, ft_axes=None, format='CSR', radix=None)

Plan for the NUFFT object.

Parameters
  • om (numpy.float) – Coordinate

  • Nd (tuple of int) – Image shape

  • Kd (tuple of int) – Oversampled grid shape

  • Jd (tuple of int) – Interpolator size

  • ft_axes (tuple of int) – Axes where FFT takes place

  • format (string, 'CSR' or 'pELL') – Output format of the interpolator. ‘CSR’: the precomputed Compressed Sparse Row (CSR) matrix. ‘pELL’: partial ELLPACK which precomputes the concatenated 1D interpolators.

Return st

dictionary for NUFFT

pynufft.src._helper.helper.preindex_copy(Nd, Kd)

Building the array index for copying two arrays of sizes Nd and Kd. Only the front parts of the input/output arrays are copied. The oversize parts of the input array are truncated (if Nd > Kd), and the smaller size are zero-padded (if Nd < Kd)

Parameters
  • Nd (tuple with integer elements) – tuple, the dimensions of array1

  • Kd (tuple with integer elements) – tuple, the dimensions of array2

Returns

inlist: the index of the input array

Returns

outlist: the index of the output array

Returns

nelem: the length of the inlist and outlist (equal length)

Return type

inlist: list with integer elements

Return type

outlist: list with integer elements

Return type

nelem: int

pynufft.src._helper.helper.rdx_kron(ud, kd, Jd, radix=None)

Radix-n Kronecker product of multi-dimensional array

Parameters
  • ud (tuple of (M, Jd[d]) numpy.complex64 arrays) – 1D interpolators

  • kd (tuple of (M, Jd[d]) numpy.uint arrays) – 1D indices to interpolators

  • Jd (tuple of int) – 1D interpolator sizes

  • radix (int) – radix of Kronecker product

  • kk (tuple of (M, Jd[d]) numpy.uint arrays) – 1D indices to interpolators

  • JJ (tuple of int) – 1D interpolator sizes

Returns

uu: 1D interpolators

pynufft.src._helper.helper.strides_divide_itemsize(Nd)

strides_divide_itemsize function computes the step_size (strides/itemsize) along different axes, and its inverse as float32. For fast GPU computing, preindexing allows for fast Hadamard product and copy. However preindexing costs some memory. strides_divide_itemsize aims to replace preindexing by run-time calculation of the index, given the invNd_elements.

Parameters

Nd (tuple of int) – Input shape

Returns

Nd_elements: strides/itemsize of the Nd.

Returns

invNd_elements: (float32)(1/Nd_elements). Division on GPU is slow but multiply is fast. Thus we can precompute the inverse and then multiply the inverse on GPU.

Return type

Nd_elements: tuple of int

Return type

invNd_elements: tuple of float32

See also

pynufft.NUFFT_hsa

Metaprogramming subroutines (using reikna, pyopencl, pycuda)

pynufft.src.re_subroutine.atomic_add(API)

Return the atomic_add for the given API. Overcome the missing atomic_add_float for OpenCL-1.2. Note: will be checked if OpenCL 2.0 provided by all GPU vendors.

pynufft.src.re_subroutine.cAddScalar()

Return the kernel source for cAddScalar.

pynufft.src.re_subroutine.cAddVec()

Return the kernel source for cAddVec.

pynufft.src.re_subroutine.cAnisoShrink()

Return the kernel source of cAnisoShrink

pynufft.src.re_subroutine.cCopy()

Return the kernel source for cCopy

pynufft.src.re_subroutine.cDiff()

Return the kernel source of cDiff.

pynufft.src.re_subroutine.cHadamard()

Return the Hadamard operations related kernel sources.

pynufft.src.re_subroutine.cHypot()

Return the kernel code for hypot, which computes the sqrt(x*x + y*y) without intermediate overflow.

pynufft.src.re_subroutine.cMultiplyConjVec()

Return the kernel source of cMultiplyConjVec.

pynufft.src.re_subroutine.cMultiplyConjVecInplace()

Return the kernel source of cMultiplyConjVecInplace

pynufft.src.re_subroutine.cMultiplyRealInplace()

Return the kernel source of cMultiplyRealInplace.

pynufft.src.re_subroutine.cMultiplyScalar()

Return the kernel source for cMultiplyScalar.

pynufft.src.re_subroutine.cMultiplyVec()

Return the kernel source of cMultiplyVec

pynufft.src.re_subroutine.cMultiplyVecInplace()

Return the kernel source of cMultiplyVecInplace.

pynufft.src.re_subroutine.cRealShrink()

Return the kernel source of xAnisoShrink

pynufft.src.re_subroutine.cSelect()

Return the kernel source of cSelect.

pynufft.src.re_subroutine.cSpmv()

Return the kernel sources for cSpmv related operations, providing cCSR_spmv_vector and cpELL_spmv_mCoil.

pynufft.src.re_subroutine.cSpmvh()

Return the cSpmvh related kernel source. Only pELL_spmvh_mCoil is provided for Spmvh. NUFFT_hsa_legacy reuses the cCSR_spmv() function, which doubles the storage.

pynufft.src.re_subroutine.cSqrt()

Return the kernel source of cSqrt.

pynufft.src.re_subroutine.cTensorCopy()

Return the kernel source for cTensorCopy.

pynufft.src.re_subroutine.cTensorMultiply()

Return the kernel source for cTensorMultiply

pynufft.src.re_subroutine.create_kernel_sets(API)

Create the kernel from the kernel sets. Note that in some tests (Benoit’s and my tests) CUDA shows some degraded accuracy. This loss of accuracy was due to undefined shared memory behavior, which I don’t fully understand. This has been fixed in 2019.2.0 as the operations are moved to global memory.

Version history

v2022.2.4rc1

  • Explain the scaling factor for NUFFT.adjoint in the document.

v2022.2.3

  • Same as v2022.2.3rc2

v2022.2.3-rc2

  • Add experimental NUFFT_cupy for cupy backend.

  • Change scipy.linalg.pinv2 (deprecate) to pinv. Track for possible errors.

v2022.2.3-rc1

  • Add experimental NUFFT_torch, NUFFT_tf

v2022.2.2

  • Maintenance release

v2022.2.1

  • Maintenance release

v2022.1.1

  • some typos fixed.

  • move to Python 3.10

  • update installation for MSVC 2022 and Win 10

v2021.2.0

  • typo fixed: ocl_add() imaglptr -> imagptr

v2021.1.0

  • removed twosum algorithm.

v2020.2.1

  • tested with an AMD card.

  • Deprecate batch as this is not the most essential function in NUFFT.

  • Deprecate NUFFT_cpu, NUFFT_hsa

  • Update some docs.

v2020.1.2

  • new_index: change the khatri_rao_k(), OMEGA_k() in helper.py; col += kindx[index];// + 1

  • Add code of conduct of Contributor Covenant v2.0

  • Update some documentations

v2020.1.0

  • add batch mode to nudft_cpu

v2020.0.0

  • fix batch=1. This can cause error in Riekna fft.

v2019.2.3

-(experimental) Add the unified NUFFT() class. Now CPU, GPU, and legacy GPU mode are encapsuled in a single class.

-Tested using Intel Neo OpenCL driver (see https://github.com/intel/compute-runtime) and IntelPython3.

-The old NUFFT_cpu() and NUFFT_legacy() will be kept in the system for compatibility.

v2019.2.1-2019.2.2

-Remove lsmr as scipy 1.13 has caused unknown error.

v2019.2.0

  • Bump

v2019.1.2

  • BUGFIX: fix the loss of accuracy in cSpmvh(). Replace the group/local by global memory (the group/local sizes have caused the unknown run-time behaviour on cuda)

v2019.1.1

  • Refactor the re_subroutine.py

  • Adopt tensor form

v0.4.0.0

  • 0.4.0.0 is a beta version.

  • Major updates for the NUFFT_hsa class, including memory reduction and split-radix. Multiple NUFFT_hsa() using cuda backend becomes possible, by pushing the context to the top of the stack when a method is called.

  • Tested in Windows 10 with PyCUDA 2018.1.1, nvidia-driver 417.35, CUDA 9.2, Visual Studio 2015 Community, and Anaconda Python 3.7 64-bit. PyOpenCL in Windows is yet to be tested.

  • Add batch mode.

v0.3.3.12

  • 0.3.3.12 is a bug-fixed version.

  • Removal of the keyword async for compatibility reasons because Reikna has changed the keyword to async_.

v0.3.3.8

  • Bugfix: mm = numpy.tile(mm, [numpy.prod(Jd).astype(int), 1]) to fix the wrong type when numpy.prod(Jd) is not cast as int

  • Bugfix: fix rcond=None error in Anaconda 3.6.5 and Numpy 1.13.1 (the recommended None in Numpy 1.14 is backward incompatible with 1.13)

  • Bugfix: indx1 = indx.copy() is replaced by indx1 = list(indx) for Python2 compatibility

v0.3.3.7

  • Bugfix in 0.3.3.7 Toeplitz is removed from the NUFFT_cpu and NUFFT_gpu to avoid the MemoryError.

v0.3.3.6

  • Bugfix: correct the error of import. Now import NUFFT_cpu, NUFFT_hsa at the top level.

v0.3.3

  • Note: GPU support is superseded by Heterogeneous System Architecture (HSA).

  • A variety of nonlinear solvers are provided, including the conjugate gradient method (cg), L1 total-variation ordinary least square (L1TVOLS), and L1 total-variation least absolute deviation (L1TVLAD).

  • The CPU version support other nonlinear solvers, lsqr, gmr, cg, bicgstab, bicg, cgs, gmres, lgmres , apart from cg, L1TVOLS and L1TVLAD.

  • Support multi-dimensional transform and reconstruction (experimentally).

v0.3.2.9

  • Experimental support of NVIDIA’s graphic processing unit (GPU).

  • The experimental class gpuNUFFT requires pycuda, scikit-cuda, and python-cuda-cffi. scikit-cuda can be installed from standard command.

v0.3.2.8

  • Tested under Linux and Windows Anaconda3

v0.3

  • Updated setup.py

  • Removal of pyfftw due to segfault under some Linux distributions

Acknowledgements

I would be more than grateful for what contributors have done (either contributing codes or fixing bugs in pynufft). However, The information of contributors and partners are kept anonymized without their prior express informed consent. If anyone would like to be identified as a contributors/partners, please contact pynufft@gmail.com.

Financial supports

Cambridge Commonwealth, European and International Trust (Cambridge, UK)

Ministry of Education (Taiwan)

NVIDIA Corp for donating Titan X Pascal.

The * project (The *, * and *)

The * project

Contributors and partners

R******

T******

H******

J******

W*****

I******

A*****

A*****

B*****

D******

M*****

C******

C*****

J*****

J******

The * project (The *, * and *)

Indices and tables

Contact information

pynufft@gmail.com