Welcome to PyNUFFT’s User Manual!
Overview
The Python Non-uniform fast Fourier transform (PyNUFFT)
Purpose
The PyNUFFT user manual documents the Python non-uniform fast Fourier transform, a Python package for non-uniform fast Fourier transform.
PyNUFFT was created for practical purposes in industry and in research.
If you find PyNUFFT useful, please cite:
Lin, Jyh-Miin. “Python Non-Uniform Fast Fourier Transform (PyNUFFT): An Accelerated Non-Cartesian MRI Package on a Heterogeneous Platform (CPU/GPU).” Journal of Imaging 4.3 (2018): 51.
and
J.-M. Lin and H.-W. Chung, Pynufft: python non-uniform fast Fourier transform for MRI Building Bridges in Medical Sciences 2017, St John’s College, CB2 1TP Cambridge, UK
Users of PyNUFFT should be familiar with discrete Fourier transform (DFT).
The min-max interpolator
PyNUFFT translates the min-max interpolator to Python. The min-max interpolator is described in the literature:
Fessler JA, Sutton BP. Nonuniform fast Fourier transforms using min-max interpolation. IEEE Trans Signal Process 2003;51(2):560-574.
Current status of PyNUFFT
The current PyNUFFT offers NUFFT() or Reikan/PyCUDA/PyOpenCL (NUFFT(device), device is in helper.device_list()).
Switch between CPU and GPU by selecting device = pynufft.helper.device_list[0] (0 is the first device in the system)
LGPLv3 and AGPL (for web service)
Multi-dimensional NUFFT
Multi-dimensional transforms are supported by PyNUFFT.
The dimensionality of an imaging reconstruction problem is revealed as the number of axes of the Nd (or Kd, Jd) tuples.
For example, Nd = (256,256) indicates a 2D imaging reconstruction problem, in which the sizes of the x-y axes are 256 and 256, respectively.
Normally, the matrix size of the k-space is twice the size of Nd. For example, Kd = (512,512) is appropriate for the above Nd = (256,256) problem.
In batch mode, the ‘batch’ argument controls the number of channels. This will not affect the dimensionality of the image reconstruction problem. The batch model will be detailed in the ‘batched NUFFT’ section.
Fig. 1 illustrates the variables for 1D, 2D, 3D NUFFT.
CPU and GPU (HSA)
The PyNUFFT ran originally on Numpy/Scipy. Unfortunately the default Numpy/Scipy is most efficient on a single CPU core.
Later it was ported to PyCUDA and PyOpenCL, which allows us to leverage the speed of multi-core CPU and GPU.
Mixing NUFFTs with CPU/GPU is possible but has no warranty.
The class methods are listed in Table 1
method name |
NUFFT() |
NUFFT(helper.device_list[0]) |
References |
---|---|---|---|
__init__() |
✓ |
✓ |
Constructor |
plan() |
✓ |
✓ |
Planning the instance |
forward() |
✓ |
✓ |
Forward NUFFT A |
adjoint() |
✓ |
✓ |
Adjoint NUFFT A^H |
offload() |
× |
✓ |
Offload the NUFFT_hsa() to device. |
x2xx() |
✓ |
✓ |
Apply the scaling factor |
xx2k() |
✓ |
✓ |
Oversampled FFT |
k2y() |
✓ |
✓ |
Interpolation |
k2vec() |
✓ |
× |
Reshape the k-space to the vector |
vec2y() |
✓ |
× |
Multiply the vector to generate the data |
vec2k() |
✓ |
× |
Reshape the vector to k-space |
y2vec() |
✓ |
× |
Multiply the data to get the vector |
y2k() |
✓ |
✓ |
Adjoint of k2y() |
k2xx() |
✓ |
✓ |
Inverse FFT (excessive parts are cropped) |
xx2x() |
✓ |
✓ |
Apply the scaling factor |
_precompute |
✓ |
✓ |
Apply the scaling factor |
Parameters of PyNUFFT
Below we summarize the required variables in Table 2
Parameter |
NUFFT |
NUFFT(helper.device_list()[0]) |
References |
---|---|---|---|
om (Numpy Array) |
✓ |
✓ |
Non-Cartesian coordinates (M, dim) |
Nd (tuple) |
✓ |
✓ |
Size of the image grid |
Kd (tuple) |
✓ |
✓ |
Size of the oversampled Fourier grid |
Jd (tuple) |
✓ |
✓ |
Size of the interpolator |
ft_axes (tuple) |
optional |
optional |
FFT on the given axes (default = None (all axes)) |
radix (int) |
× |
optional |
radix (default = 1) |
Installation
System requirements
CPU
Each PyNUFFT instance is designed to be executed on a single node. PyNUFFT has no built-in distributed computing on multiple nodes, but the users can design their own one.
For example, users may install PyNUFFT on multiple nodes and control them through the network.
Multple NUFFT instances on a single node provided that the total memory is sufficient to keep all instances.
We recommend one or more modern x86_64 processors on a single node. Successful stories include Intel® Core™ i7-6700HQ Processor, Intel® Xeon® W-2125, Intel® Core™ i9-7900X.
Memory
A general instruction is that the memory should be sufficient for computing a single NUFFT object, which is dependent on the type of problem.
A single 2D problem of 256 × 256 matrix can be computed on a system with 8GB memory.
For 3D NUFFT, it is not uncommon for a single NUFFT object to consume more than 200GB memory.
GPU
Each PyNUFFT instance is initiated on a single GPU. An instance cannot be distributed across multiple GPUs. (PyNUFFT does not use cuFFT.)
However, multiple NUFFT_hsa instances may be initiated both on a single GPU and on multiple GPUs, but the performance may be impacted (limited by the memory, PCI-E bus or GPU cores).
To use GPU, a recent NVIDIA’s GPU (after Maxwell, Pascal) with the recent drivers should be working properly.
The newest nvidia-driver versions of 510.54 or later is recommended. Earlier versions may work but please be informed that Nvidia may discontinue support for outdated drivers.
A general rule is that the memory on the GPU has to be sufficient for computing the NUFFT problem. Successful stories include NVIDIA Geforce GTX 965m/GTX 1070 maxQ/1060 6GB, NVIDIA Titan V100 (Amazon Web Services), NVIDIA Titan X Pascal, and NVIDIA Quadro P6000.
Operating System
Ubunut 16.04 - 18.04 are recommended.
Windows 10 has been tested but it requires Microsoft Studio 2015 community. Please refer to the following special topic about the installation under Windows 10.
Software
Python
Users must be familiar with Python and its pip packaging system. Python 3.9 - 3.10 are currently supported (Python 2 has been discontinued).
To run the NUFFT, the basic CPython, Numpy and Scipy packages must be available on the system. IronPython is compatible with CPython so ipython might be useful.
PyNUFFT can be installed through the pip command. Optionally, users can clone the github repository and build the package from the local folder.
Compiler
NUFFT class does not require a compiler.
However, NUFFT relies on the JIT (just-in-time) compilation mechanism of Reikna/PyCUDA/PyOpenCL. The supporting compiler may be:
gcc-11.2.1
Microsoft (R) Visual Studio 2022 community edition.
(Please refer to the following section: special topic: Installation under Windows 10).
To accelerate the code on the graphic processing unit (GPU), Reikna, PyCUDA, PyOpencl must be available. Please refer the following special topic: Installation of OpenCL.
CUDA programming skills are not strictly needed. However, it may be helpful if users understand GPU programming.
General Installation
Continuum’s Anaconda environment should provide all the above packages.
Installation using pip
Install pynufft by using the pip command:
$ pip install pynufft
Installation from Git Repository
git is a version control program, which allows you to clone the latest code base from the pynufft repository:
$ git clone https://github.com/jyhmiinlin/pynufft
$ cd pynufft
$ python setup.py install --user
Uninstall pynufft
Simply use “pip uninstall” to remove pynufft:
$ pip uninstall pynufft
Test whether the installation is successful (deprecated)
Ask pynufft@gmail.com for technical support.
Special topics
Installation under Linux
Ubuntu 18.04 - 20.04
A successful example is Ubuntu.
The working kernel shown in uname -a is 4.15.0-42-generic #45-Ubuntu.
The Nvidia related packages are: Nvidia-driver Version: 410.78 CUDA Version: 10.0
Gentoo
PyNUFF works with the package in Gentoo system: dev-util/nvidia-cuda-toolkit-11.6.1 dev-util/intel-ocl-sdk-4.4.0.117-r1, and x11-drivers/nvidia-drivers-510.54.
Installation under Windows 10
PyNUFFT has been tested under the Windows 10 home edition, with Anaconda3-2021.11 64-bit, PyCUDA 2021.1 from official pip, Microsoft Visual Studio 2022 Community, and CUDA 11.6.
Pre-requisites
A NVIDIA GPU and a clean Windows 10
Install the Microsoft Visual Studio 2022 Community (check cl.exe, see troubleshooting).
Install CUDA-11.6 and the related driver.
Now open command prompt cmd, type
nvcc -V
You will see
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Tue_Jun_12_23:09_12_Central_Daylight_time_2018
Cuda compilation tool, release 9.2, V9.2.148
Troubleshotting: Add the environmental variable path.
If the system cannot find cl.exe when you type cl:
C:\Users\User>cl
`cl` is not recognized as an internal or external command,
operable program or batch file.
this error is due to the fact that Visual Studio has not been added to the system path, so the system cannot find cl.
Add Path “C:\Program Files\Microsoft Visual Studio\2022\Community\VC\ToolsMSVC14.31.31103\bin\Hostx64x64” to system variables. (see troubleshotting)
Once Visual Studio has been added to the system, open Windows cmd and it should find cl.exe
C:\Users\User>cl
Microsoft (R) C/C++ Optimizing Compiler Version 19.0024215.1 for x86
Copyright (C) Microsoft Corpooration. All rights reserved.
usage: cl [ option... ] filename... [ /link linkoption... ]
without the earlier error message. If the error persists, the path must be modified again.
Installation of Anaconda3
Now install Anaconda3. I downloaded Anaconda3-2021.11 64-bit. Once this is done you can follow the general installation procedure as described above.
Installation of PyCUDA, Reikna and PyNUFFT
Open Anaconda3 command prompt, type:
pip install pycuda pip install reikna pip install pynufft
Test PyNUFFT:
python from pynufft import tests tests.test_init()
Installation of OpenCL
OpenCL is one of the backends that PyNUFFT supports. Up to the present, PyNUFFT has used OpenCL-1.2. One missing feature of OpenCL-1.2 is atomicAdd for the array with floating point numbers. PyNUFFT makes use of atomic_cmpxchg (compare and exchange) to implement the atomic_add_float subroutine, which can be seen in the pynufft.src.re_subroutine.atomic_add. This code has appeared in many resources, e.g. http://simpleopencl.blogspot.com/2013/05/atomic-operations-and-floats-in-opencl.html and https://github.com/clMathLibraries/clSPARSE/blob/master/src/library/kernels/csrmv_adaptive.cl.
Note that the OpenCL standard is still evolving and all of the OpenCL supports may change quickly. Old sdk may not work with the newest Intel chipsets. Please try different versions of the hardware and software.
The current compiler version is gcc version 7.3.0. Other compilers may be used on the target system but I haven’t tested any of them.
Intel OpenCL
Intel HD graphics after the Skylake generation usually support OpenCL as long as the suitable intel-sdk is installed.
One OpenCL example is a Gigabyte aero 15 Gentoo Linux on a machine with Intel Corporation HD Graphics 530 (rev 06). The dev-util/intel-ocl-sdk-4.4.0.117-r1 installed the intel_sdk_for_ocl_applications_2014_ubuntu_4.4.0.117_x64.tgz opencl package. The clinfo command shows
Platform Name Intel(R) OpenCL
Number of devices 1
Device Name Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
Device Vendor Intel(R) Corporation
Device Vendor ID 0x8086
Device Version OpenCL 1.2 (Build 117)
Driver Version 1.2.0.117
Device OpenCL C Version OpenCL C 1.2
Device Type CPU
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 8
Max clock frequency 2600MHz
Device Partition (core)
Max number of sub-devices 8
Supported partition types by counts, equally, by names (Intel)
Supported affinity domains (n/a)
Max work item dimensions 3
Max work item sizes 8192x8192x8192
Max work group size 8192
Preferred work group size multiple 128
Preferred / native vector sizes
char 1 / 32
short 1 / 16
int 1 / 8
long 1 / 4
half 0 / 0 (n/a)
float 1 / 8
double 1 / 4 (cl_khr_fp64)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Address bits 64, Little-Endian
Global memory size 33613447168 (31.3GiB)
Error Correction support No
Max memory allocation 8403361792 (7.826GiB)
Unified memory for Host and Device Yes
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Global Memory cache type Read/Write
Global Memory cache size 262144 (256KiB)
Global Memory cache line size 64 bytes
Image support Yes
Max number of samplers per kernel 480
Max size for 1D images from buffer 525210112 pixels
Max 1D or 2D image array size 2048 images
Max 2D image size 16384x16384 pixels
Max 3D image size 2048x2048x2048 pixels
Max number of read image args 480
Max number of write image args 480
Local memory type Global
Local memory size 32768 (32KiB)
Max number of constant args 480
Max constant buffer size 131072 (128KiB)
Max size of kernel argument 3840 (3.75KiB)
Queue properties
Out-of-order execution Yes
Profiling Yes
Local thread execution (Intel) Yes
Prefer user sync for interop No
Profiling timer resolution 1ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels Yes
SPIR versions 1.2
printf() buffer size 1048576 (1024KiB)
Built-in kernels (n/a)
Device Extensions cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_spir cl_intel_exec_by_local_thread cl_khr_depth_images cl_khr_3d_image_writes cl_khr_fp64
Pure CPU system without Intel HD graphics may require the newest Intel SDK for OpenCL https://software.intel.com/en-us/intel-opencl and https://software.intel.com/en-us/articles/opencl-drivers. One pure CPU system with Intel i7 7900X can make use of Intel Studio 2019.
Nvidia OpenCL
NVIDIA also supports OpenCL 1.2. A successful installation made use of nvidia-driver 417.18 and CUDA-SDK-9.2.88 and gcc 7.3.0. clinfo shows
Platform Name NVIDIA CUDA
Number of devices 1
Device Name GeForce GTX 1060
Device Vendor NVIDIA Corporation
Device Vendor ID 0x10de
Device Version OpenCL 1.2 CUDA
Driver Version 415.18
Device OpenCL C Version OpenCL C 1.2
Device Type GPU
Device Topology (NV) PCI-E, 01:00.0
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 10
Max clock frequency 1670MHz
Compute Capability (NV) 6.1
Device Partition (core)
Max number of sub-devices 1
Supported partition types None
Supported affinity domains (n/a)
Max work item dimensions 3
Max work item sizes 1024x1024x64
Max work group size 1024
Preferred work group size multiple 32
Warp size (NV) 32
Preferred / native vector sizes
char 1 / 1
short 1 / 1
int 1 / 1
long 1 / 1
half 0 / 0 (n/a)
float 1 / 1
double 1 / 1 (cl_khr_fp64)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Address bits 64, Little-Endian
Global memory size 6373572608 (5.936GiB)
Error Correction support No
Max memory allocation 1593393152 (1.484GiB)
Unified memory for Host and Device No
Integrated memory (NV) No
Minimum alignment for any data type 128 bytes
Alignment of base address 4096 bits (512 bytes)
Global Memory cache type Read/Write
Global Memory cache size 163840 (160KiB)
Global Memory cache line size 128 bytes
Image support Yes
Max number of samplers per kernel 32
Max size for 1D images from buffer 134217728 pixels
Max 1D or 2D image array size 2048 images
Max 2D image size 16384x32768 pixels
Max 3D image size 16384x16384x16384 pixels
Max number of read image args 256
Max number of write image args 16
Local memory type Local
Local memory size 49152 (48KiB)
Registers per block (NV) 65536
Max number of constant args 9
Max constant buffer size 65536 (64KiB)
Max size of kernel argument 4352 (4.25KiB)
Queue properties
Out-of-order execution Yes
Profiling Yes
Prefer user sync for interop No
Profiling timer resolution 1000ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
Kernel execution timeout (NV) No
Concurrent copy and kernel execution (NV) Yes
Number of async copy engines 2
printf() buffer size 1048576 (1024KiB)
Built-in kernels (n/a)
Device Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer
AMD GPU
AMD has a very good support for OpenCL. See AMDGPU-PRO driver. AMD usually performs very well for fp64 (double precision PyNUFFT is available on request).
Open-source Intel Compute OpenCL (Intel-NEO and Intel SDK)
Try to install Intel’s proprietary OpenCL sdk.
Tutorial
Basic use of PyNUFFT
This section navigates you through the basic use of PyNUFFT.
Initiating a PyNUFFT object
We can initiate a PyNUFFT by importing the NUFFT object:
# import NUFFT class
from pynufft import NUFFT
# Initiate the NufftObj object
NufftObj = NUFFT()
The NufftObj object has been created but at this point it is still empty.
Now we have to plan the NufftObj by calling the plan() method. The plan() method takes the input variables and plans for the object. Now we can plan for the NufftObj object given the non-Cartesian coordinates (om).
In the following code we have 100 random samples spreading across the 2D plane.
# generating 2D random coordinates import numpy om = numpy.random.randn(100, 2)
Plan for the NUFFT object
Now we call:
NufftObj.plan(om, Nd, Kd, Jd)
See :py:class: pynufft.NUFFT
Forward NUFFT
The forward NUFFT transforms the image into non-Cartesian samples.
y = NufftObj.forward(image)
The image.shape is equal to Nd. The returned y has a shape which is equal to (M, )
See pynufft.NUFFT.forward()
Adjoint NUFFT
The adjoint NUFFT transforms the non-Cartesian samples into the image
x2 = NufftObj.adjoint(y)
y has a shape which is equal to (M, ). The returned image.shape is equal to Nd. Note that the result is divided by a scaling factor numpy.prod(NufftObj.Kd), which is equal to the default normalization factor of numpy.fft.ifft.
The 1D example
Import pynufft module
In python environment, import pynufft module:
from pynufft import NUFFT
Create a pynufft object NufftObj:
NufftObj = NUFFT()
Planning
The M locations of the non-uniform samples (om) must be provided:
import numpy
om = numpy.random.randn(1512,1)
# om is an M x 1 ndarray: locations of M points. *om* is normalized between [-pi, pi]
# Here M = 1512
In addition, the size of time series (Nd), oversampled grid (Kd), and interpolatro size (Jd) are:
Nd = (256,)
Kd = (512,)
Jd = (6,)
Now provide NufftObj with these parameters:
NufftObj.plan(om, Nd, Kd, Jd)
Forward transform
Now NufftObj has been prepared and is ready for computations. Continue with an example.:
import numpy
import matplotlib.pyplot as pyplot
time_data = numpy.zeros(256, )
time_data[96:128+32] = 1.0
pyplot.plot(time_data)
pyplot.ylim(-1,2)
pyplot.show()
This generates a time series Fig. 2.
NufftObj transform the time_data to non-Cartesian locations:
nufft_freq_data =NufftObj.forward(time_data)
pyplot.plot(om,nufft_freq_data.real,'.', label='real')
pyplot.plot(om,nufft_freq_data.imag,'r.', label='imag')
pyplot.legend()
pyplot.show()
This displays the non-Cartesian spectrum Fig. 3.
Signal restoration through “solve()”
The signal can be solved by the solve() method
restore_time = NufftObj.solve(nufft_freq_data,'cg', maxiter=30)
restore_time2 = NufftObj.solve(nufft_freq_data,'L1TVOLS', maxiter=30,rho=1)
Now display the restored signals:
im1,=pyplot.plot(numpy.abs(time_data),'r',label='original signal')
im3,=pyplot.plot(numpy.abs(restore_time2),'k--',label='L1TVOLS')
im4,=pyplot.plot(numpy.abs(restore_time),'r:',label='conjugate_gradient_method')
pyplot.legend([im1, im3,im4])
pyplot.show()
The complete code is:
The 2D example
Import pynufft module
In python environment, import pynufft module and other packages:
import numpy
import scipy.misc
import matplotlib.pyplot
from pynufft import NUFFT
Loading the X-Y locations(“om”)
It requires the x-y coordinates of M points to plan NufftObj.
A 2D trajectory from my PROPELLER MRI research is provided in the pynufft package.:
import pkg_resources
DATA_PATH = pkg_resources.resource_filename('pynufft', './src/data/')
om = numpy.load(DATA_PATH+'om2D.npz')['arr_0']
The M locations of non-uniform samples (om) forms an M x 2 numpy.ndarray
print(om)
[[-3.12932086 0.28225246]
[-3.1047771 0.28225246]
[-3.08023357 0.28225246]
....
[-2.99815702 0.76063216]
[-3.02239823 0.76447165]
[-3.04663992 0.76831114]]
You can see the 2D M locations by plotting x versus y:
matplotlib.pyplot.plot(om[::10,0],om[::10,1],'o')
matplotlib.pyplot.title('non-uniform coordinates')
matplotlib.pyplot.xlabel('axis 0')
matplotlib.pyplot.ylabel('axis 1')
matplotlib.pyplot.show()
As can be seen in Fig. 5:
Planning Create a pynufft object NufftObj:
NufftObj = NUFFT()
Provided om, the size of time series (Nd), oversampled grid (Kd), and interpolatro size (Jd)
Nd = (256, 256) # image size
print('setting image dimension Nd...', Nd)
Kd = (512, 512) # k-space size
print('setting spectrum dimension Kd...', Kd)
Jd = (6, 6) # interpolation size
print('setting interpolation size Jd...', Jd)
Now we can plan NufftObj with these parameters:
NufftObj.plan(om, Nd, Kd, Jd)
Forward transform
Now NufftObj has been prepared and is ready for computations. We continue with an example.:
image = scipy.misc.ascent()[::2, ::2]
image=image/numpy.max(image[...])
print('loading image...')
matplotlib.pyplot.imshow(image.real, cmap=matplotlib.cm.gray)
matplotlib.pyplot.show()
This displays the image Fig. 6.
NufftObj transform the time_data to non-Cartesian locations:
y = NufftObj.forward(image)
Image restoration with solve():
The image can be restored from non-Cartesian samples y:
image0 = NufftObj.solve(y, solver='cg',maxiter=50)
image3 = NufftObj.solve(y, solver='L1TVOLS',maxiter=50,rho=0.1)
image2 = NufftObj.adjoint(y ) # adjoint
matplotlib.pyplot.subplot(1,3,1)
matplotlib.pyplot.title('Restored image (cg)')
matplotlib.pyplot.imshow(image0.real, cmap=matplotlib.cm.gray, norm=matplotlib.colors.Normalize(vmin=0.0, vmax=1))
matplotlib.pyplot.subplot(1,3,2)
matplotlib.pyplot.imshow(image2.real, cmap=matplotlib.cm.gray, norm=matplotlib.colors.Normalize(vmin=0.0, vmax=5))
matplotlib.pyplot.title('Adjoint transform')
matplotlib.pyplot.subplot(1,3,3)
matplotlib.pyplot.title('L1TV OLS')
matplotlib.pyplot.imshow(image3.real, cmap=matplotlib.cm.gray, norm=matplotlib.colors.Normalize(vmin=0.0, vmax=1))
matplotlib.pyplot.show()
The spectrum of the restored image:
Manual
NUFFT
The NUFFT class
The NUFFT class encapsulates the NUFFT function using the Numpy/Scipy environment. This allows portability so the NUFFT() can be easily ported to Windows and Linux. Users can install their favourite Python distribution. So far, I have tested Anaconda, intel-python, intel-numpy and open-source python.
However, the performance of NUFFT is therefore impacted by the Numpy implementation.
Defining the equispaced to non-Cartesian transform as operator A, the NUFFT class provides the following methods:
forward() method computes the single forward operation A.
adjoint() method computes the scaled adjoint operation A^H. The result is divided by a scaling factor of numpy.prod(Kd).
selfadjoint() method computes the single selfadjoint operation A^H A.
- solve() method links many solvers in pynufft.linalg.solver,
which is based on the solvers of scipy.sparse.linalg.cg, scipy.sparse.linalg. ‘lsqr’, ‘dc’, ‘bicg’, ‘bicgstab’, ‘cg’, ‘gmres’, ‘lgmres’
Attributes
NUFFT.ndims: the dimension
NUFFT.ft_axes: the axes where the FFT takes place
NUFFT.Nd: Tuple, the dimensions of the image
NUFFT.Kd: Tuple, the dimensions of the oversampled k-space
The life-cycle of an NUFFT object
NUFFT employs the plan-execution two-stage model. This can maximize the runtime speed, at the cost of the extra precomputation times and extra memory.
Instantiating an NUFFT instance defines only some instance attributes. Instance attributes will be replaced later once the method plan() is called.
Then the plan() method will call the helper.plan() function, which constructs the scaling factor and the interpolator. The interpolator is precomputed and stored in the Compressed Sparse Row (CSR) format. See scipy.sparse.csr_matrix and CSR in Wikipedia.
Once the object has been planned, the forward() and adjoint() methods reuse the saved scaling factors and interpolators.
NUFFT(device)
The NUFFT(device) class
NUFFT(device) computes NUFFT on the CUDA/OpenCL device.
Defining the equispaced to non-Cartesian transform as operator A, the NUFFT(device) class provides the following methods:
forward() method computes the single forward operation A.
adjoint() method computes the scaled adjoint operation A^H. The result is divided by the scaling factor of numpy.prod(Kd).
selfadjoint() method computes the single selfadjoint operation A^H A.
solve() method now only offers ‘cg’, ‘dc’, ‘L1TVOLS’
Attributes
NUFFT.ndims: the dimension
NUFFT.ft_axes: the axes where the FFT takes place
NUFFT.Nd: Tuple, the dimensions of the image
NUFFT.Kd: Tuple, the dimensions of oversampled k-space
Acceleration on PyCUDA/PyOpenCL
The NUFFT_hsa was designed to accelerate the NUFFT function on the multi-core CPU and GPU, using PyOpenCL and PyCUDA backends. This was made possible by using Reikna meta-package.
If multiple NUFFT(device) objects are created with the PyCUDA backend, each call can be executed only after the context has ‘popped up’. This is achieved by the decorator function push_cuda_context(): calling NUFFT(device) methods will trigger the decorator and get the context popped up. However, PyOpenCL has no such restriction and the call will automatically bypass the decorator for the NUFFT(device) with the PyOpenCL backend.
Different objects can be constructed on different PyCUDA and PyOpenCL backends.
The life-cycle of the PyNUFFT(device) object
NUFFT(device) employs the plan-execution two-stage model. This can maximize the runtime speed, at the cost of the extra precomputation times and extra memory.
Instantiating an NUFFT(device) instance also initiates the context and defines some instance attributes. The context is linked to the accelerator and the kernels are compiled on the chosen device. Instance attributes will be replaced later when the method plan() is called.
Then the plan() method calls the helper.plan() function, which constructs the scaling factor and the interpolator. The interpolator is precomputed and stored as multiple 1D ELLpack (ELL) sparse matrices. Each ELL matrix preserves the sparse matrix as the data and column indices. Multi-dimensional interpolators are implemented as a concatenated multiple 1D ELL sparse matrices. The actual data and column indices are inferred from the meshindex. At the end of the plan() method, the offload() method transfers the precomputed arrays to the accelerator.
The run-time computations reuse the saved scaling factors and interpolators.
Multiple NUFFT instances
Multiple NUFFT instances (of mixed types) can coexist at the same time. Each instance has its own memory. However, multiple instances may be penalized by hardware and reduced runtime speed.
Multiple instances can be planned after all the instances have been created. Alternatively, each instance can be planned immediately after being created.
Note that multiple PyCUDA instances were made possible since 2019.1.1, by introducing the push_cuda_context() decorating function. Versions earlier than 2019.1.1 do not support multiple PyCUDA backends. This is because every call to a CUDA context will require the current context to pop up to the top of the stack of the contexts.
PyOpenCL does not have the context pop-up issue but please always use the newest version.
Multiple NUFFT instances
Multiple instances can be planned after all the instances have been created:
# Create the first NUFFT
NufftObj1 = NUFFT()
# Create the second NUFFT
NufftObj2 = NUFFT()
# Plan the first instance
NufftObj1.plan(om1, Nd, Kd, Jd)
NufftObj2.plan(om2, Nd, Kd, Jd)
or each instance can be planned once it has been created:
# Create the first NUFFT
NufftObj1 = NUFFT()
NufftObj1.plan(om1, Nd, Kd, Jd)
# Create the second NUFFT
NufftObj2 = NUFFT()
NufftObj2.plan(om2, Nd, Kd, Jd)
y1 = NufftObj1.forward(x)
y2 = NufftObj2.forward(x)
Multiple NUFFT_hsa instances
Like NUFFT, each instance can be planned immediately after being created:
from pynufft import NUFFT, helper
# Create the first NUFFT(device)
NufftObj1 = NUFFT(helper.device_list()[0])
NufftObj1.plan(om1, Nd, Kd, Jd)
# Create the second NUFFT(device)
NufftObj2 = NUFFT(helper.device_list()[0])
NufftObj2.plan(om2, Nd, Kd, Jd)
y1 = NufftObj1.forward(x)
y2 = NufftObj2.forward(x)
Multiprocessing (experimental)
The multiprocessing module is the built-in parallel method of C-Python. PyNUFFT may (experimentally) work together with the multiprocessing module of Python.
The mutliprocessing module relies on pickle to serialize the data, whereas the PyCUDA and PyOpenCL contexts are “unpicklable”. Thus, I found that multiprocessing for PyNUFFT must fulfil the following conditions: (1) each NUFFT instance should be created and then executed in a separate process; (2) any CUDA/PyOpenCL related object cannot be sent or planned in advance, and (3) taskset should be used to assign a process to a specified CPU core.
It is the user’s responsibility to take care of the hardware (total memory and IO).
One example of working with multiprocessing (mixed CUDA and OpenCL backends) is as follows. In this example, an “atomic_NUFFT” class is created as a high-level wrapper for the creation and execution of NUFFT. This example has only been tested in Linux because parallel computing is highly platform dependent.
"""
An example of multiprocessing with NUFFT using PyCUDA backend,
wrapped inside the atomic_NUFFT wrapper class.
The two processes are running on two CPU cores.
Each process creates one NUFFT and offloads the computations to GPU.
nvidia-smi confirms that two python programs are using the GPU.
"""
import numpy
from pynufft import NUFFT, helper
import scipy.misc
import matplotlib.pyplot
import multiprocessing
import os
class atomic_NUFFT:
def __init__(self, om, Nd, Kd, Jd, device_indx):
"""
This caches the parameters only.
Any other GPU related stuffs are carried out in run()
"""
self.om = om
self.Nd = Nd
self.Kd = Kd
self.Jd = Jd
# self.API = API
self.device_indx = device_indx
def run(self, x, cpu_cores):
"""
In this method, the NUFFT are created and executed on a fixed CPU core.
"""
pid= os.getpid()
print('pid=', pid)
os.system("taskset -p -c %d-%d %d" % (cpu_cores[0], cpu_cores[1], pid))
"""
Control the CPU affinity. Otherwise the process on one core can be switched to another core.
"""
# create NUFFT
# NUFFT = NUFFT(self.API, )
# plan the NUFFT
device_list = helper.device_list()
self.NUFFT = NUFFT(device_list[self.device_indx])
self.NUFFT.plan(self.om, self.Nd, self.Kd, self.Jd)
# send the image to device
gx = self.NUFFT.to_device(x)
# carry out 10000 forward transform
for pp in range(0, 10000):
gy = self.NUFFT._forward_device(gx)
# return the object
return gy.get()
Nd = (256,256)
Kd = (512,512)
Jd = (6,6)
om = numpy.random.randn(35636, 2)
x = scipy.misc.ascent()[::2,::2]
om1 = om[om[:,0]>0, :]
om2 = om[om[:,0]<=0, :]
# create pool
pool = multiprocessing.Pool(2)
# create the list to receive the return values
results = []
# Now enter the first process
# This is the standard multiprocessing Pool
D = atomic_NUFFT(om1, Nd, Kd, Jd, 0)
# async won't obstruct the next line of code
result = pool.apply_async(D.run, args = (x, (0,3)))
# the result is appended
results.append(result)
# Now enter the second process
# This is the standard multiprocessing Pool
D = atomic_NUFFT(om2, Nd, Kd, Jd, 0)
# Non-obstructive
result = pool.apply_async(D.run, args = (x, (4,7)))
results.append(result)
# closing the pool
pool.close()
pool.join()
# results are appended
# Now print the outputs
result1 = results[0].get()
result2 = results[1].get()
# check CPU results
NUFFT_cpu1 = NUFFT()
NUFFT_cpu1.plan(om1, Nd, Kd, Jd)
y1 = NUFFT_cpu1.forward(x)
print('norm = ', numpy.linalg.norm(y1 - result1) / numpy.linalg.norm(y1))
NUFFT_cpu2 = NUFFT()
NUFFT_cpu2.plan(om2, Nd, Kd, Jd)
y2 = NUFFT_cpu2.forward(x)
print('norm = ', numpy.linalg.norm(y2 - result2) / numpy.linalg.norm(y2))
k-Space trajectories (om)
Cartesian k-space
This section aims to provide a good example to show that NUFFT can be used to compute many different trajectories, including the Cartesian k-space.
However, Cartesian k-space should mostly be computed by FFT and this section is provided only for testing.
In the example, we generate a PyNUFFT object and make a plan using Cartesian k-space, followed by the NUFFT transform.
The data is created by NUFFT but on the Cartesian grid.
Last the Cartesian data are transformed back to image by IFFT (with two ifftshifts before and after ifftn):
# Generating trajectories for Cartesian k-space
import numpy
import matplotlib.pyplot
matplotlib.pyplot.gray()
def fake_Cartesian(Nd):
dim = len(Nd) # dimension
M = numpy.prod(Nd)
om = numpy.zeros((M, dim), dtype = numpy.float)
grid = numpy.indices(Nd)
for dimid in range(0, dim):
om[:, dimid] = (grid[dimid].ravel() *2/ Nd[dimid] - 1.0)*numpy.pi
return om
import scipy.misc
from pynufft import NUFFT
Nd = (256,256)
Kd = (512,512)
Jd = (6,6)
image = scipy.misc.ascent()[::2,::2]
om = fake_Cartesian(Nd)
print('Number of samples (M) = ', om.shape[0])
print('Dimension = ', om.shape[1])
print('Nd = ', Nd)
print('Kd = ', Kd)
print('Jd = ', Jd)
NufftObj = NUFFT()
NufftObj.plan(om, Nd, Kd, Jd)
y = NufftObj.forward(image)
y2 = y.reshape(Nd, order='C')
x2 = numpy.fft.ifftshift(numpy.fft.ifftn(numpy.fft.ifftshift(y2)))
matplotlib.pyplot.subplot(1,3,1)
matplotlib.pyplot.imshow(image.real, vmin = 0, vmax = 255)
matplotlib.pyplot.title('Original image')
matplotlib.pyplot.subplot(1,3,2)
matplotlib.pyplot.imshow(x2.real, vmin = 0, vmax = 255)
matplotlib.pyplot.title('Restored image')
matplotlib.pyplot.subplot(1,3,3)
matplotlib.pyplot.imshow(abs(image - x2), vmin = 0, vmax = 255)
matplotlib.pyplot.title('Difference map')
matplotlib.pyplot.show()
As you can see, the resulting images (Fig. 9) confirm that NUFFT + IFFT can restore the original image.
Radial k-space
We can generate the radial spokes on the 2D plane. Each radial spoke spans the range of [-\pi, \pi] at the angle \theta and each point is fully determined by the polar coordinate (R, \theta). See Fig. 10 for more information.
The following code generates 360 radial spokes:
# generating 2D radial coordinates
import numpy
spoke_range = (numpy.arange(0, 512) - 256.0 )* numpy.pi/ 256 # normalized between -pi and pi
M = 512*360
om = numpy.empty((M,2), dtype = numpy.float32)
for angle in range(0, 360):
radian = angle * 2 * numpy.pi/ 360.0
spoke_x = spoke_range * numpy.cos(radian)
spoke_y = spoke_range * numpy.sin(radian)
om[512*angle : 512*(angle + 1) ,0] = spoke_x
om[512*angle : 512*(angle + 1) ,1] = spoke_y
import matplotlib.pyplot
matplotlib.pyplot.plot(om[:,0], om[:,1],'.')
matplotlib.pyplot.show()
API documentation
NUFFT class
- class pynufft.nufft.__init__.NUFFT(device_indx=None, legacy=None)
A super class of cpu and gpu NUFFT functions.
Note: NUFFT does not inherit NUFFT_cpu (deprecated) and NUFFT_hsa (deprecated).
- __init__(device_indx=None, legacy=None)
Constructor.
- Parameters
None (Python NoneType) –
- Returns
NUFFT: the pynufft.NUFFT instance
- Return type
NUFFT: the pynufft.NUFFT class
- Example
>>> from pynufft import NUFFT >>> NufftObj = NUFFT()
or
>>> from pynufft import NUFFT, helper >>> device = helper.device_list()[0] >>> NufftObj = NUFFT(device) # for first acceleration device in the system
- __weakref__
list of weak references to the object (if defined)
- _adjoint_cpu(y)
Adjoint NUFFT on CPU
- Parameters
y – The input numpy array, with the size of (M,)
- Type
numpy array with the dtype of numpy.complex64
- Returns
x: The output numpy array, with the size of Nd or Nd
- Return type
numpy array with the dtype of numpy.complex64
- _adjoint_device(gy)
Adjoint NUFFT on the heterogeneous device
- Parameters
gy – The input gpu array, with size=(M,)
- Type
reikna gpu array with dtype =numpy.complex64
- Returns
gx: The output gpu array, with size=Nd
- Return type
reikna gpu array with dtype =numpy.complex64
- _adjoint_legacy(gy)
Adjoint NUFFT on the heterogeneous device
- Parameters
gy – The input gpu array, with size=(M,)
- Type
reikna gpu array with dtype =numpy.complex64
- Returns
gx: The output gpu array, with size=Nd
- Return type
reikna gpu array with dtype =numpy.complex64
- _forward_cpu(x)
Forward NUFFT on CPU
- Parameters
x – The input numpy array, with the size of Nd
- Type
numpy array with the dtype of numpy.complex64
- Returns
y: The output numpy array, with the size of (M,)
- Return type
numpy array with the dtype of numpy.complex64
- _forward_device(gx)
Forward NUFFT on the heterogeneous device
- Parameters
gx (reikna gpu array with dtype = numpy.complex64) – The input gpu array, with size = Nd
- Returns
gy: The output gpu array, with size = (M,)
- Return type
reikna gpu array with dtype = numpy.complex64
- _forward_legacy(gx)
Forward NUFFT on the heterogeneous device
- Parameters
gx (reikna gpu array with dtype = numpy.complex64) – The input gpu array, with size = Nd
- Returns
gy: The output gpu array, with size = (M,)
- Return type
reikna gpu array with dtype = numpy.complex64
- _init__cpu()
Constructor.
- Parameters
None (Python NoneType) –
- Returns
NUFFT: the pynufft_hsa.NUFFT instance
- Return type
NUFFT: the pynufft_hsa.NUFFT class
- Example
>>> from pynufft import NUFFT >>> NufftObj = NUFFT()
- _init__device(device_indx=None)
Constructor.
- Parameters
API (string) – The API for the heterogeneous system. API=’cuda’ or API=’ocl’
platform_number (integer) – The number of the platform found by the API.
device_number (integer) – The number of the device found on the platform.
verbosity (integer) – Defines the verbosity level, default value is 0
- Returns
0
- Return type
int, float
- Example
>>> from pynufft import NUFFT >>> NufftObj = NUFFT_device(API='cuda', platform_number=0, device_number=0, verbosity=0)
- _k2xx_cpu(k)
- Private: the inverse FFT and image cropping (which is the reverse of
_xx2k() method)
- _k2xx_device(k)
Private: the inverse FFT and image cropping (which is the reverse of _xx2k() method)
- _k2xx_one2one_cpu(k)
- Private: the inverse FFT and image cropping
(which is the reverse of _xx2k() method)
- _k2y2k_cpu(k)
- Private: the integrated interpolation-gridding by the Sparse
Matrix-Vector Multiplication
- _k2y_cpu(k)
Private: interpolation by the Sparse Matrix-Vector Multiplication
- _k2y_device(k)
Private: interpolation by the Sparse Matrix-Vector Multiplication
- _k2y_legacy(k)
Private: interpolation by the Sparse Matrix-Vector Multiplication
- _offload_device()
self.offload():
Off-load NUFFT to the opencl or cuda device(s)
- Parameters
API (string) – define the device type, which can be ‘cuda’ or ‘ocl’
platform_number (int) – define which platform to be used. The default platform_number = 0.
device_number (int) – define which device to be used. The default device_number = 0.
- Returns
self: instance
- _offload_legacy()
self.offload():
Off-load NUFFT to the opencl or cuda device(s)
- Parameters
API (string) – define the device type, which can be ‘cuda’ or ‘ocl’
platform_number (int) – define which platform to be used. The default platform_number = 0.
device_number (int) – define which device to be used. The default device_number = 0.
- Returns
self: instance
- _plan_cpu(om, Nd, Kd, Jd, ft_axes=None)
Plan the NUFFT object with the geometry provided.
- Parameters
om (numpy.float array, matrix size = M * ndims) – The M off-grid locates in the frequency domain, which is normalized between [-pi, pi]
Nd (tuple, ndims integer elements.) –
The matrix size of the equispaced image. Example: Nd=(256,256) for a 2D image;
Nd = (128,128,128) for a 3D image
Kd (tuple, ndims integer elements.) –
The matrix size of the oversampled frequency grid. Example: Kd=(512,512) for 2D image;
Kd = (256,256,256) for a 3D image
Jd (tuple, ndims integer elements.) –
The interpolator size. Example: Jd=(6,6) for 2D image;
Jd = (6,6,6) for a 3D image
ft_axes (None, or tuple with optional integer elements.) – (Optional) The axes for Fourier transform. The default is all axes if ‘None’ is given.
batch – (Optional) Batch mode. If the batch is provided, the last appended axis is the number of identical NUFFT to be transformed. The default is ‘None’.
- Returns
0
- Return type
int, float
- Variables
Nd – initial value: Nd
Kd – initial value: Kd
Jd – initial value: Jd
ft_axes – initial value: None
- Example
>>> from pynufft import NUFFT >>> NufftObj = NUFFT() >>> NufftObj.plan(om, Nd, Kd, Jd)
or
>>> NufftObj.plan(om, Nd, Kd, Jd, ft_axes)
- _plan_device(om, Nd, Kd, Jd, ft_axes=None, radix=None)
Design the multi-coil or single-coil memory reduced interpolator.
- Parameters
om (numpy.float array, matrix size = (M, ndims)) – The M off-grid locations in the frequency domain. Normalized between [-pi, pi]
Nd (tuple, ndims integer elements.) –
The matrix size of equispaced image. Example: Nd=(256, 256) for a 2D image;
Nd = (128, 128, 128) for a 3D image
Kd (tuple, ndims integer elements.) – The matrix size of the oversampled frequency grid. Example: Kd=(512,512) for 2D image; Kd = (256,256,256) for a 3D image
Jd (tuple, ndims integer elements.) – The interpolator size. Example: Jd=(6,6) for 2D image; Jd = (6,6,6) for a 3D image
ft_axes (tuple, selected axes to be transformed.) – The dimensions to be transformed by FFT. Example: ft_axes = (0, 1) for 2D, ft_axes = (0, 1, 2) for 3D; ft_axes = None for all dimensions.
radix – expert mode. If provided, the shape is Nd. The last axis is the number of parallel coils.
- Returns
0
- Return type
int, float
- Example
>>> import pynufft >>> device=pynufft.helper.device_list()[0] >>> NufftObj = pynufft.NUFFT(device) >>> NufftObj.plan(om, Nd, Kd, Jd)
- _plan_legacy(om, Nd, Kd, Jd, ft_axes=None)
Design the min-max interpolator.
- Parameters
om (numpy.float array, matrix size = M * ndims) – The M off-grid locations in the frequency domain. Normalized between [-pi, pi]
Nd (tuple, ndims integer elements.) – The matrix size of equispaced image. Example: Nd=(256,256) for a 2D image; Nd = (128,128,128) for a 3D image
Kd (tuple, ndims integer elements.) – The matrix size of the oversampled frequency grid. Example: Kd=(512,512) for 2D image; Kd = (256,256,256) for a 3D image
Jd (tuple, ndims integer elements.) – The interpolator size. Example: Jd=(6,6) for 2D image; Jd = (6,6,6) for a 3D image
- Returns
0
- Return type
int, float
- Example
>>> import pynufft >>> NufftObj = pynufft.NUFFT_cpu() >>> NufftObj.plan(om, Nd, Kd, Jd)
- _precompute_sp_cpu()
- Private: Precompute adjoint (gridding) and Toepitz interpolation
matrix.
- Parameters
None (Python Nonetype) –
- Returns
self: instance
- _selfadjoint_cpu(x)
selfadjoint NUFFT on CPU
- Parameters
x – The input numpy array, with size=Nd
- Type
numpy array with dtype =numpy.complex64
- Returns
x: The output numpy array, with size=Nd
- Return type
numpy array with dtype =numpy.complex64
- _selfadjoint_device(gx)
selfadjoint NUFFT on the heterogeneous device
- Parameters
gx (reikna gpu array with dtype =numpy.complex64) – The input gpu array, with size=Nd
- Returns
gx: The output gpu array, with size=Nd
- Return type
reikna gpu array with dtype =numpy.complex64
- _selfadjoint_legacy(gx)
selfadjoint NUFFT on the heterogeneous device
- Parameters
gx (reikna gpu array with dtype =numpy.complex64) – The input gpu array, with size=Nd
- Returns
gx: The output gpu array, with size=Nd
- Return type
reikna gpu array with dtype =numpy.complex64
- _solve_cpu(y, solver=None, *args, **kwargs)
Solve NUFFT. :param y: data, numpy.complex64. The shape = (M,) or (M, batch) :param solver: ‘cg’, ‘L1TVOLS’, ‘lsmr’, ‘lsqr’, ‘dc’, ‘bicg’,
‘bicgstab’, ‘cg’, ‘gmres’,’lgmres’
- Parameters
maxiter (int) – the number of iterations
- Returns
numpy array with size. The shape = Nd (‘L1TVOLS’) or Nd (‘lsmr’, ‘lsqr’, ‘dc’,’bicg’,’bicgstab’,’cg’, ‘gmres’,’lgmres’)
- _solve_device(gy, solver=None, *args, **kwargs)
The solver of NUFFT
- Parameters
gy (reikna array, dtype = numpy.complex64) – data, reikna array, (M,) size
solver (string) – could be ‘cg’, ‘L1TVOLS’, ‘L1TVLAD’
maxiter (int) – the number of iterations
- Returns
reikna array with size Nd
- _solve_legacy(gy, solver=None, *args, **kwargs)
The solver of NUFFT
- Parameters
gy (reikna array, dtype = numpy.complex64) – data, reikna array, (M,) size
solver (string) – could be ‘cg’, ‘L1TVOLS’, ‘L1TVLAD’
maxiter (int) – the number of iterations
- Returns
reikna array with size Nd
- _vec2k_cpu(k_vec)
Sorting the vector to k-spectrum Kd array
- _vec2y_cpu(k_vec)
gridding:
- _x2xx_cpu(x)
Private: Scaling on CPU Inplace multiplication of self.x_Nd by the scaling factor self.sn.
- _xx2k_cpu(xx)
Private: oversampled FFT on CPU
Firstly, zeroing the self.k_Kd array Second, copy self.x_Nd array to self.k_Kd array by cSelect Third, inplace FFT
- _xx2k_device(xx)
Private: oversampled FFT on the heterogeneous device
First, zeroing the self.k_Kd array Second, copy self.x_Nd array to self.k_Kd array by cSelect Third, inplace FFT
- _xx2k_one2one_cpu(xx)
Private: oversampled FFT on CPU
First, zeroing the self.k_Kd array Second, copy self.x_Nd array to self.k_Kd array by cSelect Third, inplace FFT
- _xx2x_cpu(xx)
Private: rescaling, which is identical to the _x2xx() method
- _y2k_cpu(y)
Private: gridding by the Sparse Matrix-Vector Multiplication
- _y2k_device(y)
Private: gridding by the Sparse Matrix-Vector Multiplication Atomic_twosum together provide better accuracy than generic atomic_add. See: ocl_add and cuda_add code-strings in atomic_add(), inside the re_subroutine.py.
- _y2k_legacy(y)
Private: gridding by the Sparse Matrix-Vector Multiplication
- _y2vec_cpu(y)
regridding non-uniform data (unsorted vector)
- adjoint(*args, **kwargs)
Adjoint NUFFT (host code)
- Parameters
y – The input numpy array, with the size of (M,)
- Type
numpy array with the dtype of numpy.complex64
- Returns
x: The output numpy array, with the size of Nd or Nd
- Return type
numpy array with the dtype of numpy.complex64
- forward(*args, **kwargs)
Forward NUFFT (host code)
- Parameters
x – The input numpy array, with the size of Nd
- Type
numpy array with the dtype of numpy.complex64
- Returns
y: The output numpy array, with the size of (M,)
- Return type
numpy array with the dtype of numpy.complex64
- plan(*args, **kwargs)
Plan the NUFFT object with the geometry provided.
- Parameters
om (numpy.float array, matrix size = M * ndims) – The M off-grid locates in the frequency domain, which is normalized between [-pi, pi]
Nd (tuple, ndims integer elements.) –
The matrix size of the equispaced image. Example: Nd=(256,256) for a 2D image;
Nd = (128,128,128) for a 3D image
Kd (tuple, ndims integer elements.) –
The matrix size of the oversampled frequency grid. Example: Kd=(512,512) for 2D image;
Kd = (256,256,256) for a 3D image
Jd (tuple, ndims integer elements.) –
The interpolator size. Example: Jd=(6,6) for 2D image;
Jd = (6,6,6) for a 3D image
ft_axes (None, or tuple with optional integer elements.) – (Optional) The axes for Fourier transform. The default is all axes if ‘None’ is given.
- Returns
0
- Return type
int, float
- Variables
Nd – initial value: Nd
Kd – initial value: Kd
Jd – initial value: Jd
ft_axes – initial value: None
- Example
>>> from pynufft import NUFFT >>> NufftObj = NUFFT() >>> NufftObj.plan(om, Nd, Kd, Jd)
or
>>> NufftObj.plan(om, Nd, Kd, Jd, ft_axes)
- selfadjoint(*args, **kwargs)
selfadjoint NUFFT (host code)
- Parameters
x – The input numpy array, with size=Nd
- Type
numpy array with dtype =numpy.complex64
- Returns
x: The output numpy array, with size=Nd
- Return type
numpy array with dtype =numpy.complex64
- solve(*args, **kwargs)
Solve NUFFT (host code) :param y: data, numpy.complex64. The shape = (M,) :param solver: ‘cg’, ‘L1TVOLS’, ‘lsmr’, ‘lsqr’, ‘dc’, ‘bicg’,
‘bicgstab’, ‘cg’, ‘gmres’,’lgmres’
- Parameters
maxiter (int) – the number of iterations
- Returns
numpy array with size Nd.
- class pynufft.nufft.__init__.NUFFT_cupy
A tentative torch interface
- Jd
initial value: ()
- __init__()
- __weakref__
list of weak references to the object (if defined)
- debug
initial value: 0
- plan(om, Nd, Kd, Jd)
Plan the NUFFT object with the geometry provided.
- Parameters
om (numpy.float array, matrix size = M * ndims) – The M off-grid locates in the frequency domain, which is normalized between [-pi, pi]
Nd (tuple, ndims integer elements.) –
The matrix size of the equispaced image. Example: Nd=(256,256) for a 2D image;
Nd = (128,128,128) for a 3D image
Kd (tuple, ndims integer elements.) –
The matrix size of the oversampled frequency grid. Example: Kd=(512,512) for 2D image;
Kd = (256,256,256) for a 3D image
Jd (tuple, ndims integer elements.) –
The interpolator size. Example: Jd=(6,6) for 2D image;
Jd = (6,6,6) for a 3D image
ft_axes (None, or tuple with optional integer elements.) – (Optional) The axes for Fourier transform. The default is all axes if ‘None’ is given.
batch – (Optional) Batch mode. If the batch is provided, the last appended axis is the number of identical NUFFT to be transformed. The default is ‘None’.
- Returns
0
- Return type
int, float
- Variables
Nd – initial value: Nd
Kd – initial value: Kd
Jd – initial value: Jd
ft_axes – initial value: None
- Example
>>> from pynufft import NUFFT >>> NufftObj = NUFFT() >>> NufftObj.plan(om, Nd, Kd, Jd)
or
>>> NufftObj.plan(om, Nd, Kd, Jd, ft_axes)
- class pynufft.nufft.__init__.NUFFT_tf_eager
A tentative torch interface
- Jd
initial value: ()
- __init__()
- __weakref__
list of weak references to the object (if defined)
- debug
initial value: 0
- plan(om, Nd, Kd, Jd)
Plan the NUFFT object with the geometry provided.
- Parameters
om (numpy.float array, matrix size = M * ndims) – The M off-grid locates in the frequency domain, which is normalized between [-pi, pi]
Nd (tuple, ndims integer elements.) –
The matrix size of the equispaced image. Example: Nd=(256,256) for a 2D image;
Nd = (128,128,128) for a 3D image
Kd (tuple, ndims integer elements.) –
The matrix size of the oversampled frequency grid. Example: Kd=(512,512) for 2D image;
Kd = (256,256,256) for a 3D image
Jd (tuple, ndims integer elements.) –
The interpolator size. Example: Jd=(6,6) for 2D image;
Jd = (6,6,6) for a 3D image
ft_axes (None, or tuple with optional integer elements.) – (Optional) The axes for Fourier transform. The default is all axes if ‘None’ is given.
batch – (Optional) Batch mode. If the batch is provided, the last appended axis is the number of identical NUFFT to be transformed. The default is ‘None’.
- Returns
0
- Return type
int, float
- Variables
Nd – initial value: Nd
Kd – initial value: Kd
Jd – initial value: Jd
ft_axes – initial value: None
- Example
>>> from pynufft import NUFFT >>> NufftObj = NUFFT() >>> NufftObj.plan(om, Nd, Kd, Jd)
or
>>> NufftObj.plan(om, Nd, Kd, Jd, ft_axes)
- class pynufft.nufft.__init__.NUFFT_torch
A tentative torch interface
- Jd
initial value: ()
- __init__()
- __weakref__
list of weak references to the object (if defined)
- debug
initial value: 0
- plan(om, Nd, Kd, Jd)
Plan the NUFFT object with the geometry provided.
- Parameters
om (numpy.float array, matrix size = M * ndims) – The M off-grid locates in the frequency domain, which is normalized between [-pi, pi]
Nd (tuple, ndims integer elements.) –
The matrix size of the equispaced image. Example: Nd=(256,256) for a 2D image;
Nd = (128,128,128) for a 3D image
Kd (tuple, ndims integer elements.) –
The matrix size of the oversampled frequency grid. Example: Kd=(512,512) for 2D image;
Kd = (256,256,256) for a 3D image
Jd (tuple, ndims integer elements.) –
The interpolator size. Example: Jd=(6,6) for 2D image;
Jd = (6,6,6) for a 3D image
ft_axes (None, or tuple with optional integer elements.) – (Optional) The axes for Fourier transform. The default is all axes if ‘None’ is given.
batch – (Optional) Batch mode. If the batch is provided, the last appended axis is the number of identical NUFFT to be transformed. The default is ‘None’.
- Returns
0
- Return type
int, float
- Variables
Nd – initial value: Nd
Kd – initial value: Kd
Jd – initial value: Jd
ft_axes – initial value: None
- Example
>>> from pynufft import NUFFT >>> NufftObj = NUFFT() >>> NufftObj.plan(om, Nd, Kd, Jd)
or
>>> NufftObj.plan(om, Nd, Kd, Jd, ft_axes)
- sparse_mx_to_torch_sparse_tensor(sparse_mx)
Convert a scipy sparse matrix to a torch sparse tensor.
- pynufft.nufft.__init__.push_cuda_context(hsa_method)
Decorator: Push cude context to the top of the stack for current use Add @push_cuda_context before the methods of NUFFT_device()
CPU solvers
- pynufft.linalg.solve_cpu.L1TVOLS(nufft, y, maxiter, rho)
L1-total variation regularized ordinary least square
- pynufft.linalg.solve_cpu._create_kspace_sampling_density(nufft)
Compute k-space sampling density
- pynufft.linalg.solve_cpu._pipe_density(nufft, maxiter)
Create the density function by iterative solution Generate pHp matrix
- pynufft.linalg.solve_cpu.cDiff(x, d_indx)
Compute image gradient, which needs the results of indxmap_diff(Nd) :param x: The image array :param d_indx: The index of the shifted image :type x: numpy.float array, matrix size = Nd :type d_indx: int32 :returns: x_diff: Image gradient determined by d_indx :rtype: x_diff: numpy.complex64
- pynufft.linalg.solve_cpu.solve(nufft, y, solver=None, *args, **kwargs)
Solve NUFFT. The current version supports solvers = ‘cg’ or ‘L1TVOLS’ or ‘L1TVLAD’.
- Parameters
nufft – NUFFT_cpu object
y – (M,) array, non-uniform data
- Returns
x: image
HSA solvers
- pynufft.linalg.solve_hsa.L1TVLAD(nufft, gy, maxiter, rho)
(testing) L1-total variation regularized least absolute deviation
- pynufft.linalg.solve_hsa.L1TVOLS(nufft, gy, maxiter, rho)
L1-total variation regularized ordinary least square
- pynufft.linalg.solve_hsa._create_kspace_sampling_density(nufft)
(stable) Compute k-space sampling density from the nufft object
- pynufft.linalg.solve_hsa._pipe_density(nufft, maxiter)
Private: create the density function in the data space by a iterative solution Pipe et al. 1999
- pynufft.linalg.solve_hsa.cDiff(x, d_indx)
(stable) Compute image gradient Work with indxmap_diff(Nd). …
- pynufft.linalg.solve_hsa.solve(nufft, gy, solver=None, maxiter=30, *args, **kwargs)
The solve function of NUFFT_hsa. The current version supports solvers = ‘cg’ or ‘L1TVOLS’.
- Parameters
nufft – NUFFT_hsa object
y (numpy.complex64 reikna array) – (M,) array, non-uniform data. If batch is provided, ‘cg’ and ‘L1TVOLS’ returns different image shape.
- Returns
x: Nd image. L1TVOLS always returns Nd. ‘cg’ returns Nd.
- Return type
x: reikna array, complex64.
NUDFT class
- class pynufft.linalg.nudft_cpu.NUDFT
The non-uniform DFT operator
- __init__()
Constructor.
- Parameters
None (Python NoneType) –
- Returns
NUFFT: the pynufft.NUDFT instance
- Return type
NUFFT: the pynufft.NUFFT class
- Example
>>> from pynufft import NUDFT >>> NufftObj = NUDFT()
- __weakref__
list of weak references to the object (if defined)
- debug
initial value: 0
Helper functions
- class pynufft.src._helper.helper.ELL(elldata, ellcol)
ELL is slow on a single core CPU
- __init__(elldata, ellcol)
- __weakref__
list of weak references to the object (if defined)
- pynufft.src._helper.helper.OMEGA_k(J, K, omd, Kd, dimid, dd, ft_flag)
Compute the index of k-space k_indx
- pynufft.src._helper.helper.QR_process(om, N, J, K, sn)
1D QR method for generating min-max interpolator
- Parameters
om (numpy.float32) – non-Cartesian coordinate. shape = (M, dims)
N (int) – length
J (int) – size of interpolator
K (int) – size of oversampled grid
sn (numpy.float32 shape = (N,)) – scaling factor as a length-N vector
- class pynufft.src._helper.helper.Tensor_sn(snd, radix)
Not implemented:
- __init__(snd, radix)
- __weakref__
list of weak references to the object (if defined)
- pynufft.src._helper.helper.block_outer_prod(x1, x2)
Multiply x1 (J1 x M) and x2 (J2xM) and extend the dimensions to 3D (J1xJ2xM)
- pynufft.src._helper.helper.block_outer_sum(x1, x2)
Update the new index after adding a new axis
- pynufft.src._helper.helper.block_outer_sum0(x1, x2)
Multiply x1 (J1 x M) and x2 (J2xM) and extend the dimensions to 3D (J1xJ2xM)
- pynufft.src._helper.helper.cat_snd(snd)
- Parameters
snd (tuple) – tuple of input 1D vectors
- Returns
tensor_sn: vector of concatenated scaling factor, shape = (numpy.sum(Nd), )
- Return type
tensor_sn: numpy.float32
- pynufft.src._helper.helper.create_laplacian_kernel(nufft)
Create the multi-dimensional laplacian kernel in k-space
- Parameters
nufft – the NUFFT object
- Returns
uker: the multi-dimensional laplacian kernel in k-space (no fft shift used)
- Return type
numpy ndarray
- pynufft.src._helper.helper.create_partialELL(ud, kd, Jd, M)
- Parameters
ud (tuple of numpy.complex64 arrays) – tuple of all 1D interpolators
kd (tuple of numpy.int32 arrays) – tuple of all 1D indices
Jd (tuple of int32) – tuple of interpolation sizes
M (int) – number of samples
- Returns
partialELL:
- Return type
partialELL: pELL instance
- pynufft.src._helper.helper.crop_slice_ind(Nd)
(Deprecated in v.0.3.4) Return the “slice” of Nd size to index multi-dimensional array. “Slice” functions as the index of the array. This function is superseded by preindex_copy(), which avoid run-time indexing.
- pynufft.src._helper.helper.device_list()
device_list() returns available devices for acceleration as a tuple. If no device is available, it returns an empty tuple.
- pynufft.src._helper.helper.diagnose(verbosity=0)
Diagnosis function Find available devices when NUFFT.offload() fails.
- pynufft.src._helper.helper.get_sn(J, K, N)
Compute the 1D scaling factor for the given J, K, N
- Parameters
J (int) – size of interpolator
K (int) – size of oversampled grid
N (int) – length
- Returns
sn: scaling factor as a length-N vector
- Rtype sn
numpy.float32 shape = (N,)
- pynufft.src._helper.helper.indxmap_diff(Nd)
Preindixing for rapid image gradient. Diff(x) = x.flat[d_indx[0]] - x.flat Diff_t(x) = x.flat[dt_indx[0]] - x.flat
- Parameters
Nd (tuple with integers) – the dimension of the image
- Returns
d_indx: image gradient
- Returns
dt_indx: the transpose of the image gradient
- Return type
d_indx: lists with numpy ndarray
- Return type
dt_indx: lists with numpy ndarray
- pynufft.src._helper.helper.kaiser_bessel_ft(u, J, alpha, kb_m, d)
Interpolation weight for given J/alpha/kb-m
- pynufft.src._helper.helper.kronecker_scale(snd)
Compute the Kronecker product of the scaling factor.
- Parameters
snd (tuple of 1D numpy.array) – Tuple of 1D scaling factors
dd – Number of dimensions
- Returns
sn: The multi-dimensional Kronecker of the scaling factors
- Return type
Nd array
- pynufft.src._helper.helper.nufft_T(N, J, K, alpha, beta)
Equation (29) and (26) in Fessler and Sutton 2003. Create the overlapping matrix CSSC (diagonal dominant matrix) of J points, then find the pseudo-inverse of CSSC
- pynufft.src._helper.helper.nufft_alpha_kb_fit(N, J, K)
Find parameters alpha and beta for scaling factor st[‘sn’] The alpha is hardwired as [1,0,0…] when J = 1 (uniform scaling factor)
- Parameters
N (int) – size of image
J (int) – size of interpolator
K (int) – size of oversampled k-space
- Returns
alphas:
- Returns
beta:
- Return type
alphas: list of float
- Return type
beta:
- pynufft.src._helper.helper.nufft_offset(om, J, K)
For every om point (outside regular grids), find the nearest central grid (from Kd dimension)
- pynufft.src._helper.helper.nufft_r(om, N, J, K, alpha, beta)
Equation (30) of Fessler & Sutton’s paper
- pynufft.src._helper.helper.nufft_scale1(N, K, alpha, beta, Nmid)
Calculate image space scaling factor
- pynufft.src._helper.helper.outer_sum(xx, yy)
Superseded by numpy.add.outer() function
- class pynufft.src._helper.helper.pELL(M, Jd, curr_sumJd, meshindex, kindx, udata)
class pELL: partial ELL format
- __init__(M, Jd, curr_sumJd, meshindex, kindx, udata)
Constructor
- Parameters
M (int) – Number of samples
Jd (tuple of int) – Interpolator size
curr_sumJd (tuple of int) – Summation of Jd[0:d-1], for fast shift computing
meshindex (numpy.uint32, shape = (numpy.prod(Jd), dd)) – The tensor indices to all interpolation points
kindx (numpy.uint32, shape = (M, numpy.sum(Jd))) – Premixed k-indices to be combined
udata (numpy.complex64, shape = (M, numpy.sum(Jd))) – Premixed interpolation data values
- Returns
pELL: partial ELLpack class with the given values
- Return type
pELL: partial ELLpack class
- __weakref__
list of weak references to the object (if defined)
- pynufft.src._helper.helper.plan(om, Nd, Kd, Jd, ft_axes=None, format='CSR', radix=None)
Plan for the NUFFT object.
- Parameters
om (numpy.float) – Coordinate
Nd (tuple of int) – Image shape
Kd (tuple of int) – Oversampled grid shape
Jd (tuple of int) – Interpolator size
ft_axes (tuple of int) – Axes where FFT takes place
format (string, 'CSR' or 'pELL') – Output format of the interpolator. ‘CSR’: the precomputed Compressed Sparse Row (CSR) matrix. ‘pELL’: partial ELLPACK which precomputes the concatenated 1D interpolators.
- Return st
dictionary for NUFFT
- pynufft.src._helper.helper.preindex_copy(Nd, Kd)
Building the array index for copying two arrays of sizes Nd and Kd. Only the front parts of the input/output arrays are copied. The oversize parts of the input array are truncated (if Nd > Kd), and the smaller size are zero-padded (if Nd < Kd)
- Parameters
Nd (tuple with integer elements) – tuple, the dimensions of array1
Kd (tuple with integer elements) – tuple, the dimensions of array2
- Returns
inlist: the index of the input array
- Returns
outlist: the index of the output array
- Returns
nelem: the length of the inlist and outlist (equal length)
- Return type
inlist: list with integer elements
- Return type
outlist: list with integer elements
- Return type
nelem: int
- pynufft.src._helper.helper.rdx_kron(ud, kd, Jd, radix=None)
Radix-n Kronecker product of multi-dimensional array
- Parameters
ud (tuple of (M, Jd[d]) numpy.complex64 arrays) – 1D interpolators
kd (tuple of (M, Jd[d]) numpy.uint arrays) – 1D indices to interpolators
Jd (tuple of int) – 1D interpolator sizes
radix (int) – radix of Kronecker product
kk (tuple of (M, Jd[d]) numpy.uint arrays) – 1D indices to interpolators
JJ (tuple of int) – 1D interpolator sizes
- Returns
uu: 1D interpolators
- pynufft.src._helper.helper.strides_divide_itemsize(Nd)
strides_divide_itemsize function computes the step_size (strides/itemsize) along different axes, and its inverse as float32. For fast GPU computing, preindexing allows for fast Hadamard product and copy. However preindexing costs some memory. strides_divide_itemsize aims to replace preindexing by run-time calculation of the index, given the invNd_elements.
- Parameters
Nd (tuple of int) – Input shape
- Returns
Nd_elements: strides/itemsize of the Nd.
- Returns
invNd_elements: (float32)(1/Nd_elements). Division on GPU is slow but multiply is fast. Thus we can precompute the inverse and then multiply the inverse on GPU.
- Return type
Nd_elements: tuple of int
- Return type
invNd_elements: tuple of float32
See also
pynufft.NUFFT_hsa
Metaprogramming subroutines (using reikna, pyopencl, pycuda)
- pynufft.src.re_subroutine.atomic_add(API)
Return the atomic_add for the given API. Overcome the missing atomic_add_float for OpenCL-1.2. Note: will be checked if OpenCL 2.0 provided by all GPU vendors.
- pynufft.src.re_subroutine.cAddScalar()
Return the kernel source for cAddScalar.
- pynufft.src.re_subroutine.cAddVec()
Return the kernel source for cAddVec.
- pynufft.src.re_subroutine.cAnisoShrink()
Return the kernel source of cAnisoShrink
- pynufft.src.re_subroutine.cCopy()
Return the kernel source for cCopy
- pynufft.src.re_subroutine.cDiff()
Return the kernel source of cDiff.
- pynufft.src.re_subroutine.cHadamard()
Return the Hadamard operations related kernel sources.
- pynufft.src.re_subroutine.cHypot()
Return the kernel code for hypot, which computes the sqrt(x*x + y*y) without intermediate overflow.
- pynufft.src.re_subroutine.cMultiplyConjVec()
Return the kernel source of cMultiplyConjVec.
- pynufft.src.re_subroutine.cMultiplyConjVecInplace()
Return the kernel source of cMultiplyConjVecInplace
- pynufft.src.re_subroutine.cMultiplyRealInplace()
Return the kernel source of cMultiplyRealInplace.
- pynufft.src.re_subroutine.cMultiplyScalar()
Return the kernel source for cMultiplyScalar.
- pynufft.src.re_subroutine.cMultiplyVec()
Return the kernel source of cMultiplyVec
- pynufft.src.re_subroutine.cMultiplyVecInplace()
Return the kernel source of cMultiplyVecInplace.
- pynufft.src.re_subroutine.cRealShrink()
Return the kernel source of xAnisoShrink
- pynufft.src.re_subroutine.cSelect()
Return the kernel source of cSelect.
- pynufft.src.re_subroutine.cSpmv()
Return the kernel sources for cSpmv related operations, providing cCSR_spmv_vector and cpELL_spmv_mCoil.
- pynufft.src.re_subroutine.cSpmvh()
Return the cSpmvh related kernel source. Only pELL_spmvh_mCoil is provided for Spmvh. NUFFT_hsa_legacy reuses the cCSR_spmv() function, which doubles the storage.
- pynufft.src.re_subroutine.cSqrt()
Return the kernel source of cSqrt.
- pynufft.src.re_subroutine.cTensorCopy()
Return the kernel source for cTensorCopy.
- pynufft.src.re_subroutine.cTensorMultiply()
Return the kernel source for cTensorMultiply
- pynufft.src.re_subroutine.create_kernel_sets(API)
Create the kernel from the kernel sets. Note that in some tests (Benoit’s and my tests) CUDA shows some degraded accuracy. This loss of accuracy was due to undefined shared memory behavior, which I don’t fully understand. This has been fixed in 2019.2.0 as the operations are moved to global memory.
Version history
v2022.2.4rc1
Explain the scaling factor for NUFFT.adjoint in the document.
v2022.2.3
Same as v2022.2.3rc2
v2022.2.3-rc2
Add experimental NUFFT_cupy for cupy backend.
Change scipy.linalg.pinv2 (deprecate) to pinv. Track for possible errors.
v2022.2.3-rc1
Add experimental NUFFT_torch, NUFFT_tf
v2022.2.2
Maintenance release
v2022.2.1
Maintenance release
v2022.1.1
some typos fixed.
move to Python 3.10
update installation for MSVC 2022 and Win 10
v2021.2.0
typo fixed: ocl_add() imaglptr -> imagptr
v2021.1.0
removed twosum algorithm.
v2020.2.1
tested with an AMD card.
Deprecate batch as this is not the most essential function in NUFFT.
Deprecate NUFFT_cpu, NUFFT_hsa
Update some docs.
v2020.1.2
new_index: change the khatri_rao_k(), OMEGA_k() in helper.py; col += kindx[index];// + 1
Add code of conduct of Contributor Covenant v2.0
Update some documentations
v2020.1.0
add batch mode to nudft_cpu
v2020.0.0
fix batch=1. This can cause error in Riekna fft.
v2019.2.3
-(experimental) Add the unified NUFFT() class. Now CPU, GPU, and legacy GPU mode are encapsuled in a single class.
-Tested using Intel Neo OpenCL driver (see https://github.com/intel/compute-runtime) and IntelPython3.
-The old NUFFT_cpu() and NUFFT_legacy() will be kept in the system for compatibility.
v2019.2.1-2019.2.2
-Remove lsmr as scipy 1.13 has caused unknown error.
v2019.2.0
Bump
v2019.1.2
BUGFIX: fix the loss of accuracy in cSpmvh(). Replace the group/local by global memory (the group/local sizes have caused the unknown run-time behaviour on cuda)
v2019.1.1
Refactor the re_subroutine.py
Adopt tensor form
v0.4.0.0
0.4.0.0 is a beta version.
Major updates for the NUFFT_hsa class, including memory reduction and split-radix. Multiple NUFFT_hsa() using cuda backend becomes possible, by pushing the context to the top of the stack when a method is called.
Tested in Windows 10 with PyCUDA 2018.1.1, nvidia-driver 417.35, CUDA 9.2, Visual Studio 2015 Community, and Anaconda Python 3.7 64-bit. PyOpenCL in Windows is yet to be tested.
Add batch mode.
v0.3.3.12
0.3.3.12 is a bug-fixed version.
Removal of the keyword async for compatibility reasons because Reikna has changed the keyword to async_.
v0.3.3.8
Bugfix: mm = numpy.tile(mm, [numpy.prod(Jd).astype(int), 1]) to fix the wrong type when numpy.prod(Jd) is not cast as int
Bugfix: fix rcond=None error in Anaconda 3.6.5 and Numpy 1.13.1 (the recommended None in Numpy 1.14 is backward incompatible with 1.13)
Bugfix: indx1 = indx.copy() is replaced by indx1 = list(indx) for Python2 compatibility
v0.3.3.7
Bugfix in 0.3.3.7 Toeplitz is removed from the NUFFT_cpu and NUFFT_gpu to avoid the MemoryError.
v0.3.3.6
Bugfix: correct the error of import. Now import NUFFT_cpu, NUFFT_hsa at the top level.
v0.3.3
Note: GPU support is superseded by Heterogeneous System Architecture (HSA).
A variety of nonlinear solvers are provided, including the conjugate gradient method (cg), L1 total-variation ordinary least square (L1TVOLS), and L1 total-variation least absolute deviation (L1TVLAD).
The CPU version support other nonlinear solvers, lsqr, gmr, cg, bicgstab, bicg, cgs, gmres, lgmres , apart from cg, L1TVOLS and L1TVLAD.
Support multi-dimensional transform and reconstruction (experimentally).
v0.3.2.9
Experimental support of NVIDIA’s graphic processing unit (GPU).
The experimental class gpuNUFFT requires pycuda, scikit-cuda, and python-cuda-cffi. scikit-cuda can be installed from standard command.
v0.3.2.8
Tested under Linux and Windows Anaconda3
v0.3
Updated setup.py
Removal of pyfftw due to segfault under some Linux distributions
Acknowledgements
I would be more than grateful for what contributors have done (either contributing codes or fixing bugs in pynufft). However, The information of contributors and partners are kept anonymized without their prior express informed consent. If anyone would like to be identified as a contributors/partners, please contact pynufft@gmail.com.
Financial supports
Cambridge Commonwealth, European and International Trust (Cambridge, UK)
Ministry of Education (Taiwan)
NVIDIA Corp for donating Titan X Pascal.
The * project (The *, * and *)
The * project
Contributors and partners
R******
T******
H******
J******
W*****
I******
A*****
A*****
B*****
D******
M*****
C******
C*****
J*****
J******