Diffractionless Tomography

Classical tomography is based on the assumption that the illuminating light travels in straight lines through the object. While this assumption is only valid for waves of very short wavelength, such as x-rays, it has also been used extensively as an approximation for light in the visible spectrum.

Radon Transform

The Radon transform describes the relationship between a D-dimensional function and its projection onto a D-1 dimensional hyperplane. In practice, we are often interested in the cases D = 2 or D = 3, which correspond to the projection of a two-dimensional (2D) surface onto a line or of a three-dimensional (3D) volume onto a plane, respectively. Generally, it is sufficient to consider the problem in 2D, since the 3D case can be solved by decomposing it into a succession of 2D slices.

Consider the tomographic problem illustrated in the following figure.


Left: Schematic representation of the tomographic scenario. Right: Sinogram of the Shepp-Logan phantom

Let \Omega \subset \mathbb{R}^2 be a bounded, 2D domain which contains an object characterized by a function f(\boldsymbol{x}) with \boldsymbol{x} = (x, y). We assume that f varies within \Omega, but is zero outside of it. We probe the object by illuminating it with parallel, infinitely thin beams of light that travel in the direction making an angle \theta \in [0, \pi] with respect to the y-axis. We collect measurements along the t-axis, orthogonal to the direction of light propagation. Our objective is to reconstruct the object f given the measurements p_\theta(t) for \theta \in [0, \pi] and t \in \mathbb{R}. This measurement problem can be described mathematically as

p_\theta(t) = \mathcal{R}\{f\}(t, \theta) =\int_{\mathbb{R}^2} f(\boldsymbol{x}) \, \delta (\langle \boldsymbol{x}, \boldsymbol{\theta} \rangle - t ) \mathrm{d} \boldsymbol{x},

where \delta is the Dirac distribution and \boldsymbol{\theta} = (\cos \theta, \sin \theta) is a unit vector. We can rewrite the equation by using the following variable change

\begin{pmatrix}x\\y\end{pmatrix} =\begin{bmatrix}\cos\theta&-\sin\theta\\\sin\theta&\cos\theta\end{bmatrix}\begin{pmatrix}t\\s\end{pmatrix}


which yields

p_\theta(t) =\int_\mathbb{R} f(t \cos \theta - s \sin \theta, t \sin \theta + s \cos \theta) \mathrm{d} s.

The operator \mathcal{R}: L_2(\mathbb{R}^2) \rightarrow L_2(\mathbb{R} \times [0, \pi]) is called the Radon transform after Johann Radon (1887-1956), who introduced it in 1917. The data \{p_\theta(t)\}_{\theta \in [0, \pi], t \in \mathbb{R}} generated by the transform is often called the sinogram, since it maps a point into a sinusoid. To see this, consider a point \boldsymbol{x}_0 = (x_0, y_0), whose projection onto the t-axis will be t_0(\theta) = x_0 \cos \theta + y_0 \sin \theta. Representing the point in polar coordinates (r, \phi), with x_0 = r \cos \phi and y_0 = r \sin \phi, we get t_0(\theta) = r \cos (\theta-\phi). Hence, the trajectory of the point \boldsymbol{x}_0 in the sinogram is a cosine of amplitude r and phase shift \phi.
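The sinusoidal trajectory is easy to check numerically. A minimal sketch (the point and angle sampling are chosen arbitrarily for illustration):

```python
import numpy as np

# Trajectory of a point (x0, y0) in the sinogram: its projection onto the
# t-axis as a function of the illumination angle theta.
x0, y0 = 0.3, 0.4
thetas = np.linspace(0.0, np.pi, 181)

# Direct projection: t0(theta) = x0 cos(theta) + y0 sin(theta)
t_direct = x0 * np.cos(thetas) + y0 * np.sin(thetas)

# Polar form: with x0 = r cos(phi), y0 = r sin(phi),
# the same trajectory is t0(theta) = r cos(theta - phi)
r = np.hypot(x0, y0)
phi = np.arctan2(y0, x0)
t_polar = r * np.cos(thetas - phi)
```

The two expressions agree for every angle, confirming that the point traces a cosine of amplitude r = 0.5 in the sinogram.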

Fourier Slice Theorem

The Fourier slice theorem is one of the most important results of classical tomography: it establishes the connection between the data p_\theta(t) and the function f(\boldsymbol{x}) in Fourier space.

Theorem: The 1D Fourier transform of the projected data \hat{p}_\theta(\omega) = \mathcal{F}_{\text{1D}}\{p_\theta(t)\} corresponds to a slice of the 2D Fourier transform of the function \hat{f}(\boldsymbol{\omega}) = \mathcal{F}_{\text{2D}}\{f(\boldsymbol{x})\}, with \boldsymbol{\omega} = (\omega_x, \omega_y), taken along the angle \theta through the origin, or more concisely

\hat{p}_\theta(\omega) = \hat{f}(\omega \cos \theta, \omega \sin \theta).

To establish this result, we consider the Fourier transform

\hat{p}_\theta(\omega) &= \int_\mathbb{R} p_\theta(t) \mathrm{e}^{-\mathrm{j} \omega t} \mathrm{d} t \\
&= \int_\mathbb{R} \left [\int_{\mathbb{R}^2} f(\boldsymbol{x}) \delta(\langle \boldsymbol{x}, \boldsymbol{\theta} \rangle - t) \mathrm{d} \boldsymbol{x} \right] \mathrm{e}^{-\mathrm{j} \omega t} \mathrm{d} t \\
&= \int_{\mathbb{R}^2} f(\boldsymbol{x}) \mathrm{e}^{- \mathrm{j} \omega \langle \boldsymbol{x}, \boldsymbol{\theta} \rangle } \mathrm{d} \boldsymbol{x} \\
&= \hat{f}(\omega \cos \theta, \omega \sin \theta),

where in the second equality we used the definition of the Radon transform.

The remarkable aspect of the Fourier Slice Theorem is that it allows one to invert the Radon transform by (a) collecting measurements for all \theta \in [0, \pi], (b) evaluating the 1D Fourier transform of these measurements and using them to fill the 2D Fourier space, and (c) evaluating the inverse 2D Fourier transform to form the image.
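The theorem can be verified numerically for the angle \theta = 0, where the beams travel along y, the projection integrates f over y, and the corresponding slice of the 2D spectrum lies along the \omega_x axis. A discrete sketch (grid size and the Gaussian test object are our choices):

```python
import numpy as np

# Numerical check of the Fourier slice theorem at theta = 0.
n = 64
x = np.linspace(-3, 3, n)
X, Y = np.meshgrid(x, x)        # X varies along axis 1, Y along axis 0
f = np.exp(-(X**2 + Y**2))      # simple Gaussian test object

# Projection at theta = 0: integrate over y (axis 0), t runs along x
p0 = f.sum(axis=0)

# Slice of the 2D DFT at omega_y = 0 vs. 1D DFT of the projection
slice_from_2d = np.fft.fft2(f)[0, :]
fft_of_proj = np.fft.fft(p0)
```

Up to floating-point error, the two spectra coincide, which is exactly the statement \hat{p}_0(\omega) = \hat{f}(\omega, 0) in discrete form.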


We now define the backprojection operator \mathcal{R}^\ast: L_2(\mathbb{R} \times [0, \pi]) \rightarrow L_2(\mathbb{R}^2)

b(\boldsymbol{x}) = \mathcal{R}^\ast\{p_\theta(t)\}(\boldsymbol{x}) = \int_0^\pi p_\theta (x \cos \theta + y \sin \theta) \mathrm{d} \theta

which is in fact the adjoint of the Radon transform, i.e., for all p \in L_2(\mathbb{R} \times [0, \pi]) and for all f \in L_2(\mathbb{R}^2), we have that

\langle p, \mathcal{R} f\rangle_{L_2(\mathbb{R} \times [0, \pi])} = \langle \mathcal{R}^\ast p, f\rangle_{L_2(\mathbb{R}^2)}.

Intuitively, for each point \boldsymbol{x}, backprojection collects all the projections that passed through that point.
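A discrete backprojection can be sketched by summing, for each pixel, the interpolated projection values of the rays through it. The function name and interface below are illustrative, not from any library:

```python
import numpy as np

def backproject(sinogram, thetas, t_axis, grid):
    """Unfiltered backprojection sketch.

    sinogram[i] samples p_theta(t) on t_axis for angle thetas[i];
    grid is a pair (X, Y) of pixel coordinate arrays.
    """
    X, Y = grid
    b = np.zeros_like(X, dtype=float)
    dtheta = np.pi / len(thetas)            # uniform angular spacing on [0, pi)
    for p, th in zip(sinogram, thetas):
        # projection coordinate t = <x, theta> of every pixel
        t = X * np.cos(th) + Y * np.sin(th)
        b += np.interp(t, t_axis, p, left=0.0, right=0.0) * dtheta
    return b
```

As a sanity check, backprojecting a sinogram that equals 1 everywhere gives \int_0^\pi 1 \, \mathrm{d}\theta = \pi at every pixel inside the measured range.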

Filtered Backprojection

Consider the following inverse Fourier transform

f(\boldsymbol{x}) &= \frac{1}{(2\pi)^2} \int_{\mathbb{R}^2} \hat{f}(\boldsymbol{\omega}) \mathrm{e}^{\mathrm{j} \langle \boldsymbol{x}, \boldsymbol{\omega} \rangle} \mathrm{d} \boldsymbol{\omega} \\
&= \frac{1}{(2\pi)^2} \int_0^\pi \int_\mathbb{R} \hat{f}(\omega \cos \theta, \omega \sin \theta) \mathrm{e}^{\mathrm{j} \omega \langle \boldsymbol{x}, \boldsymbol{\theta} \rangle} |\omega| \mathrm{d} \omega \mathrm{d} \theta\\
&= \frac{1}{(2\pi)^2} \int_0^\pi \int_\mathbb{R} |\omega| \hat{p}_\theta(\omega) \mathrm{e}^{\mathrm{j} \omega \langle \boldsymbol{x}, \boldsymbol{\theta} \rangle} \mathrm{d} \omega \mathrm{d} \theta \\
&= \int_0^\pi \left[\frac{1}{2\pi}\int_\mathbb{R} \frac{|\omega|}{2\pi} \, \hat{p}_\theta(\omega) \mathrm{e}^{\mathrm{j} \omega \langle \boldsymbol{x}, \boldsymbol{\theta} \rangle} \mathrm{d} \omega\right] \mathrm{d} \theta \\
&= \int_0^\pi q_\theta(x \cos \theta + y \sin \theta) \mathrm{d} \theta,

where we performed the variable change \omega_x = \omega \cos \theta and \omega_y = \omega \sin \theta, with \omega \in \mathbb{R} and \theta \in [0, \pi], whose Jacobian contributes the factor |\omega|. This implies that it is possible to reconstruct the function f by 1D filtering of the Radon data with

h(t) = \mathcal{F}^{-1}\left\{\frac{|\omega|}{2\pi}\right\}(t)

and then applying the backprojection operator \mathcal{R}^\ast to the result q_\theta(t) = (h \ast p_\theta)(t).
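The ramp filter is conveniently applied in the frequency domain. A minimal FFT-based sketch (no apodization window, which practical implementations usually add; the function name is ours):

```python
import numpy as np

def ramp_filter(p, dt):
    """Filter one projection p_theta(t), sampled with spacing dt,
    with the ramp filter whose frequency response is |omega| / (2*pi)."""
    n = len(p)
    omega = 2 * np.pi * np.fft.fftfreq(n, d=dt)      # angular frequencies
    q = np.fft.ifft(np.fft.fft(p) * np.abs(omega) / (2 * np.pi))
    return np.real(q)                                # input is real
```

Two quick checks follow from the response |\omega|/(2\pi): a constant projection is mapped to zero (the DC component is annihilated), and a pure cosine is returned scaled by its own frequency over 2\pi.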

A simple ISTA ↔ FISTA switch

Today, let us revisit the topic of enforcing sparsity and see an easy trick to accelerate the convergence speed of the algorithm (demo).

The basic algorithm that we discussed last time is the iterative shrinkage/thresholding algorithm (ISTA) that can be specified as follows

\mathbf{f}^t \leftarrow \eta(\mathbf{f}^{t-1} - \gamma \mathbf{H}^\mathrm{T}(\mathbf{H}\mathbf{f}^{t-1}-\mathbf{y}), \gamma \tau),

where t = 1, 2, 3, \dots is the iteration number, \mathbf{y} is the measurement vector, and \mathbf{H} is the measurement matrix that models the acquisition system. The step size \gamma > 0 can always be set to the inverse of the largest eigenvalue of \mathbf{H}^\mathrm{T}\mathbf{H} to ensure convergence (i.e., \gamma = 1/L with L = \lambda_{\text{max}}(\mathbf{H}^\mathrm{T}\mathbf{H})). The regularization parameter \tau > 0 controls the sparsity of the final solution (larger \tau leads to a sparser solution), and \eta is a scalar thresholding function applied in a component-wise fashion. One of the most popular thresholding functions is the soft-thresholding function defined as

\eta(x, \tau) \triangleq \mathrm{sgn}(x)(|x|-\tau)_{+}

where (\cdot)_+ returns the positive part of its argument, and \mathrm{sgn}(\cdot) is the signum function that returns +1 if its argument is positive and -1 if it is negative.
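In code, the component-wise soft-thresholding is a one-liner (NumPy sketch; the function name is ours):

```python
import numpy as np

def soft_threshold(x, tau):
    """Soft-thresholding eta(x, tau) = sgn(x) * max(|x| - tau, 0),
    applied component-wise to an array x."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)
```

For example, with \tau = 1 the values (3, -0.5, -2) are mapped to (2, 0, -1): entries smaller than the threshold in magnitude are zeroed out, and the rest are shrunk toward zero by \tau.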

ISTA is a very well understood method, and it is well known that its rate of convergence corresponds to that of the gradient-descent method, which is O(1/t).

Let us now consider the following simple iteration

\mathbf{f}^t \leftarrow \eta(\mathbf{s}^{t-1} - \gamma \mathbf{H}^\mathrm{T}(\mathbf{H}\mathbf{s}^{t-1}-\mathbf{y}), \gamma \tau)

\mathbf{s}^t \leftarrow \mathbf{f}^{t} + ((q_{t-1}-1)/q_t) (\mathbf{f}^t - \mathbf{f}^{t-1}),

where \mathbf{f}^0 = \mathbf{s}^0 = \mathbf{f}_{\text{init}}. When q_t = 1 for all t = 1, 2, 3, \dots, the iteration corresponds to ISTA; however, an appropriate choice of \{q_t\}_{t = 0, 1, 2, \dots} leads to a faster O(1/t^2) convergence, which is crucial for larger-scale problems where one tries to reduce the number of matrix-vector products with \mathbf{H} and \mathbf{H}^\mathrm{T}. The faster version of ISTA was originally proposed by Beck & Teboulle in 2009 and is widely known as fast ISTA (FISTA).

So, what is that appropriate choice of \{q_t\}_{t = 0, 1, 2, \dots}?

Beck & Teboulle proposed to initialize q_0 = 1 and then to set the rest iteratively as follows

q_t \leftarrow \frac{1}{2}\left(1 +\sqrt{1+4q_{t-1}^2}\right).
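The two iterations above can be sketched in a few lines of NumPy for a dense matrix \mathbf{H}. This is illustrative code with names of our choosing, not the fistaEst routine from the demo; setting the momentum weight to zero recovers plain ISTA:

```python
import numpy as np

def soft(x, tau):
    """Component-wise soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def ista_fista(y, H, tau, num_iter, accelerate=True):
    """ISTA (accelerate=False) or FISTA (accelerate=True) for
    min_f 0.5 * ||H f - y||^2 + tau * ||f||_1 with a dense matrix H."""
    L = np.linalg.eigvalsh(H.T @ H).max()   # Lipschitz constant of the gradient
    gamma = 1.0 / L                          # step size ensuring convergence
    f = s = np.zeros(H.shape[1])
    q_prev = 1.0                             # q_0 = 1
    for _ in range(num_iter):
        # f^t <- eta(s^{t-1} - gamma * H^T (H s^{t-1} - y), gamma * tau)
        f_next = soft(s - gamma * (H.T @ (H @ s - y)), gamma * tau)
        # q_t <- (1 + sqrt(1 + 4 q_{t-1}^2)) / 2
        q = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * q_prev**2))
        # momentum weight (q_{t-1} - 1) / q_t; zero weight gives plain ISTA
        beta = (q_prev - 1.0) / q if accelerate else 0.0
        s = f_next + beta * (f_next - f)
        f, q_prev = f_next, q
    return f
```

As a sanity check, with \mathbf{H} = \mathbf{I} the minimizer is exactly the soft-thresholded measurement vector, and both variants reach it immediately.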

A short note on the Python demo: it was written as an IPython notebook in a fully self-contained way. The first two cells create and save two files, matrixmodel.py and algorithm.py, that are reusable as stand-alone files.

The switch from ISTA to FISTA is done in cell 8:

# Reconstruct with ISTA
[fhatISTA, costISTA] = fistaEst(y, forwardObj, tau, numIter, accelerate=False)

# Reconstruct with FISTA 
[fhatFISTA, costFISTA] = fistaEst(y, forwardObj, tau, numIter, accelerate=True)

The output of the demo is the following figure plotting the results of both algorithms within 100 iterations:

Comparison of ISTA with FISTA