# Diffractionless Tomography

Classical tomography is based on the assumption that the illuminating light travels in straight lines through the object. While this assumption is strictly valid only for waves of very short wavelength, such as x-rays, it has also been used extensively as an approximation for light in the visible spectrum.

The Radon transform describes the relationship between a $D$-dimensional function and its projection onto a $(D-1)$-dimensional hyperplane. In practice, we are often interested in the cases $D = 2$ and $D = 3$, which correspond to the projection of a two-dimensional (2D) surface onto a line or of a three-dimensional (3D) volume onto a plane, respectively. It is generally sufficient to consider the problem in 2D, since the 3D case can be solved by decomposing it into a succession of 2D slices.

Consider the tomographic problem illustrated in the following figure.

Let $\Omega \in \mathbb{R}^2$ be a bounded, 2D domain which contains an object characterized by a function $f(\boldsymbol{x})$ with $\boldsymbol{x} = (x, y)$. We assume that $f$ varies within $\Omega$, but is zero outside of it. We probe the object by illuminating it with parallel, indefinitely thin beams of light that travel in the direction making an angle $\theta \in [0, \pi]$ with respect to the $y$-axis. We collect measurements along the $t$-axis, orthogonal to the direction of light propagation. Our objective is to reconstruct the object $f$ given the measurements $p_\theta(t)$ for $\theta \in [0, \pi]$ and $t \in \mathbb{R}$. This measurement problem can be described mathematically as
$$p_\theta(t) = \int_{\mathbb{R}^2} f(\boldsymbol{x}) \, \delta(\langle \boldsymbol{x}, \boldsymbol{\theta} \rangle - t) \, \mathrm{d} \boldsymbol{x},$$

where $\delta$ is the Dirac distribution and $\boldsymbol{\theta} = (\cos \theta, \sin \theta)$ is a unit vector. We can rewrite the equation by using the following variable change

$$\begin{pmatrix} t \\ s \end{pmatrix} = \begin{pmatrix} \cos \theta & \sin \theta \\ -\sin \theta & \cos \theta \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}$$

as

$$p_\theta(t) = \mathcal{R}\{f\}(t, \theta) = \int_\mathbb{R} f(t \cos \theta - s \sin \theta, \, t \sin \theta + s \cos \theta) \, \mathrm{d} s.$$
The operator $\mathcal{R}: L_2(\mathbb{R}^2) \rightarrow L_2(\mathbb{R} \times [0, \pi])$ is called the Radon transform, after Johann Radon (1887-1956), who introduced it in 1917. The data $\{p_\theta(t)\}_{\theta \in [0, \pi], t \in \mathbb{R}}$ generated by the transform is often called the sinogram, since the transform maps a point into a sinusoid. To see this, consider a point $\boldsymbol{x}_0 = (x_0, y_0)$, whose projection onto the $t$-axis is $t_0(\theta) = x_0 \cos \theta + y_0 \sin \theta$. In polar coordinates $(r, \phi)$, with $x_0 = r \cos \phi$ and $y_0 = r \sin \phi$, this becomes $t_0(\theta) = r \cos (\theta - \phi)$. Hence, the trajectory of the point $\boldsymbol{x}_0$ is a cosine of amplitude $r$ and phase shift $\phi$.
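The sinusoidal trajectory is easy to verify numerically; the following sketch (discretization choices are illustrative) checks that the two parameterizations of $t_0(\theta)$ agree:

```python
import numpy as np

# A point (x0, y0) and its polar representation (r, phi).
x0, y0 = 0.3, 0.4
r, phi = np.hypot(x0, y0), np.arctan2(y0, x0)

# Projection angles in [0, pi].
thetas = np.linspace(0.0, np.pi, 181)

# Trajectory of the point in the sinogram, two equivalent forms.
t_proj = x0 * np.cos(thetas) + y0 * np.sin(thetas)  # <x0, theta>
t_sine = r * np.cos(thetas - phi)                   # r * cos(theta - phi)

print(np.allclose(t_proj, t_sine))  # True
```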

#### Fourier Slice Theorem

The Fourier slice theorem is one of the most important results of classical tomography: it establishes the connection between the data $p_\theta(t)$ and the function $f(\boldsymbol{x})$ in Fourier space.

Theorem: The 1D Fourier transform of the projected data $\hat{p}_\theta(\omega) = \mathcal{F}_{\text{1D}}\{p_\theta(t)\}$ corresponds to a slice of the 2D Fourier transform of the function $\hat{f}(\boldsymbol{\omega}) = \mathcal{F}_{\text{2D}}\{f(\boldsymbol{x})\}$, with $\boldsymbol{\omega} = (\omega_x, \omega_y)$, taken along the angle $\theta$ through the origin, or more concisely
$$\hat{p}_\theta(\omega) = \hat{f}(\omega \cos \theta, \omega \sin \theta).$$

To establish this result, we consider the Fourier transform

\begin{align*}
\hat{p}_\theta(\omega) &= \int_\mathbb{R} p_\theta(t) \mathrm{e}^{-\mathrm{j} \omega t} \mathrm{d} t \\
&= \int_\mathbb{R} \left [\int_{\mathbb{R}^2} f(\boldsymbol{x}) \delta(\langle \boldsymbol{x}, \boldsymbol{\theta} \rangle - t) \mathrm{d} \boldsymbol{x} \right] \mathrm{e}^{-\mathrm{j} \omega t} \mathrm{d} t \\
&= \int_{\mathbb{R}^2} f(\boldsymbol{x}) \mathrm{e}^{- \mathrm{j} \omega \langle \boldsymbol{x}, \boldsymbol{\theta} \rangle } \mathrm{d} \boldsymbol{x} \\
&= \hat{f}(\omega \cos \theta, \omega \sin \theta),
\end{align*}

where in the second equality we used the definition of the Radon transform, and in the third we carried out the integration over $t$ using the sifting property of the Dirac distribution.

The remarkable aspect of the Fourier Slice Theorem is that it allows one to invert the Radon transform by (a) collecting measurements for all $\theta \in [0, \pi]$, (b) evaluating the 1D Fourier transform of these measurements and using them to fill the 2D Fourier space, and (c) evaluating the inverse 2D Fourier transform to form the image.
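The theorem also has a direct discrete analogue that is easy to check. In the sketch below (the discretization is an assumption: summing along one axis plays the role of projecting at $\theta = 0$), the 1D FFT of a projection matches a central slice of the 2D FFT:

```python
import numpy as np

# Random test "object".
rng = np.random.default_rng(0)
f = rng.standard_normal((64, 64))

# Projection at theta = 0: integrate along the beam direction (axis 0).
p = f.sum(axis=0)
p_hat = np.fft.fft(p)          # 1D Fourier transform of the projection

# Central slice of the 2D Fourier transform at the same angle.
f_hat = np.fft.fft2(f)
slice0 = f_hat[0, :]

print(np.allclose(p_hat, slice0))  # True
```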

#### Backprojection

We now define the backprojection operator $\mathcal{R}^\ast: L_2(\mathbb{R} \times [0, \pi]) \rightarrow L_2(\mathbb{R}^2)$
$$\mathcal{R}^\ast\{p\}(\boldsymbol{x}) = \int_0^\pi p_\theta(\langle \boldsymbol{x}, \boldsymbol{\theta} \rangle) \, \mathrm{d} \theta,$$

which is in fact the adjoint of the Radon transform, i.e., for all $p \in L_2(\mathbb{R} \times [0, \pi])$ and for all $f \in L_2(\mathbb{R}^2)$, we have that
$$\langle p, \mathcal{R}\{f\} \rangle_{L_2(\mathbb{R} \times [0, \pi])} = \langle \mathcal{R}^\ast\{p\}, f \rangle_{L_2(\mathbb{R}^2)}.$$

Intuitively, for each point $\boldsymbol{x}$, backprojection collects all the projections that passed through that point.
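A minimal discrete sketch of this "smearing back" (all names and the sampling scheme are illustrative) evaluates $p_\theta$ at $t = x \cos \theta + y \sin \theta$ for every pixel and every angle:

```python
import numpy as np

def backproject(sinogram, thetas, grid):
    """Discrete backprojection: sinogram[i] holds p_theta_i on the t-grid."""
    xs, ys = np.meshgrid(grid, grid, indexing="ij")
    out = np.zeros_like(xs)
    for p_theta, theta in zip(sinogram, thetas):
        # Sample p_theta at t = <x, theta> by linear interpolation.
        t = xs * np.cos(theta) + ys * np.sin(theta)
        out += np.interp(t, grid, p_theta)
    return out * (np.pi / len(thetas))  # approximate the integral over theta

# Usage: backprojecting an impulse at t = 0 for every angle piles up
# contributions at the origin, where all the rays intersect.
grid = np.linspace(-1.0, 1.0, 65)
thetas = np.linspace(0.0, np.pi, 90, endpoint=False)
sino = np.zeros((len(thetas), len(grid)))
sino[:, 32] = 1.0                      # impulse at t = 0 (grid[32] = 0)
img = backproject(sino, thetas, grid)
```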

#### Filtered Backprojection

Consider the following inverse Fourier transform

\begin{align*}
f(\boldsymbol{x})
&= \frac{1}{(2\pi)^2} \int_{\mathbb{R}^2} \hat{f}(\boldsymbol{\omega}) \mathrm{e}^{\mathrm{j} \langle \boldsymbol{x}, \boldsymbol{\omega} \rangle} \mathrm{d} \boldsymbol{\omega} \\
&= \frac{1}{(2\pi)^2} \int_0^\pi \int_\mathbb{R} \hat{f}(\omega \cos \theta, \omega \sin \theta) \mathrm{e}^{\mathrm{j} \omega \langle \boldsymbol{x}, \boldsymbol{\theta} \rangle} |\omega| \mathrm{d} \omega \mathrm{d} \theta\\
&= \frac{1}{(2\pi)^2} \int_0^\pi \int_\mathbb{R} |\omega| \hat{p}_\theta(\omega) \mathrm{e}^{\mathrm{j} \omega \langle \boldsymbol{x}, \boldsymbol{\theta} \rangle} \mathrm{d} \omega \mathrm{d} \theta \\
&= \int_0^\pi \left[\frac{1}{2\pi}\int_\mathbb{R} \frac{|\omega|}{2\pi} \, \hat{p}_\theta(\omega) \mathrm{e}^{\mathrm{j} \omega \langle \boldsymbol{x}, \boldsymbol{\theta} \rangle} \mathrm{d} \omega\right] \mathrm{d} \theta \\
&= \int_0^\pi q_\theta(x \cos \theta + y \sin \theta) \mathrm{d} \theta
\end{align*}
where we performed the variable change $\omega_x = \omega \cos \theta$ and $\omega_y = \omega \sin \theta$, which gives the Jacobian $J(\omega, \theta) = \omega$. This implies that it is possible to reconstruct the function $f$ by 1D filtering of the Radon data with
$$\hat{h}(\omega) = \frac{|\omega|}{2\pi},$$

and then applying the backprojection operator $\mathcal{R}^\ast$ to the result $q_\theta(t) = (h \ast p_\theta)(t)$.
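The whole filtered-backprojection recipe can be sketched in a few lines. The sketch below makes some assumptions (a uniform $t$-grid, FFT-based filtering, and the overall amplitude normalization of the ramp filter is glossed over), so it is an illustration rather than a production implementation:

```python
import numpy as np

def fbp(sinogram, thetas, grid):
    """Filtered backprojection sketch: ramp-filter each projection in the
    Fourier domain, then smear the filtered projections over the grid."""
    n = len(grid)
    freqs = np.fft.fftfreq(n, d=grid[1] - grid[0])
    ramp = np.abs(freqs)  # discrete stand-in for the |omega| ramp filter
    filtered = np.fft.ifft(np.fft.fft(sinogram, axis=1) * ramp, axis=1).real
    xs, ys = np.meshgrid(grid, grid, indexing="ij")
    out = np.zeros_like(xs)
    for q_theta, theta in zip(filtered, thetas):
        t = xs * np.cos(theta) + ys * np.sin(theta)
        out += np.interp(t, grid, q_theta)  # backproject the filtered data
    return out * (np.pi / len(thetas))

# Usage: the projection of an isotropic Gaussian is the same 1D Gaussian
# at every angle, so FBP should recover a blob peaked at the origin.
grid = np.linspace(-1.0, 1.0, 129)
thetas = np.linspace(0.0, np.pi, 180, endpoint=False)
sigma = 0.2
proj = sigma * np.sqrt(2.0 * np.pi) * np.exp(-grid**2 / (2.0 * sigma**2))
rec = fbp(np.tile(proj, (len(thetas), 1)), thetas, grid)
```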

# A simple ISTA ↔ FISTA switch

Today, let us revisit the topic of enforcing sparsity and look at an easy trick for accelerating the convergence of the algorithm (demo).

The basic algorithm that we discussed last time is the iterative shrinkage/thresholding algorithm (ISTA) that can be specified as follows
$$\mathbf{f}^t = \eta\left(\mathbf{f}^{t-1} - \gamma \mathbf{H}^\mathrm{T} (\mathbf{H} \mathbf{f}^{t-1} - \mathbf{y}), \, \gamma \tau\right),$$

where $t = 1, 2, 3, \dots$ is the iteration number, $\mathbf{y}$ is the measurement vector, $\mathbf{H}$ is the measurement matrix that models the acquisition system, $\gamma > 0$ is the step size of the algorithm, which we can always set to the inverse of the largest eigenvalue of $\mathbf{H}^\mathrm{T}\mathbf{H}$ to ensure convergence (i.e., set $\gamma = 1/L$ with $L = \lambda_{\text{max}}(\mathbf{H}^\mathrm{T}\mathbf{H})$), $\tau > 0$ is the regularization parameter that controls the sparsity of the final solution (larger $\tau$ leads to a sparser solution), and finally $\eta$ is a scalar thresholding function applied in a component-wise fashion. One of the most popular thresholding functions is the soft-thresholding function defined as
$$\eta(z, \tau) = \mathrm{sgn}(z) \left(|z| - \tau\right)_+,$$

where $(\cdot)_+$ returns the positive part of its argument, and $\mathrm{sgn}(\cdot)$ is a signum function that returns $+1$ if its argument is positive and $-1$ when it is negative.
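In code, the soft-thresholding rule is a one-liner; NumPy broadcasting applies it component-wise for free:

```python
import numpy as np

def soft_threshold(z, tau):
    # sgn(z) * (|z| - tau)_+ applied component-wise
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

# Usage: entries within [-tau, tau] are zeroed, the rest shrink toward 0.
z = np.array([-2.0, -0.5, 0.0, 0.3, 1.5])
out = soft_threshold(z, 1.0)  # [-1, 0, 0, 0, 0.5]
```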

ISTA is a very well understood method, and it is well known that its rate of convergence corresponds to that of the gradient-descent method, which is $O(1/t)$.

Let us consider the following simple iteration
\begin{align*}
\mathbf{f}^t &= \eta\left(\mathbf{s}^{t-1} - \gamma \mathbf{H}^\mathrm{T} (\mathbf{H} \mathbf{s}^{t-1} - \mathbf{y}), \, \gamma \tau\right) \\
\mathbf{s}^t &= \mathbf{f}^t + \left(\frac{q_{t-1} - 1}{q_t}\right) (\mathbf{f}^t - \mathbf{f}^{t-1}),
\end{align*}

where $\mathbf{f}^0 = \mathbf{s}^0 = \mathbf{f}_{\text{init}}$. When $q_t = 1$ for all $t = 1, 2, 3, \dots$, the iteration corresponds to ISTA; however, an appropriate choice of $\{q_t\}_{t = 0, 1, \dots}$ leads to a faster $O(1/t^2)$ convergence, which is crucial for large-scale problems where one tries to reduce the number of matrix-vector products with $\mathbf{H}$ and $\mathbf{H}^\mathrm{T}$. The faster version of ISTA was originally proposed by Beck & Teboulle 2009 and is widely known as fast ISTA (FISTA).

So, what is that appropriate choice of $\{q_t\}_{t = 0, 1, \dots}$?

Beck & Teboulle proposed to initialize $q_0 = 1$ and then set the rest iteratively as follows
$$q_t = \frac{1}{2} \left(1 + \sqrt{1 + 4 q_{t-1}^2}\right).$$

A short note on the Python demo: it was done as an IPython notebook in a fully self-contained way. The first two cells create and save two files, matrixmodel.py and algorithm.py, that are re-usable as stand-alone files.

The switch from ISTA to FISTA is done in cell 8:

```python
# Reconstruct with ISTA
[fhatISTA, costISTA] = fistaEst(y, forwardObj, tau, numIter, accelerate=False)

# Reconstruct with FISTA
[fhatFISTA, costFISTA] = fistaEst(y, forwardObj, tau, numIter, accelerate=True)
```
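The demo's fistaEst lives in the notebook's algorithm.py; a minimal self-contained sketch of such a routine, with the same accelerate switch, might look like the following (the interface is illustrative: here $\mathbf{H}$ is a plain matrix rather than the demo's forward-operator object):

```python
import numpy as np

def soft(z, tau):
    # component-wise soft thresholding
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def fista_est(y, H, tau, num_iter, accelerate=True):
    """ISTA/FISTA sketch for min_f 0.5*||y - H f||^2 + tau*||f||_1."""
    gamma = 1.0 / np.linalg.eigvalsh(H.T @ H).max()  # step size 1/L
    f = np.zeros(H.shape[1])
    s = f.copy()
    q = 1.0
    cost = []
    for _ in range(num_iter):
        # gradient step on the data term, then shrinkage
        f_new = soft(s - gamma * (H.T @ (H @ s - y)), gamma * tau)
        if accelerate:
            # FISTA: overshoot along the direction of the last step
            q_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * q * q))
            s = f_new + ((q - 1.0) / q_new) * (f_new - f)
            q = q_new
        else:
            # plain ISTA: no momentum
            s = f_new
        f = f_new
        cost.append(0.5 * np.sum((y - H @ f) ** 2) + tau * np.sum(np.abs(f)))
    return f, cost
```

With `accelerate=False` the momentum step is skipped and the iteration reduces to plain ISTA, which is exactly the one-flag switch used in the demo.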

The output of the demo is the following figure, which plots the results of both algorithms over 100 iterations.