Monthly Archives: February 2016

From Maxwell to Helmholtz

An accurate physical model is one of the main contributing factors to the quality of images that can be formed with an optical tomographic microscope. Our model in [1] was based on the beam propagation method, which splits the sample into layers and computationally propagates the light layer by layer from the source all the way to the detector. The starting point for our derivations was the scalar Helmholtz equation

\left(\Delta + k^2(\boldsymbol{x}) \mathrm{I}\right) u(\boldsymbol{x}) = 0,

where \boldsymbol{x} = (x, y, z) denotes a spatial coordinate, u is what we called the total light-field at \boldsymbol{x}, \Delta \triangleq (\partial^2/\partial x^2 + \partial^2/\partial y^2 + \partial^2/\partial z^2) is the Laplacian, \mathrm{I} is the identity operator, and k(\boldsymbol{x}) is the wavenumber of the field at \boldsymbol{x}. The spatial dependence of the wavenumber k is due to variations of the speed of light c induced by the inhomogeneous nature of the sample under consideration. Once we have the Helmholtz equation, we can repeat the process described in [1], and obtain an algorithm for computationally forming an image from the measured data.
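To make the layer-by-layer idea concrete, here is a minimal split-step sketch in NumPy. The function name, the 1D geometry, and the angular-spectrum diffraction step are our own illustrative simplifications, not the exact forward model of [1].

```python
import numpy as np

def bpm_propagate(u0, n, n0, wavelength, dx, dz):
    """Split-step beam propagation: alternate a diffraction step
    (computed in the Fourier domain) and a refraction step (a phase
    screen) for every layer of the sample.

    u0 : incident 1D complex field sampled with spacing dx
    n  : (num_layers, num_pixels) refractive-index map of the sample
    n0 : background refractive index
    """
    k0 = 2 * np.pi / wavelength                  # free-space wavenumber
    kx = 2 * np.pi * np.fft.fftfreq(u0.size, d=dx)
    # axial wavenumber; evanescent components decay (complex sqrt)
    kz = np.sqrt((n0 * k0) ** 2 - kx ** 2 + 0j)
    diffract = np.exp(1j * kz * dz)
    u = u0.astype(complex)
    for layer in n:
        u = np.fft.ifft(np.fft.fft(u) * diffract)    # diffraction over dz
        u = u * np.exp(1j * k0 * (layer - n0) * dz)  # refraction by layer
    return u
```

Propagating a plane wave through a homogeneous sample (every layer equal to the background index) leaves the field amplitude unchanged, which is a quick way to sanity-check the sketch.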

However, how does the Helmholtz equation above relate to the actual Maxwell's equations?

Maxwell's equations provide the full description of the optical waves, and there are four equations to consider. The first equation is Faraday's law of induction

\nabla \times \mathbf{E}(\boldsymbol{x}, t) = -\frac{\partial \mathbf{B}(\boldsymbol{x},t)}{\partial t},

where \mathbf{E} and \mathbf{B} are the electric and magnetic fields, respectively, and \nabla \times is the curl operator. The second equation is Ampère's circuital law

\nabla \times \mathbf{H}(\boldsymbol{x},t) = \mathbf{J}(\boldsymbol{x}, t) + \frac{\partial \mathbf{D}(\boldsymbol{x},t)}{\partial t},

where \mathbf{J} is the total current density, \mathbf{H} is the magnetic field strength, and \mathbf{D} is the electric displacement field. The total current density \mathbf{J} is related to the electric field \mathbf{E} as \mathbf{J} = \sigma \mathbf{E}, where \sigma(\boldsymbol{x}) is the distribution of conductivity in the sample. The magnetic field strength \mathbf{H} is related to \mathbf{B} as \mathbf{B} = \mu \mathbf{H}, where \mu(\boldsymbol{x}) is the distribution of permeability in space. Similarly, the electric displacement field \mathbf{D} is related to \mathbf{E} as \mathbf{D} = \epsilon \mathbf{E}, where \epsilon(\boldsymbol{x}) is the distribution of permittivity in space. The third equation is Gauss's law

\nabla \cdot \mathbf{D}(\boldsymbol{x},t) = \rho(\boldsymbol{x}),

where \nabla \cdot is the divergence operator and \rho is the distribution of charge in the sample. Finally, the fourth equation is Gauss's law for magnetism

 \nabla \cdot \mathbf{B}(\boldsymbol{x},t) = 0.

To obtain the scalar Helmholtz equation, we assume that the sample is charge free and non-magnetic, i.e., \rho(\boldsymbol{x}) = 0 and \mu(\boldsymbol{x}) = \mu_0. We also limit ourselves to electric fields with a transverse magnetic (TM) polarization, i.e., \mathbf{E}(\boldsymbol{x},t) = (0, 0, E_z(x,y,t)) and \mathbf{H}(\boldsymbol{x},t) = (H_x(\boldsymbol{x},t), H_y(\boldsymbol{x},t), 0), which are thus divergence free, \nabla \cdot \mathbf{E} = 0. Then, by applying the identity \nabla \times (\nabla \times \mathbf{E}) = \nabla (\nabla \cdot \mathbf{E}) - \Delta \mathbf{E}, and limiting ourselves to the scalar E_z, we obtain

\Delta E(\boldsymbol{x},t) = \mu_0\sigma(\boldsymbol{x}) \frac{\partial E(\boldsymbol{x},t)}{\partial t} + \mu_0\epsilon(\boldsymbol{x})\frac{\partial^2 E(\boldsymbol{x},t)}{\partial t^2}.
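Spelled out, this wave equation follows by taking the curl of both sides of Faraday's law and substituting Ampère's law together with the constitutive relations:

```latex
\nabla \times (\nabla \times \mathbf{E})
  = -\frac{\partial}{\partial t}\left(\nabla \times \mathbf{B}\right)
  = -\mu_0 \frac{\partial}{\partial t}\left(\nabla \times \mathbf{H}\right)
  = -\mu_0 \frac{\partial}{\partial t}\left(\sigma \mathbf{E} + \epsilon \frac{\partial \mathbf{E}}{\partial t}\right),
```

while the vector identity with \nabla \cdot \mathbf{E} = 0 reduces the left-hand side to -\Delta \mathbf{E}; equating the two expressions and restricting to the scalar E_z gives the wave equation above.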

The usual Helmholtz equation is then obtained by taking the temporal Fourier transform of both sides, suppressing the dependence on the angular frequency \omega in the notation, and rearranging the terms

\Delta E(\boldsymbol{x}) + k^2(\boldsymbol{x}) E(\boldsymbol{x}) =0,

where k^2 = \omega^2 \mu_0 \epsilon(1 - \mathrm{j}\frac{\sigma}{\omega \epsilon}). Typically, it is more convenient to write k^2 = k_0^2 \epsilon, where k_0^2 = \omega^2 \epsilon_0 \mu_0 corresponds to the wavenumber in free space, and \epsilon now denotes the relative permittivity, which has been modified to incorporate \sigma and is thus complex.
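As a quick numerical sanity check that the two forms of k^2 agree, the snippet below evaluates both; the material constants are illustrative choices of ours, not values from the post.

```python
import numpy as np

# Illustrative material constants (our own assumptions):
eps0 = 8.854e-12        # vacuum permittivity [F/m]
mu0 = 4e-7 * np.pi      # vacuum permeability [H/m]
c0 = 1 / np.sqrt(eps0 * mu0)
wavelength = 633e-9     # vacuum wavelength [m]
omega = 2 * np.pi * c0 / wavelength

eps = 1.77 * eps0       # permittivity of a water-like sample
sigma = 1e-2            # weak conductivity [S/m]

# k^2 = omega^2 mu_0 eps (1 - j sigma/(omega eps)), as in the text
k2 = omega**2 * mu0 * eps * (1 - 1j * sigma / (omega * eps))

# Equivalent form k^2 = k_0^2 eps_rel with a complex relative permittivity
k0 = omega * np.sqrt(eps0 * mu0)
eps_rel = (eps / eps0) * (1 - 1j * sigma / (omega * eps))
```

The imaginary part of k^2 comes entirely from \sigma, which is how absorption enters the model.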



[1] U. S. Kamilov, I. N. Papadopoulos, M. H. Shoreh, A. Goy, C. Vonesch, M. Unser, and D. Psaltis, “Optical tomographic image reconstruction based on beam propagation and sparse regularization,” IEEE Trans. Comp. Imag., vol. 2, no. 1, 2016.

Minimization via soft-thresholding

Previously, we saw that it was straightforward to modify the gradient descent algorithm to obtain a sparse solution to a set of linear equations. The only modification required was soft-thresholding the solution vector at every iteration of the gradient descent. At the end of the post, we even claimed that this modified algorithm, known as ISTA (iterative shrinkage/thresholding algorithm), minimizes some \ell_1-norm regularized energy function.
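For concreteness, here is a minimal ISTA sketch for the energy 0.5\|\mathbf{y}-\mathbf{A}\mathbf{x}\|^2 + \lambda\|\mathbf{x}\|_1; the function names, the zero initialization, and the constant step size are our own illustrative choices.

```python
import numpy as np

def soft(z, lam):
    """Component-wise soft-thresholding eta(z, lam)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def ista(A, y, lam, step, num_iter=500):
    """Gradient step on the quadratic data term, followed by
    soft-thresholding, repeated num_iter times."""
    x = np.zeros(A.shape[1])
    for _ in range(num_iter):
        x = soft(x - step * A.T @ (A @ x - y), step * lam)
    return x
```

A standard choice for the step size is the reciprocal of the squared spectral norm of A, which guarantees convergence of the iterates.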

Today, we move towards a more rigorous understanding of ISTA. Specifically, we show that the soft-thresholding operator

\eta(z, \lambda) \triangleq \mathrm{sgn}(z)\left(|z|-\lambda\right)_{+}

minimizes the following convex function

c(x) \triangleq {\textstyle\frac{1}{2}}\left(z-x\right)^2 + \lambda\left|x\right|,

where the number z \in \mathbb{R} and the parameter \lambda > 0 are known, and |\cdot| denotes the absolute value. This relationship for scalar functions can be easily extended to the following key result

\eta(\mathbf{z}, \lambda) = \mathop{\mathrm{arg\,min}}_{\mathbf{x} \in \mathbb{R}^N} \left\{{\textstyle\frac{1}{2}}\left\|\mathbf{z}-\mathbf{x}\right\|_{\ell_2}^2+\lambda\|\mathbf{x}\|_{\ell_1}\right\},

for an arbitrary \mathbf{z} \in \mathbb{R}^N and parameter \lambda > 0. Note that, here, \eta is applied component-wise to its first argument.
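Before proving this, the claim is easy to check numerically: a brute-force search over a fine grid of candidate x should return (up to the grid resolution) the same minimizer as the closed-form soft-thresholding. The function name below is our own.

```python
import numpy as np

def soft(z, lam):
    """Soft-thresholding eta(z, lam) = sgn(z) * max(|z| - lam, 0),
    applied component-wise when z is an array."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

# Brute-force check: for each scalar z, the grid point minimizing
# c(x) = 0.5*(z - x)^2 + lam*|x| should match soft(z, lam).
lam = 0.7
grid = np.linspace(-5.0, 5.0, 200001)
for z in (-2.3, -0.4, 0.0, 0.5, 1.9):
    c = 0.5 * (z - grid) ** 2 + lam * np.abs(grid)
    assert abs(grid[np.argmin(c)] - soft(z, lam)) < 1e-3
```

Note how inputs with |z| below the threshold (e.g., z = 0.5 with lam = 0.7) are mapped exactly to zero, which is the mechanism that produces sparse solutions.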

We first introduce the notions of subderivative and subdifferential. A subderivative of a convex function c at x is a number g \in \mathbb{R} that satisfies the inequality c(y) \geq c(x) + g(y-x), for all y. When c is differentiable, the only possible choice for g is the derivative c^\prime(x). The set of subderivatives of c at x is called the subdifferential, denoted \partial c(x). Similarly to the standard optimality conditions for differentiable convex functions, a point x^\ast \in \mathbb{R} is a minimizer of c if and only if zero is contained in the subdifferential of c at x^\ast, i.e., 0 \in \partial c(x^\ast).

When x = 0, the absolute value term in c is not differentiable. However, its subdifferential at zero is the interval [-1, 1], which leads to the optimality condition

0 \in \partial c(0) = -z + \lambda [-1, 1] \quad\Leftrightarrow\quad z \in [-\lambda, \lambda].

When x \neq 0, the absolute value term in c is differentiable with derivative \mathrm{sgn}(x), which leads to the optimality condition

0 = x - z + \lambda \mathrm{sgn}(x) \quad\Leftrightarrow\quad x = z - \lambda \mathrm{sgn}(x).

By noting that (x < 0) \Rightarrow (z < -\lambda) and that (x > 0) \Rightarrow (z > \lambda), so that \mathrm{sgn}(x) = \mathrm{sgn}(z) in both cases, we can express the minimizer x^\ast \neq 0 as

x^\ast = z - \lambda \mathrm{sgn}(z).

Putting the two cases together, we obtain

x^\ast = \mathrm{sgn}(z)(|z|-\lambda)_{+},

which is precisely the definition of the soft-thresholding operator.

Finally, to see how to generalize this scalar result to vectors, we re-write

c(\mathbf{x}) \triangleq {\textstyle\frac{1}{2}}\left\|\mathbf{z}-\mathbf{x}\right\|_{\ell_2}^2+\lambda\|\mathbf{x}\|_{\ell_1} = \sum_{n = 1}^N \left[{\textstyle\frac{1}{2}}(z_n-x_n)^2 + \lambda |x_n|\right],

and optimize separately over every x_n.


[1] M. Unser and P. D. Tafti, “An Introduction to Sparse Stochastic Processes,” Cambridge Univ. Press, 2014, ch. Proximal operators, pp. 259–261.