# From Maxwell to Helmholtz

An accurate physical model is one of the main factors determining the quality of images that can be formed with an optical tomographic microscope. Our model in [1] was based on the beam propagation method, which splits the sample into layers and computationally propagates the light layer-by-layer from the source all the way to the detector. The starting point for our derivations was the scalar Helmholtz equation

$$\left(\Delta + k^2(\boldsymbol{x})\,\mathrm{I}\right) u(\boldsymbol{x}) = 0,$$

where $\boldsymbol{x} = (x, y, z)$ denotes a spatial coordinate, $u$ is what we called the total light-field at $\boldsymbol{x}$, $\Delta \triangleq (\partial^2/\partial x^2 + \partial^2/\partial y^2 + \partial^2/\partial z^2)$ is the Laplacian, $\mathrm{I}$ is the identity operator, and $k(\boldsymbol{x})$ is the wavenumber of the field at $\boldsymbol{x}$. The spatial dependence of the wavenumber $k$ is due to variations of the speed of light $c$ induced by the inhomogeneous nature of the sample under consideration. Once we have the Helmholtz equation, we can repeat the process described in [1], and obtain an algorithm for computationally forming an image from the measured data.
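As a quick numerical sanity check (my own sketch, not part of [1]): in a homogeneous medium, a plane wave should satisfy the Helmholtz equation. The snippet below verifies this in 1D with a finite-difference Laplacian; the wavelength and grid spacing are arbitrary choices.

```python
import numpy as np

# In a homogeneous medium, the plane wave u(x) = exp(j*k*x) satisfies
# the 1D Helmholtz equation (d^2/dx^2 + k^2) u = 0.
wavelength = 0.5               # arbitrary illustrative wavelength
k = 2 * np.pi / wavelength     # wavenumber
h = 1e-4                       # grid spacing
x = np.arange(0.0, 10.0, h)
u = np.exp(1j * k * x)

# Central second difference approximates the Laplacian d^2/dx^2.
lap = (u[2:] - 2 * u[1:-1] + u[:-2]) / h**2
residual = np.max(np.abs(lap + k**2 * u[1:-1]))
print(residual)  # tiny compared to |k^2 u| ~ 158
```

The residual is dominated by the $O(h^2)$ discretization error of the second difference, not by any failure of the plane wave to solve the equation.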

But how does the Helmholtz equation above relate to the full Maxwell's equations?

Maxwell's equations provide the full description of optical waves, and there are four equations to consider. The first equation is often called Faraday's law of induction

$$\nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t},$$

where $\mathbf{E}$ and $\mathbf{B}$ are the electric and magnetic fields, respectively, and $\nabla \times$ is the curl operator. The second equation is sometimes called Ampère's circuital law

$$\nabla \times \mathbf{H} = \mathbf{J} + \frac{\partial \mathbf{D}}{\partial t},$$

where $\mathbf{J}$ is the total current density, $\mathbf{H}$ is the magnetizing field, and $\mathbf{D}$ is the electric displacement field. The total current density $\mathbf{J}$ is related to the electric field $\mathbf{E}$ as $\mathbf{J} = \sigma \mathbf{E}$, where $\sigma(\boldsymbol{x})$ is the distribution of conductivity in the sample. The magnetizing field $\mathbf{H}$ is related to $\mathbf{B}$ as $\mathbf{B} = \mu \mathbf{H}$, where $\mu(\boldsymbol{x})$ is the distribution of permeability in space. Similarly, the electric displacement field $\mathbf{D}$ is related to $\mathbf{E}$ as $\mathbf{D} = \epsilon \mathbf{E}$, where $\epsilon(\boldsymbol{x})$ is the distribution of permittivity in space. The third equation is Gauss's law

$$\nabla \cdot \mathbf{D} = \rho,$$

where $\nabla \cdot$ is the divergence operator and $\rho$ is the distribution of charge in the sample. Finally, the fourth equation is Gauss's law for magnetism

$$\nabla \cdot \mathbf{B} = 0.$$

To obtain the scalar Helmholtz equation, we assume that the sample is charge-free and non-magnetic, i.e., $\rho(\boldsymbol{x}) = 0$ and $\mu(\boldsymbol{x}) = \mu_0$. We also restrict ourselves to electric fields with transverse magnetic (TM) polarization, i.e., $\mathbf{E}(\boldsymbol{x},t) = (0, 0, E_z(x,y,t))$ and $\mathbf{H}(\boldsymbol{x},t) = (H_x(\boldsymbol{x},t), H_y(\boldsymbol{x},t), 0)$, which are thus divergence-free, $\nabla \cdot \mathbf{E} = 0$. Then, by applying the identity $\nabla \times (\nabla \times \mathbf{E}) = \nabla (\nabla \cdot \mathbf{E}) - \Delta \mathbf{E}$ and restricting ourselves to the scalar $E_z$, we obtain
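The curl-of-curl identity used here can be verified symbolically. The sketch below checks $\nabla \times (\nabla \times \mathbf{E}) = \nabla(\nabla \cdot \mathbf{E}) - \Delta \mathbf{E}$ on an arbitrary smooth test field (the particular field is my own choice, not anything from the post):

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
# An arbitrary smooth vector field to test the identity on.
E = sp.Matrix([sp.sin(x * y), sp.cos(y * z), sp.exp(x * z)])

def curl(F):
    """Curl of a 3-component field F(x, y, z)."""
    return sp.Matrix([
        sp.diff(F[2], y) - sp.diff(F[1], z),
        sp.diff(F[0], z) - sp.diff(F[2], x),
        sp.diff(F[1], x) - sp.diff(F[0], y),
    ])

div = sp.diff(E[0], x) + sp.diff(E[1], y) + sp.diff(E[2], z)
grad_div = sp.Matrix([sp.diff(div, v) for v in (x, y, z)])
lap = sp.Matrix([sum(sp.diff(c, v, 2) for v in (x, y, z)) for c in E])

# Should simplify to the zero vector.
print(sp.simplify(curl(curl(E)) - (grad_div - lap)))
```

Under the TM and divergence-free assumptions, the $\nabla(\nabla \cdot \mathbf{E})$ term drops out, which is what reduces the vector equation to a scalar one for $E_z$.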

$$\Delta E_z = \mu_0 \sigma \frac{\partial E_z}{\partial t} + \mu_0 \epsilon \frac{\partial^2 E_z}{\partial t^2}.$$

The usual Helmholtz equation is then obtained by taking the temporal Fourier transform of both sides, suppressing the explicit dependence on $\omega$, and rearranging the terms

$$\left(\Delta + k^2(\boldsymbol{x})\right) E_z(\boldsymbol{x}) = 0,$$

where $k^2 = \omega^2 \mu_0 \epsilon(1 - \mathrm{j}\frac{\sigma}{\omega \epsilon})$. Typically, it is more convenient to write $k^2 = k_0^2 \epsilon$, where $k_0^2 = \omega^2 \epsilon_0 \mu_0$ corresponds to the wavenumber in free space, and $\epsilon$ has been redefined to incorporate $\sigma$ and is thus complex-valued.
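The two forms of $k^2$ are algebraically identical, which a few lines of NumPy can confirm. All numerical values below are illustrative assumptions of mine (vacuum constants, a visible-light frequency, a weakly conducting water-like medium), not values from the post:

```python
import numpy as np

# Illustrative constants and material parameters (assumptions).
eps0 = 8.8541878128e-12        # vacuum permittivity [F/m]
mu0 = 4e-7 * np.pi             # vacuum permeability [H/m]
omega = 2 * np.pi * 6e14       # angular frequency, ~500 nm light [rad/s]
eps = 1.77 * eps0              # permittivity of a water-like medium
sigma = 1e-2                   # small conductivity (absorption) [S/m]

# Form 1: k^2 = omega^2 * mu0 * eps * (1 - j * sigma / (omega * eps))
k2 = omega**2 * mu0 * eps * (1 - 1j * sigma / (omega * eps))

# Form 2: k^2 = k0^2 * eps_r with a complex relative permittivity.
k02 = omega**2 * eps0 * mu0
eps_r_complex = eps / eps0 - 1j * sigma / (omega * eps0)

print(np.isclose(k2, k02 * eps_r_complex))  # True
```

The imaginary part of the redefined permittivity is what encodes absorption in the sample.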

References:

[1] U. S. Kamilov, I. N. Papadopoulos, M. H. Shoreh, A. Goy, C. Vonesch, M. Unser, and D. Psaltis, “Optical tomographic image reconstruction based on beam propagation and sparse regularization,” IEEE Trans. Comp. Imag., vol. 2, no. 1, 2016.

# Minimization via soft-thresholding

Previously, we saw that it was straightforward to modify the gradient descent algorithm to obtain a sparse solution to a set of linear equations. The only modification required was soft-thresholding the solution vector at every iteration of the gradient descent. At the end of the post, we even claimed that this modified algorithm, known as ISTA, minimizes an $\ell_1$-norm regularized energy function.

Today, we move towards a more rigorous understanding of ISTA. Specifically, we show that the soft-thresholding operator

$$\eta(z, \lambda) \triangleq \mathrm{sgn}(z)\max(|z| - \lambda, 0)$$

minimizes the following convex function

$$c(x) = \frac{1}{2}(x - z)^2 + \lambda |x|,$$

where the number $z \in \mathbb{R}$ and the parameter $\lambda > 0$ are known, and $|\cdot|$ denotes the absolute value. This relationship for scalar functions can be easily extended to the following key result

$$\eta(\mathbf{z}, \lambda) = \mathop{\mathrm{arg\,min}}_{\mathbf{x} \in \mathbb{R}^N} \left\{\frac{1}{2}\|\mathbf{x} - \mathbf{z}\|_2^2 + \lambda \|\mathbf{x}\|_1\right\},$$

for an arbitrary $\mathbf{z} \in \mathbb{R}^N$ and parameter $\lambda > 0$. Note that, here, $\eta$ is applied component-wise to its first argument.
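In code, the component-wise operator is a one-liner. The sketch below (function name my own) implements $\eta$ with NumPy broadcasting, so it works for scalars and vectors alike:

```python
import numpy as np

def soft_threshold(z, lam):
    """Component-wise soft-thresholding: eta(z, lam) = sgn(z) * max(|z| - lam, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

z = np.array([-2.0, -0.3, 0.0, 0.5, 1.7])
# Entries with |z_n| <= lambda are set to zero; the rest are shrunk toward zero.
print(soft_threshold(z, 1.0))
```

Note that the operator is separable: each component of the output depends only on the corresponding component of the input.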

We first introduce the notions of subderivative and subdifferential. A subderivative of a convex function $c$ at $x$ is a number $g \in \mathbb{R}$ that satisfies the inequality $c(y) \geq c(x) + g(y-x)$ for all $y$. When $c$ is differentiable at $x$, the only possible choice for $g$ is the derivative $c^\prime(x)$. The set of subderivatives of $c$ at $x$ is called the subdifferential, denoted $\partial c(x)$. Similarly to the standard optimality condition for differentiable convex functions, a point $x^\ast \in \mathbb{R}$ is a minimizer of $c$ if and only if zero is contained in the subdifferential of $c$ at $x^\ast$, i.e., $0 \in \partial c(x^\ast)$.

When $x = 0$, the absolute value term in $c$ is not differentiable. However, its subdifferential is the interval $[-1, 1]$, which leads to the optimality condition

$$0 \in (0 - z) + \lambda\,[-1, 1] \quad\Leftrightarrow\quad |z| \leq \lambda.$$

When $x \neq 0$, the absolute value term in $c$ is differentiable, and $\partial|x| = \{\mathrm{sgn}(x)\}$, which leads to the optimality condition

$$x - z + \lambda\,\mathrm{sgn}(x) = 0.$$

By noting that the optimality condition gives $x = z - \lambda\,\mathrm{sgn}(x)$, so that $(x < 0) \Rightarrow (z < -\lambda)$ and $(x > 0) \Rightarrow (z > \lambda)$, we can express the minimizer $x^\ast \neq 0$ as

$$x^\ast = z - \lambda\,\mathrm{sgn}(z), \quad |z| > \lambda.$$

Putting the two cases together, we obtain

$$x^\ast = \mathrm{sgn}(z)\max(|z| - \lambda, 0) = \eta(z, \lambda),$$

which is precisely the definition of the soft-thresholding operator.

Finally, to see how to generalize this scalar result to vectors, we rewrite

$$\frac{1}{2}\|\mathbf{x} - \mathbf{z}\|_2^2 + \lambda\|\mathbf{x}\|_1 = \sum_{n=1}^N \left(\frac{1}{2}(x_n - z_n)^2 + \lambda |x_n|\right)$$

and optimize separately over every $x_n$.
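The derivation can also be sanity-checked numerically. This sketch (entirely my own, with arbitrary test values) compares $\eta(z,\lambda)$ against a brute-force minimizer of $\frac{1}{2}(x-z)^2 + \lambda|x|$ on a fine grid:

```python
import numpy as np

def soft_threshold(z, lam):
    """Soft-thresholding: eta(z, lam) = sgn(z) * max(|z| - lam, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

lam = 0.7
xs = np.linspace(-5.0, 5.0, 200001)   # fine grid of candidate minimizers
for z in [-3.2, -0.4, 0.0, 0.25, 1.9]:
    # Evaluate c(x) = 0.5*(x - z)^2 + lam*|x| on the grid and pick its minimizer.
    c = 0.5 * (xs - z) ** 2 + lam * np.abs(xs)
    x_grid = xs[np.argmin(c)]
    assert abs(x_grid - soft_threshold(z, lam)) < 1e-4
print("grid minimizer matches soft-thresholding")
```

The agreement holds both in the dead zone $|z| \leq \lambda$ (where the minimizer is exactly zero) and outside it (where the minimizer is shrunk by $\lambda$).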

References:

[1] M. Unser and P. D. Tafti, "An Introduction to Sparse Stochastic Processes," Cambridge Univ. Press, 2014, ch. Proximal operators, pp. 259–261.