Sparsity of a vector is a key property that makes its accurate estimation possible when solving underdetermined systems of linear equations of the form

$$y = Hx + e.$$

Here, we know the vector $y \in \mathbb{R}^M$ and the matrix $H \in \mathbb{R}^{M \times N}$, and we would like to computationally recover the vector $x \in \mathbb{R}^N$. The last term $e \in \mathbb{R}^M$ models noise, which is unknown, and gives some robustness to the linear system of equations. The term *underdetermined* means that $M < N$, *i.e.*, there are fewer measurements than unknowns, and *sparsity* means that $\|x\|_0 = K \ll N$, *i.e.*, only very few components of $x$ are non-zero. A great deal has already been said about the relationship between $M$, $N$, and $K$ and the level of noise that would make a computational estimation of $x$ possible. This underlying theory is often called *compressive sensing* and has received a lot of attention in the recent past [1][2].

Now, suppose that we are working on a problem that can be formulated as the linear system of equations above, and it turns out that the vector $x$ we are trying to recover is indeed sparse. How do we go about recovering $x$ in a way that takes advantage of its sparsity?

First, let's consider the scenario without sparsity. Then, the simplest approach to computationally estimate $x$ is to implement and run the *gradient descent algorithm* summarized as follows

$$x^t = x^{t-1} - \gamma H^{\mathrm{T}}(H x^{t-1} - y).$$

Here, $t = 1, 2, 3, \dots$ denotes the iteration number, $x^t$ denotes the estimate at iteration $t$, the superscript letter $\mathrm{T}$ denotes the matrix-transposition operator, and $\gamma > 0$ is a step-size. The step-size is a parameter that can be precomputed to make sure that the sequence of estimates converges. Specifically, we can set $\gamma \leq 1/\lambda_{\max}(H^{\mathrm{T}}H)$, where $\lambda_{\max}(H^{\mathrm{T}}H)$ denotes the largest eigenvalue of the matrix $H^{\mathrm{T}}H$ [3].
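As a concrete illustration, here is a minimal NumPy sketch of this plain gradient descent on a small toy problem (the problem sizes and random data are assumptions chosen purely for illustration; the names `H`, `y`, and `gamma` mirror the notation above):

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 20, 10  # overdetermined here, so plain least-squares is well posed
H = rng.standard_normal((M, N))
x_true = rng.standard_normal((N, 1))
y = H @ x_true  # noiseless measurements for this toy example

# step-size gamma = 1 / lambda_max(H^T H) guarantees convergence
gamma = 1.0 / np.max(np.linalg.eigvalsh(H.T @ H))

xhat = np.zeros((N, 1))
for _ in range(5000):
    xhat = xhat - gamma * H.T @ (H @ xhat - y)  # steepest-descent update
```

For this well-conditioned toy setup, `xhat` converges to `x_true` to within numerical precision.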

To understand the standard gradient descent, consider the following *least-squares energy function*

$$\mathcal{D}(x) = \frac{1}{2}\|y - Hx\|_2^2,$$

which simply measures the Euclidean- or $\ell_2$-distance between the *actual* measurements $y$ and the *predicted* measurements $Hx$. The gradient of the function $\mathcal{D}$ with respect to the vector $x$ is given by

$$\nabla \mathcal{D}(x) = H^{\mathrm{T}}(Hx - y),$$

which implies that the gradient descent algorithm simply minimizes the least-squares energy function by iteratively pushing the estimate towards the direction of steepest descent.
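The gradient expression can be sanity-checked against a finite-difference approximation of the energy; here is a small sketch (all names and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 8, 5
H = rng.standard_normal((M, N))
y = rng.standard_normal((M, 1))
x = rng.standard_normal((N, 1))

def D(x):
    # least-squares energy 0.5 * ||y - Hx||_2^2
    r = y - H @ x
    return 0.5 * float(r.T @ r)

grad = H.T @ (H @ x - y)  # analytic gradient

# central finite differences, one coordinate at a time
eps = 1e-6
fd = np.zeros((N, 1))
for n in range(N):
    e = np.zeros((N, 1)); e[n] = eps
    fd[n] = (D(x + e) - D(x - e)) / (2 * eps)
```

Since the energy is quadratic, the central difference agrees with the analytic gradient up to floating-point rounding.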

It turns out that the sparsity of $x$ can be enforced via a slight modification of the standard gradient descent. Specifically, we can implement and run the following iteration

$$x^t = \eta\!\left(x^{t-1} - \gamma H^{\mathrm{T}}(H x^{t-1} - y),\, \gamma\lambda\right).$$

Here, $\eta(\cdot, \lambda)$ is a special scalar function that is applied component-wise to its first argument, and $\lambda > 0$ is a parameter controlling the sparsity of the solution, *i.e.*, larger $\lambda$ leads to sparser solutions. One popular function is the *soft-thresholding* function [4] that can be defined as follows

$$\eta(z, \lambda) = \operatorname{sgn}(z)\,(|z| - \lambda)_+,$$

where the operator $(\cdot)_+$ returns the positive part of its argument, and $\operatorname{sgn}(\cdot)$ returns $+1$ if its argument is positive, $-1$ if it's negative, and $0$ if it is exactly zero.
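Since the code at the end of the post calls a function `eta`, here is a minimal NumPy implementation of component-wise soft-thresholding along these lines:

```python
import numpy as np

def eta(z, lam):
    """Component-wise soft-thresholding: sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

# entries with magnitude below lam are set to zero; the rest shrink toward zero
z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(eta(z, 1.0))
```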

This modified gradient-descent algorithm is often called the *iterative shrinkage/thresholding algorithm* (ISTA) [5–7]; more broadly, algorithms of this type, which alternate between taking a gradient step and evaluating a nonlinearity, are called *proximal-gradient* algorithms.

Additionally, it turns out that ISTA minimizes the following energy function

$$\mathcal{C}(x) = \mathcal{D}(x) + \lambda\|x\|_1 = \frac{1}{2}\|y - Hx\|_2^2 + \lambda\|x\|_1,$$

where $\mathcal{D}$ is the same least-squares function as above, and $\|x\|_1 = \sum_{n=1}^{N} |x_n|$ is the $\ell_1$-norm penalty, which is the basis of the compressive sensing theory.
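A useful sanity check is that, with the step-size set as above, the ISTA iterates monotonically decrease this cost. Here is a small sketch on random problem data (the sizes and the helper `eta` are assumptions for illustration):

```python
import numpy as np

def eta(z, lam):
    # component-wise soft-thresholding
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

rng = np.random.default_rng(2)
M, N = 15, 30
H = rng.standard_normal((M, N))
y = rng.standard_normal((M, 1))
lam = 0.1
gamma = 1.0 / np.linalg.norm(H, 2)**2  # 1 / lambda_max(H^T H)

def cost(x):
    # C(x) = 0.5 * ||y - Hx||_2^2 + lam * ||x||_1
    r = y - H @ x
    return 0.5 * float(r.T @ r) + lam * float(np.sum(np.abs(x)))

xhat = np.zeros((N, 1))
costs = []
for _ in range(200):
    xhat = eta(xhat - gamma * H.T @ (H @ xhat - y), gamma * lam)
    costs.append(cost(xhat))
```

The sequence in `costs` should be non-increasing, which follows from the majorization-minimization interpretation of the proximal-gradient step.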

Here is how Python code for ISTA might look, given a matrix `H`, measurements `y`, and a soft-thresholding function `eta` (ipython demo):

```
numIter = 1000  # number of iterations
gamma = np.square(1/np.linalg.norm(H, 2))  # step-size: 1/lambda_max(H^T H)
lam = 0.001  # regularization parameter
xhat = np.zeros((N, 1))  # initialize ISTA

for iter in range(numIter):
    grad = np.dot(np.transpose(H), np.dot(H, xhat) - y)  # gradient of least-squares term
    xhat = eta(xhat - gamma*grad, gamma*lam)  # gradient step followed by shrinkage
```

**References:**

[1] E. J. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” *IEEE Trans. Inf. Theory*, vol. 52, no. 2, pp. 489–509, February 2006.

[2] D. L. Donoho, “Compressed sensing,” *IEEE Trans. Inf. Theory*, vol. 52, no. 4, pp. 1289–1306, April 2006.

[3] A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” *SIAM J. Imaging Sciences,* vol. 2, no. 1, pp. 183–202, 2009.

[4] D. L. Donoho, “De-noising by soft-thresholding,” *IEEE Trans. Inf. Theory*, vol. 41, no. 3, pp. 613–627, May 1995.

[5] M. A. T. Figueiredo and R. D. Nowak, “An EM algorithm for wavelet-based image restoration,” *IEEE Trans. Image Process.*, vol. 12, no. 8, pp. 906–916, August 2003.

[6] J. Bect, L. Blanc-Féraud, G. Aubert, and A. Chambolle, “A ℓ1-unified variational framework for image restoration,” in *Proc. ECCV*, vol. 3024, Springer, New York, 2004, pp. 1–13.

[7] I. Daubechies, M. Defrise, and C. D. Mol, “An iterative thresholding algorithm for linear inverse problems with a sparsity constraint,” *Commun. Pure Appl. Math.*, vol. 57, no. 11, pp. 1413–1457, November 2004.
