Fundamentals: Extreme Value Theorems—Part 2

In the last post, we stated two fundamental theorems that establish existence of global minimizers of some continuous function \mathcal{C}(\mathbf{x}), where \mathbf{x} \in \mathbb{R}^N. Today, we want to discuss the technicalities of establishing these theorems.

Let's start with the first theorem, which will actually enable the proof of the second theorem.

Theorem 1 [Extreme Value Theorem]: If \mathcal{X} \subseteq \mathbb{R}^N is a compact set and \mathcal{C} is a continuous function on \mathcal{X}, then \mathcal{C} has a global minimizer on \mathcal{X}.

Proof: We shall divide the proof into two parts.

[Part 1: \mathcal{C} is bounded below on \mathcal{X} We shall prove this by contradiction. Suppose that \mathcal{C} is not bounded below on \mathcal{X}. Then, we can define a sequence \{\mathbf{x}_n\}_{n \in \mathbb{N}} in \mathcal{X}, by picking, for every n \in \mathbb{N}, a vector \mathbf{x}_n \in \mathcal{X} such that \mathcal{C}(\mathbf{x}_n) < -n. Since \mathcal{X} is compact, and thus bounded and closed, we know from the Bolzano-Weierstrass theorem that there exists a convergent subsequence \{\mathbf{x}_{n_k}\}_{k \in \mathbb{N}} of \{\mathbf{x}_n\}_{n \in \mathbb{N}}, whose limit we denote by \mathbf{x} \in \mathcal{X}. Note that we know that \mathbf{x} is indeed in \mathcal{X} due the fact that the latter is closed, i.e., \mathcal{X} contains all its limit points. From the continuity of \mathcal{C}, we know that \{\mathcal{C}(\mathbf{x}_{n_k})\}_{k \in \mathbb{N}} converges to the real number \mathcal{C}(\mathbf{x}). However, since \mathcal{C}(\mathbf{x}_{n_k}) < -n \leq -k for all k \in \mathbb{N}, we also have that \{\mathcal{C}(\mathbf{x}_{n_k})\}_{k \in \mathbb{N}} diverges to -\infty. This is a contradiction, which means that \mathcal{C} is bounded below on \mathcal{X}.

[Part 2: \mathcal{C} achieves its infimum on \mathcal{X} Since \mathcal{C} is bounded below, it has a greatest lower bound (commonly known as the infimum), which we shall denote as C \in \mathbb{R}. What we want to show is that there exists a vector \mathbf{x}^\ast \in \mathcal{X} such that \mathcal{C}(\mathbf{x}^\ast) = C. Since C is the infimum, for any n \in \mathbb{N}, we know that C + 1/n is not a lower bound for \mathcal{C}. We can then define a sequence \{\mathbf{x}_n\}_{n \in \mathbb{N}} in \mathcal{X} by picking \mathbf{x}_n \in \mathcal{X} such that \mathcal{C}(\mathbf{x}_n) < C + 1/n. Then, we have that

 C \leq \mathcal{C}(\mathbf{x}_n) < C + \frac{1}{n},

for all n \in \mathbb{N}, which means that the sequence \{\mathcal{C}(\mathbf{x}_n)\}_{n \in \mathbb{N}} converges to C as n \rightarrow \mathbb{N}. Then we know from the Bolzano-Weierstrass theorem that there exists a subsequence \{\mathbf{x}_{n_k}\}_{k \in \mathbb{N}}, which converges to some \mathbf{x}^\ast, and additionally, since \mathcal{X} is closed, we know that \mathbf{x}^\ast \in \mathcal{X}. From the continuity of the function \mathcal{C}, we have that \{\mathcal{C}(\mathbf{x}_{n_k})\}_{k \in \mathbb{N}} converges to \mathcal{C}(\mathbf{x}^\ast). But, \{\mathcal{C}(\mathbf{x}_{n_k})\}_{k \in \mathbb{N}} is a subsequence of \{\mathcal{C}(\mathbf{x}_n)\}_{n \in \mathbb{N}} that converges to C, so we must have C = \mathcal{C}(\mathbf{x}^\ast). Therefore, we can conclude that \mathcal{C} attains its infimum C at \mathbf{x}^\ast \in \mathcal{X}.

To establish the second theorem, we first prove that coercivity implies compactness, which is the special case of the following result.

Lemma 1 [coercivity \Leftrightarrow compactness]: If \mathcal{C} is a continuous coercive function defined on all of \mathbb{R}^N, then for every C \in \mathbb{R} the set

\mathcal{X} \triangleq \{\mathbf{x} \in \mathbb{R}^N \,|\, \mathcal{C}(\mathbf{x}) \leq C\}

is compact.

Proof for \Rightarrow: First note that, since \mathcal{C} is continuous, the set \mathcal{X} is closed. Thus, we only need to show that \mathcal{X} is bounded. We shall do this by contradiction. Suppose that there is a constant C \in \mathbb{R} such that \mathcal{X} is unbounded. Then, there must exist a sequence \{\mathbf{x}_n\}_{n \in \mathbb{N}} in \mathcal{X} such that \|\mathbf{x}_n\|_{\ell_2} \rightarrow \infty. But then, by coercivity of \mathcal{C}, we must also have \mathcal{C}(\mathbf{x}_n) \rightarrow \infty, which contradicts the fact that by definition of \mathcal{X}, we have that \mathcal{C}(\mathbf{x}_n) \leq C  for all n \in \mathbb{N}. Therefore, the set \mathcal{X} must be bounded.

We can now finally establish the following important result.

Theorem 2 [Extreme Value Theorem]: If \mathcal{C} is a continuous coercive function defined on all of \mathbb{R}^N, then \mathcal{C} has a global minimizer.

Proof: Let a constant C \in \mathbb{R} be chosen so that the set \mathcal{X} = \{\mathbf{x} \in \mathbb{R}^N \,|\, \mathcal{C}(\mathbf{x}) \leq C\} is non-empty. From the definition of coercivity, we know that \mathcal{X} is compact. From the Theorem 1 above, we know that \mathcal{C} has at least one global minimizer on \mathcal{X}, and as the set of global minimizers on \mathcal{X} is a global minimizer on \mathbb{R}^N, we get the result.


References:

[1] Wikipedia, "Extreme value theorem," https://en.wikipedia.org/wiki/Extreme_value_theorem.

[2] Class notes by James V. Burke on "Nonlinear Optimization", https://www.math.washington.edu/~burke/crs/408/notes/nlp/unoc.pdf

 

Fundamentals: Extreme Value Theorems—Part 1

In the last post, we proved that the gradient descent algorithm can be used to computationally find a global minimizer of the least-squares cost functional, i.e., it converges to a vector \mathbf{x}^\ast \in \mathbb{R}^N such that

\mathcal{C}(\mathbf{x}^\ast) \leq \mathcal{C}(\mathbf{x}),

for all \mathbf{x} \in \mathbb{R}^N, where \mathcal{C}(\mathbf{x}) \triangleq \frac{1}{2}\|\mathbf{y}-\mathbf{H}\mathbf{x}\|_{\ell_2}^2.

Today, we want to be more general and ask the following question: given an arbitrary continuous function \mathcal{C}(\mathbf{x}), with \mathbf{x} \in \mathbb{R}^N, how to know that the function has a global minimizer?

Specifically, we will discuss two famous theorems that establish existence of global minimizers.

Theorem 1 [Extreme Value Theorem]: If \mathcal{X} \subseteq \mathbb{R}^N is a compact set and \mathcal{C} is a continuous function on \mathcal{X}, then \mathcal{C} has a global minimizer on \mathcal{X}

Theorem 2 [Extreme Value Theorem]: If \mathcal{C} is a continuous coercive function defined on all of \mathbb{R}^N, then \mathcal{C} has a global minimizer.

Those theorems provide sufficient conditions for the existence of global minimizers. The rest of the post will be about defining the terms compact and coercive; in Part 2 of the post we shall prove both theorems.

We start by establishing two fundamental concepts of open and closed sets in \mathbb{R}^N.

Definition 1 [open set]: A subset \mathcal{X} \subset \mathbb{R}^N is open if and only if for every \mathbf{x} \in \mathcal{X}  there exist \epsilon > 0 such that the open ball \mathcal{B}_{\epsilon}(\mathbf{x}) of center \mathbf{x} and radius \epsilon remains in \mathcal{X}, i.e., \mathcal{B}_{\epsilon}(\mathbf{x}) \subset \mathcal{X}.

Remark that the radius \epsilon may depend on \mathbf{x}, and also we remind the definition \mathcal{B}_{\epsilon}(\mathbf{x}) \triangleq \{\mathbf{y} \in \mathbb{R}^N : \|\mathbf{x}-\mathbf{y}\|_{\ell_2} < \epsilon\}.

Definition 2 [closed set]: A subset \mathcal{X} \subset \mathbb{R}^N is closed if and only if its complement \mathcal{X}^\complement = \mathbb{R}^N \setminus \mathcal{X} is open.

More intuitively, a closed set is a set that includes its boundary (if there is one), while an open set does not. To establish the term compact set, we introduce one more intermediate definition.

Definition 3 [bounded set]: A subset \mathcal{X} \subseteq \mathbb{R}^N is bounded if there exists a constant C > 0 such that \|\mathbf{x}\|_{\ell_2} < C for all \mathbf{x} \in \mathcal{X}.

Finally, we now define the compact set by combining Definitions 2 and 3.

Definition 4 [compact set]: A subset \mathcal{X} \subseteq \mathbb{R}^N is compact if it is closed and bounded.

From the perspective of Definition 4, Theorem 1 simply states that if we are optimizing \mathcal{C} over some compact subset of \mathcal{X} \subseteq \mathbb{R}^N, then we are sure that there exists a global minimizer \mathbf{x}^\ast that we may hope to find computationally.

Now, what happens if we are interested in optimizing over the whole \mathbb{R}^N rather than on some subset?

This is where the notion of coercive functions becomes useful.

Definition 5 [coercive function]: A continuous function \mathcal{C} that is defined on all of \mathbb{R}^N is coercive if

\lim_{\|\mathbf{x}\|_{\ell_2} \rightarrow \infty} \left\{\mathcal{C}(\mathbf{x})\right\} = +\infty.

This means that for any constant C > 0 there exists a constant X > 0 (that may depend on C) such that \mathcal{C}(\mathbf{x}) > C whenever \|\mathbf{x}\|_{\ell_2} > X.

Intuitively, for a function to be coercive, it must approach +\infty along any path within \mathbb{R}^N on which \|\mathbf{x}\|_{\ell_2} becomes infinite.

Going back to the example of the least-squares cost functional, we have from the reverse triangular inequality that

 \|\mathbf{y}-\mathbf{H}\mathbf{x}\|_{\ell_2} \geq \|\mathbf{H}\mathbf{x}\|_{\ell_2}-\|\mathbf{y}\|_{\ell_2} \geq C\|\mathbf{x}\|_{\ell_2}-\|\mathbf{y}\|_{\ell_2},

where the constant C = \sqrt{\lambda_{\text{min}}(\mathbf{H}^\mathrm{T}\mathbf{H})} with \lambda_{\text{min}}(\mathbf{H}^\mathrm{T}\mathbf{H}) denoting the smallest eigenvalue of the matrix \mathbf{H}^\mathrm{T}\mathbf{H}.

Thus, when the matrix \mathbf{H} is non-singular, i.e., C > 0, and since

 \lim_{\|\mathbf{x}\|_{\ell_2} \rightarrow \infty} \left\{\|\mathbf{x}\|_{\ell_2}\right\} = +\infty,

we see that least-squares is indeed a coercive function.