# Fundamentals: Extreme Value Theorems—Part 2

In the last post, we stated two fundamental theorems that establish the existence of global minimizers of a continuous function $\mathcal{C}(\mathbf{x})$, where $\mathbf{x} \in \mathbb{R}^N$. Today, we want to discuss the technicalities of proving these theorems.

Let's start with the first theorem, which will actually enable the proof of the second theorem.

Theorem 1 [Extreme Value Theorem]: If $\mathcal{X} \subseteq \mathbb{R}^N$ is a compact set and $\mathcal{C}$ is a continuous function on $\mathcal{X}$, then $\mathcal{C}$ has a global minimizer on $\mathcal{X}$.

Proof: We shall divide the proof into two parts.

[Part 1: $\mathcal{C}$ is bounded below on $\mathcal{X}$] We shall prove this by contradiction. Suppose that $\mathcal{C}$ is not bounded below on $\mathcal{X}$. Then, we can define a sequence $\{\mathbf{x}_n\}_{n \in \mathbb{N}}$ in $\mathcal{X}$ by picking, for every $n \in \mathbb{N}$, a vector $\mathbf{x}_n \in \mathcal{X}$ such that $\mathcal{C}(\mathbf{x}_n) < -n$. Since $\mathcal{X}$ is compact, and thus bounded and closed, we know from the Bolzano-Weierstrass theorem that there exists a convergent subsequence $\{\mathbf{x}_{n_k}\}_{k \in \mathbb{N}}$ of $\{\mathbf{x}_n\}_{n \in \mathbb{N}}$, whose limit we denote by $\mathbf{x} \in \mathcal{X}$. Note that $\mathbf{x}$ is indeed in $\mathcal{X}$ due to the fact that the latter is closed, i.e., $\mathcal{X}$ contains all its limit points. From the continuity of $\mathcal{C}$, we know that $\{\mathcal{C}(\mathbf{x}_{n_k})\}_{k \in \mathbb{N}}$ converges to the real number $\mathcal{C}(\mathbf{x})$. However, since $\mathcal{C}(\mathbf{x}_{n_k}) < -n_k \leq -k$ for all $k \in \mathbb{N}$ (the indices $n_k$ are strictly increasing, so $n_k \geq k$), we also have that $\{\mathcal{C}(\mathbf{x}_{n_k})\}_{k \in \mathbb{N}}$ diverges to $-\infty$. This is a contradiction, which means that $\mathcal{C}$ is bounded below on $\mathcal{X}$.

[Part 2: $\mathcal{C}$ achieves its infimum on $\mathcal{X}$] Since $\mathcal{C}$ is bounded below, it has a greatest lower bound (commonly known as the infimum), which we shall denote as $C \in \mathbb{R}$. What we want to show is that there exists a vector $\mathbf{x}^\ast \in \mathcal{X}$ such that $\mathcal{C}(\mathbf{x}^\ast) = C$. Since $C$ is the infimum, for any $n \in \mathbb{N}$, we know that $C + 1/n$ is not a lower bound for $\mathcal{C}$. We can then define a sequence $\{\mathbf{x}_n\}_{n \in \mathbb{N}}$ in $\mathcal{X}$ by picking $\mathbf{x}_n \in \mathcal{X}$ such that $\mathcal{C}(\mathbf{x}_n) < C + 1/n$. Then, we have that

$$C \leq \mathcal{C}(\mathbf{x}_n) < C + \frac{1}{n}$$

for all $n \in \mathbb{N}$, which means that the sequence $\{\mathcal{C}(\mathbf{x}_n)\}_{n \in \mathbb{N}}$ converges to $C$ as $n \rightarrow \infty$. Then we know from the Bolzano-Weierstrass theorem that there exists a subsequence $\{\mathbf{x}_{n_k}\}_{k \in \mathbb{N}}$, which converges to some $\mathbf{x}^\ast$, and additionally, since $\mathcal{X}$ is closed, we know that $\mathbf{x}^\ast \in \mathcal{X}$. From the continuity of the function $\mathcal{C}$, we have that $\{\mathcal{C}(\mathbf{x}_{n_k})\}_{k \in \mathbb{N}}$ converges to $\mathcal{C}(\mathbf{x}^\ast)$. But $\{\mathcal{C}(\mathbf{x}_{n_k})\}_{k \in \mathbb{N}}$ is a subsequence of $\{\mathcal{C}(\mathbf{x}_n)\}_{n \in \mathbb{N}}$, which converges to $C$, so we must have $C = \mathcal{C}(\mathbf{x}^\ast)$. Therefore, we can conclude that $\mathcal{C}$ attains its infimum $C$ at $\mathbf{x}^\ast \in \mathcal{X}$.
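The guarantee in Theorem 1 is what makes brute-force search over a compact set meaningful: as the grid is refined, the grid minimum approaches the global minimum that the theorem says exists. Here is a minimal numerical sketch; the quadratic cost and the box $[-1, 1]^2$ are hypothetical choices for illustration only.

```python
# Illustration of Theorem 1 (not part of the proof): a continuous function on a
# compact box attains its minimum, so exhaustive grid evaluation approximates
# the guaranteed global minimizer. The cost below is a hypothetical example.

def cost(x1, x2):
    """A continuous cost on the compact box [-1, 1] x [-1, 1]."""
    return (x1 - 0.5) ** 2 + (x2 + 0.3) ** 2

def grid_minimize(f, lo, hi, n=401):
    """Approximate the global minimizer of f on [lo, hi]^2 by
    evaluating f on an n-by-n uniform grid."""
    step = (hi - lo) / (n - 1)
    best = (float("inf"), (None, None))
    for i in range(n):
        for j in range(n):
            x1, x2 = lo + i * step, lo + j * step
            val = f(x1, x2)
            if val < best[0]:
                best = (val, (x1, x2))
    return best

val, (x1, x2) = grid_minimize(cost, -1.0, 1.0)
print(val, x1, x2)  # minimum near 0, attained near (0.5, -0.3)
```

Note that compactness is essential here: on an open or unbounded set the grid minimum can keep decreasing forever, with no limit point inside the set.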

To establish the second theorem, we first prove that coercivity implies compactness of the sublevel sets, which is the content of the following result.

Lemma 1 [coercivity $\Rightarrow$ compactness]: If $\mathcal{C}$ is a continuous coercive function defined on all of $\mathbb{R}^N$, then for every $C \in \mathbb{R}$ the set

$$\mathcal{X} = \{\mathbf{x} \in \mathbb{R}^N \,|\, \mathcal{C}(\mathbf{x}) \leq C\}$$

is compact.

Proof: First note that, since $\mathcal{C}$ is continuous, the set $\mathcal{X}$ is closed. Thus, we only need to show that $\mathcal{X}$ is bounded. We shall do this by contradiction. Fix $C \in \mathbb{R}$ and suppose that $\mathcal{X}$ is unbounded. Then, there must exist a sequence $\{\mathbf{x}_n\}_{n \in \mathbb{N}}$ in $\mathcal{X}$ such that $\|\mathbf{x}_n\|_{\ell_2} \rightarrow \infty$. But then, by coercivity of $\mathcal{C}$, we must also have $\mathcal{C}(\mathbf{x}_n) \rightarrow \infty$, which contradicts the fact that, by definition of $\mathcal{X}$, we have $\mathcal{C}(\mathbf{x}_n) \leq C$ for all $n \in \mathbb{N}$. Therefore, the set $\mathcal{X}$ must be bounded.
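The lemma can be probed numerically in one dimension: for a coercive function the sublevel set stays inside a fixed interval, while for a non-coercive function it runs off to the edge of any sampling window. Both example functions below are hypothetical choices for illustration.

```python
import math

# Sanity check (illustrative sketch) of Lemma 1: the sublevel set
# {x : f(x) <= C} is bounded when f is coercive, but need not be otherwise.

def sublevel_points(f, C, grid):
    """Return the grid points lying in the sublevel set {x : f(x) <= C}."""
    return [x for x in grid if f(x) <= C]

grid = [k * 0.01 for k in range(-10000, 10001)]  # samples of [-100, 100]
C = 4.0

coercive = sublevel_points(lambda x: x * x, C, grid)            # x^2 is coercive
non_coercive = sublevel_points(lambda x: math.exp(x), C, grid)  # e^x is not

print(max(abs(x) for x in coercive))  # about 2 = sqrt(C): bounded
print(min(non_coercive))              # hits the left edge of the sampling window
```

For $f(x) = x^2$ the sublevel set is exactly $[-\sqrt{C}, \sqrt{C}]$, while for $f(x) = e^x$ it is the unbounded interval $(-\infty, \ln C]$, which is closed but not compact.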

We can now finally establish the following important result.

Theorem 2 [Extreme Value Theorem]: If $\mathcal{C}$ is a continuous coercive function defined on all of $\mathbb{R}^N$, then $\mathcal{C}$ has a global minimizer.

Proof: Let a constant $C \in \mathbb{R}$ be chosen so that the set $\mathcal{X} = \{\mathbf{x} \in \mathbb{R}^N \,|\, \mathcal{C}(\mathbf{x}) \leq C\}$ is non-empty (for example, $C = \mathcal{C}(\mathbf{x}_0)$ for any $\mathbf{x}_0 \in \mathbb{R}^N$). From Lemma 1, we know that $\mathcal{X}$ is compact. From Theorem 1 above, we know that $\mathcal{C}$ has at least one global minimizer $\mathbf{x}^\ast$ on $\mathcal{X}$. Since $\mathcal{C}(\mathbf{x}) > C \geq \mathcal{C}(\mathbf{x}^\ast)$ for every $\mathbf{x} \notin \mathcal{X}$, the vector $\mathbf{x}^\ast$ is also a global minimizer on all of $\mathbb{R}^N$, which gives the result.
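The proof doubles as a practical recipe: pick any $C$ with a non-empty sublevel set, bound that set, and then minimize over the resulting compact region. A minimal one-dimensional sketch, with a hypothetical coercive quartic chosen for illustration:

```python
# Illustrative sketch of Theorem 2 in practice: restrict a coercive function
# to a compact sublevel set, then minimize there (here by grid search).

def cost(x):
    return x ** 4 - 3 * x ** 2 + x  # coercive: the x^4 term dominates

# C = cost(0) = 0 gives a non-empty sublevel set {x : cost(x) <= 0}.
# A quick check shows cost(x) > 0 for |x| >= 2 (e.g. cost(-2) = 2,
# cost(2) = 6, and cost grows as |x| increases beyond 2), so the
# sublevel set lies inside the compact interval [-2, 2].
lo, hi, n = -2.0, 2.0, 40001
step = (hi - lo) / (n - 1)
best_val, best_x = min((cost(lo + i * step), lo + i * step) for i in range(n))
print(best_x, best_val)  # approximates the global minimizer over all of R
```

By Theorem 2 the minimizer found on $[-2, 2]$ is a global minimizer over all of $\mathbb{R}$, since the function exceeds $C = 0$ everywhere outside that interval.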


# Fundamentals: Extreme Value Theorems—Part 1

In the last post, we proved that the gradient descent algorithm can be used to computationally find a global minimizer of the least-squares cost functional, i.e., it converges to a vector $\mathbf{x}^\ast \in \mathbb{R}^N$ such that

$$\mathcal{C}(\mathbf{x}^\ast) \leq \mathcal{C}(\mathbf{x})$$

for all $\mathbf{x} \in \mathbb{R}^N$, where $\mathcal{C}(\mathbf{x}) \triangleq \frac{1}{2}\|\mathbf{y}-\mathbf{H}\mathbf{x}\|_{\ell_2}^2.$

Today, we want to be more general and ask the following question: given an arbitrary continuous function $\mathcal{C}(\mathbf{x})$, with $\mathbf{x} \in \mathbb{R}^N$, how do we know that the function has a global minimizer?

Specifically, we will discuss two famous theorems that establish existence of global minimizers.

Theorem 1 [Extreme Value Theorem]: If $\mathcal{X} \subseteq \mathbb{R}^N$ is a compact set and $\mathcal{C}$ is a continuous function on $\mathcal{X}$, then $\mathcal{C}$ has a global minimizer on $\mathcal{X}$.

Theorem 2 [Extreme Value Theorem]: If $\mathcal{C}$ is a continuous coercive function defined on all of $\mathbb{R}^N$, then $\mathcal{C}$ has a global minimizer.

These theorems provide sufficient conditions for the existence of global minimizers. The rest of this post will be about defining the terms compact and coercive; in Part 2 we shall prove both theorems.

We start by establishing two fundamental concepts of open and closed sets in $\mathbb{R}^N$.

Definition 1 [open set]: A subset $\mathcal{X} \subset \mathbb{R}^N$ is open if and only if for every $\mathbf{x} \in \mathcal{X}$ there exists $\epsilon > 0$ such that the open ball $\mathcal{B}_{\epsilon}(\mathbf{x})$ of center $\mathbf{x}$ and radius $\epsilon$ is contained in $\mathcal{X}$, i.e., $\mathcal{B}_{\epsilon}(\mathbf{x}) \subset \mathcal{X}$.

Note that the radius $\epsilon$ may depend on $\mathbf{x}$; recall also the definition $\mathcal{B}_{\epsilon}(\mathbf{x}) \triangleq \{\mathbf{y} \in \mathbb{R}^N : \|\mathbf{x}-\mathbf{y}\|_{\ell_2} < \epsilon\}$.

Definition 2 [closed set]: A subset $\mathcal{X} \subset \mathbb{R}^N$ is closed if and only if its complement $\mathcal{X}^\complement = \mathbb{R}^N \setminus \mathcal{X}$ is open.

More intuitively, a closed set is a set that includes its boundary (if there is one), while an open set does not. To establish the term compact set, we introduce one more intermediate definition.

Definition 3 [bounded set]: A subset $\mathcal{X} \subseteq \mathbb{R}^N$ is bounded if there exists a constant $C > 0$ such that $\|\mathbf{x}\|_{\ell_2} < C$ for all $\mathbf{x} \in \mathcal{X}$.

Finally, we now define the compact set by combining Definitions 2 and 3.

Definition 4 [compact set]: A subset $\mathcal{X} \subseteq \mathbb{R}^N$ is compact if it is closed and bounded.

From the perspective of Definition 4, Theorem 1 simply states that if we are optimizing $\mathcal{C}$ over some compact subset $\mathcal{X} \subseteq \mathbb{R}^N$, then we are sure that there exists a global minimizer $\mathbf{x}^\ast$ that we may hope to find computationally.

Now, what happens if we are interested in optimizing over the whole $\mathbb{R}^N$ rather than on some subset?

This is where the notion of coercive functions becomes useful.

Definition 5 [coercive function]: A continuous function $\mathcal{C}$ that is defined on all of $\mathbb{R}^N$ is coercive if

$$\lim_{\|\mathbf{x}\|_{\ell_2} \rightarrow \infty} \mathcal{C}(\mathbf{x}) = +\infty.$$
This means that for any constant $C > 0$ there exists a constant $X > 0$ (that may depend on $C$) such that $\mathcal{C}(\mathbf{x}) > C$ whenever $\|\mathbf{x}\|_{\ell_2} > X$.

Intuitively, for a function to be coercive, it must approach $+\infty$ along any path within $\mathbb{R}^N$ on which $\|\mathbf{x}\|_{\ell_2}$ becomes infinite.
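This "blows up in every direction" intuition is easy to probe numerically. The sketch below (both example functions are hypothetical) evaluates a coercive function along random directions in $\mathbb{R}^2$ and contrasts it with a function that is unbounded along one axis yet not coercive:

```python
import math, random

# Numeric illustration (sketch): a coercive function grows along every
# direction, whereas g(x) = x1^2 stays flat along the x2-axis and is
# therefore not coercive on R^2, even though it is unbounded along x1.

def f(x1, x2):   # coercive: f(x) = ||x||^2
    return x1 * x1 + x2 * x2

def g(x1, x2):   # not coercive: constant along the direction (0, 1)
    return x1 * x1

random.seed(0)
radii = [10.0, 100.0, 1000.0]
for _ in range(5):
    theta = random.uniform(0.0, 2.0 * math.pi)
    d = (math.cos(theta), math.sin(theta))        # random unit direction
    values = [f(r * d[0], r * d[1]) for r in radii]
    assert values == sorted(values)               # f grows along every direction

print([g(0.0, r) for r in radii])  # stays at 0.0 along the x2-axis
```

The function $g$ shows why "unbounded above" is not enough: coercivity requires growth along *every* path on which $\|\mathbf{x}\|_{\ell_2} \rightarrow \infty$, not just some of them.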

Going back to the example of the least-squares cost functional, we have from the reverse triangle inequality that

$$\|\mathbf{y}-\mathbf{H}\mathbf{x}\|_{\ell_2} \geq \|\mathbf{H}\mathbf{x}\|_{\ell_2} - \|\mathbf{y}\|_{\ell_2} \geq C\|\mathbf{x}\|_{\ell_2} - \|\mathbf{y}\|_{\ell_2},$$
where the constant $C = \sqrt{\lambda_{\text{min}}(\mathbf{H}^\mathrm{T}\mathbf{H})}$ with $\lambda_{\text{min}}(\mathbf{H}^\mathrm{T}\mathbf{H})$ denoting the smallest eigenvalue of the matrix $\mathbf{H}^\mathrm{T}\mathbf{H}$.

Thus, when the matrix $\mathbf{H}$ has full column rank, i.e., $C > 0$, and since

$$\lim_{\|\mathbf{x}\|_{\ell_2} \rightarrow \infty} \left(C\|\mathbf{x}\|_{\ell_2} - \|\mathbf{y}\|_{\ell_2}\right) = +\infty,$$

we see that the least-squares cost is indeed a coercive function.
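The lower bound derived above can be verified numerically. A minimal sketch, using a randomly generated $\mathbf{H}$ (the sizes and random seed are arbitrary choices for illustration):

```python
import numpy as np

# Numerical check (illustrative sketch) of the bound
#   ||y - Hx|| >= C ||x|| - ||y||,  C = sqrt(lambda_min(H^T H)),
# for a randomly generated full-column-rank H.

rng = np.random.default_rng(0)
H = rng.standard_normal((8, 4))   # tall H; full column rank with probability 1
y = rng.standard_normal(8)
C = np.sqrt(np.linalg.eigvalsh(H.T @ H).min())  # smallest eigenvalue of H^T H

for _ in range(1000):
    x = rng.standard_normal(4) * rng.uniform(0.1, 100.0)  # random scale
    lhs = np.linalg.norm(y - H @ x)
    rhs = C * np.linalg.norm(x) - np.linalg.norm(y)
    assert lhs >= rhs - 1e-9      # the coercivity bound holds

print("bound verified; C =", C)
```

Since $C > 0$ here, the right-hand side grows without bound as $\|\mathbf{x}\|_{\ell_2}$ grows, which is exactly the coercivity property used in Theorem 2.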