In the last post, we proved that the gradient descent algorithm can be used to computationally find a global minimizer of the least-squares cost functional, *i.e.*, it converges to a vector $x^\ast \in \mathbb{R}^n$ such that

$$f(x^\ast) \,\leq\, f(x)$$

for all $x \in \mathbb{R}^n$, where

$$f(x) \,\triangleq\, \frac{1}{2}\|y - Hx\|_2^2,$$

with $H \in \mathbb{R}^{m \times n}$ and $y \in \mathbb{R}^m$.
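As a quick numerical sanity check, here is a minimal sketch (with a randomly generated $H$ and $y$, chosen purely for illustration) that runs gradient descent on the least-squares cost and compares the result with the closed-form solution of the normal equations:

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.standard_normal((20, 5))  # tall matrix, full column rank (a.s.)
y = rng.standard_normal(20)

def f(x):
    """Least-squares cost f(x) = 0.5 * ||y - H x||^2."""
    r = y - H @ x
    return 0.5 * float(r @ r)

def grad_f(x):
    """Gradient of f: -H^T (y - H x)."""
    return -H.T @ (y - H @ x)

# Standard step size 1/L, with L the largest eigenvalue of H^T H.
step = 1.0 / np.linalg.eigvalsh(H.T @ H).max()
x = np.zeros(5)
for _ in range(5000):
    x = x - step * grad_f(x)

# Global minimizer from the normal equations: (H^T H) x* = H^T y.
x_star = np.linalg.solve(H.T @ H, H.T @ y)
print(np.allclose(x, x_star, atol=1e-6))
```

The step size $1/L$, with $L$ the largest eigenvalue of $H^TH$, is the usual choice that guarantees convergence for this cost.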
Today, we want to be more general and ask the following question: given an arbitrary continuous function $f: \mathcal{X} \to \mathbb{R}$, with $\mathcal{X} \subseteq \mathbb{R}^n$, how can we know that the function $f$ has a global minimizer?

Specifically, we will discuss two famous theorems that establish the existence of global minimizers.

**Theorem 1 [Extreme Value Theorem]:** If $\mathcal{X} \subset \mathbb{R}^n$ is a *compact set* and $f$ is a continuous function on $\mathcal{X}$, then $f$ has a global minimizer on $\mathcal{X}$.

**Theorem 2 [Coercivity Theorem]:** If $f$ is a continuous *coercive function* defined on all of $\mathbb{R}^n$, then $f$ has a global minimizer.

These theorems provide sufficient conditions for the existence of global minimizers. The rest of this post defines the terms *compact* and *coercive*; in Part 2 of the post we shall prove both theorems.

We start by establishing two fundamental concepts: *open* and *closed sets* in $\mathbb{R}^n$.

**Definition 1 [open set]:** A subset $\mathcal{X} \subseteq \mathbb{R}^n$ is open if and only if for every $x \in \mathcal{X}$ there exists $r > 0$ such that the open ball of center $x$ and radius $r$ remains in $\mathcal{X}$, *i.e.*, $B_r(x) \subseteq \mathcal{X}$.

Note that the radius $r$ may depend on $x$; we also recall the definition $B_r(x) \triangleq \{z \in \mathbb{R}^n : \|z - x\|_2 < r\}$.
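In code, membership in an open ball is just a strict norm inequality; a toy helper (assuming the Euclidean norm) makes the definition concrete:

```python
import numpy as np

def in_open_ball(z, center, r):
    """True if z lies in the open ball B_r(center) = {z : ||z - center|| < r}."""
    return bool(np.linalg.norm(np.asarray(z) - np.asarray(center)) < r)

print(in_open_ball([0.5, 0.0], [0.0, 0.0], 1.0))  # interior point: True
print(in_open_ball([1.0, 0.0], [0.0, 0.0], 1.0))  # boundary point excluded: False
```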

**Definition 2 [closed set]:** A subset $\mathcal{X} \subseteq \mathbb{R}^n$ is closed if and only if its complement $\mathbb{R}^n \setminus \mathcal{X}$ is open.

More intuitively, a closed set is a set that includes its boundary (if there is one), while an open set does not. To establish the term compact set, we introduce one more intermediate definition.

**Definition 3 [bounded set]:** A subset $\mathcal{X} \subseteq \mathbb{R}^n$ is bounded if there exists a constant $R > 0$ such that $\|x\|_2 \leq R$ for all $x \in \mathcal{X}$.

Finally, we can now define compact sets by combining Definitions 2 and 3.

**Definition 4 [compact set]:** A subset $\mathcal{X} \subseteq \mathbb{R}^n$ is compact if it is closed and bounded.

From the perspective of Definition 4, Theorem 1 simply states that if we are optimizing over some compact subset of $\mathbb{R}^n$, then we are sure that there exists a global minimizer that we may hope to find computationally.
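A toy illustration of why compactness matters (the function here is my own example, not from the post): $f(x) = e^x$ is continuous on all of $\mathbb{R}$ but attains no global minimum there, since its infimum $0$ is never reached; on the compact interval $[-1, 1]$, however, Theorem 1 guarantees a minimizer, attained at the boundary point $x = -1$.

```python
import numpy as np

f = np.exp  # continuous on R, but its infimum 0 is never attained on the whole line

# Restrict to the compact set [-1, 1]: the minimum now exists (Theorem 1).
grid = np.linspace(-1.0, 1.0, 10001)
values = f(grid)
x_min = grid[values.argmin()]
print(x_min, values.min())  # minimizer at the boundary: x = -1, value f(-1) = 1/e
```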

Now, what happens if we are interested in optimizing over the whole of $\mathbb{R}^n$ rather than over some subset?

This is where the notion of coercive functions becomes useful.

**Definition 5 [coercive function]:** A continuous function $f$ that is defined on all of $\mathbb{R}^n$ is coercive if

$$\lim_{\|x\|_2 \to \infty} f(x) \,=\, +\infty.$$

This means that for any constant $M > 0$ there exists a constant $R > 0$ (that may depend on $M$) such that $f(x) \geq M$ whenever $\|x\|_2 \geq R$.

Intuitively, for a function $f$ to be coercive, it must approach $+\infty$ along any path within $\mathbb{R}^n$ on which $\|x\|_2$ becomes infinite.
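To make the "any path" caveat concrete, here is a small sketch with two hypothetical functions: $f(x) = \|x\|_2^2$ is coercive, while $g(x) = (x_1 - x_2)^2$ is not, because it stays at $0$ along the path $x_1 = x_2$ even though $\|x\|_2 \to \infty$ there:

```python
import numpy as np

def f(x):
    """Coercive: grows without bound along every unbounded path."""
    return float(x @ x)

def g(x):
    """Not coercive: identically zero along the path x1 == x2."""
    return float((x[0] - x[1]) ** 2)

for t in [1.0, 10.0, 100.0, 1000.0]:
    path = np.array([t, t])  # a path along which ||x|| -> infinity
    print(f"||x|| = {np.linalg.norm(path):8.2f}, f = {f(path):10.1f}, g = {g(path)}")
```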

Going back to the example of the least-squares cost functional, we have from the reverse triangle inequality that

$$\|y - Hx\|_2 \,\geq\, \|Hx\|_2 - \|y\|_2 \,\geq\, \alpha\|x\|_2 - \|y\|_2,$$

where the constant $\alpha \triangleq \sqrt{\lambda_{\min}(H^TH)}$, with $\lambda_{\min}(H^TH)$ denoting the smallest eigenvalue of the matrix $H^TH$; the second inequality follows since $\|Hx\|_2^2 = x^TH^THx \geq \lambda_{\min}(H^TH)\|x\|_2^2$.

Thus, when the matrix $H^TH$ is non-singular, *i.e.*, $\lambda_{\min}(H^TH) > 0$, and since

$$\lim_{\|x\|_2 \to \infty} \left(\alpha\|x\|_2 - \|y\|_2\right) \,=\, +\infty,$$

we see that least-squares is indeed a coercive function.
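As a final sanity check, we can verify the inequality $\|y - Hx\|_2 \geq \alpha\|x\|_2 - \|y\|_2$ numerically on random draws of $H$, $y$, and $x$ (the dimensions below are arbitrary, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
H = rng.standard_normal((30, 8))  # tall Gaussian matrix: H^T H non-singular (a.s.)
y = rng.standard_normal(30)

# alpha = sqrt(lambda_min(H^T H)), i.e., the smallest singular value of H
alpha = np.sqrt(np.linalg.eigvalsh(H.T @ H).min())
assert alpha > 0  # H^T H is non-singular, so the least-squares cost is coercive

for _ in range(1000):
    x = 100.0 * rng.standard_normal(8)
    lhs = np.linalg.norm(y - H @ x)
    rhs = alpha * np.linalg.norm(x) - np.linalg.norm(y)
    assert lhs >= rhs - 1e-6  # ||y - Hx|| >= alpha * ||x|| - ||y||
print("bound verified on all random samples")
```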