This post summarizes the basic concepts of constrained optimization. The KKT conditions, along with the first-order necessary condition and the second-order sufficient condition for optimality, are also introduced.
General Definition and Existence Condition
Consider the constrained optimization problem \[ \begin{aligned} \min_x & \quad f(x) \\ \text{s.t.} & \quad g(x) =0 \\ & \quad h(x) \leq 0 \end{aligned} \] where \(x\in\mathbb{R}^n\), \(f:\mathbb{R}^n\rightarrow\mathbb{R}\), \(g:\mathbb{R}^n\rightarrow\mathbb{R}^m\), \(h:\mathbb{R}^n\rightarrow\mathbb{R}^p\).
A point \(x\) satisfying all constraints is an admissible point. The set of all admissible points is the admissible set, denoted as \(\mathcal{X}\).
In this post, it is assumed that \(f, g, h\) are twice differentiable.
An open ball with center \(x^\star\) and radius \(\theta>0\) is \[ B\left(x^{\star}, \theta\right)=\left\{x \in \mathbb{R}^n \mid\left\|x-x^{\star}\right\|<\theta\right\} \]
A point \(x^\star \in \mathcal{X}\) is a constrained local minimizer if there exists \(\theta>0\) such that \[ f(y) \geq f(x^\star) \quad \forall y \in \mathcal{X} \cap B\left(x^{\star}, \theta\right) \]
A point \(x^\star \in \mathcal{X}\) is a constrained global minimizer if \[ f(y) \geq f(x^\star) \quad \forall y \in \mathcal{X} \]
If the \(\geq\) is replaced by \(>\) with the extra condition \(y\neq x^\star\), then \(x^\star\) is a strict constrained local (global) minimizer.
The \(i\)-th inequality constraint is active at \(\tilde{x}\) if \(h_i(\tilde{x})=0\). Otherwise, it is inactive. The index set of active inequality constraints is denoted as \(I_a(\tilde{x}) = \{i\in\{1,\cdots,p\}\mid h_i(\tilde{x})=0\}\). The vector \(h_a(\tilde{x}) = \{h_i(\tilde{x})\mid i\in I_a(\tilde{x})\}\) is the subvector of active inequality constraints.
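As a quick illustration, the active set \(I_a(\tilde{x})\) can be computed numerically by checking which \(h_i\) vanish at a point. The constraints below are made up for this sketch:

```python
import numpy as np

# Hypothetical inequality constraints h(x) <= 0, chosen for illustration:
# h1(x) = x1 + x2 - 1, h2(x) = -x1, h3(x) = -x2
def h(x):
    return np.array([x[0] + x[1] - 1.0, -x[0], -x[1]])

def active_set(x, tol=1e-8):
    """Indices i with h_i(x) = 0 (up to a tolerance), i.e. I_a(x)."""
    return [i for i, hi in enumerate(h(x)) if abs(hi) <= tol]

x_tilde = np.array([1.0, 0.0])   # here h1 = 0 and h3 = 0 are active
print(active_set(x_tilde))       # -> [0, 2]
```

In practice a tolerance is unavoidable: with floating-point arithmetic, an active constraint rarely evaluates to exactly zero.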
Sometimes the equality constraints are also referred to as active constraints.
A point \(\tilde{x}\) is a regular point if the gradients of the active inequality constraints and the equality constraints are linearly independent, i.e., \(\nabla g_i(\tilde{x}),i=1,\cdots,m\) and \(\nabla h_j(\tilde{x}),j\in I_a(\tilde{x})\) are linearly independent. This is also referred to as the linear independence constraint qualification (LICQ).
When the constraints are linear, e.g., \(A\tilde{x} = b\), then \(\tilde{x}\) is a regular point if \(A\) has full row rank.
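A minimal sketch of a numerical LICQ check, assuming the gradients of the active constraints have been stacked as rows of a Jacobian `J`:

```python
import numpy as np

# LICQ check at a point: stack the gradients of the equality constraints
# and of the ACTIVE inequality constraints as rows of J, then test
# whether J has full row rank (rows linearly independent).
def licq_holds(J, tol=1e-10):
    """J: (k, n) matrix whose rows are gradients of active constraints."""
    if J.shape[0] == 0:          # no active constraints: LICQ holds trivially
        return True
    return np.linalg.matrix_rank(J, tol=tol) == J.shape[0]

# Linear constraints Ax = b: regularity <=> A has full row rank.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
print(licq_holds(A))                          # True: rows independent
print(licq_holds(np.array([[1.0, 2.0],
                           [2.0, 4.0]])))     # False: second row = 2 * first
```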
The optimality conditions are relatively simple for regular points (i.e., when the LICQ holds). Consider the Lagrange function \[ \mathcal{L}(x,\lambda,\nu) = f(x) + \lambda^Tg(x) + \nu^Th(x) \]
First-order necessary condition
Suppose \(x^\star\) is a constrained local minimizer and a regular point for the constraints (i.e., the LICQ holds). Then there exist unique multipliers \(\lambda^\star\) and \(\nu^\star\) such that \[ \begin{aligned} & \nabla_x \mathcal{L}\left(x^{\star}, \lambda^{\star}, \nu^{\star}\right)=0 \\ & g\left(x^{\star}\right)=0 \\ & h\left(x^{\star}\right) \leq 0 \\ & \nu^{\star} \geq 0 \\ & \left(\nu^{\star}\right)^T h\left(x^{\star}\right)=0 \end{aligned} \tag{1}\] where Equation 1 is known as the Karush-Kuhn-Tucker (KKT) conditions.
The first condition is the stationarity condition. It states that the gradient of the objective can be written as a weighted sum of the gradients of the constraints.
The last condition is complementary slackness. It implies that if the \(i\)-th inequality constraint is inactive, then the corresponding multiplier \(\nu^\star_i\) is zero.
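The five conditions in Equation 1 can be checked numerically at a candidate point. The sketch below uses a toy problem (not from this post) with a hypothetical candidate multiplier pair:

```python
import numpy as np

# Toy problem for illustration:
#   min x1^2 + x2^2   s.t.   x1 + x2 - 1 = 0,   -x1 <= 0.
grad_f = lambda x: 2.0 * x                      # gradient of the objective
g      = lambda x: np.array([x[0] + x[1] - 1.0])
h      = lambda x: np.array([-x[0]])
grad_g = lambda x: np.array([[1.0, 1.0]])       # rows = constraint gradients
grad_h = lambda x: np.array([[-1.0, 0.0]])

def kkt_satisfied(x, lam, nu, tol=1e-8):
    stationarity = grad_f(x) + grad_g(x).T @ lam + grad_h(x).T @ nu
    return (np.linalg.norm(stationarity) <= tol       # stationarity
            and np.all(np.abs(g(x)) <= tol)           # primal feasibility (eq.)
            and np.all(h(x) <= tol)                   # primal feasibility (ineq.)
            and np.all(nu >= -tol)                    # dual feasibility
            and abs(nu @ h(x)) <= tol)                # complementary slackness

# Candidate: x* = (1/2, 1/2), lambda* = -1, nu* = 0 (constraint inactive).
print(kkt_satisfied(np.array([0.5, 0.5]), np.array([-1.0]), np.array([0.0])))
# -> True
```

Note the multiplier \(\nu^\star = 0\) here, consistent with complementary slackness: the constraint \(-x_1 \leq 0\) is inactive at the minimizer.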
Other constraint qualifications (CQ) or regularity conditions are also available, e.g., the Mangasarian-Fromovitz constraint qualification (MFCQ). For convex optimization, Slater’s condition is also a popular choice.
In Boyd’s book, the necessary condition is stated as,
For any optimization problem with differentiable objective and constraint functions for which strong duality obtains, any pair of primal and dual optimal points must satisfy the KKT conditions. In this sense, the condition of strong duality plays the role of a CQ.
Strict Complementarity Condition
Let \(x^\star\) be a local solution and \(\nu^\star\) the multipliers of the inequality constraints. The strict complementarity condition holds if \(\nu^\star_i>0\) for all \(i\in I_a(x^\star)\). This condition implies that exactly one of \(\nu^\star_i\) and \(h_i(x^\star)\) is zero for each \(i\).
Second-order sufficient condition
1). Assume that there exist \(x^\star, \lambda^\star, \nu^\star\) such that the KKT conditions hold. 2). Suppose moreover that \(\nu^\star\) is such that the strict complementarity condition holds at \(x^\star\). 3). Assume finally that \[ s^T\nabla^2_{xx}\mathcal{L}(x^\star,\lambda^\star,\nu^\star)s > 0 \] for all \(s\neq 0\) satisfying \[ \begin{bmatrix} \nabla_x g(x^\star) \\ \nabla_x h_a(x^\star) \end{bmatrix} s = 0. \] Then \(x^\star\) is a strict constrained local minimizer.
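The curvature condition in step 3 can be tested numerically: build a basis \(Z\) of the null space of the stacked constraint Jacobian and check that the reduced Hessian \(Z^T \nabla^2_{xx}\mathcal{L}\, Z\) is positive definite. A sketch with illustrative matrices:

```python
import numpy as np

def reduced_hessian_pd(H, J, tol=1e-10):
    """True if s^T H s > 0 for all s != 0 with J s = 0.

    H: Hessian of the Lagrangian at the candidate point.
    J: stacked Jacobian of equality and active inequality constraints.
    """
    if J.size == 0:
        Z = np.eye(H.shape[0])        # no constraints: test H itself
    else:
        # Null-space basis of J from the SVD: right singular vectors
        # whose singular values are (numerically) zero.
        _, s, Vt = np.linalg.svd(J)
        rank = np.sum(s > tol)
        Z = Vt[rank:].T
    if Z.shape[1] == 0:               # trivial null space: holds vacuously
        return True
    return np.all(np.linalg.eigvalsh(Z.T @ H @ Z) > tol)

# Example: H = diag(2, -1) is indefinite, but with constraint gradient
# (0, 1) the null space is {s : s2 = 0}, where s^T H s = 2*s1^2 > 0.
H = np.diag([2.0, -1.0])
J = np.array([[0.0, 1.0]])
print(reduced_hessian_pd(H, J))                  # True
print(reduced_hessian_pd(H, np.empty((0, 2))))   # False: H itself indefinite
```

The example shows why the condition only requires positive curvature on the constraint null space: the Hessian may be indefinite on the whole space yet still certify a strict local minimizer.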
The necessary and sufficient conditions become global if the optimization problem is convex.
Conditions for Convex Optimization
As mentioned above, a local (strict) constrained minimizer is also a global (strict) constrained minimizer if the optimization problem is convex.
A convex optimization problem has the form \[ \begin{aligned} \min_x & \quad f(x) \\ \text{s.t.} & \quad Ax =b \\ & \quad h_i(x) \leq 0, \quad i=1,\cdots,p \end{aligned} \] where \(f\) and the \(h_i\) are convex and the equality constraints are affine.
In Boyd’s book, the constraint qualification is defined as
There are many results that establish conditions on the problem, beyond convexity, under which strong duality holds. These conditions are called constraint qualifications.
A commonly used CQ for convex optimization is Slater’s condition: there exists \(x\in \operatorname{relint}\mathcal{D}\), where \(\mathcal{D}\) is the domain of the problem, such that \(h_i(x)<0\) for all \(i=1,\cdots,p\) and \(Ax = b\). Slater’s theorem states that for convex optimization, strong duality holds if Slater’s condition holds.
Returning to the necessary condition: for a convex problem, if Slater’s condition holds, then strong duality holds, and the KKT conditions in Equation 1 must be satisfied at any primal-dual optimal pair.
I.e., convex + Slater’s condition -> strong duality -> KKT conditions.
For sufficiency: for a convex optimization problem, the KKT conditions are sufficient for (primal and dual) optimality. I.e., any pair of \(\tilde{x}, \tilde{\lambda}, \tilde{\nu}\) satisfying the KKT conditions is primal and dual optimal.
I.e., convex + KKT -> optimality.
With Slater’s condition, the KKT conditions are both necessary and sufficient for (primal and dual) optimality (zero duality gap) for convex problems.
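For an equality-constrained convex quadratic program, this equivalence is especially concrete: the KKT conditions reduce to a linear system that can be solved directly. A sketch with made-up data:

```python
import numpy as np

# Convex QP with equality constraints only:
#   min (1/2) x^T P x + q^T x   s.t.   A x = b,   with P positive definite.
# Stationarity and primal feasibility form the KKT linear system
#   [P  A^T] [x  ]   [-q]
#   [A   0 ] [lam] = [ b]
P = np.array([[2.0, 0.0], [0.0, 2.0]])
q = np.array([0.0, 0.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

n, m = P.shape[0], A.shape[0]
K = np.block([[P, A.T], [A, np.zeros((m, m))]])
sol = np.linalg.solve(K, np.concatenate([-q, b]))
x_star, lam_star = sol[:n], sol[n:]
print(x_star)     # -> [0.5 0.5], the minimizer of ||x||^2 on x1 + x2 = 1
print(lam_star)   # -> [-1.], the equality multiplier
```

Since the problem is convex and affine-constrained, solving this KKT system yields the global optimum directly, matching the necessary-and-sufficient role of the KKT conditions discussed above.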