Benders Decomposition for LP
Formulation
Benders decomposition is typically used to solve LPs with complicating variables. Complicating variables are those variables that prevent the LP from being solved (1) in a distributed fashion or (2) in a straightforward way.
Consider the following LP: \[ \begin{aligned} \min_{x,y} & \quad \sum_{i=1}^n c_i x_i + \sum_{j=1}^m d_j y_j \\ \text{s.t.} & \quad \sum_{i=1}^n a_{li} x_i + \sum_{j=1}^m b_{lj} y_j \leq b^{(l)}, \quad l=1,\ldots,q \\ & \quad 0 \leq x_i \leq x_i^{up}, \quad i=1,\ldots,n \\ & \quad 0 \leq y_j \leq y_j^{up}, \quad j=1,\ldots,m \end{aligned} \tag{1}\] where \(x_i^{up}\) and \(y_j^{up}\) are the upper bounds of \(x_i\) and \(y_j\), respectively.
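To make the steps below concrete, the code sketches in the later sections use the following small hypothetical instance of Equation 1, written in Python/NumPy. The array names mirror the symbols in the formulation; the numerical values are made up for illustration only.

```python
# A small hypothetical instance of Equation 1, used by the code sketches below.
# n = 2 complicating variables x, m = 2 variables y, q = 2 coupling constraints.
import numpy as np

c = np.array([-2.0, -3.0])        # costs c_i of the complicating variables x
d = np.array([-1.0, -4.0])        # costs d_j of the variables y
A = np.array([[1.0, 2.0],         # coefficients a_{li} of x in constraint l
              [3.0, 1.0]])
B = np.array([[1.0, 1.0],         # coefficients b_{lj} of y in constraint l
              [2.0, 1.0]])
b = np.array([14.0, 18.0])        # right-hand sides b^{(l)}
x_up = np.array([4.0, 4.0])       # upper bounds x_i^{up}
y_up = np.array([5.0, 5.0])       # upper bounds y_j^{up}
```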
If the \(x_i\) are known, the remaining LP in the \(y_j\) is much easier to solve (and, when the constraints separate over \(j\), it can even be solved in a distributed fashion for each \(j=1,\ldots,m\)). Benders decomposition exploits this structure: it solves subproblems in the \(y_j\) efficiently, at the cost of additional iterations.
Introduce the value function: for given \(x\) (treated as a constant), \[ \begin{aligned} \alpha(x) = \min_{y} & \quad \sum_{j=1}^m d_j y_j \\ \text{s.t.} & \quad \sum_{j=1}^m b_{lj} y_j \leq b^{(l)} - \sum_{i=1}^n a_{li} x_i, \quad l=1,\ldots,q \\ & \quad 0 \leq y_j \leq y_j^{up}, \quad j=1,\ldots,m \end{aligned} \tag{2}\] which is convex in \(x\) (the proof can be found in Boyd’s book). Therefore, \(\alpha(x)\) can be bounded from below by a supporting hyperplane: \[ \alpha(x) \geq \alpha(\tilde{x}) + \nabla \alpha(\tilde{x})^T (x - \tilde{x}) \] where \(\tilde{x}\) is a given point and \(\nabla \alpha(\tilde{x})\) is a (sub)gradient of \(\alpha\) at \(\tilde{x}\).
The original LP can then be written as a bilevel optimization: \[ \begin{aligned} \min_{x} & \quad \sum_{i=1}^n c_i x_i + \alpha(x) \\ \text{s.t.} & \quad 0 \leq x_i \leq x_i^{up}, \quad i=1,\ldots,n \end{aligned} \] where the lower-level problem is the value function \(\alpha(x)\) in Equation 2.
The idea of Benders decomposition is now straightforward. In the original LP Equation 1, fixing the \(x_i\) yields an upper bound on the optimal value. In the bilevel reformulation, replacing the value function \(\alpha(x)\) of Equation 2 by its lower (supporting-hyperplane) approximation yields a lower bound. We can therefore iteratively tighten the upper and lower bounds until they are close enough.
Algorithm
The iteration index is denoted as \((k)\). The algorithm is as follows:
Step 0: Initialize \(x^{(0)}\) and \(\alpha^{(0)}\) by solving \[ \begin{aligned} \min_{x,\alpha} & \quad \sum_{i=1}^n c_i x_i + \alpha \\ \text{s.t.} & \quad 0 \leq x_i \leq x_i^{up}, \quad i=1,\ldots,n \\ & \quad \alpha \geq \alpha^{down} \\ \end{aligned} \] where \(\alpha^{down}\) is a given lower bound of \(\alpha(x)\). Note that this optimization attains its solution at the bounds: each \(x_i\) equals \(0\) or \(x_i^{up}\) depending on the sign of \(c_i\), and \(\alpha^{(0)} = \alpha^{down}\).
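As a quick illustration (a sketch only, using the hypothetical instance above and an assumed value for \(\alpha^{down}\)), the initial master problem separates over its variables, so its minimizer can be written down directly:

```python
# Step 0 sketch: the initial master problem is separable, so each x_i sits at a
# bound determined by the sign of c_i, and alpha sits at its lower bound.
import numpy as np

alpha_down = -1e3                        # assumed valid lower bound on alpha(x)
x_k = np.where(c >= 0, 0.0, x_up)        # x_i = 0 if c_i >= 0, else x_i^{up}
alpha_k = alpha_down
```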
Step 1: Solve the subproblem (the lower-level problem Equation 2), with \(x\) fixed through explicit constraints: \[ \begin{aligned} \min_{x,y} & \quad \sum_{j=1}^m d_j y_j \\ \text{s.t.} & \quad \sum_{j=1}^m b_{lj} y_j \leq b^{(l)} - \sum_{i=1}^n a_{li} x_i, \quad l=1,\ldots,q \\ & \quad 0 \leq y_j \leq y_j^{up}, \quad j=1,\ldots,m \\ & \quad x_i = x_i^{(k)} \quad : \lambda_i, \quad i=1,\ldots,n \end{aligned} \tag{3}\] where \(\lambda_i\) is the dual variable of the pinning constraint \(x_i = x_i^{(k)}\). The sensitivity \(\nabla \alpha(x^{(k)})\) equals the vector of dual variables \(\lambda\). The solution is denoted \(y^{(k)}, \lambda^{(k)}\).
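A minimal sketch of Step 1 using SciPy's HiGHS-based `linprog`, assuming the instance data above, the point `x_k` from the Step 0 sketch, and that the subproblem is feasible at `x_k`. The duals of the pinning constraints \(x = x^{(k)}\) are read from `res.eqlin.marginals`, which SciPy documents as the sensitivity of the objective to the constraint right-hand sides, i.e. exactly \(\lambda = \nabla\alpha(x^{(k)})\).

```python
# Step 1 sketch: solve the subproblem of Equation 3 for fixed x_k and read off
# the duals lambda of the pinning constraints x = x^(k).
import numpy as np
from scipy.optimize import linprog

def solve_subproblem(x_k):
    """Solve Equation 3 for fixed x_k; return (alpha(x_k), y^(k), lambda^(k))."""
    n, m = A.shape[1], B.shape[1]
    obj = np.concatenate([np.zeros(n), d])                 # x carries no cost here
    A_ub = np.hstack([A, B])                               # coupling: A x + B y <= b
    A_eq = np.hstack([np.eye(n), np.zeros((n, m))])        # pinning: x = x_k
    bounds = [(0, xu) for xu in x_up] + [(0, yu) for yu in y_up]
    res = linprog(obj, A_ub=A_ub, b_ub=b, A_eq=A_eq, b_eq=x_k,
                  bounds=bounds, method="highs")
    y_k = res.x[n:]
    lam_k = res.eqlin.marginals                            # lambda_i = d alpha / d x_i^(k)
    return res.fun, y_k, lam_k

alpha_xk, y_k, lam_k = solve_subproblem(x_k)
```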
Step 2: Check convergence. The upper bound is given by the subproblem: \[ U^{(k)} = \sum_{i=1}^n c_i x_i^{(k)} + \sum_{j=1}^m d_j y_j^{(k)} \]
While the lower bound is given by the master problem: \[ L^{(k)} = \sum_{i=1}^n c_i x_i^{(k)} + \alpha^{(k)} \]
The gap is defined as: \[ U^{(k)} - L^{(k)} = \sum_{j=1}^m d_j y_j^{(k)} - \alpha^{(k)} \]
If the gap is small enough, stop. Otherwise, update \(k = k + 1\) and go to Step 3.
Step 3: Update the master problem: \[ \begin{aligned} \min_{x,\alpha} & \quad \sum_{i=1}^n c_i x_i + \alpha \\ \text{s.t.} & \quad 0 \leq x_i \leq x_i^{up}, \quad i=1,\ldots,n \\ & \quad \alpha \geq \sum_{j=1}^m d_j y_j^{(v)} + \sum_{i=1}^n \lambda_i^{(v)} (x_i - x_i^{(v)}), \quad v = 1, \cdots, k-1 \\ & \quad \alpha \geq \alpha^{down} \end{aligned} \]
Note that all previously generated cuts (the Benders cuts) are kept, so the master problem grows by one constraint per iteration. Go to Step 1.
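Putting the steps together, here is a sketch of the full loop for the hypothetical instance above. It reuses `solve_subproblem` from the Step 1 sketch; `alpha_down` (a valid lower bound on \(\alpha(x)\)), `tol`, and the iteration cap are assumed values.

```python
# Sketch of the full Benders loop (Steps 0-3) for the hypothetical instance
# above, reusing solve_subproblem from the Step 1 sketch.
import numpy as np
from scipy.optimize import linprog

alpha_down = -1e3                      # assumed valid lower bound on alpha(x)
tol = 1e-6                             # stopping tolerance on the gap
cuts = []                              # each cut stores (lambda_v, x_v, subproblem value)

# Step 0: initial master problem (its solution lies at the bounds).
x_k = np.where(c >= 0, 0.0, x_up)
alpha_k = alpha_down

for k in range(50):
    # Step 1: subproblem at x_k -> alpha(x_k), y^(k), lambda^(k).
    sub_val, y_k, lam_k = solve_subproblem(x_k)

    # Step 2: bounds and convergence check.
    U = c @ x_k + sub_val              # upper bound, from the subproblem
    L = c @ x_k + alpha_k              # lower bound, from the master problem
    if U - L <= tol:
        break

    # Step 3: master problem with all Benders cuts
    #     alpha >= sub_val_v + lambda_v @ (x - x_v),
    # rewritten as  lambda_v @ x - alpha <= lambda_v @ x_v - sub_val_v  for linprog.
    cuts.append((lam_k, x_k.copy(), sub_val))
    n = len(c)
    obj = np.concatenate([c, [1.0]])   # master variables are (x, alpha)
    A_ub = np.array([np.concatenate([lam, [-1.0]]) for lam, _, _ in cuts])
    b_ub = np.array([lam @ xv - sv for lam, xv, sv in cuts])
    bounds = [(0, xu) for xu in x_up] + [(alpha_down, None)]
    res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    x_k, alpha_k = res.x[:n], res.x[n]

print("x =", x_k, "y =", y_k, "upper bound =", U, "lower bound =", L)
```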
Dealing with Infeasibility
The subproblem Equation 3 may be infeasible for the given \(x^{(k)}\). Therefore, a slack variable \(s_l \geq 0\) is introduced for each constraint: \[ \begin{aligned} \min_{y,s} & \quad \sum_{j=1}^m d_j y_j + M \sum_{l=1}^q s_l \\ \text{s.t.} & \quad \sum_{j=1}^m b_{lj} y_j \leq b^{(l)} - \sum_{i=1}^n a_{li} x_i + s_l, \quad l=1,\ldots,q \\ & \quad 0 \leq y_j \leq y_j^{up}, \quad j=1,\ldots,m \\ & \quad s_l \geq 0, \quad l=1,\ldots,q \end{aligned} \] where \(M\) is a large positive number.
Note that this is equivalent to adding the slack variables to the original LP Equation 1 and grouping \(s\) with the subproblem variables.
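A sketch of this slack-augmented subproblem for the same hypothetical instance: one slack \(s_l\) per coupling constraint and an assumed penalty weight `M`. It is written so it can replace `solve_subproblem` in the loop above, and it stays feasible for any \(x^{(k)}\).

```python
# Slack-augmented subproblem sketch: variables are [x (n), y (m), s (q)].
import numpy as np
from scipy.optimize import linprog

M = 1e4                                                    # assumed big-M penalty

def solve_subproblem_with_slack(x_k):
    n, m, q = A.shape[1], B.shape[1], b.shape[0]
    obj = np.concatenate([np.zeros(n), d, M * np.ones(q)])
    A_ub = np.hstack([A, B, -np.eye(q)])                   # A x + B y - s <= b
    A_eq = np.hstack([np.eye(n), np.zeros((n, m + q))])    # pinning: x = x_k
    bounds = ([(0, xu) for xu in x_up] + [(0, yu) for yu in y_up]
              + [(0, None)] * q)
    res = linprog(obj, A_ub=A_ub, b_ub=b, A_eq=A_eq, b_eq=x_k,
                  bounds=bounds, method="highs")
    y_k = res.x[n:n + m]
    lam_k = res.eqlin.marginals
    return res.fun, y_k, lam_k
```

Since the loop above builds cuts and bounds from the subproblem's objective value, swapping this version in automatically includes the penalty term \(M \sum_l s_l\), consistent with the view that the slacks are simply grouped into the subproblem.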
Benders Decomposition for MILP
The algorithm for MILP is very similar to the LP case. If \(x\) is a set of integer variables (\(x_i\in\mathbb{N}\)), then treating \(x\) as the complicating variables results in a small MILP master problem (giving the lower bound; note that \(\alpha\) remains a continuous variable in the master problem) and an LP subproblem (giving the upper bound).
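As a sketch, the only change to the loop above is the master problem, which becomes an MILP. The following assumes SciPy ≥ 1.9 (`scipy.optimize.milp`), the instance data above, and a non-empty cut list `cuts` built as in the LP loop; the \(x\) variables are declared integer while \(\alpha\) stays continuous.

```python
# MILP master problem sketch: minimize c @ x + alpha over integer x and
# continuous alpha, subject to the Benders cuts collected so far.
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

def solve_milp_master(cuts, alpha_down=-1e3):
    n = len(c)
    obj = np.concatenate([c, [1.0]])                       # variables are (x, alpha)
    # Benders cuts: lambda_v @ x - alpha <= lambda_v @ x_v - (subproblem value)_v
    A_cut = np.array([np.concatenate([lam, [-1.0]]) for lam, _, _ in cuts])
    ub_cut = np.array([lam @ xv - sv for lam, xv, sv in cuts])
    constraints = LinearConstraint(A_cut, -np.inf, ub_cut)
    bounds = Bounds(np.concatenate([np.zeros(n), [alpha_down]]),
                    np.concatenate([x_up, [np.inf]]))
    integrality = np.concatenate([np.ones(n), [0.0]])      # x integer, alpha continuous
    res = milp(obj, constraints=constraints, bounds=bounds, integrality=integrality)
    return res.x[:n], res.x[n]
```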