Hi! I took the optimization course this semester (Fall 2018), offered to master's students under the supervision of Dr. Maryam Amirmazlaghani at AUT. Since many of my friends and other students have asked me how the course is run and what its main topics are, I decided to write this article, sharing my own experiences to clarify what you can learn by taking it. Feel free to contribute to this article, and I hope you'll find it useful!
- Sources
- Introduction
- Solving Optimization Problems
- Unconstrained Optimization
- Unconstrained Optimization Applications
- Constrained Optimization
First, here are the source materials used in this course.
- Convex Optimization - Boyd and Vandenberghe
- All of the chapters were covered (sparsely for sure :D)
- Numerical Optimization - Jorge Nocedal, Stephen J. Wright
- Chapters 2 to 5 (also sparsely)
Also, the slides used to teach in the classroom can be found here (Available for AUT students only!).
The course started with an introduction to the topic of optimization, then moved on to defining optimization problems and the algorithms used to solve them, and to further categorizing both.
Optimization is portrayed in terms of:
- Defining an objective (a quantitative measure of performance)
- Finding the variables the objective depends on
- Finding the values of the variables that optimize the objective
Mathematically, we model our problem as an optimization problem, and then use an optimization algorithm to solve it.
Optimization problems are a set of equations in the form below, in which x is the input vector of variables, f is the objective function, and the Ci's are the constraint functions (scalar functions of x) that define certain equations and inequalities x must satisfy:

$$\min_{x \in \mathbb{R}^n} f(x) \quad \text{subject to} \quad c_i(x) = 0,\ i \in \mathcal{E}, \qquad c_i(x) \ge 0,\ i \in \mathcal{I}$$

The set of x's that satisfy all of the Ci's and lie in the domain of f is called the feasible set of the problem. If there are no Ci's, the problem is called an unconstrained optimization problem; otherwise, it's a constrained (or general) optimization problem.
Optimization algorithms are typically iterative: they begin with an initial guess of x and generate a sequence of improved estimates until they terminate (i.e., satisfy a certain stopping criterion). The strategy used to improve the estimate is the key difference between algorithms.
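As a concrete toy illustration of this iterative template (my own sketch, not course code), here's plain gradient descent on a small quadratic, where the improvement strategy is simply stepping against the gradient:

```python
import numpy as np

def iterative_minimize(grad, x0, step=0.05, tol=1e-8, max_iter=1000):
    """Generic iterative template: improve an initial guess until a
    termination criterion (small gradient) is satisfied."""
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:   # termination criterion
            break
        x = x - step * g              # improvement strategy: gradient descent
    return x

# Minimize f(x) = ||A x - b||^2; its gradient is 2 A^T (A x - b).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 0.0])
grad = lambda x: 2 * A.T @ (A @ x - b)

x_star = iterative_minimize(grad, np.zeros(2))
print(x_star)  # close to the exact solution (0.4, -0.2) of A x = b
```

Note that the fixed step length here was picked small enough for this particular problem; choosing the direction and step more cleverly is exactly what the algorithms discussed later address.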
As you might have guessed already, optimization can be used in a vast variety of fields to solve many kinds of problems, from a company's employee-allocation problems, to finding the best exchange portfolio for a person's budget, to some well-known (and, these days at least, hot) topics like finding the optimal parameters for a wide variety of machine learning algorithms, such as:
- Different Regression Algorithms
- Linear and Kernel SVMs
- K-Means Clustering
- Neural Network Training
- and so many more!
As they say nowadays:
Optimization is the heart of machine learning.
Solving optimization problems in their general form is a really tough challenge! There are currently (and probably always :D) no efficient algorithms for solving the general optimization problem. Instead, we can efficiently solve some specific classes of problems, namely linear programs, least-squares problems, and convex problems. General problems are usually tackled by reducing them to convex subproblems and solving those.
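As a taste of why these special classes are tractable (a quick NumPy illustration of my own): a least-squares problem is solved essentially in closed form via the normal equations.

```python
import numpy as np

# Least squares: minimize ||A x - b||^2 over x, one of the special
# problem classes that can be solved efficiently and reliably.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 3))   # overdetermined system
b = rng.standard_normal(20)

x, *_ = np.linalg.lstsq(A, b, rcond=None)

# The minimizer satisfies the normal equations A^T A x = A^T b:
print(np.allclose(A.T @ A @ x, A.T @ b))  # True
```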
The course then goes through defining convex sets, convex functions, and some related linear algebra, along with methods for checking whether a function or set is convex, including some problem equivalences that help establish convexity. It then uses these concepts to define and verify convex problems, and studies some properties of these problems that help us build efficient algorithms for solving them.
Another type of convex problem discussed is the multi-objective optimization problem, which introduces the idea of vector-valued objective functions (optimizing several objectives at the same time). This leads to the notions of Pareto optimality and scalarization for these kinds of problems.
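To illustrate scalarization with a toy example of my own: take two competing scalar objectives; minimizing their weighted sum for different weights traces out the Pareto-optimal trade-offs between them.

```python
# Weighted-sum scalarization of two competing objectives:
#   f1(x) = (x - 1)^2   (wants x = 1)
#   f2(x) = (x + 1)^2   (wants x = -1)
# Setting the derivative of w*f1 + (1 - w)*f2 to zero gives the
# minimizer x = 2w - 1 in closed form.
def scalarized_min(w):
    return 2 * w - 1.0

for w in [0.0, 0.25, 0.5, 0.75, 1.0]:
    print(w, scalarized_min(w))
# Sweeping w from 0 to 1 moves the solution from -1 (optimal for f2)
# to 1 (optimal for f1): each point is a Pareto-optimal trade-off.
```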
Then the course moves on to introducing algorithms for solving unconstrained problems. The algorithms discussed are divided into two types: line search methods and trust region methods.
Line search methods perform two main steps in each iteration:
- Choosing the direction in which we'll take the next step toward the optimal point. Approaches for this step include:
- Calculating the step length to take along the chosen direction in each iteration. Approaches for this part include:
- Backtracking step-length selection using the Wolfe conditions
- Cubic interpolation step length selection method
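As a sketch of the backtracking idea (my own illustration, enforcing only the sufficient-decrease/Armijo part of the Wolfe conditions): start from a full step and shrink it until the function decreases enough.

```python
import numpy as np

def backtracking(f, grad, x, p, c1=1e-4, rho=0.5, alpha=1.0):
    """Shrink alpha until the Armijo (sufficient decrease) condition
    f(x + alpha p) <= f(x) + c1 * alpha * grad(x)^T p holds."""
    fx, gx = f(x), grad(x)
    while f(x + alpha * p) > fx + c1 * alpha * (gx @ p):
        alpha *= rho
    return alpha

f = lambda x: x @ x            # f(x) = ||x||^2
grad = lambda x: 2 * x
x = np.array([1.0, 2.0])
p = -grad(x)                   # steepest-descent direction
alpha = backtracking(f, grad, x, p)
print(alpha)  # 0.5: the full step overshoots, half a step reaches the minimum
```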
Trust region methods define a region around the current point, then find a direction and a step length simultaneously so as to move to another point within that region; several algorithms of this type were discussed in this part.
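A minimal sketch of this region-based (trust-region) template, my own illustration using the Cauchy point as a deliberately simple way to solve the subproblem: the quadratic model is trusted only inside a radius that grows or shrinks depending on how well it predicted the actual decrease.

```python
import numpy as np

def trust_region(f, grad, B, x, delta=1.0, tol=1e-6, max_iter=100):
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        # Cauchy point: minimize the quadratic model along -g within the region.
        gBg = g @ B @ g
        tau = 1.0 if gBg <= 0 else min(1.0, np.linalg.norm(g) ** 3 / (delta * gBg))
        p = -tau * (delta / np.linalg.norm(g)) * g
        predicted = -(g @ p + 0.5 * p @ B @ p)
        rho = (f(x) - f(x + p)) / predicted   # model quality ratio
        if rho > 0.75:
            delta *= 2.0    # model is good: expand the region
        elif rho < 0.25:
            delta *= 0.25   # model is poor: shrink the region
        if rho > 0.1:
            x = x + p       # accept the step
    return x

# Quadratic test problem f(x) = 0.5 x^T B x - b^T x (minimum at B x = b).
B = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
f = lambda x: 0.5 * x @ B @ x - b @ x
grad = lambda x: B @ x - b

x_star = trust_region(f, grad, B, np.zeros(2))
print(x_star)  # close to np.linalg.solve(B, b)
```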
I've implemented these algorithms and demonstrated their example usage in this repository as a homework for this course: Unconstrained Optimization on GitHub
Then two main applications of unconstrained optimization are described: approximation problems and statistical estimation.
Approximation covers several topics:
- Norm Approximation
- Least Norm
- Least Penalty
- Regularized Least Penalty (Multi Objective)
- Optimal Input Design
- Robust Approximation
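As a small sketch of the regularized, multi-objective flavor (a toy example of my own): Tikhonov-regularized least squares trades the residual norm against the solution norm via a weight lam.

```python
import numpy as np

# Regularized approximation: minimize ||A x - b||^2 + lam * ||x||^2,
# a scalarized bi-objective problem; the minimizer solves
# (A^T A + lam * I) x = A^T b.
def regularized_ls(A, b, lam):
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 5))
b = rng.standard_normal(30)

for lam in [0.0, 1.0, 100.0]:
    x = regularized_ls(A, b, lam)
    print(lam, np.linalg.norm(A @ x - b), np.linalg.norm(x))
# Increasing lam shrinks ||x|| at the cost of a larger residual.
```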
Statistical estimation includes estimating the parameters of a known distribution and solving the well-known MLE and MAP problems (using the methods described), which are used widely in machine learning.
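As a quick MLE sketch (my own example): for i.i.d. Gaussian samples the maximum-likelihood estimates have a closed form, while for harder likelihoods the same negative log-likelihood would be minimized with the iterative methods above.

```python
import numpy as np

# MLE for a Gaussian: maximizing the likelihood (equivalently,
# minimizing the negative log-likelihood) over (mu, sigma^2) gives
# the sample mean and the (biased) sample variance in closed form.
rng = np.random.default_rng(42)
samples = rng.normal(loc=3.0, scale=2.0, size=10_000)

mu_hat = samples.mean()
var_hat = ((samples - mu_hat) ** 2).mean()
print(mu_hat, var_hat)  # close to the true values 3.0 and 4.0
```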
The course then moves on to defining algorithms for solving constrained optimization problems. Since duality is used in these algorithms, and elsewhere for solving such problems (as we'll see shortly), duality theory and some of its applications in optimization are discussed first and then applied to solving constrained problems.
According to duality theory, we can form a dual problem for every optimization problem, which gives us a lower bound on p*, the optimal value of the original problem. Two types of duality follow (d* is the optimal value of the dual problem):
- Weak Duality: p* >= d*
- Strong Duality: p* = d*
The difference p* - d* is called the duality gap between the original and dual problems. Thus, the dual problem can be used to find (exact or inexact) lower bounds on the optimal value of the original problem. Conditions such as Slater's condition and the KKT conditions can then be checked to establish strong duality.
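A small numeric sketch of my own (using SciPy's linprog): for a feasible linear program strong duality holds, so solving the primal and its hand-derived dual gives the same optimal value and a (numerically) zero duality gap.

```python
import numpy as np
from scipy.optimize import linprog

# Primal LP:  minimize c^T x  s.t.  A x >= b, x >= 0
# Dual LP:    maximize b^T y  s.t.  A^T y <= c, y >= 0
c = np.array([2.0, 3.0])
A = np.array([[1.0, 1.0], [1.0, 2.0]])
b = np.array([3.0, 4.0])

primal = linprog(c, A_ub=-A, b_ub=-b, bounds=(0, None))   # A x >= b as -A x <= -b
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=(0, None))    # max b^T y as min -b^T y

p_star, d_star = primal.fun, -dual.fun
print(p_star, d_star, p_star - d_star)  # equal values, gap ~ 0
```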
The SVM, for example, is a well-known problem that can be solved through its dual.
Finally, the course moves on to solving constrained problems. First we consider equality-constrained problems, in which every Ci is an equality constraint and there is at least one Ci.
The methods for solving these problems include:
- Eliminating Equality Constraints
- Solving Dual Problem of the original problem
- Newton’s method
- This is a modification of the original Newton algorithm that takes the equality constraints into account, using duality to find a step that both moves toward the optimal point and keeps it feasible with respect to the equality constraints.
- There is also another modification that allows this method to start from an infeasible point.
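For a quadratic objective with equality constraints, the Newton step is exact: it amounts to solving the (linear) KKT system. A sketch of my own:

```python
import numpy as np

# Equality-constrained QP:
#   minimize 0.5 x^T P x + q^T x   subject to   A x = b
# The KKT conditions are linear, so one solve gives the optimum:
#   [P  A^T] [x ]   [-q]
#   [A   0 ] [nu] = [ b]
P = np.array([[4.0, 1.0], [1.0, 2.0]])
q = np.array([1.0, 1.0])
A = np.array([[1.0, 1.0]])   # single constraint: x1 + x2 = 1
b = np.array([1.0])

n, m = P.shape[0], A.shape[0]
kkt = np.block([[P, A.T], [A, np.zeros((m, m))]])
sol = np.linalg.solve(kkt, np.concatenate([-q, b]))
x, nu = sol[:n], sol[n:]   # primal solution and Lagrange multiplier

print(x, A @ x - b)  # x satisfies the constraint exactly
```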
As the last topic, we move on to solving the general constrained optimization problem.
The approach taken here is to approximately reformulate the inequality-constrained problem as an equality-constrained one. We use a logarithmic barrier to make this approximation work, which gives us a central path parameterized by the barrier parameter (called t). Two algorithms are then used to compute the optimal value using the central-path idea:
- Log Barrier Method
- The idea behind this method is to choose a small t and a random starting point at the beginning, and then increase t according to a certain criterion until the desired tolerance is reached at each step.
- Primal Dual Interior Point Method
- The idea behind this algorithm is to use a modified version of the KKT conditions to compute a primal-dual search direction at each step, combined with a line search method to take a step in each iteration, continuing until the primal and dual residuals reach the desired tolerance.
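To make the barrier idea concrete, here is a toy sketch of my own (one variable, one inequality): the constraint is folded into a log barrier, each centering problem is solved by damped Newton, and t is increased until m/t, which bounds the suboptimality, is small.

```python
# Barrier-method sketch for the toy problem
#   minimize x^2   subject to   x >= 1      (optimum at x = 1)
# The centering problem is: minimize t*x^2 - log(x - 1).

def centering(x, t, iters=50):
    """Damped Newton steps on the barrier-augmented objective."""
    for _ in range(iters):
        grad = 2 * t * x - 1.0 / (x - 1)
        hess = 2 * t + 1.0 / (x - 1) ** 2
        step = grad / hess
        while x - step <= 1:   # crude damping to stay strictly feasible
            step /= 2
        x -= step
    return x

x, t, mu, m = 2.0, 1.0, 10.0, 1   # start, barrier parameter, growth factor, #constraints
while m / t > 1e-8:               # m/t bounds the suboptimality gap
    x = centering(x, t)
    t *= mu
print(x)  # close to the constrained optimum x = 1
```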
I've implemented these two algorithms for a QP problem and demonstrated their example usage in this repository as a homework for this course: Constrained Optimization for QP on GitHub