Convex Optimization

This repository provides implementations of algorithms for several convex optimization problems. All code is written in Python 3, using TensorFlow, NumPy, and CVXPY. Jupyter notebooks are provided to present the analyses.

Note: These are the implementations and analyses for selected assignments and the project of the Convex Optimization course taught by Prof. Jafari Siavoshani in Spring 2020.

Table of Contents

  • Variance Reduction in Stochastic Gradient Descent
  • An Analysis of First-Order and Second-Order Methods
  • Inequality-Constrained Quadratic Optimization with Second-Order Methods
  • Ridge Classification With Proximal Gradient Descent
  • Environment Requirements
  • Resources

Variance Reduction in Stochastic Gradient Descent

Gradient descent and stochastic gradient descent (SGD) are efficient methods that play an essential role in optimization; in large-scale problems (e.g., deep learning), SGD and its variants are especially popular. However, SGD converges slowly in the asymptotic regime because its gradient estimates have high variance, so researchers have introduced variance reduction techniques to accelerate convergence.

One such algorithm is stochastic variance reduced gradient (SVRG). I implemented a simple architecture for the classification problem on the MNIST dataset and trained it with SGD, mini-batch gradient descent, and SVRG. This notebook gives an intuitive analysis of how SVRG behaves compared with plain SGD.
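
For reference, the following is a minimal NumPy sketch of the SVRG update rule on a generic finite-sum objective. The grad_fn interface, the toy least-squares problem, and all hyperparameters are illustrative assumptions and do not reproduce the TensorFlow model used in the notebook.

```python
import numpy as np

def svrg(grad_fn, w0, n, lr=0.1, epochs=10, inner_steps=None, seed=0):
    """Minimal SVRG sketch.

    grad_fn(w, idx) must return the average gradient over the samples in idx;
    idx=None means the full dataset.
    """
    rng = np.random.default_rng(seed)
    w = w0.copy()
    m = inner_steps or n
    for _ in range(epochs):
        w_snap = w.copy()                      # snapshot of the current iterate
        full_grad = grad_fn(w_snap, None)      # full gradient at the snapshot
        for _ in range(m):
            i = rng.integers(n)
            # variance-reduced gradient estimate for a single random sample
            g = grad_fn(w, [i]) - grad_fn(w_snap, [i]) + full_grad
            w = w - lr * g
    return w

# Toy usage on least squares: gradient of (1/(2n)) * ||X w - y||^2
rng = np.random.default_rng(1)
X, y = rng.normal(size=(200, 5)), rng.normal(size=200)

def grad_fn(w, idx):
    Xi, yi = (X, y) if idx is None else (X[idx], y[idx])
    return Xi.T @ (Xi @ w - yi) / len(yi)

w_hat = svrg(grad_fn, np.zeros(5), n=len(X), lr=0.05, epochs=20)
```

The key point is the inner update: the stochastic gradient at the current iterate is corrected by the same sample's gradient at the snapshot plus the snapshot's full gradient, which keeps the estimate unbiased while shrinking its variance as the iterate approaches the snapshot.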

An Analysis of First-Order and Second-Order Methods

In the preceding section, I addressed one of the pitfalls of SGD. However, gradient descent itself (and every first-order method) has a drawback: the gradient carries no information about the curvature of the loss function. Second-order techniques exploit curvature information to take better update steps.

In this study, I analyze and compare first-order and second-order methods on a simple binary classification problem: gradient descent, Newton's method, and natural gradient descent.
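
As a rough illustration of the difference, not the notebook's exact code, the sketch below contrasts a plain gradient step with a Newton step for logistic regression; the toy data and the damping constant are assumptions added to keep the example self-contained.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step(w, X, y, lr=0.1):
    """First-order update for logistic regression (labels y in {0, 1})."""
    p = sigmoid(X @ w)
    grad = X.T @ (p - y) / len(y)
    return w - lr * grad

def newton_step(w, X, y, damping=1e-8):
    """Second-order update: rescale the gradient by the inverse Hessian."""
    p = sigmoid(X @ w)
    grad = X.T @ (p - y) / len(y)
    # Hessian of the average logistic loss: X^T diag(p * (1 - p)) X / n
    H = (X * (p * (1 - p))[:, None]).T @ X / len(y)
    return w - np.linalg.solve(H + damping * np.eye(len(w)), grad)

# One step of each method on toy data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)
w = np.zeros(3)
w_gd, w_newton = gradient_step(w, X, y), newton_step(w, X, y)
```

Incidentally, for logistic regression the Fisher information matrix equals the Hessian of the average negative log-likelihood, so in this particular model a natural gradient step points in the same direction as a Newton step (up to damping and step size).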

Inequality-Constrained Quadratic Optimization with Second-Order Methods

Consider a quadratic problem with inequality constraints, where the objective is defined by a random matrix P and a random vector q. The corresponding barrier problem minimizes the objective scaled by a positive parameter t plus a logarithmic barrier for the constraints. The barrier method solves a sequence of these barrier problems for increasing values of t until the duality gap becomes sufficiently small. You can find the implementation and various experiments in the barrier method notebook.
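
To make the procedure concrete, here is a minimal NumPy sketch of the barrier method for a quadratic program. The constraint form Ax <= b, the box-constrained toy problem, and the parameter values (t0, mu, tolerances) are assumptions made for illustration and are not necessarily the formulation used in the notebook.

```python
import numpy as np

def barrier_method(P, q, A, b, x0, t0=1.0, mu=10.0, tol=1e-8):
    """Sketch of the barrier method for
        minimize 0.5 x^T P x + q^T x  subject to  A x <= b,
    where the constraint form A x <= b is assumed for illustration.
    x0 must be strictly feasible (A x0 < b).
    """
    x, t = x0.astype(float), t0
    m = A.shape[0]

    while m / t > tol:                           # m/t bounds the duality gap
        # Centering step: minimize t*f0(x) - sum(log(b - A x)) with Newton's method
        def centering_obj(z):
            s = b - A @ z
            return np.inf if np.any(s <= 0) else t * (0.5 * z @ P @ z + q @ z) - np.log(s).sum()

        for _ in range(50):
            s = b - A @ x                        # slacks, strictly positive
            grad = t * (P @ x + q) + A.T @ (1.0 / s)
            hess = t * P + (A / s[:, None]**2).T @ A
            dx = np.linalg.solve(hess, -grad)
            if -grad @ dx < 1e-10:               # Newton decrement small: centered
                break
            # Backtracking line search (keeps iterates strictly feasible)
            step = 1.0
            while centering_obj(x + step * dx) > centering_obj(x) + 0.25 * step * (grad @ dx):
                step *= 0.5
            x = x + step * dx
        t *= mu                                  # increase t and re-center
    return x

# Toy usage: random QP with box constraints -1 <= x <= 1
rng = np.random.default_rng(0)
n = 5
M = rng.normal(size=(n, n)); P = M @ M.T         # random positive semidefinite matrix
q = rng.normal(size=n)
A = np.vstack([np.eye(n), -np.eye(n)]); b = np.ones(2 * n)
x_star = barrier_method(P, q, A, b, x0=np.zeros(n))
```

Each outer iteration re-centers by minimizing t*f0(x) plus the log barrier with Newton's method and then multiplies t by mu; since the duality gap at a centered point is m/t, the loop stops once m/t drops below the tolerance.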

Moreover, I provide an implementation of the primal-dual interior-point method. Like the barrier method, it aims to compute points that lie approximately on the central path, and its iterates are not necessarily feasible, but it is often more efficient. The primal-dual interior-point method directly applies the Newton root-finding update to the perturbed KKT conditions.
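
For the same assumed problem form (minimize (1/2) xᵀPx + qᵀx subject to Ax ⪯ b), the perturbed KKT system that the Newton root-finding step targets can be written as follows; the constraint form is again an assumption made for illustration.

```latex
\begin{aligned}
P x + q + A^{\mathsf{T}} \lambda &= 0 && \text{(stationarity)} \\
\lambda_i \,(b - A x)_i &= 1/t, \quad i = 1, \dots, m && \text{(perturbed complementary slackness)} \\
A x \preceq b, \qquad \lambda &\succeq 0 && \text{(primal and dual feasibility)}
\end{aligned}
```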

Ridge Classification With Proximal Gradient Descent

Proximal gradient descent is a generalization of projected gradient methods used to solve non-differentiable convex optimization problems. In this method, we decompose the objective into a smooth function g with a Lipschitz-continuous gradient and a non-smooth penalty function h, so the problem is written as

    minimize over x:  F(x) = g(x) + h(x)

The update step in proximal gradient descent, with step size t, is

    x^{(k+1)} = prox_{t h}( x^{(k)} − t ∇g(x^{(k)}) )
Now consider a binary classification problem with a smoothing penalty that fits this composite form. You can examine the code and analyses of this optimization problem in this notebook.
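
For illustration only, here is a minimal NumPy sketch of proximal gradient descent on a composite objective. The logistic loss, the l1 penalty with its soft-thresholding prox, and the chosen constants are assumptions that keep the example self-contained; they are not necessarily the penalty used in the notebook.

```python
import numpy as np

def proximal_gradient(grad_g, prox_h, x0, step, n_iters=500):
    """Generic proximal gradient descent for minimizing g(x) + h(x).

    grad_g : gradient of the smooth part g
    prox_h : prox_h(v, t) = argmin_x h(x) + (1 / (2 t)) * ||x - v||^2
    step   : step size t (e.g., 1/L when the gradient of g is L-Lipschitz)
    """
    x = x0.copy()
    for _ in range(n_iters):
        # gradient step on the smooth part, then prox step on the penalty
        x = prox_h(x - step * grad_g(x), step)
    return x

# Illustrative instance: logistic loss plus an l1 penalty,
# whose prox is the soft-thresholding operator.
def soft_threshold(v, t, lam=0.1):
    return np.sign(v) * np.maximum(np.abs(v) - lam * t, 0.0)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = rng.integers(0, 2, size=100).astype(float)

def grad_logistic(w):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ (p - y) / len(y)

w_hat = proximal_gradient(grad_logistic, soft_threshold, np.zeros(10), step=0.5)
```

Swapping in a different penalty only requires replacing prox_h; when h is the indicator of a convex set, the prox reduces to a projection, which recovers projected gradient descent.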

Environment Requirements

  • Python 3.6.5
    • TensorFlow 2.2.0
    • NumPy 1.14.3
    • CVXPY 1.1.0

Resources

Class:

Textbook:

Papers: