ETU 2024 CompMath exponential sum fitting implementation
-
Install requirements:
`pip install -r requirements.txt`
-
Run `main.py` or go through the notebook `demo.ipynb` to see an example of usage.
Given a set of $m$ data points $(x_k, y_k)$, $k = 1, \dots, m$, consider the sum-of-exponentials model
$$f(x; \mathbf{p}) = \sum_{i=1}^{n_e} c_i e^{\lambda_i x},$$
where $\mathbf{p}^T = \begin{pmatrix} c_1, \dots, c_{n_e}, \lambda_1, \dots, \lambda_{n_e} \end{pmatrix}$ is the vector of $n = 2 n_e$ model parameters. Assuming the data are described by such a model up to noise, the task is to find parameters $\mathbf{p}$ that make the predictions $f(x_k; \mathbf{p})$ as close as possible to the observations $y_k$. Finally, the loss function to optimize is the MSE:
$$L(\mathbf{p}) = \frac{1}{m}\sum_{k=1}^{m}\bigl(y_k - f(x_k; \mathbf{p})\bigr)^2.$$
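As a concrete reference, here is a minimal NumPy sketch of the model and the MSE loss above; the parameter layout $(c_1, \dots, c_{n_e}, \lambda_1, \dots, \lambda_{n_e})$ and the function names are illustrative choices, not necessarily those used in the repository:

```python
import numpy as np

def exp_sum(x, p):
    """Sum-of-exponentials model f(x; p) = sum_i c_i * exp(lambda_i * x).

    p is assumed to hold (c_1, ..., c_ne, lambda_1, ..., lambda_ne).
    """
    n_e = len(p) // 2
    c, lam = p[:n_e], p[n_e:]
    return np.exp(np.outer(x, lam)) @ c

def mse(p, x, y):
    """Mean squared error between the data y and the model predictions."""
    residuals = y - exp_sum(x, p)
    return np.mean(residuals ** 2)
```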
The optimization method is the Levenberg–Marquardt algorithm, the same algorithm used by SciPy's built-in `curve_fit` optimizer.
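For comparison, such a fit can be obtained directly with `scipy.optimize.curve_fit`, which uses Levenberg–Marquardt when called with `method="lm"`. The two-term model and the synthetic data below are purely illustrative:

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, c1, c2, lam1, lam2):
    # Two-term exponential sum: c1*exp(lam1*x) + c2*exp(lam2*x)
    return c1 * np.exp(lam1 * x) + c2 * np.exp(lam2 * x)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0, 100)
y = model(x, 2.0, -1.0, -0.5, -3.0) + 0.01 * rng.standard_normal(x.size)

p0 = np.ones(4)  # uninformed initial guess of all ones
popt, pcov = curve_fit(model, x, y, p0=p0, method="lm")
print(popt)
```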
Like other numeric minimization algorithms, the Levenberg–Marquardt algorithm is an iterative procedure. To start a minimization, the user has to provide an initial guess for the parameter vector $\mathbf{p}$. In cases with only one minimum, an uninformed standard guess like $\mathbf{p}^T=\begin{pmatrix}1, 1, \dots, 1\end{pmatrix}$ will work fine; in cases with multiple minima, the algorithm converges to the global minimum only if the initial guess is already somewhat close to the final solution.
In each iteration step, the parameter vector $\mathbf{p}$ is replaced by a new estimate $\mathbf{p} + \mathbf{\Delta}$. To determine $\mathbf{\Delta}$, the model $f(x_k; \mathbf{p} + \mathbf{\Delta})$ is approximated by its linearization
$$f(x_k; \mathbf{p} + \mathbf{\Delta}) \approx f(x_k; \mathbf{p}) + \mathbf{J}_k \mathbf{\Delta},$$
where
$$\mathbf{J}_k = \frac{\partial f(x_k; \mathbf{p})}{\partial \mathbf{p}}$$
is the gradient of $f$ with respect to $\mathbf{p}$, evaluated at the $k$-th data point.
So, for our problem the rows of the Jacobian consist of the partial derivatives
$$\frac{\partial f(x_k; \mathbf{p})}{\partial c_i} = e^{\lambda_i x_k}, \qquad \frac{\partial f(x_k; \mathbf{p})}{\partial \lambda_i} = c_i x_k e^{\lambda_i x_k}.$$
The loss function has its minimum at a zero gradient with respect to $\mathbf{p}$. The above first-order approximation of $f(x_k; \mathbf{p} + \mathbf{\Delta})$ gives
$$L(\mathbf{p} + \mathbf{\Delta}) \approx \frac{1}{m}\sum_{k=1}^{m}\bigl(y_k - f(x_k; \mathbf{p}) - \mathbf{J}_k \mathbf{\Delta}\bigr)^2,$$
or in vector notation,
$$L(\mathbf{p} + \mathbf{\Delta}) \approx \frac{1}{m}\left\|\mathbf{y} - \mathbf{f}(\mathbf{p}) - \mathbf{J}\mathbf{\Delta}\right\|^2.$$
Taking the derivative of this approximation of $L(\mathbf{p} + \mathbf{\Delta})$ with respect to $\mathbf{\Delta}$ and setting the result to zero gives
$$\left(\mathbf{J}^T\mathbf{J}\right)\mathbf{\Delta} = \mathbf{J}^T\left[\mathbf{y} - \mathbf{f}\left(\mathbf{p}\right)\right].$$
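In code, the resulting step can be computed directly from these normal equations; a minimal sketch, assuming the residual vector and Jacobian are supplied by the caller:

```python
import numpy as np

def gauss_newton_step(J, r):
    """Solve (J^T J) Delta = J^T r for the step Delta.

    J : (m, n) Jacobian of the model with respect to the parameters
    r : (m,)   residual vector y - f(p)
    """
    return np.linalg.solve(J.T @ J, J.T @ r)
```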
The above equation for the step $\mathbf{\Delta}$ is exactly the Gauss–Newton method. The Jacobian matrix as defined above is not (in general) a square matrix, but a rectangular matrix of size $m \times n$, where $m$ is the number of data points and $n$ is the number of parameters; the product $\mathbf{J}^T\mathbf{J}$ is the required $n \times n$ square matrix.
Levenberg's contribution is to replace this equation by a "damped version":
$$\left(\mathbf{J}^T\mathbf{J} + \lambda \mathbf{E}\right)\mathbf{\Delta} = \mathbf{J}^T\left[\mathbf{y} - \mathbf{f}\left(\mathbf{p}\right)\right],$$
where $\mathbf{E}$ is the identity matrix.
The (non-negative) damping factor $\lambda$ is adjusted at each iteration. If reduction of $L$ is rapid, a smaller value can be used, bringing the algorithm closer to the Gauss–Newton algorithm: $$ \mathbf{\Delta}\approx[\mathbf{J}^T\mathbf{J}]^{-1}\mathbf{J}^{T}[\mathbf y - \mathbf f\left ( \mathbf{p}\right )] $$
whereas if an iteration gives insufficient reduction in the residual, $\lambda$ can be increased, giving a step closer to the gradient-descent direction:
$$\mathbf{\Delta}\approx\frac{1}{\lambda}\mathbf{J}^{T}\left[\mathbf{y} - \mathbf{f}\left(\mathbf{p}\right)\right].$$
To make the solution scale invariant, Marquardt's algorithm solves a modified problem with each component of the gradient scaled according to the curvature. This provides larger movement along the directions where the gradient is smaller, which avoids slow convergence in directions of small gradient. Fletcher, in his 1971 paper *A modified Marquardt subroutine for non-linear least squares*, simplified the form by replacing the identity matrix $\mathbf{E}$ with the diagonal matrix consisting of the diagonal elements of $\mathbf{J}^T\mathbf{J}$:
$$\left[\mathbf{J}^T\mathbf{J} + \lambda\operatorname{diag}\left(\mathbf{J}^T\mathbf{J}\right)\right]\mathbf{\Delta} = \mathbf{J}^T\left[\mathbf{y} - \mathbf{f}\left(\mathbf{p}\right)\right].$$
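A sketch of this damped step in NumPy, with the same assumptions as the Gauss–Newton sketch above:

```python
import numpy as np

def lm_step(J, r, lam):
    """Solve [J^T J + lam * diag(J^T J)] Delta = J^T r (Fletcher's scaling)."""
    JTJ = J.T @ J
    A = JTJ + lam * np.diag(np.diag(JTJ))
    return np.linalg.solve(A, J.T @ r)
```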
An effective strategy for the control of the damping parameter, called delayed gratification, consists of increasing the parameter by a small factor for each uphill step and decreasing it by a large factor for each downhill step. The idea behind this strategy is to avoid moving downhill too fast at the beginning of the optimization, which would restrict the steps available in later iterations and thereby slow down convergence.
Since the performance of an implementation based only on the theoretical approach above was not good enough to use as a solution, a few improvements were made in the resulting implementation.
The loss function was changed to the weighted sum of squared residuals (the $\chi^2$ criterion):
$$\chi^2(\mathbf{p}) = \left[\mathbf{y}-\hat{\mathbf{y}}(\mathbf{p})\right]^T \mathbf{W} \left[\mathbf{y}-\hat{\mathbf{y}}(\mathbf{p})\right],$$
where $\hat{\mathbf{y}}(\mathbf{p})$ is the vector of model predictions and $\mathbf{W}$ is a diagonal matrix of weights. The update formula for $\mathbf{\Delta}$ was changed accordingly:
$$\left[\mathbf{J}^T\mathbf{W}\mathbf{J} + \lambda\operatorname{diag}\left(\mathbf{J}^T\mathbf{W}\mathbf{J}\right)\right]\mathbf{\Delta} = \mathbf{J}^T\mathbf{W}\left[\mathbf{y}-\hat{\mathbf{y}}(\mathbf{p})\right].$$
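A weighted variant of the step computation above, assuming the diagonal of $\mathbf{W}$ is passed as a vector `w` of per-point weights (again an illustrative sketch rather than the repository's exact code):

```python
import numpy as np

def weighted_lm_step(J, r, w, lam):
    """Solve [J^T W J + lam * diag(J^T W J)] Delta = J^T W r, with W = diag(w)."""
    JTWJ = J.T @ (w[:, None] * J)
    A = JTWJ + lam * np.diag(np.diag(JTWJ))
    return np.linalg.solve(A, J.T @ (w * r))
```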
Previously, a step was accepted if the loss function decreased; otherwise it was rejected and the damping parameter was increased. Now, the step is accepted if the metric $\rho$ exceeds a threshold $\epsilon_4$ (`step_acceptance` in code). This metric is a measure of the actual reduction in $\chi^2$ relative to the reduction predicted by the linearized model:
$$\rho(\mathbf{\Delta}) = \frac{\chi^2(\mathbf{p}) - \chi^2(\mathbf{p}+\mathbf{\Delta})}{\mathbf{\Delta}^T\left[\lambda\operatorname{diag}\left(\mathbf{J}^T\mathbf{W}\mathbf{J}\right)\mathbf{\Delta} + \mathbf{J}^T\mathbf{W}\left(\mathbf{y}-\hat{\mathbf{y}}(\mathbf{p})\right)\right]},$$
where the denominator is the reduction in $\chi^2$ predicted by the linearization. This metric of step acceptance was proposed by H. B. Nielsen in his 1999 paper [3]. The chosen value of the threshold $\epsilon_4$ is given by the default of `step_acceptance` in the code.
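A sketch of this metric in NumPy, following the formula as written above (the helper name and argument layout are illustrative):

```python
import numpy as np

def rho_metric(chi2_old, chi2_new, delta, lam, JTWJ, JTWr):
    """Gain ratio: actual chi^2 reduction divided by the reduction
    predicted by the linearized model."""
    predicted = delta @ (lam * np.diag(np.diag(JTWJ)) @ delta + JTWr)
    return (chi2_old - chi2_new) / predicted
```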
The damping parameter and the model parameters are updated according to the following rules.
If $\rho(\mathbf{\Delta}) > \epsilon_4$, the step is accepted:
$$\mathbf{p} \leftarrow \mathbf{p} + \mathbf{\Delta}, \qquad \lambda \leftarrow \lambda / L_{\downarrow};$$
otherwise the step is rejected:
$$\lambda \leftarrow \lambda \cdot L_{\uparrow},$$
where $L_{\downarrow}$ and $L_{\uparrow}$ are the damping decrease and increase factors (`REG_DECREASE_FACTOR` and `REG_INCREASE_FACTOR` in code). These values were chosen based on paper [2].
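A sketch of this accept/reject rule, with the factors and threshold passed in explicitly (their concrete values are left to the code and paper [2]):

```python
def lm_update(p, delta, lam, rho, step_acceptance,
              reg_decrease_factor, reg_increase_factor):
    """Accept or reject a proposed step `delta` based on the metric `rho`."""
    if rho > step_acceptance:
        # Step accepted: move the parameters and relax the damping.
        return p + delta, lam / reg_decrease_factor
    # Step rejected: keep the parameters and increase the damping.
    return p, lam * reg_increase_factor
```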
The algorithm stops when one of the following conditions is satisfied:
- Convergence in the gradient norm: $\operatorname{max}\left|\mathbf{J}^T\mathbf{W}(\boldsymbol{y}-\boldsymbol{\hat{y}})\right| < \epsilon_1$ (`gradient_tol` in code)
- Convergence in the coefficients: $\operatorname{max}\left|{\mathbf{\Delta}}/\mathbf{p}\right| < \epsilon_2$ (`coefficients_tol` in code)
- Convergence in the (reduced) $\chi^2$: $\chi^2_{\nu}=\chi^2/(m-n) < \epsilon_3$ (`chi2_red_tol` in code)
where $m$ is the number of data points and $n$ is the number of fitted parameters, so $m-n$ is the number of degrees of freedom. In nonlinear least squares problems the reduced $\chi^2$ is expected to be close to one for a fit that matches the data to within the noise level, which is why it is used as a stopping criterion.
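A sketch of the three stopping checks combined, using the tolerance names above; the function signature and helper variables are illustrative:

```python
import numpy as np

def converged(J, w, r, delta, p, chi2, m, n,
              gradient_tol, coefficients_tol, chi2_red_tol):
    """Return True if any of the three stopping criteria above is met.

    J : (m, n) Jacobian, w : per-point weights (diagonal of W),
    r : residual vector y - y_hat, delta : last accepted step.
    """
    grad_ok = np.max(np.abs(J.T @ (w * r))) < gradient_tol
    coef_ok = np.max(np.abs(delta / p)) < coefficients_tol
    chi2_ok = chi2 / (m - n) < chi2_red_tol
    return grad_ok or coef_ok or chi2_ok
```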