Levenberg-Marquardt algorithm
The Levenberg-Marquardt algorithm (LMA), also known as the damped least-squares (DLS) method, is used to solve non-linear least squares problems.
The LMA finds only a local minimum, which is not necessarily the global minimum.
The LMA interpolates between the Gauss-Newton algorithm (GNA) and the method of gradient descent.
The LMA is more robust than the GNA, which means that in many cases it finds a solution even if it starts very far off the final minimum.
For well-behaved functions and reasonable starting parameters, the LMA tends to be a bit slower than the GNA.
The LMA can also be viewed as Gauss-Newton using a trust-region approach.
1. The problem
Given a set of $m$ empirical datum pairs of independent and dependent variables, $(x_i, y_i)$, optimize the parameters $\boldsymbol\beta$ of the model curve $f(x, \boldsymbol\beta)$ so that the sum of the squares of the deviations becomes minimal:

$$S(\boldsymbol\beta) = \sum_{i=1}^{m} \left[ y_i - f(x_i, \boldsymbol\beta) \right]^2.$$
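To make the objective concrete, here is a minimal Python sketch; the exponential model and the parameter names are hypothetical choices for illustration, not part of the algorithm itself.

```python
import numpy as np

# Hypothetical model curve f(x, beta) = beta[0] * exp(beta[1] * x).
def model(x, beta):
    return beta[0] * np.exp(beta[1] * x)

# S(beta) = sum_i [y_i - f(x_i, beta)]^2, the quantity the LMA minimizes.
def sum_of_squares(beta, x, y):
    r = y - model(x, beta)   # residual vector
    return r @ r
```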
2. The solution
In cases with only one minimum, an uninformed standard guess like $\boldsymbol\beta^{\mathsf T} = (1,\,1,\,\dots,\,1)$ will work fine.
In cases with multiple minima, the algorithm converges to the global minimum only if the initial guess is already somewhat close to the final solution.
In each iteration step, the parameter vector $\boldsymbol\beta$ is replaced by a new estimate $\boldsymbol\beta + \boldsymbol\delta$. To determine $\boldsymbol\delta$, the functions $f(x_i, \boldsymbol\beta + \boldsymbol\delta)$ are approximated by their linearizations

$$f(x_i, \boldsymbol\beta + \boldsymbol\delta) \approx f(x_i, \boldsymbol\beta) + \mathbf J_i \boldsymbol\delta,$$

where $\mathbf J_i = \dfrac{\partial f(x_i, \boldsymbol\beta)}{\partial \boldsymbol\beta}$ is the gradient (a row vector in this case) of $f$ with respect to $\boldsymbol\beta$.
At the minimum of the sum of squares, $S(\boldsymbol\beta)$, the gradient of $S$ with respect to $\boldsymbol\delta$ will be zero. The above first-order approximation gives

$$S(\boldsymbol\beta + \boldsymbol\delta) \approx \sum_{i=1}^{m} \left[ y_i - f(x_i, \boldsymbol\beta) - \mathbf J_i \boldsymbol\delta \right]^2,$$

or in vector notation,

$$S(\boldsymbol\beta + \boldsymbol\delta) \approx \left\| \mathbf y - \mathbf f(\boldsymbol\beta) - \mathbf J \boldsymbol\delta \right\|^2.$$

Taking the derivative with respect to $\boldsymbol\delta$ and setting the result to zero gives

$$\left( \mathbf J^{\mathsf T} \mathbf J \right) \boldsymbol\delta = \mathbf J^{\mathsf T} \left[ \mathbf y - \mathbf f(\boldsymbol\beta) \right],$$

where $\mathbf J$ is the Jacobian matrix whose $i$-th row is $\mathbf J_i$. This is a set of linear equations that can be solved for $\boldsymbol\delta$; note, however, that $\mathbf J^{\mathsf T} \mathbf J$ can be singular.
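A minimal sketch of the resulting Gauss-Newton step, assuming the Jacobian `J` and the residual vector `r` are already computed; solving the least-squares form instead of explicitly inverting $\mathbf J^{\mathsf T} \mathbf J$ also copes with the singular case noted above.

```python
import numpy as np

def gauss_newton_step(J, r):
    """Solve the normal equations (J^T J) delta = J^T r for the step delta.

    Minimizing ||J delta - r|| via lstsq is algebraically equivalent when
    J has full column rank, and stays well-defined when J^T J is singular.
    """
    delta, *_ = np.linalg.lstsq(J, r, rcond=None)
    return delta
```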
3. Damped version
3.1 Levenberg algorithm
Levenberg's contribution is to replace this equation by a "damped version":

$$\left( \mathbf J^{\mathsf T} \mathbf J + \lambda \mathbf I \right) \boldsymbol\delta = \mathbf J^{\mathsf T} \left[ \mathbf y - \mathbf f(\boldsymbol\beta) \right],$$

where $\mathbf I$ is the identity matrix; solving this gives the increment $\boldsymbol\delta$ to the estimated parameter vector $\boldsymbol\beta$.
The non-negative damping factor $\lambda$ is adjusted at each iteration.
If the reduction of $S$ is rapid, a smaller value can be used, bringing the algorithm closer to the Gauss-Newton algorithm,
whereas if an iteration gives insufficient reduction in the residual, $\lambda$ can be increased, giving a step closer to the gradient-descent direction.
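A sketch of Levenberg's damped step under the same assumptions as above; note how `lam = 0` recovers the Gauss-Newton step, while a large `lam` shrinks the step toward the gradient direction.

```python
import numpy as np

def levenberg_step(J, r, lam):
    """Solve (J^T J + lam * I) delta = J^T r.

    lam -> 0   : Gauss-Newton behaviour.
    lam large  : short step along the gradient J^T r (gradient descent).
    """
    n = J.shape[1]
    A = J.T @ J + lam * np.eye(n)
    return np.linalg.solve(A, J.T @ r)
```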
3.2 Levenberg-Marquardt algorithm
Levenberg's algorithm has the disadvantage that if the value of the damping factor $\lambda$ is large, inverting $\mathbf J^{\mathsf T} \mathbf J + \lambda \mathbf I$ makes no use of the curvature information in $\mathbf J^{\mathsf T} \mathbf J$ at all.
Marquardt provided the insight that we can scale each component of the gradient according to the curvature, so that there is larger movement along the directions where the gradient is smaller. This avoids slow convergence in the directions of small gradient. Marquardt therefore replaced the identity matrix $\mathbf I$ with the diagonal matrix consisting of the diagonal elements of $\mathbf J^{\mathsf T} \mathbf J$, resulting in the Levenberg-Marquardt algorithm:

$$\left( \mathbf J^{\mathsf T} \mathbf J + \lambda \operatorname{diag}(\mathbf J^{\mathsf T} \mathbf J) \right) \boldsymbol\delta = \mathbf J^{\mathsf T} \left[ \mathbf y - \mathbf f(\boldsymbol\beta) \right].$$
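The change from Levenberg's step is a single line: the identity matrix is replaced by $\operatorname{diag}(\mathbf J^{\mathsf T} \mathbf J)$, so each parameter direction is damped in proportion to its own curvature. A sketch under the same assumptions as the previous snippets:

```python
import numpy as np

def marquardt_step(J, r, lam):
    """Solve (J^T J + lam * diag(J^T J)) delta = J^T r.

    Directions with small curvature (small diagonal entries of J^T J)
    are damped less, allowing larger movement where the gradient is small.
    """
    JtJ = J.T @ J
    A = JtJ + lam * np.diag(np.diag(JtJ))
    return np.linalg.solve(A, J.T @ r)
```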
4. Choice of damping parameter
Poor choices of the damping parameter can make the global convergence of the algorithm suffer from the undesirable properties of steepest descent, in particular very slow convergence close to the optimum.
The absolute value of any choice depends on how well-scaled the initial problem is.
Marquardt recommended starting with a value $\lambda_0$ and a factor $\nu > 1$. Initially set $\lambda = \lambda_0$ and compute the residual sum of squares after one step from the starting point with the damping factor $\lambda = \lambda_0$ and secondly with $\lambda_0 / \nu$. If both of these are worse than the initial point, then the damping is increased by successive multiplication by $\nu$ until a better point is found, with a new damping factor of $\lambda_0 \nu^k$ for some $k$.
If use of the damping factor $\lambda / \nu$ results in a reduction in squared residual, then this is taken as the new value of $\lambda$ (and the new optimum location is taken as that obtained with this damping factor) and the process continues; if using $\lambda / \nu$ resulted in a worse residual, but using $\lambda$ resulted in a better residual, then $\lambda$ is left unchanged and the new optimum is taken as the value obtained with $\lambda$ as damping factor. A sketch of this strategy follows.
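Putting the pieces together, the following sketch wraps the damping-update strategy just described around Marquardt's scaled step. The default values of `lam` and `nu`, the iteration caps, and the stopping test are illustrative assumptions, not Marquardt's published constants; `residual` and `jacobian` are user-supplied callables.

```python
import numpy as np

def lm_fit(residual, jacobian, beta, lam=1e-2, nu=2.0, max_iter=100, tol=1e-10):
    """Minimize S(beta) = ||residual(beta)||^2 with Marquardt's lambda update.

    residual(beta) -> (m,) vector y - f(beta)
    jacobian(beta) -> (m, n) Jacobian of f at beta
    """
    r = residual(beta)
    S = r @ r
    for _ in range(max_iter):
        J = jacobian(beta)
        JtJ = J.T @ J
        D = np.diag(np.diag(JtJ))   # Marquardt's curvature scaling
        g = J.T @ r

        def try_step(trial_lam):
            delta = np.linalg.solve(JtJ + trial_lam * D, g)
            r_new = residual(beta + delta)
            return delta, r_new, r_new @ r_new

        improved = False
        # Try lambda / nu first, then the current lambda.
        for trial in (lam / nu, lam):
            delta, r_new, S_new = try_step(trial)
            if S_new < S:
                beta, r, S, lam = beta + delta, r_new, S_new, trial
                improved = True
                break
        if not improved:
            # Both trials were worse: multiply lambda by nu until a step
            # finally reduces S (with a safeguard cap on the inner loop).
            for _ in range(50):
                lam *= nu
                delta, r_new, S_new = try_step(lam)
                if S_new < S:
                    beta, r, S = beta + delta, r_new, S_new
                    break
            else:
                break                # no progress found; give up
        if np.linalg.norm(delta) < tol:
            break
    return beta
```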