As you know, we entered our discussion of derivatives to determine the size and direction of a step with which to move along a cost curve. We first used the derivative of a single-variable function to see how the output of our cost curve changed with respect to a change in our regression line's y-intercept or slope. Then we learned about partial derivatives to see how a three-dimensional cost curve responded to a change in a regression line's y-intercept or slope.
However, we have not yet explicitly shown how partial derivatives apply to gradient descent.
Well, that's what we hope to show in this lesson: explain how we can use partial derivatives to find the path to minimize our cost function, and thus find our "best fit" regression line.
Now gradient descent literally means that we are taking the steepest path to descend towards our minimum. However, it is somewhat easier to understand gradient ascent than descent, and the two are closely related, so that's where we'll begin. Gradient ascent, as you might guess, simply means that we want to move in the direction of steepest ascent.
Now moving in the direction of greatest ascent for a function means finding the combination of changes in its variables that produces the largest increase in the function's output.

Note how this is a different task from what we have previously worked on for multivariable functions. So far, we have used partial derivatives to calculate the gain from moving directly in either the $x$ direction or the $y$ direction alone.
The direction of the greatest rate of increase of a function is called the gradient. We denote the gradient with the nabla symbol, $\nabla$, which comes from the Greek word for harp, which is roughly what it looks like:
Now how do we find the direction for the greatest rate of increase? We use partial derivatives. Here's why.
As we know, the partial derivative $\frac{\partial f}{\partial x}$ tells us the rate at which a function's output changes as we change $x$ while holding every other variable constant, and $\frac{\partial f}{\partial y}$ tells us the same for a change in $y$.
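We can make this concrete by estimating partial derivatives numerically with finite differences. Below is a minimal sketch; the function `g` and the nudge size `h` are assumptions chosen purely for illustration.

```python
# Estimating partial derivatives with finite differences.
# The function g below is a hypothetical example for illustration.

def g(x, y):
    return x**2 + 3*y

def partial_x(f, x, y, h=1e-6):
    # Nudge x while holding y constant and measure the change in output.
    return (f(x + h, y) - f(x, y)) / h

def partial_y(f, x, y, h=1e-6):
    # Nudge y while holding x constant and measure the change in output.
    return (f(x, y + h) - f(x, y)) / h

print(partial_x(g, 2.0, 1.0))  # ≈ 4, since ∂g/∂x = 2x
print(partial_y(g, 2.0, 1.0))  # ≈ 3, since ∂g/∂y = 3
```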
Let's relate this again to the picture of our mountain climbers. Imagine the vertical edge on the left is our y-axis and the horizontal edge on the bottom is our x-axis. For the climber in the yellow jacket, imagine his step size is three feet. A step straight along the y-axis will move him further upwards than a step along the x-axis. So in taking that step, he should point his direction more towards the y-axis than the x-axis. That will produce a bigger increase per step.
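We can check this intuition numerically. In the sketch below, the hill's shape and the climber's three-foot step are assumptions for illustration; the point is that a fixed-length step gains the most height when aimed along the gradient, not straight along either axis.

```python
import math

# Hypothetical hill that rises faster in y than in x, like the slope
# the climber in the yellow jacket faces.
def height(x, y):
    return 1*x + 4*y

step = 3.0  # the climber's three-foot step

# Height gained by a full step along each axis, versus a step of the
# same length aimed along the gradient direction (1, 4).
gain_x = height(step, 0) - height(0, 0)
gain_y = height(0, step) - height(0, 0)

gx, gy = 1.0, 4.0                 # partial derivatives of height
norm = math.hypot(gx, gy)
gain_grad = height(step * gx / norm, step * gy / norm) - height(0, 0)

print(gain_x, gain_y, gain_grad)  # the gradient direction gains the most
```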
In fact, the direction of greatest ascent for a function is given by its gradient: the vector whose components are the function's partial derivatives, $\nabla f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)$.
Now that we have a better understanding of a gradient, let's apply our understanding to a multivariable function. Here is a plot of the function $f(x, y) = 2x + 3y$:
Imagine being at the bottom left of the graph and asking in which direction to step to increase the function's output the fastest.
The gradient of the function $f(x, y) = 2x + 3y$ is the vector of its partial derivatives:
$\frac{\partial f}{\partial x}(2x + 3y) = 2$ and $\frac{\partial f}{\partial y}(2x + 3y) = 3$
So what this tells us is that to move in the direction of greatest ascent for the function, we should move two units in the $x$ direction for every three units in the $y$ direction.
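A small sketch of that ascent in code: repeatedly stepping in proportion to the partial derivatives $(2, 3)$. The starting point and the step scale (learning rate) are assumptions chosen for illustration.

```python
# Gradient ascent on f(x, y) = 2x + 3y: every step moves 2 parts in x
# for every 3 parts in y, the direction of greatest increase.

def f(x, y):
    return 2*x + 3*y

x, y = 0.0, 0.0
learning_rate = 0.1  # scales each step

for _ in range(5):
    x += learning_rate * 2   # ∂f/∂x = 2
    y += learning_rate * 3   # ∂f/∂y = 3

print(x, y, f(x, y))  # x and y approach 1.0 and 1.5; f keeps growing
```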
So this path matches well with what we see visually. That is the idea behind the gradient: it is the vector of partial derivatives with respect to each variable of a multivariable function, in this case $x$ and $y$.
In this lesson, we saw how we can use the gradient to find the direction of steepest ascent and, by reversing it, the direction of steepest descent. We saw that the direction of steepest descent is generally some combination of changes in our variables that produces the greatest negative rate of change.
We first saw how to calculate the direction of gradient ascent, given by the gradient $\nabla f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)$, by taking the partial derivative of the function with respect to each of its variables.
For gradient descent, that is, to find the direction of greatest decrease, we simply reverse the direction of our partial derivatives and move in $\left( -\frac{\partial f}{\partial x}, -\frac{\partial f}{\partial y} \right)$.
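Here is a sketch of that descent in code: negate each partial derivative and step downhill. The linear function from earlier has no minimum, so this example uses a hypothetical bowl-shaped function $f(x, y) = x^2 + y^2$, whose minimum sits at $(0, 0)$; the starting point and learning rate are also assumptions.

```python
# Gradient descent on a hypothetical bowl f(x, y) = x^2 + y^2:
# stepping in (-∂f/∂x, -∂f/∂y) drives us toward the minimum at (0, 0).

def f(x, y):
    return x**2 + y**2

def gradient(x, y):
    return 2*x, 2*y   # (∂f/∂x, ∂f/∂y)

x, y = 3.0, 4.0
learning_rate = 0.1

for _ in range(50):
    dx, dy = gradient(x, y)
    x -= learning_rate * dx   # step in -∂f/∂x
    y -= learning_rate * dy   # step in -∂f/∂y

print(round(x, 4), round(y, 4))  # both are driven toward 0, the minimum
```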