Over the past few days I’ve been taking the machine learning course on Coursera by Stanford University. The professor, Andrew Ng, explained gradient descent in a way that was relatively easy to understand:
(For some reason the course’s lectures are also on YouTube…)
Basically, gradient descent is taking the partial derivative of a cost function with respect to a “weight” and subtracting it from the weight.
In other words, θ = θ – (learning rate) * (the gradient, i.e. the derivative).
For example, let’s pretend this is the bowl-shaped (convex) surface of a linear regression cost function. The cost is z, while x and y are the weights. Now, if we have some y = ax + b hypothesis and put a = x and b = y on the graph, the height z tells us the error of those two weights. To make our predictions more accurate, we need to decrease z, in this case all the way down to the vertex (the minimum).
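To make that concrete, here is a small sketch of my own (not code from the course): the mean-squared-error cost for a y = a*x + b model, where the cost plays the role of z and the two weights a and b are the horizontal axes of the bowl. The data points are made up.

```python
# My own sketch of the bowl-shaped cost: the mean squared error of a
# y = a*x + b model. The "z" in the plot is this cost; a and b are the weights.
def cost(a, b, xs, ys):
    n = len(xs)
    # Average squared prediction error (the 1/2 is the usual convention
    # so the derivative comes out cleaner).
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / (2 * n)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # this data fits y = 2x + 0 perfectly

print(cost(2.0, 0.0, xs, ys))  # at the vertex of the bowl: 0.0
print(cost(1.0, 1.0, xs, ys))  # farther from the vertex, the cost is higher
```

Moving (a, b) toward (2, 0) decreases the cost, which is exactly what gradient descent automates.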
A few days ago, I had no idea at all what a partial derivative was. It turns out that a partial derivative is basically a derivative that treats every variable as a constant except the one you’re differentiating with respect to. To keep this simple, let’s say we’re trying to find the partial derivative of f(x) with respect to θ1.
f(x) = θ1 * x + θ0
Or, we can make it look a bit friendlier for finding the partial derivative with respect to θ1 by writing a = x and b = θ0, since both are constants as far as θ1 is concerned.
f(x) = a*θ1 + b
Then, we just take the derivative as if θ1 were the only variable and get
a + 0 = a = x
So, the partial derivative of f(x) with respect to θ1 is x. It’s important to know that partial derivatives can get more complicated, such as when finding the gradient used in multi-variable gradient descent, as shown in the video.
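We can double-check that result numerically with a finite difference (my own sanity check, not from the lecture): nudging θ1 a tiny bit and looking at how f changes should give x back, no matter what θ0 is.

```python
# Numerical sanity check: the partial derivative of f = theta1*x + theta0
# with respect to theta1 should equal x, regardless of theta0.
def f(theta1, theta0, x):
    return theta1 * x + theta0

x = 3.0
theta1, theta0 = 0.5, 7.0  # arbitrary values
eps = 1e-6

# Central difference approximation of the partial derivative w.r.t. theta1.
numeric = (f(theta1 + eps, theta0, x) - f(theta1 - eps, theta0, x)) / (2 * eps)
print(numeric)  # ≈ 3.0, i.e. equal to x
```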
After getting the partial derivative, we would then use it to calculate the new value of the weight using gradient descent.
Since I’m taking this course, I will not be updating the neural network demo for quite a while (or at least not as often).