Gradient Descent is a first-order iterative optimization algorithm for finding a local minimum of a differentiable cost function. Key idea: take a step in the direction of the negative gradient at each iteration to decrease the cost function.
Two ingredients:
- Direction: determined by the gradient at the current point.
- Magnitude: also called the step size or learning rate.

Intuition:
- Start at any value of the parameter.
- Change the parameter in the direction that decreases the cost function.
- Repeat until the decrease in cost at each step is very small.
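The loop above can be sketched in a few lines of Python. The function names, the learning rate, and the example cost $J(\theta) = (\theta - 3)^2$ are illustrative assumptions, not part of the original notes:

```python
def gradient_descent(grad, theta0, lr=0.1, tol=1e-8, max_iters=10_000):
    """Minimize a cost function given its gradient, starting from theta0."""
    theta = theta0
    for _ in range(max_iters):
        step = lr * grad(theta)   # move against the gradient
        theta -= step
        if abs(step) < tol:       # stop once the update is very small
            break
    return theta

# Assumed example: J(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
minimum = gradient_descent(lambda t: 2 * (t - 3), theta0=0.0)
```

With this learning rate the iterate contracts toward the minimizer $\theta = 3$ at each step; too large a learning rate would instead make the updates diverge.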
Formally, for a cost function $J(\theta)$, the negative gradient is $-\nabla_\theta J(\theta)$.
We choose a step size parameter $\alpha > 0$ (the learning rate).
The update equation becomes:

$$\theta \leftarrow \theta - \alpha \, \nabla_\theta J(\theta)$$
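A single application of the update rule, with assumed values for the current parameter, the learning rate $\alpha$, and the gradient at that point:

```python
theta = 5.0   # current parameter value (assumed)
alpha = 0.1   # learning rate (assumed)
grad = 4.0    # gradient of J at theta (assumed)

# One step of: theta <- theta - alpha * grad J(theta)
theta_new = theta - alpha * grad
```

Here the step is $0.1 \times 4.0 = 0.4$, so the parameter moves from $5.0$ to $4.6$, in the direction that decreases the cost.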
Example

If , what is the gradient?
Ans:
Taking the derivative with respect to :
Final simplified gradient:
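The specific cost function in the example did not survive extraction, so here is a complete worked instance under an assumed cost $J(\theta) = \theta^2 - 4\theta + 7$, whose derivative is $\frac{dJ}{d\theta} = 2\theta - 4$. The code checks that analytic gradient against a central finite-difference approximation:

```python
def J(theta):
    return theta**2 - 4 * theta + 7   # assumed example cost

def grad_J(theta):
    return 2 * theta - 4              # analytic derivative of the cost above

# Central finite difference: (J(t + h) - J(t - h)) / (2h) approximates dJ/dtheta.
h = 1e-6
theta = 1.5
numeric = (J(theta + h) - J(theta - h)) / (2 * h)
```

At $\theta = 1.5$ both give $\approx -1$; setting the gradient to zero gives the minimizer $\theta = 2$, which is where gradient descent on this cost would converge.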