Gradient Descent is a first-order iterative optimization algorithm for finding a local minimum of a differentiable cost function. Key idea: follow the negative gradient at each step to decrease the cost function.

Two ingredients:

  • Direction: determined by the gradient at the current point.
  • Magnitude: also called the step size or learning rate.

Intuition:

  • Start at any value of the parameter.
  • Change the parameter in the direction that decreases the cost function.
  • Repeat until the decrease in cost at each step is very small.
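The steps above can be sketched as a short loop. This is a minimal sketch, not the notes' own code: the cost J(θ) = (θ − 3)², the starting point, and all hyperparameter values are illustrative assumptions.

```python
def gradient_descent(grad, theta, lr=0.1, tol=1e-8, max_iters=10_000):
    """Repeat theta <- theta - lr * grad(theta) until the step is tiny."""
    for _ in range(max_iters):
        step = lr * grad(theta)   # magnitude = learning rate * gradient
        theta -= step             # move against the gradient
        if abs(step) < tol:       # decrease per step is very small: stop
            break
    return theta

# Illustrative cost J(theta) = (theta - 3)**2 with gradient 2*(theta - 3);
# the minimum is at theta = 3.
theta_min = gradient_descent(lambda t: 2 * (t - 3), theta=0.0)
print(theta_min)  # converges close to 3.0
```

Starting from θ = 0, each update multiplies the distance to the minimum by 0.8, so the iterates approach 3 geometrically.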

Formally, for a differentiable cost function J(θ), the descent direction at a point is the negative gradient, −∇J(θ).

We choose a step size parameter α (the learning rate).
The update equation becomes:

  θ_{t+1} = θ_t − α ∇J(θ_t)
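One update step, worked numerically (the cost J(θ) = θ² and the values θ_t = 4, α = 0.25 are made up for illustration): ∇J(θ) = 2θ, so θ_{t+1} = 4 − 0.25 · 8 = 2.

```python
# One gradient-descent update on the illustrative cost J(theta) = theta**2
def update(theta, lr):
    grad = 2 * theta          # gradient of theta**2
    return theta - lr * grad  # theta_{t+1} = theta_t - alpha * grad(theta_t)

print(update(4.0, 0.25))  # 4 - 0.25 * 8 = 2.0
```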

Example

If , what’s the gradient ?

Ans:

Taking the derivative with respect to :

Final simplified gradient:
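The function in the example above did not survive in these notes, so as a stand-in, here is how an analytic gradient can be sanity-checked against a centered finite difference. The cost J(θ) = θ³ + 2θ is my own illustrative choice, not the original example.

```python
def analytic_grad(theta):
    # Derivative of the illustrative cost theta**3 + 2*theta
    return 3 * theta**2 + 2

def numeric_grad(f, theta, h=1e-6):
    # Centered finite-difference approximation of the derivative
    return (f(theta + h) - f(theta - h)) / (2 * h)

cost = lambda t: t**3 + 2 * t
print(analytic_grad(1.5))        # 3 * 2.25 + 2 = 8.75
print(numeric_grad(cost, 1.5))   # approximately 8.75
```

If the two values disagree, the analytic derivative (or its implementation) is wrong; this check is a common debugging step before running the full descent loop.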