Gradient Descent is a first-order iterative optimization algorithm for finding a local minimum of a differentiable cost function. Key idea: take a step in the direction of the negative gradient at each iteration to decrease the cost function.
Two ingredients:
- Direction: determined by the gradient at the current point.
- Magnitude: also called the step size or learning rate.

Intuition:
- Start at any value of the parameter.
- Change the parameter in the direction that decreases the cost function.
- Repeat until the decrease in cost at each step is very small.
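The loop above can be sketched in a few lines of Python. The function names, the learning rate, and the example cost $J(\theta) = (\theta - 3)^2$ are illustrative assumptions, not part of the original notes:

```python
def gradient_descent(grad, theta0, lr=0.1, tol=1e-8, max_iters=10_000):
    """Minimize a cost function given its gradient, starting from theta0."""
    theta = theta0
    for _ in range(max_iters):
        step = lr * grad(theta)   # move against the gradient
        theta -= step
        if abs(step) < tol:       # stop once the update is very small
            break
    return theta

# Assumed example: J(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
minimum = gradient_descent(lambda t: 2 * (t - 3), theta0=0.0)
```

With this learning rate the iterate contracts toward the minimizer $\theta = 3$ at each step; too large a learning rate would instead make the updates diverge.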
Formally, for a cost function $J(\theta)$, the negative gradient is $-\nabla_\theta J(\theta)$.
We choose a step size parameter $\alpha > 0$ (the learning rate).
The update equation becomes:

$$\theta \leftarrow \theta - \alpha \, \nabla_\theta J(\theta)$$
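A single application of the update rule, with assumed values for the current parameter, the learning rate $\alpha$, and the gradient at that point:

```python
theta = 5.0   # current parameter value (assumed)
alpha = 0.1   # learning rate (assumed)
grad = 4.0    # gradient of J at theta (assumed)

# One step of: theta <- theta - alpha * grad J(theta)
theta_new = theta - alpha * grad
```

Here the step is $0.1 \times 4.0 = 0.4$, so the parameter moves from $5.0$ to $4.6$, in the direction that decreases the cost.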
Example

If , what is the gradient?
Ans:
Taking the derivative with respect to :
Final simplified gradient:
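The specific cost function in the example did not survive extraction, so here is a complete worked instance under an assumed cost $J(\theta) = \theta^2 - 4\theta + 7$, whose derivative is $\frac{dJ}{d\theta} = 2\theta - 4$. The code checks that analytic gradient against a central finite-difference approximation:

```python
def J(theta):
    return theta**2 - 4 * theta + 7   # assumed example cost

def grad_J(theta):
    return 2 * theta - 4              # analytic derivative of the cost above

# Central finite difference: (J(t + h) - J(t - h)) / (2h) approximates dJ/dtheta.
h = 1e-6
theta = 1.5
numeric = (J(theta + h) - J(theta - h)) / (2 * h)
```

At $\theta = 1.5$ both give $\approx -1$; setting the gradient to zero gives the minimizer $\theta = 2$, which is where gradient descent on this cost would converge.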