Where we are
- Scores: The vector of class scores s = Wx, the weights applied to the input data.
- Li : The loss on a single example after one forward pass, penalizing incorrect classifications. Can be the SVM loss or the softmax loss.
- SVM Loss: Prefers that the true class score be greater than all other scores (by a margin): Li = Σ_{j≠yi} max(0, sj − syi + 1)
- Softmax Loss: The negative log probability of the true class after the scores are exponentiated and normalized: Li = −log( e^{syi} / Σj e^{sj} )
- L : The total loss, the average of the Li plus a regularization term that coerces the model towards a simpler solution: L = (1/N) Σi Li + λ R(W)
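The two per-example losses above can be sketched in a few lines of NumPy. This is a minimal illustration for a single example (the toy `W`, `x`, and label are made up for the demo, not from the notes):

```python
import numpy as np

def svm_loss(scores, y):
    # Multiclass SVM (hinge) loss: L_i = sum_{j != y} max(0, s_j - s_y + 1)
    margins = np.maximum(0.0, scores - scores[y] + 1.0)
    margins[y] = 0.0            # the true class contributes no margin term
    return margins.sum()

def softmax_loss(scores, y):
    # Softmax (cross-entropy) loss: L_i = -log( e^{s_y} / sum_j e^{s_j} )
    shifted = scores - scores.max()   # shift for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[y]

# Toy example: scores s = Wx for one input
W = np.array([[0.2, -0.5],
              [0.1,  2.0]])
x = np.array([1.0, 2.0])
scores = W @ x                  # [-0.8, 4.1]
print(svm_loss(scores, y=1))    # true score wins by > 1, so hinge loss is 0
print(softmax_loss(scores, y=1))
```

Note the max-shift in `softmax_loss`: exponentiating large scores overflows, and subtracting the maximum score leaves the loss unchanged.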
We want to minimize L with gradient descent, which requires the gradient of L with respect to the weights.
At every timestep we evaluate the gradient of the losses using either the:
- Numerical Gradient
- Slow :(, approximate :(, easy to write :)
- Analytical Gradient
- Fast :), exact :), error-prone :(
- In practice: derive the analytical gradient, then verify it against the numerical gradient (a gradient check).
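A gradient check can be sketched with a centered-difference approximation. Here it verifies a hand-derived analytic gradient on a simple function (the function f(x) = Σ x² and its gradient 2x are an illustrative stand-in for a real loss):

```python
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    # Centered difference: df/dx_i ≈ (f(x + h*e_i) - f(x - h*e_i)) / (2h)
    grad = np.zeros_like(x)
    for i in range(x.size):
        old = x.flat[i]
        x.flat[i] = old + h
        f_plus = f(x)
        x.flat[i] = old - h
        f_minus = f(x)
        x.flat[i] = old            # restore the original value
        grad.flat[i] = (f_plus - f_minus) / (2.0 * h)
    return grad

# Check the analytic gradient of f(x) = sum(x^2), which is 2x
x = np.array([1.0, -2.0, 3.0])
num = numerical_gradient(lambda v: np.sum(v ** 2), x)
ana = 2.0 * x
rel_error = np.abs(num - ana).max() / np.abs(ana).max()
print(rel_error)   # tiny if the analytic gradient is correct
```

This loops over every parameter and calls f twice per parameter, which is exactly why the numerical gradient is slow: fine for spot-checking, far too expensive for training.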
So how do we compute the analytical gradient for arbitrarily complex functions?