Where we are

  • Scores: s = Wx, the vector of raw class scores the model assigns to an input.
  • Li : The loss on a single example, measuring how unhappy we are with its scores after one forward pass. Can be:
    • SVM Loss: Wants the true class's score to exceed every other class's score by a margin.
    • Softmax Loss: Interprets the scores as unnormalized log-probabilities and takes the negative log-likelihood of the true class.
  • L : The total loss: the average of the Li plus a regularization term that coerces the model toward a simpler solution.
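The two per-example losses above can be sketched in a few lines of NumPy; the score values below are illustrative, not from the notes:

```python
import numpy as np

def svm_loss(scores, y, margin=1.0):
    # Multiclass SVM (hinge) loss for one example:
    # sum over wrong classes j of max(0, s_j - s_y + margin)
    margins = np.maximum(0.0, scores - scores[y] + margin)
    margins[y] = 0.0  # the true class contributes no loss
    return margins.sum()

def softmax_loss(scores, y):
    # Cross-entropy loss: -log of the normalized probability of the true class
    shifted = scores - scores.max()  # shift for numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum()
    return -np.log(probs[y])

scores = np.array([3.2, 5.1, -1.7])  # hypothetical scores, true class y=0
print(svm_loss(scores, y=0))
print(softmax_loss(scores, y=0))
```

Note how the SVM loss is zero once the true score clears every margin, while the softmax loss always wants the true class's probability pushed closer to 1.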

We want the gradient of L with respect to the weights, so that we can minimize L with gradient descent.
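A minimal sketch of the gradient descent loop, on a toy quadratic loss standing in for the real L (the `target` vector and step size are illustrative):

```python
import numpy as np

target = np.array([1.0, -2.0, 3.0])  # hypothetical optimum

def loss(w):
    # Toy loss L(w) = ||w - target||^2
    return np.sum((w - target) ** 2)

def grad(w):
    # Analytic gradient of the quadratic: dL/dw = 2(w - target)
    return 2.0 * (w - target)

w = np.zeros(3)      # initialize the weights
step_size = 0.1
for _ in range(100):
    w -= step_size * grad(w)  # step downhill along the negative gradient

print(loss(w))  # shrinks toward 0 as w approaches target
```

The real optimization differs only in what loss and gradient are plugged in; the loop is the same.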

At every step we evaluate the gradient of the loss using either the:

  • Numerical Gradient
    • Slow :(, approximate :(, easy to write :)
  • Analytical Gradient
    • Fast :), exact :), error-prone :(
  • In practice: Derive analytical gradient, check with Numerical Gradient.
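The gradient check in practice can be sketched as follows: a numerical gradient via centered finite differences, compared against an analytic gradient on a toy loss (the loss and test point are illustrative):

```python
import numpy as np

def numerical_gradient(f, w, h=1e-5):
    # Centered finite differences: df/dw_i ≈ (f(w+h) - f(w-h)) / (2h)
    grad = np.zeros_like(w)
    for i in range(w.size):
        old = w[i]
        w[i] = old + h; f_plus = f(w)
        w[i] = old - h; f_minus = f(w)
        w[i] = old  # restore the original value
        grad[i] = (f_plus - f_minus) / (2 * h)
    return grad

# Toy loss L(w) = sum(w^2); its analytic gradient is 2w.
w = np.array([0.5, -1.0, 2.0])
analytic = 2 * w
numeric = numerical_gradient(lambda v: np.sum(v ** 2), w)
print(np.max(np.abs(analytic - numeric)))  # should be tiny
```

This is exactly the "derive analytically, verify numerically" recipe: the slow finite-difference gradient is only ever used to catch bugs in the fast analytic one.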

So how do we compute the analytical gradient for arbitrarily complex functions?
