Gradient Descent (Vanilla)

At every timestep, evaluate the gradient, then take a step in the direction of that gradient for each dimension/weight. The amount of step you take is determined by your step size.

Step size is a hyperparameter - and one of the most important hyperparameters you must set. In practice , figuring out step size isthe first hyperparameter you try to set.

results matching ""

    No results matching ""