What is Gradient Descent?
Gradient descent (along with its stochastic variant, stochastic gradient descent) is a machine learning algorithm that optimizes a model's parameter estimates by iteratively adjusting them, a small step at a time, according to the gradient (the vector of partial derivatives) of some loss function. The 7 "steps" of gradient descent, according to Jeremy Howard's fastai course on deep learning, are listed below, with a minimal code sketch after the list:
- Initialize the parameters;
- Use parameters and input data to predict an outcome;
- Calculate the model’s loss;
- Calculate the gradient (partial derivative for each model parameter);
- Update the parameter estimates by subtracting the gradient multiplied by some learning rate;
- Repeat steps 2-5;
- Stop according to some criterion (a fixed number of iterations, or the loss improvement falling below a threshold).
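To make the loop concrete, here is a minimal sketch of these steps in Julia, minimizing the toy function f(w) = (w - 3)^2, whose gradient can be written by hand. The starting value, learning rate, and stopping tolerance are all illustrative choices, not part of the fastai material:

```julia
# A minimal sketch of the seven steps on a toy problem: minimizing
# f(w) = (w - 3)^2, whose derivative is 2(w - 3). All hyperparameter
# defaults here are illustrative.
function toy_descent(; lr = 0.1, maxiter = 100, tol = 1e-10)
    w = 0.0                    # step 1: initialize the parameter
    for _ in 1:maxiter         # step 6: repeat steps 2-5
        loss = (w - 3)^2       # steps 2-3: "predict" and calculate the loss
        loss < tol && break    # step 7: stop once the loss is small enough
        g = 2 * (w - 3)        # step 4: the gradient, computed by hand
        w -= lr * g            # step 5: update by gradient times learning rate
    end
    return w
end

toy_descent()   # ≈ 3.0
```

In a real model the gradient would come from automatic differentiation rather than a hand-derived formula, but the structure of the loop is the same.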
The gradient descent algorithm is flexible enough to optimize the parameter estimates of arbitrary (but differentiable) functions, which makes it useful for machine learning, and deep learning in particular.
Gradient Descent Implementation
Below is the basic gradient descent algorithm for a regression problem, using mean squared error as the loss function, implemented in Julia. You can see a full worked example here.
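As a stand-in for the full listing in the linked example, the following is a minimal sketch of such an implementation, assuming a linear model y ≈ Xβ; the function name `gradient_descent` and the hyperparameter defaults are illustrative assumptions, not the linked example's code:

```julia
# A minimal sketch of gradient descent for linear regression with
# mean squared error. Function name and defaults are illustrative.
using Random

function gradient_descent(X, y; lr = 0.01, epochs = 1_000)
    n, p = size(X)
    β = zeros(p)                  # initialize the parameters
    for _ in 1:epochs
        ŷ = X * β                 # predict with the current parameters
        resid = ŷ .- y            # prediction error
        # MSE loss is mean(resid .^ 2); its gradient w.r.t. β is:
        g = (2 / n) .* (X' * resid)
        β .-= lr .* g             # update by gradient times learning rate
    end
    return β
end

# Usage: recover known coefficients from synthetic data.
Random.seed!(42)
X = [ones(100) randn(100)]        # intercept column plus one feature
y = X * [1.5, -2.0] .+ 0.1 .* randn(100)
gradient_descent(X, y)            # ≈ [1.5, -2.0]
```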