Stochastic Gradient Descent (SGD) is an optimization algorithm widely used in machine learning and deep learning to minimize a loss function by iteratively updating model parameters. Unlike batch gradient descent, which computes the gradient over the entire dataset, SGD updates the parameters using a single data point or a small mini-batch at each iteration. This introduces randomness: each update is far cheaper to compute, which often yields faster progress on large datasets and helps the optimizer escape local minima, though the individual updates are noisier.
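
As a rough illustration, here is a minimal sketch of per-sample SGD for a one-dimensional linear regression. The toy data, learning rate, and epoch count are assumptions chosen for the example, not part of any particular library.

```python
import numpy as np

# Illustrative data: y = 2x + 1 plus noise (assumed for this sketch)
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = 2 * X[:, 0] + 1 + 0.1 * rng.normal(size=200)

w, b = 0.0, 0.0   # model parameters
lr = 0.05         # learning rate (hyperparameter to tune)

for epoch in range(20):
    for i in rng.permutation(len(X)):   # shuffle, then update one sample at a time
        x_i, y_i = X[i, 0], y[i]
        error = (w * x_i + b) - y_i
        # Gradient of the squared error for this single sample
        grad_w = 2 * error * x_i
        grad_b = 2 * error
        # SGD update: step against the gradient
        w -= lr * grad_w
        b -= lr * grad_b

print(f"learned w ~= {w:.2f}, b ~= {b:.2f}")  # should approach 2 and 1
```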

Advantages of SGD:

  • Speed: By processing one data point at a time, SGD can handle large datasets more efficiently, reducing computational time per iteration.
  • Escaping Local Minima: The inherent noise in SGD allows it to potentially escape local minima, aiding in finding better solutions in complex, non-convex optimization landscapes.

Considerations:

  • Convergence Stability: The randomness in SGD can cause the loss to fluctuate from step to step, so hyperparameters such as the learning rate need careful tuning to achieve stable convergence.
  • Batch Size Variations: Using larger mini-batches reduces the variance of each parameter update, giving more stable convergence, but increases the computation per iteration (see the sketch after this list).
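
To illustrate the batch-size trade-off, the following is a mini-batch variant of the earlier sketch; the function name `minibatch_sgd`, its default values, and the toy linear model are illustrative assumptions. Averaging gradients over a larger `batch_size` smooths each update, while a smaller one keeps iterations cheap but noisier.

```python
import numpy as np

def minibatch_sgd(X, y, lr=0.05, batch_size=32, epochs=20, seed=0):
    """Mini-batch SGD for a one-dimensional linear model (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    w, b = 0.0, 0.0
    n = len(X)
    for _ in range(epochs):
        order = rng.permutation(n)           # shuffle once per epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            error = (w * X[idx, 0] + b) - y[idx]
            # Gradient averaged over the mini-batch: larger batches
            # lower the variance of this estimate but cost more per step
            grad_w = 2 * np.mean(error * X[idx, 0])
            grad_b = 2 * np.mean(error)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b
```

With `batch_size=1` this reduces to the per-sample update shown earlier; with `batch_size=n` it becomes full-batch gradient descent.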