STUDY OF STOCHASTIC POLYAK STEP-SIZES FOR STOCHASTIC OPTIMIZATION

Date
2022-12-07
Publisher
Johns Hopkins University
Abstract
Stochastic gradient descent (SGD) is commonly used to solve finite-sum optimization problems. The main parameter one needs to choose for this algorithm is the step-size. Adaptive step-sizes are particularly appealing because they generally do not rely on function parameters, such as the smoothness constant or the strong convexity constant, that must be known in advance to guarantee the convergence of SGD with a constant step-size. Loizou et al. [1] and Horváth et al. [2] proposed novel adaptive step-sizes, SPS and StoPS, which can be seen as stochastic variants of the classical Polyak step-size (PS). In this thesis, we provide a new viewpoint and analyze SPS and StoPS through operator theory: rather than analyzing the step-size on its own, we study the operator obtained by combining the adaptive step-size with an unbiased estimator of the stochastic gradient. We establish properties of the SPS and StoPS operators under different assumptions on the function we aim to minimize, and we use these operator properties to derive convergence guarantees for several classes of objectives. In addition, we analyze a stochastic reformulation of the SPS operator, show how it differs from minibatch SPS, and discuss its connection to distributed optimization.
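
For illustration, the following is a minimal sketch of SGD with a stochastic Polyak step-size in a capped (SPS_max-style) form; the scaling constant c, the cap gamma_max, and all function and argument names are illustrative assumptions rather than the exact formulation analyzed in the thesis.

import numpy as np

def sgd_sps(x0, funcs, grads, f_stars, c=0.5, gamma_max=1.0, iters=1000, seed=0):
    """SGD with a stochastic Polyak step-size (capped, SPS_max-style sketch).

    funcs[i](x)  -- value of the i-th component function f_i at x
    grads[i](x)  -- gradient of f_i at x
    f_stars[i]   -- minimum value of f_i (e.g. zero for many interpolating models)
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    n = len(funcs)
    for _ in range(iters):
        i = rng.integers(n)                 # sample one component uniformly at random
        g = np.asarray(grads[i](x))
        denom = c * (g @ g)
        if denom == 0.0:                    # x already minimizes f_i; no update
            continue
        gap = funcs[i](x) - f_stars[i]      # f_i(x) - f_i^*
        step = min(gap / denom, gamma_max)  # Polyak-type step, capped at gamma_max
        x = x - step * g
    return x

In the interpolation regime, where every component f_i attains its minimum at a common point, the values f_i^* are often known (e.g. zero training loss), which is one reason SPS-type step-sizes can avoid problem constants such as the smoothness parameter.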
Keywords
Stochastic Optimization, Stochastic Gradient Descent, Stochastic Polyak Step-size