STUDY OF STOCHASTIC POLYAK STEP-SIZES FOR STOCHASTIC OPTIMIZATION

Date
2022-12-07
Publisher
Johns Hopkins University
Abstract
Stochastic gradient descent (SGD) is commonly used to solve finite-sum optimization problems. The main parameter one needs to choose for this algorithm is the step-size. Adaptive step-sizes are particularly appealing because they generally do not rely on function parameters, such as the smoothness constant or the strong convexity constant, that must be known in advance to guarantee the convergence of SGD with a constant step-size. Loizou et al. [1] and Horváth et al. [2] proposed novel adaptive step-sizes, SPS and StoPS, which can be seen as stochastic variants of the classical Polyak step-size (PS). In this thesis, we provide a new viewpoint and analyze SPS and StoPS through operator theory: rather than analyzing the step-size on its own, we study the operator obtained by combining the adaptive step-size with an unbiased estimator of the stochastic gradient. We establish properties of the SPS and StoPS operators under different assumptions on the function we aim to minimize, and we use these operator properties to derive convergence guarantees for several classes of objectives. In addition, we analyze a stochastic reformulation of the SPS operator, show how it differs from minibatch SPS, and discuss its connection to distributed optimization.
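
For illustration, the following is a minimal sketch of SGD with a stochastic Polyak step-size in a capped (SPS_max-style) form; the scaling constant c, the cap gamma_max, and all function and argument names are illustrative assumptions rather than the exact formulation analyzed in the thesis.

import numpy as np

def sgd_sps(x0, funcs, grads, f_stars, c=0.5, gamma_max=1.0, iters=1000, seed=0):
    """SGD with a stochastic Polyak step-size (capped, SPS_max-style sketch).

    funcs[i](x)  -- value of the i-th component function f_i at x
    grads[i](x)  -- gradient of f_i at x
    f_stars[i]   -- minimum value of f_i (e.g. zero for many interpolating models)
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    n = len(funcs)
    for _ in range(iters):
        i = rng.integers(n)                 # sample one component uniformly at random
        g = np.asarray(grads[i](x))
        denom = c * (g @ g)
        if denom == 0.0:                    # x already minimizes f_i; no update
            continue
        gap = funcs[i](x) - f_stars[i]      # f_i(x) - f_i^*
        step = min(gap / denom, gamma_max)  # Polyak-type step, capped at gamma_max
        x = x - step * g
    return x

In the interpolation regime, where every component f_i attains its minimum at a common point, the values f_i^* are often known (e.g. zero training loss), which is one reason SPS-type step-sizes can avoid problem constants such as the smoothness parameter.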
Keywords
Stochastic Optimization, Stochastic Gradient Descent, Stochastic Polyak Step-size