STUDY OF STOCHASTIC POLYAK STEP-SIZES FOR STOCHASTIC OPTIMIZATION
Abstract
Stochastic gradient descent (SGD) is commonly used in solving finite sum optimization problems. The main parameter one needs to choose for this algorithm is the step-size. Adaptive step sizes are particularly appealing as generally they do not rely on the parameter of functions such as smoothness constant or strong convexity constant which is required to be known in advance in guaranteeing the convergence of SGD with constant step size. Loizou et al. [1] and Horváth et al. [2] proposed novel adaptive step-sizes, SPS and StoPS, which can be seen as the stochastic variants of the classical Polyak step-size (PS). In this thesis, we provide a new viewpoint and analyze SPS and StoPS via operator theory: we no longer analyze the step-size individually. Instead, we combine the adaptive step-size and unbiased estimator of the stochastic gradient. We prove the properties of SPS and StoPS operators when assuming different properties of the function we aim to minimize and give the convergence guarantees for several types of objective with the help of operator properties. Besides, we analyze the stochastic reformulation of SPS operator and show how it is different from minibatching SPS and the connection with distributed optimization.
Collections
Related items
Showing items related by title, author, creator and subject.
-
Cyclic Stochastic Optimization: Generalizations, Convergence, and Applications in Multi-Agent Systems
Hernandez Cuevas, Karla (Johns Hopkins UniversityUSA, 2017-07-18)Stochastic approximation (SA) is a powerful class of iterative algorithms for nonlinear root-finding that can be used for minimizing a loss function, L(θ), with respect to a parameter vector θ, when only noisy observations ... -
2.5D Chiplet Architecture for Embedded Processing of High Velocity Streaming Data
Figliolia, Tomas; 0000-0003-2647-8427 (Johns Hopkins UniversityUSA, 2018-01-16)This dissertation presents an energy efficient 2.5D chiplet-based architecture for real-time probabilistic processing of high-velocity sensor data, from an autonomous real-time ubiquitous surveillance imaging system. This ... -
Simulation of non-Gaussian/non-stationary stochastic processes: beyond second-order orthogonality
Kim, Hwanpyo (Johns Hopkins UniversityUSA, 2018-05-11)The theory of stochastic processes and their generations are indispensable to characterize wind fluctuations, ocean waves, and earthquake excitations among other quantities in engineering. To computationally analyze and ...