## Error Bounds and Applications for Stochastic Approximation With Non-Decaying Gain

##### Abstract

This work analyzes the stochastic approximation algorithm with non-decaying gains as applied to time-varying problems. The setting is to minimize a sequence of scalar-valued loss functions $f_k(\cdot)$ at sampling times $\tau_k$, or to locate the root of a sequence of vector-valued functions $g_k(\cdot)$ at $\tau_k$, with respect to a parameter $\theta \in \mathbb{R}^p$. The available information consists of noise-corrupted observation(s) of either $f_k(\cdot)$ or $g_k(\cdot)$ evaluated at only one or two design points. We apply stochastic approximation algorithms to this time-varying setting. The gain must be bounded away from zero so that the recursive estimate, denoted $\hat{\theta}_k$, can maintain its momentum in tracking the time-varying optimum, denoted $\theta^*_k$. Because $\{\theta^*_k\}$ varies perpetually, convergence of $\hat{\theta}_k$ cannot be expected; the best property $\hat{\theta}_k$ can have is to remain near $\theta^*_k$ (concentration behavior).
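
A constant-gain tracking recursion of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm analyzed in this work: it assumes a hypothetical quadratic loss $f_k(\theta) = \|\theta - \theta^*_k\|^2$ whose optimum takes a random step of fixed length at each $\tau_k$, additive Gaussian measurement noise, and the simultaneous perturbation (SPSA) gradient estimate as one common way to use noisy loss values at exactly two design points. The function names `spsa_gradient`, `noisy_quadratic_loss`, and `track`, and all numerical settings, are hypothetical.

```python
import functools

import numpy as np


def spsa_gradient(loss, theta, c, rng):
    """Simultaneous-perturbation gradient estimate from two noisy loss values.

    The loss is evaluated only at the two design points theta + c*delta and
    theta - c*delta, where delta has i.i.d. Rademacher (+1/-1) entries.
    """
    delta = rng.choice([-1.0, 1.0], size=theta.shape)
    y_plus = loss(theta + c * delta)
    y_minus = loss(theta - c * delta)
    # Each delta_i is +1 or -1, so dividing by delta_i equals multiplying by it.
    return (y_plus - y_minus) / (2.0 * c) * delta


def noisy_quadratic_loss(theta, theta_star, noise_sd, rng):
    """Hypothetical f_k(theta) = ||theta - theta*_k||^2 plus Gaussian noise."""
    return float(np.sum((theta - theta_star) ** 2) + noise_sd * rng.normal())


def track(num_steps=2000, p=2, gain=0.05, c=0.1, drift=0.01, noise_sd=0.1, seed=0):
    """Run constant-gain SA; return ||theta_hat - theta*_k|| after each update."""
    rng = np.random.default_rng(seed)
    theta_star = np.zeros(p)
    theta_hat = np.ones(p)
    errors = np.empty(num_steps)
    for k in range(num_steps):
        # Hypothetical drift: theta*_k takes a random step of length `drift`.
        step = rng.normal(size=p)
        theta_star = theta_star + drift * step / np.linalg.norm(step)
        loss_k = functools.partial(
            noisy_quadratic_loss, theta_star=theta_star, noise_sd=noise_sd, rng=rng
        )
        g_hat = spsa_gradient(loss_k, theta_hat, c, rng)
        # Non-decaying gain: `gain` stays fixed, so the estimate keeps moving.
        theta_hat = theta_hat - gain * g_hat
        errors[k] = np.linalg.norm(theta_hat - theta_star)
    return errors


errors = track()
print(f"mean error, first 10 steps:  {errors[:10].mean():.3f}")
print(f"mean error, last 1000 steps: {errors[-1000:].mean():.3f}")
```

With a fixed gain the error does not shrink to zero; it settles at a level set by the gain, the noise, and the drift. A smaller `gain` damps the noise-driven fluctuation but slows the response to the drift, a trade-off that any gain-tuning strategy has to balance.
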

Chapter 3 provides a bound on the root-mean-squared error and a bound on the mean absolute deviation of $\hat{\theta}_k$ relative to $\theta^*_k$. The only assumption imposed on $\{\theta^*_k\}$ is that the average distance between two consecutive underlying optimal parameter vectors is bounded from above. Hence, the bounds apply under a mild assumption on the time-varying drift and a modest restriction on the observation noise and the bias term. After establishing the tracking capability in Chapter 3, we discuss the concentration behavior of $\hat{\theta}_k$ in Chapter 4. The weak-convergence limit of the continuous-time interpolation of $\hat{\theta}_k$ is shown to follow the trajectory of a non-autonomous ordinary differential equation. We then apply the variation-of-parameters formula to derive a computable upper bound on the probability that $\hat{\theta}_k$ deviates from $\theta^*_k$ beyond a given threshold. The results of Chapters 3 and 4 rest on probabilistic arguments and may offer little guidance on gain-tuning strategies for a single experimental run. Therefore, Chapter 5 discusses a data-dependent gain-tuning strategy based on estimating the Hessian information and the noise level. Overall, this work answers two questions: "What is the estimate of the dynamical system $\theta^*_k$?" and "How much can we trust $\hat{\theta}_k$ as an estimate of $\theta^*_k$?"
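
For a toy problem, the quantities bounded in Chapters 3 and 4 can also be estimated empirically by Monte Carlo, which is a useful sanity check on any bound. The sketch below does not reproduce the bounds derived in this work. It assumes a hypothetical linear root-finding problem $g_k(\theta) = H(\theta - \theta^*_k)$ with diagonal $H$, direct noisy observations of $g_k$ at a single design point, and an optimum that drifts a fixed distance per step; the function `tracking_errors` and all settings are hypothetical. For comparison it reports the crude Markov-inequality bound $P(\|\hat{\theta}_k - \theta^*_k\| > \varepsilon) \le E\|\hat{\theta}_k - \theta^*_k\|^2 / \varepsilon^2$, not the variation-of-parameters bound.

```python
import numpy as np


def tracking_errors(num_reps=500, num_steps=400, p=3, gain=0.1,
                    drift=0.02, noise_sd=0.5, seed=0):
    """Return ||theta_hat_K - theta*_K|| at the final step K for each replication."""
    rng = np.random.default_rng(seed)
    H = np.diag(np.linspace(1.0, 2.0, p))  # hypothetical Hessian of the problem
    theta_star = np.zeros((num_reps, p))   # one row per run (identical across runs)
    theta_hat = np.zeros((num_reps, p))    # one independent estimate per run
    for _ in range(num_steps):
        # Noisy observation of g_k(theta) = H (theta - theta*_k) at theta_hat_k.
        noise = noise_sd * rng.normal(size=(num_reps, p))
        g_obs = (theta_hat - theta_star) @ H.T + noise
        theta_hat = theta_hat - gain * g_obs  # constant (non-decaying) gain
        # Hypothetical drift: theta*_{k+1} = theta*_k + drift along the first axis.
        theta_star[:, 0] += drift
    return np.linalg.norm(theta_hat - theta_star, axis=1)


errs = tracking_errors()
eps = 0.5
rmse = np.sqrt(np.mean(errs ** 2))                # root-mean-squared error
mean_abs = np.mean(errs)                          # mean Euclidean deviation E||error||
p_exceed = np.mean(errs > eps)                    # empirical P(||error|| > eps)
markov = min(1.0, np.mean(errs ** 2) / eps ** 2)  # crude Markov-type bound
print(f"RMSE = {rmse:.3f}, mean abs. deviation = {mean_abs:.3f}")
print(f"P(error > {eps}) ~ {p_exceed:.3f} (Markov bound {markov:.3f})")
```

In this toy model the drift contributes a steady-state bias of $-\text{drift}/(\text{gain} \cdot h_1) = -0.2$ to the first error component under the default settings, while the noise contributes a variance that grows with `gain`; the RMSE reflects both.
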