Benefits of Additive Noise in Composing Classes with Bounded Capacity

Many machine learning systems that monitor the health and well-being of elderly individuals are based on analyzing time series. For example, heart rate, location, posture, body temperature, and other signals are recorded over a time interval to detect a pattern. Popular approaches for learning such patterns include Recurrent Neural Networks (RNNs). But can we guarantee the performance of these machine learning models? Namely, if a model works well on a few subjects, can we argue that it will work well on future subjects? How many training samples do we need to establish such a guarantee?

We developed a new theory that establishes tighter sample complexity bounds for a variety of such models. For RNNs, our sample complexity bound depends logarithmically on the length of the input sequence, whereas the state-of-the-art bounds were super-linear. The key idea is to add a small amount of noise while composing the layers of the architecture. Intuitively, this makes it hard for the network to "memorize". However, noisy neural networks are probabilistic functions and require a new set of tools to analyze. Our theory establishes such tools, and our experiments show that the resulting bounds can be tighter than the best-known bounds in the literature.
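To make the noise-injection idea concrete, here is a minimal sketch of composing layers with additive Gaussian noise between them. This is an illustrative toy, not the paper's construction: the function name `noisy_forward`, the `tanh` activation, and the noise scale `sigma` are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_forward(x, weights, sigma=0.1):
    """Compose layers, adding Gaussian noise after each one.

    Hypothetical sketch: `weights` is a list of weight matrices and
    `sigma` is the additive-noise scale; the output is now a random
    function of the input, which is what makes memorization hard.
    """
    h = x
    for W in weights:
        h = np.tanh(W @ h)                              # deterministic layer
        h = h + rng.normal(scale=sigma, size=h.shape)   # additive noise
    return h

# Toy usage: a three-layer composition on a random input.
weights = [rng.standard_normal((4, 4)) for _ in range(3)]
out = noisy_forward(rng.standard_normal(4), weights)
print(out.shape)
```

Because the noise is re-drawn on every forward pass, two evaluations on the same input generally differ, which is why the analysis must treat the network as a probabilistic function rather than a fixed one.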