Many modern learning methods, such as deep neural networks, are so complex that they fit the training data perfectly. Despite this, they generalize well to unseen data. This talk will discuss an example of this phenomenon, namely max-margin estimators for binary classification tasks, which achieve vanishing training error on separable data.
I will first talk about sharp asymptotics for the generalization error of max-margin classifiers on Gaussian data. Then, I will show how this result can be used to study the nonlinear random features model, that is, two-layer neural networks with random first-layer weights. Finally, I will discuss several insights that can be drawn from this mathematical analysis. Time permitting, I will discuss the underlying infinite-dimensional optimization problem, which is obtained through Gaussian min-max comparison theory.
Joint work with Andrea Montanari, Feng Ruan, and Jun Yan.