How Does Neural Network Training Work: Edge of Stability, River Valley Landscape, and More


Category: ACM
Date: 2025-11-14 (Fri) 10:30–12:00
Room: Bldg. 27, Room 220
Speaker: 윤철희 (KAIST Kim Jaechul Graduate School of AI)
Host Professor: 홍영준
Other:

Traditional analyses of gradient descent (GD) state that GD monotonically decreases the loss as long as the “sharpness” of the objective function—defined as the maximum eigenvalue of the objective's Hessian—is below a threshold $2/\eta$, where $\eta$ is the step size. Recent works have identified a striking discrepancy between traditional GD analyses and modern neural network training, referred to as the “Edge of Stability” phenomenon, in which the sharpness at GD iterates increases over time and hovers around the threshold $2/\eta$, while the loss continues to decrease rather than diverging. This discovery calls for an in-depth investigation into the underlying cause of the phenomenon as well as the actual inner mechanisms of neural network training.
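To make the $2/\eta$ threshold concrete, consider the one-dimensional quadratic $f(x) = \tfrac{\lambda}{2}x^2$, whose sharpness is simply $\lambda$: the GD update becomes $x \leftarrow (1 - \eta\lambda)x$, which contracts when $\lambda < 2/\eta$ and blows up when $\lambda > 2/\eta$. The short NumPy sketch below (with illustrative values of $\eta$ and $\lambda$, not taken from the talk) checks this numerically.

```python
import numpy as np

def gd_on_quadratic(sharpness, eta, steps=50, x0=1.0):
    """Run gradient descent on f(x) = (sharpness / 2) * x**2.

    The Hessian of this quadratic is the scalar `sharpness`, so the
    classical stability condition is sharpness < 2 / eta.
    """
    x = x0
    losses = []
    for _ in range(steps):
        grad = sharpness * x          # f'(x) = sharpness * x
        x = x - eta * grad            # GD update: x <- (1 - eta*sharpness) * x
        losses.append(0.5 * sharpness * x**2)
    return np.array(losses)

eta = 0.1                                              # threshold is 2/eta = 20
stable   = gd_on_quadratic(sharpness=19.0, eta=eta)    # just below 2/eta: loss shrinks
unstable = gd_on_quadratic(sharpness=21.0, eta=eta)    # just above 2/eta: loss grows
print(f"final loss (sharpness 19 < 2/eta): {stable[-1]:.3e}")
print(f"final loss (sharpness 21 > 2/eta): {unstable[-1]:.3e}")
```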

In this talk, I will briefly overview the Edge of Stability phenomenon and recent theoretical explanations of its underlying mechanism. We will then explore where learning actually occurs in the parameter space, discussing a recent paper that challenges the idea that neural network training happens in a low-dimensional dominant subspace. Based on these observations, I propose the hypothesis that the training loss landscape resembles a “river valley.” I will also present an analysis of the Schedule‑Free AdamW optimizer (Defazio et al., 2024) through this river-valley lens, including insights into why schedule‑free methods can be advantageous for scalable pretraining of language models.
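For readers unfamiliar with the schedule-free approach, below is a rough sketch of the simpler SGD variant of the update from Defazio et al. (2024); the AdamW variant used in the talk additionally applies Adam-style adaptive steps and decoupled weight decay to the same three-sequence structure. The learning rate, momentum parameter, and toy objective here are illustrative choices, not values from the talk.

```python
import numpy as np

def schedule_free_sgd(grad_fn, w0, lr=0.01, beta=0.9, steps=1000):
    """Sketch of Schedule-Free SGD (Defazio et al., 2024).

    Three sequences are maintained:
      z : the "fast" iterate updated by plain gradient steps,
      x : a running average of the z iterates (used for evaluation),
      y : an interpolation of z and x at which gradients are evaluated.
    No learning-rate schedule is used; the averaging plays that role.
    """
    z = np.array(w0, dtype=float)
    x = z.copy()
    for t in range(1, steps + 1):
        y = (1.0 - beta) * z + beta * x   # gradient query point
        z = z - lr * grad_fn(y)           # plain SGD step on z
        c = 1.0 / t                       # uniform-averaging weight
        x = (1.0 - c) * x + c * z         # online average of z iterates
    return x                              # x is the iterate that gets evaluated

# Toy usage on an ill-conditioned quadratic "valley": f(w) = 0.5 * w @ H @ w
H = np.diag([100.0, 1.0])                 # illustrative Hessian, valley-like geometry
grad = lambda w: H @ w
w_final = schedule_free_sgd(grad, w0=[1.0, 1.0], lr=0.015)
print(w_final)
```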
