PhD Student, Learning Theory, Optimization

Yufei Gu / 顾宇飞


Bridging the gap between theoretical understanding and empirical practice in deep learning.

I am interested in the computational aspects of deep learning theory and in methods for optimization and generalization. My current research focuses on analyzing learning dynamics and developing efficient optimizers for pre-training and post-training LLMs.

Recent Updates


Highlighted Work


  1. Tang Qian-Yuan†, Yufei Gu†, Cai Yunfeng, Sun Mingming, Li Ping, Xie Zeke, et al. Investigating the Overlooked Hessian Structure: From CNNs to LLMs. In Proceedings of the 42nd International Conference on Machine Learning (ICML 2025). <poster>
  2. Yufei Gu, Xiaoqing Zheng, Tomaso Aste. Unraveling the Enigma of Double Descent: An In-depth Analysis through the Lens of Learned Feature Space. In Proceedings of the 12th International Conference on Learning Representations (ICLR 2024). <poster>
  3. Zhao Ji, Yufei Gu, Shao Shitong, Zhou Xun, Xiang Liang, Xie Zeke. Late-to-Early Training: LET LLMs Learn Earlier, So Faster and Better. In Proceedings of the 14th International Conference on Learning Representations (ICLR 2026). <poster>

See the full publication list →