Notes
Two Theoretical Spectra of Modern Optimizers: Fisher-Approximated and Norm-Constrained
A note revisiting two theoretical lenses for modern optimizer design, connecting Fisher-approximated methods, norm-constrained updates, and recent optimizer trends.
Mano: Restriking Manifold Optimization for LLM Training
An introduction to the Mano optimizer, which we proposed, including its theoretical analysis and empirical results.
Latent Reasoning & Iterative Refinement in Large Language Models
Latent reasoning methods for LLMs, from layer skipping and looping to token refinement, along with the diffusion language model paradigm.
Matrix Decomposition & Dimensionality Reduction in Deep Learning
Matrix decomposition, dimensionality reduction, and visualization techniques for high-dimensional features and spaces in deep learning.
Data-centric Methods in Deep Learning
An overview of data-centric methods in deep learning, including data valuation, selection, synthesis, and online pruning.
From Adam to Muon: Effective 'Deep' Optimizer Preconditioners
An introduction to the different approaches to preconditioner design in optimizers for training deep neural networks.