Reading List · Archived · 2026.01

Efficient Diffusion Language Models — Paper List

Inference acceleration, sampling, caching, and post-training techniques for diffusion language models.


Foundational Models

  1. LLaDA: Large Language Diffusion Models. <arXiv 2025.2> <NeurIPS 2025 Oral>
    • LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models. <arXiv 2025.5>
  2. Dream 7B: Diffusion Large Language Models. <arXiv 2025.8>

Analysis

  1. Diffusion Language Models Know the Answer Before Decoding. <arXiv 2025.8>
  2. The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models. <arXiv 2026.1>
  3. Mechanism Shift During Post-training from Autoregressive to Masked Diffusion Language Models. <arXiv 2026.1>

KV-Cache / Sparse Attention

  1. dKV-Cache: The Cache for Diffusion Language Models. <arXiv 2025.5>
  2. Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding. <arXiv 2025.5>
  3. Sparse-dLLM: Accelerating Diffusion LLMs with Dynamic Cache Eviction. <arXiv 2025.8>
  4. SparseD: Sparse Attention for Diffusion Language Models. <arXiv 2025.9>
  5. d²Cache: Accelerating Diffusion-Based LLMs via Dual Adaptive Caching. <arXiv 2025.9>
  6. Attention is All You Need for KV Cache in Diffusion LLMs. <arXiv 2025.10>

Step Distillation

  1. Progressive Distillation for Fast Sampling of Diffusion Models. <ICLR 2022>
  2. Distillation of Discrete Diffusion through Dimensional Correlations. <ICML 2025>
  3. Learning Few-Step Diffusion Models by Trajectory Distribution Matching. <arXiv 2025.3>
  4. DLM-One: Diffusion Language Models for One-Step Sequence Generation. <arXiv 2025.6>
  5. Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing. <arXiv 2025.8>
  6. FS-DFM: Fast and Accurate Long Text Generation with Few-Step Diffusion Language Models. <arXiv 2025.9>
  7. Taming Masked Diffusion Language Models via Consistency Trajectory Reinforcement Learning with Fewer Decoding Steps. <arXiv 2025.9>
  8. Adaptive (Block) Length:
    • CtrlDiff: Boosting Large Diffusion Language Models with Dynamic Block Prediction and Controllable Generation. <arXiv 2025.5>
    • Beyond Fixed: Training-free Variable-Length Denoising for Diffusion Large Language Models. <arXiv 2025.8>
    • AdaBlock-dLLM: Semantic-Aware Diffusion LLM Inference via Adaptive Block Size. <arXiv 2025.9>

Training-free Sampler

  1. Accelerated Sampling from Masked Diffusion Models via Entropy Bounded Unmasking. <arXiv 2025.5>
  2. Wide-In, Narrow-Out: Revokable Decoding for Efficient and Effective DLLMs. <arXiv 2025.7>
  3. Latent Refinement Decoding: Enhancing Diffusion-Based Language Models by Refining Belief States. <arXiv 2025.10> <ICLR 2026 Review>
  4. KLASS: KL-Guided Fast Inference in Masked Diffusion Models. <arXiv 2025.11>
  5. Beyond Confidence: Adaptive and Coherent Decoding for Diffusion Language Models. <arXiv 2025.11>
  6. Optimal Inference Schedules for Masked Diffusion Models. <arXiv 2025.11>
  7. Fast-Decoding Diffusion Language Models via Progress-Aware Confidence Schedules. <arXiv 2025.12>
  8. Optimizing Decoding Paths in Masked Diffusion Models by Quantifying Uncertainty. <arXiv 2025.12>
  9. Decoding Large Language Diffusion Models with Foreseeing Movement. <arXiv 2025.12>

Long Context

  1. LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs. <arXiv 2025.6>
  2. UltraLLaDA: Scaling the Context Length to 128k for Diffusion Large Language Models. <arXiv 2025.10>

Unmasking / Remasking

  1. Unmasking:
    • Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models. <NeurIPS 2025>
    • Beyond Masks: Efficient, Flexible Diffusion Language Models via Deletion-Insertion Processes. <ICLR 2026 Review>
    • Learning Unmasking Policies for Diffusion Language Models. <arXiv 2025.12>
    • dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning. <arXiv 2025.12>
    • Beyond Hard Masks: Progressive Token Evolution for Diffusion Language Models. <arXiv 2026.1>
  2. Remasking:
    • Remasking Discrete Diffusion Models with Inference-Time Scaling. <NeurIPS 2025>
    • Don’t Settle Too Early: Self-Reflective Remasking for Diffusion Language Models. <ICLR 2026 Review>
    • Saber: An Efficient Sampling with Adaptive Acceleration and Backtracking Enhanced Remasking for Diffusion Language Model. <arXiv 2025.10>
    • Thinking Inside the Mask: In-Place Prompting in Diffusion LLMs. <arXiv 2025.8>

AR-to-DLM Transfer / Transformation

  1. Autoregressive Models Rival Diffusion Models at ANY-ORDER Generation. <arXiv 2026.1>
  2. Mechanism Shift During Post-training from Autoregressive to Masked Diffusion Language Models. <arXiv 2026.1>
