Is there such a thing as described in the title, at least in research papers (if not actual models)?
So far, all the LLMs I know of are autoregressive models.
This is a false dichotomy: most diffusion models, including DALL-E 2 and 3, are already built on transformers.
That said, if you meant to ask whether any language models use diffusion rather than autoregressive (GPT-style) decoding, the answer is yes. For example, see this recent paper: https://arxiv.org/abs/2305.09515
They do now!
https://ml-gsai.github.io/LLaDA-demo/
> We introduce LLaDA, a diffusion model with an unprecedented 8B scale, trained entirely from scratch, rivaling LLaMA3 8B in performance.
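To give a feel for how generation differs from autoregressive decoding, here is a toy sketch of the masked-diffusion idea behind models like LLaDA: start from a fully masked sequence and iteratively commit the denoiser's most confident predictions until nothing is masked. Everything here (the vocabulary, the stub denoiser, the schedule) is illustrative, not LLaDA's actual implementation.

```python
import random

MASK = "<mask>"
VOCAB = ["the", "cat", "sat", "on", "mat"]

def toy_denoiser(seq):
    # Stand-in for a trained model: returns a (token, confidence) guess
    # for every currently masked position. A real dLLM would run a
    # transformer over the whole (partially masked) sequence here.
    return {i: (random.choice(VOCAB), random.random())
            for i, tok in enumerate(seq) if tok == MASK}

def diffusion_generate(length=8, steps=4, seed=0):
    random.seed(seed)
    seq = [MASK] * length
    per_step = length // steps
    for _ in range(steps):
        guesses = toy_denoiser(seq)
        # Commit only the most confident predictions this step; later
        # steps can then condition on the already-revealed tokens.
        best = sorted(guesses, key=lambda i: guesses[i][1], reverse=True)
        for i in best[:per_step]:
            seq[i] = guesses[i][0]
    return seq

print(diffusion_generate())
```

The key contrast with a GPT is that tokens are filled in across the whole sequence in parallel over a few denoising steps, rather than strictly left to right, which is also where the speed claims below come from.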
https://www.inceptionlabs.ai/news
> We are announcing the Mercury family of diffusion large language models (dLLMs), a new generation of LLMs that push the frontier of fast, high-quality text generation.
>
> Mercury is up to 10x faster than frontier speed-optimized LLMs. Our models run at over 1000 tokens/sec on NVIDIA H100s, a speed previously possible only using custom chips.