
Is there such a thing (as described in the title), at least in research papers, even if not in actual models?

So far, all the LLMs I know of are autoregressive models.

agemO

2 Answers


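This is a false dichotomy. Diffusion is a way of generating output, while a transformer is a network architecture, so a model can be both. Many diffusion models, including DALL-E 2 and 3, already use transformers.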

However, assuming you meant to ask whether any language models use diffusion instead of GPT-style autoregressive generation, the answer is yes. For example, see this recent paper: https://arxiv.org/abs/2305.09515
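
To make the distinction concrete, here is a toy sketch contrasting the two decoding loops. It is only an illustration in the spirit of masked (discrete) diffusion, not the method of the linked paper or of any particular model. `toy_forward` is a hypothetical stand-in that returns random guesses where a trained network (for example, a transformer) would go, and the "reveal the most confident guesses first" schedule is an assumption chosen for simplicity.

```python
import random

VOCAB = ["the", "cat", "sat", "on", "a", "mat"]
MASK = "<mask>"


def toy_forward(tokens, rng):
    """Hypothetical stand-in for one forward pass of a trained network.

    Returns a random (token, confidence) guess for every position.
    """
    return [(rng.choice(VOCAB), rng.random()) for _ in tokens]


def autoregressive_decode(length, rng):
    """Generate left to right: one forward pass per new token."""
    tokens, passes = [], 0
    for _ in range(length):
        guesses = toy_forward(tokens + [MASK], rng)  # predict the next slot
        passes += 1
        tokens.append(guesses[-1][0])
    return tokens, passes


def masked_diffusion_decode(length, steps, rng):
    """Start fully masked and fill in several positions per forward pass."""
    tokens, passes = [MASK] * length, 0
    for step in range(steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        guesses = toy_forward(tokens, rng)  # guesses for all positions at once
        passes += 1
        # Spread the remaining masked positions evenly over the remaining steps.
        quota = -(-len(masked) // (steps - step))  # ceiling division
        # Reveal the most confident guesses; the rest stay masked for now.
        best = sorted(masked, key=lambda i: guesses[i][1], reverse=True)[:quota]
        for i in best:
            tokens[i] = guesses[i][0]
    return tokens, passes


if __name__ == "__main__":
    rng = random.Random(0)
    _, ar_passes = autoregressive_decode(12, rng)
    _, diff_passes = masked_diffusion_decode(12, 4, rng)
    print(ar_passes, diff_passes)
```

With 12 tokens and 4 denoising steps, the autoregressive loop needs 12 forward passes while the masked loop needs 4. Either loop could call the same transformer internally, which is why the two terms are not opposites.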

chessprogrammer

They do now!

https://ml-gsai.github.io/LLaDA-demo/

> We introduce LLaDA, a diffusion model with an unprecedented 8B scale, trained entirely from scratch, rivaling LLaMA3 8B in performance.

https://www.inceptionlabs.ai/news

> We are announcing the Mercury family of diffusion large language models (dLLMs), a new generation of LLMs that push the frontier of fast, high-quality text generation.
>
> Mercury is up to 10x faster than frontier speed-optimized LLMs. Our models run at over 1000 tokens/sec on NVIDIA H100s, a speed previously possible only using custom chips.

endolith