Is there such a thing as described in the title, at least in research papers (if not actual models)?
So far, all the LLMs I know of are autoregressive models.
This is a false dichotomy: most diffusion models, including DALL-E 2 and 3, are already built on transformers.
That said, if you meant to ask whether any language models use diffusion rather than autoregressive (GPT-style) decoding, the answer is yes. For example, see this recent paper: https://arxiv.org/abs/2305.09515
They do now!
https://ml-gsai.github.io/LLaDA-demo/
> We introduce LLaDA, a diffusion model with an unprecedented 8B scale, trained entirely from scratch, rivaling LLaMA3 8B in performance.
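To give a feel for how generation differs from autoregressive decoding, here is a toy sketch of the masked-diffusion idea behind models like LLaDA: start from a fully masked sequence and iteratively commit the denoiser's most confident predictions until nothing is masked. Everything here (the vocabulary, the stub denoiser, the schedule) is illustrative, not LLaDA's actual implementation.

```python
import random

MASK = "<mask>"
VOCAB = ["the", "cat", "sat", "on", "mat"]

def toy_denoiser(seq):
    # Stand-in for a trained model: returns a (token, confidence) guess
    # for every currently masked position. A real dLLM would run a
    # transformer over the whole (partially masked) sequence here.
    return {i: (random.choice(VOCAB), random.random())
            for i, tok in enumerate(seq) if tok == MASK}

def diffusion_generate(length=8, steps=4, seed=0):
    random.seed(seed)
    seq = [MASK] * length
    per_step = length // steps
    for _ in range(steps):
        guesses = toy_denoiser(seq)
        # Commit only the most confident predictions this step; later
        # steps can then condition on the already-revealed tokens.
        best = sorted(guesses, key=lambda i: guesses[i][1], reverse=True)
        for i in best[:per_step]:
            seq[i] = guesses[i][0]
    return seq

print(diffusion_generate())
```

The key contrast with a GPT is that tokens are filled in across the whole sequence in parallel over a few denoising steps, rather than strictly left to right, which is also where the speed claims below come from.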
https://www.inceptionlabs.ai/news
> We are announcing the Mercury family of diffusion large language models (dLLMs), a new generation of LLMs that push the frontier of fast, high-quality text generation.
>
> Mercury is up to 10x faster than frontier speed-optimized LLMs. Our models run at over 1000 tokens/sec on NVIDIA H100s, a speed previously possible only using custom chips.