I’ve been reading about non-autoregressive models (like non-autoregressive Transformers, a.k.a. NATs, or diffusion models) and how they can generate all output tokens in parallel instead of one at a time like autoregressive models. That sounds fast in theory, but in practice they usually need multiple refinement steps (denoising steps in diffusion models, iterative decoding rounds in NATs) to reach good quality.
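For context on what I mean by "iterative decoding", here's a minimal runnable sketch of mask-predict-style refinement (in the spirit of Ghazvininejad et al., 2019). To be clear, `dummy_model`, the mask token, and the linear re-masking schedule are all illustrative placeholders I made up, not any real library's API:

```python
import torch

# Dummy stand-in so the sketch runs: a real NAT would be a Transformer
# that scores every target position in one parallel forward pass.
def dummy_model(src, tgt, vocab=100):
    return torch.randn(tgt.size(0), tgt.size(1), vocab)

def mask_predict_decode(model, src, tgt_len, num_steps=4, mask_id=0):
    # Step 0: predict every position at once from a fully masked target.
    tgt = torch.full((1, tgt_len), mask_id, dtype=torch.long)
    probs, tgt = model(src, tgt).softmax(-1).max(-1)

    for step in range(1, num_steps):
        # Re-mask the least confident tokens; the masked fraction
        # shrinks linearly with each step (the Mask-Predict schedule).
        n_mask = tgt_len * (num_steps - step) // num_steps
        if n_mask == 0:
            break
        worst = probs[0].topk(n_mask, largest=False).indices
        tgt[0, worst] = mask_id
        # One more *parallel* forward pass re-predicts the masked slots.
        new_probs, new_tok = model(src, tgt).softmax(-1).max(-1)
        tgt[0, worst] = new_tok[0, worst]
        probs[0, worst] = new_probs[0, worst]
    return tgt

print(mask_predict_decode(dummy_model, src=None, tgt_len=12))
```

So each refinement step costs one full forward pass, and my questions are basically about how many of those passes you end up needing.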
So I’m wondering:
1. Are there any benchmarks showing how many refinement steps (and how much wall-clock time) non-autoregressive models need to match autoregressive accuracy, and how accuracy scales with the number of steps?
2. More practically, at matched accuracy, how does the total inference time (summed over all refinement iterations) compare to an autoregressive model decoding one token at a time? I've sketched the rough cost model I have in mind below.
3. Have any companies (Google, Meta, DeepMind, etc.) shared real-world benchmarks, blog posts, or papers on this?
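To make question 2 concrete, here's the back-of-envelope model I'm picturing. Every timing below is a hypothetical placeholder, not a measurement:

```python
# Toy first-order cost model; all timings are made-up placeholders.
def ar_time(seq_len, t_token):
    # Autoregressive: one forward pass per token, strictly sequential.
    return seq_len * t_token

def nat_time(num_steps, t_pass):
    # Non-autoregressive: one parallel pass over the whole sequence per
    # refinement step, so cost scales with steps, not sequence length.
    return num_steps * t_pass

seq_len, t_token, t_pass = 200, 0.010, 0.015  # seconds, hypothetical
for k in (1, 4, 10, 25):
    speedup = ar_time(seq_len, t_token) / nat_time(k, t_pass)
    print(f"{k:>2} refinement steps -> {speedup:4.1f}x vs. autoregressive")
```

What I'd love to see is real data on where the matched-accuracy number of steps lands on that curve, and how things like batch size and KV caching shift the per-pass costs in practice.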
Thanks!