What are all the inputs that support diversity of images in text to image generation?

Asked Apr 25 '24 at 07:18

Active Apr 25 '24 at 07:27


For this question, consider the Stable Diffusion model.

For a given text embedding, Stable Diffusion can generate diverse images. Here, "diversity" refers to the variation among the generated images: several different images that share the same semantic content (determined by the text embedding) can be produced.

What factors contribute to the diversity of images generated from a given text embedding? I understand that the random noise vector is a primary contributor. Apart from the random noise vector, what other factors contribute to this diversity? I am interested in understanding which inputs lead to it.
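To make the premise concrete (a fixed text embedding plus different initial noise vectors gives different outputs), here is a minimal toy sketch. It is not Stable Diffusion: it assumes a hand-written, hypothetical denoiser `predict_clean` in place of the learned U-Net, a made-up 4-dimensional "text embedding" instead of a CLIP embedding, no VAE or latent decoder, a hypothetical linear noise schedule, and a deterministic DDIM-style update (eta = 0), so the initial Gaussian noise drawn from `seed` is the only source of randomness. Mean pairwise Euclidean distance is used as a hypothetical diversity score.

```python
import math
import random


def predict_clean(x, text_embedding):
    # Hypothetical toy denoiser (a real model would be a learned U-Net):
    # the predicted clean sample is the text-determined content plus a
    # bounded detail term that depends on the current noisy sample x.
    return [c + 0.5 * math.tanh(xi) for xi, c in zip(x, text_embedding)]


def ddim_sample(text_embedding, seed, num_steps=10):
    """Deterministic DDIM-style sampling (eta = 0) with the toy denoiser.

    The only randomness is the initial noise vector drawn from `seed`.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in text_embedding]  # initial noise x_T
    # Hypothetical schedule of cumulative alphas: ~0 (mostly noise) up to 1 (clean).
    alphas = [0.01 + 0.99 * i / num_steps for i in range(num_steps)] + [1.0]
    for a_t, a_next in zip(alphas[:-1], alphas[1:]):
        x0_hat = predict_clean(x, text_embedding)
        # Noise implied by the current sample and the predicted clean sample.
        eps_hat = [(xi - math.sqrt(a_t) * x0i) / math.sqrt(1.0 - a_t)
                   for xi, x0i in zip(x, x0_hat)]
        # Move to the next (less noisy) level without injecting fresh noise.
        x = [math.sqrt(a_next) * x0i + math.sqrt(1.0 - a_next) * ei
             for x0i, ei in zip(x0_hat, eps_hat)]
    return x


def mean_pairwise_distance(samples):
    # Hypothetical diversity score: average Euclidean distance over all pairs.
    total, pairs = 0.0, 0
    for i in range(len(samples)):
        for j in range(i + 1, len(samples)):
            total += math.dist(samples[i], samples[j])
            pairs += 1
    return total / pairs


if __name__ == "__main__":
    text_embedding = [0.2, -1.0, 0.7, 1.5]  # hypothetical fixed "text embedding"
    samples = [ddim_sample(text_embedding, seed) for seed in range(4)]
    print(mean_pairwise_distance(samples))  # positive: different seeds, different samples
    print(ddim_sample(text_embedding, 0) == ddim_sample(text_embedding, 0))  # True
```

In this toy setup every sample stays within 0.5 of the embedding in each coordinate (the shared "semantic content"), while the seed alone decides where within that band each sample lands; reusing a seed reproduces the sample exactly.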

edited Apr 25 '24 at 07:27


hanugm


