I am looking for a multi-modal model that can generate a mix of images and text when prompted with a query in text only.
Please find a sample expected input and output below.
Are there any models that can achieve this? Please suggest.
I have seen Next-GPT capable of generating Images based on prompt however was wondering if there are models that can decide if a text can be presented as an Image and presents the same in output.
e.g.
Input - What was the market share of smartphones in the year 2008?
Output - The marketshare of smartphones segregated by manufacturer are as listed below : Nokia: 265.6148 million (32.5% market share) Motorola: 144.9204 million (17.7% market share) Samsung: 103.7536 million (12.7% market share) LG: 54.9246 million (6.7% market share) Sony Ericsson: 51.7738 million (6.3% market share) Siemens: 28.5906 million (3.5% market share) Others: 166.9851 million (20.6% market share) Total: 816.5629 million
Source - Wikipedia
