File:Combining GPT-4 and stable diffusion to generate art from sketches.png

Original file ‎(2,961 × 1,294 pixels, file size: 2 MB, MIME type: image/png)

This is a file from the Wikimedia Commons. Information from its description page there is shown below.

Summary

Description	English: "Possible application in sketch generation" of GPT-4. "Text-to-image synthesis models have been widely explored in recent years, but they often suffer from a lack of spatial understanding capabilities and the inability to follow complex instructions [GPN+22]. For example, given a prompt such as “draw a blue circle on the left and a red triangle on the right”, these models may produce images that are visually appealing but do not match the desired layout or colors. On the other hand, GPT-4 can generate code from a prompt, which can be rendered as an image, in a way that is true to the instructions to a higher degree of accuracy. However, the quality of the rendered image is usually very low. Here, we explore the possibility of combining GPT-4 and existing image synthesis models by using the GPT-4 output as the sketch. As shown in Figure 2.8, this approach can produce images that have better quality and follow the instructions more closely than either model alone. We believe that this is a promising direction for leveraging the strengths of both GPT-4 and existing image synthesis models. It can also be viewed as a first example of giving GPT-4 access to tools, a topic we explore in much more depth in Section" "Prompt: A screenshot of a city-building game in 3D. The screenshot is showing a terrain where there is a river from left to right, there is a desert with a pyramid below the river, and a city with many highrises above the river. The bottom of the screen has 4 buttons with the color green, blue, brown, and red respectively"
Date	22 March 2023
Source	https://arxiv.org/abs/2303.12712
Author	Authors of the study: Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang (all at Microsoft Research)
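
The description's example instruction, "draw a blue circle on the left and a red triangle on the right", can be turned into a rough code-rendered sketch of the kind the quoted passage describes. The following is a minimal hand-written illustration in Python, not output from GPT-4 and not the authors' code; the function name layout_sketch, the canvas size, and the choice of SVG as the drawing format are all assumptions made for this example.

```python
def layout_sketch(width: int = 400, height: int = 200) -> str:
    """Return an SVG string: blue circle on the left, red triangle on the right.

    Hypothetical helper for illustration only; sizes and positions are arbitrary.
    """
    r = height // 4                  # shared "radius" for both shapes
    cx, cy = width // 4, height // 2  # circle centre in the left half
    tx = 3 * width // 4               # triangle centre line in the right half
    # Triangle corners: bottom-left, bottom-right, top.
    points = f"{tx - r},{cy + r} {tx + r},{cy + r} {tx},{cy - r}"
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        f'<circle cx="{cx}" cy="{cy}" r="{r}" fill="blue"/>'
        f'<polygon points="{points}" fill="red"/>'
        "</svg>"
    )
```

Writing the returned string to a .svg file gives a crude but layout-correct image. In the approach the description quotes, a sketch like this would then be passed to an image synthesis model to improve visual quality; that step is not shown here.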

Licensing

This file is licensed under the Creative Commons Attribution 4.0 International license.

You are free:

to share – to copy, distribute and transmit the work
to remix – to adapt the work

Under the following conditions:

attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

File history

	Date/Time	Thumbnail	Dimensions	User	Comment
current	14:11, 8 May 2023		2,961 × 1,294 (2 MB)	Prototyperspective	Uploaded a work by Authors of the study: Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang (all at Microsoft Research) from https://arxiv.org/abs/2303.12712 with UploadWizard

File usage

The following pages on the English Wikipedia use this file (pages on other projects are not listed):

Timeline of computing 2020–present

copyright status	copyrighted
copyright license	Creative Commons Attribution 4.0 International
inception	22 March 2023
media type	image/png
checksum	ec49f5701e081f4bd544deb4d83659886e883e53
data size	2,102,136 bytes
height	1,294 pixels
width	2,961 pixels
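
The checksum and data size listed above can be used to check that a downloaded copy of the file is intact. The following is a minimal sketch, assuming the checksum is a SHA-1 digest: its 40 hexadecimal digits are consistent with SHA-1, but the listing does not name the algorithm. The function name matches_listing is hypothetical.

```python
import hashlib

# Values copied from the listing above.
EXPECTED_SHA1 = "ec49f5701e081f4bd544deb4d83659886e883e53"  # assumed to be SHA-1
EXPECTED_SIZE = 2_102_136  # bytes


def matches_listing(
    data: bytes,
    expected_hex: str = EXPECTED_SHA1,
    expected_size: int = EXPECTED_SIZE,
) -> bool:
    """Return True if the bytes have the listed size and SHA-1 digest."""
    if len(data) != expected_size:
        return False  # cheap check first; skip hashing on a size mismatch
    return hashlib.sha1(data).hexdigest() == expected_hex
```

To use it, read the downloaded PNG in binary mode and pass its contents to matches_listing; a False result means the copy differs from the listed file or that the SHA-1 assumption is wrong.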
