Talk:DALL-E/Archive 1


DALL·E or DALL-E?

Do we know if the official name of the AI is DALL·E or DALL-E? OpenAI seems to be using DALL·E everywhere on their website, while external sources use DALL-E. LittleWhole (talk) 08:21, 7 January 2021 (UTC)

For what it's worth, roughly the same happened to WALL-E. Azai~enwiki (talk) 08:47, 11 January 2021 (UTC)
"Dall·E" is official, but everyone uses "Dall-E" instead, because the interpunct is difficult to type. ⇒ Zhing-Za, they/them, talk 20:11, 17 April 2023 (UTC)

Did you know nomination

The following is an archived discussion of the DYK nomination of the article below. Please do not modify this page. Subsequent comments should be made on the appropriate discussion page (such as this nomination's talk page, the article's talk page or Wikipedia talk:Did you know), unless there is consensus to re-open the discussion at this page. No further edits should be made to this page.

The result was: promoted by DanCherek (talk) 14:49, 12 March 2021 (UTC)

  • ... that the machine learning model DALL-E, while widely reported on for its "surreal" and "quirky" output, also learned how to solve Raven's Matrices (a form of IQ test) without being specifically taught to? Source: Input, NBC, Nature, VentureBeat, Wired, CNN, New Scientist and BBC pieces (refs in article) for surrealness and quirkiness, TheNextWeb piece (ref also in article) for Raven's Matrices

5x expanded by JPxG (talk). Self-nominated at 19:20, 7 March 2021 (UTC).

  •   This interesting article qualifies for DYK as a five-fold expansion and is new enough and long enough. The hook facts are cited inline, the article is neutral and I detected no copyright issues. A QPQ has been done. Cwmhiraeth (talk) 07:26, 8 March 2021 (UTC)

GA Review

This review is transcluded from Talk:DALL-E/GA1. The edit link for this section can be used to add comments to the review.

Reviewer: RoySmith (talk · contribs) 16:29, 2 April 2021 (UTC)


Starting review. My plan is to do two major passes through the article, first for prose, the second to verify the references. In general, all my comments will be suggestions which you can accept or reject as you see fit. -- RoySmith (talk) 16:29, 2 April 2021 (UTC)

I put this on hold for a week, with no response, so closing this as a failed review. -- RoySmith (talk) 15:39, 10 April 2021 (UTC)
Shit! I have been busy with a bunch of online stuff happening at the same time. Anyway, I will go through all of these things and probably do some expansion, and nominate again later. jp×g 18:43, 11 April 2021 (UTC)

Prose

Lead section

  • "It uses a 12-billion parameter[2] version of the GPT-3 Transformer model to interpret natural language inputs (such as "a green leather purse shaped like a pentagon" or "an isometric view of a sad capybara") and generate corresponding images." For the lead section, I'd leave out the whole "(such as ...)" parenthetical phrase. That's covered in more detail in the main body.
  •  ? This one, I'm not sure how familiar a layman would be with the phrase natural language being used to specifically mean "a sentence spoken the way you'd say it to a person" rather than "written in a human language". After all, fetchString = cur.execute("SELECT * FROM threads2 WHERE replycount > 20 AND viewcount/replycount > 300 ORDER BY forumid, replycount DESC LIMIT 1000") is an English-language sentence. You are correct that this is an unreasonably long sentence, though. I will ponder this. jp×g 19:10, 2 April 2021 (UTC)
    JPxG, You've linked to natural language. Somebody who is not familiar with AI jargon can click through to find out what it means, and the fact that it's linked should be a clue that it has some special meaning. But maybe, if you want to give a little more, replace the current "(such as ... capybara)" with something like "(conventional human language)"? I'm assuming this was trained on an English-language corpus; you should verify that and mention it somewhere if it is indeed the case. -- RoySmith (talk) 20:07, 2 April 2021 (UTC)
  • "It is able to create images" -> "It can create images"
  •  Y Done. jp×g 19:07, 2 April 2021 (UTC)
  • The long lists of citations on some sentences ("...texture of a porcupine").[2][4][5][6][7][8]") seem like WP:REFBOMB and detract from readability (particularly in the lead section). Can these be trimmed to just the most important sources that actually support the statement?
  •  Y This is a silly artifact of how I wrote the article (write a couple-sentence stub, locate and format all references, then flesh out an article by moving them down into the expanded text), which is definitely unintentional. Fixed. jp×g 19:07, 2 April 2021 (UTC)
  • "DALL-E's name is a" -> "The name is a"
  •  Y Fixed. jp×g 19:13, 2 April 2021 (UTC)
  • "in conjunction with another model, CLIP" -> "in conjunction with CLIP"
  •  Y Fixed. jp×g 19:13, 2 April 2021 (UTC)
  • "OpenAI has refused to release source code for either mode" While this may be true, it's non-neutral. The implication is, "They should have released the source code, they were asked to do so, and they refused". I see you talk about this more later, but here in the lead, you state it in Wikipedia voice, which violates WP:NPOV. Must fix.
  •  ? This one is a little tricky: in GPT-2 (frankly, a better article) I wrote at much greater length about the issue there. To wit, OpenAI was founded as a nonprofit, and received funding to develop models, with the explicit goal of making their research open to the public (as opposed to organizations like DeepMind). The decision to not release GPT-2 was widely criticized, and they ended up releasing it anyway after determining that the abuse concerns were not based in fact. However, I agree that this should probably be explained in greater detail than just saying they "refused". jp×g 19:38, 2 April 2021 (UTC)
  • "one of OpenAI's objectives through DALL-E's development" -> "one of DALL-E's objectives"
  •  Y Fixed. jp×g 19:40, 2 April 2021 (UTC)

Architecture

  • "model was first developed by OpenAI in 2018" -> I think "initially" works better than "first"
  •  Y Done. jp×g 19:41, 2 April 2021 (UTC)
  • "was scaled up to produce GPT-2 in 2019.[10] In 2020, GPT-2 was augmented similarly to produce GPT-3,[11] of which DALL-E is a implementation.[2][12]" -> "was scaled up to produce GPT-2 in 2019, and GPT-3 (which DALL-E uses) in 2020."
  •    I have made an attempt here, but it is still a little awkward. Let me know what you think. jp×g 19:43, 2 April 2021 (UTC)
    JPxG, What you've got now is better. I could suggest some other alternatives, but I think it's fine now. -- RoySmith (talk) 20:10, 2 April 2021 (UTC)
  • "It uses zero-shot learning", clarify whether "it" refers to GPT-3, DALL-E, or GPT in general.
  •  Y I've moved that sentence down to an appropriate location, where I think it is much clearer (and where it makes more sense to be anyway). jp×g 19:47, 2 April 2021 (UTC)
  • "scaled down from GPT-3's parameter size of 175 billion" -> "scaled down from GPT-3's 175 billion"
  •  Y jp×g 19:47, 2 April 2021 (UTC)
  • "large amounts of images" -> "a large number of images".
  •  ? I will dig my heels in very slightly on this one; I say amounts (plural) since it generates a large amount (singular) in response to each prompt (singular). jp×g 19:47, 2 April 2021 (UTC)
    JPxG, The problem is, "amount" implies a measurement, as opposed to a count. You can have "a large amount of image data", but you can't have "a large amount of images". You can have "a large number of images", or "many images", or "a voluminous quantity of images", or "a boatload of images". -- RoySmith (talk) 20:15, 2 April 2021 (UTC)
    Much to think about. I think you are correct; will fix. jp×g 20:43, 2 April 2021 (UTC)
  • "another OpenAI model, CLIP,", this should start a new sentence.
  •  Y jp×g 19:47, 2 April 2021 (UTC)
  • ""understand and rank" its output", I think "its" refers to DALL-E's here, but clarify.
  •  Y jp×g 19:48, 2 April 2021 (UTC)
  • " (like ImageNet)", I'd leave that out completely. Since you're talking about "most classifier models", calling out one in particular doesn't add any value.
  •    ImageNet is a curated dataset of labeled images, not a classifier model. I have edited it to make this a little clearer. jp×g 19:49, 2 April 2021 (UTC)
    JPxG, But, there's still lots of curated image datasets. What's so special about ImageNet that it needs to be called out as the one example you mention as something that wasn't used? -- RoySmith (talk) 20:18, 2 April 2021 (UTC)
  • "Rather than learn from a single label", avoid repetition of the word "rather".
  •  Y Rather than use something instead of that rather, I have used instead rather than rather for the previous rather. jp×g 19:51, 2 April 2021 (UTC)
  • "CLIP learns to associate" -> "CLIP associates"
  •  Y jp×g 19:51, 2 April 2021 (UTC)

Performance

  •  Y Fixed. jp×g 19:39, 2 April 2021 (UTC)
  • "quoted Neil Lawrence ... describing it as ..." I think you mean, "quoted Neil Lawrence ... who described it as ..."
  •  Y jp×g 19:53, 2 April 2021 (UTC)
  • " He also quoted Mark Riedl" clarify who "he" is.
  •  Y jp×g 19:53, 2 April 2021 (UTC)

Implications

  • "DALL-E demonstrated "it is becoming", not sure, but maybe, "DALL-E demonstrated that "it is becoming"
  •  Y jp×g 19:54, 2 April 2021 (UTC)

My overall impression is that this reads like a publicity piece for OpenAI. The vast majority of the quotes are extolling the virtues of the system, with only one or two examples of problems, and even those are in the context of, "but here's an example of what it does better". The REFBOMB aspect is part of the problem, but it's deeper than that. I'm going to put the rest of the review on hold for a week to give you a chance to address that. -- RoySmith (talk) 17:54, 2 April 2021 (UTC)

Thanks for taking the time to review! I will go through it now. jp×g 19:01, 2 April 2021 (UTC)
Okay, have gone through it. I think that the lack of negativity in the article is mostly a consequence of OpenAI's embargo; nobody can access the code outside of an extremely narrowly-controlled demo which is more like a photo album than an interface (which, nevertheless, I strongly advise you to check out to form an opinion on the model). They also took a fair bit of time to release the paper, which I will admit to not having had time to go through yet; I think this would also enable a fairly neutral description of what it does. Some of the more cynically minded opinion-havers called the GPT-2 embargo a deliberate strategy to hype up that model's capabilities. I will go hunting for some more stuff to add to the article, though. jp×g 20:00, 2 April 2021 (UTC)

Images

I took another look at this. The infobox image File:DALL-E sample.png claims to be MIT-licensed from https://openai.com/blog/dall-e/. I can't find the image there. The Commons page is also lacking author information.

A Commons file used on this page or its Wikidata item has been nominated for deletion

The following Wikimedia Commons file used on this page or its Wikidata item has been nominated for deletion:

Participate in the deletion discussion at the nomination page. —Community Tech bot (talk) 14:37, 6 May 2022 (UTC)

What to do with DALL-E 2

Last month, OpenAI released DALL-E 2, a sequel to the DALL-E program. Should this page cover only DALL-E 1, with a new page created for DALL-E 2, or should the DALL-E page act as an umbrella for all future DALL-E iterations?

Camdoodlebop (talk) 01:38, 24 May 2022 (UTC)

I'm more inclined to agree with the second suggestion, but let's see what other editors have to say. - Munmula (talk), second account of Alumnum 02:42, 24 May 2022 (UTC)
I think the latter is better, as DALLE2 is a successor of the first model. Artem.G (talk) 08:10, 24 May 2022 (UTC)

New sources on Dall-E Mini and related projects

Lizardcreator (talk) 02:24, 15 June 2022 (UTC)

Undue weight

There is undue weight given to open-source models that try to imitate DALL-E (for example, DALL-E Flow wasn't mentioned anywhere as anything notable);

and undue weight given to the "hidden language" developed by the model. This is just too recent to include, as very few reliable sources can be cited. The claim of a "language" is a very strong one, and it's too recent to include in an encyclopedic article right now.

I think both sections should be trimmed, but it should be discussed before. Artem.G (talk) 13:19, 22 June 2022 (UTC)

Article rewrite

I've BOLDly rewritten and reshuffled most of the article, including removing a large amount of WP:SYNTH and miscellaneous other poor organisation, along with drastically slashing the weight of the open source implementations. I'd welcome any comments people have about these changes. BrigadierG (talk) 16:00, 18 July 2022 (UTC)

I'll be honest, I kind of half-assed this article when I wrote it in January 2021, and I certainly haven't been keeping it up to date since then. I think that the open-source implementations are mostly not relevant, and a lot of the stuff that went out was not very good. On the other hand -- I see you're in the middle of rewriting, so I don't know if this stuff is going anywhere or if it's just being removed entirely, but it looks like you removed some stuff about the actual implementation of the model (such as CLIP was trained to predict which caption (out of a "random selection" of 32,768 possible captions) was most appropriate for an image, allowing it to subsequently identify objects in images outside its training set). If anything, the original article wasn't very good because I skimmed over many details of how the model worked (mostly because I hadn't bothered to read the actual paper yet, lol). Anyway, I will shut up for a bit and wait to see where you're going with this before I form a whole opinion about it. jp×g 18:51, 19 July 2022 (UTC)
My first pass was to kill the UNDUE material and improve article structure, my second pass is to bring the article up to date with more detail. BrigadierG (talk) 10:29, 20 July 2022 (UTC)
@BrigadierG: Thanks for taking this on, in particular for cutting out the editorializing and for spotting that faked citation.
However, something seems to have gone wrong with the citations in this "squashing" - unless I'm overlooking something, [1] doesn't make any claims about anatomical diagrams, X-ray images, mathematical proofs, or blueprints. Regards, HaeB (talk) 08:43, 28 July 2022 (UTC)
Excellent spot. I'll be honest, I didn't read that source, the claim seemed to *so obviously* match the source that I didn't bother verifying. My mistake, great job. I get it, I get it, WP:AGF, but by god, are people just inventing things? BrigadierG (talk) 13:39, 28 July 2022 (UTC)
Thanks, I have removed it accordingly. It's probably worth checking various other citations too. Regards, HaeB (talk) 03:35, 31 July 2022 (UTC)

Image examples

If it helps alleviate NOT GALLERY concerns, perhaps we can agree on a few good examples of DALL-E images to be featured independently alongside the prose, rather than a dedicated gallery section?

I don't like content disputes, so I'm happy with a compromise here, but it would be a loss not to represent the product with at least a few samples. I have no preference as to which examples. ASUKITE 14:43, 31 July 2022 (UTC)

We should only be listing images that have something more of note than simply "this is interesting" (along with the rest of WP:ATA). The test for inclusion I think should be as follows:
1. Has the image in question been cited by OpenAI or a WP:RS as displaying a significant capability of DALL-E.
2. Is that significant capability better covered or covered more widely in RS by an image already included in the article?
Note that while artists pages may include a significant number of their works, they are not present in isolation - they show a key part of that artist's life or style. That's what distinguishes artistic commentary from WP:NOTGALLERY. BrigadierG (talk) 15:57, 31 July 2022 (UTC)

Thanks. Art isn't usually my topic of choice. I'll see if I can pick a couple of decent samples at some point, now that the novelty of getting access to the beta has passed somewhat. ASUKITE 19:08, 31 July 2022 (UTC)

I think a small gallery would be really useful for people - there is a discussion of what the model can and cannot do, and showing more than one picture in the infobox would demonstrate the capabilities the model has. Artem.G (talk) 09:57, 2 August 2022 (UTC)
I think a gallery of varying examples would be a great idea. There is a precedent for this as the Japanese page for Stable Diffusion features a gallery of various different styles that the program can generate. Camdoodlebop (talk) 00:49, 11 September 2022 (UTC)
I've given it a second thought and changed my mind: I do not think an image gallery is necessary for this page, so count me in as a no. One example should be enough, I think. Camdoodlebop (talk) 23:10, 11 September 2022 (UTC)

No mention of Raven's Matrices

There is no mention of Raven's Matrices in the referenced article (https://en.wikipedia.org/wiki/DALL-E#cite_note-dale-25) If someone could find a better reference please do. AcuteTriceratops (talk) 01:28, 12 August 2022 (UTC)

The source links to DALL-E's blog post, which explicitly mentions Raven's Matrices. I've added that as a source and removed the dubious tag. BrigadierG (talk) 20:44, 13 August 2022 (UTC)

How exactly is DALL-E an "implementation of GPT-3"?

The article's explanation of how DALL-E works currently begins by dwelling on GPT, and then states that

DALL-E's model is a multimodal implementation of GPT-3[1] with 12 billion parameters[2] which "swaps text for pixels", trained on text-image pairs from the Internet.[3]

References

  1. ^ Tamkin, Alex; Brundage, Miles; Clark, Jack; Ganguli, Deep (2021). "Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models". arXiv:2102.02503 [cs.CL].
  2. ^ Johnson, Khari (5 January 2021). "OpenAI debuts DALL-E for generating images from text". VentureBeat. Archived from the original on 5 January 2021. Retrieved 5 January 2021.
  3. ^ Heaven, Will Douglas (5 January 2021). "This avocado armchair could be the future of AI". MIT Technology Review. Retrieved 5 January 2021.

Even granting some simplification for a general audience, this sentence seems highly misleading:

1. The "swaps text for pixels" part is misrepresenting the cited source, where "by swapping text for pixels" links to OpenAI's post about "Image GPT", a different model which is merely mentioned as a motivating idea for DALL-E. Image GPT did indeed simply swap text for pixels in a sense:

"Transformer models like BERT and GPT-2 are domain agnostic, meaning that they can be directly applied to 1-D sequences of any form. [...] we train GPT-2 on images unrolled into long sequences of pixels, which we call iGPT"

(Note by the way that that was based on GPT-2, not GPT-3.) But as even OpenAI's initial DALL-E announcement mentioned, far from using that simplistic approach of feeding pixel values directly into the transformer, DALL-E involved a much more sophisticated representation of the image, which presumably was an essential part of its success and was not trivial to construct:

"Similar to VQVAE, each image is compressed to a 32x32 grid of discrete latent codes using a discrete VAE that we pretrained using a continuous relaxation. We found that training using the relaxation obviates the need for an explicit codebook, EMA loss, or tricks like dead code revival, and can scale up to large vocabulary sizes."

(This is also the model part which they released as the "DALL-E" PyTorch package.)
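To make the contrast with Image GPT concrete, the representation quoted above can be sketched in Python. The token counts and vocabulary sizes (at most 256 BPE text tokens from a 16,384-entry vocabulary; 32×32 = 1,024 image tokens from an 8,192-entry codebook) are from the "Zero-Shot Text-to-Image Generation" paper, but `dvae_encode_stub` is purely illustrative: it stands in for the pretrained discrete VAE, which in reality is a trained convolutional network, not a pixel-bucketing hash.

```python
IMAGE_GRID = 32        # dVAE compresses each image to a 32x32 grid of codes
IMAGE_VOCAB = 8192     # size of the discrete latent codebook
TEXT_VOCAB = 16384     # BPE vocabulary for the caption
MAX_TEXT = 256         # at most 256 text tokens precede the image tokens

def dvae_encode_stub(image_pixels):
    """Illustrative stand-in for the pretrained discrete VAE encoder:
    maps an image to 32*32 = 1024 codebook indices (NOT raw pixels)."""
    flat = [p for row in image_pixels for p in row]
    step = max(1, len(flat) // (IMAGE_GRID * IMAGE_GRID))
    # toy "encoder": bucket pixel sums into the codebook range
    return [sum(flat[i * step:(i + 1) * step]) % IMAGE_VOCAB
            for i in range(IMAGE_GRID * IMAGE_GRID)]

def build_sequence(text_tokens, image_codes):
    """The transformer autoregressively models one stream: text tokens
    followed by image-code tokens, offset past the text vocabulary."""
    assert len(text_tokens) <= MAX_TEXT
    assert len(image_codes) == IMAGE_GRID * IMAGE_GRID
    return list(text_tokens) + [TEXT_VOCAB + c for c in image_codes]

# A 256x256 grayscale "image" and a short (tokenized) caption:
image = [[(x + y) % 256 for x in range(256)] for y in range(256)]
codes = dvae_encode_stub(image)
seq = build_sequence([5, 17, 99], codes)
print(len(codes), len(seq))  # 1024 1027
```

The point of the sketch is the difference HaeB describes: Image GPT unrolled images into long pixel sequences, whereas DALL-E's transformer consumes learned codebook indices produced by a separately pretrained discrete VAE.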

2. The other two cited refs do indeed say that DALL-E "is a multimodal version of GPT-3" (a footnote in the "Understanding ..." preprint which looks like a late addition) or "uses a 12-billion parameter version of GPT-3" (Venturebeat). But these appear to be merely based on the first sentence from OpenAI's initial announcement from January 5, 2021. Such public-facing blog posts and press releases are rarely the most reliable sources about research results, especially compared to published academic papers. ML YouTuber Yannic Kilcher already gently mocked that claim in a January 6 reaction video based on what was known at that point ("they say it's a 12 billion parameter version of GPT-3 ...you know, it's more like not GPT-3, that was more than 10 times larger"). And the actual DALL-E paper ("Zero-Shot Text-to-Image Generation", which came out more than seven weeks after OpenAI's announcement post and was retroactively linked in it) does not cite the GPT-3 paper ("Language Models are Few-Shot Learners"). Actually, it doesn't seem to contain the term "GPT" at all. That would be extremely unusual, to say the least, if DALL-E was really "a multimodal implementation of GPT-3".

3. Lastly, the sentence is also misleading in the sense that the CLIP model (for selecting the best outputs of the VAE+transformer model) appears to be an essential part of what has been announced and discussed as DALL-E (and hence already makes up a large part of the "Technology" section of this article, as it should).

This article still receives thousands of pageviews per day and our readers really deserve better. I may try to fix and expand this section myself a bit later. (It also only has a single sentence on how DALL-E 2 works; even though there are presumably important differences considering that it is based on a diffusion model.) But I'm not an expert on this topic and have only started to read more about it, so others who are familiar with the matter should feel free to jump in. Regards, HaeB (talk) 10:39, 27 September 2022 (UTC)
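The generate-then-filter role of CLIP described in point 3 can be sketched as follows. `clip_score_stub` is a hypothetical stand-in for the real CLIP model (which embeds the caption and each candidate image and compares them by cosine similarity); the 512-samples / best-32 figures are from OpenAI's announcement post.

```python
import random

def clip_score_stub(prompt, image):
    """Illustrative stand-in for CLIP: returns a deterministic pseudo-random
    text-image "similarity" score in [0, 1)."""
    return random.Random(hash((prompt, image))).random()

def rerank_with_clip(prompt, candidates, top_k=32):
    """DALL-E-style pipeline: the generator proposes many samples for a
    prompt, and CLIP keeps only the best-matching ones for display."""
    return sorted(candidates,
                  key=lambda img: clip_score_stub(prompt, img),
                  reverse=True)[:top_k]

# OpenAI's announcement describes showing the best 32 of 512 samples:
samples = [f"sample_{i}" for i in range(512)]
best = rerank_with_clip("an armchair in the shape of an avocado", samples)
print(len(best))  # 32
```

This is why describing DALL-E without CLIP is incomplete: the curated outputs the public saw were the product of the generator and this reranking filter together.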

Are resulting images suitable for uploading to the commons?

If an image is generated by DALL E 2, based on a prose description provided by a Wikipedian, is that resulting image eligible to be uploaded to Wikimedia commons or is there some copyright issues preventing its use? Thanks! Lbeaumont (talk) 20:15, 22 October 2022 (UTC)

Yes, Commons has a category at https://commons.wikimedia.org/wiki/Category:DALL-E and seems to consider DALL-E output to fall under their {{PD-algorithm}} template. --Belbury (talk) 22:22, 22 October 2022 (UTC)
One point to note is that the PD-algorithm license does require the image to have been produced in the United States. Other countries, like the United Kingdom for example, do not have the same AI copyright laws. ––FormalDude (talk) 00:04, 23 October 2022 (UTC)
@Lbeaumont: See in particular the previous deletion discussions listed at c:Category_talk:DALL-E#Relevant deletion discussions. A deletion on copyright grounds was twice rejected (without country-specific considerations), but some images have been deleted as being unused/out of scope for Commons. Regards, HaeB (talk) 06:57, 25 October 2022 (UTC)

The attribution of an anthropomorphic feature to the software.

The article presents the software as a being rather than a tool.

The vocabulary shows this: "An image generated by DALL-E 2"...

I propose, for example, to rewrite as: "An image generated with DALL-E 2"

This distinction is important because the status given to this type of software can harm the image that humans have of themselves, and can contribute to the development of sectarian currents devoted to the adoration of these technologies. (example: https://www.wired.com/story/anthony-levandowski-artificial-intelligence-religion/) DDiederichsen (talk) 21:16, 30 December 2022 (UTC)

I don't know about sectarian currents of adoration, but the OpenAI website itself seems to have - possibly very recently? - switched from crediting images as "generated by" to "created with".
Moving towards considering these images to be created by humans using DALL-E as a tool may be at odds with Wikimedia Commons' view that all such images are "in the public domain because, as the work of a computer algorithm or artificial intelligence, it has no human author in whom copyright is vested". Belbury (talk) 15:18, 3 January 2023 (UTC)
It seems like pretty normal English to say that a thing has been done "by" a machine.
Off the top of my head I tried Pillars of Creation, and there are many images that are described as having been created "by" a machine.
I'm not sure what a "sectarian current of adoration" is, but does it also apply to telescopes? ApLundell (talk) 15:37, 3 January 2023 (UTC)
too complicated Wik1234569 (talk) 16:07, 13 March 2023 (UTC)

A new model of DALL-E is being released eventually

I don't have access, but some people who had early access to DALL-E 2 do, and I saw a YouTube video about it.


I think the old one looks better, but when this gets officially released to the public, would it get added? Wik1234569 (talk) 04:31, 9 March 2023 (UTC)