In January 2021, OpenAI — the AI research outfit co-founded by Elon Musk and financially backed by Microsoft — unveiled its most ambitious project to date, the DALL-E machine learning system. This ingenious multimodal AI was capable of generating images (albeit rather cartoonish ones) based on the attributes described by a user — think “a cat made of sushi” or “an x-ray of a capybara sitting in a forest.” On Wednesday, OpenAI unveiled DALL-E’s next iteration, which boasts higher resolution and lower latency than the original.
The first DALL-E (a portmanteau of “Dalí,” as in the artist, and “WALL-E,” as in the animated Pixar character) could generate images as well as combine multiple images into a collage, provide varying angles of perspective, and even infer elements of an image — such as shadowing effects — from the written description.
“Unlike a 3D rendering engine, whose inputs must be specified unambiguously and in complete detail, DALL·E is often able to ‘fill in the blanks’ when the caption implies that the image must contain a certain detail that is not explicitly stated,” the OpenAI team wrote in 2021.