AI won’t take over the (art) world

An aged photograph of a snail playing the kazoo. A landscape painting of a giant robot destroying Los Angeles painted by Van Gogh. A Minecraft rendering of a guy riding a capybara. If you’ve been spending time on the Internet lately, you may have seen scenes like these depicted in shockingly realistic, algorithmically generated “art,” the handiwork of a multi-billion-parameter AI model aptly named DALL-E. Simply provide the machine with a description, and it will respond with a high-resolution image of whatever you dream up.

Image generated from the caption "Teddy bears mixing sparkling chemicals as mad scientists as a 1990s Saturday morning cartoon."

So says OpenAI, the Microsoft-backed company behind the curtain of this creative entity, which is now in version 2.0. OpenAI plans to release the model as a tool for artists to quickly mock up their own ideas and experiment with different variations of their vision. However, like many recent advances in AI, DALL-E 2 has been met with criticism and concern. The model seems to replicate stereotypes and biases it learned from the data it was trained on: it shows users male “doctors” and female “nurses,” heterosexual “weddings,” and Western “building interiors” by default. Others have highlighted the tool’s potential to be abused to generate “fake news” and accelerate the downfall of democracy. Some have gone so far as to accuse DALL-E of being too smart: so effective and so creative that it will soon render artists—and art itself—obsolete.

Some of these concerns should be taken seriously. But we must be mindful of which ones we focus our collective attention on, lest we play into Big Tech’s hands. For even critical narratives can feed—and feed on—the AI hype machine that keeps them in business. And this hype machine is full of false promises of human-level intelligence.

Machine learning, the data-driven force behind the last decade’s colossal advances in AI, is not magic. As its name suggests, the underlying principle of this technology is that we can “teach” an algorithm to predict the future by exposing it to the past. With the explosion in data and processing power available to Silicon Valley’s most powerful players, modern algorithms can get a crash course in a wide range of problems at a massive scale; DALL-E was forced to “study” over 650 million images as part of its training.
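To make "predicting the future through exposure to the past" concrete, here is a deliberately tiny sketch. It is not how DALL-E is trained; the data, the function name `fit_line`, and the step size are all invented for illustration. The "learning" amounts to nudging two numbers until the error on past examples shrinks.

```python
# Toy illustration only: the data and all names here are hypothetical.

def fit_line(xs, ys, steps=5000, lr=0.01):
    """Fit y ≈ w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Move each parameter a small step in the direction that reduces error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# "The past": observations that happen to follow y = 2x + 1.
past_x = [1.0, 2.0, 3.0, 4.0, 5.0]
past_y = [3.0, 5.0, 7.0, 9.0, 11.0]

w, b = fit_line(past_x, past_y)

# "The future": a prediction for an input the algorithm never saw.
prediction = w * 6.0 + b
```

The prediction for an input of 6 lands near 13 because the pattern in the past data continues. Nothing in the procedure understands what the numbers mean, and a person chose the data, the form of the function, and the measure of error.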

This anthropomorphic vision of AI, however, lends itself to exaggeration, and it is ultimately an imprecise metaphor. So-called “neural networks,” the high-dimensional functions that have made such good use of the computing resources Big Tech has given them, are not a real model of how our own nervous systems work. They are not sentient, and they are not autonomous: the humans building the technology are the real brains behind the operation. Engineers and researchers—real people—provide the tools the machine has at its disposal to build its solution, control the flow of curated information that allows it to self-evaluate and improve, and hand-pick the designs which perform the best according to their own judgment.

DALL-E 2, for example, is a particular Frankenstein of algorithmic parts that each do something precise and were chosen for specific reasons. One part is a function that encodes the user’s query into a numerical representation in a space shared by images and captions; this part is built using Contrastive Language-Image Pre-training (CLIP), a method which has been perfected and fine-tuned by many different researchers in recent years. The other two parts are also translation functions, but they belong to a family of efficient statistical functions called diffusion models, optimized jointly so that they play well together. Each optimization procedure is conducted and evaluated with data chosen specifically for this task.

Diagram explaining the different components of DALL-E 2's architecture
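The contrastive pre-training mentioned above can be sketched in a few lines. This is a minimal illustration of the general idea (score every image against every caption, then reward the correct pairings), not OpenAI's code. The two-dimensional vectors are made up by hand, whereas real systems learn them with large neural networks, and the `temperature` value is an assumption.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy(logits, target):
    """Negative log-probability of the target index under a softmax over logits."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[target]

def contrastive_loss(image_vecs, text_vecs, temperature=0.1):
    """Symmetric contrastive loss: image i should match caption i, and vice versa."""
    imgs = [normalize(v) for v in image_vecs]
    txts = [normalize(v) for v in text_vecs]
    n = len(imgs)
    # sims[i][j] is the scaled cosine similarity between image i and caption j.
    sims = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
             for j in range(n)] for i in range(n)]
    # Each image picks its caption (rows), each caption picks its image (columns).
    image_to_text = sum(cross_entropy(sims[i], i) for i in range(n)) / n
    text_to_image = sum(cross_entropy([sims[i][j] for i in range(n)], j)
                        for j in range(n)) / n
    return (image_to_text + text_to_image) / 2

# Hand-made embeddings: in the first call each image lines up with its caption,
# in the second the captions are swapped.
matched = contrastive_loss([[1, 0], [0, 1]], [[1, 0], [0, 1]])
mismatched = contrastive_loss([[1, 0], [0, 1]], [[0, 1], [1, 0]])
```

The loss is close to zero when each image sits next to its own caption and large when the pairs are swapped. Training consists of adjusting the encoders to push that number down across many image-caption pairs that people collected and curated.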

You need not understand any of that for it to illustrate my point: DALL-E 2, like other state-of-the-art models, is a well-oiled machine hand-crafted to do its job (visual translation). It cannot find humor in a surprising juxtaposition, tap into the cultural zeitgeist, or communicate a novel idea—these tasks are, and always will be, best left to real artists. While DALL-E’s technical abilities are startling, art has never (just) been about accuracy—just ask anyone who continued to produce paintings when the camera was invented.

There is still so much scientists don’t know about how human intelligence actually works. For this reason, it’s difficult to prove that a neural network cannot exhibit properties like “creativity” by someone’s definition. But that doesn’t mean we need to worry about the collapse of human cultural institutions. If you want to worry about AI, worry about the actions of those who build and use it.
