In 2011, I discovered my own art on someone else's Tumblr page, posted and reposted thousands of times without any credit. I did not lament that my art had gone on a digital adventure of sorts. I only wished my name could have been a part of it, or perhaps a hyperlink could have been embedded in the images to lead people back to my portfolio. But as I and countless other artists have discovered the hard way, to assume that people will make an effort to ensure artists retain attribution in the chaotic digital landscape of the 21st century is naïve.
The solution to this problem is simple: everything on the internet, including works by artists, should be scraped by data-harvesting programmes. Everything. This assertion is not some anarchic provocation, but rather born of a simple fact: anything that is available as a source of knowledge or inspiration for an individual is now also available to a machine.
Much publicly displayed art has a greater digital footprint than its creator or the public realise. When scraped from an artist’s website or other source, a work of art becomes part of the dataset of an artificial intelligence (AI) programme or algorithm, with the relevant metadata (attribution, material and technical specifications) intact. Left to, say, an overeager blogger who likely has no malevolent intent but whose favoured mode of aesthetic appreciation is to right-click and select “save”, this information is often lost.
“AI image products are trained on vast numbers of copyrighted images without consent, credit or compensation and violate the rights of millions of artists,” states a class action lawsuit ten artists have brought against Stability AI, DeviantArt, Midjourney and Runway. In August, a federal judge in California dismissed some of the artists’ claims and allowed other charges to be added, ultimately clearing the way for the case to proceed.
However, these companies are not engaged in art creation, nor do they have autonomous robots making art and attempting to claim copyright over the resulting works. It is users—people—not AI that can re-generate the works of others, or make something in the specific style of an artist without giving them credit. The lawsuit claims that these companies sell “copyright infringement as a service”, but until the datasets are transformed into output by individual users of generative AI, are there any laws being broken?
Drops in a billion-image bucket
If the plaintiffs’ argument hinges on the non-consensual downloading of images into the dataset, that may very well prove to be illegal. Conversely, a site like Tumblr hosting an artist’s work may not necessarily specify terms of use as explicitly as an artist’s personal website, and thus a user downloading a Tumblr post may be in the clear, legally speaking. But in a situation where a specific work is scraped into a dataset with billions of other images, the argument that individual rights are being violated appears flimsy.
“An AI image product is a software product designed to output images through so-called artificial-intelligence techniques. But ‘artificial intelligence’ is a misnomer,” the artists’ lawsuit begins. “The AI image products at issue in this complaint are all built around the same asset: human intelligence and creative expression, in the form of billions of artworks copied from the internet. An AI image product simply divorces these artworks from the artists and attaches a new price tag. The profits from the misappropriation of these works can then flow directly into defendants’ pockets. But the artists who provided the intelligence and creativity—including plaintiffs—were not asked for their consent. They were not given any credit. And they have not received one cent in compensation.”
This claim betrays the lawsuit’s flaws. The datasets of scraped imagery are so gargantuan in scale (“billions of artworks”) and scraped so indiscriminately that, as far as generic generation goes, that is to say the kinds of prompts where no artist is named, there is a certain kind of vanity in insisting that any artist is deprived of personal remuneration. And even in a world where artists or their estates have their work removed from these datasets, generation that references their work does not necessarily vanish.
If an artist “opts out” of AI participation, there are still enough screenshots, unattributed Tumblr, X and Reddit posts, archived websites and selfies where a work of art is in the background for their work to exist online. In fact, it seems safe to say that virtually every artwork ever displayed within the orbit of a smartphone user or included in a published catalogue will eventually exist on the internet in some form. Consequently, opting in may ultimately be the safer move, if only because it makes tracking one’s digital footprint easier.
‘Make me a Dalí’
Most AI image generators are now capable of understanding the difference between a work by Salvador Dalí and a work by Giorgio de Chirico. If you want an original work in the style of someone specific, they can generate it. This is where things get murky.
The ability to reference specific artists and styles is a crucial part of any creative process. Most artists do it unconsciously, but our output is inevitably shaped by what we observe. Mimicking or attempting to re-create earlier generations’ masterpieces is also a core part of classical art education. But a machine’s ability to mimic Dalí or De Chirico far exceeds that of an individual painter. So then the interesting question becomes: is efficiency a crime, or is there an efficiency threshold for criminality? Would that mean that a painter mimicking an artist is less culpable simply because they are bound by the limitations of human biology?
The artists’ lawsuit notes that Midjourney encourages its users to refer to artists in their prompts for more efficient generations. Prompt-writing is a craft, and a tricky one at that. My idea of a photograph depicting a beautiful Turkish bath in which a handsome young man gazes wistfully off into the distance, and Midjourney’s, are light years apart. If I try to specify that his skin should glisten with sweat, I’m only allowed to have one figure in the generated image. Asking for two men whose skin is glistening with sweat triggers a content warning and a notification that if I insist on attempting to generate adult material, my Midjourney membership will be at risk.
My solution to this conundrum is to strategically reference existing visual culture in ways that I know Midjourney can understand without triggering the content warning mechanism. A successful prompt typically spans a wide range of artists, from Robert Mapplethorpe to Ara Güler, with a splash of Alejandro Jodorowsky to temper the cooler, more sombre tonality the other references might cause. The generated images look nothing like works by these artists, but they are the calculated, intentional mix that accurately represents my inspirational process. The mix is crowded enough to respect each artist’s work and legacy. These visual references offer common ground that transcends language, which is a strategic necessity to get the AI model to understand what I am trying to conjure.
The conundrum we all face is both simple and far-reaching: an AI-generated image is most at risk of violating an artist’s copyright if the prompt is specifically and exclusively referencing that artist. But referencing artists is one of the most practical and universal methods of input when dealing with AI today.
Prompt-based image generation tools will likely evolve, and artists’ desire to retain ownership of their work is understandable. But the war that is being waged in the name of copyright infringement cannot be won by attacking tech giants. It is the users of their products whose ethics ultimately decide whose work is copied, and to what end.