With the release of Grok 2, the latest model available on Twitter, there has been a growing number of images depicting fictional characters, celebrities, and public personalities on the social media platform, sometimes with hilarious and often quite disturbing results. This has prompted a surprising number of people to ask whether Twitter is about to get sued for copyright infringement. Leaving aside the strange phenomenon of people cheering on behalf of Disney and Nintendo (as one Twitter user remarked, “the people yearn for the boot”), this is a very interesting conversation and an intriguing question. Now that we’ve had so many lawsuits against generative AI companies, where are the large media conglomerates? When is Disney going to sue? I think that the answer to that question is likely to be “never”, for legal, technical, and practical reasons.
Inputs, outputs, and reproduction
The growing number of images reproducing characters and people is the result of the prevalence of those characters in the training data. We like to divide the copyright analysis into two phases, the input phase and the output phase; the reality is more complicated than that, but for now this distinction will suffice. The input phase encompasses the entire process of training a model, which is achieved by extracting information from large amounts of data; this implies that at some point a copy of a work is made. This is at the heart of most of the ongoing lawsuits against generative AI companies, and that is a different analysis, as there is undoubtedly a copy being made in training. The question will rest on whether this falls under fair use or fair dealing. The output phase is when a user utilises a model to create an image that could be considered to infringe copyright. Popular characters are more likely to be reproduced as an output; this is what Matthew Sag has called the “Snoopy problem”, and what James Grimmelmann and Timothy Lee call the “Italian plumber problem”.
There’s undoubtedly a reproduction taking place in the input phase, but what about the outputs? The obvious answer immediately seems to be a resounding “yes”, and that is what prompts tweets like the ones linked to above. Producing a picture of Mario or Pikachu is copyright infringement, surely!
Leaving aside the input question, what happens with machine learning is that a model is trained by extracting data from the inputs. These are not copies of the originals as such, so outputs are not collages, but this may be irrelevant. A model doesn’t have a copy of every cat, but it learns what a cat looks like. The same happens with some of the most prevalent characters online; a model doesn’t need to keep a copy of Pikachu to know what it looks like, and it can make a good reproduction of it on demand, down to the rosy cheeks and cute smile. So the output may not be a direct reproduction, but it could be a reproduction nonetheless.
A reproduction need not be exact under copyright law, but it has to be substantial. So it may not matter that the model doesn’t keep copies of a work; if it can make a substantial reproduction of the work, it may still be considered to be a copy from a copyright perspective. Models can memorise some items in the training data, particularly popular ones as pointed out before. It doesn’t matter how a generative AI tool knows what Mario looks like; it can generate pretty good reproductions of Mario.
Case closed then, right? It’s copyright infringement… but where are all of the lawsuits from the media companies? What are they waiting for? It’s been possible to reproduce characters such as Superman, Mario, Pikachu, and Snoopy for over a year. Heck, I even included some images in an article which is now over a year old!
Where are all the lawsuits?
As I mentioned above, I think that there are various reasons why there hasn’t been a lawsuit from the media, and I’m going to stick my neck out and predict that there are not likely to be any. I’ll be happy to be proved wrong here, but I sense that there are good reasons for the absence of DC and Disney lawsuits.
I think that the popularity of the characters gives us the first clue as to why no media company has sued a generative AI provider for character infringement: most of the training data comes from the Internet, and not directly from the companies’ own sites and databases. There’s a good reason why AI tools are so good at reproducing some characters, and it is their prevalence in popular culture and social media; their memetic appeal means they are reproduced all over the Web. The Internet is a huge infringement machine in which we share images all the time. If you search for “Mario and Luigi” on Google Images, you’ll find thousands of sites, some of them official, but for the most part they are just users sharing stuff. I can imagine a lawsuit from, say, Nintendo against a generative AI tool progressing to the stage of finding out where the training data comes from. You are likely to find that most of the images come from Pinterest, Etsy, Facebook, Twitter, DeviantArt, and Reddit, in which case the question would arise as to why these copies have never been actioned by the claimants. The answer is simple: you don’t sue your fans. But this could also make it tricky to determine what damage has actually been done to the media giants in the first place.
The reality is that these works are easily reproduced because of their popularity, and their popularity is the reason why they’re prevalent in the training data, thus generating a vicious circle of infringement. Being popular is not a defence against copyright infringement, but it is a reason why these brands tend not to sue everyone sharing their characters. This is what I’d like to call the Pikachu Paradox to complement Snoopy and Mario (sorry to James, Tim, and Matt). I would like to propose that the more popular a character is, the more likely it is to be in the training data from a variety of sources, and so the less likely it is that it will be actioned by the copyright holder. Only time will tell if this holds.
Which brings me to the second reason why I don’t think a lawsuit is forthcoming, and that is precisely because the output infringers are the same users who are responsible for the prevalence of the works in the training data. So for the image I generated above, I typed “Pikachu and an Italian plumber having a beer” (in honour of Matthew Sag, I also produced a Snoopy image). I tricked the generator into producing an infringing image. So for sure, the model may be infringing in the input phase, something that is being determined by the courts as we speak, but the specific output which may be infringing copyright was generated by my instructions, sometimes circumventing copyright guardrails put in place by the generative AI tools. In that case, the direct infringer would be me, and the tools would potentially be liable as secondary infringers. That in itself creates a whole host of new legal issues; I’ve written about this already, as there may be specific defences that could be used by the AI tools in those instances. Long story short, tools are often less likely to get sued for infringement committed by their users.
The third reason is a legal one: some outputs may actually be infringing, but they may fall under fair use or fair dealing. Take the Pikachu and Mario image above: I would bet a large amount of money that it falls under parody/pastiche in the UK, and while this is a relatively new area of law, to me it’s evident that as an output it would not be actionable (in this case I think I’m also covered by fair dealing for reporting current events, but I digress). Similarly, I think such images would fall under transformative fair use in the US, and maybe even parody in many situations. And we come back to the fact that many of these outputs are generated by users for non-commercial purposes, which makes it very difficult for them to be litigated.
This brings me to the fourth reason why I do not think that we are likely to see a lawsuit, and it is purely practical. Copyright lawsuits are expensive, and while some of the cases may seem like a slam dunk, we are talking about lawsuits spanning years and years, often against battle-tested legal teams that have already been party to copyright disputes in previous Internet ages. Media companies such as Disney and Nintendo are not in any existential danger (at least for now), unlike printed media such as the New York Times. For them, a lengthy and expensive copyright battle may simply not be worthwhile.
And the final reason why I think that there are no lawsuits forthcoming is that these media companies are in no way opposed to generative AI, and may actually welcome it in the future as a cost-cutting mechanism. I’d be surprised if generative AI isn’t already being used all over the media, and if it isn’t, it won’t be long until it is, so it may not be in the best interest of the Disneys of this world to try to destroy potential future business partners. At some point, licensing agreements and partnerships will prevail, and we may start seeing more deals being made. Trying to squeeze some money out of uncertain litigation may not be in your best business interest if you’re a media company that may want to use the technology in the near future.
Concluding
As always, I may be completely wrong, and Disney and Nintendo are firing up the lawsuits as we speak. If that happens, I will leave this blog post up and unedited as a testament to my folly. But I suspect that I may be right here, as evidenced by the fact that generative AI tools have not yet been sued by the likes of Disney.
In the meantime, I wonder what Pikachu riding a llama looks like. There’s only one way to find out!
4 Comments
Andrew Ducker · August 22, 2024 at 10:06 am
From Disney’s point of view, I don’t think that “A thing that lets you make pictures of their characters” is something they consider a threat to their business model. If people started selling something that actually competed with them then that might be different.
Andres Guadamuz · August 23, 2024 at 8:21 am
Absolutely.
Lukas Ruthes Gonçalves · September 27, 2024 at 2:59 pm
Professor, I recently read this paper by Martin Senftleben (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4872366) and just wanted to know if you have any thoughts on the way Creative Commons licenses should deal with AI training. Since CC 2.0 (https://creativecommons.org/2004/05/25/announcingandexplainingournew20licenses/) they really emphasized the need for attribution, but none of the current models (MAYBE Perplexity) really link the content to its original creator. Do you think there would be any way to solve this puzzle?
Andres Guadamuz · September 28, 2024 at 10:41 am
Hi Lukas,
My opinion on the subject is here. I think that for the most part the licence requirements aren’t needed. https://www.technollama.co.uk/creative-commons-and-ai-training