The European Parliament has approved the AI Act (voted version here), setting in motion its publication in the Official Journal in the coming months (pending further edits). While full implementation will take place between 2025 and 2026, we will start seeing its effects sooner rather than later as AI companies and other stakeholders put implementation strategies in place. The AI Act sets out a comprehensive regime that regulates some forms of artificial intelligence, although some argue that it has been watered down and filled with exceptions that will make it mostly useless. But what about copyright?

The first thing worth noting is that the AI Act was not initially intended to deal with copyright issues; it was conceived as a regulation classifying various artificial intelligence technologies into several risk categories, ranging from unacceptable risk to minimal or no risk. However, the rise of generative AI during 2022, and particularly the deployment of LLMs such as ChatGPT, pushed the drafters into at least trying to address it. As copyright was included at a relatively late stage of the negotiations, the resulting provisions can be considered relatively weak, and they certainly do not meet the calls for more stringent changes to copyright law.

Generative AI is not fully defined as such in the Act, but it is specified that models capable of generating content such as text and images fall under the category of General Purpose AI models (GPAI). These are defined in Art 3(63) as “an AI model, including where such an AI model is trained with a large amount of data using self-supervision at scale, that displays significant generality and is capable of competently performing a wide range of distinct tasks regardless of the way the model is placed on the market and that can be integrated into a variety of downstream systems or applications”. So far so broad, but what matters is that the Act imposes specific obligations on GPAI providers, as well as further requirements for GPAIs with systemic risk. Systemic risk is assessed through appropriate technical analysis and methodologies, but there is also a presumption of systemic risk if the cumulative compute used to train the model, measured in floating point operations (FLOPs), is greater than 10^25. That is a lot of computing power.
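To get a feel for the scale of that threshold, here is a minimal sketch of the presumption check. The “6 × parameters × training tokens” rule of thumb for transformer training compute is an assumption borrowed from the scaling-law literature, not something the Act itself prescribes, and the model sizes below are hypothetical.

```python
# Illustrative sketch of the 10^25 FLOPs systemic-risk presumption.
# The 6*N*D compute estimate is a common heuristic, not part of the Act.

SYSTEMIC_RISK_THRESHOLD = 1e25  # cumulative training compute, in FLOPs

def estimated_training_flops(parameters: float, training_tokens: float) -> float:
    """Rough estimate of total training compute for a transformer model."""
    return 6 * parameters * training_tokens

def presumed_systemic_risk(parameters: float, training_tokens: float) -> bool:
    """True if estimated training compute meets or exceeds the threshold."""
    return estimated_training_flops(parameters, training_tokens) >= SYSTEMIC_RISK_THRESHOLD

# A hypothetical 1-trillion-parameter model trained on 10 trillion tokens:
# 6 * 1e12 * 1e13 = 6e25 FLOPs, above the threshold.
print(presumed_systemic_risk(1e12, 1e13))   # True
# A 7-billion-parameter model trained on 2 trillion tokens: 8.4e22 FLOPs.
print(presumed_systemic_risk(7e9, 2e12))    # False
```

Under this heuristic, only the very largest frontier models would trip the presumption, which seems to be the point of the provision.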

The main provision on copyright for GPAI models can be found in Art 53, under the obligations for providers of GPAI models. This imposes transparency obligations that include the following:

  a) Draw up and keep up-to-date technical documentation about the model’s training. This should include, amongst others, its purpose, the computational power it consumes, and details about the data used in training.
  b) Draw up and keep up-to-date technical documentation for providers adopting the model. This documentation should enable providers to comprehend the model’s limitations while respecting trade secrets and other intellectual property rights. It can encompass a range of technical data, including the model’s interaction with hardware and software not included in the model itself.
  c) “put in place a policy to respect Union copyright law in particular to identify and respect, including through state of the art technologies, the reservations of rights expressed pursuant to Article 4(3) of Directive (EU) 2019/790”.
  d) “draw up and make publicly available a sufficiently detailed summary about the content used for training of the general-purpose AI model, according to a template provided by the AI Office”.

Paragraphs (c) and (d) are of particular interest for copyright purposes. Paragraph (c) is derived from Article 4 of the DSM Directive, which establishes a copyright exception for text and data mining (TDM) for purposes other than the scientific research covered by Article 3 of the same directive. This exception is conditional on respecting any opt-outs specified by copyright holders. The provision is particularly noteworthy because it significantly clarifies the scope of Article 4. Firstly, the requirement to establish policies that respect copyright essentially serves as a reminder to abide by existing law. More crucially, however, providers are mandated to implement technologies enabling them to honour copyright holders’ opt-outs; Article 4 thereby gains a framework for using technological tools to manage opt-outs and rights reservations, which is good news for the providers of such technologies. Additionally, it now appears unequivocally clear that the TDM exceptions in the DSM Directive cover AI training, as the AI Act specifies. Clarification was needed because there were doubts that TDM covered AI training, but its inclusion in a legal framework specifically addressing AI training suggests that the TDM exception does indeed cover it. Moreover, recital 105 lays all doubts to rest when it explains the reach of text and data mining:

“The development and training of such models require access to vast amounts of text, images, videos, and other data. Text and data mining techniques may be used extensively in this context for the retrieval and analysis of such content, which may be protected by copyright and related rights.”
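The opt-out side of paragraph (c) is ultimately a technical problem: a crawler gathering training data has to detect a rights reservation before ingesting a page. One candidate “state of the art” signal is the `tdm-reservation` HTTP header from the W3C TDM Reservation Protocol (TDMRep) community draft; the Act does not mandate any particular protocol, so the sketch below is just one possible mechanism among several.

```python
# Minimal sketch of honouring an Article 4(3) DSM rights reservation
# before using a web resource for training. Checks the "tdm-reservation"
# header from the TDMRep community draft; this is an assumed mechanism,
# not one prescribed by the AI Act.

def tdm_opt_out(response_headers: dict) -> bool:
    """Return True if the publisher has reserved TDM rights for this resource.

    Header names are assumed to be lower-cased already, as most HTTP
    client libraries normalise them.
    """
    value = response_headers.get("tdm-reservation", "0").strip()
    return value == "1"

# Hypothetical responses:
print(tdm_opt_out({"tdm-reservation": "1"}))        # True  -> skip this content
print(tdm_opt_out({"content-type": "text/html"}))   # False -> no reservation signalled
```

In practice a compliant pipeline would combine several signals (HTTP headers, robots.txt-style files, embedded metadata), and record the decision for the Art 53 documentation duties.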

Paragraph (d) is also interesting, as GPAI models will have to provide a detailed summary of the content used to train them. The scope of this summary will be set by templates issued by the new AI Office, but recital 107 explains what it implies:

“In order to increase transparency on the data that is used in the pre-training and training of general-purpose AI models, including text and data protected by copyright law, it is adequate that providers of such models draw up and make publicly available a sufficiently detailed summary of the content used for training the general-purpose model. While taking into due account the need to protect trade secrets and confidential business information, this summary should be generally comprehensive in its scope instead of technically detailed to facilitate parties with legitimate interests, including copyright holders, to exercise and enforce their rights under Union law, for example by listing the main data collections or sets that went into training the model, such as large private or public databases or data archives, and by providing a narrative explanation about other data sources used. It is appropriate for the AI Office to provide a template for the summary, which should be simple, effective, and allow the provider to provide the required summary in narrative form.” [Highlight mine].

This provision is poised to spark controversy but stands as the most potent copyright-related clause in the Act. Copyright holders are likely to welcome it, whereas GPAI providers may harbour concerns. A recurring theme in ongoing copyright infringement cases has been plaintiffs’ reliance on training content disclosures: those who have disclosed their training data have tended to be on the receiving end of lawsuits. However, I believe this might not necessarily be a drawback; transparency could prove beneficial, particularly in light of the stipulations in paragraph (c).

Furthermore, an essential aspect of the GPAI transparency requirement is the exception for open source models. These are defined as models “made available to the public under a license that permits free access, use, modification, and distribution, and requires public disclosure of the model’s parameters, including weights, architectural details, and usage information”. However, this exemption does not apply to GPAIs posing systemic risks. Consequently, there is an incentive to release models as open source, but merely labelling a model as “open source” is insufficient; it must be distributed under an actual FOSS license. This raises a question for open source enthusiasts: does “FOSS license” refer to any of the numerous self-styled FOSS licenses, or is it more narrowly defined, perhaps aligning with licenses approved by the OSI?

Another exception is for models that have been trained for non-professional or scientific research purposes, in accordance with Art 2(6), as explained in recital 104. This is in line with the existing copyright provisions in Art 3 DSM, but it may open the door to commercial providers relying on models trained by research institutions. The wording of recital 107, however, suggests that a change in a model’s purpose would eliminate this exception, though the wording is a bit vague, perhaps on purpose.

Finally, while not directly related to copyright, Art 50 contains a requirement for some AI-generated content to be labelled in a machine-readable format as having been generated by a computer. This requirement applies to content that is considered a deep fake, that is, “AI-generated or manipulated image, audio or video content that resembles existing persons, objects, places or other entities or events and would falsely appear to a person to be authentic or truthful”. There is, however, an exception for parody or humour. The importance of this provision for copyright is that it creates a way of identifying content that has been AI generated, which could affect its copyrightability in jurisdictions that restrict AI authorship due to human-author requirements.
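The Act requires the marking to be machine-readable but leaves the format open. As a minimal sketch, a provider could emit a JSON provenance record alongside each output; the field names here are illustrative assumptions, not anything prescribed by the Act or by an existing standard such as C2PA.

```python
import json

# Illustrative sketch of a machine-readable "AI generated" label under
# Art 50. All field names are hypothetical; the Act specifies no schema.

def provenance_record(model_name: str, is_deep_fake: bool) -> str:
    """Serialise a simple provenance label for a piece of generated content."""
    record = {
        "ai_generated": True,          # the core Art 50 disclosure
        "generator": model_name,       # which model produced the content
        "deep_fake": is_deep_fake,     # flags content subject to the deep-fake duty
    }
    return json.dumps(record)

label = provenance_record("hypothetical-gpai-v1", is_deep_fake=False)
print(label)
```

A sidecar file like this is trivially stripped, of course, which is why embedded approaches such as watermarking and signed manifests are the more likely direction for compliance tooling.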

Concluding

While the AI Act already has many critics, I think that it is a step in the right direction, and it is likely to launch a compliance exercise around Europe the likes of which we haven’t seen since the GDPR. While the transparency obligations are relatively mild, they provide copyright holders with further tools to uncover potential infringement, and at the same time give GPAI providers a bit more certainty that they will not be sued out of existence if they comply with the rules. The EU is trying to balance the rights of copyright holders and the interests of AI developers. We’ll see how the balancing act works.

I did enjoy reading the AI Act though; I didn’t even need to ask ChatGPT to summarise it for me. I also really like the acronym GPAI, it reminds me of “yippee-ki-yay”. It always comes down to Europe v US, doesn’t it?

[Edit: Made some corrections to reflect the correct numbering and version, thanks to Paul Keller for the heads-up]


6 Comments

Klaus Krebs · May 23, 2024 at 8:24 am

Your blog post discusses the European Parliament’s approval of the AI Act, which will soon be published and gradually implemented between 2025 and 2026. The Act introduces a comprehensive regulatory framework for AI technologies, including provisions addressing copyright issues related to generative AI. Specifically, Article 53 imposes transparency obligations on providers of General Purpose AI models (GPAIs), requiring them to maintain detailed documentation and respect Union copyright laws.
Additionally, GPAI models must provide a summary of the training content used, which aims to enhance transparency and help copyright holders enforce their rights.

Given these new regulations, how do you anticipate AI developers will balance compliance with the AI Act and innovation in AI technology? Any ideas you may share are welcome -thanks!

Mesterséges intelligencia (MI) hírek, információk – 2024. március 18. | Magyary Zoltán E-közigazgatástudományi Egyesület · March 18, 2024 at 4:43 am

[Translated from Hungarian:] […] among the interests of AI system developers. We’ll see whether the balancing act succeeds. The EU AI Act and copyright; Andres Guadamuz; TechnoLlama; March 2024 […]

Systemic Risk and Copyright in the EU AI Act - Truth on the Market · March 19, 2024 at 7:14 pm

[…] copyright-related obligations on GPAI providers. As Andres Guadamuz of the University of Sussex notes in his […]

Copyright and copywriting: AI challenges - Internet for Lawyers Newsletter · April 9, 2024 at 9:00 am

[…] The EU AI Act and copyright – TechnoLlama […]

Copyright Dispute of AI: a Creator’s Muse or a Stealthy Robber? – Digital Policy and Governance Blog · April 14, 2024 at 7:28 am

[…] Guadamuz, A. (2024, March 19). The EU AI Act and copyright. TechnoLlama. https://www.technollama.co.uk/the-eu-ai-act-and-copyright […]
