A new remuneration right could solve legal and ethical challenges of generative AI.
Music creators love technology that helps them to make music. But when these tools rely on the ingestion of copyrighted musical works to function, a robust legislative framework for fair remuneration must exist.
From the invention of the player piano to the dawn of the internet, advances in technology have resulted in the expansion of rights for music creators. Thanks to these necessary “new” rights, music creators today are remunerated for a host of 20th century advances in music production and exploitation.
Now, in the face of the enormous challenge posed by Gen AI, we believe an additional right of remuneration vested in individual human creators is needed. This new right would be instrumental in providing a sustainable future for our creative community, and in preserving our diverse cultures and identities around the world.
GenAI platforms need to operate ethically and keep track of their use of human-created works. A new right of remuneration, vested in flesh-and-blood music creators, would be a major step towards a sustainable and equitable use of this revolutionary technology.
Our white paper, written by Professor Daniel J. Gervais of Vanderbilt Law School and produced in partnership with CIAM, provides a detailed legal analysis of the benefits of a new right of remuneration for Gen AI. It promises to support the sustainability of human authorship of creative works and to solve the legal issues surrounding this exciting new technology.
GenAI Campaign News
The Gen AI White Paper
FTMI joined with CIAM to release a white paper written by Prof. Daniel J. Gervais and titled “The Remuneration of Music Creators for the Use of Their Works by Generative AI”. This document aims to address the challenge of ensuring ongoing compensation for music creators and their industry partners once most of the existing music has been used to train large language models.
- This paper analyses the technology
- It examines how current law applies to Gen AI
- It proposes a possible new right that would equitably address the ethical remuneration issues
The proposed new right is outlined at a fairly high level of generality in order to focus the discussion on its desirability rather than the exact mechanics of its implementation. The text must, unfortunately, delve into a number of complex legal doctrines that readers without legal training may find difficult to follow. However, the legal analysis section ends with a summary box for readers who do not need to know all the details of these doctrines.
Executive Summary
Generative AI (Gen AI) applications challenge humans on the very terrain that has distinguished us from other species for millennia: our ability to create literary and artistic works to communicate new ideas to one another, whether as works of music, art, literature, or journalism.
We urgently need to find a way to avoid irreparable damage to this crucial facet of human existence, a sine qua non for human progress. Creative ability is honed over time by creators who can devote themselves to their craft and learn from experience, which usually means being able to live off the fruits of their labor. The stated aim of the paper is to find a way for creators to retain agency as their life’s work is taken without their consent to create “content” that can compete with them in the marketplace.
The best way for creators to generate a decent stream of ongoing revenue for the use of their copyrighted works by GenAI applications is to be paid when the datasets containing their works that were used to train GenAI are used to create new “content”. This payment should take the form of a license, and for that to happen, there must be a right that can be licensed.
From a legal point of view, the discussion revolves around which rights apply to the training (text and data mining) and to the production of literary and artistic works. In almost all cases, the development of a Large Language Model (LLM) implies the creation of at least one copy of the data that the machine uses for its training. This has several advantages, including increased speed of access and the ability to examine and make changes to the dataset. From a copyright perspective, this implies one or more reproductions. In the case of copyrighted works, this means that the right of reproduction has been infringed unless a license has been obtained or a statutory exception applies.
What is often misunderstood is that this reproduction of the copyrighted work continues to exist in modified form (i.e., a second reproduction occurs) in the dataset created during the training process. This second dataset, which consists of “tokens” created from the material used for training, is the one the LLM uses to produce its outputs.
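To make the notion of “tokens” concrete, the short Python sketch below is an illustration rather than part of the paper: it uses the open-source tiktoken tokenizer published by OpenAI to convert an invented lyric line into the integer tokens a model would actually train on, and then decodes them back into the original text.

```python
# A minimal sketch of tokenization: turning human-readable text into the
# integer "tokens" that a large language model is trained on.
# Assumes the open-source tiktoken package is installed (pip install tiktoken).
import tiktoken

# Load a publicly available encoding used by recent OpenAI models.
enc = tiktoken.get_encoding("cl100k_base")

# An invented lyric line standing in for a copyrighted work in a training set.
lyric = "Hold me close under the neon rain"

# Encoding produces a list of integer token IDs.
tokens = enc.encode(lyric)
print(tokens)

# Decoding the tokens reproduces the original text, which is the sense in which
# the tokenized dataset can be described as a modified reproduction of the work.
print(enc.decode(tokens))
```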
The outputs of an LLM may infringe both the right of reproduction and the right to prepare derivative works, also known as the right of adaptation (and its close cousin, the right of translation). An adaptation includes, for example, a musical arrangement or a film based on a novel. The exact scope of the concept of derivative works in this area is controversial.
Against this background, existing copyright law provides a partial solution for authors and other right holders for four main reasons. First, there are different national exceptions and limitations to copyright rights in relation to text and data mining (TDM), i.e., the “input” or training stage, which delineate what companies producing LLMs can and cannot do without a license. In the United States, where many of the best-known LLMs have been created, there is (and will continue to be for years to come in the absence of a licensing regime) doubt about the scope of fair use in this context.
Second, although the copying that occurs during the training of GenAI systems typically occurs only a few times for each GenAI dataset or LLM model, some major models (such as OpenAI’s) are moving toward the creation of an infrastructure layer, that is, a dataset that can be used by other companies and individual users. This dataset contains, as mentioned above, a complete or partial copy of the material used for training, which implies possible liability for users who make a copy. Nevertheless, the number of copies of copyrighted material used to create the dataset will be limited.
Third, the reproduction right and/or the derivative work right is more easily applied to certain GenAI outputs that are a copy or adaptation of a substantial portion of one or more identifiable pre-existing works in the dataset. If so, only a relatively small percentage of GenAI outputs are likely to infringe the reproduction right, the derivative work right, or both. Fourth, as a matter of copyright law, there is no protection per se for a “style” or “sound” (e.g., a person’s distinctive voice), although statutes and various legal doctrines may provide protection against this form of appropriation.
Despite these legal complexities, there is a deep sense among many authors and performers that the creation of datasets containing their tokenized works without consent or compensation is an unfair situation, a misappropriation, for which they expect the law to provide a remedy. Unfortunately, while the law of misappropriation exists, it is not internationally harmonized and is unlikely to be any time soon. There is a related view that anything created using a data corpus containing tokenized copyrighted material is a “derivative” of the dataset, and in a layman’s sense this is the case, since no output would be generated if it weren’t “derived” from the dataset by the GenAI application. Unfortunately, the legal terms “adaptation” and “derivative work” are likely to be interpreted more narrowly by the courts. Rights holders seeking to correct what they perceive to be an injustice will undoubtedly pursue avenues based on existing laws, including copyright, publicity rights, and misappropriation claims. These may lead to settlements for the use of existing material, including compensation for “past sins”.
This paper examines the applicable norms of international copyright law, and considers an additional option, namely the creation of a right of remuneration for creators to compensate for the use of LLMs created using their copyrighted works to produce commercially available “content” that can compete with the material on which the machine was trained. The proposed right should vest in the creators themselves, though it would remain assignable or licensable. For example, when a music streaming service fills a stream with AI-produced music, it would pay for the use of the copyrighted works in the dataset used by its generative AI model. This would be another adaptation of the copyright framework to a major technological change, as copyright has consistently done for more than two centuries. Indeed, it would be strange if copyright did not adapt to what is perhaps the most consequential technological change in history.
To be clear, this proposed solution does not preclude a licensing regime for the reproduction(s) that occur during the TDM process, which is already the subject of litigation in several jurisdictions. What it does is add a clearly defined, ongoing layer of compensation for the benefit of music creators and rights holders for GenAI systems that produce material in competition with the creators of the copyrighted material on which they were trained.
About the author
Daniel Gervais is a professor of law and director of the Intellectual Property Program at Vanderbilt Law School in Nashville, TN. He has spent 10 years researching and addressing policy issues on behalf of the World Trade Organization, the World Intellectual Property Organization, the International Confederation of Societies of Authors and Composers, and the Copyright Clearance Center.
Prof. Gervais is also the author of “The TRIPS Agreement: Drafting History and Analysis”, a leading guide to the treaty that governs international intellectual property rights.