Data extraction from AI learning models in the context of intellectual property

15 February 2024 6 minutes Author: Newsman

This article explores effective techniques for data extraction from AI learning models in the context of intellectual property (IP).

Data used to train AI is often sourced from the Internet, such as through web scraping tools. However, some of this data will be protected by copyright or database rights, or both. Without the appropriate license, using it to train an AI system may result in infringement.

Generative AI tools can create beautiful visual effects, write essays, poems, and even books. The potential for creativity with generative AI tools is limitless. However, because generative AI models, with their billions of parameters, are trained by software that processes vast archives of images and text, they risk producing output that infringes various intellectual property rights. Additionally, a would-be criminal can use generative AI to learn about their specific area of crime. AI can speed up a criminal’s work by providing information that can be used to produce counterfeits, unauthorized copies of copyrighted works and other IP-infringing material, and to advance other IP-infringing activities such as trademark registration fraud.

Although there are indications that the fair use doctrine may apply to the AI training process, copyright holders argue that adequate consent is required to use a copyrighted work to train AI. This permission is necessary because training can give large language models enough information to generalize from existing works and produce comparable results.

The claims cover all types of generative AIs, including content generation AIs such as ChatGPT, image generation AIs such as Midjourney, code generation AIs such as GitHub Copilot, and others. Consider three scenarios.

In the first scenario, an AI machine creates a modern painting based on a data set consisting of hundreds of modern paintings. The output looks like a modern painting, but does not include any individual work from the data set.

In the second scenario, an AI machine creates a book in the style of a famous author, using a small number of books by the same author as a data set.

In the third scenario, an AI machine uses algorithms to generate a song based on lyrics and music from existing songs created by many different artists.

The difference between the first two scenarios is whether the resulting output is comparable to a particular copyrighted work and whether it can compete with that work. In other words, each output must be evaluated on a case-by-case basis to determine whether it is sufficiently transformative: can a particular work created by AI be considered similar to the original work? The first scenario involves the use of a large number of works in the same style, while the second involves the use of a small number of works by the same author. The first scenario is unlikely to result in copyright infringement, but the second might. To establish infringement, the copyright owner must prove that the people who arranged for the AI output to be produced took a “substantial part” of the owner’s original works.

Although the criterion of similarity has been evaluated in a number of IPR cases, both civil and criminal, case law still needs to address the question of whether a work that is “stylistically” similar to a copyrighted work can be considered substantially similar. For example, had copyright still applied in the case of The Next Rembrandt, the question would be whether a claim for copyright infringement could be filed where the intent was to create a work that could be attributed to Rembrandt “stylistically.” Similarly, in the third scenario, individual artists would have to demonstrate substantial similarities between their own work and an output derived from the works of a large number of different artists.

EXAMPLE: AI created a song with Drake’s and The Weeknd’s voices

TikTok user Ghostwriter977, who also claims to be a songwriter, wrote a song called “Heart on My Sleeve” and used AI to imitate the voices of Drake and The Weeknd. The underlying music was new; only the voices were recognizable as those of the two famous performers. The song became very popular until Universal Music Group requested that it be removed from Spotify, Apple Music and other platforms on the grounds of copyright infringement. This raised the question of whether the song did in fact infringe copyright.

In general, copyright does not protect the singer’s voice; rather, it protects creative output such as music or lyrics. The legal basis for such treatment would be comparable to that of tribute bands that do not infringe copyright protection.

This question is complicated in the context of generative AI. First, was Drake’s and The Weeknd’s music used to train the AI, and would that be an authorized use of copyrighted content?

Second, is it illegal to use another person’s name, voice, image or likeness without their prior permission? Voice, like biometric data, is protected under Article 4.1 of Regulation (EU) 2016/679 on the protection of individuals with regard to the processing of personal data and on the free movement of such data, because it can be used to identify individuals, is specific to a person’s physiological identity and reveals a large amount of personal information about the speaker.

Violations of personal data regulations can also be considered a criminal offense.

The element of intent is particularly important in criminal cases involving intellectual property. It is necessary to prove that there was an intention to produce infringing material: the work of an independent artist that happens to be similar to the original work does not entail criminal liability. In other words, the courts would have to assess the nature of the use of copyrighted works and its impact on the market.

Causal relationship. There must be a causal connection between the copyrighted work and the creation of the infringing work; “copying” must take place.

Who is responsible?

Determining liability for copyright infringement by an AI system can be complex. To date, AI is not a legal entity and cannot be held responsible for infringement of intellectual property rights. The best approach to determining liability is to check who had it.

Exception to database law

There is a “fair dealing” exception for databases that have been made available to the public (in any way). However, the exception is narrow and unlikely to apply in a commercial context. Database rights in a publicly available database are not infringed by fair dealing with a substantial part of its contents, provided that:

  • the part is extracted by a person who is a lawful user of the database,
  • it is extracted for the purpose of illustration for teaching or research and not for any commercial purpose,
  • the source is indicated.

The exception therefore requires both lawful access and non-commercial use. Extracting a substantial part of a publicly available database for AI training purposes will not be covered by the exception if the purpose is commercial. This means that the onus is on the potential extractor of content from the database to ensure that the extraction is lawful.

These legal recommendations were developed by a lawyer in the field of protection of business interests and intellectual property. For professional legal advice, contact via Telegram: @your_legist
