Is the act of learning from a few images information analysis?

2022-09-21 Summary

Masayuki Matsuda (ed.), Commentary on the Copyright Act, Supplemental Volume: Commentary on the 2018 and 2020 Amendments (Keiso Shobo, 2022), section written by Masashi Sawada, p. 15.
- “A large number of works need not necessarily be analyzed.”
- The “sounds that make up two pieces of music” also constitute a large amount of information.
Even when only a small number of images are used for learning, the act falls under “information analysis” under Article 30-4 of the Copyright Act, and Article 30-4 may apply.

2022-09-01 Facebook

In the discussion of fine-tuning, I saw the claim that fine-tuning is not information analysis because it does not use a large number of works. I checked and found that “information analysis” is defined in item 2 of Article 30-4 as “extracting, comparing, classifying, or otherwise analyzing information pertaining to language, sound, images, or other elements constituting said information from a large number of works or other large amounts of information”. So the argument is that fine-tuning on a few images by a single author is not “information analysis” because it does not satisfy the “large number” requirement. That does seem right to me.

2022-09-02

@nishio: @tka0120 I read your article on automatic image generation AI and copyright. Section 3.1.1(1), “In principle, copyrighted works may be freely used to the extent necessary to generate AI software”, is correct, but I am concerned that readers may misinterpret it. You limit this to “the case where an automatic image generation AI is created by collecting a large number of photos and illustrations existing on the Web and generating a dataset for training”. On the other hand, there is a technique that uses an existing dataset and additionally trains on 3 to 5 images to acquire a specific style. I believe that with the latter technique, the requirement of “from a large number of works or other large amounts of information” in the definition of “information analysis” is not met, and therefore the works cannot be freely used. If readers interpret “AI software” as extending to anything that uses the latter technique, they are likely to reach an incorrect conclusion. ref: https://textual-inversion.github.io

Textual Inversion

Midjourney, Stable Diffusion, mimic and other image auto-generating AIs and copyright (Part 2) | STORIA Law Firm Image Generating AI and Copyright

The problem with mimic, too, was not so much the preliminary learning from a large amount of data as the use of only a dozen or so images to learn a “style”. Is Article 30-4 of the Copyright Act applicable to the use of such a small number of images?

The question is whether the act of learning from a small number of images falls under the definition of “information analysis” in item 2… A “large number of works” is just one example of a “large amount of information”. Therefore, even if you do not use “a large number of works”, using “a large amount of information” falls under the category of “information analysis”. When a small number of images are used for learning, focusing on “each image” certainly does not yield “a large number of works”, but focusing on “the information contained in each such image (contents of individual pixels, positional relationship of pixels, etc.)” can reasonably yield “a large amount of information”. Therefore, the conclusion is that even when a small number of images are used for learning, the act falls under the category of “information analysis” under Article 30-4 of the Copyright Act, and Article 30-4 may apply.
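To get a rough sense of scale for the pixel-level view, here is a minimal sketch that counts the raw pixel values in a handful of images. The figures are assumptions chosen for illustration (5 images, 512×512 pixels, 3 color channels), and the function name is hypothetical. The count says nothing about where a legal threshold for a “large amount of information” might lie; it only shows how much element-level data a few images contain.

```python
def count_pixel_values(num_images: int, width: int, height: int, channels: int = 3) -> int:
    """Return the number of individual channel values across all images.

    Assumes every image has the same size and channel count (illustrative only).
    """
    return num_images * width * height * channels


# Assumed example: 5 RGB images at 512x512 pixels each.
total = count_pixel_values(num_images=5, width=512, height=512)
print(f"{total:,} pixel values")  # 3,932,160 pixel values
```

Even this small set contains several million element values, before considering the positional relationships between pixels that the quoted passage also mentions.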

@OKMRKJ: For reference, Masayuki Matsuda (ed.), “Commentary on the Copyright Act, Supplemental Volume: Commentary on the 2018 and 2020 Amendments” (Keiso Shobo, 2022), section written by Masashi Sawada, p. 15, states that “a large number of works need not necessarily be analyzed” and also that “the ‘sounds’ constituting two pieces of music” constitute a large amount of information.

This page is auto-translated from /nishio/少数の画像から学習する行為は情報解析か using DeepL. If something looks interesting but the auto-translated English is not good enough to understand, feel free to let me know at @nishio_en. I’m very happy to share my thoughts with non-Japanese readers.
