@nishio: Some people claim that because the DNN has parameters on the scale of hundreds of millions, the training images could be memorized in full. However, it takes about 800,000 dimensions to represent a single 512 x 512 RGB image, and Stable Diffusion is trained on 2.3 billion images, so the claim is off by roughly five orders of magnitude, to put it bluntly.
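The arithmetic behind this can be sketched in a few lines. The figures below are the tweet's own rough numbers plus an assumed parameter count of ~860 million; with these assumptions the data exceeds the parameters by roughly six orders of magnitude, the same ballpark as the tweet's "five":

```python
import math

params = 860e6                  # "hundreds of millions" of parameters (assumed ~8.6e8)
dims_per_image = 512 * 512 * 3  # one 512x512 RGB image -> ~800,000 values
n_images = 2.3e9                # size of the training set

data_values = dims_per_image * n_images
ratio = data_values / params
print(f"data / params = {ratio:.2e} (~10^{math.log10(ratio):.0f})")
```

In other words, even a perfectly efficient one-value-per-parameter encoding would fall short by a factor of about a million.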
@nishio: Oh, the original tweet is talking about generalization; I misunderstood because it doesn't match the diagram. As an extreme example, if you train a neural net of the same shape as Stable Diffusion on only one image, it will memorize that image entirely and produce it no matter what prompt you give. This "if X, then Y" probably holds, but in reality billions of images went into training, so the premise X is false and the proposition is vacuously true. If the question is whether the probability is zero or not, of course it is not zero. But it is the probability of starting from a random point in a vast space of over 10,000 dimensions and accidentally landing on an existing point… a UUID is 128 bits, and this is 20,000 floats, so crudely speaking it is like matching about 100 UUIDs at once. UUIDs are not uniformly random, so I'd prefer another example… Well, my point was that humans already treat hashes with a three-digit number of bits as having a "negligible probability of accidental collision", and this is about two orders of magnitude above that.
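The "about 100 UUIDs" figure can be reproduced with a deliberately conservative assumption: even if each of the 20,000 dimensions carries only ~1 bit of effective entropy (a lowball; a float has 32 raw bits), the point still carries over a hundred UUIDs' worth of collision resistance:

```python
uuid_bits = 128      # a UUID is 128 bits (122 random bits in version 4)
dims = 20_000        # dimensionality of the latent point
bits_per_dim = 1     # assumed effective entropy per dimension (a deliberate underestimate)

total_bits = dims * bits_per_dim
uuids_worth = total_bits / uuid_bits
print(f"{total_bits} bits ~= {uuids_worth:.0f} UUIDs' worth of entropy")
# The chance of a random draw hitting one fixed point is roughly 2**-total_bits.
```

With the raw 32 bits per float the number would be thousands of UUIDs, so "about 100" is a floor, not a ceiling.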
@lempiji: This kind of discomfort is always there. Uncompressed memorization is impossible, but billions of data points in an 800,000-dimensional space amount to almost zero density. It's quite natural to think that there are rules governing where the data points lie, and that lossless compression exists as a way to point to those points.
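The "pointing to the point" idea can be made concrete: naming one of 2.3 billion known points needs only about 32 bits, so if sender and receiver share the dataset as a codebook, a lossless "pointer" is tiny. The catch, of course, is that the codebook itself is the full data. A minimal sketch:

```python
import math

n_points = 2.3e9
bits_to_index = math.ceil(math.log2(n_points))  # bits needed to name one training point
raw_bits = 512 * 512 * 3 * 8                    # one raw 8-bit RGB image in bits

print(bits_to_index)                  # index size in bits
print(raw_bits // bits_to_index)      # pointer vs. raw image, as a ratio
```

This is why "the model losslessly compresses the dataset" and "a short code can name a dataset point" are very different claims: the second assumes the data already exists on both ends.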
@nishio: JPEG compression of a photo, which operates under the even weaker condition that "details humans don't care much about may change", shrinks it by about 10 times; it does not shrink it by anything like 100 times. So it is quite unreasonable to claim "there must be a way to compress 3-4 orders of magnitude more with no loss". I agree that image compression so far has not used the "meaning of the image", and that by using it one could achieve "compression that barely changes the meaning to humans" at a much higher ratio than JPEG; for example, "any cat" can be compressed to the three letters "cat". The achievable ratio depends on what the user identifies as the same.
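The gap between these regimes is easy to put in numbers (the JPEG figure is the ~10x ballpark assumed above; real files vary with content): lossy JPEG buys roughly one order of magnitude, while meaning-level compression like "cat" reaches ratios JPEG never can, precisely because it discards everything except the meaning:

```python
raw_bytes = 512 * 512 * 3          # uncompressed 512x512 RGB image
jpeg_bytes = raw_bytes // 10       # typical ~10x lossy JPEG (assumed figure)
semantic_bytes = len(b"cat")       # "any cat" reduced to its meaning

print(raw_bytes / jpeg_bytes)      # ~10x
print(raw_bytes / semantic_bytes)  # ~260,000x, but only if "any cat" is acceptable
```

The second ratio is only "lossless" relative to a user for whom any cat is the same cat, which is exactly the point about what the user identifies with.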
"I drew it" (using AI): it's easy if you are not particular and just want some girl who looks like she came from an illustration site. The reason is that I, as a user, accept a wide range as "OK". The more I care about detailed attributes, the narrower the OK range becomes. For example, what about the fingers on the left hand here? @nishio: Until now there have been people who notice things like "the text is not set correctly", "it's a few pixels off", or "there is a typo", and people who release without noticing. The same will happen with these illustrations. There are two types of people in the world: those who care whether an illustration is right down to the fingertips, and those who don't. The latter are the majority. NovelAIDiffusion
- Diary 2022-10-09 ← Diary 2022-10-10 → Diary 2022-10-11. 100 days ago: Diary 2022-07-02. 1 year ago: Diary 2021-10-10.
This page is auto-translated from /nishio/日記2022-10-10 using DeepL. If you see something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.