Cybozu Labs Study Session 2022-11-11
- In the month since the last Stable Diffusion study session on 9/30, there have been dramatic developments around image generation AI.
- I'll recap them in this digest and explain how Imagic and Aesthetic Gradients work in the remaining time.
- 10/3 NovelAI, a provider of novel creation AI services, releases NovelAIDiffusion, a paid image generation AI
- It specializes in high-quality anime-style pictures and caused a big stir
- Capable of training on and generating images with arbitrary aspect ratios, which was not possible with Stable Diffusion
- In the Japanese-speaking world, people became angry because the training data came from an unauthorized-repost site.
- 10/7 NovelAIDiffusion source code and models leaked and shared via Torrent
- 10/12 NovelAI announces that the number of images generated has exceeded 30 million in the first 10 days since its release.
- Roughly speaking, that works out to something like 3 million yen in sales per day.
- 10/17 Chinese-language NovelAI prompt manual "Code of Elements" appears
- Circumstantial evidence that use of NovelAI's leaked model is widespread in Chinese-speaking communities.
- 10/18 Imagic is the talk of the town.
- Some say it's very useful once you learn to use it properly; others say it's not as useful as expected.
- I'm in the latter camp, but it could be that I just don't understand how to use it well.
- 10/20 Stable Diffusion 1.5 is released by Runway, not by Stability AI, which released 1.4; Stability AI files a takedown request but later withdraws it.
- 10/21 Stability AI (in a big hurry?) releases a new VAE that improves decoding of eyes and faces
- 10/22 Strange people came to the home of a person who was sending out information related to NovelAI in Japanese, resulting in police action
- 11/3 "NovelAI Aspect Ratio Bucketing" released under the MIT license
NovelAIDiffusion Release
- NovelAI, a provider of novel creation AI services, releases NovelAIDiffusion, a paid image generation AI
- In Stable Diffusion the prompt was truncated at 77 tokens, but NovelAIDiffusion triples that to 231 tokens
- Stable Diffusion cropped its training data into squares, but thanks to NovelAI's ingenuity it is now possible to train on and generate images of arbitrary aspect ratios.
- Unlike university laboratories that aim to publish papers, this is a service of a for-profit company, so details were not disclosed (they were later made public).
- Aspect ratio strongly affects composition. NAI Curated
girl, blue eyes, blue long hair, blue cat ears, chibi
- ![image](https://gyazo.com/ec303056563dd0308f6530af5549d053/thumb/1000)![image](https://gyazo.com/a8a40c57789dea0cd4e523c2ed84999c/thumb/1000)![image](https://gyazo.com/e34a2583abf1105d02ba614f08c2877d/thumb/1000)
- The distribution of pictures generated is severely skewed.
- Prompt "black cat" generates 5 cards and 2 are cat ear girls [[Diary 2022-10-07]].
- ![image](https://gyazo.com/487f8d241846f06d4a34770a344703db/thumb/1000)![image](https://gyazo.com/65e72a194351fed5c17fc59eb07d4961/thumb/1000)
- I almost spat out my tea when I saw the first image that came out of "let's just put in 'black cat' for comparison with Stable Diffusion."
- Social media was abuzz with its overwhelming strength in "anime-style women," the field in which it excels.
- The day's tweets are collected here: [https://note.com/yamkaz/n/nbd9a028d625a](https://note.com/yamkaz/n/nbd9a028d625a)
- Most of the tweets recorded there are "anime-style women."
- Specializing and devoting resources to a narrow area of the diverse distribution of "pictures" pushed user value in that area past a decisive threshold.
- Expressive power in other areas is reduced, but the added features resonated with customers.
- [Blue Ocean Strategy]
- Controversy erupted over the dataset used for the study.
- Data from Danbooru, a service that allows volunteers to tag images and search for images by tag, is used for training.
- Opinions were divided (or at least negative opinions were voiced loudly on Japanese-language social media).
- The negative side:
- Danbooru is an unauthorized reproduction site and is illegal.
- AI trained on illegal data is evil, it is the enemy.
- This AI is a paid service, any profit made from it is stolen from us.
- By the way, Danbooru itself clearly states the source of the original image and links to it, so it is quite difficult to determine whether this "unauthorized reproduction" is illegal or not.
- It is clearly stated that it was reprinted from Pixiv.
- There is also the argument that it is fair use: Fair use - Wikipedia.
- Does "the use of the reproduction adversely affect the market (including potential markets)" apply?
- It is difficult to argue that reposting something originally published free of charge, with the source clearly stated, harms the market.
- Compare this with Google search results displaying cached copies of images held on Google's servers.
- If the image is small, one can argue "this is just a search-result thumbnail."
- If the image is a direct link, one can argue "there is no duplication at all."
- Danbooru's images are large enough that opinions would probably be divided.
- Of course, since this is user-submitted content, some of it may have been uploaded illegally
- (e.g., reprints from digital comics that are not published online).
- However, as long as the service operator complies with the Digital Millennium Copyright Act (DMCA), the operator is not held liable.
- Notice and Takedown Procedure (DMCA Notice)
- If the operator of a website is notified that a copyrighted work has been posted by a third party without the copyright holder's permission, the operator is exempt from liability for damages if the work is promptly removed (takedown).
- Victims of reposted non-public content may well resent Danbooru as if it were to blame, but legally Danbooru is not at fault; under notice-and-takedown, the burden of acting falls on the victim.
- 10/5 Danbooru, the source of the training data, officially releases a statement about the automatic illustration generation AI NovelAI - GIGAZINE
- Roughly: "Ask NovelAI about the AI; we have nothing to do with it. If you can prove you are the copyright holder, we will comply with removal."
- From a DMCA perspective the burden of proof is on the party claiming unauthorized reproduction, so that is a reasonable position.
- Meanwhile, the reaction outside Japan is "what's wrong with using Danbooru?"
- /StableDiffusion…LAION 5B contains Danbooru image URLs
- /WaifuDiffusion…clearly states use of the Danbooru 2021 dataset
- /NovelAI…states that it uses Danbooru
- Midjourney…plans to collaborate with WaifuLabs and use Safebooru-derived data
- In other words, everyone is using Danbooru!
- In other words, the reaction of the Japanese-speaking world to NovelAI's use of Danbooru is a case of group polarization.
- Opponents shouted louder, so neutral and supportive voices fell silent for fear of getting attacked.
- Heard through multiple channels that "we're experimenting, but not publishing anything about it" - New Technology and Publication Bias
- Some people advised me to refrain from expressing logically correct opinions, because "even logically correct opinions can get you tangled up with crazy people."
- 10/22 About the person who showed up at my house | episode 852 | note
- A case of a strange person coming to the home of a person who was actively disseminating information.
- 10/29 "I started posting as an AI illustrator and the numbers are staggering"
- On Twitter there are many people saying "I filter out pictures with AI tags" and "in the end, pictures should be drawn by humans," but this was just the opinion of a vocal minority, which I felt clearly when I looked at the numbers for my own account on pixiv.
- 10/31 pixiv News - Features for handling AI-generated works have been released.
- Pixiv has landed on "don't eliminate AI-generated works, but separate them out with their own rankings."
- Meanwhile, Danbooru has banned the submission of AI works as of confirmation on 11/10.
NovelAI Leakage
- 10/7 NovelAIDiffusion source code and models leaked and shared via Torrent
- Only 4 days after release, lol
- 10/12 NovelAI announces that the number of images generated has exceeded 30 million in the first 10 days since its release.
- The smallest preset size is 512x512, and generating 4 images with default parameters costs 20 Anlas, so roughly 2,000 images for $11.
- (The default parameter was changed after this to 16anlas.)
- Higher resolutions and such are probably also used, so roughly speaking it's about 1 yen per image.
- Roughly speaking, that works out to around 3 million yen in sales per day (a quick back-of-envelope check below).
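- A quick back-of-envelope check of that figure (a minimal sketch; the per-image price and exchange rate are my rough assumptions based on the Anlas math above, not official figures):
```python
# Rough revenue estimate from the announced numbers (assumptions, not official figures)
images_per_day = 30_000_000 / 10        # announced: 30 million images in the first 10 days
usd_per_image = 11 / 2000               # ~$11 buys ~2,000 default-size generations (see above)
jpy_per_usd = 145                       # rough late-2022 exchange rate (assumption)

base = images_per_day * usd_per_image * jpy_per_usd   # ~2.4 million yen/day at the cheapest size
bigger = images_per_day * 1.0                         # ~1 yen/image if larger sizes are common
print(f"{base/1e6:.1f}M - {bigger/1e6:.1f}M yen/day") # roughly 2-3 million yen per day
```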
- 10/17 Chinese-language NovelAI prompt manual "Code of Elements" appears
- docs
- Easily done: Diary 2022-10-17
- The round brackets it uses for emphasizing token vectors do not work on NovelAI's own service.
- Using round brackets on NovelAI's service is pointless.
- Round-bracket emphasis is a feature of AUTOMATIC1111/stable-diffusion-webui, the de facto standard tool for running Stable Diffusion locally (a rough sketch of the idea follows below).
- In other words, this is strong circumstantial evidence that Chinese-speaking users are running the leaked model locally rather than using NovelAI's service.
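- As a rough illustration of what that round-bracket emphasis does (a minimal sketch of the idea, not the actual webui code; the 1.1-per-bracket factor matches the commonly described behavior, while nesting and renormalization details are omitted):
```python
import re

def parse_emphasis(prompt: str):
    # Handles two common forms: (word) -> weight 1.1, (word:1.3) -> explicit weight.
    # Nested brackets and [de-emphasis] are left out to keep the sketch short.
    pieces = []
    for part in re.split(r"(\([^()]*\))", prompt):
        if part.startswith("(") and part.endswith(")"):
            inner = part[1:-1]
            if ":" in inner:
                text, w = inner.rsplit(":", 1)
                pieces.append((text, float(w)))
            else:
                pieces.append((inner, 1.1))
        elif part.strip():
            pieces.append((part.strip(), 1.0))
    return pieces

print(parse_emphasis("masterpiece, (blue eyes), (long hair:1.3)"))
# [('masterpiece,', 1.0), ('blue eyes', 1.1), (',', 1.0), ('long hair', 1.3)]
# The weights are then multiplied into the corresponding CLIP token embeddings
# before conditioning the U-Net (the real webui also rescales the result).
```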
- Regarding use of the leaked model, some people in Japan say things like "it's illegal, so don't do it," but which law would it actually violate? I'm not sure.
- In Japanese law, is it Article 2, Paragraph 1, Item 5 of the Unfair Competition Prevention Law?
- "acquiring a trade secret, or using or disclosing a trade secret so acquired, while knowing, or not knowing due to gross negligence, that an act of wrongful acquisition of a trade secret has intervened"
- I think NovelAI was in Delaware, maybe there is a similar law.
- Well, even if there were, it would be hard to sue Chinese users.
- Without the leak, the "Code of Elements" would not have been created.
- Only time will tell whether the leak was a bad thing for NovelAI.
Imagic
- 10/18 Imagic is the talk of the town.
- @AbermanKfir: The combination of dreambooth and embedding optimization paves the way to new image editing capabilities. Love it. Congrats on this nice work!
- Some say it's very useful once you learn to use it properly; others say it's not as useful as expected.
- I'm in the latter camp, but it could be that I just don't understand how to use it well.
- (input image) + "a woman wearing black suit" = (edited image)
- Understanding Imagic - Hoge Hoge
- The remarkable thing is that "the face is preserved to the extent that it would not be out of place to say it is the same person."
- @npaka123: I tried Imagic with Stable Diffusion on my cat, but the cat didn't change; it just moved to a bedroom-like place.
- The cat is preserved.
- Imagic 2022-10-31
- Prompt with flower.
- The default strength is 0.9, but that changed nothing at all, so I increased it and flowers appeared.
- /villagepump/2022/10/30#635eaefae2dacc0000f46cc2
- Opinion that it works better on photorealistic images /villagepump/2022/11/01#6360b5eee2dacc0000329928.
- Well, isn't it actually more surprising that running Imagic with a Stable Diffusion model on an image generated by a NovelAI model works properly at all?
- It takes roughly two orders of magnitude more time than img2img, yet it doesn't preserve the original picture all that well.
- The models are different, so it's natural that it isn't preserved.
- The principles behind it are covered at the end of this presentation.
Stable Diffusion 1.5
- 10/20 Stable Diffusion 1.5 is released by Runway, not by Stability AI, which released 1.4.
- Stability AI applies for temporary removal, but later withdraws
- I'm guessing it was a mistake on Stability AI's part to not properly grasp the scope of the rights to the joint research work product.
- The kind of thing where you thought you had exclusive rights, but you didn't.
- On Runway's part, it's reasonable to release it because it's a chance to raise awareness.
- I think a lot of people are starting to know and be aware of Runway because of this, myself included.
- consideration
- https://note.com/yamkaz/n/n165fa3922570
- The theory that Stability AI wants to push NSFW countermeasures but also wants a model without countermeasures to be out there, so they had Runway release it.
- I think that's reading too much into it.
- Runway is also a private company, so it has no incentive to take on that risk.
- If that were the purpose, they could just "leak" it with the same framing as NovelAI, claiming it was stolen in an anonymous hacker attack.
- 10/21 Stability AI (in a big hurry?) releases a new VAE that improves decoding of eyes and faces
- I interpret this as "we're not ready to release 1.6, but we don't want Runway to stay the most up-to-date for too long, so let's release what we can as soon as we can."
- Combining the 1.5 model from Runway with the VAE from Stability AI, some people are saying "the facial expressions are so much better!"
- Personally I'm keeping my distance, with the feeling that "dependency hell is about to start..."
- Runway: AI Magic Tool
- They provide a variety of useful services centered on video editing.
- Infinite Image
- So-called outpainting
- From a distance, you can't tell it's a composite, can you?
- Specify the area you want to composite.
- Press the generate button, get 4 candidate images, and choose one.
- It doesn't seem to be very good at anime/cartoon styles.
- NovelAI img2img Noise 0 Strength 0.5
- Outpainting does not change the original image (facial expressions and so on).
- img2img is roughly the same, but the details change.
- Erase and Replace
- So-called inpainting
- Probably because it fills the erased area with noise, Runway's inpainting is excellent.
- Tends to make mysterious things appear in the erased area
- Other assortments include object tracking for video and noise reduction for audio.
Technology behind NovelAIDiffusion
- 10/11 NovelAI Improvements on Stable Diffusion | by NovelAI | Oct, 2022 | Medium
- 10/22 The Magic behind NovelAIDiffusion | by NovelAI | Oct, 2022 | Medium
- 11/3 "NovelAI Aspect Ratio Bucketing" published under the MIT license.
- On 10/11 they published a technically pointed piece, but the world fundamentally doesn't understand how image generation AI works and keeps saying nonsense like "it's just patching together images from a database," so on 10/22 they followed up with a basic "no, that's not how it works" explanation.
- The Magic Behind NovelAIDiffusion (10/22)
- The original Stable Diffusion was trained on the approximately 150 TB LAION dataset
- Fine-tuned with a dataset of 5.3 million records and 6 TB.
- This dataset has detailed text tags
- (This is probably Danbooru origin)
- The model itself is 1.6 GB and can generate images without reference to external data
- The size doesn't change during training (= "so it isn't memorizing the images!" is the point).
- The model took three months to learn.
- This doesn't mean the training process ran continuously for 3 months; rather, humans checked progress along the way, fixed problems, and repeated the cycle.
- The goal is not to write a paper but to build a good model and make money through the service, so it's fine to have humans doing trial and error along the way.
- The model was trained using eight A100 80GB SXM4 cards linked via NVSwitch and a compute node with 1TB of RAM
- Improvement of Stable Diffusion by NovelAI (10/11)
- Use the hidden state of CLIP's penultimate layer (a minimal sketch of grabbing it follows below)
- "Penultimate layer" means the layer just before the final layer
- Stable Diffusion uses the hidden state of the final layer of CLIP's transformer-based text encoder as the conditioning for classifier-free guidance
- Imagen (Saharia et al., 2022) uses the hidden state of the penultimate layer for guidance instead of the hidden state of the final layer.
- Discussion in the EleutherAI Discord
- The final layer of CLIP is prepared to be compressed into a small vector for use in similarity searches
- That's why the values change so rapidly.
- So using the layer one before it might be better for CFG's purposes.
- experimental results
- Using the layer before the final one in Stable Diffusion, they could still generate images matching the prompt, albeit with slightly lower accuracy.
- This is not obvious, because Imagen is not an LDM.
- Color leaks are more likely to occur when using values from the final layer
- For example, in "Hatsune Miku, red dress", the red color of the dress leaks into the color of Miku's eyes and hair.
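- A minimal sketch of how one might grab the penultimate hidden state with Hugging Face transformers (the model name is the one commonly cited as SD v1's text encoder, and the final-layer-norm remark is an implementation detail that varies; this illustrates the idea, not NovelAI's actual code):
```python
import torch
from transformers import CLIPTokenizer, CLIPTextModel

name = "openai/clip-vit-large-patch14"          # commonly cited as the SD v1 text encoder
tokenizer = CLIPTokenizer.from_pretrained(name)
text_encoder = CLIPTextModel.from_pretrained(name)

tokens = tokenizer("Hatsune Miku, red dress", padding="max_length",
                   max_length=77, return_tensors="pt")
with torch.no_grad():
    out = text_encoder(**tokens, output_hidden_states=True)

final_hidden = out.last_hidden_state   # what stock Stable Diffusion conditions on
penultimate = out.hidden_states[-2]    # the "one layer before the final layer"
# Both are (1, 77, 768); implementations typically still pass the penultimate
# state through the encoder's final layer norm before using it for guidance.
```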
- aspect ratio bucket
- Existing image generation models have a problem of creating unnatural cropped images.
- I mean like the lack of a neck in the portrait.
- The problem is that these models are trained to produce square images
- Most training source data is not square
- It is desirable to have squares of the same size when processing in batches, so only the center of the original data is extracted for training.
- Then, for example, a painting of a "knight with a crown" would have its head and legs cut off, and the important crown would be lost.
- This can produce a human without a head or legs, or a sword without a hilt or tip.
- They were building this as a companion to a novel-generation AI service, so that wasn't going to work at all.
- Also, training on "the knight with the crown" with the crown cropped out is bad, because the text and the image content no longer match.
- Tried random crop instead of center crop, but only a slight improvement.
- It is easy to train Stable Diffusion at various resolutions, but if the images are of different sizes, they cannot be grouped into batches, so mini-batch regularization is not possible, and the training becomes unstable.
- Therefore, we have implemented a batch creation method that allows for the same image size within a batch, but different image sizes for each batch.
- That's aspect ratio bucketing.
- To put the algorithm in a nutshell: prepare buckets with various aspect ratios and put each image into the bucket with the closest aspect ratio (a rough sketch follows after this list).
- I mean, a little bit of discrepancy is fine.
- Random crop for a slight displacement.
- In most cases, less than 32 pixels need to be removed.
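- A minimal sketch of the bucket-assignment idea (my own simplification, not NovelAI's released implementation; the 512x768 pixel budget and 64-pixel step are assumptions based on their description and common practice):
```python
# Build candidate bucket resolutions and assign each image to the bucket whose
# aspect ratio is closest, so every mini-batch can share one resolution.
MAX_AREA = 512 * 768   # assumed per-image pixel budget
STEP = 64              # latent-friendly multiple

def make_buckets():
    buckets = []
    w = 256
    while w <= 1024:
        h = min((MAX_AREA // w) // STEP * STEP, 1024)
        if h >= 256:
            buckets.append((w, h))
        w += STEP
    return buckets

def assign_bucket(img_w, img_h, buckets):
    ar = img_w / img_h
    return min(buckets, key=lambda b: abs(b[0] / b[1] - ar))

buckets = make_buckets()
print(assign_bucket(1200, 1600, buckets))  # a portrait image -> (512, 768) under these assumptions
# At training time the image is resized and randomly cropped by a few pixels
# (usually <32) to fit its bucket, and each batch is drawn from a single bucket.
```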
- Triple the number of tokens
- StableDiffusion has up to 77 tokens
- Effectively 75 once BOS and EOS are accounted for
- This is a limitation of CLIP
- So, pad the prompt up to 75, 150, or 225 tokens, split it into chunks of 75, run each chunk through CLIP separately, and concatenate the vectors (sketched below)
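- A rough sketch of that chunk-and-concatenate trick (reusing the tokenizer/text_encoder objects from the earlier snippet; padding with EOS is my assumption, not necessarily NovelAI's exact choice):
```python
import torch

def encode_long_prompt(prompt, tokenizer, text_encoder, chunk=75):
    ids = tokenizer(prompt, truncation=False).input_ids[1:-1]   # strip BOS/EOS
    pad = (-len(ids)) % chunk                                   # round up to 75 / 150 / 225 ...
    ids = ids + [tokenizer.eos_token_id] * pad
    pieces = []
    for i in range(0, len(ids), chunk):
        piece = [tokenizer.bos_token_id] + ids[i:i + chunk] + [tokenizer.eos_token_id]
        with torch.no_grad():
            hidden = text_encoder(torch.tensor([piece])).last_hidden_state  # (1, 77, 768)
        pieces.append(hidden)
    return torch.cat(pieces, dim=1)   # (1, 77*n, 768): three chunks -> 231 "tokens"
```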
- hypernetwork
- Totally unrelated to the method of the same name proposed in 2016 by Ha et al.
- They picked the name without knowing about that work, so the names ended up colliding.
- A technique that corrects hidden states at multiple points in a larger network using small neural nets
- Can have a greater (and clearer) impact than prompt tuning, and can be attached or detached as a module
- This means that the ability to provide a switch that can be recognized as a component by the end user and can be attached or detached is an advantage in providing the service.
- From our experience in providing novel generation AI to users, we knew that users could understand (and perhaps improve user satisfaction) with regard to providing them with a function switch
- Performance is important
- Complex architecture increases accuracy, but the resulting slowdown is a major problem in a production environment (when the AI is actually a service that end-users touch).
- Initially, we tried to learn embedding (just as we had already tried with the novel generation AI)
- This is the equivalent of Textual Inversion
- But the model did not generalize well enough.
- So they decided to apply hypernetworks.
- After much trial and error, they decided to touch only the K and V parts of the cross-attention layers (a rough sketch follows this list).
- The rest of the U-Net is left untouched.
- Shallow attention layers tend to overfit, so they are penalized during training.
- This method performed as well as or better than fine tuning.
- Better than fine tuning, especially when data for the target concept is limited
- I think it's because the hypernet can find sparse regions that match the data in the latent space while the original model is preserved.
- Fine tuning with the same data will reduce generalization performance by trying to match a small number of training examples
- Maybe fine tuning of the entire model gives too much freedom and tries to represent the training data a little bit with the overall weights.
- By limiting it to adjusting the attention only, the "denoising mechanism driven by the condition vector," learned from a lot of data, is preserved in a decent state, while the input vector fed into it changes more drastically than one produced by a plain transformer, I think.
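- A minimal sketch of a hypernetwork module on the K/V paths of cross-attention (my own illustration of the described approach; the layer sizes and exact hook point are assumptions):
```python
import torch
import torch.nn as nn

class HypernetModule(nn.Module):
    """Small residual MLP applied to the text-conditioning features
    right before they are projected to K and V in cross-attention."""
    def __init__(self, dim=768, hidden=1536):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
        nn.init.zeros_(self.net[-1].weight)   # start as the identity (residual = 0)
        nn.init.zeros_(self.net[-1].bias)

    def forward(self, context):
        return context + self.net(context)

hyper_k, hyper_v = HypernetModule(), HypernetModule()

def cross_attention_with_hypernet(x, context, to_q, to_k, to_v):
    # x: image features (B, N, dim); context: CLIP text hidden states (B, T, dim)
    q = to_q(x)
    k = to_k(hyper_k(context))   # only the K and V paths see the hypernetwork
    v = to_v(hyper_v(context))
    attn = torch.softmax(q @ k.transpose(1, 2) / q.shape[-1] ** 0.5, dim=-1)
    return attn @ v
# Only hyper_k / hyper_v are trained; the U-Net and text encoder stay frozen,
# so the module can be attached or removed like a plug-in switch.
```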
How Imagic Works
- Mechanism for generating a new image based on a single image and a text prompt
- The input is similar to StableDiffusion's img2img, but it can make global changes to the picture, which img2img cannot
- How does it work?
- StableDiffusion is broadly "text as input, image as output, trained on text/image pairs."
- But open the box and you find a frozen CLIP inside.
- The text is turned into an embedding vector before being passed to the LDM
- Training SD is the process of fixing the embedding vector e and output image x and updating the LDM model parameters θ to minimize the loss L
- Imagic is divided into three steps (a pseudocode sketch follows after this list)
- 1: First, fix the image and model parameters and optimize the embedding vector
- The loss here is the same as in Stable Diffusion: the usual DDPM loss.
- 2: Then fix its embedding vector and optimize the model parameters
- (An auxiliary network is added to preserve the high-frequency component.)
- 3: Generate the output image by feeding a linear interpolation of e and e_opt into the new LDM
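- A compressed pseudocode sketch of those three steps (not runnable as-is; helper names such as ddpm_loss and generate are hypothetical, and the real paper adds details like the auxiliary network and learning rates):
```python
# Step 1: freeze the model, optimize the text embedding toward the input image
e_tgt = text_encoder(prompt)                        # embedding of e.g. "pistachio cake"
e_opt = e_tgt.clone().requires_grad_(True)
for _ in range(emb_steps):
    loss = ddpm_loss(ldm_theta, image_x, e_opt)     # the usual diffusion denoising loss
    loss.backward(); emb_optimizer.step(); emb_optimizer.zero_grad()

# Step 2: freeze e_opt, fine-tune the model (plus auxiliary net) to reproduce image_x
for _ in range(ft_steps):
    loss = ddpm_loss(ldm_theta, image_x, e_opt.detach())
    loss.backward(); model_optimizer.step(); model_optimizer.zero_grad()

# Step 3: interpolate between e_opt and the target embedding and generate
eta = 0.7                                           # mixing factor; ~0.7 is said to look good
e_mix = eta * e_tgt + (1 - eta) * e_opt
edited = generate(ldm_theta, e_mix)                 # txt2img sampling with the fine-tuned model
```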
- schematic
- Step 0
- A picture of the cake and the prompt "pistachio cake" are given.
- Of course, the image created from the prompt "pistachio cake" is completely different from the image you gave
- Step 1
- Update the embedding vector e so that the output image is closer to the input image x
- I think the images in this diagram are too similar.
- (The paper does not clearly show the image at this time, it says it looks roughly like this, but it appears to include the influence of the auxiliary model described below.)
- Step 2
- Update the model parameters θ, combined with the auxiliary model, so that the difference between the image generated from e_opt and the input image x shrinks
- In this case, the auxiliary model learns to absorb the details that the LDM cannot represent, resulting in almost the same image.
- The auxiliary model is attached to preserve high-frequency components.
- The reason "the detail is so well preserved!" is that this network preserves high-frequency components that the LDM does not.
- The LDM collapses 8x8 pixels into 1, so the high-frequency component of the given image is lost.
- The details are filled back in by the VAE decoder, which does not preserve the specific face in the given image; the auxiliary model absorbs that difference.
- Step 3
- The claim is that somewhere in the one-dimensional space (the interpolation) this new model generates from, "there is something relatively close to what we want."
- The underlying assumption is that "a small region of the space can be treated as flat."
- Argument that a mixing factor of about 0.7 looks good.
- Well, that's the case with photographs. When I experimented with an anime-style picture generated with NovelAI, it was almost identical to the original image even at 0.9 (only the background color differed).
- consideration
- Unlike img2img, can it make dynamic changes? → Yes, it does.
- Input is the same as img2img, but unlike img2img, the given image is not used as the initial value when generating the image later.
- Generation process is txt2img with auxiliary model
- img2img downscales the given image (VAE encode) and paints the picture with it as the initial value.
- It's like a person with bad eyesight drawing a picture while referring to the original.
- So it's absurd to hand them a picture of a red dress and ask them to make it blue.
- Imagic passes in the picture of the red outfit and says, "this is the picture of the blue outfit."
- The meaning of the word "blue" is moved toward "red" by updating the embedding vector.
- Then the LDM and the auxiliary model are updated to reproduce the given "picture of red clothes" on top of that.
- And if you move the meaning of the word "blue" back from "red" toward "blue," a "picture of blue clothes" is generated.
- High-frequency components such as the face are preserved because the "auxiliary model" absorbs facial details that would be erased if SD were used normally.
- Why is it that even at 0.9, an animated picture is almost identical to the original image (the only difference is the background color)?
- In the same way in photography, there are cases where the background has changed, not the object to be changed.
- @npaka123: I tried Imagic with Stable Diffusion on my cat, but the cat didn't change; it just moved to a bedroom-like place.
- I think the auxiliary model absorbed most of the object's information.
- It was treated as "information to be kept outside the LDM," just like the face.
- The algorithm has no way of knowing which object the user wants to change.
- For an object that occupies a large portion of the frame and that SD cannot reliably produce from the prompt, the result is "SD can't produce it, so let the auxiliary model absorb it."
- The mixing ratio η can be changed and tested afterwards.
- That step is negligibly light because it's just mixing prompt vectors.
- Not only interpolation but also extrapolation is possible.
Aesthetic Gradient
- /ɛsˈθɛt.ɪk ˈɡɹeɪdiənt/
- Research on extracting usersâ aesthetic senses and using them for personalization
- structure
- The text prompt is vectorized with the CLIP text embedding: c
- With StableDiffusion's defaults this is a 768-dimensional vector.
- The user's N favorite images for that prompt are embedded with the CLIP image embedding and averaged: e
- If the vectors are normalized, the inner product can be treated as a similarity.
- So the weights of the text-embedding part of CLIP can be optimized by gradient descent so that c approaches e (a sketch follows this list).
- About 20 steps at a learning rate of 1e-4 should be enough.
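- A minimal sketch of that optimization (my own illustration under the assumptions above; clip_text_encoder, prompt_tokens, and fav_image_embs are hypothetical objects, not the authors' code):
```python
import torch
import torch.nn.functional as F

def aesthetic_gradient(clip_text_encoder, prompt_tokens, fav_image_embs, lr=1e-4, steps=20):
    """Nudge the CLIP text-encoder weights so the prompt embedding c moves toward e,
    the (normalized) mean CLIP image embedding of the user's favorite images."""
    e = F.normalize(fav_image_embs.mean(dim=0), dim=-1)        # target "aesthetic" vector
    opt = torch.optim.Adam(clip_text_encoder.parameters(), lr=lr)
    for _ in range(steps):
        c = F.normalize(clip_text_encoder(prompt_tokens), dim=-1)
        loss = -(c * e).sum()                                   # maximize cosine similarity
        opt.zero_grad(); loss.backward(); opt.step()
    return clip_text_encoder   # the usual txt2img pipeline then uses the tuned encoder
```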
- consideration
- A method for fine-tuning which vectors CLIP embeds each token into
- Textual Inversion gives meaning to meaningless tokens, whereas this method only takes the vector of tokens that already have meaning and moves it slightly in the direction of the user's preference.
- Instead, learning is extremely light.
- Another advantage is that unlike TI, this method is essentially multi-word OK.
- Maybe you could make 2N images from a longer prompt and then make an AG with N of them that you prefer.
- For example, in an experiment we did in Stable Diffusion Embedded Tensor Editing, a human mixed cat and kitten to create a vector that didn't correspond to any word.
- NovelAI has this functionality as a standard feature, the mixing ratio is determined by human hand.
- Aesthetic Gradients can be said to automatically create such "moderately mixed vectors" by learning from only the images you like among those generated with cat and kitten.
- Another advantage is that images are converted to vectors with CLIP before use, so there is no need for size adjustment.
- Since the objective function is CLIP's, I think features that are not useful for CLIP's task of judging image-text similarity are likely to be ignored.
- = Features that don't appear in the text are likely to be ignored (there are only 768 dimensions, after all).
- On the other hand, what we want from vector adjustment is "a preference that cannot be expressed well in text," so I'm not sure...
- I think it's useful for cases where "it could be expressed in words, but people don't express it well."
Finally.
- I think DreamBooth is the real deal.
- It's expensive, so there are a lot of papers along the lines of "I made a simpler method!", but none of them seem good enough.
- The runner-up is the hypernetwork, but there is no paper and details have not been disclosed; the situation is just "NovelAI used it in NovelAIDiffusion" and "the source code got leaked!"
- It too only tweaks the attention, so it can only draw what Stable Diffusion could originally draw; it's just easier to control because it was trained with Danbooru's large number of tags.
- This is the kind of image
- The overall expressive capacity (number of black circles) itself has not changed.
- The black circles are concentrated in the region of a specific art style.
- It increased the density of points in the area.
- If you focus only on that area, it appears to have increased expressive power.
- [Cognitive Resolution]
- The hypernetwork is much smaller than the LDM model itself and can be switched on and off as a module, so it may be subdivided into "for people" and "for backgrounds" for anime-style pictures.
This page is auto-translated from /nishio/画像生成AI勉強会(2022年10月ダイジェスト) using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.