Using 5 photos of our cat as training data, I learned the embedding vector of a new token with Textual Inversion, then used that token in prompts to generate images with Stable Diffusion. (image)
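The workflow can be sketched with Hugging Face's diffusers library (an assumption on my part; the original experiment may have used the Textual Inversion authors' own code). The model ID, the embedding file name, and the placeholder token below are all illustrative names, not the actual files used:

```python
def generate_with_inversion(prompt: str):
    """Generate an image from a prompt that uses a learned token.

    Hypothetical names: "learned_embeds.bin" is the file produced by
    Textual Inversion training, and "<our-cat>" is the placeholder
    token whose learned embedding it contains.
    """
    from diffusers import StableDiffusionPipeline  # assumed installed

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5"  # illustrative model ID
    )
    # Register the learned embedding under the placeholder token
    pipe.load_textual_inversion("learned_embeds.bin", token="<our-cat>")
    return pipe(prompt).images[0]

# e.g. generate_with_inversion("a photo of <our-cat>")
#      generate_with_inversion("a painting of <our-cat> by Claude Monet")
```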

Training data (image) · AI-generated photo and AI-generated Monet-style painting (images)

By the way, the prompts are things like “a photo of our cat” or “a painting of our cat by Claude Monet”. If you change the “our cat” part to plain “cat”, you get the following: the result looks more complete as a cat, but the distinctive features of “our cat” are mostly gone. (images)

My cat is probably the coat pattern in which, of the three calico colors (black, orange, and white), the black pigment is lost and the orange comes out much paler. (images)

The embedding file generated by Textual Inversion is about 5 KB. Its main content is a single 768-dimensional float vector, along with a little metadata about the token.
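A quick size check supports this: a 768-dimensional float32 vector is only 3 KB of raw data, so a ~5 KB file is essentially the vector plus a little metadata. A minimal sketch using only the standard library:

```python
import array
import os
import tempfile

# One Textual Inversion result = one 768-dim float vector
# (768 is the embedding width of Stable Diffusion v1's CLIP text encoder).
embedding = array.array("f", [0.0] * 768)  # 768 float32 values

with tempfile.NamedTemporaryFile(delete=False) as f:
    embedding.tofile(f)
    path = f.name

raw_bytes = os.path.getsize(path)
print(raw_bytes)  # 768 floats x 4 bytes = 3072 bytes of raw vector data
os.remove(path)
```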

---

Impressions

@nishio: Right now my impression is “it doesn’t resemble her that much”, but compared with random cat photos it has clearly picked up her features, so I have a feeling that within a few years there will be many [people who keep pulling the gacha in search of a particular face].

For example, if you train on photos of a daughter who died young, generate hundreds of images every day, and select the ones you like, you can keep creating new photos of the daughter who [lives on in your heart]: [commemorative photos at sightseeing spots you never actually visited], sports day photos, wedding photos.


  • A virtual souvenir photo of taking my completely indoor cat to a virtual ocean!

    • image

“Wedding photos.”

  • Ah, so you could generate “your idea of an ideal son-in-law”, match them up, marry them off, and then start generating pictures of “grandchildren” who never existed.


    • This “virtual reality” sounds like a bad idea. If there is demand, there will be providers, and then comes the tragedy of losing your virtual grandchildren when a provider goes out of business.


  • I never really understood the market for the kind of metaverse that builds avatars resembling real people from photos, but I guess it will evolve into a “metaverse as a world where the dead continue to live”.


    • Related: [a disappointing avatar made to resemble the person].
    • A daughter who died young “comes of age” in the metaverse, locked in by Meta (hell).

    • My virtual daughter and son-in-law are raising their non-existent grandchildren in a beautiful non-existent house by a non-existent lake, subsisting on a non-existent farm, all locked in by Meta, with the maintenance fees deducted from my account on a subscription model. When someone notices nobody has logged in recently, the account holder has died; but the account was never cancelled, so the fees keep being debited (hell).

Seeing the reply “train it on photos of your oshi (favorite idol)”, I realized this could be hell even while the subject is still alive. There is plenty of training data, so improving the quality of the face should be easy. The real person keeps growing, but a fan says “no, I like the way they looked at 20” and keeps them frozen at that age forever in the metaverse. [Keeping an idol in captivity]. And there are going to be hundreds of people who remove the porn filter.


Bowman

  • image
    • image
    • I got a really good one! I was excited, but this turned out to be the best case; even after generating more than 100 images afterward, I couldn’t produce anything better than this.
    • image
      • Apparently interpreted as “a Bowman usually comes with a local dish” lol
      • There are too many outputs where the food is the main subject. I told it “this is a CHARACTER” during training, though.
    • In fact, this was the first experiment; after getting excited about it I left it alone for a while, then decided “let’s try it with a real-life subject”, which became the cat experiment above.

Live-action Bowman

  • image
  • image
  • It’s technically interesting that, with no prior information, it picked up things like the texture, the colors it tends to use, the “CO-like logo”, and the size relative to people from the images I gave it, but I doubt consumers would be satisfied with this quality.

Results can be seed-sensitive. If you’re unsatisfied with the model, try re-inverting with a new seed (by adding --seed <#> to the prompt).
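The point about seeds is just reproducibility of the random initialization: the same seed replays the same pseudo-random sequence, while a new seed gives a genuinely different starting point. A toy illustration with Python's standard PRNG, standing in for the sampler's noise source:

```python
import random

def noise(seed: int, n: int = 3):
    """Deterministic pseudo-random 'noise' for a given seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

same = noise(42) == noise(42)       # same seed -> identical sequence
different = noise(42) != noise(43)  # new seed -> a fresh starting point
print(same, different)
```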

  • Spending an hour running the gacha 100 times may or may not get you a good one.
  • Since the result of training is a single 768-dimensional vector, we might be able to search more efficiently by selecting the good ones among multiple learned vectors and combining them by averaging or with a genetic algorithm (GA)
    • In the end it’s an optimization problem in 768-dimensional space where the evaluation function is a human.
  • I have a feeling that my cat can be learned to a satisfactory level if I keep at it, but that Bowman can’t.
    • Bowman can’t be represented by a single token; he would need about three tokens, for example one each for the face, the logo, and the outfit.
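The averaging idea from the list above can be sketched directly: keep only the vectors whose outputs a human rated well, and take their element-wise mean as a new candidate embedding. This is only a sketch; whether the mean of good embeddings is itself good is exactly the open question.

```python
import array

def average_embeddings(vectors):
    """Element-wise mean of several equal-length embedding vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return array.array(
        "f", [sum(v[i] for v in vectors) / n for i in range(dim)]
    )

# Toy 4-dim stand-ins for the real 768-dim learned vectors:
good = [
    array.array("f", [1, 0, 0, 2]),
    array.array("f", [3, 0, 2, 0]),
]
candidate = average_embeddings(good)
print(list(candidate))  # [2.0, 0.0, 1.0, 1.0]
```

A GA variant would instead mutate and recombine the good vectors, with the human ratings as the fitness function.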

Learning the Unexplored logo (image)

  • image
  • I tried changing the background to make the logo part stand out, but it still doesn’t seem to work.
  • I guess I’d have to photograph a logo-shaped 3D object placed in various locations.
  • It seems it understood it as “an abstract image with greenish diagonal and horizontal lines” rather than as “the Unexplored logo”.

This page is auto-translated from [/nishio/Textual Inversionを試してみる](https://scrapbox.io/nishio/Textual Inversionを試してみる) using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I’m very happy to spread my thoughts to non-Japanese readers.