Stable Diffusion is awesome, and it’s even more awesome when you take time to customize it.

If you’re a creator using it to make cool stuff, you’ve likely seen others adding themselves into custom models, or customizing models in other ways, like adding custom objects or characters. But how does that work?

There are a few different methods for customizing Stable Diffusion: Dreambooth, Textual Inversion, LoRA, and Hypernetworks. Each method has its advantages and disadvantages, and each works a bit differently. Whichever method you choose, you’ll also need to install an environment in which to use the results. For my own customizations I’ve been primarily using Textual Inversion through InvokeAI. Read more about how to install InvokeAI here.

Stable Diffusion Training Techniques:

Comparison of Stable Diffusion Training Techniques: a detailed explanation of the four training techniques for Stable Diffusion models

Dreambooth

Dreambooth is by far the most popular method of customizing Stable Diffusion, and if you’re interested in preserving the details of your subject, it’s likely the best option available. Note: There is currently no support for Dreambooth in InvokeAI.

The Dreambooth method fine-tunes the diffusion model itself, until it understands the new concept you are trying to teach it.
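
Because the whole model is fine-tuned, what gets saved is a complete checkpoint rather than a small add-on. As a rough back-of-the-envelope sketch of why that is large, the snippet below multiplies parameter counts by bytes per parameter. The counts are approximate, commonly cited figures for Stable Diffusion v1 and should be treated as assumptions, not exact values for any particular checkpoint.

```python
# Approximate parameter counts for Stable Diffusion v1 (assumed, rounded figures).
PARAM_COUNTS = {
    "unet": 860_000_000,
    "text_encoder": 123_000_000,
    "vae": 84_000_000,
}


def checkpoint_size_gb(param_counts, bytes_per_param):
    """Estimate checkpoint size in GB (10**9 bytes), ignoring file metadata."""
    total_params = sum(param_counts.values())
    return total_params * bytes_per_param / 1e9


size_fp16 = checkpoint_size_gb(PARAM_COUNTS, 2)  # 16-bit weights
size_fp32 = checkpoint_size_gb(PARAM_COUNTS, 4)  # 32-bit weights
print(f"fp16: {size_fp16:.1f} GB, fp32: {size_fp32:.1f} GB")
```

Under these assumptions the estimate lands at roughly 2 GB for 16-bit weights and roughly 4 GB for 32-bit weights. Checkpoints that bundle extra data can be larger.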

✅ Pros:

Highly effective at preserving concepts (probably the best of the 4 methods we’re discussing)

🔻 Cons:

Requires a lot of storage because the output is an entirely new model (a typical model is around 2-5 GB)
Cannot be combined with other models or concepts at generation time (combining them requires merging the models)
Model merges can be lossy
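
To see why merges can be lossy, here is a minimal sketch of a weighted-average merge, one common way to combine two checkpoints. The toy weight dictionaries and the `merge_checkpoints` helper are hypothetical; real checkpoints hold millions of values per layer, and merge tools may use other strategies.

```python
def merge_checkpoints(weights_a, weights_b, alpha=0.5):
    """Blend two models that share identical layer names and shapes.

    alpha=0.0 returns model A's values, alpha=1.0 returns model B's values.
    """
    merged = {}
    for name, values_a in weights_a.items():
        values_b = weights_b[name]
        merged[name] = [
            (1 - alpha) * a + alpha * b for a, b in zip(values_a, values_b)
        ]
    return merged


# Toy stand-ins for two fine-tuned models (hypothetical values).
my_face_model = {"layer.weight": [1.0, 2.0]}
analog_style_model = {"layer.weight": [3.0, 6.0]}

merged = merge_checkpoints(my_face_model, analog_style_model, alpha=0.5)
print(merged)
```

Every merged value sits between the two originals, so the result matches neither model exactly. Each concept is diluted by the other, which is one way to think about the loss.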

Resources:

Detailed Explanation: https://dreambooth.github.io/
White Paper: https://arxiv.org/abs/2208.12242
Code: https://github.com/XavierXiao/Dreambooth-Stable-Diffusion

Examples:

Models

Analog Diffusion

Dreamlike Photoreal

Some Example Image Outputs

“AI Tinder Photos w/ Analog Diffusion + Dreambooth”

“I faked myself with merged dreambooth and analog diffusion and people on IG totally bought it. 😪”

Textual Inversion

Textual Inversion is, in my opinion, the most dynamic and useful method for training Stable Diffusion concepts. With the right settings you can train concepts very well, although not quite as well as with Dreambooth.

Textual inversion creates a special word embedding that captures new concepts.
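
As a toy illustration of that idea, the sketch below adds one new placeholder token to an embedding table and updates only that token's vector, leaving every existing vector untouched. The three-dimensional vectors, the `<my-cat>` placeholder, the whitespace "tokenizer", and the hard-coded gradient are all made up for illustration; real training derives the gradient from the diffusion loss on your example images.

```python
# Toy embedding table: token -> vector (real models use hundreds of dimensions).
embeddings = {
    "a": [0.1, 0.0, 0.2],
    "photo": [0.4, 0.3, 0.1],
    "of": [0.0, 0.1, 0.0],
    "cat": [0.9, 0.2, 0.5],
}
# Snapshot of the original table, used to check that nothing else changes.
original = {token: list(vector) for token, vector in embeddings.items()}


def add_concept(table, placeholder, init_token):
    """Register a new token, starting from a copy of an existing token's vector."""
    table[placeholder] = list(table[init_token])


def train_step(table, placeholder, grad, lr=0.1):
    """One gradient-descent step that touches only the placeholder's vector."""
    table[placeholder] = [w - lr * g for w, g in zip(table[placeholder], grad)]


def encode(table, prompt):
    """Look up one vector per whitespace-separated token (a simplification)."""
    return [table[token] for token in prompt.split()]


add_concept(embeddings, "<my-cat>", init_token="cat")
train_step(embeddings, "<my-cat>", grad=[1.0, -1.0, 0.0])  # made-up gradient

prompt_vectors = encode(embeddings, "a photo of <my-cat>")
print(prompt_vectors[-1])
```

Only one small vector is learned here, which hints at why a Textual Inversion embedding can be used alongside an existing model instead of replacing it.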