Exploring the Different Parameters in Stable Diffusion Models
Stable Diffusion models are constantly evolving, and with each new model generation the relevance of their parameters changes.
This blog post uses the latest model from Dreamstudio as a reference and highlights the importance of understanding the various parameters in Stable Diffusion models.
Changing relevance of Stable Diffusion parameters
With every model generation (v1.4, 1.5, 2.0, 2.1, etc.), the relevance of individual parameters has shifted and continues to shift. This blog uses the latest model from Dreamstudio as a reference. Some parameters, like the CFG scale, are fixed at 7 in the user interface and cannot be changed there. However, the CFG scale can be changed through the Dreamstudio API, as sketched below.
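As a rough illustration, here is a minimal sketch of overriding the CFG scale through the REST API. The endpoint, engine ID, and field names follow Stability's v1 API documentation at the time of writing and may change, so verify them against the current docs before relying on this:

```python
import os
import requests

API_HOST = "https://api.stability.ai"
ENGINE_ID = "stable-diffusion-xl-1024-v1-0"  # example engine id, may change

# Override the CFG scale that the Dreamstudio UI fixes at 7.
response = requests.post(
    f"{API_HOST}/v1/generation/{ENGINE_ID}/text-to-image",
    headers={
        "Authorization": f"Bearer {os.environ['STABILITY_API_KEY']}",
        "Accept": "application/json",
    },
    json={
        "text_prompts": [{"text": "a lighthouse at dawn, oil painting"}],
        "cfg_scale": 12,  # anything other than the UI default of 7
        "steps": 50,
    },
)
response.raise_for_status()
```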
In April 2023, the CEO of Stability AI made some interesting remarks on Twitter, which strongly indicate that Stability AI will use a B2B business model. The Dreamstudio website will be a reference for all model changes; the same is most likely true for all API changes. In other words, the models listed on the Dreamstudio website are THE reference for Stable Diffusion.
Since there are countless “Stable Diffusion”-based models, a “canon” for the parameters makes a lot of sense.
Understanding Negative Prompts
Before diving into the parameters themselves, it’s essential to clarify negative prompts and their role in image generation. When working with an AI model like OpenAI’s DALL-E, you provide a positive prompt to guide the image creation process. Negative prompts refine your image by telling the AI what not to include in the final result (a short code sketch follows the examples below).
Examples of Negative Prompts
- General negative prompts: “No text,” “No logos,” or “No watermarks” can be used to ensure unwanted elements do not appear in the final image.
- Negative prompts for portraits of people: “No glasses” or “No beard” can help the AI produce a more accurate picture without certain features.
- Negative prompts for photorealistic images: “No cartoon-like” or “Not a painting” can refine the final image to adhere to the desired level of realism.
Dreamstudio and other Stable Diffusion services provide pre-curated negative prompts to make it easier for users to work with AI-generated images effectively.
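Outside of Dreamstudio, the same idea can be tried locally. Here is a minimal sketch using Hugging Face’s diffusers library (an illustration on my part; Dreamstudio’s own stack is not public), where a negative prompt is passed alongside the positive one:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a Stable Diffusion checkpoint (v1.5 here, purely as an example).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# The negative prompt lists everything the model should steer away from.
image = pipe(
    prompt="studio portrait of an elderly sailor, photorealistic",
    negative_prompt="glasses, beard, cartoon, painting, text, watermark, logo",
).images[0]
image.save("portrait.png")
```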
Denoising Steps and Step Count
The number of denoising steps is essential for the final quality of AI-generated images. The step count you choose can significantly affect the image’s quality, so finding the right balance is crucial.
As a rule of thumb, consider the following recommendations:
- Low-resolution images: 15-25 steps
- Medium-resolution images: 26-100 steps
- High-resolution images: 101-150 steps
A lower step count of 15-25 can produce a quick, rough image. On the other hand, a higher step count of 100-150 results in a slower image generation process that produces higher-quality output.
Dreamstudio defaults to 50 steps; the value can be changed (see the sketch below).
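Continuing the diffusers sketch from above, the step count maps to a single argument; rendering the same prompt at 20 versus 100 steps makes the speed/quality trade-off visible:

```python
# Reusing `pipe` from the negative-prompt sketch above.
prompt = "a misty forest at sunrise, volumetric light"

draft = pipe(prompt=prompt, num_inference_steps=20).images[0]   # fast, rough
final = pipe(prompt=prompt, num_inference_steps=100).images[0]  # slow, refined
```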
Diffusion Samplers
Multiple diffusion samplers - sometimes also called schedulers - are available (via the API, a sampler is selected automatically):
- DDIM
- DPM Solver++
- Euler Ancestral
- Euler
- Heun
- LMS
- DPMPP 2M
- DPMPP 2S Ancestral
- DPM 2
- DPM 2 Ancestral
These samplers are responsible for how the AI model “moves” through probabilities to find the most likely image matching the provided prompt. The solvers follow the curves of the AI model by predicting the next values, numerically solving the underlying stochastic differential equations. Through these predictions, the solver recreates the image from the lower-dimensional latent space of the AI model.
Each sampler has its strengths and weaknesses, and understanding the differences will help you choose the most suitable one for your needs. The most relevant trade-offs in this context are speed and accuracy.
Most differences between samplers are visible at low step counts, typically less than 20. Specific samplers produce distinguishable images faster than others, resulting in noticeable differences during the initial phases. However, the randomness makes it difficult to predict how these early images will evolve as more steps are added.
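In diffusers terms (again, an illustration, not Dreamstudio’s internals), samplers are interchangeable scheduler objects, which makes comparing them at a fixed step count straightforward:

```python
from diffusers import (
    DDIMScheduler,
    DPMSolverMultistepScheduler,  # corresponds to DPM++ 2M
    EulerAncestralDiscreteScheduler,
)

# Swap the sampler without reloading the model weights (reusing `pipe`).
for scheduler_cls in (DDIMScheduler, DPMSolverMultistepScheduler,
                      EulerAncestralDiscreteScheduler):
    pipe.scheduler = scheduler_cls.from_config(pipe.scheduler.config)
    image = pipe(prompt="an isometric castle", num_inference_steps=20).images[0]
    image.save(f"{scheduler_cls.__name__}.png")
```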
DDIM
In contrast to Euler Ancestral, DDIM (Denoising Diffusion Implicit Model) prioritizes adherence to the input prompt. This can be beneficial when you require a particular result at the cost of some creative variability.
DPM Solver++
DPM Solver++ (Diffusion Probabilistic Model Solver++) is a sampler that balances image quality and prompt adherence while allowing creative exploration.
Euler Ancestral
Euler Ancestral balances performance and quality. Its main advantage is the ability to produce a wide range of images, each with high creative freedom.
Euler
The simplest possible solver.
Heun
A more accurate but slower version of Euler.
LMS (Linear multi-step method)
Same speed as Euler but (supposedly) more accurate.
DPM++ 2M
DPM++ 2M needs a higher number of steps to perform well; given enough steps, it behaves similarly to Euler.
DPM++ 2S Ancestral
Given enough steps, it should add more detail than DPM++ 2M.
DPM 2
DPM 2 is a sophisticated numerical method for solving diffusion models’ stochastic differential equations (SDEs).
It is intended to improve the computational efficiency of the original DPM (Diffusion Probabilistic Model) solver by lowering the cost of estimating the likelihood term in the SDE.
DPM 2 outperforms Euler in accuracy, but at the expense of being roughly twice as slow.
DPM 2 Ancestral
DPM 2 Ancestral adds more detail than DPM 2.
Clip Guidance Preset
CLIP guidance brings the result closer to the text prompt. If CLIP guidance is enabled, the Stable Diffusion seed no longer behaves deterministically; most likely, CLIP steers the diffusion process and dilutes the effect of the seed value. There are multiple guidance presets:
- FAST_BLUE
- FAST_GREEN
- NONE (default)
- SIMPLE
- SLOW
- SLOWER
- SLOWEST
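Via the API, the preset appears to be a single request field (an assumption based on the v1 documentation; the `clip_guidance_preset` name may change). A sketch, reusing the same endpoint shape as the CFG example above:

```python
import os
import requests

API_HOST = "https://api.stability.ai"
ENGINE_ID = "stable-diffusion-xl-1024-v1-0"  # example engine id

response = requests.post(
    f"{API_HOST}/v1/generation/{ENGINE_ID}/text-to-image",
    headers={
        "Authorization": f"Bearer {os.environ['STABILITY_API_KEY']}",
        "Accept": "application/json",
    },
    json={
        "text_prompts": [{"text": "a red fox in deep snow"}],
        # With any preset other than NONE, the seed is no longer deterministic.
        "clip_guidance_preset": "FAST_BLUE",
    },
)
response.raise_for_status()
```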
Style Presets
Since the SDXL model, Dreamstudio has offered style presets, which have already become the norm in other Stable Diffusion-based apps such as the Photoleap app. Style presets are easier to use than raw parameter settings. The styles offered are:
- Enhance (default)
- Anime
- Photographic
- Digital-art
- Comic-book
- Fantasy-art
- Line-art
- Analog-film
- Neon-punk
- Isometric
- Low-poly
- Origami
- Modeling-compound
- Cinematic
- 3d-model
- Pixel-art
- Tile-texture
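Through the API, a style preset is (as far as I can tell from the v1 documentation) just another request field, taking the lowercase preset names from the list above. A minimal sketch under that assumption:

```python
import os
import requests

API_HOST = "https://api.stability.ai"
ENGINE_ID = "stable-diffusion-xl-1024-v1-0"  # example engine id

response = requests.post(
    f"{API_HOST}/v1/generation/{ENGINE_ID}/text-to-image",
    headers={
        "Authorization": f"Bearer {os.environ['STABILITY_API_KEY']}",
        "Accept": "application/json",
    },
    json={
        "text_prompts": [{"text": "a samurai under cherry blossoms"}],
        "style_preset": "anime",  # one of the presets listed above
    },
)
response.raise_for_status()
```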
Classifier-free Guidance Scale (CFG)
The CFG Scale parameter allows you to control the balance between creativity and adherence to the prompt in your final image. The recommended values for different types of output are as follows (default CFG is 7):
- Low CFG Scale (hallucination-like): A low CFG setting (1-3) can result in highly imaginative and abstract images, which may not hold a solid connection to the prompt.
- Mid-range CFG Scale (optimized for balance): A medium CFG setting (3.1-10) balances creative freedom and prompt adherence, ideal for users who want to meet their prompt requirements without overly restricting the outcome.
- High CFG Scale (adherence to prompt with possible artifacts): A high CFG setting (>10) will tightly follow the provided prompt, potentially introducing visual artifacts in the process. This setting is suitable for those who prioritize input accuracy over creative exploration.
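A quick way to build intuition for the scale is to render the same prompt from the same starting noise at a low, medium, and high CFG value; sketched here with diffusers’ `guidance_scale` argument, reusing the `pipe` from earlier:

```python
import torch

# Same prompt, same starting noise, three CFG values.
for cfg in (2, 7, 14):
    generator = torch.Generator("cuda").manual_seed(42)
    image = pipe(
        prompt="a glass city floating above the ocean",
        guidance_scale=cfg,
        generator=generator,
    ).images[0]
    image.save(f"cfg_{cfg}.png")
```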
Seeds
Seeds are random values that help the AI model generate different variations of images from the same prompt. By changing the seed, you explore different creative paths AI models can take to generate your desired image.
Seeds can be used effectively in the following ways:
- Test multiple combinations of seeds and prompts for impressive results.
- Reproduce the same image by providing the same seed and prompt (this only works without CLIP guidance).
- Generate related images using different seeds and the same prompt.
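In diffusers, the seed is supplied through a torch generator; varying it while holding the prompt fixed yields related variations, and repeating a seed reproduces the image exactly (again, only without CLIP guidance):

```python
import torch

# Fixed prompt, three different seeds -> three related variations
# (reusing `pipe` from the earlier sketches).
for seed in (1, 42, 1234):
    generator = torch.Generator("cuda").manual_seed(seed)
    image = pipe(prompt="a paper crane on a wooden desk",
                 generator=generator).images[0]
    image.save(f"seed_{seed}.png")
```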
Img2img Prompts
Prompts are not limited to text: an image can also serve as the prompt, known as an image-to-image or img2img prompt.
- The input is an image instead of a textual prompt.
- The AI model analyzes the input image and generates a new image based on it.
- You can control the strength of the img2img transformation by adjusting the slider.
By choosing the right strength, you can create unique variations of your original image, opening up new creative possibilities and some style consistency.
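As a local illustration (again with diffusers; Dreamstudio exposes the same idea through its strength slider), the img2img pipeline takes an input image plus a prompt, and `strength` controls how far the result may drift from the original:

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe_i2i = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

init = Image.open("input.png").convert("RGB").resize((512, 512))
result = pipe_i2i(
    prompt="watercolor painting, soft light",
    image=init,
    strength=0.6,  # 0 = keep the input image, 1 = ignore it almost entirely
).images[0]
result.save("variation.png")
```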
Conclusions
Stable Diffusion models are constantly evolving, and with each new model generation the relevance of their parameters changes.
This blog post has provided a comprehensive guide to the different parameters, diffusion samplers, and other factors that affect the quality and creative freedom of Stable Diffusion-generated images. Understanding these parameters is crucial for improving the quality and creative output of Stable Diffusion models.
Negative prompts, denoising steps, CLIP guidance presets, and style presets are just some of the factors that users should consider when working with Stable Diffusion models. Using the proper combination of parameters, users can create impressive and unique images that meet their requirements.