Delving into Midjourney's Progress: Employing Entropy to Assess AI Model Quality Improvements
Midjourney’s Blend feature merges 2-5 images, combining their concepts and aesthetics to generate unique visual ideas.
Blend mode, with its numerous applications and simple workflow, opens up new possibilities for artists and designers.
Understanding the technology behind Midjourney, however, remains challenging. This article compares Midjourney v4 and v5, revealing key quality improvements and their implications for AI-generated art.
What is Midjourney Blend?
Midjourney has introduced a new “Blend” feature that lets users merge 2-5 images, combining their concepts and aesthetics into a single novel idea.
The AI consistently preserves significant elements from each image in the final blend. Users can also change the dimensions of the final image to portrait, landscape, or square.
This tool has various applications, such as creating movie stills by blending images of actors and characters, and it speeds up the workflow compared to traditional editing in tools like Photoshop.
Midjourney has supported multi-image prompts since Midjourney v3, which was released in July 2022. Multi-image prompts are unique to Midjourney and are not offered by DALL-E, Stable Diffusion, or Craiyon.
The blend feature is like a multi-image prompt without a text prompt.
What is so special about the blend mode?
Unlike image-to-image in Stable Diffusion, a Midjourney image prompt is on par with a text prompt and, in some cases, even more powerful than image-to-image.
The blend mode allows you to explore concepts and ideas quickly, much faster than relying on text prompts alone. It also allows re-using existing digital assets like photographs and digital art and combining them with AI artworks.
Image Mixer is an attempt to replicate blend mode and Midjourney’s multi-image prompts for Stable Diffusion. However, it is less capable than either.
In other words, /blend is a potent, unique, and valuable tool for concept art, mood boards, and style development.
Trying to make Blend Mode transparent
After Midjourney introduced blend mode in early 2023, while v4 was still the current version, the AI art community quickly realized its potential for use cases like:
- Personalized artwork
- Consistent characters
- Unique styles
Midjourney is very secretive about its technology compared to Stability and OpenAI, so understanding how its multi-image prompts and blend mode work is challenging. Blend mode is compelling, but also very opaque.
Differences between Midjourney v4 and v5
Repeated blending experiments show that Midjourney v5 converges rapidly toward a pattern that is more stable and structured than what Midjourney v4 produces.
By comparing image complexity using entropy, it is possible to conclude that Midjourney v5’s AI model is superior to v4. Blend mode is a valuable tool for evaluating the quality of the AI model within Midjourney.
Although Midjourney’s AI model is not open-source and cannot be investigated like Stable Diffusion, blend mode allows for insights into the model’s performance, potentially revealing some underlying mechanisms within the “black box.”
As AI models evolve, such techniques can assist artists and researchers in better understanding and assessing the quality of AI-generated art and its potential applications in various creative fields.
Relevance for AI Art and Visual Storytelling
By generating images based on text prompts, AI art tools like Midjourney have enormous potential for visual storytelling. Consistency in style and character depiction, however, is essential when creating mood boards or storyboards. Inconsistent styles and characters create dissonance, weakening the overall coherence and impact of the visual narrative.
While Midjourney’s blend mode can assist in achieving a consistent style, maintaining character consistency is still tricky. Addressing these limitations will be critical as AI art tools evolve to improve their utility in visual storytelling. These tools will become invaluable assets for artists, filmmakers, and designers in crafting compelling visual narratives by refining AI-generated art and enabling consistent styles and characters.
Image-to-Image, Depth-to-Image, ControlNet, and, to a lesser extent, the recreate features of DALL-E 2 are similar to the blend mode in Midjourney. Their purpose, however, is to control the stability of the output, which is helpful if you want to create a sequence of images for animation or short movies.
Blend accepts different input images that share the same style, which lets it create consistent scenes. That is precisely what mood boards, visual storytelling, and storyboards require. With Stable Diffusion, text prompts and fine-tuning can achieve consistency over multiple iterations. However, developing a style this way is inefficient because fine-tuning a Stable Diffusion model takes much longer than using blend mode.
Blend mode is powerful, but Midjourney is somewhat opaque and offers no API for automated testing and assessment. That makes blend itself the most practical tool for understanding how Midjourney’s blend mode works.
Experimental Setup
Usually, the blend mode is used only once per iteration. That means you have up to 5 images you want to blend, and you will repeat this process to get the desired result.
Of course, you can also blend the output of a blend, then blend the next output again, and so on. That is how you can develop a style. Doing that, you will observe that Midjourney handles styles differently than other tools.
The experiment starts with two random images. In the next step, two of the resulting images are selected and blended again, and this process is repeated multiple times.
Starting with two images containing noise or very simple structures, you will see that Midjourney is increasing the complexity. However, there is a significant difference between Midjourney v4 and v5.
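The iteration described above can be sketched as a small loop. Because Midjourney has no API, the blend step is done by hand in practice. In this sketch it is represented by a callable, and `fake_blend`, `pick_first_two`, and `distinct_levels` are hypothetical stand-ins that only illustrate the control flow. They do not reproduce Midjourney’s behavior. Recording the starting pair as the first row is an assumption, based on both “pink” tables sharing the same first row.

```python
def run_blend_series(image_a, image_b, blend, select_two, measure, steps):
    """Repeatedly blend a pair of images and record one measurement per image.

    blend(a, b)      -> list of candidate images (hypothetical stand-in)
    select_two(list) -> the two candidates to carry into the next step
    measure(image)   -> a number describing the image, e.g. its entropy
    """
    current = (image_a, image_b)
    # Row 0 holds the measurements of the two starting images.
    history = [(measure(current[0]), measure(current[1]))]
    for _ in range(steps):
        candidates = blend(*current)
        current = select_two(candidates)
        history.append((measure(current[0]), measure(current[1])))
    return history


# --- Hypothetical stand-ins, only to make the sketch runnable ---

def fake_blend(a, b):
    # Averages pixel values and returns four shifted variants.
    # This is NOT how Midjourney blends; it is a placeholder.
    mixed = [(x + y) // 2 for x, y in zip(a, b)]
    return [[(v + shift) % 256 for v in mixed] for shift in range(4)]


def pick_first_two(candidates):
    return candidates[0], candidates[1]


def distinct_levels(image):
    # Crude complexity proxy: number of distinct gray levels.
    return len(set(image))


history = run_blend_series(
    [0, 0, 255, 255], [0, 255, 0, 255],
    fake_blend, pick_first_two, distinct_levels, steps=3,
)
for step, row in enumerate(history):
    print(step, row)
```

In the real experiment, `blend` corresponds to running /blend manually, `select_two` to choosing two of the results by eye, and `measure` to the image metrics discussed in the next section.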
Interpretation of Midjourney’s behavior
There are two test series, “pink” and “noise.” “Pink” is structured and has a low entropy, while “noise” is unstructured and has a relatively high entropy. There are two measurements:
- Entropy average (which is the average entropy of the images blended)
- Histogram sum average (the average of the histogram sums of the images blended, used as a measure of image complexity)
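The post does not say how the entropy was computed, so the following is only a minimal sketch under an assumption: Shannon entropy in bits over the histogram of 8-bit gray levels. The reported values of up to about 6.98 are consistent with that reading, since a base-2 entropy over 256 levels tops out at 8. The images here are flat lists of pixel values to keep the example dependency-free. How the histogram sum is calculated is not defined in the post, so it is not sketched.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy in bits of an iterable of gray levels (0..255)."""
    counts = Counter(pixels)          # histogram: gray level -> pixel count
    total = sum(counts.values())
    if total == 0:
        raise ValueError("image has no pixels")
    entropy = 0.0
    for count in counts.values():
        p = count / total             # relative frequency of this gray level
        entropy -= p * math.log2(p)
    return entropy


def entropy_average(image_a, image_b):
    """Average entropy of the two images that are blended in one step."""
    return (shannon_entropy(image_a) + shannon_entropy(image_b)) / 2


flat = [128] * 64                     # one gray level only
two_tone = [0] * 32 + [255] * 32      # two equally frequent levels
ramp = list(range(256))               # every level exactly once

print(shannon_entropy(flat))
print(shannon_entropy(two_tone))
print(shannon_entropy(ramp))
print(entropy_average(flat, ramp))
```

A single-color image has zero entropy, and an image that uses all 256 levels equally often reaches the maximum of 8 bits. This matches the description of “pink” as low-entropy and “noise” as relatively high-entropy.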
Comparing the measurements for Midjourney v4 and v5, you can see that v5 increases image complexity (histogram sum average) much faster than v4. The entropy for v4 also approaches a higher value than for v5. V5 also reacts much more strongly to the geometric structure of “pink” than to “noise.” While it was possible to create strange patterns by iterating blend with v4, in v5 the image structure approaches a stable state and does not hallucinate.
In other words, Midjourney’s v5 model is more efficient and stable than v4. Midjourney now also creates images at 1024x1024 by default instead of 512x512, which is four times as many pixels. Taken together, this suggests that Midjourney also improved memory efficiency by a factor of 3-4.
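The per-step averages can be computed directly from the rows of the test series tables in this post. The sketch below uses only the first five rows of the two “noise” series. The threshold of 100000 is an arbitrary illustrative value, not one used by the original analysis, and the helper names are made up for this example.

```python
# First five rows of the "noise" test series, copied from the tables:
# (entropy 1, entropy 2, sum 1, sum 2)
V4_NOISE = [
    (4.53, 3.28, 140, 18),
    (5.20, 4.79, 448, 199),
    (5.80, 5.59, 1046, 819),
    (6.16, 6.48, 976, 1356),
    (6.52, 6.68, 1334, 1815),
]
V5_NOISE = [
    (5.27, 2.17, 1188, 17),
    (3.94, 3.40, 1071, 324),
    (5.46, 5.02, 12755, 19416),
    (5.74, 5.79, 62170, 72176),
    (5.98, 5.99, 106963, 140308),
]


def step_averages(rows):
    """Return (entropy average, histogram sum average) for each row."""
    return [((e1 + e2) / 2, (s1 + s2) / 2) for e1, e2, s1, s2 in rows]


def first_row_reaching(rows, threshold):
    """1-based index of the first row whose sum average reaches threshold."""
    for index, (_, sum_avg) in enumerate(step_averages(rows), start=1):
        if sum_avg >= threshold:
            return index
    return None  # threshold not reached within the given rows


for name, rows in (("v4", V4_NOISE), ("v5", V5_NOISE)):
    print(name, [sum_avg for _, sum_avg in step_averages(rows)])
    print(name, "reaches 100000 at row:", first_row_reaching(rows, 100_000))
```

With these five rows, the v5 series crosses the illustrative threshold at row 5, while the v4 series does not cross it at all. In the full v4 table it first does so at row 15.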
Test Series Midjourney v4 blend noise
Entropy 1 | Entropy 2 | Sum 1 | Sum 2
---|---|---|---
4.53 | 3.28 | 140 | 18
5.20 | 4.79 | 448 | 199
5.80 | 5.59 | 1046 | 819
6.16 | 6.48 | 976 | 1356
6.52 | 6.68 | 1334 | 1815
6.78 | 6.80 | 4687 | 3956
6.66 | 6.74 | 13496 | 13941
5.98 | 5.98 | 24712 | 30758
6.22 | 5.87 | 60879 | 58846
6.29 | 6.17 | 84513 | 76197
6.51 | 6.19 | 86530 | 83487
6.19 | 6.66 | 81240 | 93489
6.55 | 6.54 | 86119 | 86604
6.74 | 6.68 | 102836 | 95125
6.87 | 6.63 | 117423 | 119650
6.84 | 6.78 | 134680 | 130191
6.98 | 6.94 | 132227 | 131792
6.76 | 6.93 | 138186 | 134378
6.83 | 6.90 | 145689 | 135058
6.87 | 6.97 | 126092 | 117148
6.87 | 6.88 | 108882 | 110676
6.81 | 6.90 | 110359 | 111013
6.65 | 6.81 | 124263 | 118231
6.73 | 6.78 | 120195 | 115708
6.87 | 6.68 | 128364 | 124032
Test Series Midjourney v5 blend noise
Entropy 1 | Entropy 2 | Sum 1 | Sum 2
---|---|---|---
5.27 | 2.17 | 1188 | 17
3.94 | 3.40 | 1071 | 324
5.46 | 5.02 | 12755 | 19416
5.74 | 5.79 | 62170 | 72176
5.98 | 5.99 | 106963 | 140308
6.11 | 6.06 | 187716 | 175786
6.44 | 6.41 | 191265 | 176359
6.18 | 6.33 | 147614 | 195716
6.34 | 6.12 | 178605 | 158508
6.22 | 6.23 | 141643 | 135511
6.12 | 6.19 | 94476 | 112022
5.80 | 5.95 | 54621 | 46863
6.14 | 5.29 | 60850 | 53250
5.75 | 5.94 | 84298 | 103617
6.11 | 5.78 | 91952 | 99835
Test Series Midjourney v4 blend pink
Entropy 1 | Entropy 2 | Sum 1 | Sum 2
---|---|---|---
2.09 | 0.75 | 15631 | 249937
2.85 | 3.13 | 250184 | 257296
2.65 | 4.37 | 257980 | 256453
3.48 | 4.74 | 254336 | 242706
3.84 | 5.10 | 230920 | 204197
5.37 | 5.40 | 173866 | 157728
5.90 | 6.06 | 174517 | 180191
6.56 | 6.38 | 178649 | 145838
6.19 | 6.00 | 167114 | 126285
6.06 | 5.38 | 153179 | 118019
6.09 | 6.17 | 119116 | 107152
6.20 | 6.80 | 124605 | 100092
6.72 | 6.76 | 94216 | 105400
6.82 | 6.79 | 114444 | 154591
6.70 | 6.91 | 139373 | 163483
6.90 | 6.92 | 171800 | 157225
6.93 | 6.95 | 170994 | 181207
6.92 | 6.91 | 156286 | 172952
6.97 | 6.95 | 158602 | 184686
6.97 | 6.79 | 184973 | 206138
6.67 | 6.79 | 190083 | 216097
6.80 | 6.83 | 209105 | 206730
6.87 | 6.55 | 207837 | 177583
6.75 | 6.74 | 181175 | 192899
6.71 | 6.83 | 195523 | 189841
6.63 | 6.81 | 187217 | 207097
6.71 | 6.73 | 191542 | 157933
6.82 | 6.83 | 184096 | 174412
Test Series Midjourney v5 blend pink
Entropy 1 | Entropy 2 | Sum 1 | Sum 2
---|---|---|---
2.09 | 0.75 | 15631 | 249937
2.00 | 1.90 | 270313 | 815808
2.92 | 2.16 | 941239 | 165071
2.55 | 5.40 | 622583 | 872529
4.74 | 3.50 | 856652 | 569445
3.62 | 3.92 | 510707 | 624817
4.67 | 4.21 | 412849 | 532058
3.88 | 4.10 | 417974 | 426259
Conclusions
This blog post used an entropy-based analysis to highlight the significant quality improvements made in Midjourney v5.
These updates give artists and researchers a more efficient and stable AI model for creating compelling visual narratives.
As AI art tools evolve, addressing style and character consistency limitations will be critical to ensuring their utility in visual storytelling.
Sources:
- https://medium.com/generative-ai/midjourney-ais-new-blend-feature-is-incredible-5f84fc9b0afa
- https://docs.midjourney.com/docs/blend
- https://beta.dreamstudio.ai/dream
- https://www.analyticsvidhya.com/blog/2020/11/entropy-a-key-concept-for-all-data-science-beginners/
- https://www.inovex.de/de/blog/the-mystery-of-entropy-how-to-measure-unpredictability-in-machine-learning/
- https://mpost.io/lambda-labs-announced-an-ai-image-mixer-that-can-combine-up-to-five-images/