Delving into Midjourney's Progress: Employing Entropy to Assess AI Model Quality Improvements

Edited on: October 1, 2024 - published: March 25, 2023 - 8 minutes read - 1674 words

Midjourney’s Blend feature merges 2-5 images, combining their concepts and aesthetics to generate unique visual ideas.

Blend mode, with its numerous applications and simple workflow, opens up new possibilities for artists and designers.

Understanding the technology behind Midjourney, on the other hand, remains challenging. This article examines the differences between Midjourney v4 and v5, revealing key quality improvements and their implications for AI-generated art.

What is Midjourney Blend?

Midjourney has introduced a new “Blend” feature that allows users to merge 2-5 images, combining their concepts and aesthetics to create a unique, novel idea.

The AI consistently preserves significant elements from each image in the final blend. Users can also change the dimensions of the final image to portrait, landscape, or square.

This powerful tool has various applications, such as creating movie stills by blending images of actors and characters, and speeds up the workflow compared to traditional editing processes like Photoshop.

Midjourney has supported multi-image prompts since Midjourney v3, which was released in July 2022. Multi-image prompts are a unique feature of Midjourney, not offered by DALL-E, Stable Diffusion, or Craiyon.

The blend feature is like a multi-image prompt without a text prompt.

What is so special about the blend mode?

Unlike image-to-image in Stable Diffusion, a Midjourney image prompt is on par with a text prompt and is, in some cases, even more powerful than image-to-image.

The blend mode allows you to explore concepts and ideas quickly, much faster than relying on text prompts alone. It also allows re-using existing digital assets like photographs and digital art and combining them with AI artworks.

There is an attempt to replicate the blend mode and Midjourney’s multi-image prompts for Stable Diffusion. However, the Image Mixer is less capable than Midjourney’s multi-image prompts or blend mode.

In other words, /blend is a potent, unique, and valuable tool for concept art, mood boards, and style development.

Trying to make Blend Mode transparent

After Midjourney introduced blend mode in early 2023, still with Midjourney v4, its potential was quickly realized by the AI Art community for use cases like:

Personalized artwork
Consistency of characters
Creation of a unique style

Midjourney is very secretive about its technology compared to Stability and OpenAI. Understanding how its multi-image prompts and blend mode work is therefore challenging. That means the blend mode is compelling but also very opaque.

Differences between Midjourney v4 and v5

The experiment shows that, under repeated blending, Midjourney v5 rapidly converges on a pattern that is more stable and structured than what Midjourney v4 produces.

By comparing image complexity using entropy, it is possible to conclude that Midjourney v5’s AI model is superior to v4. Blend mode is a valuable tool for evaluating the quality of the AI model within Midjourney.

Although Midjourney’s AI model is not open-source and cannot be investigated like Stable Diffusion, blend mode allows for insights into the model’s performance, potentially revealing some underlying mechanisms within the “black box.”

As AI models evolve, such techniques can assist artists and researchers in better understanding and assessing the quality of AI-generated art and its potential applications in various creative fields.

Relevance for AI Art and Visual Storytelling

By generating images based on text prompts, AI Art tools like Midjourney have enormous potential for visual storytelling. Consistency in style and character depiction, on the other hand, is essential when creating mood boards or storyboards. Dissonance can be caused by inconsistent styles and characters, affecting the overall coherence and impact of the visual narrative.

While Midjourney’s blend mode can assist in achieving a consistent style, maintaining character consistency is still tricky. Addressing these limitations will be critical as AI art tools evolve to improve their utility in visual storytelling. By refining AI-generated art and enabling consistent styles and characters, these tools can become invaluable assets for artists, filmmakers, and designers in crafting compelling visual narratives.

Image-to-Image, Depth-to-Image, ControlNet, and, to a lesser extent, DALL-E 2’s recreate feature are similar to the blend mode in Midjourney. However, their purpose is to control the stability of the output, which is helpful if you want to create a sequence of images for animations or short movies.

Blend accepts images with the same style for different inputs, allowing it to create consistent scenes, which is precisely what mood boards, visual storytelling, and storyboards require. With Stable Diffusion, text prompts can be tuned and the model fine-tuned to achieve consistency over multiple iterations. However, developing a style this way is inefficient because fine-tuning a Stable Diffusion model takes much longer than using blend mode.

Midjourney’s blend mode is powerful, but Midjourney itself is somewhat opaque, and there is no API for automated testing and assessment. That makes blend itself the most practical tool for understanding how Midjourney’s blend mode works.

Experimental Setup

Usually, the blend mode is used only once per iteration. That means you have up to 5 images you want to blend, and you will repeat this process to get the desired result.

Generations

Of course, you can also blend the output of a blend, then blend that output again, and so on. That is how you can develop a style. Doing that, you will observe that Midjourney handles styles differently than other tools.

The experiment starts with two random images, and in the next step, two of the resulting images will be selected and blended again; then, you repeat this process multiple times.

Starting with two images containing noise or very simple structures, you will see that Midjourney is increasing the complexity. However, there is a significant difference between Midjourney v4 and v5.
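The generational procedure can be sketched as a small loop. This is only an illustration of the bookkeeping: Midjourney has no API, so `blend` is a hypothetical callable standing in for a manual /blend step, and `toy_blend` is an invented stand-in that returns four labeled candidates. Picking the two survivors at random is also an assumption; in practice the selection is made by hand.

```python
import random


def run_generations(seed_a, seed_b, blend, generations, rng=None):
    """Repeatedly blend two images and feed two of the results back in.

    `blend` is a hypothetical callable standing in for a manual /blend step:
    it takes two images and returns a list of candidate images.
    """
    rng = rng or random.Random(0)
    pair = (seed_a, seed_b)
    history = [pair]
    for _ in range(generations):
        candidates = blend(*pair)
        # Assumption: two results are picked at random for the next round.
        pair = tuple(rng.sample(candidates, 2))
        history.append(pair)
    return history


def toy_blend(a, b):
    """Invented stand-in: returns four labeled candidates, no real images."""
    return [f"blend({a},{b})#{i}" for i in range(4)]


history = run_generations("noise_1", "noise_2", toy_blend, generations=3)
for step, pair in enumerate(history):
    print(step, pair)
```

Keeping the whole history rather than only the final pair is what makes it possible to measure every generation afterwards.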

Interpretation of Midjourney’s behavior

There are two test series, “pink” and “noise.” “Pink” is structured and has a low entropy, while “noise” is unstructured and has a relatively high entropy. There are two measurements:

Entropy average (which is the average entropy of the images blended)
Histogram sum average (the average of the histogram sums, used as a measure of image complexity)
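For readers who want to reproduce the first measurement, here is a minimal sketch of an entropy average. It assumes the entropy in question is the Shannon entropy (in bits) of the pixel-intensity distribution, which would cap at 8 for 8-bit images and is consistent with the values in the tables, but the article does not state the exact formula. Images are represented as flat lists of intensities to keep the example free of dependencies. The histogram sum is not reproduced because its computation is not described.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy in bits of a flat sequence of pixel intensities."""
    counts = Counter(pixels)
    total = len(pixels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def entropy_average(image_a, image_b):
    """Mean entropy of the two images that go into one blend step."""
    return (shannon_entropy(image_a) + shannon_entropy(image_b)) / 2


# Toy data: a single-color image and an image with four equally common levels.
flat = [128] * 16
four_levels = [0, 85, 170, 255] * 4

print(shannon_entropy(flat))
print(shannon_entropy(four_levels))
print(entropy_average(flat, four_levels))
```

A single-color image carries no information, so its entropy is zero, while four equally likely intensity levels give exactly two bits.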

Comparing the measurements for Midjourney v4 and v5, you can see that v5 increases image complexity (histogram sum average) much faster than v4. The entropy for v4 also approaches a higher value than for v5. V5 also reacts much more strongly to the geometric structure of “pink” than to “noise.” While it was possible to create strange patterns by iterating blend with v4, in v5 the image structure approaches a stable state and does not hallucinate.

In other words, Midjourney’s v5 model is more efficient and stable than v4. Combined with the fact that Midjourney now creates images with a resolution of 1024x1024 by default instead of 512x512, which is four times as many pixels, this suggests that Midjourney also improved memory efficiency by a factor of 3-4.

Entropy Chart

Test Series Midjourney v4 blend noise

Entropy 1	Entropy 2	Sum 1	Sum 2
4.53	3.28	140	18
5.20	4.79	448	199
5.80	5.59	1046	819
6.16	6.48	976	1356
6.52	6.68	1334	1815
6.78	6.80	4687	3956
6.66	6.74	13496	13941
5.98	5.98	24712	30758
6.22	5.87	60879	58846
6.29	6.17	84513	76197
6.51	6.19	86530	83487
6.19	6.66	81240	93489
6.55	6.54	86119	86604
6.74	6.68	102836	95125
6.87	6.63	117423	119650
6.84	6.78	134680	130191
6.98	6.94	132227	131792
6.76	6.93	138186	134378
6.83	6.90	145689	135058
6.87	6.97	126092	117148
6.87	6.88	108882	110676
6.81	6.90	110359	111013
6.65	6.81	124263	118231
6.73	6.78	120195	115708
6.87	6.68	128364	124032

Test Series Midjourney v5 blend noise

Entropy 1	Entropy 2	Sum 1	Sum 2
5.27	2.17	1188	17
3.94	3.40	1071	324
5.46	5.02	12755	19416
5.74	5.79	62170	72176
5.98	5.99	106963	140308
6.11	6.06	187716	175786
6.44	6.41	191265	176359
6.18	6.33	147614	195716
6.34	6.12	178605	158508
6.22	6.23	141643	135511
6.12	6.19	94476	112022
5.80	5.95	54621	46863
6.14	5.29	60850	53250
5.75	5.94	84298	103617
6.11	5.78	91952	99835
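One way to check the claim that v5 builds up complexity faster is to find the first row in each noise series where the histogram sum average passes a fixed level. The sketch below uses the Sum 1 and Sum 2 columns from the first rows of the two noise tables. The threshold of 50,000 is an arbitrary choice made for this illustration, and treating each table row as one blend generation is an assumption.

```python
def sum_averages(rows):
    """Average the two histogram sums in each row."""
    return [(s1 + s2) / 2 for s1, s2 in rows]


def first_row_above(rows, threshold):
    """1-based row at which the histogram sum average first exceeds threshold."""
    for row_number, value in enumerate(sum_averages(rows), start=1):
        if value > threshold:
            return row_number
    return None


# (Sum 1, Sum 2) from the first rows of the v4 and v5 "noise" tables.
V4_NOISE_SUMS = [
    (140, 18), (448, 199), (1046, 819), (976, 1356), (1334, 1815),
    (4687, 3956), (13496, 13941), (24712, 30758), (60879, 58846),
    (84513, 76197),
]
V5_NOISE_SUMS = [
    (1188, 17), (1071, 324), (12755, 19416), (62170, 72176),
    (106963, 140308),
]

THRESHOLD = 50_000  # arbitrary level chosen for this illustration

v4_row = first_row_above(V4_NOISE_SUMS, THRESHOLD)
v5_row = first_row_above(V5_NOISE_SUMS, THRESHOLD)
print(v4_row, v5_row)  # prints "9 4"
```

With this threshold, the v5 series crosses the level in row 4 and the v4 series in row 9, which matches the observation that v5 increases complexity much faster.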

Test Series Midjourney v4 blend pink

Entropy 1	Entropy 2	Sum 1	Sum 2
2.09	0.75	15631	249937
2.85	3.13	250184	257296
2.65	4.37	257980	256453
3.48	4.74	254336	242706
3.84	5.10	230920	204197
5.37	5.40	173866	157728
5.90	6.06	174517	180191
6.56	6.38	178649	145838
6.19	6.00	167114	126285
6.06	5.38	153179	118019
6.09	6.17	119116	107152
6.20	6.80	124605	100092
6.72	6.76	94216	105400
6.82	6.79	114444	154591
6.70	6.91	139373	163483
6.90	6.92	171800	157225
6.93	6.95	170994	181207
6.92	6.91	156286	172952
6.97	6.95	158602	184686
6.97	6.79	184973	206138
6.67	6.79	190083	216097
6.80	6.83	209105	206730
6.87	6.55	207837	177583
6.75	6.74	181175	192899
6.71	6.83	195523	189841
6.63	6.81	187217	207097
6.71	6.73	191542	157933
6.82	6.83	184096	174412

Test Series Midjourney v5 blend pink

Entropy 1	Entropy 2	Sum 1	Sum 2
2.09	0.75	15631	249937
2.00	1.90	270313	815808
2.92	2.16	941239	165071
2.55	5.40	622583	872529
4.74	3.50	856652	569445
3.62	3.92	510707	624817
4.67	4.21	412849	532058
3.88	4.10	417974	426259
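The level that entropy settles at can be estimated by averaging the last few rows of a series. The sketch below does this for the final three rows of the two “pink” tables. The window of three rows is an arbitrary choice, and because the v5 series is much shorter than the v4 series, the comparison is only a rough indication.

```python
def plateau(rows, last_n):
    """Mean entropy over both columns of the last `last_n` rows."""
    tail = rows[-last_n:]
    values = [entropy for pair in tail for entropy in pair]
    return sum(values) / len(values)


# (Entropy 1, Entropy 2) from the last three rows of each "pink" table.
V4_PINK_TAIL = [(6.63, 6.81), (6.71, 6.73), (6.82, 6.83)]
V5_PINK_TAIL = [(3.62, 3.92), (4.67, 4.21), (3.88, 4.10)]

v4_level = plateau(V4_PINK_TAIL, 3)
v5_level = plateau(V5_PINK_TAIL, 3)
print(f"v4: {v4_level:.3f}  v5: {v5_level:.3f}")
```

On these rows the v4 series sits well above the v5 series, in line with the observation that v4 approaches a higher entropy value.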