AI Struggles to Capture the 'Dramatic' Aesthetic with Imagen-v2
The ‘dramatic’ aesthetic is a powerful tool in visual storytelling, evoking strong emotions and creating a sense of grandeur. It typically relies on high-contrast lighting, contrasting colors, and dynamic compositions. While AI models have made significant strides in image generation, capturing this specific aesthetic remains a challenge. This blog post presents an experiment in which a generative AI model was asked to create images from prompts combining various scenes and aesthetic styles, including ‘dramatic’. The results highlight the model’s strengths and weaknesses, shedding light on the ongoing quest for AI to understand and replicate artistic intent.
Created with: imagen-v2
A Warrior’s Solitude in the Wasteland
A lone warrior, silhouetted against a desolate landscape, stands with a spear in hand, gazing out at the horizon. The scene evokes a sense of epic loneliness and power, capturing the somber mood of a warrior facing an uncertain future.
Prompt
Stylized: Epic and melancholic ; A lone warrior; wide shot; Heroism; A desolate battlefield with a setting sun; cinematic
Characteristic
Shot : A lone warrior stands on a desolate, rocky landscape under a cloudy sky. The warrior is armed with a spear and wears armor and a cape.
Aesthetic Score : 0.6
Mood : epic, lonely, dramatic
Quality
Entropy : 6.84
Noise : 84
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.90
Image errors : The background appears blurry and lacking in detail, and the textures on the warrior’s armor are a bit flat and unrealistic.
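The post does not say how the Quality figures are computed. As a rough sketch under stated assumptions: values such as 6.84 are consistent with the Shannon entropy (in bits) of an 8-bit grayscale histogram, whose maximum is 8, and values near 0.3 fall in the range commonly seen for the cosine similarity between CLIP text and image embeddings. The helper names below are hypothetical, the inputs are plain Python lists rather than real images or embeddings, and the Noise figure is left out because its definition cannot be recovered from the post.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of a sequence of integer pixel intensities.

    Assumption: the post's "Entropy" is computed along these lines; the
    original pipeline is not documented.
    """
    if not pixels:
        raise ValueError("pixels must not be empty")
    total = len(pixels)
    counts = Counter(pixels)
    # H = -sum(p * log2(p)) over the intensity values that actually occur.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors.

    Assumption: the "Prompt Clip Score" is a similarity of this kind between
    a prompt embedding and an image embedding.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)
```

Under this reading, a flat single-color image has entropy 0 and an image that uses all 256 gray levels equally often has entropy 8, so the reported values between 6.41 and 6.88 would indicate fairly rich tonal variation in every image.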
Lost Treasure Gleams in the Dark
A weathered treasure chest overflows with gold coins and jewels, casting a mystical glow in the dim light of a dark cave. The scene evokes a sense of mystery, adventure, and magic, with the contrast between darkness and light adding to the dramatic effect.
Prompt
Stylized: Excitement and wonder ; A treasure chest overflowing with gold; close-up; Adventure; A dark and mysterious cave; cinematic
Characteristic
Shot : A treasure chest overflowing with gold coins and a cloth sack inside a dark, mysterious cave. The chest is positioned in the foreground with the cave opening behind it, bathed in a warm, inviting light.
Aesthetic Score : 0.6
Mood : mysterious, adventurous, treasure
Quality
Entropy : 6.41
Noise : 72
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.90
Image errors : The lighting on the chest seems artificial and unnatural. The gold coins appear to be overly smooth and uniform, with a lack of detail. There is a faint blue glow on the chest and around some of the coins that looks like a post-processing artifact. The cloth sack looks very pixelated.
Silhouetted Against the Ruins: A Lone Warrior in a Post-Apocalyptic World
A solitary figure stands defiant against the backdrop of a decaying cityscape, their silhouette a stark contrast against the bleak, futuristic landscape. This image evokes a sense of loneliness, darkness, and the enduring spirit of humanity in the face of adversity.
Prompt
Stylized: Triumphant and futuristic ; A player’s avatar, a powerful warrior, standing triumphantly; medium shot; Gaming; A vibrant and futuristic cityscape; cinematic
Characteristic
Shot : A lone warrior in futuristic armor stands in a desolate cityscape, perhaps after a battle or a long journey.
Aesthetic Score : 0.7
Mood : dark, futuristic, powerful
Quality
Entropy : 6.62
Noise : 96
Prompt Clip Score : 0.33
AI Evaluation
Likelihood of AI : 1.00
Image errors : The armor has some artifacts and appears slightly blurry in places. The cityscape is slightly blurry and lacks detail.
A City Under a Gloomy Sky
An aerial view of a sprawling metropolis, shrouded in a cloudy sky. The towering skyscrapers and dense urban fabric create a sense of grandeur and scale, while the gloomy atmosphere adds a touch of melancholy.
Prompt
Stylized: Energetic and lively ; A panoramic view of a bustling city; long shot; Tourism; A vibrant and colorful cityscape; cinematic
Characteristic
Shot : Aerial view of a city with overcast sky and a river or bay in the background.
Aesthetic Score : 0.6
Mood : gloomy, urban, industrial
Quality
Entropy : 6.85
Noise : 109
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.20
Image errors : Some artifacts are visible in the image, especially in the sky and the water.
Silhouetted Against the Setting Sun: A Moment of Solitude in the Desert
A lone hiker stands at the peak of a sand dune, bathed in the warm glow of the setting sun. The vast desert landscape stretches out before them, creating a sense of tranquility and contemplation. The silhouette of the hiker against the expansive terrain emphasizes their solitude and the immense scale of the natural world.
Prompt
Stylized: Serene and contemplative ; A lone traveler gazing at a breathtaking sunset; medium shot; Travel; A vast desert landscape; cinematic
Characteristic
Shot : A lone hiker stands on a sand dune in a vast desert, looking out at the horizon under a hazy sunset.
Aesthetic Score : 0.7
Mood : solitude, calm, vast
Quality
Entropy : 6.63
Noise : 99
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.20
Image errors : No visible artifacts or errors
Golden Hour Romance: A Couple’s Laughter Fills the Air
A heartwarming scene unfolds as a couple embraces and laughs in a sun-drenched field. The golden hour light casts a warm glow, highlighting their joy and connection. This image captures the essence of love and happiness, making it a truly romantic and aesthetic moment.
Prompt
Stylized: Energetic and lively ; laughing and playing in a park; medium shot; A sunny and idyllic park setting; cinematic
Characteristic
Shot : A couple laughing together in a field with trees and a blurred background of cars and people.
Aesthetic Score : 0.8
Mood : joyful, playful, romantic
Quality
Entropy : 6.81
Noise : 106
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.10
Image errors : No noticeable errors
Solitude in the Storm’s Embrace
A solitary figure stands defiant against the elements, silhouetted against a stormy sky. The dramatic contrast of light and dark, the turbulent sea, and the lone figure evoke a sense of melancholic mystery and vulnerability.
Prompt
Stylized: Dramatic and powerful ; A lone figure standing on a cliff overlooking a vast ocean; long shot; Heroism; A stormy sea with dramatic clouds; cinematic
Characteristic
Shot : A lone figure stands on a cliff overlooking a vast, stormy sea. The sky is filled with dark, dramatic clouds, creating a sense of foreboding.
Aesthetic Score : 0.75
Mood : dark, dramatic, solitary
Quality
Entropy : 6.88
Noise : 89
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.20
Image errors : No significant image errors are visible. There is slight noise in the sky, but it is not intrusive.
Secrets Unveiled in Candlelight
A dimly lit room, shrouded in mystery. A map, adorned with pins, lies open on a table, illuminated by the flickering glow of a single candle. The shadows dance, whispering secrets of a hidden adventure.
Prompt
Stylized: Intriguing and mysterious ; A map with pins marking locations of hidden treasures; close-up; Adventure; A dimly lit room with antique furniture; cinematic
Characteristic
Shot : A dimly lit room with a wooden table and a map spread out on it. A lone candle casts a warm glow on the scene, and a few objects in the background are visible.
Aesthetic Score : 0.7
Mood : mysterious, suspenseful, intriguing
Quality
Entropy : 6.55
Noise : 91
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.60
Image errors : The image has a slight blurriness to it, which could be improved upon. The textures on the paper and wood feel slightly unnatural and the colors are desaturated.
Hunter in the Twilight
A lone warrior, cloaked in shadow and armed with a bow, stands poised in a dimly lit forest. The setting sun casts a warm glow, illuminating her focused expression and the deadly weapon in her hand. This image evokes a sense of intensity, mystery, and danger, leaving the viewer wondering what lies ahead.
Prompt
Stylized: Intense and focused ; A player’s character, a skilled archer, aiming at a target; close-up; Gaming; A dark and mysterious forest; cinematic
Characteristic
Shot : A woman in a dark hooded cloak is aiming a bow and arrow in a shadowy forest, lit by a warm light from an unknown source.
Aesthetic Score : 0.7
Mood : intense, dramatic, mysterious
Quality
Entropy : 6.54
Noise : 115
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.80
Image errors : Some artifacts are visible in the background and in the textures of the woman’s clothes and hair. The lighting is somewhat unnatural, leading to inconsistencies in the color and tone of the image.
A Moment of Connection Amidst the City Lights
Two women share a cozy dinner, their intimate conversation illuminated by warm lighting. The blurred city backdrop adds a touch of mystery to this romantic scene.
Prompt
Stylized: Social and celebratory ; A group of friends enjoying a meal at a restaurant with a view; medium shot; Tourism; A bustling city street with vibrant lights; cinematic
Characteristic
Shot : A group of people are dining outdoors at night in a city, with buildings in the background. The scene is lit by warm yellow lighting, creating a cozy atmosphere.
Aesthetic Score : 0.6
Mood : romantic, cozy, inviting
Quality
Entropy : 6.72
Noise : 83
Prompt Clip Score : 0.34
AI Evaluation
Likelihood of AI : 0.70
Image errors : The image contains some artifacts, primarily in the background, which is blurry and lacking detail. The sharpness and focus on the characters and food are also not ideal, and the edges of the image are slightly fuzzy.
Conclusion
The results show that the generative AI model handled shot composition better than camera position or aesthetic style. Here’s a breakdown:
- Camera Position: The model scored 0.4, which is considered okay. The camera position in the generated images was somewhat different from what the prompts requested.
- Shot Analysis: The model scored 0.54, which is considered good. The shot composition of the generated images was fairly close to what the prompts described.
- Aesthetic Analysis: The model scored 0.07, the lowest of the three scores, although the evaluation still rates it as okay. The aesthetic style of the generated images differed from the desired style.
Overall, the model seems to understand the scene and shot composition better than the desired aesthetic style.
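To set the conclusion beside the per-image numbers, the figures reported for the ten images can be averaged directly. The sketch below only transcribes values listed in this post. The tuple layout and the `summarize` helper are illustrative choices, not part of the original evaluation pipeline, and the three conclusion scores (0.4, 0.54, 0.07) are not reproduced because the post does not say how they were derived.

```python
from statistics import mean

# (title, aesthetic score, likelihood of AI, prompt CLIP score, mood tags),
# transcribed from the per-image sections of this post.
IMAGES = [
    ("A Warrior’s Solitude in the Wasteland", 0.6, 0.90, 0.31, ("epic", "lonely", "dramatic")),
    ("Lost Treasure Gleams in the Dark", 0.6, 0.90, 0.29, ("mysterious", "adventurous", "treasure")),
    ("Silhouetted Against the Ruins", 0.7, 1.00, 0.33, ("dark", "futuristic", "powerful")),
    ("A City Under a Gloomy Sky", 0.6, 0.20, 0.26, ("gloomy", "urban", "industrial")),
    ("Silhouetted Against the Setting Sun", 0.7, 0.20, 0.31, ("solitude", "calm", "vast")),
    ("Golden Hour Romance", 0.8, 0.10, 0.26, ("joyful", "playful", "romantic")),
    ("Solitude in the Storm’s Embrace", 0.75, 0.20, 0.31, ("dark", "dramatic", "solitary")),
    ("Secrets Unveiled in Candlelight", 0.7, 0.60, 0.28, ("mysterious", "suspenseful", "intriguing")),
    ("Hunter in the Twilight", 0.7, 0.80, 0.30, ("intense", "dramatic", "mysterious")),
    ("A Moment of Connection Amidst the City Lights", 0.6, 0.70, 0.34, ("romantic", "cozy", "inviting")),
]


def summarize(images):
    """Average the reported scores and count images tagged with a 'dramatic' mood."""
    return {
        "mean_aesthetic": mean(img[1] for img in images),
        "mean_ai_likelihood": mean(img[2] for img in images),
        "mean_clip": mean(img[3] for img in images),
        "dramatic_moods": sum("dramatic" in img[4] for img in images),
    }


if __name__ == "__main__":
    for name, value in summarize(IMAGES).items():
        print(f"{name}: {value}")
```

On these transcribed values, three of the ten images carry a "dramatic" mood tag, while only one prompt ("Dramatic and powerful") asked for that style explicitly. The mean aesthetic score is about 0.675 and the mean prompt CLIP score about 0.30.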
Sources:
- https://heartofnoir.com/knowing-noir/aesthetic-of-noir/
- https://www.yellowbrick.co/blog/film/maximizing-the-visual-impact-unveiling-the-art-of-film-aesthetics
- https://www.questjournals.org/jrhss/papers/vol10-issue8/1008255260.pdf
- https://www.jstor.org/stable/3331672
- https://www.cinepoetics.fu-berlin.de/activities/workshops/2020-12-ws/index.html
- https://resource.download.wjec.co.uk/vtc/2016-17/16-17_1-22/eng/Part%201%20What%20is%20Aesthetics.pdf
- https://deepmind.google/technologies/imagen-2/