AI Struggles to Capture the 'Dramatic' Aesthetic with Imagen-v3-fast
The ‘dramatic’ aesthetic is a powerful tool in visual storytelling. It relies on strong contrasts, bold lighting, and evocative compositions to create a sense of intensity and emotion. This style is often used in film, photography, and graphic design to draw the viewer in and leave a lasting impression. But can AI understand and replicate this aesthetic? In this blog post, we explore the results of an experiment that tested the ability of an AI model, imagen-v3-fast, to generate images based on specific aesthetics, including the ‘dramatic’ style. We analyze the results and discuss the challenges of teaching AI to understand and replicate artistic styles.
Created with: imagen-v3-fast
Silhouette of Solitude: A Figure Walks Towards the Setting Sun
A lone figure traverses a vast, empty plain as the sun dips below the horizon, casting a dramatic silhouette. The scene evokes a sense of melancholy, serenity, and contemplation, with the figure’s isolation against the fiery sunset creating a powerful sense of grandeur and solitude.
Prompt
style-aesthetic Avant-garde: Epic, melancholic ; A lone figure, silhouetted against a blazing sunset; long shot; Heroism; A vast, desolate landscape; cinematic
Characteristic
Shot : A lone figure walks towards the setting sun on a vast, empty plain, the silhouette creating a sense of solitude and mystery.
Aesthetic Score : 0.7
Mood : melancholy, serene, contemplative
Quality
Entropy : 6.82
Noise : 42
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.20
Image errors : No noticeable errors.
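The post does not say how the Entropy value under Quality is computed. One common reading is the Shannon entropy of the image's grayscale histogram, measured in bits, which ranges from 0 for a single flat tone to 8 for an 8-bit image that uses every gray level equally often. Under that assumption, a minimal sketch might look like this (the function name `shannon_entropy` and the toy pixel lists are illustrative and not part of the original experiment):

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Return the Shannon entropy, in bits, of a sequence of pixel values."""
    total = len(pixels)
    if total == 0:
        raise ValueError("pixels must not be empty")
    counts = Counter(pixels)
    # Sum p * log2(1 / p) over the gray levels that actually occur.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Toy data (hypothetical): a flat image versus one using all 256 gray levels.
flat_image = [128] * 1024
varied_image = list(range(256)) * 4

flat_entropy = shannon_entropy(flat_image)      # no variation at all
varied_entropy = shannon_entropy(varied_image)  # every level equally likely

print(f"flat: {flat_entropy:.2f} bits, varied: {varied_entropy:.2f} bits")
```

If this reading is right, higher values indicate more tonal variety, not better quality, so the number is best read alongside the other metrics.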
A Hand Reaches Towards the Unknown
A swirling vortex of light and energy beckons, promising a journey beyond the familiar. This mysterious scene evokes a sense of wonder and invites contemplation of the possibilities that lie beyond our perception.
Prompt
style-aesthetic Avant-garde: Surreal, mysterious ; A hand reaching out from a swirling vortex of light; close-up; Adventure; A kaleidoscope of colors and abstract shapes; cinematic
Characteristic
Shot : A hand reaching out towards a swirling vortex of light and energy, possibly representing a portal or a gateway to another dimension.
Aesthetic Score : 0.7
Mood : mysterious, otherworldly, surreal
Quality
Entropy : 5.85
Noise : 74
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.80
Image errors : No notable artifacts or errors.
Pixelated Loneliness in a Cyberpunk Cityscape
A solitary figure stands on a rooftop ledge, gazing out over a pixelated city bathed in the glow of a digital moon. The scene evokes a sense of nostalgic longing and futuristic isolation, capturing the essence of cyberpunk aesthetics.
Prompt
style-aesthetic Avant-garde: Nostalgic, futuristic ; A pixelated character, rendered in a retro 8-bit style, standing on a precipice overlooking a digital cityscape; medium shot; Gaming; A neon-lit, futuristic cityscape; cinematic
Characteristic
Shot : A pixelated cityscape at night with a single figure standing on a ledge overlooking the city. The sky is a dark blue with some stars and a light blue moon.
Aesthetic Score : 0.7
Mood : nostalgic, cyberpunk, futuristic
Quality
Entropy : 6.29
Noise : 64
Prompt Clip Score : 0.34
AI Evaluation
Likelihood of AI : 1.00
Image errors : The pixelated art style is intentional, but the image has visible blockiness and color banding, especially in the sky and the shadows.
A Suitcase Full of Memories
A vintage suitcase sits alone on a train platform, its weathered leather whispering tales of journeys past. The blurred background adds a sense of nostalgia and mystery, leaving you wondering about the stories it holds and the destinations it has seen.
Prompt
style-aesthetic Avant-garde: Lonely, evocative ; A single, weathered suitcase, abandoned on a deserted train platform; close-up; Tourism; A misty, atmospheric train station; cinematic
Characteristic
Shot : A vintage suitcase sits alone on the platform of a train station. The background is blurred, creating a sense of depth.
Aesthetic Score : 0.7
Mood : nostalgic, lonely, journey
Quality
Entropy : 6.62
Noise : 64
Prompt Clip Score : 0.32
AI Evaluation
Likelihood of AI : 0.50
Image errors : The image appears slightly over-sharpened, which gives it a somewhat artificial look.
Lost in the Twilight’s Embrace
A solitary figure walks down a cracked, deserted street, the hazy sunset casting long shadows and amplifying the sense of isolation. The buildings lining the path stand as silent witnesses to the person’s journey, leaving the viewer to ponder their destination and the mysteries that lie ahead.
Prompt
style-aesthetic Avant-garde: Disorienting, dreamlike ; A pair of feet walking on a cracked, abstract pavement; low-angle shot; Travel; A distorted, surreal cityscape; cinematic
Characteristic
Shot : A person walking down a cracked, empty street toward a hazy sunset, with buildings lining the street on both sides
Aesthetic Score : 0.6
Mood : desolate, lonely, mysterious
Quality
Entropy : 6.63
Noise : 84
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.80
Image errors : The buildings and the environment are somewhat blurry and lack detail.
Candlelight Secrets: Three Women in a Mysterious Gathering
A single candle illuminates the faces of three women in a dimly lit room, creating an intimate and suspenseful atmosphere. The warm glow casts shadows, adding depth and mystery to the scene.
Prompt
style-aesthetic Avant-garde: Intimate, mysterious ; A family gathered around a flickering candle, their faces obscured by shadows; close-up; Family; A dimly lit, antique room; cinematic
Characteristic
Shot : Three women in a dimly lit room, their faces illuminated by a single candle, creating an intimate and mysterious atmosphere. The warm candlelight creates a soft glow, casting shadows and adding depth to the scene.
Aesthetic Score : 0.7
Mood : mysterious, intimate, suspenseful
Quality
Entropy : 6.35
Noise : 58
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image is slightly blurry, particularly in the background. There are some minor artifacts around the edges of the candle.
Simple Beauty: A Red Balloon Against a White Canvas
This minimalist image captures the essence of playfulness with a single red balloon floating against a stark white background. The balloon’s vibrant color and round shape create a striking contrast, drawing the eye and evoking a sense of joy and simplicity.
Prompt
style-aesthetic Avant-garde: Hopeful, symbolic ; A single, red balloon floating against a stark, white background; close-up; Heroism; A minimalist, abstract setting; cinematic
Characteristic
Shot : A single red balloon against a white background.
Aesthetic Score : 0.5
Mood : simple, minimalist, playful
Quality
Entropy : 5.50
Noise : 11
Prompt Clip Score : 0.35
AI Evaluation
Likelihood of AI : 0.20
Image errors : None
Pixelated Nostalgia: A Retro Gaming Moment
A classic scene of a person immersed in a retro video game, the pixelated screen and worn controller evoking a sense of nostalgic excitement. The dark mood and blurry image add to the feeling of being transported back in time.
Prompt
style-aesthetic Avant-garde: Nostalgic, introspective ; A hand holding a vintage game controller, the screen reflecting a distorted, pixelated world; close-up; Gaming; A dimly lit, retro-themed room; cinematic
Characteristic
Shot : A person is playing a video game on a retro TV with a controller. The TV is showing a pixelated and blurry image.
Aesthetic Score : 0.4
Mood : nostalgic, retro, dark
Quality
Entropy : 6.45
Noise : 50
Prompt Clip Score : 0.33
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image is overly soft and blurry. The pixels on the screen are not defined enough, and the lighting is uneven. The overall image appears to be lacking clarity and sharpness.
A Solitary Figure at the Edge of the Universe
A lone figure stands on a mountain peak, silhouetted against a swirling celestial portal. The scene evokes a sense of awe and wonder, suggesting a journey of spiritual discovery and the vastness of the universe.
Prompt
style-aesthetic Avant-garde: Sublime, awe-inspiring ; A lone figure standing on a mountain peak, their silhouette framed by a swirling vortex of clouds; long shot; Adventure; A dramatic, mountainous landscape; cinematic
Characteristic
Shot : A lone figure stands on a mountain peak, silhouetted against a background of swirling, luminous rings that resemble a celestial event or an otherworldly portal. The distant mountains are bathed in an orange-hued light, suggesting a dramatic sunset or sunrise.
Aesthetic Score : 0.8
Mood : mystical, hopeful, contemplative
Quality
Entropy : 6.72
Noise : 61
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.80
Image errors : The mountain ranges appear slightly repetitive and artificial. The luminous rings might be overly smooth and uniform.
Fragmented Reality: A Digital Collage of Chaos
This abstract collage explores the interplay of geometric forms and textures, creating a disorienting and visually captivating experience. The fragmented composition evokes a sense of chaos and uncertainty, inviting viewers to delve into the depths of its digital landscape.
Prompt
style-aesthetic Avant-garde: Energetic, disorienting ; A series of fragmented, overlapping images, depicting different aspects of travel and tourism; montage; Tourism; A chaotic, abstract collage; cinematic
Characteristic
Shot : Abstract collage with fragmented shapes and geometric forms, some of which are textured.
Aesthetic Score : 0.4
Mood : abstract, geometric, digital
Quality
Entropy : 6.54
Noise : 93
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.90
Image errors : The image has a digital/AI-generated look, with some blurriness and artifacts. The textures are somewhat repetitive.
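The per-image values listed above can be summarized with a few lines of Python. This is only a convenience sketch: the numbers are transcribed from the ten entries in order of appearance, the variable names are made up for illustration, and the averages are not the scores discussed in the conclusion, whose computation the post does not describe.

```python
from statistics import mean

# Values transcribed from the ten image entries, in order of appearance.
aesthetic_scores = [0.7, 0.7, 0.7, 0.7, 0.6, 0.7, 0.5, 0.4, 0.8, 0.4]
clip_scores = [0.31, 0.29, 0.34, 0.32, 0.31, 0.29, 0.35, 0.33, 0.30, 0.27]
ai_likelihoods = [0.20, 0.80, 1.00, 0.50, 0.80, 0.10, 0.20, 0.80, 0.80, 0.90]

summary = {
    "mean_aesthetic_score": round(mean(aesthetic_scores), 2),
    "mean_prompt_clip_score": round(mean(clip_scores), 3),
    "mean_likelihood_of_ai": round(mean(ai_likelihoods), 2),
    # Number of images the evaluator rated at 0.8 or higher as AI-generated.
    "images_flagged_as_ai": sum(1 for p in ai_likelihoods if p >= 0.8),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

The averages come out to roughly 0.62 for the aesthetic score, 0.31 for the prompt CLIP score, and 0.61 for the likelihood of AI, with six of the ten images rated at 0.8 or higher.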
Conclusion
The results show that the generative AI model understood the scene reasonably well, but struggled with camera position and with the aesthetic. Here’s a breakdown:
- Camera Position: The model scored 0.35, which is below the “good” range of 0.5 to 0.75. This suggests that the model didn’t quite capture the intended camera positions as described in the prompts.
- Shot Analysis: The model scored 0.53, which falls within the “good” range. This indicates that the model was able to understand the scene and create shots that were generally consistent with the prompts.
- Aesthetic Analysis: The model scored 0.19, which lies above the “very good” range of -0.2 to 0.1. This suggests that the aesthetic of the generated images deviated from the expected aesthetic described in the prompts.
Overall, the model shows promise in understanding the scene, but needs improvement in matching the requested camera positions and in capturing the desired aesthetic.
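The comparison in the breakdown can be written out as a small range check. This is a sketch under stated assumptions: the scores and ranges are the ones quoted in the conclusion, the “good” range for Shot Analysis is assumed to be the same 0.5 to 0.75 given for Camera Position, and the helper `in_range` and the `results` structure are hypothetical names used only for illustration.

```python
def in_range(score, low, high):
    """Return True if score lies within the inclusive range [low, high]."""
    return low <= score <= high


# (metric, score, range label, low, high) as quoted in the conclusion.
# Assumption: Shot Analysis uses the same "good" range as Camera Position.
results = [
    ("Camera Position", 0.35, "good", 0.5, 0.75),
    ("Shot Analysis", 0.53, "good", 0.5, 0.75),
    ("Aesthetic Analysis", 0.19, "very good", -0.2, 0.1),
]

verdicts = {}
for metric, score, label, low, high in results:
    verdicts[metric] = in_range(score, low, high)
    status = "within" if verdicts[metric] else "outside"
    print(f"{metric}: {score} is {status} the '{label}' range [{low}, {high}]")
```

Only Shot Analysis lands inside its target range, which matches the reading that the model handled scene content better than camera position or aesthetic.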