AI Struggles to Capture the 'Dramatic' Aesthetic with Imagen-v3

AI's Blind Spot: The Challenge of Capturing Dramatic Aesthetics with Imagen-v3

The ‘dramatic’ aesthetic is a powerful tool in visual storytelling. It evokes strong emotions, creates a sense of tension, and draws the viewer into the scene. But can AI truly understand and capture this aesthetic? Recent experiments with a generative AI model suggest that while it handles scene content reasonably well, it is weaker at following camera positions and struggles to capture the nuances of a desired aesthetic. This blog post explores that gap, focusing on the ‘dramatic’ style. Below are ten generated images, each shown with its prompt, automated quality metrics, and an AI-likelihood evaluation, followed by a summary of how the model scored on camera position, shot, and aesthetic.

Created with: imagen-v3

A Lone Figure in the Ruins of Hope

A solitary figure stands amidst a desolate, post-apocalyptic landscape, bathed in the warm glow of a setting sun. The scene evokes a sense of melancholy and solitude, yet the light hints at a glimmer of hope amidst the ruins.

Prompt

style-aesthetic Postmodern: Epic, melancholic ; A lone figure, silhouetted against a blazing sunset; wide shot; Heroism; A vast, desolate landscape with a crumbling cityscape in the distance; cinematic

Characteristic

Shot : A lone figure stands in a desolate, post-apocalyptic landscape. The sun is setting, casting a warm orange glow over the cracked earth and the distant city skyline.

Aesthetic Score : 0.7

Mood : melancholy, hope, solitude

Quality

Entropy : 6.88

Noise : 75

Prompt Clip Score : 0.30

AI Evaluation

Likelihood of AI : 0.90

Image errors : No noticeable artifacts or errors.
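
The post does not say how the Entropy and Prompt Clip Score values are computed. As a rough sketch, assuming Entropy is the Shannon entropy of the image's 8-bit intensity histogram (which would cap it at 8 bits) and Prompt Clip Score is the cosine similarity between CLIP embeddings of the prompt and the image, the two quantities could be calculated as follows. The function names and toy inputs are hypothetical; a real pipeline would feed in decoded pixels and actual CLIP embeddings.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of a flat sequence of intensity values.

    Assumption: this is the definition behind the "Entropy" field.
    """
    counts = Counter(pixels)
    total = len(pixels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors.

    Assumption: "Prompt Clip Score" is this value for the CLIP text
    embedding of the prompt and the CLIP embedding of the image.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy inputs: a flat image has zero entropy, while an image that uses
# all 256 grey levels equally often reaches the 8-bit maximum.
flat_image = [128] * 1024
uniform_image = list(range(256)) * 4
flat_entropy = shannon_entropy(flat_image)
uniform_entropy = shannon_entropy(uniform_image)
```

Under these assumptions, an entropy between 6 and 7 bits would mean the image uses a fairly wide spread of intensity values without being pure noise.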

Hand From the Digital Realm Reaches Out

A mysterious hand emerges from a computer screen, blurring the lines between reality and the digital world. This cyberpunk-inspired scene evokes a sense of intrigue and wonder, leaving viewers questioning what lies beyond the screen.

Prompt

style-aesthetic Postmodern: Surreal, playful ; A hand reaching out from a pixelated, digital world, grasping at a real-world object; close-up; Gaming; A cluttered desk with a gaming console and controllers; cinematic

Characteristic

Shot : A hand reaching out of a computer screen, seemingly from a digital world, on a desk with gaming accessories

Aesthetic Score : 0.7

Mood : futuristic, mysterious, cyberpunk

Quality

Entropy : 6.14

Noise : 62

Prompt Clip Score : 0.34

AI Evaluation

Likelihood of AI : 0.90

Image errors : The hand and the screen show some blurring and pixelation, the lighting and shadows are inconsistent, and the scene has a slight plastic look.

Sun-Kissed Mystery in the City Square

A young man, his face obscured by white sunglasses reflecting the bustling city square, stands before a grand church. The sun bathes the scene in a warm glow, adding to the air of intrigue and mystery. This urban landscape whispers of secrets waiting to be unveiled.

Prompt

style-aesthetic Postmodern: Alienated, detached, cynical ; A lone figure, sunglasses reflecting the blinding glare of the sun, stands amidst a throng of tourists, their faces obscured by the same oversized shades. The iconic landmark looms behind, dwarfed by the human sea.; cinematic

Characteristic

Shot : A young man wearing white sunglasses stands in a crowded square in front of a large, imposing church. The sun is shining, and there are many people walking around.

Aesthetic Score : 0.6

Mood : mysterious, cool, urban

Quality

Entropy : 6.44

Noise : 89

Prompt Clip Score : 0.32

AI Evaluation

Likelihood of AI : 0.20

Image errors : None

Lost in Time: A Vintage Travel Scene

Step back in time with this nostalgic scene, featuring vintage suitcases, a fedora hat, and maps. A colorful poster, partially covering a suitcase, draws the eye, while dramatic lighting adds depth and mystery, evoking the spirit of classic adventure films.

Prompt

style-aesthetic Postmodern: Nostalgic, melancholic ; A vintage travel poster, faded and torn, with a romanticized image of a foreign land; close-up; Travel; A dusty, cluttered attic filled with old suitcases and maps; cinematic

Characteristic

Shot : A vintage travel scene with suitcases, a hat, and maps. The main focal point is a colorful vintage poster that is partly covering a suitcase.

Aesthetic Score : 0.7

Mood : nostalgic, vintage, adventurous

Quality

Entropy : 6.61

Noise : 95

Prompt Clip Score : 0.32

AI Evaluation

Likelihood of AI : 0.20

Image errors : The image has some slight blurring and a few minor artifacts, but these are not very noticeable and do not detract from the overall aesthetic of the image.

Immersed in the Future: VR Gaming Takes Center Stage

A vibrant scene captures the thrill of VR gaming, as young people engage with a virtual world. Neon lights and futuristic decor create an atmosphere of wonder and immersion, highlighting the excitement and playful nature of this cutting-edge technology.

Prompt

style-aesthetic Postmodern: Energetic, futuristic ; A group of friends, their faces obscured by digital avatars, playing a virtual reality game; medium shot; Gaming; A brightly lit, futuristic arcade with neon lights and holographic displays; cinematic

Characteristic

Shot : A group of young people are wearing VR headsets and interacting with a virtual world. The setting appears to be a gaming arcade or entertainment center, with bright neon lights and futuristic decor.

Aesthetic Score : 0.6

Mood : futuristic, immersive, playful

Quality

Entropy : 6.64

Noise : 75

Prompt Clip Score : 0.33

AI Evaluation

Likelihood of AI : 0.20

Image errors : The image is slightly blurry and some of the colors appear oversaturated. The lighting is uneven, creating some dark areas in the scene.

Lost in the Crowd: A Moment of Solitude in the Airport

A lone traveler navigates the bustling airport terminal, his suitcase trailing behind him. The scene, captured from behind, evokes a sense of calm amidst the chaos, suggesting themes of travel, journey, and new beginnings.

Prompt

style-aesthetic Postmodern: Lonely, alienated ; A lone traveler, their back to the camera, walking through a crowded airport terminal; long shot; Travel; A chaotic airport terminal with people rushing and luggage carts; cinematic

Characteristic

Shot : A lone traveler walks through a crowded airport terminal, pulling a suitcase behind him. The scene is captured from behind the traveler, looking towards the front of the terminal. The terminal is brightly lit, and the atmosphere is bustling and slightly chaotic.

Aesthetic Score : 0.6

Mood : solitary, calm, urban

Quality

Entropy : 6.65

Noise : 99

Prompt Clip Score : 0.29

AI Evaluation

Likelihood of AI : 0.20

Image errors : No significant errors are evident in the image. The image quality is good, and the lighting is balanced. There are some minor imperfections in the background elements, but they are barely noticeable.

A Moment of Quiet Contemplation: A Family Portrait in Soft Light

This intimate family portrait captures a moment of quiet contemplation, bathed in soft light and muted colors. The modern interior setting, with a large window in the background, adds a sense of depth and mystery to the scene. The family members, dressed in casual clothing and holding small objects, exude a sense of connection and shared experience.

Prompt

style-aesthetic Postmodern: Reflective, nostalgic ; A family portrait, with each member holding a different, iconic object from their travels; medium shot; Family; A minimalist, modern living room with a large window overlooking a cityscape; cinematic

Characteristic

Shot : A family portrait set in a modern interior with a large window in the background. The family members are all dressed in casual clothing and are holding small objects. The overall mood of the image is one of quiet contemplation.

Aesthetic Score : 0.7

Mood : serious, contemplative, intimate

Quality

Entropy : 6.72

Noise : 95

Prompt Clip Score : 0.28

AI Evaluation

Likelihood of AI : 0.20

Image errors : There are no noticeable errors in the image.

Lost in the Woods, Guided by a Flickering Screen

A lone hand clutches a smartphone, its screen illuminating a map app in the heart of a shadowy forest. A red pin marks the destination, but the path ahead remains shrouded in mystery. Is this a journey of adventure or a descent into the unknown?

Prompt

style-aesthetic Postmodern: Intriguing, suspenseful ; A hand holding a smartphone, displaying a map with a pin dropped on a remote, unknown location; close-up; Adventure; A dark, mysterious forest with dense foliage and shadows; cinematic

Characteristic

Shot : A hand holding a smartphone with a map app open, in a dark forest setting. The map shows a red pin on a location, suggesting the person is looking for directions.

Aesthetic Score : 0.4

Mood : mysterious, adventurous, suspenseful

Quality

Entropy : 6.05

Noise : 67

Prompt Clip Score : 0.36

AI Evaluation

Likelihood of AI : 0.80

Image errors : The image appears to be generated with an AI model, with some artifacts visible in the foliage and the hand. The map app is not fully rendered, and the text on the map is blurry.

One Hero Stands Against the Ashes

A lone superhero, silhouetted against a fiery cityscape, embodies hope and resilience in the face of utter devastation. This powerful image captures the essence of heroism in a world consumed by darkness.

Prompt

style-aesthetic Postmodern: Desolate, hopeful ; A superhero, their costume ripped and tattered, standing on a rooftop overlooking a city in chaos; wide shot; Heroism; A dystopian cityscape with crumbling buildings and smoke in the air; cinematic

Characteristic

Shot : A lone superhero stands on a rooftop overlooking a post-apocalyptic cityscape engulfed in fire and smoke.

Aesthetic Score : 0.6

Mood : dark, ominous, heroic

Quality

Entropy : 6.74

Noise : 84

Prompt Clip Score : 0.34

AI Evaluation

Likelihood of AI : 0.80

Image errors : There are some minor artifacts and errors in the image, particularly in the smoke and flames, which appear a bit blurry and unrealistic.

The Last Sentinel: A Robot Stands Guard in a Dystopian City

A solitary robot, clad in military garb, stands defiantly in the heart of a blurred, futuristic cityscape. Its outstretched arms and dramatic pose evoke a sense of isolation and tension, hinting at a world where humanity and technology have collided in a dystopian future.

Prompt

style-aesthetic Postmodern: Surreal, humorous ; A vintage video game character, rendered in a hyper-realistic style, standing in a real-world environment; medium shot; Gaming; A bustling city street with people and traffic; cinematic

Characteristic

Shot : A robot in a military uniform stands in the middle of a city street. The background is blurry and stylized, and the image resembles a 3D rendering.

Aesthetic Score : 0.5

Mood : futuristic, dystopian, robotic

Quality

Entropy : 6.69

Noise : 98

Prompt Clip Score : 0.32

AI Evaluation

Likelihood of AI : 1.00

Image errors : The image looks very artificial. The robot appears flat and unrealistic, and the background is too blurry and does not match the style of the robot.
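
To compare the ten images side by side, the per-image numbers reported above can be collected and averaged. The sketch below copies the Aesthetic Score, Prompt Clip Score, and Likelihood of AI values from each entry. The 0.8 cut-off for "flagged as likely AI" is an assumption chosen for illustration, not a threshold stated in this post.

```python
from statistics import mean

# (title, aesthetic score, prompt CLIP score, likelihood of AI),
# copied from the ten entries in this post.
entries = [
    ("A Lone Figure in the Ruins of Hope", 0.7, 0.30, 0.90),
    ("Hand From the Digital Realm Reaches Out", 0.7, 0.34, 0.90),
    ("Sun-Kissed Mystery in the City Square", 0.6, 0.32, 0.20),
    ("Lost in Time: A Vintage Travel Scene", 0.7, 0.32, 0.20),
    ("Immersed in the Future: VR Gaming Takes Center Stage", 0.6, 0.33, 0.20),
    ("Lost in the Crowd: A Moment of Solitude in the Airport", 0.6, 0.29, 0.20),
    ("A Moment of Quiet Contemplation: A Family Portrait in Soft Light", 0.7, 0.28, 0.20),
    ("Lost in the Woods, Guided by a Flickering Screen", 0.4, 0.36, 0.80),
    ("One Hero Stands Against the Ashes", 0.6, 0.34, 0.80),
    ("The Last Sentinel: A Robot Stands Guard in a Dystopian City", 0.5, 0.32, 1.00),
]

mean_aesthetic = mean(e[1] for e in entries)
mean_clip = mean(e[2] for e in entries)
mean_ai_likelihood = mean(e[3] for e in entries)

# Hypothetical cut-off: treat a likelihood of 0.8 or more as "flagged".
AI_FLAG_THRESHOLD = 0.8
flagged = [e[0] for e in entries if e[3] >= AI_FLAG_THRESHOLD]

# Entry with the lowest aesthetic score.
lowest_aesthetic = min(entries, key=lambda e: e[1])

print(f"mean aesthetic score: {mean_aesthetic:.2f}")
print(f"mean prompt CLIP score: {mean_clip:.2f}")
print(f"mean likelihood of AI: {mean_ai_likelihood:.2f}")
print(f"flagged as likely AI: {len(flagged)} of {len(entries)}")
print(f"lowest aesthetic score: {lowest_aesthetic[0]}")
```

With the values reported above, the mean Aesthetic Score is about 0.61, the mean Prompt Clip Score is about 0.32, and five of the ten images have a Likelihood of AI of 0.80 or higher. The image with the lowest Aesthetic Score (0.4, "Lost in the Woods, Guided by a Flickering Screen") also has the highest Prompt Clip Score (0.36).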

Conclusion

The results show that the generative AI model understood the scene better than the camera position, and that it struggled with the aesthetic. Here’s a breakdown:

  • Camera Position: The model scored 0.2, which indicates that it does not respond well to camera positions in prompts. A score between 0.5 and 0.75 would be considered good, and above 0.75 very good.
  • Shot Analysis: The model scored 0.48, just under the 0.5 threshold for good, so its understanding of the scene in a prompt is moderate. A score between 0.5 and 0.75 would be considered good, and above 0.75 very good.
  • Aesthetic Analysis: The model scored 0.14, which falls just outside the target range, so it does not match the expected aesthetic well. A score between -0.2 and 0.1 would be considered very good.

Overall, the model seems to be better at understanding the scene than the camera position, but it needs improvement in capturing the desired aesthetic.
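
The score bands quoted in the breakdown can be written down as a small check. This is only a sketch: the function names are hypothetical, the post gives no label for camera or shot scores under 0.5 (called "below good" here), and it does not say whether the band boundaries are inclusive, so inclusive bounds are assumed.

```python
def rate_understanding(score):
    """Label a Camera Position or Shot Analysis score.

    Bands from the post: 0.5 to 0.75 is good, above 0.75 is very good.
    Anything lower gets the placeholder label "below good".
    """
    if score > 0.75:
        return "very good"
    if score >= 0.5:
        return "good"
    return "below good"


def aesthetic_in_target_band(score, low=-0.2, high=0.1):
    """True if an Aesthetic Analysis score lies in the 'very good' band."""
    return low <= score <= high


camera_label = rate_understanding(0.2)
shot_label = rate_understanding(0.48)
aesthetic_ok = aesthetic_in_target_band(0.14)

print("camera position:", camera_label)
print("shot analysis:", shot_label)
print("aesthetic within target band:", aesthetic_ok)
```

Applied to the reported scores, neither the camera position score (0.2) nor the shot score (0.48) reaches the "good" band, although the shot score misses it by only 0.02, and the aesthetic score (0.14) lies just above the upper edge of its target band.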
