AI's Artistic Struggle: Capturing the Scene, Not the Feeling with Imagen-v3
- 9 min read - 1846 words
Text-to-image generation is a rapidly evolving area of artificial intelligence. Despite impressive progress, capturing the nuances of human perception and artistic expression remains a challenge. This post examines images that Imagen 3 generated from detailed scene descriptions, highlighting where the model captures the essence of a scene and where it falls short.
Created with: imagen-v3
Awe-Inspiring Sunset on a Mountain Peak
Two figures stand silhouetted against a breathtaking sunset, overlooking a sea of clouds. The vastness of the scene evokes a sense of wonder and perspective, capturing the majestic beauty of nature at its finest.
Prompt
poses looking-at-each-other: determined, awe-inspired ; A lone adventurer, standing on a mountain peak; wide shot; adventure; a vast, breathtaking landscape with clouds swirling below; cinematic
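Every prompt in this post follows the same semicolon-separated layout: a pose with its emotions, then subject, camera framing, genre, setting, and a style tag. The sketch below assembles a prompt from those slots. The field names and the `ScenePrompt` class are hypothetical labels inferred from the prompts shown here; they are not part of any Imagen API.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ScenePrompt:
    # Hypothetical slot names inferred from the prompts in this post.
    pose: str                # e.g. "looking-at-each-other"
    emotions: Sequence[str]  # e.g. ("determined", "awe-inspired")
    subject: str
    shot: str                # camera framing: "wide shot", "medium shot", "close-up"
    genre: str
    setting: str
    style: str = "cinematic"

    def render(self) -> str:
        # Reproduces the post's layout, including the space before the first ";".
        return (
            f"poses {self.pose}: {', '.join(self.emotions)} ; "
            f"{self.subject}; {self.shot}; {self.genre}; {self.setting}; {self.style}"
        )


# Rebuild the first prompt above from its slots.
prompt = ScenePrompt(
    pose="looking-at-each-other",
    emotions=("determined", "awe-inspired"),
    subject="A lone adventurer, standing on a mountain peak",
    shot="wide shot",
    genre="adventure",
    setting="a vast, breathtaking landscape with clouds swirling below",
)
print(prompt.render())
```

Laying the slots out separately makes conflicts easier to spot: here the pose slot asks for figures looking at each other while the subject slot asks for a lone adventurer, which may explain why the image shows two people.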
Characteristic
Shot : Two figures standing on a mountain peak overlooking a sea of clouds at sunset.
Aesthetic Score : 0.8
Mood : serene, majestic, awe-inspiring
Quality
Entropy : 6.67
Noise : 68
Prompt Clip Score : 0.31
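The post does not define its Quality metrics. As a rough guide, the sketch below implements two common definitions, which are assumptions rather than confirmed methods: Entropy as the Shannon entropy of an 8-bit grayscale histogram (0 to 8 bits), and Prompt Clip Score as the cosine similarity between CLIP image and text embeddings. The Noise figure has no single standard definition, so it is left out.

```python
import math
from collections import Counter
from typing import Sequence


def shannon_entropy(pixels: Sequence[int]) -> float:
    """Shannon entropy in bits of 8-bit grayscale pixel values (0..255).

    Returns 0 for a perfectly flat image and at most 8 when all 256
    levels occur equally often.
    """
    if not pixels:
        raise ValueError("empty image")
    n = len(pixels)
    counts = Counter(pixels)  # histogram of pixel values
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (e.g. CLIP image and text)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero-length vector")
    return dot / (norm_a * norm_b)


# Toy inputs only; real embeddings would come from a CLIP model.
print(shannon_entropy([0, 255] * 50))         # two equally common levels
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors
```

If Entropy is computed this way, values of 6 to 7 bits indicate a broad tonal range. Raw CLIP cosine similarities for well-matched image-text pairs are often around 0.3, which would fit the 0.28 to 0.37 range reported in this post, though the post does not say which CLIP variant or scaling was used.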
AI Evaluation
Likelihood of AI : 0.10
Image errors : No noticeable image errors.
A Moment of Truth on the Battlefield
Two soldiers locked in a tense confrontation amidst the chaos of battle. The smoke and blurred figures in the background create a sense of urgency and danger, highlighting the intensity of the moment.
Prompt
poses looking-at-each-other: tense, hopeful ; Two soldiers, one injured, the other holding a shield; medium shot; heroism; a battlefield with smoke and fire in the background; cinematic
Characteristic
Shot : Two soldiers in military uniforms, likely from the 19th or early 20th century, face each other in a tense confrontation. They appear to be in the midst of a battlefield, with smoke and the blurred figures of other soldiers in the background.
Aesthetic Score : 0.7
Mood : intense, dramatic, war-torn
Quality
Entropy : 6.54
Noise : 95
Prompt Clip Score : 0.36
AI Evaluation
Likelihood of AI : 0.20
Image errors : No significant image errors. The image is crisp with no noticeable artifacts.
The Intensity of the Game is Palpable
Two young men, bathed in the glow of blue and red lights, are locked in a fierce competition. Their faces, illuminated by the dramatic lighting, reveal a mix of focus and intensity as they engage in a thrilling gaming session.
Prompt
poses looking-at-each-other: intense, focused ; Two gamers, heads bent over a screen; close-up; gaming; a dimly lit room with neon lights reflecting on their faces; cinematic
Characteristic
Shot : Close-up of two young men in a dimly lit room, lit by blue and red lights. They appear to be gaming or watching something on a screen, focused on the action. The focus is on the face of the man on the right.
Aesthetic Score : 0.6
Mood : intense, focused, competitive
Quality
Entropy : 6.37
Noise : 75
Prompt Clip Score : 0.33
AI Evaluation
Likelihood of AI : 0.10
Image errors : Slight compression artifacts are visible, particularly around the edges of the faces and the clothing. Otherwise, there are no significant errors.
What Did They See? Shocked Reactions on a Busy City Street
A group of four young adults, caught in a moment of shared surprise, stand on a bustling city street. Their expressions of shock and wide-eyed wonder leave viewers curious about what has captured their attention. The scene, with its vibrant background and dramatic reactions, hints at an unexpected event or a thrilling discovery.
Prompt
poses looking-at-each-other: excited, curious ; A group of tourists, standing in front of a famous landmark; medium shot; tourism; a bustling city street with people and vehicles passing by; cinematic
Characteristic
Shot : A group of four young adults, three men and one woman, stand on a city street. They are all looking at something off-camera with expressions of shock and surprise. The background is a busy street with buildings and an archway.
Aesthetic Score : 0.6
Mood : surprise, shock, excitement
Quality
Entropy : 6.82
Noise : 83
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image quality is slightly grainy. The colors are a little muted and the exposure is a bit dark.
A Moment of Shared Reflection on a Train Journey
Two women, lost in thought, gaze out the window of a moving train, their expressions hinting at a shared melancholic contemplation. The passing countryside adds a sense of tranquility to the intimate scene, highlighting the quiet connection between the two figures.
Prompt
poses looking-at-each-other: reflective, nostalgic ; Two friends, sitting on a train, looking out the window; medium shot; travel; a scenic landscape with rolling hills and fields; cinematic
Characteristic
Shot : Two women are sitting on a train, looking out the window at the countryside in the distance.
Aesthetic Score : 0.7
Mood : melancholic, contemplative, introspective
Quality
Entropy : 5.80
Noise : 65
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.00
Image errors : Slight chromatic aberration at the edges of the image.
A Midnight Serenade: A Romantic Rendezvous in the Heart of the Forest
In the heart of a moonlit forest, a couple shares an intimate moment, bathed in the warm glow of a lantern. The interplay of light and shadow adds a touch of mystery to their romantic rendezvous, leaving their emotions to the viewer’s interpretation.
Prompt
poses looking-at-each-other: warm, intimate ; A group of friends, huddled together around a campfire; close-up; groups; a dark forest with stars twinkling in the sky; cinematic
Characteristic
Shot : A couple is standing in a forest at night, illuminated by a warm light source, possibly a fire or lantern.
Aesthetic Score : 0.7
Mood : romantic, intimate, mysterious
Quality
Entropy : 5.25
Noise : 85
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are no noticeable errors in the image.
Silhouettes of Solitude: A Melancholy Sunset on the Coast
Two figures stand silhouetted against a dramatic sunset, their forms merging with the rugged rock formations in the distance. The scene evokes a sense of melancholy and solitude, with the mysterious lighting adding an air of intrigue.
Prompt
poses looking-at-each-other: melancholy, contemplative ; A lone figure, standing on a deserted beach; wide shot; adventure; a vast ocean with crashing waves and a setting sun; cinematic
Characteristic
Shot : Two figures standing on a beach with a dramatic sunset and rock formations in the distance.
Aesthetic Score : 0.7
Mood : melancholy, solitude, mysterious
Quality
Entropy : 6.73
Noise : 77
Prompt Clip Score : 0.33
AI Evaluation
Likelihood of AI : 0.60
Image errors : Some of the shadows are a bit too sharp and unnatural, possibly a result of post-processing. The figure on the right has a very sharp line along the left side of the body; this could be an image-editing artifact or a result of AI generation.
Two Astronauts, One Earth, Infinite Possibilities
A captivating image of two astronauts in space suits, facing each other against the backdrop of Earth. The scene evokes a sense of wonder, mystery, and hope, leaving viewers pondering the relationship between the astronauts and the vastness of space.
Prompt
poses looking-at-each-other: awe-inspired, hopeful ; Two astronauts, floating in space; medium shot; heroism; a view of Earth from space with stars and galaxies in the background; cinematic
Characteristic
Shot : Two astronauts in space suits face each other, with Earth in the background.
Aesthetic Score : 0.7
Mood : mysterious, futuristic, hopeful
Quality
Entropy : 5.98
Noise : 93
Prompt Clip Score : 0.34
AI Evaluation
Likelihood of AI : 0.80
Image errors : There are some minor artifacts in the image, particularly in the background. The lighting on the astronauts’ faces could be improved.
Danger Lurks in the Jungle: What Lies Ahead for These Explorers?
Four young adventurers, clad in safari gear, stand amidst the vibrant green foliage of a dense jungle. Their expressions, a mix of concern and anticipation, hint at the unknown dangers that may lie ahead. The suspenseful atmosphere leaves viewers wondering what awaits these explorers in the heart of the wild.
Prompt
poses looking-at-each-other: curious, adventurous ; A group of explorers, standing in a jungle clearing; medium shot; adventure; lush greenery with sunlight filtering through the leaves; cinematic
Characteristic
Shot : Four young adults, dressed in safari gear, stand in a lush green jungle. They are looking off-camera with expressions of concern and anticipation.
Aesthetic Score : 0.6
Mood : suspenseful, adventurous, apprehensive
Quality
Entropy : 6.61
Noise : 106
Prompt Clip Score : 0.33
AI Evaluation
Likelihood of AI : 0.10
Image errors : No major artifacts or errors; the image is slightly grainy.
A Moment of Intimacy: Silhouetted Romance Against City Lights
In this captivating scene, a couple shares a romantic moment on a bridge at night, their silhouettes framed against the dreamy bokeh of city lights. The intimate atmosphere is heightened by the dramatic contrast of darkness and distant illumination, creating a truly unforgettable image.
Prompt
poses looking-at-each-other: romantic, intimate ; Two lovers, standing on a bridge overlooking a city; medium shot; tourism; a cityscape with twinkling lights and a river flowing below; cinematic
Characteristic
Shot : A couple is standing on a bridge at night, silhouetted against the city lights in the background. They are looking into each other’s eyes, with a sense of romance and intimacy. The bokeh of the city lights creates a dreamy and romantic atmosphere.
Aesthetic Score : 0.7
Mood : romantic, intimate, dreamy
Quality
Entropy : 6.06
Noise : 80
Prompt Clip Score : 0.37
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image has a slight amount of noise, especially in the darker areas, likely due to high ISO.
Conclusion
Looking at the individual images, the model appears to handle the scene and camera position well and to struggle with the aesthetic aspect, as the title suggests. The aggregate scores, however, point the other way. Here's a breakdown:
- Camera Position: The model scored 0.3, which is considered below average. This suggests that the model didn’t accurately capture the intended camera position described in the prompt.
- Shot Analysis: The model scored 0.475, which is also below average. This indicates that the model didn’t fully understand the scene described in the prompt and didn’t create the expected shot composition.
- Aesthetic Analysis: The model scored 0.01, which is considered very good. This means the generated images closely matched the expected aesthetic style.
Overall, the model seems to be better at capturing the desired aesthetic than understanding the scene and camera position. This suggests that the model might need further training to improve its ability to interpret and translate prompts into accurate visual representations.
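The camera-position, shot, and aesthetic-analysis scores are not derived anywhere in the post. For comparison, the sketch below simply summarizes the per-image Aesthetic Score, Prompt Clip Score, and Likelihood of AI values reported for the ten images; it does not reproduce the three aggregate scores, whose method is not described.

```python
from statistics import mean

# Per-image values copied from the ten entries above, in the order they appear.
scores = {
    "aesthetic": [0.8, 0.7, 0.6, 0.6, 0.7, 0.7, 0.7, 0.7, 0.6, 0.7],
    "prompt_clip": [0.31, 0.36, 0.33, 0.31, 0.31, 0.28, 0.33, 0.34, 0.33, 0.37],
    "likelihood_of_ai": [0.10, 0.20, 0.10, 0.20, 0.00, 0.20, 0.60, 0.80, 0.10, 0.10],
}

# Mean, minimum, and maximum for each metric.
summary = {
    name: {"mean": round(mean(values), 3), "min": min(values), "max": max(values)}
    for name, values in scores.items()
}

for name, stats in summary.items():
    print(f"{name:>16}: mean={stats['mean']:.3f} min={stats['min']:.2f} max={stats['max']:.2f}")
```

On these numbers the mean Aesthetic Score is 0.68, the mean Prompt Clip Score is 0.327, and the mean Likelihood of AI is 0.24, with only the coastal sunset and astronaut images flagged above 0.5.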