AI's Artistic Journey: Capturing Poses, But Missing the Essence with Imagen-v3
9-minute read, 1904 words
Text-to-image models can now generate impressive images from written prompts, but capturing the essence of a scene, particularly its intended aesthetic, remains a challenge. This post reviews ten images that Imagen-v3 generated from structured prompts describing mood, subject, shot type, theme, setting, and style, and examines where the model realizes the artistic vision and where it falls short.
Created with: imagen-v3
A Moment of Awe: Hiker Contemplates the Majesty of Nature
A lone figure stands on a mountain peak, dwarfed by the vastness of the landscape. Clouds fill the valley below, and a distant mountain range stretches out on the horizon. The scene evokes a sense of serenity, majesty, and contemplation, highlighting the beauty and scale of nature.
Prompt
poses face-to-face: Determined, awe-inspiring ; A lone adventurer, standing on a mountain peak; wide shot; Adventure; Majestic mountain range with clouds swirling around; cinematic
Characteristic
Shot : A lone figure stands on a mountain peak overlooking a valley filled with clouds and a distant mountain range
Aesthetic Score : 0.8
Mood : serene, majestic, contemplative
Quality
Entropy : 6.78
Noise : 97
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.10
Image errors : No visible artifacts or errors.
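The post reports an Entropy value for each image without defining it. A common choice is the Shannon entropy of the 8-bit grayscale histogram, H = -Σ p_i log2 p_i, which ranges from 0 bits (a single flat tone) to 8 bits (all 256 levels equally frequent). Under that assumption, the 6.78 reported here would indicate a broad spread of tones. The following is a minimal sketch; `shannon_entropy` is a hypothetical helper that takes already-decoded grayscale pixel values.

```python
import math
from collections import Counter
from typing import Iterable


def shannon_entropy(pixels: Iterable[int]) -> float:
    """Shannon entropy, in bits, of a sequence of 8-bit grayscale values (0-255)."""
    counts = Counter(pixels)  # histogram: intensity level -> number of pixels
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot compute entropy of an empty image")
    entropy = 0.0
    for count in counts.values():
        p = count / total  # probability of this intensity level
        entropy -= p * math.log2(p)
    return entropy


if __name__ == "__main__":
    flat = [128] * 1024               # one tone everywhere
    uniform = list(range(256)) * 4    # every level equally frequent
    print(shannon_entropy(flat))      # 0.0
    print(shannon_entropy(uniform))   # 8.0
```

For a real image you would first decode it and convert it to grayscale with an imaging library; that step is left out to keep the sketch dependency-free.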
Lost in the Shadows: Four Women Face Unknown Terror in Dark Forest
A chilling image captures four young women trapped in a dense, ominous forest. Their faces, lit by a nearby clearing, reveal fear and distress. The dark, claustrophobic setting and dramatic use of light and shadow create a palpable sense of suspense.
Prompt
poses face-to-face: Suspenseful, mysterious ; A group of friends, huddled together in a dark forest; medium shot; Adventure; Tall trees casting long shadows, sunlight filtering through the leaves; cinematic
Characteristic
Shot : Four young women stand in a dark forest, their faces illuminated by the light of a nearby clearing. The women are wearing dark clothing and appear to be frightened or in distress. The background is a dense forest with tall trees and low-hanging branches, creating a sense of claustrophobia.
Aesthetic Score : 0.7
Mood : tense, suspenseful, foreboding
Quality
Entropy : 5.91
Noise : 84
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image quality is good; there are no noticeable artifacts or errors.
Clash of Titans: Two Warriors Locked in Fiery Confrontation
A tense standoff unfolds amid a fiery inferno. Two men, one helmeted and the other with a flowing beard, face off in a battle of wills. The dramatic lighting and close-up shot heighten the intensity, drawing the viewer into the heart of the action.
Prompt
poses face-to-face: Brave, intense ; A seasoned warrior, facing down a fearsome dragon; close-up; Heroism; Fiery dragon with glowing eyes, smoke billowing around; cinematic
Characteristic
Shot : Two men are locked in a fierce confrontation. One is wearing a helmet and the other has a beard and long hair. The background is dark and fiery with a sense of intensity.
Aesthetic Score : 0.7
Mood : intense, dramatic, confrontational
Quality
Entropy : 6.48
Noise : 99
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image has a few minor artifacts, such as the slight blurriness of the background and some noise around the edges of the characters. There are also some areas where the edges of the characters are not as smooth as they could be.
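This image has the lowest Prompt Clip Score in the set (0.27), and its prompt asks for a dragon that does not appear in the image. The post does not say which CLIP model or scaling produced the score. A common definition is the cosine similarity between CLIP's embedding of the image and its embedding of the prompt text. The sketch below assumes that definition and takes precomputed embeddings as plain lists, since producing real ones requires running a CLIP model; `prompt_clip_score` is a hypothetical name, and the toy 4-dimensional vectors merely stand in for real embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (range -1 to 1)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def prompt_clip_score(image_emb: Sequence[float], text_emb: Sequence[float]) -> float:
    """Hypothetical prompt-image score: raw cosine similarity of CLIP embeddings."""
    return cosine_similarity(image_emb, text_emb)


if __name__ == "__main__":
    # Toy vectors standing in for real CLIP image and text embeddings.
    image_emb = [0.2, 0.1, 0.9, 0.3]
    text_emb = [0.1, 0.8, 0.3, 0.4]
    print(round(prompt_clip_score(image_emb, text_emb), 3))
```

Because cosine similarity ignores vector length, the score reflects only how closely the two embeddings point in the same direction; some published variants rescale it and clip negative values to zero.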
Shadowy Figure in a Neon-Lit City: A Cyberpunk Mystery
A hooded figure sits hunched over a computer in a dimly lit room, their face obscured by the shadows. The city skyline outside is a blur of neon lights, hinting at a futuristic world. This mysterious scene evokes a sense of intrigue and danger, suggesting a clandestine operation or a hacker working in the shadows.
Prompt
poses face-to-face: Intense, driven, solitary ; A lone figure hunched over a glowing screen, fingers flying across the keyboard. The reflection of a neon-drenched cityscape flickers in the monitor.; cinematic
Characteristic
Shot : A hooded figure sits in front of a computer in a dimly lit room, facing away from the viewer, typing on a keyboard, with a city skyline blurred in the background. Neon lights illuminate the scene.
Aesthetic Score : 0.7
Mood : mysterious, cyberpunk, futuristic
Quality
Entropy : 6.40
Noise : 71
Prompt Clip Score : 0.32
AI Evaluation
Likelihood of AI : 0.80
Image errors : Minor noise in the background and slight blurriness around the edges, both common in AI-generated images.
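Every prompt in this post follows a semicolon-separated template after the "poses face-to-face:" prefix: mood, subject, shot type, theme, setting, and style. The cyberpunk prompt above is the exception. It has only three slots, folding the setting into the subject sentence and omitting the shot type and theme. A small parser makes such gaps easy to spot; the field names and the `parse_prompt` and `missing_fields` helpers are hypothetical labels, not part of any Imagen API.

```python
from typing import Dict, List

# Hypothetical names for the six slots of the prompt template used in this post.
FIELDS = ["mood", "subject", "shot", "theme", "setting", "style"]
PREFIX = "poses face-to-face:"


def parse_prompt(prompt: str) -> Dict[str, str]:
    """Split a template prompt into named fields."""
    if not prompt.startswith(PREFIX):
        raise ValueError("prompt does not use the 'poses face-to-face:' template")
    parts = [p.strip() for p in prompt[len(PREFIX):].split(";")]
    if len(parts) == len(FIELDS):
        return dict(zip(FIELDS, parts))
    if len(parts) < 2:
        raise ValueError("expected at least a mood and a style")
    # Non-standard prompt: keep the first part as mood, the last as style,
    # and fold anything in between into the subject.
    return {"mood": parts[0], "subject": "; ".join(parts[1:-1]), "style": parts[-1]}


def missing_fields(prompt: str) -> List[str]:
    """List the template slots that a prompt does not fill."""
    parsed = parse_prompt(prompt)
    return [f for f in FIELDS if f not in parsed]


if __name__ == "__main__":
    cyber = ("poses face-to-face: Intense, driven, solitary ; A lone figure hunched over "
             "a glowing screen, fingers flying across the keyboard. The reflection of a "
             "neon-drenched cityscape flickers in the monitor.; cinematic")
    print(missing_fields(cyber))  # ['shot', 'theme', 'setting']
```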
Love in the City of Light: A Moment of Intimacy at the Eiffel Tower
In this heartwarming scene, a couple shares a tender moment in front of the iconic Eiffel Tower. With the woman’s hand resting gently on the man’s shoulder and their eyes locked in a loving gaze, the image captures the essence of romance and happiness. The out-of-focus background lends a dreamy quality, emphasizing the couple’s connection and the enchanting atmosphere of Paris.
Prompt
poses face-to-face: Romantic, nostalgic ; A couple, gazing at each other in front of the Eiffel Tower; medium shot; Tourism; Romantic Parisian cityscape with the Eiffel Tower in the background; cinematic
Characteristic
Shot : A couple is standing in front of the Eiffel Tower and looking at each other. The woman has her hand on the man’s shoulder and they are both smiling. The background is out of focus.
Aesthetic Score : 0.7
Mood : romantic, loving, happy
Quality
Entropy : 6.79
Noise : 86
Prompt Clip Score : 0.36
AI Evaluation
Likelihood of AI : 0.20
Image errors : No major errors; colors are slightly faded.
Lost in Thought: A Moment of Intrigue in the Market
A young woman stands amidst the bustling energy of a market, her gaze fixed directly on the viewer. Her pensive expression, a blend of curiosity and thoughtfulness, invites us to wonder what secrets she holds. The scene is a captivating study in human emotion, leaving us with a lingering sense of mystery.
Prompt
poses face-to-face: Curious, vibrant ; A traveler, standing on a bustling street market; medium shot; Travel; Colorful stalls overflowing with exotic goods, people bustling around; cinematic
Characteristic
Shot : A young woman is standing in the middle of a crowded market street, looking directly at the camera with a pensive expression.
Aesthetic Score : 0.7
Mood : intrigued, thoughtful, curious
Quality
Entropy : 6.73
Noise : 105
Prompt Clip Score : 0.33
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image is slightly overexposed, and there is some noise in the background.
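This entry reports the highest Noise value in the set (105), and its error note mentions background noise, but the post does not define the metric. One widely used way to quantify image noise is Immerkær's fast noise-variance estimator: convolve the image with a Laplacian-difference mask that cancels flat regions and linear gradients, then scale the mean absolute response into an estimate of the Gaussian noise standard deviation. The sketch below is a plain-Python illustration of that technique only; `estimate_noise_sigma` is a hypothetical name, and the tool behind the post's numbers may use a different method and scale.

```python
import math
from typing import List

# Laplacian-difference mask from Immerkaer's estimator; it sums to zero.
MASK = [[1, -2, 1],
        [-2, 4, -2],
        [1, -2, 1]]


def estimate_noise_sigma(img: List[List[float]]) -> float:
    """Estimate the standard deviation of additive Gaussian noise in a grayscale image."""
    h, w = len(img), len(img[0])
    if h < 3 or w < 3:
        raise ValueError("image must be at least 3x3")
    total = 0.0
    # Convolve every interior pixel with the mask and accumulate absolute responses.
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            response = sum(
                MASK[dy][dx] * img[y + dy - 1][x + dx - 1]
                for dy in range(3)
                for dx in range(3)
            )
            total += abs(response)
    # Scale factor that turns the mean absolute response into a sigma estimate.
    return math.sqrt(math.pi / 2) * total / (6 * (w - 2) * (h - 2))


if __name__ == "__main__":
    flat = [[128.0] * 8 for _ in range(8)]
    print(estimate_noise_sigma(flat))  # 0.0
```

Because the mask cancels smooth shading, the estimate is driven mainly by pixel-level fluctuation, although strong edges and fine texture also contribute.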
Fear in the Shadows: A Haunting Encounter in the Dark Forest
A group of four, caught in the eerie glow of a campfire, stare out with expressions of fear and concern. The low-key lighting and close-up framing create a palpable sense of suspense and tension, leaving the viewer wondering what lurks in the darkness.
Prompt
poses face-to-face: Intimate, suspenseful ; A group of explorers, huddled around a campfire; medium shot; Adventure; Dark forest with flickering flames illuminating their faces; cinematic
Characteristic
Shot : A group of four people, two women and two men, stand in a dark forest lit by a fire in the foreground. All four look toward the viewer with expressions of fear and concern.
Aesthetic Score : 0.7
Mood : suspenseful, dark, mysterious
Quality
Entropy : 6.33
Noise : 103
Prompt Clip Score : 0.32
AI Evaluation
Likelihood of AI : 0.10
Image errors : No visible errors
A City That Reaches for the Stars
From a high vantage point, a woman gazes out at a sprawling metropolis. Towering skyscrapers pierce the sky, creating a sense of awe and wonder. The urban landscape, bathed in a futuristic glow, conveys both grandeur and possibility.
Prompt
poses face-to-face: Awe-inspiring, hopeful ; A woman, looking up at a towering skyscraper; wide shot; Tourism; Modern cityscape with towering skyscrapers and bustling streets; cinematic
Characteristic
Shot : A woman stands on a platform overlooking a bustling city street. Skyscrapers rise up on both sides, with one towering skyscraper in the center of the image.
Aesthetic Score : 0.7
Mood : urban, futuristic, awe
Quality
Entropy : 6.90
Noise : 104
Prompt Clip Score : 0.33
AI Evaluation
Likelihood of AI : 0.10
Image errors : There are no noticeable artifacts or errors in the image.
Victory Dance! Friends Celebrate a Gaming Triumph
This image captures the joy of a shared gaming experience as a group of friends celebrates a victory, with one woman’s excitement taking center stage. The vibrant lighting and framing emphasize the winning moment, creating a playful, joyful atmosphere.
Prompt
poses face-to-face: Joyful, celebratory ; A group of friends, celebrating a victory in a video game; close-up; Gaming; Brightly lit gaming room with controllers and headsets; cinematic
Characteristic
Shot : A group of friends are playing a video game, with one woman in the foreground, holding a controller and looking excited.
Aesthetic Score : 0.7
Mood : joyful, excited, playful
Quality
Entropy : 6.42
Noise : 81
Prompt Clip Score : 0.34
AI Evaluation
Likelihood of AI : 0.10
Image errors : No visible errors
Silhouetted Solitude: A Man’s Contemplation at Sunset
A solitary man stands on a deserted beach, his silhouette stark against the fiery hues of the setting sun. The scene evokes serene contemplation and quiet loneliness, leaving the viewer to ponder the man’s thoughts and the mysteries of the vast ocean before him.
Prompt
poses face-to-face: Melancholy, contemplative ; A lone traveler, standing on a deserted beach; wide shot; Travel; Vast ocean stretching out to the horizon, golden sunset; cinematic
Characteristic
Shot : A man standing on a beach at sunset, looking out at the ocean. The man is silhouetted against the setting sun, with only his outline visible. The beach is empty, and the water is calm.
Aesthetic Score : 0.7
Mood : serene, contemplative, lonely
Quality
Entropy : 6.53
Noise : 72
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.10
Image errors : No noticeable artifacts or errors
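The per-image metrics can also be read as a set. The post does not show how its summary scores were derived from these entries, so the sketch below simply aggregates the values reported above; the short labels and the `summarize` helper are hypothetical, and the 0.5 cut-off for "flagged as AI" is an arbitrary illustrative threshold.

```python
from statistics import mean

# Per-image metrics transcribed from the ten entries above:
# (short label, aesthetic score, entropy, noise, prompt CLIP score, likelihood of AI)
RESULTS = [
    ("hiker", 0.8, 6.78, 97, 0.31, 0.10),
    ("forest", 0.7, 5.91, 84, 0.30, 0.20),
    ("warriors", 0.7, 6.48, 99, 0.27, 0.80),
    ("cyberpunk", 0.7, 6.40, 71, 0.32, 0.80),
    ("eiffel", 0.7, 6.79, 86, 0.36, 0.20),
    ("market", 0.7, 6.73, 105, 0.33, 0.10),
    ("campfire", 0.7, 6.33, 103, 0.32, 0.10),
    ("skyline", 0.7, 6.90, 104, 0.33, 0.10),
    ("gaming", 0.7, 6.42, 81, 0.34, 0.10),
    ("beach", 0.7, 6.53, 72, 0.31, 0.10),
]


def summarize(rows):
    """Average each metric and pick out notable images."""
    return {
        "mean_aesthetic": round(mean(r[1] for r in rows), 3),
        "mean_entropy": round(mean(r[2] for r in rows), 3),
        "mean_noise": round(mean(r[3] for r in rows), 3),
        "mean_clip": round(mean(r[4] for r in rows), 3),
        "mean_ai_likelihood": round(mean(r[5] for r in rows), 3),
        "lowest_clip": min(rows, key=lambda r: r[4])[0],
        # Illustrative threshold: images the evaluator rated as more likely AI than not.
        "flagged_as_ai": [r[0] for r in rows if r[5] >= 0.5],
    }


if __name__ == "__main__":
    print(summarize(RESULTS))
```

Averaged this way, the aesthetic score is 0.71 and the prompt CLIP score is about 0.32. The warrior image has the weakest prompt match, and the two images rated 0.80 for likelihood of AI (the warriors and the cyberpunk scene) are also the two whose error notes describe blurriness and edge artifacts.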
Conclusion
The analysis shows that the model performed well on camera position and shot analysis but struggled with aesthetic analysis.
Here’s a breakdown:
- Camera Position: The model scored 0.53, which falls within the “good” range (0.5 to 0.75). This suggests that the model largely reproduced the camera positions described in the prompts.
- Shot Analysis: The model scored 0.51, also within the “good” range. This suggests that the model understood the scenes described in the prompts and produced images that reflect them.
- Aesthetic Analysis: The model scored 0.02, far below its camera-position and shot-analysis scores. Although 0.02 technically falls inside the -0.2 to 0.1 band labeled “very good”, the gap relative to the other two measures indicates that the generated images matched the expected aesthetic less closely than they matched the requested camera positions and scenes.
Overall, the model demonstrates a good understanding of camera positions and scene composition, but needs improvement in capturing the desired aesthetic.