Imagen-v2 Captures the Scene, But Struggles with Camera Angles
Facial expressions are a powerful tool in storytelling, conveying emotions and intentions with a single glance. Dramatic facial expressions, in particular, can heighten the impact of a scene, drawing the viewer into the story. This is where generative AI comes in, offering the potential to create images with specific facial expressions and camera angles. However, as our analysis reveals, the model reproduces the described scene more reliably than the requested camera position. We explore the nuances of this technology, examining its strengths and weaknesses in capturing the essence of dramatic facial expressions.
Created with: imagen-v2
A Moment of Mystery: A Woman’s Smile in a Cafe
A close-up shot captures a woman’s subtle smile as she gazes off-screen, creating a sense of intrigue and wonder. The relaxed cafe setting adds to the contemplative mood, leaving viewers to ponder what has caught her attention.
Prompt
facial-expressions Embarrassment: Awkward and self-conscious ; A single woman; eye-level; Single Persons; A crowded cafe with loud chatter and laughter; cinematic
Characteristic
Shot : A woman sitting in a cafe looking to her left with a slight smile on her face
Aesthetic Score : 0.7
Mood : relaxed, contemplative, happy
Quality
Entropy : 6.70
Noise : 89
Prompt Clip Score : 0.20
AI Evaluation
Likelihood of AI : 0.20
Image errors : some noise and blurriness, particularly around the edges of the image
The Star-Spangled Defender Stands Ready
A close-up shot captures the intensity of a superhero in a red and blue costume, his serious expression hinting at the impending danger. The city street behind him provides a backdrop of urban grit, emphasizing the weight of his responsibility.
Prompt
facial-expressions Embarrassment: Humiliated and exposed ; A superhero in a full costume; eye-level; Heroes; A bustling city street with people staring; cinematic
Characteristic
Shot : A superhero in a red and blue costume, with a cape, stands in front of a blurry city background. The superhero is looking directly at the viewer.
Aesthetic Score : 0.7
Mood : heroic, confident, powerful
Quality
Entropy : 6.74
Noise : 55
Prompt Clip Score : 0.20
AI Evaluation
Likelihood of AI : 0.90
Image errors : The image is slightly blurry, particularly the background.
A Moment of Surprise at the Formal Dinner
A man in a tuxedo sits at a dinner table, his expression a mixture of surprise and curiosity. His gaze is directed to the left, drawing the viewer’s attention to the unseen event that has caught his eye. The formal setting and the man’s dramatic reaction create a captivating scene.
Prompt
facial-expressions Embarrassment: Mortified and ashamed ; A man in a business suit; eye-level; Normal People; A formal dinner party with elegant guests; cinematic
Characteristic
Shot : A man in a tuxedo, sitting at a table in a dimly lit room, looking slightly confused or surprised. The background is blurred, suggesting a formal event or gathering.
Aesthetic Score : 0.7
Mood : intrigued, formal, dramatic
Quality
Entropy : 6.65
Noise : 121
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image appears to be slightly over-sharpened, resulting in some haloing around the subject’s edges.
Lost in the Game: A Moment of Defeat
A young man, slumped in his gaming chair, headphones on, his face etched with sadness and disappointment. The dimly lit room amplifies the sense of tension and frustration, capturing a raw moment of defeat in the digital world.
Prompt
facial-expressions Embarrassment: Cringing and defeated ; A gamer in a gaming chair; eye-level; Gamer; A dimly lit room with flashing screens and empty pizza boxes; cinematic
Characteristic
Shot : A young man wearing headphones sits in a gaming chair in a dimly lit room. His expression is one of sadness and disappointment.
Aesthetic Score : 0.4
Mood : sad, disappointed, frustrated
Quality
Entropy : 6.33
Noise : 85
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.75
Image errors : The image has a few artifacts. It is slightly blurry, which could be due to the lighting or the camera used. There are also some unnatural edges and textures, particularly around the subject’s hair. The image appears to be AI-generated.
Lost in the Crowd: A Moment of Melancholy
A woman in a white dress stands amidst a blurred crowd, her expression a study in quiet despair. The muted colors and her solitary figure evoke a sense of unease and mystery, leaving the viewer to ponder her story.
Prompt
facial-expressions Embarrassment: Lonely and out of place ; A woman in a wedding dress; eye-level; Single Persons; A crowded wedding reception with happy couples; cinematic
Characteristic
Shot : A woman in a white dress is standing at a party. She is looking away from the camera, and she appears to be sad. The background is out of focus, and there are other people in the background.
Aesthetic Score : 0.6
Mood : sad, melancholic, contemplative
Quality
Entropy : 6.78
Noise : 98
Prompt Clip Score : 0.19
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image has some slight artifacts, but they are not noticeable unless you zoom in.
Superman: The Hero’s Gaze
A close-up shot captures the intensity in Superman’s eyes after a fierce battle, the crowd behind him a testament to the hero’s impact. The dramatic lighting and the hero’s unwavering gaze create a powerful and emotional image.
Prompt
facial-expressions Embarrassment: Embarrassed and self-conscious ; A superhero in a cape; eye-level; Heroes; A cheering crowd at a victory parade; cinematic
Characteristic
Shot : A close-up portrait of Superman against a blurred background of a cheering crowd. He is looking at the viewer with a fierce and determined expression, his mouth slightly open in a faint grin that suggests heroic determination.
Aesthetic Score : 0.7
Mood : intense, heroic, powerful
Quality
Entropy : 6.45
Noise : 83
Prompt Clip Score : 0.22
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image has some artifacts in the form of small, light-colored specks around Superman’s chest and neck area. These specks look like digital noise or a result of over-sharpening.
A Moment of Worry
A young woman, her face etched with concern, sits alone in a dimly lit cafe. Her floral dress and the blurred background suggest a sense of normalcy, yet her worried expression hints at a hidden tension. What is she waiting for, and what troubles weigh on her mind?
Prompt
facial-expressions Embarrassment: Uncomfortable and out of place ; A woman in a casual outfit; eye-level; Normal People; A fancy restaurant with white tablecloths and expensive wine; cinematic
Characteristic
Shot : A young woman in a floral dress sitting at a table, seemingly surprised or scared, with a soft focus background.
Aesthetic Score : 0.6
Mood : tense, worried, intimate
Quality
Entropy : 6.86
Noise : 98
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image is slightly blurry, especially in the background. The lighting is uneven, casting shadows on the woman’s face.
Lost in the Neon Glow: A Figure of Mystery Emerges
A hooded figure stands alone, bathed in the vibrant, blurry lights of a bustling crowd. The scene evokes a sense of mystery and intensity, hinting at a story waiting to unfold in this futuristic landscape.
Prompt
facial-expressions Embarrassment: Humiliated and defeated ; A gamer in a hoodie; eye-level; Gamer; A crowded esports tournament with loud cheers and flashing lights; cinematic
Characteristic
Shot : A young man in a hooded sweatshirt stands in a crowded venue with a dramatic lighting effect.
Aesthetic Score : 0.7
Mood : mysterious, moody, intense
Quality
Entropy : 6.44
Noise : 84
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.50
Image errors : The image has some minor artifacts, particularly in the background, and the man’s skin looks slightly over-processed. The background seems somewhat blurry and unrealistic, which makes the photo look less authentic.
A Dinner Date Gone Wrong: What Secret Lies Beneath?
A romantic dinner takes a sudden turn as a woman’s surprised expression and a man’s concerned gaze create an atmosphere of suspense and intrigue. What secret has been revealed, and what will happen next? The candles flicker, casting long shadows on the couple’s faces, adding to the mystery.
Prompt
facial-expressions Embarrassment: Awkward and uncomfortable ; A man in a tuxedo; eye-level; Single Persons; A romantic dinner for two with candles and flowers; cinematic
Characteristic
Shot : A man in a tuxedo looks at a woman with a surprised expression, and the woman looks back at him, equally surprised. They are both indoors.
Aesthetic Score : 0.6
Mood : suspense, drama, intrigue
Quality
Entropy : 6.65
Noise : 72
Prompt Clip Score : 0.21
AI Evaluation
Likelihood of AI : 0.10
Image errors : No noticeable errors
Superhero Close-Up: Intensity in the City
A dramatic close-up portrait captures a superhero’s intense gaze, shrouded in mystery against a blurred city backdrop. The mood is serious and suspenseful, leaving you wondering what challenges lie ahead.
Prompt
facial-expressions Embarrassment: Mortified and ashamed ; A superhero in a mask; eye-level; Heroes; A news conference with reporters asking difficult questions; cinematic
Characteristic
Shot : Close-up portrait of a superhero wearing a blue and red costume with a mask, standing in front of a blurred background of people.
Aesthetic Score : 0.6
Mood : intense, serious, dramatic
Quality
Entropy : 6.48
Noise : 49
Prompt Clip Score : 0.22
AI Evaluation
Likelihood of AI : 0.40
Image errors : The image has some minor artifacts, particularly around the edges of the superhero’s mask. There is also a slight blurriness to the image, which may be intentional but could be improved.
Conclusion
The results show that the generative AI model performed well in understanding the scene and matching the expected aesthetic, but struggled with the camera position. Here’s a breakdown:
- Camera Position: The model scored 0.25, which is below average. This suggests that the model didn’t accurately capture the intended camera position described in the prompt.
- Shot Analysis: The model scored 0.675, which is considered good. This indicates that the model was able to understand the scene described in the prompt and create a shot that aligns with it.
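- Aesthetic Analysis: The model scored 0.11, which is considered very good. Since such a low value is rated very good, this score appears to measure the gap between the generated and the expected aesthetic, so lower is better here. This means that the generated image’s aesthetic closely matched the expected aesthetic described in the prompt.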
Overall, the model demonstrated a good understanding of the scene and shot composition, but struggled with accurately capturing the intended camera position. The aesthetic of the generated image was very close to the expected aesthetic.
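To put the individual image reports side by side, the per-image values listed above can be aggregated with a short script. The following is a minimal sketch: the numbers are copied from the ten reports in this post, while the dictionary keys and the `summarize` helper are illustrative names chosen for this example, not part of any evaluation tool used here.

```python
from statistics import mean

# Per-image values copied from the ten image reports above, in order of appearance.
# The key names are illustrative assumptions, not identifiers from a real tool.
reports = {
    "aesthetic_score": [0.7, 0.7, 0.7, 0.4, 0.6, 0.7, 0.6, 0.7, 0.6, 0.6],
    "prompt_clip_score": [0.20, 0.20, 0.25, 0.28, 0.19, 0.22, 0.24, 0.30, 0.21, 0.22],
    "likelihood_of_ai": [0.20, 0.90, 0.20, 0.75, 0.20, 0.80, 0.20, 0.50, 0.10, 0.40],
    "entropy": [6.70, 6.74, 6.65, 6.33, 6.78, 6.45, 6.86, 6.44, 6.65, 6.48],
    "noise": [89, 55, 121, 85, 98, 83, 98, 84, 72, 49],
}


def summarize(values):
    """Return the minimum, mean (rounded to 3 decimals), and maximum of a metric."""
    return {"min": min(values), "mean": round(mean(values), 3), "max": max(values)}


# One summary entry per metric.
summary = {name: summarize(values) for name, values in reports.items()}

if __name__ == "__main__":
    for name, stats in summary.items():
        print(f"{name:18s} min={stats['min']:<6} mean={stats['mean']:<6} max={stats['max']}")
```

Across the ten images, the mean Aesthetic Score comes out to 0.63 and the mean Prompt Clip Score to about 0.23. These averages are not the same quantities as the three scores in the breakdown above, whose calculation this post does not describe.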