AI's Artistic Journey: Capturing Poses, But Missing the Vibe with Imagen-v2
Text-to-image generation is advancing rapidly. However, capturing the nuances of artistic expression, particularly the aesthetic qualities of a scene, remains a significant challenge. This blog post walks through an experiment in which an AI model was asked to generate images from prompts describing specific poses and scenes, and shows both its strengths and its limitations in capturing the intended artistic vision.
Created with: imagen-v2
A Lone Warrior’s Triumph in the Setting Sun
A solitary warrior stands tall amidst a battlefield littered with fallen foes. The golden light of the setting sun paints the scene in an epic glow, highlighting the warrior’s victory and the dramatic scale of the battle.
Prompt
poses fighting: epic, determined ; A warrior; wide shot; heroism; a desolate battlefield with the setting sun in the background; cinematic
Characteristic
Shot : A warrior in full armor stands over a battlefield, his spear pointed towards the sky, after defeating his enemies. The sky is a warm orange color, and the ground is covered in sand and the bodies of fallen warriors.
Aesthetic Score : 0.7
Mood : epic, dramatic, victorious
Quality
Entropy : 6.86
Noise : 59
Prompt Clip Score : 0.20
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image has a few minor artifacts, such as some blurring around the edges of the warrior’s armor. The background is also a bit blurry, which could be improved by using a sharper lens.
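The post does not say how the Entropy value is computed. One common definition, assumed here, is the Shannon entropy of the image's grayscale histogram, which ranges from 0 to 8 bits for 8-bit pixels; values between 6 and 7, like those reported in this post, would be consistent with that scale. A minimal sketch under that assumption (the function name and the sample data are illustrative):

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of a flat sequence of intensity values.

    Assumption: the blog's "Entropy" metric is histogram entropy;
    the source does not confirm this.
    """
    total = len(pixels)
    if total == 0:
        raise ValueError("pixels must not be empty")
    counts = Counter(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Illustrative inputs, not taken from the generated images.
flat_gray = [128] * 1000          # a single gray level
all_levels = list(range(256)) * 4  # every 8-bit level equally often

print(shannon_entropy(flat_gray))
print(shannon_entropy(all_levels))
```

A single-tone image carries no histogram information, while an image that uses all 256 levels equally reaches the 8-bit maximum, so a higher value means a more varied tonal distribution.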
Lost in the Jungle: A Moment of Suspense
Two figures, a man and a woman, stand amidst the dense foliage of a jungle, their expressions hinting at a brewing mystery. The man’s machete and the woman’s watchful gaze add to the sense of suspense, leaving the viewer wondering what lies ahead in this adventurous journey.
Prompt
poses fighting: intense, adventurous ; A group of adventurers; medium shot; adventure; a dense jungle with ancient ruins in the distance; cinematic
Characteristic
Shot : Two people, a man and a woman, are walking through a lush green jungle. The man is holding a large knife, while the woman is looking over her shoulder. The scene is dark and mysterious.
Aesthetic Score : 0.6
Mood : mysterious, adventurous, tense
Quality
Entropy : 6.73
Noise : 107
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.30
Image errors : The image has some minor artifacts, particularly in the shadows and highlights. The foliage in the background is also slightly blurry.
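The prompts in this post all share the same semicolon-separated layout: a pose category with mood words, then a subject, a shot type, a theme, a setting, and a style. The sketch below splits a prompt into those parts. The field names are hypothetical labels chosen for illustration; the post itself does not name them.

```python
from dataclasses import dataclass


@dataclass
class PosePrompt:
    # Field names are assumptions; the post does not label the prompt parts.
    pose_category: str
    moods: list
    subject: str
    shot: str
    theme: str
    setting: str
    style: str


def parse_prompt(prompt: str) -> PosePrompt:
    """Split a 'category: moods ; subject; shot; theme; setting; style' prompt."""
    parts = [part.strip() for part in prompt.split(";")]
    if len(parts) != 6:
        raise ValueError(f"expected 6 ';'-separated parts, got {len(parts)}")
    head, subject, shot, theme, setting, style = parts
    category, _, mood_text = head.partition(":")
    moods = [m.strip() for m in mood_text.split(",") if m.strip()]
    return PosePrompt(category.strip(), moods, subject, shot, theme, setting, style)


jungle = parse_prompt(
    "poses fighting: intense, adventurous ; A group of adventurers; medium shot; "
    "adventure; a dense jungle with ancient ruins in the distance; cinematic"
)
print(jungle.moods, jungle.shot, jungle.setting)
```

Having the parts separated makes it easier to compare what was requested (for example "medium shot" and a group of adventurers) with what the Shot description reports for the generated image.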
Cyberpunk Intensity: A Look into the Future
A female warrior, clad in futuristic armor, stares directly at the viewer with an intense expression. The blurred neon-lit background adds to the sense of danger and intrigue in this cyberpunk scene.
Prompt
poses fighting: dynamic, futuristic ; An character; close-up; gaming; a neon-lit cityscape with holographic projections; cinematic
Characteristic
Shot : A woman in futuristic armor stands in front of a blurry city background, possibly a cyberpunk scene.
Aesthetic Score : 0.8
Mood : dark, mysterious, futuristic
Quality
Entropy : 6.44
Noise : 71
Prompt Clip Score : 0.23
AI Evaluation
Likelihood of AI : 0.90
Image errors : Some minor artifacts are visible in the armor and the background, particularly around the edges.
Clash in the Marketplace: Violence Erupts Amidst Colorful Chaos
A tense confrontation unfolds in a bustling marketplace, as a woman strikes a man with her forearm. The vibrant backdrop of colorful fabrics and textiles adds a stark contrast to the raw intensity of the physical struggle, highlighting the conflict and violence brewing within the scene.
Prompt
poses fighting: chaotic, humorous ; Two people; medium shot; tourism; a bustling marketplace with colorful stalls and vibrant crowds; cinematic
Characteristic
Shot : A man and a woman are in a fight. The woman is holding the man’s head while she leans in to hit him. The background is a colorful market, with red, orange, and blue striped awnings. There are tables and objects in the background, but the focus is on the two figures.
Aesthetic Score : 0.6
Mood : intense, confrontational, dramatic
Quality
Entropy : 6.78
Noise : 107
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image is slightly blurry and some areas have noticeable pixelation. The colors are slightly muted and the lighting is flat.
A Lone Figure in the Vastness: Hope in the Desert
A solitary man, armed and determined, traverses a desolate desert landscape. The dramatic contrast between his small figure and the vast emptiness evokes a sense of adventure and hope, hinting at a journey filled with challenges and possibilities.
Prompt
poses fighting: isolated, desperate ; A lone traveler; long shot; travel; a vast desert landscape with a lone sand dune in the foreground; cinematic
Characteristic
Shot : A lone figure in a desert landscape, walking up a dune. The figure appears to be a man, wearing a brown shirt and pants. He is carrying a backpack and a weapon, possibly a sword. The sky is a soft blue, with some white clouds.
Aesthetic Score : 0.7
Mood : adventurous, dramatic, desolate
Quality
Entropy : 6.61
Noise : 88
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.30
Image errors : The image is slightly overexposed and the colors are muted. The sand appears to be a bit too smooth, lacking any texture.
Urban Groove: Young Adults Dance Against the City Lights
A group of young adults capture the energy of the city with their dynamic dance routine on a rooftop overlooking the skyline. The warm lighting and contrasting angles create a sense of urban cool and playful energy.
Prompt
poses fighting: energetic, playful ; A group of friends; medium shot; groups; a rooftop overlooking a city skyline at night; cinematic
Characteristic
Shot : Four young adults are standing on a rooftop with a city skyline in the background. They appear to be having a lively conversation or possibly arguing, and the overall atmosphere feels tense.
Aesthetic Score : 0.6
Mood : tense, dramatic, urban
Quality
Entropy : 6.53
Noise : 102
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image appears slightly blurry and there are some minor artifacts in the shadows of the figures. The overall image quality seems a bit grainy.
The Warrior’s Gaze: A Moment of Intensity on the Battlefield
A fierce warrior, cloaked in red and with a determined expression, stands amidst the chaos of battle. Smoke and fire engulf the background, creating a dramatic and intense scene. The camera captures the warrior’s gaze directly, drawing the viewer into the heart of the action.
Prompt
poses fighting: tragic, determined ; A lone warrior; close-up; heroism; a burning village with smoke billowing in the air; cinematic
Characteristic
Shot : A warrior with a beard and a scar on his face, wearing armor and a red cape, stands in the middle of a battle scene. There is fire in the background and other warriors in the distance.
Aesthetic Score : 0.75
Mood : intense, epic, gritty
Quality
Entropy : 6.76
Noise : 109
Prompt Clip Score : 0.21
AI Evaluation
Likelihood of AI : 0.20
Image errors : There is some blur in the background and the armor is slightly overexposed.
Clash in the Shadows: Explorers Battle in a Torched Cave
Two explorers, clad in rugged attire, engage in a fierce struggle within a dimly lit cave. The flickering flames of torches cast dramatic shadows, heightening the tension and suspense of the scene. The contrasting light and dark, coupled with the figures’ tense body language, create a palpable sense of danger and adventure.
Prompt
poses fighting: suspenseful, adventurous ; explorers; wide shot; adventure; a dark cave with flickering torches and mysterious shadows; cinematic
Characteristic
Shot : Two men in explorer attire are fighting in a dark cave, lit by torches. The man on the left is holding a flaming torch and the man on the right is trying to defend himself.
Aesthetic Score : 0.6
Mood : intense, dramatic, action
Quality
Entropy : 6.14
Noise : 66
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are some minor artifacts in the image, such as banding in the shadows.
Lost in the Digital Realm: A Glimpse into the Future
A mysterious figure, shrouded in the futuristic glow of a VR headset, stands against a blurred backdrop. The enigmatic design of the headset draws the eye, leaving the viewer to wonder what secrets lie within the digital world.
Prompt
poses fighting: immersive, intense ; gamer; close-up; gaming; a virtual reality headset with a pixelated world projected in the background; cinematic
Characteristic
Shot : A person wearing a futuristic VR headset, with a blurry background of lights.
Aesthetic Score : 0.7
Mood : cyberpunk, futuristic, mysterious
Quality
Entropy : 6.44
Noise : 76
Prompt Clip Score : 0.21
AI Evaluation
Likelihood of AI : 0.90
Image errors : There is some blurring in the background, possibly due to a wide aperture.
On the Edge: A Tense Encounter on the Tracks
A low-angle shot captures a group of people walking in front of an oncoming train, their expressions tense and the scene charged with dramatic tension. The yellow and brown train looms in the background, adding to the sense of urgency and danger.
Prompt
poses fighting: fast-paced, chaotic ; Two travelers; medium shot; travel; a crowded train station with people rushing in all directions; cinematic
Characteristic
Shot : A group of people are walking on a train platform. There is a train in the background.
Aesthetic Score : 0.6
Mood : dramatic, tense, apocalyptic
Quality
Entropy : 6.27
Noise : 106
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image is slightly blurry, and there are some artifacts in the background.
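To see the ten records side by side, the per-image numbers above can be averaged. The sketch below only copies the Aesthetic Score, Prompt Clip Score, and Likelihood of AI values reported in this post; the short image labels and the helper name are illustrative.

```python
from statistics import mean

# (label, aesthetic score, prompt CLIP score, likelihood of AI),
# values copied from the ten records in this post.
records = [
    ("Lone warrior at sunset", 0.70, 0.20, 0.80),
    ("Jungle suspense", 0.60, 0.26, 0.30),
    ("Cyberpunk close-up", 0.80, 0.23, 0.90),
    ("Marketplace clash", 0.60, 0.26, 0.20),
    ("Desert traveler", 0.70, 0.27, 0.30),
    ("Rooftop group", 0.60, 0.28, 0.20),
    ("Warrior's gaze", 0.75, 0.21, 0.20),
    ("Cave explorers", 0.60, 0.27, 0.20),
    ("VR headset", 0.70, 0.21, 0.90),
    ("Train station", 0.60, 0.30, 0.10),
]


def column_mean(rows, index):
    """Average one numeric column across all records."""
    return mean(row[index] for row in rows)


mean_aesthetic = column_mean(records, 1)
mean_clip = column_mean(records, 2)
mean_ai_likelihood = column_mean(records, 3)

print(f"aesthetic: {mean_aesthetic:.3f}")
print(f"prompt CLIP: {mean_clip:.3f}")
print(f"likelihood of AI: {mean_ai_likelihood:.2f}")
```

For these ten images the averages come out to roughly 0.665 for the aesthetic score, 0.249 for the prompt CLIP score, and 0.41 for the likelihood of AI.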
Conclusion
The results show that the generative AI model handled shot composition well and camera position reasonably well, but struggled with the aesthetic aspect. Here’s a breakdown:
- Camera Position: The model scored 0.45, which is slightly below the “good” range of 0.5 to 0.75. This suggests that the model’s ability to interpret and implement camera positions in the generated images is decent but could be improved.
- Shot Analysis: The model scored 0.62, falling within the “good” range. This indicates that the model effectively understood the scene described in the prompt and translated it into a visually coherent shot.
- Aesthetic Analysis: The model scored 0.11, which falls just outside the “very good” range of -0.2 to 0.1, above its upper bound. This suggests that the generated images’ aesthetic deviated from the expected one, potentially lacking the desired visual style or quality.
Overall, the model demonstrates a good understanding of shot composition and a passable grasp of camera positions, but needs improvement in generating images that align with the intended aesthetic.
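The range comparisons in the breakdown above can be checked mechanically. The sketch below assumes the range bounds are inclusive and that the “good” range for Shot Analysis is the same 0.5 to 0.75 quoted for Camera Position; the post states neither explicitly, and the dictionary keys are illustrative names.

```python
def in_range(score, low, high):
    """Return True when score lies within [low, high] (inclusive bounds assumed)."""
    return low <= score <= high


# metric -> (score, range low, range high), numbers taken from the conclusion.
# Assumption: Shot Analysis uses the same "good" range as Camera Position.
checks = {
    "camera_position": (0.45, 0.5, 0.75),
    "shot_analysis": (0.62, 0.5, 0.75),
    "aesthetic_analysis": (0.11, -0.2, 0.1),
}

results = {name: in_range(*values) for name, values in checks.items()}

for name, ok in results.items():
    print(f"{name}: {'inside' if ok else 'outside'} the target range")
```

Under these assumptions only the shot analysis score lands inside its target range; camera position misses the lower bound by 0.05 and the aesthetic score exceeds the upper bound by 0.01.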