AI Captures the Scene, But Misses the Mood with Titan-g1
The world of AI image generation is rapidly evolving, with models capable of creating stunning visuals from text prompts. However, achieving a perfect match between the prompt and the generated image remains a challenge. This blog post examines the results of an experiment where an AI model was tasked with generating images based on specific scenes and poses. While the model demonstrated a good understanding of camera positions and shot composition, it struggled to capture the desired aesthetic. This highlights the ongoing need for advancements in AI image generation, particularly in the area of aesthetic analysis.
Created with: titan-g1
Military Formation: A Display of Discipline and Authority
A line of soldiers in uniform stands in formation, creating a powerful image of order and discipline. The somber mood and formal setting suggest a parade or ceremony, highlighting the authority and respect associated with military forces.
Prompt
poses standing-in-a-row: determined ; soldiers; wide shot; heroism; cinematic
Characteristic
Shot : A line of soldiers in uniform boots, standing in formation.
Aesthetic Score : 0.6
Mood : serious, disciplined, formal
Quality
Entropy : 6.93
Noise : 103
Prompt Clip Score : 0.22
AI Evaluation
Likelihood of AI : 0.20
Image errors : No visible errors or artifacts.
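The post does not say how the Entropy value is computed. One common reading, assumed here, is the Shannon entropy of the image's 8-bit grayscale histogram, measured in bits, which reaches 8.0 when all 256 gray levels occur equally often. A minimal sketch under that assumption follows; the function name `shannon_entropy` and the toy pixel lists are illustrative and not part of the original pipeline.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Return the Shannon entropy, in bits, of a sequence of pixel values."""
    total = len(pixels)
    if total == 0:
        return 0.0
    counts = Counter(pixels)
    # H = sum(p * log2(1 / p)) over the intensity values that actually occur
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Toy inputs standing in for flattened grayscale images (hypothetical data)
flat_image = [128] * 1024             # a single gray level everywhere
uniform_image = list(range(256)) * 4  # every gray level equally often

print(shannon_entropy(flat_image))     # 0.0
print(shannon_entropy(uniform_image))  # 8.0
```

Under this reading, a value such as the 6.93 reported above would sit between those two extremes, closer to the evenly spread case.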
Lost in Wonder: A Temple Beckons in the Jungle
Two figures stand in awe, their gaze drawn upwards to a majestic temple structure shrouded in jungle foliage. The low angle shot emphasizes the temple’s grandeur, while the play of light and shadow adds an air of mystery and intrigue. This image captures the adventurous spirit of exploration and the thrill of discovering hidden places.
Prompt
poses standing-in-a-row: excited, curious, adventurous ; A team of explorers; medium shot; adventure; a lush jungle with ancient ruins in the distance; cinematic
Characteristic
Shot : Two tourists, one with their arms raised, stand in front of an ancient temple, possibly in Southeast Asia. The temple is made of stone and has a stairway leading up to the entrance. Lush greenery surrounds the temple, giving the scene a sense of adventure and discovery.
Aesthetic Score : 0.7
Mood : adventurous, mystical, serene
Quality
Entropy : 6.87
Noise : 116
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.20
Image errors : No obvious artifacts or errors are present in the image.
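The prompts in this post follow a fixed semicolon-separated layout: a pose label with a mood list, then the subject, the shot type, a theme, an optional setting, and a style keyword. A small sketch of splitting a prompt into those parts is given here. The field names (`pose`, `moods`, `subject`, and so on) are labels chosen for illustration and do not appear in the original post.

```python
def parse_prompt(prompt):
    """Split a semicolon-separated prompt into named parts (names are illustrative)."""
    parts = [p.strip() for p in prompt.split(";")]
    head, rest = parts[0], parts[1:]
    # The first segment looks like "poses <pose-name>: mood, mood, ..."
    pose_part, _, mood_part = head.partition(":")
    pose = pose_part.replace("poses", "", 1).strip()
    moods = [m.strip() for m in mood_part.split(",") if m.strip()]
    if len(rest) not in (4, 5):
        raise ValueError(f"unexpected number of fields: {len(parts)}")
    # The setting is missing from the shortest prompt in the post
    setting = rest[3] if len(rest) == 5 else None
    return {
        "pose": pose,
        "moods": moods,
        "subject": rest[0],
        "shot": rest[1],
        "theme": rest[2],
        "setting": setting,
        "style": rest[-1],
    }


explorers = parse_prompt(
    "poses standing-in-a-row: excited, curious, adventurous ; A team of explorers; "
    "medium shot; adventure; a lush jungle with ancient ruins in the distance; cinematic"
)
print(explorers["shot"])   # medium shot
print(explorers["moods"])  # ['excited', 'curious', 'adventurous']
```

Having the requested shot type and moods as separate fields makes it easier to compare them with the Shot and Mood lines recorded for each image.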
Blue Light Intensity: Three Gamers Locked in a Moment of Focus
A dimly lit room, bathed in blue light, reveals three young men engrossed in a shared experience. Headphones on, they lean in, their faces illuminated by the glow of the screen, suggesting a tense competitive match. The close-up shot captures the intensity of the moment, leaving the viewer to wonder what they are watching.
Prompt
poses standing-in-a-row: focused, competitive, passionate ; A group of gamers; close-up shot; gaming; a brightly lit esports arena with cheering fans; cinematic
Characteristic
Shot : A group of young men, possibly gamers, are wearing headphones and looking in different directions, one man is looking directly at the camera with a serious expression, the image is likely a portrait for esports team
Aesthetic Score : 0.6
Mood : serious, competitive, focused
Quality
Entropy : 6.79
Noise : 101
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image is a bit noisy and grainy, there seems to be a bit of oversharpening or compression artifact on the faces
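The post does not define the Prompt Clip Score. A common convention, assumed here, is the cosine similarity between a CLIP embedding of the prompt text and a CLIP embedding of the generated image. The sketch below shows only the similarity step on toy vectors. Producing real embeddings requires a CLIP model, which is outside this example, and the vectors are made up.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical stand-ins for a text embedding and an image embedding
text_embedding = [1.0, 0.0]
image_embedding = [0.0, 1.0]

print(cosine_similarity(text_embedding, image_embedding))  # 0.0
print(cosine_similarity([3.0, 4.0], [3.0, 4.0]))           # 1.0
```

If the scores in this post were produced this way, they are best compared with each other across images rather than read as percentages of agreement.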
A Family’s Silhouette Against the Vast Landscape
A heartwarming image captures a family of four standing on a mountaintop, their backs to the camera, gazing out at a breathtaking panoramic view. The blue sky and green mountains create a sense of joy, hope, and adventure. The silhouetted figures against the vast landscape emphasize their connection to the natural world and evoke a feeling of wonder and scale.
Prompt
poses standing-in-a-row: happy, relaxed, joyful ; A family of tourists; long shot; tourism; a breathtaking view of a mountain range with a clear blue sky; cinematic
Characteristic
Shot : A family of four standing with their arms raised in the air, facing a mountain vista. The sky is blue and the mountains are green.
Aesthetic Score : 0.7
Mood : joyful, uplifting, adventurous
Quality
Entropy : 6.64
Noise : 107
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are no significant errors in the image.
Adventure Awaits: Friends Embark on a Scenic Hike
Four young friends, filled with hope and a sense of adventure, traverse a dusty rural road. Their backpacks suggest a journey ahead, and the rolling hills and clear sky promise breathtaking views. The composition captures the depth of their experience, with the foreground figures sharp and focused, while the background softens, hinting at the vastness of their adventure.
Prompt
poses standing-in-a-row: free-spirited, adventurous, optimistic ; A group of backpackers; medium shot; travel; a dusty road leading to a distant village with palm trees; cinematic
Characteristic
Shot : Four young adults are walking down a dirt road, with backpacks on their backs, towards a small village in the distance.
Aesthetic Score : 0.6
Mood : adventurous, relaxed, carefree
Quality
Entropy : 6.94
Noise : 103
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are some minor image artifacts, such as the slight blurriness of the image, but nothing particularly distracting.
Passionate Performance: Three Singers Ignite the Stage
Three singers in suits deliver a powerful performance, their raised hands and expressive faces conveying a sense of hope and emotional intensity. The scene is both uplifting and dramatic, capturing the raw energy of their music.
Prompt
poses standing-in-a-row: harmonious, powerful, emotional ; A choir singing in harmony; close-up shot; groups; a dimly lit stage with spotlights; cinematic
Characteristic
Shot : Three people singing on a stage. The stage is dark with a blue curtain in the background. The singers are wearing formal attire. The lighting is dramatic.
Aesthetic Score : 0.6
Mood : intense, passionate, hopeful
Quality
Entropy : 6.15
Noise : 107
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image is slightly blurry, and the colors are slightly muted.
Captivating Dance Performance Under Dramatic Spotlight
A group of dancers in matching outfits command the stage with their energetic performance, illuminated by colorful spotlights against a dark backdrop. The dramatic lighting and powerful movements create a captivating and mysterious atmosphere.
Prompt
poses standing-in-a-row: energetic, synchronized, joyful ; A line of dancers; wide shot; groups; a brightly lit stage with colorful costumes; cinematic
Characteristic
Shot : A group of dancers are performing on stage. They are wearing colorful costumes and are moving their arms in a synchronized manner.
Aesthetic Score : 0.5
Mood : energetic, dramatic, theatrical
Quality
Entropy : 6.82
Noise : 109
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image appears slightly blurry, perhaps due to the movement of the dancers.
Golden Hour Friendship on the Beach
Four friends bask in the warm glow of the setting sun on a beautiful beach. Their laughter and smiles capture the essence of carefree joy and friendship.
Prompt
poses standing-in-a-row: relaxed, happy, nostalgic ; A group of friends; medium shot; groups; a sunset over a beach with waves crashing in the background; cinematic
Characteristic
Shot : Four young women stand on a beach at sunset, facing the ocean. The lighting is warm and golden.
Aesthetic Score : 0.7
Mood : happy, carefree, friendly
Quality
Entropy : 6.62
Noise : 102
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.20
Image errors : There is some slight blurriness in the background.
A Moment of Focus: Scientists in the Lab
Three scientists, clad in lab coats and protective glasses, stand in a laboratory setting. The man in the foreground, arms crossed and gaze fixed on the viewer, exudes a sense of seriousness and professionalism. The lighting and composition create a sense of depth and intrigue, drawing the viewer into the scene.
Prompt
poses standing-in-a-row: focused, determined, innovative ; A team of scientists; close-up shot; groups; a laboratory with complex machinery and glowing screens; cinematic
Characteristic
Shot : Three scientists in lab coats stand in front of a microscope, looking at the camera.
Aesthetic Score : 0.6
Mood : serious, professional, focused
Quality
Entropy : 6.60
Noise : 102
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.20
Image errors : No significant errors. The image is a bit overexposed. The color saturation is low.
Fists Raised in Unity: City Protesters Demand Change
A powerful image captures the spirit of a diverse group of protesters raising their fists in unison, their determination reflected against the backdrop of the city. The scene evokes a sense of hope and unity, highlighting the collective strength of those demanding change.
Prompt
poses standing-in-a-row: determined, passionate, hopeful ; A group of protesters; long shot; groups; a city street with banners and signs; cinematic
Characteristic
Shot : A group of people of various ethnicities are walking down a city street with their fists raised in the air, likely a protest or demonstration
Aesthetic Score : 0.7
Mood : determined, powerful, hopeful
Quality
Entropy : 6.94
Noise : 105
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.20
Image errors : No noticeable image artifacts.
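To see the ten records at a glance, the per-image values listed above can be averaged. The sketch below copies the numbers from the records in order of appearance; the variable names are illustrative.

```python
from statistics import mean

# Values transcribed from the ten image records, in order of appearance
records = {
    "aesthetic": [0.6, 0.7, 0.6, 0.7, 0.6, 0.6, 0.5, 0.7, 0.6, 0.7],
    "entropy": [6.93, 6.87, 6.79, 6.64, 6.94, 6.15, 6.82, 6.62, 6.60, 6.94],
    "noise": [103, 116, 101, 107, 103, 107, 109, 102, 102, 105],
    "clip_score": [0.22, 0.30, 0.27, 0.27, 0.30, 0.25, 0.24, 0.26, 0.29, 0.25],
    "ai_likelihood": [0.20, 0.20, 0.20, 0.20, 0.20, 0.20, 0.10, 0.20, 0.20, 0.20],
}

# Round to three decimals so float noise does not clutter the output
summary = {name: round(mean(values), 3) for name, values in records.items()}

for name, value in summary.items():
    print(f"{name}: {value}")
# aesthetic: 0.63
# entropy: 6.73
# noise: 105.5
# clip_score: 0.265
# ai_likelihood: 0.19
```

These averages are descriptive only. The post does not state how they relate to the scores quoted in the conclusion.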
Conclusion
The results show that the generative AI model handled camera position and shot analysis reasonably well, but struggled with aesthetic analysis. Here’s a breakdown:
- Camera Position: The model scored 0.49, which is slightly below the “good” range of 0.5 to 0.75. This suggests that the model’s ability to accurately interpret and reproduce camera positions in the generated image is decent, but could be improved.
- Shot Analysis: The model scored 0.58, falling within the “good” range. This indicates that the model is capable of understanding the scene described in the prompt and translating it into a visually coherent shot.
- Aesthetic Analysis: The model scored 0.11, which lies outside the “very good” range of -0.2 to 0.1, just above its upper bound of 0.1. This suggests that the generated images’ aesthetic deviated from the aesthetic described in the prompt.
Overall, the model demonstrates a good understanding of camera positions and shot composition, but needs improvement in capturing the desired aesthetic.
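The breakdown above amounts to checking each score against a quoted range. The sketch below does that with the numbers from this post. It assumes that Shot Analysis shares the 0.5 to 0.75 “good” range stated for Camera Position, which the post implies but does not spell out. The function and variable names are illustrative.

```python
def position_in_range(score, low, high):
    """Return 'below', 'inside', or 'above' for a score relative to [low, high]."""
    if score < low:
        return "below"
    if score > high:
        return "above"
    return "inside"


# (score, range low, range high) as reported in the conclusion
results = {
    "camera_position": (0.49, 0.5, 0.75),
    "shot_analysis": (0.58, 0.5, 0.75),  # range assumed to match camera position
    "aesthetic_analysis": (0.11, -0.2, 0.1),
}

for name, (score, low, high) in results.items():
    verdict = position_in_range(score, low, high)
    print(f"{name}: {score} is {verdict} the range [{low}, {high}]")
```

Running it reports Camera Position as below its range, Shot Analysis as inside, and Aesthetic Analysis as above.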