This blog post explores how well a generative AI model creates images from text prompts. We analyze how it handles scene descriptions, camera positions, and aesthetic styles. The model is impressive at capturing the overall scene and aesthetic, but it struggles to represent poses accurately. We examine the reasons behind this limitation and discuss the potential for future improvements.