When I started experimenting with AI character work, I did what most people do. I described what I wanted to see. The scene. The props. The lighting. The clothes. I treated the model like a fast illustrator who needed a brief.
The results were technically correct and completely lifeless.
A character standing in a rain-slicked alley, leather jacket, moody lighting. Sure. But no tension. No sense that this person has somewhere to be or someone they're running from. Just a well-executed mood board.
The shift happened almost by accident. I was generating frames for Ethan, one of the AI characters I'd been building, and I was frustrated. I'd described him exhaustively. Every physical detail locked down. But something kept slipping. He looked right but didn't feel like himself.
Out of impatience more than strategy, I rewrote the prompt. Instead of "man in leather jacket, standing in rain, dark alley, cinematic lighting," I wrote: "a man who has just made a decision he can't take back, standing in the rain, the weight of it still settling."
The difference was immediate. Not just in one image but in the whole feel of the outputs. The posture changed. The expression shifted. The lighting choices the model made felt considered in a way they hadn't before.
Describing what a thing looks like gets you surfaces. Describing what it feels like gets you something with interior life.
The Suppression Theory experiment
This led me to something more deliberate. I started testing known psychological frameworks as prompting tools, specifically James Gross's work on emotion regulation and expressive suppression: the idea that suppressed emotion reads differently in the body than expressed emotion. Someone trying not to cry looks different from someone crying, and that tension is visible. In my tests, AI models responded to that distinction when I described the internal state rather than the external appearance.
I built a test around Ethan delivering one line in an interrogation room: "I've been waiting for you."
The Kling prompt read:
The results from Kling were genuinely interesting. The pre-line beat had real weight. The stillness read as suppression rather than blankness.
When the model gets it wrong
Then I ran the same prompt in Seedance 2.0.
I got an NSFW flag.
"Output may contain sensitive content."
Which illustrates the problem perfectly. The model wasn't reading psychological subtext. It was pattern-matching surface signals: interrogation room plus intense man plus "I've been waiting for you" equals flagged content. It had no way of understanding that the context was a character study in emotional suppression.
The content filter responded to the ingredients, not the intention.
I rewrote the prompt with softer language, removing the interrogation room framing and anything that could read as confrontational:
That version went through without a flag. Same psychological direction. Same emotional instruction. Just without the environmental cues the filter was reacting to.
The lesson: content moderation appears to work on surface-level pattern matching, so your emotional prompting needs to account for that layer too.
Which models actually do this well
Not all models respond to emotional prompting equally. In my testing, Kling and Seedance 2.0 handle psychological and emotional direction significantly better than Google Veo or Omni. The latter tend to default to more literal interpretations of whatever language you use, which makes suppression and micro-expression work harder to achieve.
If you want to test this approach, start with Kling or Seedance. You will get more interesting results faster, which makes the iteration process worth doing.
How I structure prompts now
My prompts usually have two distinct zones.
The first is the identity block. Consistent physical details, the things that make this character unmistakably themselves. This stays fairly stable across generations.
The second is the moment. An emotional state or situational feeling specific to this image. This changes with every prompt. And that is where the character actually lives.
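The two-zone structure can be sketched as a tiny prompt builder. This is a minimal sketch under stated assumptions: `Character`, `build_prompt`, and all of the placeholder text are hypothetical names chosen for illustration. The output is just a plain string you would paste into whichever model you are using. Nothing here calls a real Kling or Seedance API.

```python
from dataclasses import dataclass


# Hypothetical structure for illustration only; not a format required by any model.
@dataclass(frozen=True)
class Character:
    name: str
    identity_block: str  # stable physical details, reused unchanged in every prompt


def build_prompt(character: Character, moment: str) -> str:
    """Join the fixed identity block with the per-image emotional moment."""
    identity = character.identity_block.strip().rstrip(".,")
    feeling = moment.strip().rstrip(".,")
    if not feeling:
        raise ValueError("every prompt needs a moment: the emotional state for this image")
    return f"{identity}, {feeling}"


# Hypothetical placeholder: a real identity block would list every locked-down detail.
ethan = Character(name="Ethan", identity_block="man in a leather jacket")

# Example moments (the second is an invented illustration of a suppression beat).
moments = [
    "standing in the rain, having just made a decision he can't take back, "
    "the weight of it still settling",
    "holding very still, trying not to let the relief show",
]

for moment in moments:
    print(build_prompt(ethan, moment))
```

Keeping the identity block in one place means only the moment varies between generations, which is the split described above. It does not, by itself, stop the model from reinterpreting physical details when the emotional language gets heavy.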
Emotion as a prompting structure is not a magic solution. Lean too hard on it and you lose consistency, because the model starts reinterpreting physical details through the emotional lens. Balance is the real craft. And this approach works better for some generation tasks than others.
But for character work, micro-expressions, and scenes where the feeling in the room matters as much as what is happening in it, starting from emotion rather than description consistently produces more interesting results.
It is a useful tool. Not the only one.