The Lab

Dialogue and Cinematic Pacing

Building tension, character and atmosphere through AI-generated dialogue and cinematic pacing.

2026 · Midjourney · Grok · Kling 3.0 · ElevenLabs

The Brief

What I was testing

This experiment started as a single short film. It ended up becoming something longer. That was not the plan, but when the first episode did well it made sense to keep going.

The premise: two vampire characters, Ethan and Seraphine, are in a gothic library. Seraphine has her sights on Rosa, a human girl she intends to prey on. Ethan intervenes. What follows is a slow-burn dynamic between Ethan and Rosa that the series builds on beyond the first episode.

Rosa is the female main character (FMC). She is the human drawn into a world she did not ask to be part of, quiet and bookish on the surface, considerably more resilient underneath. Ethan is the male main character (MMC), designed as a specific type: think Vampire Diaries crossed with Dante from Devil May Cry. Brooding, sarcastic, magnetically insufferable. He quickly became my favourite character in the whole AI cast roster, which probably explains why I kept making episodes.

Seraphine appears in the first part only. She is the catalyst. Cold, precise, dangerous. Her attempt to prey on Rosa is what pulls Ethan into the scene and sets everything else in motion.

The technical challenge was real. This was the first time I had introduced scripted dialogue into an AI-generated sequence. That meant solving for character consistency across multiple scenes, realistic mouth movement during speech, physical interaction between two characters in the same frame, and maintaining the gothic library environment across every shot.

Spoiler: not all of those got fully solved. But the failures were as instructive as the successes.

Video

Part 1

Part 2

Part 3

The Characters

Ethan

Ethan is the MMC. Vampire, centuries old, presents as early thirties. The energy is Vampire Diaries crossed with Dante from Devil May Cry. Brooding and sarcastic, with an underlying intensity that makes him difficult to ignore. He intervenes when Seraphine targets Rosa, which is how the series begins.

Rosa

Rosa is the FMC. Human, bookish, quietly observant. She does not know what she has walked into when Seraphine targets her in the library. She is the central character the series builds around across all parts.

Seraphine

Seraphine appears in Part 1 only. She is the antagonist who sets the story in motion, preying on Rosa in the gothic library before Ethan intervenes. Her role is the catalyst, not the ongoing arc.

What I Tested

The questions I was asking

Introducing scripted dialogue into an AI-generated sequence and integrating voice performance into the edit
Building a hybrid creative pipeline combining concept art, video generation, voice synthesis and sound design
Using AI image generation to storyboard and design key shots before committing to video generation
Experimenting with camera framing and character blocking to create clearer visual storytelling in action scenes
Creating consistent characters across multiple scenes using character reference imagery

The Fang Problem

A specific consistency challenge

Vampire characters introduce a specific consistency challenge that human characters do not have. Fangs only appear in certain expressions. During neutral shots the mouth is closed. During dialogue or emotional moments it opens, and that is when the model has to decide what the teeth look like.

Rosa, as the human character, did not have this problem. Her reference sheet was straightforward and remained consistent throughout. Ethan and Seraphine were a different matter entirely.

Ethan: fang expression reference

Seraphine: fang expression reference

What Went Wrong

The ‘taking out the trash’ problem

The hardest shots in the sequence were the ones requiring two characters to physically interact.

The context matters here. Before the dragging scene, Ethan has already been toying with Seraphine. He is not threatened by her. The fight, if you can call it that, is entirely one-sided and he knows it. His dialogue during it includes lines like 'Ooo, you almost got me with that one' and 'Are you even trying to hit me?' He is entertained. She is not.

The dragging scene is the conclusion of that dynamic. He grabs her and hauls her away from Rosa with the energy of someone removing a minor inconvenience, delivering the line 'So, what did we learn today, Seraphine?' as he goes. It is dismissive, slightly theatrical, and completely in character.

Generating it was a different story. The model understood individual character movement, and it understood two characters occupying the same frame. What it could not reliably produce was directed physical force from one character onto another: Ethan grabbing Seraphine with intent and pulling her in a particular direction while she resists.

What came back instead ranged from the two of them standing near each other with no contact, to Ethan pushing her from behind like he was operating a mop, to both characters moving independently in completely different directions with no connection between them at all.

The irony is that the scene is meant to show Ethan completely in control. The generations made it look like nobody was.

Two of the more memorable failed attempts:

Ethan pushes Seraphine like a mop

The model understood push but not direction or intent. Ethan ends up propelling her from behind with zero dramatic logic.

Ethan walks backwards, Seraphine moves independently

Both characters are moving but they are not connected. The model generated two separate actions in the same frame rather than one directed interaction.

Physical character interaction is still one of the hardest things to get right in AI video. The model handles individual character performance well. It breaks down when the action requires force, direction and contact between two bodies at once.

Process

How I approached it

The strongest results came from following a familiar production pipeline. Story first, then visual concepting, then shot composition, motion generation and sound design. Treating the AI tools as part of a proper filmmaking process got far better results than trying to generate finished scenes in one go.

Pre-visualisation mattered a lot. Using image generation to lock down characters, environments and shot compositions before committing to video cut the number of wasted attempts considerably. It also gave me tighter control over character placement, camera framing and action beats across multiple clips.

In practice this meant approaching AI video less like a one-click solution and more like directing a scene: testing ideas, refining prompts and gradually building the final sequence shot by shot.
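The stage ordering described above can be pictured as a small shot-planning structure. This is an illustrative sketch only, not the tooling used on the project: the `Shot` class, the stage names and its methods are all hypothetical, and nothing here calls Midjourney, Grok, Kling or ElevenLabs.

```python
from dataclasses import dataclass, field

# Stage order taken from the process described above; the names are illustrative.
STAGES = ["story", "concept_art", "shot_composition", "motion", "sound"]


@dataclass
class Shot:
    """One planned shot and the stages it has cleared so far (hypothetical structure)."""
    name: str
    done: list = field(default_factory=list)

    def next_stage(self):
        """Return the first stage not yet completed, or None when the shot is finished."""
        for stage in STAGES:
            if stage not in self.done:
                return stage
        return None

    def complete(self, stage):
        """Mark a stage done, refusing to skip ahead of the pipeline order."""
        expected = self.next_stage()
        if stage != expected:
            raise ValueError(f"{self.name}: expected {expected!r} before {stage!r}")
        self.done.append(stage)


shot = Shot("ethan_drags_seraphine")
shot.complete("story")
shot.complete("concept_art")
print(shot.next_stage())  # shot_composition
```

The point of the guard in `complete` is the same as the point of pre-visualisation: a shot cannot reach motion generation until its concept art and composition are locked.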

Adding dialogue and voice performance made the whole thing harder but also made it feel more like an actual story. Recording the dialogue first, then processing it through voice synthesis, gave me much more control over pacing and tone.

Key Learnings

What I took away

AI video still has real limits around precise character movement and interaction. You can work around most of them with careful planning, reference imagery and iterative prompting.

The finished piece proved a hybrid creative workflow can hold up. When AI tools are working under human direction, with real decisions being made about shot composition, visual storytelling and sound, you get something with actual narrative shape.

The strongest takeaway: AI can generate visuals quickly, but the best results still come from combining those tools with human direction, timing and storytelling instincts.

Outcome

What came out of it

A multi-shot cinematic sequence. It showed that AI tools can carry character-driven narrative with dialogue and dramatic tension, if the work is treated like a proper production. Not every shot was clean. But it held together as a story.

The fang consistency problem was partially solved. Getting reliable fang geometry on open-mouth expressions required over-describing the teeth in the prompt and cycling through multiple generations to find usable frames. The physical interaction shots were harder: Seraphine grabbing Rosa, Ethan stepping in. The model struggled to place hands convincingly in contact with another character. I found workarounds that held up at the edit pace, but it was never clean.
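One way to picture the over-describing approach to fangs is a small helper that appends a fixed teeth description whenever a shot calls for an open-mouth expression. This is a hypothetical sketch: the function name, the expression labels and the wording of the fang description are assumptions for illustration, not the prompts used in the series.

```python
# Expressions where the mouth is open and teeth are visible (illustrative labels).
OPEN_MOUTH = {"speaking", "snarl", "laugh"}

# A deliberately over-specified teeth description; the wording is an assumption.
FANG_DETAIL = (
    "two elongated upper canine fangs, symmetrical, sharp points, "
    "all other teeth normal human teeth"
)


def build_prompt(character, expression, scene):
    """Assemble a prompt, adding fang detail only for open-mouth expressions."""
    parts = [f"{character}, {expression} expression", scene]
    if expression in OPEN_MOUTH:
        parts.append(FANG_DETAIL)
    return ", ".join(parts)


print(build_prompt("Ethan", "neutral", "gothic library"))
print(build_prompt("Ethan", "speaking", "gothic library"))
```

Keeping the teeth description in one constant means every open-mouth generation gets the same wording, which makes it easier to compare results across repeated attempts.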

What I did not expect was for the project to expand. What started as a single short film became a series. Part 1 exists as the complete sequence. Parts 2 and 3 are in development. The characters developed enough weight during production that abandoning them felt wrong.

The Rosa and Ethan dynamic worked better than expected. Their scenes together had a tension that landed without needing much dialogue to sell it. The blocking, the framing choices, the way the model occasionally generated an expression that was exactly right. Those moments made the edit feel less like a technical exercise and more like an actual story.

Ethan remains my favourite character in the AI cast. He was designed to be insufferable in a charming way. That part worked out.
