Talking Heads
I want to talk about how we animate our characters’ portraits in Voodoo Detective. And I want to talk about it… now!
One of the first things we did as we began to design the dialogue system for Voodoo Detective was to research how other games had handled their dialogue. We looked at old classics like Monkey Island, Grim Fandango, and King’s Quest. After all, why not learn from our elders? We also looked at more modern games like Celeste, Unavowed, Oxenfree, and The Walking Dead. After all, why not learn from our contemporaries?
There’s certainly a lot to say about the decisions we made with respect to our dialogue system, but what I’d really like to focus on, as I subtly intimated in the lede, are the animated talking portraits.
We always felt that having our characters’ faces visible during dialogue would be a powerful tool to leverage as we attempt to ground our players in the emotions of the story. One thing humans are very good at is pattern recognition. We’ll look at the clouds together and you’ll say to me, “Eric, I see an elephant up there.” I cast my gaze skyward, and by god, I see an elephant up there too. I’m glad we could share this moment. Anyway, seeing our characters’ portraits seemed like a good idea.
To that end, we came up with a few options:
Option #1: No Animation
Nobody said we had to animate these dang portraits. Take another look at what Unavowed did. While they chose not to animate the portraits, they did swap out the faces based on the emotions they sought to convey.
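For the sake of illustration, here's roughly what that looks like in code. This is a minimal sketch, not Unavowed's actual implementation, and every name in it is made up:

    from dataclasses import dataclass

    # All names here are hypothetical; the point is just the lookup.
    PORTRAITS = {
        "neutral": "detective_neutral.png",
        "angry": "detective_angry.png",
        "worried": "detective_worried.png",
    }

    @dataclass
    class DialogueLine:
        text: str
        emotion: str

    def portrait_for(line: DialogueLine) -> str:
        # Fall back to the neutral face if a line isn't tagged.
        return PORTRAITS.get(line.emotion, PORTRAITS["neutral"])

    print(portrait_for(DialogueLine("You again!", "angry")))  # detective_angry.png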
Pros
This is by far the cheapest, easiest option.
Cons
It’s not as interesting as having the lips actually move, now is it!?
Option #2: Lip Smacking
As the name implies, this method has us animating the characters’ mouths flapping between the “open” and “closed” states anytime dialogue is spoken, irrespective of whether the character is pausing between sentences.
I’m ashamed to say I lost the video clip for this version of our portrait animations, but I retained the animation itself, so here’s what that looked like.
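If you'd like to see the logic behind it, here's a sketch. To be clear, this isn't our actual code; the frame names, flap rate, and show() stub are all invented:

    import itertools
    import time

    FLAP_INTERVAL = 0.12  # seconds per frame; a guess at a pleasing flap rate
    MOUTH_FRAMES = itertools.cycle(["mouth_open.png", "mouth_closed.png"])

    def show(frame: str) -> None:
        print(frame)  # stand-in for whatever actually swaps the sprite

    def lip_smack(clip_length: float) -> None:
        # Flap for the entire clip, pauses included, which is exactly
        # the flaw described in the cons below.
        elapsed = 0.0
        while elapsed < clip_length:
            show(next(MOUTH_FRAMES))
            time.sleep(FLAP_INTERVAL)
            elapsed += FLAP_INTERVAL

    lip_smack(2.5)  # flap for a two-and-a-half-second line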
Pros
This is a step up from having no animation. We get to see the mouth move!
Cons
It’s actually quite jarring seeing a person’s mouth move when they’re supposed to be pausing in between words.
The shapes the mouth makes don’t match the dialogue.
Option #3: “Intelligent” Lip Smacking
This is a variation of the “Lip Smacking” method. Instead of playing the animation indiscriminately, we only play it when the volume of the speech becomes audible to human ears. If the character takes a long pause in between words or sentences, the lips stop smacking. This is achieved by monitoring the volume levels coming out of the voice channel.
Incidentally, it seems like the folks over at Sierra went one step further and displayed different mouth sizes based on the volume of the characters’ speech. When a character is making a lot of noise, the “big mouth” image is shown. When a character is speaking softly, the “small mouth” sprite is shown. At least that’s my best guess at how they made it look as good as it did back in 1992.
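To make that concrete, here's a sketch of the volume check, with the Sierra-style big/small mouth thrown in. The thresholds are invented for illustration; in practice you'd tune them against your actual voice tracks:

    import math

    SILENCE_THRESHOLD = 0.02  # RMS below this counts as a pause (a tuning guess)
    LOUD_THRESHOLD = 0.25     # RMS above this earns the "big mouth" frame

    def mouth_frame(samples: list[float]) -> str:
        # One window of voice-channel samples, normalized to [-1, 1].
        rms = math.sqrt(sum(s * s for s in samples) / len(samples))
        if rms < SILENCE_THRESHOLD:
            return "mouth_closed.png"  # the character is pausing: stop the smacking
        if rms > LOUD_THRESHOLD:
            return "mouth_big.png"     # louder speech, wider mouth
        return "mouth_small.png"

    print(mouth_frame([0.3, -0.4, 0.2, -0.35]))  # loud window -> mouth_big.png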
Pros
This is definitely starting to look more natural! And as a bonus, once you’ve got the lip smack animation, monitoring volume to control when the animation plays is trivial.
Cons
Like I said, it’s still a little jarring to see a character's mouth move when the mouth movement and the words spoken don’t match.
Option #4: Hand Animated Lip Sync
The idea is to sync the characters’ lips to the dialogue being spoken. This option is the most realistic and arguably the best looking.
Pros
It looks great and feels very natural. It’s what we expect.
Cons
Hand animating a face such that it perfectly syncs to the words being spoken is a tremendous undertaking when you have thousands of lines of dialogue.
Even if we had the resources to do all these animations by hand, adding them to the game would balloon the amount of memory required to run it because of all the unique animations we’d need on hand.
So at this point, I thought we were stuck with the “Intelligent Lip Smacking” option. It’s not as good as what they had in King’s Quest VI, but it’s still an acceptable solution. So I said, “good enough,” pulled on my pajamas, and went to bed. But that night, I found myself violently thrashing back and forth in my sleep, accosted by nightmares of unnatural talking heads. Their lips are moving, but the sounds don’t match! I awoke covered in sweat and I said to myself, “we can do better!”
I looked back at King’s Quest VI and said, sure, they had a much bigger budget and many more people working on the game, but it’s 2021, not 1992! We’re not cave-dwelling savages chiseling our games into stone tablets! By 2021 standards, King’s Quest VI should be a low bar, and we needed to vault it spectacularly. I’m sure Ken and Roberta Williams don’t want us living in the past.
So I did a little more research and found another option.
Option #5: Computer Generated Lip Sync
I don’t know, Eric. Computer generated? That sounds a little complicated.
Well, I’m here to tell you it’s actually very complicated! Luckily, there are exceedingly intelligent people out there with machine-learning savvy who have created free software anyone can use to take an audio clip and generate a reasonable guess at which facial expression should be shown at any given point in that clip. But before I go any further, I’d like to define two words you might not know:
Phoneme - A phoneme is one of the smallest distinct units of sound that make up human speech.
Viseme - Just as there are phonemes for sound, there are corresponding “visemes”: the mouth and facial shapes used to produce those sounds.
Oculus has a nice chart here that explains how phonemes are mapped to visemes. So what does that all look like when put into practice?
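One such tool, for example, is Rhubarb Lip Sync: feed it an audio clip and it emits timestamped mouth shapes. Whatever tool you use, the consuming side looks something like this sketch. The cue data below mimics the shape of Rhubarb's JSON export, and the viseme_at function is just an illustration, not part of any tool:

    import json

    # Cue data in the shape of Rhubarb Lip Sync's JSON export: each cue has
    # a start time, an end time, and a mouth shape code.
    data = json.loads("""
    { "mouthCues": [
        { "start": 0.00, "end": 0.17, "value": "X" },
        { "start": 0.17, "end": 0.35, "value": "B" },
        { "start": 0.35, "end": 0.50, "value": "F" }
    ] }
    """)

    def viseme_at(cues: list[dict], t: float) -> str:
        # Return the mouth shape active at playback time t (in seconds).
        for cue in cues:
            if cue["start"] <= t < cue["end"]:
                return cue["value"]
        return "X"  # "X" is the closed, at-rest mouth

    print(viseme_at(data["mouthCues"], 0.20))  # -> B

Every frame, you sample the audio playback time, look up the active viseme, and swap in the matching portrait sprite.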
Pros
It looks great and feels very natural. It’s what we expect.
Cons
Well, it’s not perfect. You probably noticed that sometimes the software’s best guess isn’t 100% accurate. But it’s not too bad.
I was and am exceedingly pleased with how things turned out. And really, that’s all I have to say! Thanks for coming with me on this journey!
Love,
Eric Fulton