Giving Characters Life With GPT-3
At Fable, we’re building next-generation AI to create interactive stories with what we’re calling “Virtual Beings.”
As creators at the forefront of this exciting new space, where AI is both a tool and an art form, we’ve seen that if we remove the artist and rely completely on generative AI, it inevitably goes off the rails and delivers underwhelming results. The internet has many entertaining examples of both profound and ridiculous AI-generated sentences, but these responses are highly unpredictable, and the feeling of having a natural conversation is quickly lost.
Natural conversations are not like talking to Siri or your smart speaker either. There, the relationship is transactional, with the AI focused only on solving your immediate request. Very little context is carried between interactions, so these assistants show no emotional intelligence, which makes them frustrating to interact with. Instead, we humans should be better represented within an AI’s brain. It should have memories of us and our conversations and carry that shared context into future conversations. It should anticipate us instead of just reacting to us. In short, what’s missing is the feeling of being “seen.”
We’re interested in granting these skills to our own characters, which sit atop amazing artificial intelligence technologies like GPT-3. To demonstrate how far we’ve come toward our goal of AI-driven storytelling and emotional intelligence, we created this scene of our character Lucy in conversation with a Guest. GPT-3, OpenAI’s powerful Transformer NLP model trained on a large corpus of text, generates almost everything said in this video. Essentially, we give the AI some context, and it attempts to complete the text in an intelligent way. The scene shows how close we are, with humans helping to guide and train the AI, to having natural conversations with an AI that really sees us as individual emotional beings and not taskmasters.
In an effort to inspire others and move the field forward, let me share some of how we’re doing this on the technology side. Below is a significant subset of the exact context that we gave GPT-3 in the making of this scene. We simply tell it how we want the conversation to proceed, along with some details about who Lucy and the Guest are, so it can generate new dialogue lines with that context.
CONTEXTS:
The following is a conversation between Lucy and Guest over text messages.
In Lucy's world, which is different from the Guest's, the date is 1988, while for the Guest it is still 2020.
The Guest reaches out to Lucy over chat message and Lucy responds in a playful way, asking if the Guest is a foozle and comes in peace.
Lucy says she can never sleep when the full moon is out, so she's awake now and passing the time by looking out the window for shooting stars.
Lucy continues to get to know the guest, asking them lots of personal questions and being imaginative.
Lucy eventually asks if the Guest has ever seen a shooting star and what the Guest wished for.
Finally, Lucy sees the shooting star, makes her wish, which she keeps as a secret or it won't come true, and then says goodnight.
LUCY:
Little Girl.
Active Imagination.
Age 8.
Lives with Brother, Mom, and Dad.
Likes Mysteries, Science, and Drawing.
Is looking for a shooting star.
Is superstitious and likes horoscopes.
Is a daydreamer.
GUEST:
A curious and friendly person who was just introduced to Lucy.
Along with the context, we give GPT-3 a line or two of dialogue to start the conversation. In the video above, we wrote the Guest saying, “Hi Lucy” and Lucy’s response, “Oh, a message. You must be a foozle!” We also wrote one other line ourselves: “I can’t sleep when there is a full moon, so I’m downstairs practicing for computer class tomorrow.” All other lines were generated by GPT-3, based simply on the context and lines provided, using a tool we will talk about in a follow-up post.
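For readers who want to try something similar, here is a minimal sketch of how a prompt like this might be assembled before it is handed to a text-completion model. It is not our production code. The function name build_prompt, the "Speaker: text" layout for dialogue lines, and the trailing speaker cue are all assumptions made for illustration; only the section labels and the quoted lines come from the context shown above. The call to the model itself is left out.

```python
# Illustrative sketch only: build_prompt and the "Speaker: text" dialogue
# layout are assumptions, not the exact format used for the Lucy scene.

def build_prompt(contexts, characters, dialogue, next_speaker):
    """Join scene context, character sheets, and seed dialogue into one prompt."""
    lines = ["CONTEXTS:"]
    lines.extend(contexts)
    # One labeled block per character, e.g. "LUCY:" followed by her traits.
    for name, traits in characters.items():
        lines.append(f"{name.upper()}:")
        lines.extend(traits)
    # Hand-written opening lines that steer the tone of the conversation.
    for speaker, text in dialogue:
        lines.append(f"{speaker}: {text}")
    # End on a bare speaker cue so the model completes that speaker's line.
    lines.append(f"{next_speaker}:")
    return "\n".join(lines)


prompt = build_prompt(
    contexts=[
        "The following is a conversation between Lucy and Guest over text messages.",
    ],
    characters={
        "Lucy": ["Little Girl.", "Active Imagination.", "Age 8."],
        "Guest": ["A curious and friendly person who was just introduced to Lucy."],
    },
    dialogue=[
        ("Guest", "Hi Lucy"),
        ("Lucy", "Oh, a message. You must be a foozle!"),
    ],
    next_speaker="Guest",
)
print(prompt)
```

Keeping the context, character sheets, and seed dialogue as separate inputs makes it easy for a writer to change one of them and regenerate lines without touching the others.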
Once the dialogue is selected, it’s sent through a text-to-speech (TTS) processor, similar to the ones that generate your turn-by-turn directions but much more advanced. We’ve trained a custom model on many samples of Lucy’s voice so that we can generate ad hoc lines for her almost instantaneously. That audio is then fed into another model, which converts her speech into lip sync and appropriate face animation. We added some tweaks to her head and eye movement, but most of this performance was completely automated without the need for custom animation.
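The flow from selected line to finished performance can be pictured as a simple chain of stages. The sketch below shows only that data flow. Every name in it (Performance, perform_line, fake_tts, fake_lip_sync) is hypothetical, and the two "fake" functions are trivial stand-ins for the custom voice model and the animation model, which are not shown here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Performance:
    text: str             # the selected dialogue line
    audio: bytes          # synthesized speech for that line
    keyframes: List[int]  # lip-sync / face animation derived from the audio


def perform_line(
    text: str,
    synthesize: Callable[[str], bytes],
    animate: Callable[[bytes], List[int]],
) -> Performance:
    """Run one dialogue line through speech synthesis, then animation."""
    audio = synthesize(text)     # stage 1: text to speech
    keyframes = animate(audio)   # stage 2: speech to face animation
    return Performance(text, audio, keyframes)


# Stand-in stages so the sketch runs; real models would replace these.
def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


def fake_lip_sync(audio: bytes) -> List[int]:
    # One placeholder keyframe value per byte of audio.
    return [byte % 2 for byte in audio]


result = perform_line("Goodnight!", fake_tts, fake_lip_sync)
print(len(result.audio), len(result.keyframes))
```

Passing the stages in as functions keeps each model swappable, which matches the idea that the animation step depends only on the audio it receives and not on how that audio was produced.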
This video is a demonstration of the powerful narratives that can emerge when you combine an artist’s vision, artificial intelligence, and emotional intelligence. Some of you may have more questions, so feel free to reach out to us or sign up to experience a conversation with Lucy yourself. While the GPT-3 version of Lucy is not available to the public just yet, her current version is also very capable and is available now to everyone who signs up.