The backend behind real-time, conversational AI characters for an open-world Unreal Engine game set in New York.
A game studio wanted NPCs you could actually talk to — characters that hold a conversation, in voice, and react to what's happening around them in the game world. I built the Python/FastAPI backend that made that work, and the part I'm proudest of is the context engineering: the characters don't just have backstories, they have a live sense of their surroundings.
Each character carries a mutable scenario object — not static lore, but current world-state:
The Unreal Engine client pushes updates to that state through dedicated endpoints — one to merge new context, one to overwrite it — and the scenario gets composed into the character's system prompt at request time. So the same character speaks differently depending on what's happening around it. If it starts raining in-game, the NPC's next line knows. That's the core loop: structured, mutable world-state injected into the prompt so dialogue tracks the world.
On top of that, the prompt assembly layers in hand-authored few-shot example dialogs that lock each character's voice and dialect before any real input arrives, plus per-player conversation memory keyed on player-character pairs in MongoDB, with session expiry.
One character, Prince J, was a real New York City resident. We recorded a 45-minute interview with him that did double duty: it captured a clean sample of his voice, which we cloned with ElevenLabs, and it gave us the raw material to engineer his character's persona and dialect (a Rasta/Spanglish patois). One session, two deliverables — a voice and a personality. He came to the Tribeca demo in person and met his AI self.
The LLM layer was model-agnostic, routed through OpenRouter so models could be swapped without code changes. We progressed through Hermes → Hermes 2 → Hermes-2-Pro-Llama-3-8B, Nous Research open-weights models — and the reason was concrete. The game needed characters who could authentically hold realistic, oppositional worldviews. One was modeled on Mother Sister from Do the Right Thing, and a defining trait was that she doesn't trust the police. Heavily aligned commercial models refused or sanitized that; they couldn't stay in character. The less-restricted Hermes models could, and the colorful, diverse NYC cast consistently came out better on them.
Dialogue quality is hard to eyeball one line at a time, so I built a separate SvelteKit evaluation harness — an app for testing and iterating on character responses before they reached the game. Scaffolding for emotion/sentiment classification on player input (a DistilBERT model) fed back into how characters reacted.
Python FastAPI (fully async) MongoDB (Motor) Pydantic v2 OpenRouter Hermes-2-Pro-Llama-3-8B ElevenLabs TTS SvelteKit (eval harness) Unreal Engine (REST)
The prototype was demoed as a live, interactive installation at the Tribeca Film Festival in June 2024 — players holding real-time, in-voice conversations with characters that knew where they were and what the weather was doing. It's the project I reach for when someone asks about context engineering, because it's the most literal version of it I've built: the world changes, and the characters notice.