
2025-12-26

RogueLLMania: Running an LLM in Your Game Loop

Vibe Coding — A 3-Part Series

This post is part of a short series documenting a year of vibe coding: small, exploratory projects built quickly and with a focus on fun, treating tools like LLMs, CLIs, and game data as creative materials rather than products.

  1. RogueLLMania: Running an LLM in Your Game Loop
    Procedural narration in a roguelike using a local model.
    You are here

  2. Vibe-Coding a Video-Cutting CLI Tool
    FFmpeg, SuperCollider, and ambient automation.
    Read it here

  3. A Warhammer 40K MCP Server
    Structured tabletop game knowledge for AI agents.
    Read it here

Prefer to listen? If you don't want to read the whole post, you can watch the video above, where I ramble about basically the same things I cover here. The video is a bit older; it was recorded spur of the moment on top of a mountain in Norway during a trip this summer.

The Talk That Stuck

At the 2023 Game Developers Conference, Yusuke Mori of Square Enix R&D presented "Developing Adventure Game with Free Text Input Using NLP." 1 The demo was compelling: a player types free-form dialogue into a text adventure, and an LLM (large language model) interprets that input to either advance the story or allow open-ended conversations with NPCs. This wasn't a chatbot bolted onto a game; it was NLP (natural language processing) woven into the game's state machine, making runtime decisions about what to do with unpredictable player input.

Just a few months earlier, ChatGPT had launched and everyone was talking to chatbots, but running a model locally? That required a powerful GPU, specialized libraries like PyTorch, and the kind of expertise usually reserved for researchers. For most game developers, the technology was fascinating but inaccessible.

That talk stuck with me. Not because I wanted to build a chatbot NPC; that didn't feel like an interesting design problem. I felt it hinted at something else: what if we used LLMs the way we already use procedural generation tools? Not for conversation, but for world texture, for flavor. For the kind of world-building that scales beyond what most small teams can hand-author.

Two Years Later

Fast forward to 2025. The OpenAI API is well-documented and inference costs have dropped. More importantly, tools like Ollama 2 and LM Studio 3 have made it trivial to run capable models locally. Download a model, spin up a server, and you’re generating text without worrying about API limits or sending data to the cloud.

When I was in college, motion controllers like the PlayStation Move and Xbox Kinect appeared, and a wave of fascinating indie games explored what those inputs could do. Johann Sebastian Joust became a hit not only because motion controllers were new, but because someone found a novel interaction in the newly opened design space.

So why haven't we seen that same exploration with LLMs?

Part of it is backlash: generative AI has baggage, and rightfully so. 4 But I believe an equally large reason is cost. Inference isn't free, latency is unpredictable, and no one wants to be left holding the bag on server costs if a game doesn't pay off. Generative AI has made its way into Microsoft Teams, Strava, Photoshop, and just about every other piece of consumer software, and yet it has barely touched games at runtime.

This project isn't Johann Sebastian Joust for LLMs. But maybe it can help point the way.

A Roguelike

I love Caves of Qud. 5 I don't think I've ever had a run last very long, but something about its procedural storytelling (its histories, factions, and systemic interactions) amazes me. That led me to other traditional roguelikes like Brogue.

As a game developer, whenever I fall in love with a game, I also think about how I would make my own version of it. That's how I discovered ROT.js 6, a JavaScript library for building roguelikes in the browser. As a fan of JavaScript, I knew immediately that I wanted to build something with it.

The Design Constraints

Even with guardrails, LLMs can be dangerous. Sycophancy, misinformation, and unpredictable outputs make direct player-to-model interactions risky at best and harmful at worst. On top of that, every query costs compute time, and if player input is unbounded, you have no idea what a player might cost you.

So the design constraint became simple: the player never queries the model directly. Instead, the game's systems decide when to query the model, and they pass only structured, predictable information.

This sidesteps almost all the negatives of using an LLM in a game:

  • No harmful player prompts

  • Predictable latency (you control when queries happen)

  • Deterministic fallbacks (the game works fine without the LLM enabled)

  • Structured inputs and outputs (you can validate what comes back)

How RogueLLMania Works

RogueLLMania is a turn-based roguelike dungeon crawler. You navigate procedurally generated chambers, fight monsters, collect artifacts, and try to survive deeper into the dungeon. The game uses traditional roguelike mechanics: turn-based movement, fog-of-war, permadeath, and tile-based rendering, all made easy by ROT.js. Layered on top of that is narration generated by a local LLM running through Ollama.

Here is how it works:

1. Procedural Generation

The level generator creates a chamber using cellular automata for caves or rectangular rooms for halls. It places doors, monsters, health potions, and objects like torches. Finally, it rolls from lookup tables to create an artifact with randomized properties:

  • Name: "Obsidian Shard", "Bronze Compass", "Crystal Lens"
  • Material: "obsidian", "bronze", "glass"
  • Properties: age, texture, weight, magical hints

The artifact is placed on an empty tile, and the game knows its position in the level.
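A minimal sketch of what that roll might look like, with illustrative tables and property names rather than the game's actual data:

const artifactNames = ['Obsidian Shard', 'Bronze Compass', 'Crystal Lens'];
const materials = ['obsidian', 'bronze', 'glass'];
const textures = ['smooth', 'pitted', 'etched'];

// Pick a random entry from a lookup table.
function pick(table) {
  return table[Math.floor(Math.random() * table.length)];
}

function rollArtifact() {
  return {
    title: pick(artifactNames),
    material: pick(materials),
    texture: pick(textures),
    age: Math.floor(Math.random() * 1000), // age in years
  };
}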

2. First LLM Query: Artifact Description

The game constructs a prompt from this data using a "mad libs" approach, though in ML terms this is closer to a structured input. The prompt includes:

  • The artifact's name, material, and properties
  • Its position in the level (near a wall, in an open chamber, etc.)
  • One-shot example of a good description

The prompt explicitly requests the output in XML tags: <description>…</description>. This makes parsing easier and lets us strip out any CoT (chain-of-thought) reasoning or auxiliary tags the model might emit.

The model returns a 2-3 sentence description of the artifact. This gets serialized as the artifact’s description property and displayed when the player interacts with it.
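For a complete (non-streamed) response, the extraction step could be as simple as this sketch; the streaming case, covered later, needs more care, and the function name here is hypothetical:

function extractDescription(raw) {
  // Drop chain-of-thought blocks and code fences some models emit,
  // then pull out the <description> body.
  const cleaned = raw
    .replace(/<think>[\s\S]*?<\/think>/g, '')
    .replace(/```(?:xml)?/g, '');
  const match = cleaned.match(/<description>([\s\S]*?)<\/description>/);
  return match ? match[1].trim() : cleaned.trim(); // fall back to the whole text
}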

3. Second LLM Query: Level Introduction

Now that we have an artifact description, we send a second prompt that includes:

  • The current floor number
  • The chamber type (cave, pillared hall, etc.)
  • Monster count and types
  • The artifact's name and description (feeding the first result into the second prompt)

This is an example of recursive generation: using one LLM output as context for the next query.
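In code, the chain looks roughly like this; the helper names are hypothetical stand-ins for the game's actual functions:

async function introduceLevel(levelNumber, levelType) {
  const artifact = rollArtifact();                 // step 1: lookup-table roll
  artifact.description = await generateArtifactDescription(artifact); // query 1
  return generateLevelIntro({                      // query 2, fed by query 1
    levelNumber,
    levelType,
    artifactTitle: artifact.title,
    artifactDescription: artifact.description,
  });
}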

The level introduction is streamed in as the model generates tokens. They appear word-by-word in an overlay as you enter the dungeon. The player can dismiss this overlay at any time. If the LLM is disabled or fails, the game falls back to deterministic text: simple templates that still convey the essential information.

4. The Game Loop Continues

Once narration is complete, the game proceeds as a classic roguelike. Combat, movement, inventory, and fog-of-war all run without the LLM. The model is only queried at specific, controlled moments: level transitions.

Technical Implementation

The game is built with Electron, which gives us a Chromium renderer and a Node.js backend. The renderer runs ROT.js and the game logic, while the main process handles IPC (inter-process communication) with Ollama.

Ollama runs as a local HTTP server (default port 11434). The main process (Electron backend) handles the HTTP calls so the renderer doesn’t have to worry about CORS or network errors.

For streaming responses, Ollama sends newline-delimited JSON chunks. Each chunk contains a response field with a token fragment. The main process forwards these to the renderer via IPC events, and the UI appends them in real-time.
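A minimal sketch of that main-process plumbing, assuming Node 18+ fetch and an Electron BrowserWindow named win; the model name is illustrative and error handling is trimmed:

async function streamNarration(prompt, win) {
  const res = await fetch('http://localhost:11434/api/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model: 'llama3.2', prompt, stream: true }),
  });

  const decoder = new TextDecoder();
  let buffer = '';
  for await (const chunk of res.body) {
    buffer += decoder.decode(chunk, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop(); // keep any partial line for the next chunk
    for (const line of lines) {
      if (!line.trim()) continue;
      const data = JSON.parse(line); // one JSON object per line
      if (data.response) win.webContents.send('narration-token', data.response);
      if (data.done) return;
    }
  }
}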

Parsing and Sanitization

I wanted the game to be somewhat model-agnostic, so the streaming parser had to handle quite a bit:

  • Models that emit <think> blocks before the actual description
  • Code fences (```) that some models use to wrap XML
  • Language hints like "xml:" before the description tag
  • Partial tags that arrive mid-stream
  • Models that never use <description> tags at all

The solution is a state machine that toggles between “inside description” and “outside description” modes, skips fence markers and think blocks, and falls back to a “soft mode” that starts streaming after a short timeout even if no <description> tag appears.

This ensures a responsive experience regardless of model behavior. The "soft mode" timeout means that if a model takes more than 900ms without starting a <description> tag, we just start streaming everything and clean it up later.
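Here is a condensed sketch of the idea, not the actual parser; the real implementation also handles fence markers, language hints, and tags split across chunks in more detail:

function createDescriptionParser(onText, softModeMs = 900) {
  const OPEN = '<description>';
  const CLOSE = '</description>';
  let inside = false; // currently between the description tags?
  let soft = false;   // soft mode: timeout hit before any tag, stream anyway
  let done = false;
  let buffer = '';
  const timer = setTimeout(() => { soft = true; flush(); }, softModeMs);

  function flush() {
    // Hold back a tail long enough that a closing tag split across
    // chunks is never emitted as prose.
    const safe = buffer.length - CLOSE.length;
    if (safe > 0) {
      onText(buffer.slice(0, safe));
      buffer = buffer.slice(safe);
    }
  }

  return function feed(token) {
    if (done) return;
    buffer += token;
    buffer = buffer.replace(/<think>[\s\S]*?<\/think>/g, ''); // skip think blocks
    if (!inside) {
      const open = buffer.indexOf(OPEN);
      if (open !== -1) {
        inside = true;
        clearTimeout(timer);
        buffer = buffer.slice(open + OPEN.length);
      } else if (!soft) {
        return; // keep waiting for the opening tag or the soft timeout
      }
    }
    const close = buffer.indexOf(CLOSE);
    if (close !== -1) {
      onText(buffer.slice(0, close));
      done = true;
      clearTimeout(timer);
    } else {
      flush();
    }
  };
}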

Graceful Degradation

If Ollama isn't running, if the model isn't found, or if a request times out, the game:

  • Logs the error to console
  • Falls back to deterministic text
  • Shows a notification in the settings overlay
  • Continues running without interruption

The fallback system uses simple templates that still convey atmosphere:


const levelTypeTemplates = {
  basic: [
    "You step into a well-constructed chamber, its {tileDescription} floors bearing witness to ancient craftsmanship.",
    "The entrance reveals a spacious hall with {tileDescription} beneath your feet, speaking of forgotten builders."
  ],
  cave: [
    "You emerge into a natural cavern, where {tileDescription} ground tells stories of geological ages.",
    "The rough-hewn cave opens before you, its {tileDescription} floor shaped by time and underground waters."
  ],
  pillaredHall: [
    "You enter a grand pillared hall, where {tileDescription} floors echo with the footsteps of history.",
    "Ancient columns rise from {tileDescription} ground in this majestic chamber of forgotten purposes."
  ]
};

The game randomly selects from these templates and fills in the {tileDescription} based on the level's dominant tile type. It's not as varied as the LLM output, but it keeps the game playable and atmospheric.
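Selection and substitution then come down to a few lines; a sketch with a hypothetical helper name:

function fallbackIntro(levelType, tileDescription) {
  const options = levelTypeTemplates[levelType] || levelTypeTemplates.basic;
  const template = options[Math.floor(Math.random() * options.length)];
  return template.replace('{tileDescription}', tileDescription);
}

// e.g. fallbackIntro('cave', 'moss-covered')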

The "Mad Libs" Approach and Structured Inputs

I've been calling this the "mad libs" approach because it feels like filling in blanks: you have a template, and the game fills in the values. But in AI terms, this is closer to structured inputs combined with output-formatting constraints.

Here's the actual system prompt that gets sent to the model:


const STATIC_PROMPT = `
You are the Chamber Herald—an AI narrator who writes a vivid 2‑3 sentence introduction when an adventurer steps into a new chamber of a dungeon.

### Reasoning checklist
1. Read every incoming XML tag and remember its values.
2. Weave the following ingredients into the prose:
   • <floor> and <chamber_type>
   • A sense of threat using <monster_count> and <monster_type>
   • A hint of discovery using <artifact_title> and, if provided, subtly echo the mood of <artifact_description> without quoting it
3. Write in second person ("You…"), keep it mysterious and evocative, no more than 3 sentences, avoid proper nouns except the artifact title.

Return **only** the following structure (no extra commentary):
<output_format>
<description>…</description>
</output_format>
`;

In the current implementation, the system prompt and XML payload are concatenated into a single prompt string sent to Ollama's /api/generate endpoint. Ollama also supports a /api/chat endpoint with separate system and user message roles (similar to OpenAI's API format).

Note: I plan to investigate whether using the chat API's separate system/user message format produces meaningfully different results compared to the concatenated approach.
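For reference, the chat-style variant would look roughly like this; a sketch based on Ollama's documented /api/chat shape, not code from the game:

async function chatLevelIntro(ctx) {
  return fetch('http://localhost:11434/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'llama3.2', // illustrative model name
      messages: [
        { role: 'system', content: STATIC_PROMPT },
        { role: 'user', content: buildLevelIntroXML(ctx) },
      ],
      stream: true, // streamed chunks carry message.content instead of response
    }),
  });
}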

And here's how the game state gets transformed into structured XML input:


function buildLevelIntroXML(ctx) {
  const dominantMonsterType = Object.keys(ctx.monsterTypes)[0] || 'none';
  const floorDesc = tileAtmospheres[ctx.dominantTile] || 'ancient stone';
  return `
<chamber>
<level>${ctx.levelNumber}</level>
<chamber_type>${ctx.levelType}</chamber_type>
<floor>${floorDesc}</floor>
<monster_count>${ctx.monsterCount}</monster_count>
<monster_type>${dominantMonsterType}</monster_type>
<artifact_title>${ctx.storyObjectDetails ? ctx.storyObjectDetails.title : 'NONE'}</artifact_title>
<artifact_description>${ctx.storyObjectDetails ? ctx.storyObjectDetails.description : 'NONE'}</artifact_description>
</chamber>`;
}
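With both pieces in hand, the concatenated call described above reduces to a couple of lines inside the game's async level-load path (reusing the streamNarration sketch from earlier):

const prompt = STATIC_PROMPT + buildLevelIntroXML(ctx);
await streamNarration(prompt, win);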

The prompt explicitly tells the model:

  • What information is available (via XML tags)
  • What format to return (XML tags in the response)
  • What tone and length to aim for (2-3 sentences, mysterious, second person)
  • The role it's playing ("Chamber Herald")

This is similar to how LLMs are being deployed for data synthesis work: you give the model structured data, ask for structured output, and use traditional programming to validate and extract what you need.

The power of this method scales with the complexity of your game state. Right now, RogueLLMania only generates:

  1. An artifact description (that has no gameplay impact)
  2. A level introduction (that's purely atmospheric)

But imagine a more complex game state:

  • Faction relationships (are the goblins allied with the merchant guild?)
  • Quest objectives (dynamically generated based on items you've found)
  • NPC dialogue (contextual greetings based on your reputation)
  • Environmental storytelling (descriptions of room damage that reflect recent battles)

As your procedural systems grow more sophisticated, the "mad libs" approach lets you feed increasingly rich context into the model while keeping outputs bounded and predictable.

Procedural Content Generation as Craft

We've been using procedural generation tools like Houdini for years to create art for game worlds: trees, rocks, terrain, the things we consider "world texture." These tools don't replace artists; they let artists scale their work beyond what's hand-authorable.

I see no reason why we can't use an LLM to generate language that serves a similar purpose. Its job isn't to write your main quest dialogue or your protagonist's arc. It's to fill the spaces in between: the flavor text on a random piece of loot, the atmospheric description of a room, the incidental details that make a procedural world feel lived-in.

Language as Another Tool

I hope this project inspires some new thinking about how LLMs can fit into game development. Running a model locally resolves most of the major concerns: no API costs, no player data sent to the cloud, fewer unpredictable latency spikes, and full control over when and how the model is queried.

Modern GPUs and Apple Silicon have the compute to run capable LLMs, especially if you're not pushing graphics. For a 2D turn-based roguelike with minimal draw calls, there's plenty of headroom to generate a few hundred (or thousand) tokens when loading a new level.

The design lesson here isn't "add an LLM to your game." It's "find the places where procedural content already lives and ask if language could be one of those things." If you're already rolling random loot tables, generating quest objectives from templates, or stitching together dialogue from conditional branches, you're already doing a form of procedural narrative. An LLM is just another tool in that toolbox, one that happens to be very good at sounding natural and varied.

I don't know what the future of this looks like. Maybe local inference gets fast enough that we can generate dialogue in real time during conversations. Maybe fine-tuned models become so good at your game's specific voice that players can't tell what was authored and what was generated. Maybe none of this matters because the next big leap is something we haven't imagined yet.

But right now, in 2025, you can download Ollama 2, pull a model, and start experimenting. That accessibility is new, and it’s worth exploring, not because LLMs are magic, but because they’re just another tool, and understanding your tools is what craft is all about.


If you want to try RogueLLMania yourself, the full source code and build instructions are available at github.com/bradenleague/RogueLLMania.

Footnotes

  1. Developing Adventure Game with Free Text Input using NLP: https://www.gdcvault.com/play/1028755/AI-Summit-Developing-Adventure-Game

  2. Ollama: https://ollama.ai/

  3. LM Studio: https://lmstudio.ai/

  4. For more thoughts on concerns around generative AI, see AI Curious.

  5. Caves of Qud: https://www.cavesofqud.com/

  6. ROT.js: https://ondras.github.io/rot.js/hp/
