AI NPC Portraits for DMs: Prompts for Innkeepers, Blacksmiths, Guards, and More
Quick answer: To get usable AI NPC portraits for D&D, lead with the occupation (innkeeper, blacksmith, city guard), add one prop and one clothing material that sell the job, and give each face one memorable flaw like a crooked nose. Reuse the same art-style and lighting phrase across every NPC so the whole village looks painted by one artist.
Every prompt list on the internet covers adventurers: paladins in gleaming plate, sorcerers wreathed in flame. But a DM's actual roster is a tavern keeper, two guards, a blacksmith, an apothecary, and a beggar with a rumor to sell. Those are the faces your players stare at for whole sessions, and almost nobody writes prompts for them.
This guide covers the occupation-NPC gap: which props and materials make a job readable in half a second, how to signal wealth without ever writing "rich," how to batch fifteen villagers in one coherent style, and how to bring the same innkeeper back in session 30 with the same face. Every example is a full prompt you can paste into Midjourney, DALL-E, or any other generator.
Why do NPC portraits do more for immersion than PC portraits?
A player looks at their own character art a handful of times per campaign. Your NPCs get introduced cold, mid-scene, dozens of times, and a portrait does in two seconds what a spoken description can't: it gives players a face to attach the voice, the name, and the plot hook to. Players remember "the innkeeper with the wine-stained apron" long after they've forgotten a paragraph of narration.
Portraits also carry a signal. When you slide an image across the table (or pop it up in your VTT), players instantly read that NPC as important. That works in your favor: give portraits to the quest-givers and recurring faces, skip the one-line shopkeepers, and you've silently taught your table who deserves their attention.
The economics favor NPCs too. A player might commission art for one beloved character; no one commissions the town's fishmonger. AI generation costs you a minute per face, which means portraits for a whole village are finally practical. The rest of this guide is how to make that minute produce something better than a generic fantasy hero, starting with the prompt structure that makes an occupation actually read.
What makes an occupation read instantly in a portrait?
Image models know the word "blacksmith," but on its own it produces a vaguely muscular person in vaguely medieval clothes. Occupations read instantly when the prompt names visible evidence: one prop in hand, one clothing signal, one environment hint. Three camera-visible details, and the job is unmistakable.
- Innkeeper: pewter tankard and bar rag in hand; rolled-sleeve shirt and apron; warm candlelit taproom behind. See the innkeeper reference page for more trait combinations.
- Blacksmith: tongs gripping glowing metal; scarred leather apron; orange forge glow from below. The blacksmith page covers build and grooming cues.
- City guard: halberd or polished kettle helm; tabard with a city crest over mail; stone gatehouse background. More at the city guard page.
- Apothecary: cork-stoppered glass vial; ink-stained fingers and spectacles; shelves of dried herbs and jars.
- Farmer: wide straw hat; coarse linen smock; wheat field or muddy yard.
- Fisherman: coiled net or gaff hook; oilskin coat; grey harbor light.
The prop matters most. Hands holding a job-specific object anchor the whole composition, and a bust or half-body crop keeps those hands large enough for the model to render them well. If a portrait comes back generic, the fix is almost never more adjectives; it's swapping an abstract word ("merchant") for evidence a camera could see ("weighing silver coins on a small brass scale").
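If you keep your roster in a script or spreadsheet, the prop, clothing, and environment triple above can be stored as a small lookup table. The following is a minimal sketch, not part of any generator: the names `OCCUPATION_EVIDENCE` and `occupation_clause` are made up for illustration, and where the list offers alternatives (halberd or kettle helm, wheat field or muddy yard) the table simply picks one.

```python
# Evidence table copied from the list above: (prop, clothing signal, environment hint).
# All names here are illustrative assumptions, not an existing tool's API.
OCCUPATION_EVIDENCE = {
    "innkeeper": ("pewter tankard and bar rag in hand",
                  "rolled-sleeve shirt and apron",
                  "warm candlelit taproom behind"),
    "blacksmith": ("tongs gripping glowing metal",
                   "scarred leather apron",
                   "orange forge glow from below"),
    "city guard": ("halberd in hand",
                   "tabard with a city crest over mail",
                   "stone gatehouse background"),
    "apothecary": ("cork-stoppered glass vial",
                   "ink-stained fingers and spectacles",
                   "shelves of dried herbs and jars"),
    "farmer": ("wide straw hat", "coarse linen smock", "wheat field"),
    "fisherman": ("coiled net", "oilskin coat", "grey harbor light"),
}


def occupation_clause(subject: str, job: str) -> str:
    """Build the opening of a prompt from three camera-visible details."""
    if job not in OCCUPATION_EVIDENCE:
        # Refuse to fall back on the bare job word, which tends to render generically.
        raise KeyError(f"no evidence listed for {job!r}; add a prop, clothing, and setting first")
    prop, clothing, setting = OCCUPATION_EVIDENCE[job]
    return f"Fantasy character portrait of {subject} {job}, {prop}, {clothing}, {setting}"


print(occupation_clause("a dwarf", "blacksmith"))
```

Raising an error for an unlisted job is a deliberate choice in this sketch: it forces you to write down visible evidence before generating, rather than sending the abstract word on its own.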
How do you prompt a memorable innkeeper, blacksmith, or apothecary?
Occupation details make an NPC readable; one deliberate imperfection makes them memorable. AI models default to symmetrical, attractive, thirty-year-old faces, so give every important NPC a single flaw or asymmetry: a broken nose, a singed beard, one clouded eye, deep laugh lines. That flaw becomes the thing your players use as a name ("the bent-nose innkeeper") until they learn the real one.
Here's the pattern applied. An innkeeper, warm and worth trusting:
Fantasy character portrait of a stout middle-aged human innkeeper, ruddy cheeks and a crooked nose broken long ago, greying hair tied back, deep laugh lines, wearing a rolled-sleeve linen shirt and a wine-stained apron, polishing a pewter tankard with a rag, warm candlelit taproom blurred behind her, digital painting, bust framing, golden and warm palette
A dwarf blacksmith, built like his own anvil:
Fantasy character portrait of a dwarf blacksmith with a singed auburn beard and soot-streaked brow, thick scarred forearms, heavy leather apron over a sweat-darkened tunic, gripping tongs that hold a glowing horseshoe, orange forge light from below, dark smithy behind him, oil painting, half-body framing
An apothecary who knows more than he says:
Fantasy character portrait of a thin elderly gnome apothecary with round brass spectacles and ink-stained fingertips, wispy white side-whiskers, patched wool waistcoat, holding up a cork-stoppered glass vial of murky green liquid, shelves of dried herbs and labeled jars behind him, soft window light, muted desaturated palette, watercolor illustration
Note what each prompt does: occupation and prop up front, one flaw, one material, lighting that matches the workplace (firelight for the forge, soft window light for the shop). Around 60-80 words is plenty; past that, the model starts ignoring details.
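The same pattern can be written down as a small template so that no slot gets forgotten. This is a rough sketch under stated assumptions: `NpcSpec`, `build_prompt`, and `MAX_WORDS` are hypothetical names, the slot order follows the example prompts above, and the 80-word ceiling is just the upper end of the guideline in the previous paragraph.

```python
from dataclasses import dataclass

MAX_WORDS = 80  # upper end of the 60-80 word guideline; an assumption, not a hard model limit


@dataclass
class NpcSpec:
    subject: str      # build, age, race, and occupation
    flaw: str         # the single memorable imperfection
    clothing: str     # one material that sells the job
    prop_action: str  # what the hands are doing
    lighting: str     # lighting or setting that matches the workplace
    style: str        # medium, framing, palette


def build_prompt(spec: NpcSpec) -> str:
    """Join the slots in the order the example prompts use, then check the length."""
    parts = [
        f"Fantasy character portrait of {spec.subject} with {spec.flaw}",
        f"wearing {spec.clothing}",
        spec.prop_action,
        spec.lighting,
        spec.style,
    ]
    prompt = ", ".join(parts)
    words = len(prompt.split())
    if words > MAX_WORDS:
        raise ValueError(f"prompt is {words} words; trim to {MAX_WORDS} or fewer")
    return prompt


innkeeper = NpcSpec(
    subject="a stout middle-aged human innkeeper",
    flaw="a crooked nose broken long ago",
    clothing="a rolled-sleeve linen shirt and a wine-stained apron",
    prop_action="polishing a pewter tankard with a rag",
    lighting="warm candlelit taproom blurred behind her",
    style="digital painting, bust framing",
)
print(build_prompt(innkeeper))
```

Keeping the flaw as its own required field is the point of the structure: an NPC with an empty flaw slot is easy to spot before you spend a generation on it.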
How do you show social status without writing 'poor' or 'rich'?
"Poor" and "rich" are conclusions, not visuals, and models handle them inconsistently. Materials and condition words are what actually render. Status in a portrait is three dials:
- Fabric: burlap, coarse linen, and rough wool at the bottom; pressed linen and good wool in the middle; silk, velvet, and brocade with gold thread at the top.
- Condition: patched, threadbare, frayed, mud-hemmed, sun-faded versus tailored, embroidered, fur-trimmed, freshly dyed.
- Grooming: wind-chapped cheeks, cracked knuckles, and unevenly cut hair versus oiled beards, powdered skin, and jeweled rings.
A "wealthy merchant" prompt gets you costume-shop shine; "a merchant in a fur-trimmed velvet doublet with three gold rings on each hand, soft and unweathered" gets you a man who has never carried his own luggage. The same trick runs downward:
Fantasy character portrait of a gaunt human beggar with wind-chapped cheeks and alert grey eyes, wrapped in a patched burlap cloak over threadbare rough wool, fingerless gloves, clutching a chipped wooden bowl, overcast daylight in a narrow stone alley, muted desaturated palette, pencil and charcoal sketch
Everything in that prompt is a thing you could photograph, which is why it works. The beggar page has more low-status cues, and the muted, desaturated palette keeps the image from looking cheerfully colorful when it shouldn't. Middle-status NPCs, the hardest to hit, work best with one contradiction: a plain wool coat with a single silver brooch says "comfortable, not noble" better than any adjective.
How do you batch a whole village in one coherent art style?
Fifteen NPC portraits in fifteen different styles look like what they are: images scraped from fifteen sources. The fix is a fixed suffix. Write the character half of each prompt freely, then end every single one with the identical style block, word for word:
..., digital painting with visible brushwork, earthy natural palette, soft window light, bust framing, plain dark background
Keep the variables few and locked: one art style, one palette, one framing, one background treatment. Lighting can vary by workplace (forge glow, tavern candles) without breaking cohesion as long as style and palette hold.
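For a batch of any size, the fixed-suffix rule is easy to enforce mechanically rather than by copy and paste. Here is a minimal sketch, assuming you keep the character halves as plain strings; `STYLE_SUFFIX` holds the style block from above verbatim, while `batch_prompts` is a hypothetical helper name.

```python
# The style block from the article, kept in one place so every prompt gets it word for word.
STYLE_SUFFIX = (
    "digital painting with visible brushwork, earthy natural palette, "
    "soft window light, bust framing, plain dark background"
)


def batch_prompts(characters: list[str], suffix: str = STYLE_SUFFIX) -> list[str]:
    """Append the identical style block to each character description."""
    prompts = []
    for character in characters:
        # Trim stray whitespace and a trailing comma so the join stays clean.
        base = character.strip().rstrip(",")
        prompts.append(f"{base}, {suffix}")
    return prompts


villagers = [
    "Fantasy character portrait of a stout human innkeeper polishing a pewter tankard",
    "Fantasy character portrait of a dwarf blacksmith gripping tongs",
]
for prompt in batch_prompts(villagers):
    print(prompt)
```

Because the suffix lives in a single constant, changing the village's look later means editing one string and regenerating, not hunting through fifteen prompts for small wording differences.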
In Midjourney you can push cohesion further with a style reference: add --sref plus a code or image URL to every prompt in the batch, and per Midjourney's documentation the same code applies the same aesthetic across completely different subjects. Generate your favorite villager first, then reference it for the rest.
This is the workflow the Arcane Portraits generator is built around: set the art style, palette, lighting, and framing fields once, then swap only the character-type field through its 37 types (innkeeper, blacksmith, apothecary, farmer, miner, tavern keeper...) and copy each finished prompt out. Ten minutes gets you a visually unified roster. Browse the library first if you want to audition styles before committing the whole village to one.
How do you keep recurring NPCs consistent across sessions?
The tavern keeper your players adopted in session 3 needs the same face in session 30. Three practices, in order of importance:
- Save the exact prompt text. Not a paraphrase; the literal string. Identical wording won't reproduce the image pixel-for-pixel, but it holds the anchors (crooked nose, wine-stained apron, greying hair) so a regeneration reads as the same person on a different day. Signing in to Arcane Portraits saves your prompt history for exactly this reason.
- Don't rely on seeds. Midjourney's own docs note that seeds only set the starting noise; they can't bookmark a character, and results drift across sessions and model updates. A seed plus a changed prompt is a new face.
- Use an image reference for the NPCs that matter. In Midjourney V7, --oref (Omni Reference) takes the URL of your original portrait and pulls the new generation toward that face; --ow sets the strength (default 100), and Midjourney's docs advise staying below 400 unless you're also running a high --stylize value. It accepts one reference image per prompt and costs roughly double the GPU time, so reserve it for your recurring cast.
A practical routine: generate the first portrait, save the prompt and the image together in your campaign notes, and when the NPC returns changed (new scar, mourning clothes), rerun the saved prompt with only that one clause edited. The full toolkit, including Leonardo and Stable Diffusion equivalents, is in our character consistency guide.
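If your campaign notes live in files, that routine can be sketched in a few lines. This is one possible approach, not a feature of any generator: the JSON layout, the function names `save_npc` and `revised_prompt`, and the NPC name in the usage note are all assumptions made for illustration.

```python
import json
from pathlib import Path


def save_npc(notes: Path, name: str, prompt: str, image_file: str) -> None:
    """Store the literal prompt string and its image path under the NPC's name."""
    data = json.loads(notes.read_text(encoding="utf-8")) if notes.exists() else {}
    data[name] = {"prompt": prompt, "image": image_file}
    notes.write_text(json.dumps(data, indent=2), encoding="utf-8")


def revised_prompt(notes: Path, name: str, old_clause: str, new_clause: str) -> str:
    """Return the saved prompt with exactly one clause swapped; the saved entry is not modified."""
    prompt = json.loads(notes.read_text(encoding="utf-8"))[name]["prompt"]
    if prompt.count(old_clause) != 1:
        # Guard against silently changing nothing, or changing more than one clause.
        raise ValueError(f"expected exactly one occurrence of {old_clause!r}")
    return prompt.replace(old_clause, new_clause)
```

For example, after `save_npc(Path("npcs.json"), "Marta", prompt, "marta.png")`, calling `revised_prompt(Path("npcs.json"), "Marta", "a wine-stained apron", "a black mourning dress")` returns the session-30 version while every other anchor in the string stays untouched. The original entry is left as saved, so you can always return to the first face.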
Frequently asked questions
- How do I stop the AI from making every NPC young and attractive?
- Stack explicit age and flaw anchors: middle-aged, weathered, deep crow's feet, grey-streaked hair, sun-spotted skin, a crooked nose. Models default to symmetrical thirty-year-olds, so one anchor is rarely enough; use two or three. Words like plain or ordinary do almost nothing because the model has no visual target, while a named flaw such as a chipped front tooth reliably renders and makes the NPC easier for players to remember.
- Do I need a portrait for every NPC in my campaign?
- No, and showing one for everyone dilutes the effect. Players read a portrait as a signal that a character matters, so reserve images for quest-givers, recurring faces, and anyone likely to survive more than one scene. For a typical session that means three to five portraits. One-line shopkeepers can stay theater of the mind, and you can always generate a portrait later if the table unexpectedly adopts someone.
- What framing works best for NPC portraits shown at the table or in a VTT?
- Bust or head-and-shoulders framing works best. NPC portraits are viewed small, across a table or in a VTT sidebar, so the face needs to fill most of the frame to be readable. Tighter crops also give the model more pixels per face, which reduces wonky eyes, and they hide hands unless you deliberately include a prop. Save full-body shots for big reveals like a villain or a monarch.
- Why does my blacksmith prompt keep producing an armored warrior?
- Occupation words that overlap with combat vocabulary, like smith, forge, hammer, and steel, pull the model toward warrior imagery. Anchor the civilian read explicitly: a heavy leather work apron over a plain tunic, no armor, gripping tongs at an anvil. Removing any incidental weapon words and adding workshop context such as coal smoke and hanging tools usually fixes it within one or two regenerations.
- Can I use the same NPC prompt in Midjourney, DALL-E, and Stable Diffusion?
- Mostly. Midjourney and DALL-E both handle descriptive prose well, so the same 60-80 word prompt transfers directly; drop any Midjourney-only parameters like --sref before pasting into DALL-E. Stable Diffusion checkpoints generally respond better to comma-separated tags and truncate long prompts, so compress the prose into its key phrases: dwarf blacksmith, singed beard, leather apron, tongs, forge glow, oil painting.
- How do I prompt a nonhuman NPC like a dwarf innkeeper or half-orc guard?
- Put the race immediately before the occupation and give each character at least one race-specific visual detail: a dwarf innkeeper with a braided copper beard, a half-orc city guard with small tusks and a notched ear. Race-first ordering matters because models tend to weight early words most heavily. Nonhuman features drift back toward human defaults easily, so if the tusks or pointed ears vanish, move those traits earlier and restate them plainly.
- How do I turn an NPC portrait into a VTT token?
- Generate with a centered bust composition and a plain, solid-color background, which makes automatic background removal clean. Crop the result square, run it through a background remover, and add a border ring with a free token-stamp tool before importing at your VTT's native grid size. Prompting for the plain background up front saves far more time than trying to cut a busy tavern scene away from a hat.
- What background should an NPC portrait have?
- A hint of workplace, heavily blurred: shelved bottles behind an innkeeper, forge glow behind a blacksmith, a stone gate behind a guard. It reinforces the occupation without stealing attention from the face. Ask for it explicitly with phrases like blurred background or shallow depth of field. If you plan to reuse the portrait as a token, choose a plain dark background instead and let props and clothing carry the job.
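For the cross-generator question above, the mechanical part of moving a prompt between tools can be scripted. The sketch below assumes Midjourney parameters sit at the end of the prompt and start with two hyphens; the function names and the --sref code in the example are placeholders. Choosing which key phrases survive for Stable Diffusion remains a human decision, so the second helper only joins phrases you have already picked.

```python
import re


def strip_midjourney_params(prompt: str) -> str:
    """Drop everything from the first ' --param' onward, for pasting into other generators."""
    match = re.search(r"\s--\w", prompt)
    text = prompt[: match.start()] if match else prompt
    # Remove leftover whitespace and any trailing comma before the cut point.
    return text.strip().rstrip(",").rstrip()


def to_tag_prompt(key_phrases: list[str]) -> str:
    """Join hand-picked key phrases into a comma-separated tag prompt."""
    return ", ".join(phrase.strip() for phrase in key_phrases if phrase.strip())


# "1234" is a placeholder style code, not a real reference.
mj_prompt = "dwarf blacksmith with a singed beard, oil painting --sref 1234 --stylize 200"
print(strip_midjourney_params(mj_prompt))
print(to_tag_prompt(["dwarf blacksmith", "singed beard", "leather apron",
                     "tongs", "forge glow", "oil painting"]))
```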