How to Write an AI Prompt for Your D&D Character (Formula + Examples)
Quick answer: To write an AI prompt for a D&D character, describe what a camera would see, in this order: role and ancestry, permanent visual anchors (skin, eyes, hair, horns, scars), gear and materials, mood and pose, then lighting, framing, and art style. Aim for 40-80 words of descriptive prose, not keyword lists or backstory.
You know exactly what your character looks like. The AI doesn't, and the way most people bridge that gap — pasting in backstory, or stacking keywords like epic, detailed, 8k, masterpiece — produces the same interchangeable fantasy portrait everyone else gets. Image models don't read minds and don't read novels; they render descriptions of visible things, weighted by where those things sit in the prompt.
This guide gives you a five-block formula that works across Midjourney, DALL-E/ChatGPT, Stable Diffusion, Flux, and Leonardo, explains why order and length matter, and ends with five finished prompts spanning five races that you can adapt directly. If you'd rather not hand-assemble the blocks each time, the free prompt generator composes this exact structure from trait pickers.
Why do 'shopping list' prompts produce generic portraits?
Two failure modes account for most disappointing character portraits, and they're opposites.
The first is keyword soup: elf, ranger, bow, forest, green cloak, beautiful, epic, highly detailed, trending on artstation. That style was a workaround for early Stable Diffusion models that parsed tags better than sentences. Modern models — Midjourney v6/v7, DALL-E 3, Flux — are trained on natural-language captions and parse full sentences well, so disconnected tags give them nothing to bind together. The model can't tell whether the green belongs to the cloak, the forest, or the elf's eyes, so it averages toward the most statistically common 'elf ranger' image in its training data. That average is the generic portrait you keep getting.
The second is the backstory dump: 'Kaelen grew up an orphan on the streets of Waterdeep and trusts no one after his mentor's betrayal...' None of that is visible. A camera pointed at Kaelen cannot photograph a betrayal. Every word the model spends parsing invisible history is attention not spent on his crooked nose or patched cloak — and in tag-based tools with hard token limits, it literally pushes visible details out of the prompt.
The fix for both is the same discipline: write a compact, ordered description of what a viewer would see in the final image, and nothing else. The rest of this guide covers the order, the length, and how to convert story into visible detail.
What order should a character portrait prompt follow?
Image models tend to weight what comes early in a prompt more heavily than what comes late, so the order is a priority list rather than a stylistic preference. Use five blocks:
- Role and ancestry. One clause naming what the model should draw: 'a fantasy character portrait of a weathered wood elf ranger.' Race before class — ancestry changes anatomy, and anatomy errors are the hardest to fix later.
- Permanent visual anchors. Skin tone, eye color, hair, and race features like horns, tusks, or pointed ears, plus scars or markings. These define identity.
- Gear and materials. Clothing, armor, and at most one or two signature props. Name materials — worn leather, scratched steel, coarse linen — because materials carry more information per word than adjectives like 'cool armor.'
- Mood, expression, pose. One line: 'guarded, tired expression' or 'sly half-smile.'
- Lighting, framing, and art style. 'Bust portrait, warm candlelight, painterly digital art.' This block controls how the image looks rather than what's in it.
A skeleton version:
A fantasy character portrait of a [age] [race] [class/role], [skin], [eyes], [hair], [race feature or scar]. Wearing [garment] of [material], [one prop]. [Expression]. [Framing], [lighting], [palette], [art style].
The Arcane Portraits generator assembles prompts in this order automatically: each of its ~25 trait fields feeds exactly one of these five blocks.
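As a rough illustration of the block order, here is a minimal Python sketch that assembles a prompt from five fields. The `PortraitSpec` class and `build_prompt` function are hypothetical names invented for this example, not any generator's API; they only show one way the priority order could be fixed in code.

```python
from dataclasses import dataclass


@dataclass
class PortraitSpec:
    """Five prompt blocks, kept in priority order (hypothetical structure)."""
    role: str            # block 1: role and ancestry
    anchors: list[str]   # block 2: permanent visual anchors
    gear: list[str]      # block 3: gear and materials
    mood: str            # block 4: mood, expression, pose
    look: list[str]      # block 5: framing, lighting, art style


def build_prompt(spec: PortraitSpec) -> str:
    """Join the blocks into prose, always in the same order."""
    subject = ", ".join([f"A fantasy character portrait of {spec.role}", *spec.anchors])
    sentences = [
        subject,
        "Wearing " + ", ".join(spec.gear),
        spec.mood,
        ", ".join(spec.look),
    ]
    # One sentence per block, each ending in a single period.
    return " ".join(s.rstrip(".") + "." for s in sentences)


elf = PortraitSpec(
    role="a weathered wood elf ranger",
    anchors=["sun-browned skin", "sharp green eyes", "a thin scar through one eyebrow"],
    gear=["a moss-green hooded cloak", "a yew longbow across her back"],
    mood="Calm, watchful expression",
    look=["Bust portrait", "soft overcast daylight", "painterly digital art"],
)
print(build_prompt(elf))
```

Because the order lives in the function rather than in the writer's memory, a new character only needs new field values.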
How long should an AI character prompt be?
For prose-based generators (Midjourney, DALL-E/ChatGPT, Flux, Leonardo's default models), 40-80 words is the reliable band. Under about 40 words you're delegating too many decisions to the model, and it fills the gaps with training-data defaults — the same face, the same armor, the same teal-and-orange glow. Past about 80-100 words, additions stop landing: models distribute attention across the whole prompt, so every extra clause dilutes the ones that matter. Midjourney's own documentation pushes toward short, clearly focused prompts over long instruction lists for exactly this reason.
Stable Diffusion has a harder constraint: CLIP, its text encoder, processes prompts in 75-token chunks (roughly 50-60 words). Interfaces like AUTOMATIC1111 handle longer prompts by splitting them into separate chunks, but concepts don't bind well across the boundary, and tokens near the front of a chunk carry more positional weight. Practical consequence: put race anchors and identity features in the first 20 tokens, and treat anything past 75 as unreliable. The Stable Diffusion portrait guide covers the tag-format version of this in detail.
A useful editing test: delete any word and ask whether the image would visibly change. 'Beautiful,' 'epic,' 'highly detailed,' and 'masterpiece' fail this test on modern models. 'Chipped tusk' passes.
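The length band and the filler-word test can be checked mechanically. The sketch below is an assumption-laden helper, not a real tool: `review_prompt` and the `FILLER` set are hypothetical, the filler list is taken from the examples in this guide, and it counts words rather than CLIP tokens, so it only approximates the Stable Diffusion limit.

```python
import re

# Words this guide names as failing the "would the image change?" test.
FILLER = {"beautiful", "epic", "masterpiece", "detailed", "8k"}


def review_prompt(prompt: str, low: int = 40, high: int = 80) -> dict:
    """Count words, compare against the 40-80 band, and list filler words."""
    words = re.findall(r"[A-Za-z0-9'-]+", prompt)
    count = len(words)
    if count < low:
        verdict = "too short"
    elif count > high:
        verdict = "too long"
    else:
        verdict = "ok"
    filler = sorted({w.lower() for w in words} & FILLER)
    return {"words": count, "verdict": verdict, "filler": filler}


print(review_prompt("elf, ranger, bow, epic, highly detailed"))
```

A hyphenated term such as "ash-blonde" counts as one word here, which roughly matches how a reader would count it.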
What are permanent visual anchors, and why do they matter more than gear?
Permanent visual anchors are the traits your character keeps in every scene: skin tone, eye color, hair color and style, race features (horn shape, ear length, tusks, scales), and identifying marks like scars or tattoos. Gear is everything they could take off.
Anchors deserve more prompt budget than gear for three reasons.
Identity lives in the face. Players recognize their character by the face, not the breastplate. A portrait with perfect armor and the wrong face is a failed portrait; the reverse is usually fine.
Anchors are where models fail silently. Ask for 'a tiefling' without specifics and you'll get random horn shapes and whichever skin color the model favors. Vague anchors don't produce vague results — they produce confidently wrong ones. Be exact: 'deep crimson skin, smooth black ram-curl horns, solid gold eyes with no visible pupils' leaves nothing to chance. Race pages like the tiefling reference list phrasings that reliably render.
Anchors are your consistency mechanism. If you'll ever generate this character again — new pose, new outfit, new session — repeating the same anchor block word-for-word is the single highest-leverage habit for keeping the face stable, before any reference-image feature enters the picture. Write your anchors once, save them verbatim, and vary only the gear and scene blocks. The character consistency guide builds a full workflow on top of this.
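One way to keep the anchor block word-for-word identical is to store it once and build every scene around it. This is a minimal sketch; `ANCHORS` and `scene_prompt` are hypothetical names, and the anchor text reuses the tiefling phrasing from above.

```python
# Saved once, never edited between generations.
ANCHORS = (
    "a tiefling warlock with deep crimson skin, smooth black ram-curl horns, "
    "solid gold eyes with no visible pupils"
)


def scene_prompt(gear: str, mood: str, look: str) -> str:
    """Vary gear, mood, and look; the anchor text is inserted verbatim."""
    return f"A fantasy character portrait of {ANCHORS}. {gear}. {mood}. {look}."


court = scene_prompt(
    "He wears a high-collared coat of midnight-blue brocade",
    "Sly, knowing half-smile",
    "Head-and-shoulders close-up, dramatic rim lighting, oil painting style",
)
road = scene_prompt(
    "He wears a mud-spattered wool travelling cloak",
    "Weary but resolute expression",
    "Bust portrait, cold overcast light, painterly digital art",
)
print(court)
print(road)
```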
How do you translate backstory into things a camera could see?
The rule: a portrait can't show what happened, only the residue it left. For each backstory fact, ask 'what physical evidence would a stranger notice in ten seconds?' and prompt the evidence.
- 'Grew up poor' becomes patched coarse linen shirt, frayed cuffs, wiry build — not the word 'poor,' which models read weakly.
- 'Veteran of a losing war' becomes a scar through one eyebrow, a dented pauldron, tired eyes, upright military posture.
- 'Noble in exile' becomes a fine but travel-stained velvet coat, a signet ring on a cord around the neck.
- 'Made a pact with an entity she regrets' becomes faintly glowing violet sigils along one forearm, a guarded expression.
- 'Cheerful tavern regular' becomes laugh lines, flushed cheeks, an easy grin, firelight.
Notice how much work materials and condition do: travel-stained velvet tells a story that 'formerly wealthy' can't, because the model has seen thousands of captioned images of stained velvet and almost none captioned 'formerly wealthy.'
Expression and lighting are backstory tools too. A paranoid spy reads paranoid through eyes cut sideways, shoulders tense under hard rim lighting; the same face under soft candlelight with a relaxed jaw reads as someone finally safe. One or two emotional cues are enough — models handle 'weary but resolute' well and turn five stacked emotions into mush.
Cap it at three or four backstory-derived details per prompt. Pick the ones that make the character recognizable at a glance and let the rest live on your character sheet.
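The translation step and the cap can be sketched as a lookup table. Everything here is illustrative: `EVIDENCE` and `visible_details` are hypothetical, the entries are copied from the list above, and the round-robin selection is just one reasonable way to let each backstory fact contribute before any fact contributes twice.

```python
from itertools import zip_longest

# Backstory fact -> physical evidence a stranger could see.
EVIDENCE = {
    "grew up poor": ["patched coarse linen shirt", "frayed cuffs", "wiry build"],
    "veteran of a losing war": [
        "a scar through one eyebrow",
        "a dented pauldron",
        "tired eyes",
    ],
    "noble in exile": [
        "a fine but travel-stained velvet coat",
        "a signet ring on a cord around the neck",
    ],
}


def visible_details(facts: list[str], cap: int = 4) -> list[str]:
    """Pick details round-robin across known facts, then cap the total."""
    pools = [EVIDENCE[f.lower()] for f in facts if f.lower() in EVIDENCE]
    picked = [d for row in zip_longest(*pools) for d in row if d is not None]
    return picked[:cap]


print(visible_details(["Grew up poor", "Noble in exile", "Trusts no one"]))
```

Facts with no visible evidence in the table, such as "Trusts no one", are simply skipped, which mirrors the advice to leave them on the character sheet.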
How does the same prompt change for Midjourney, DALL-E, and Stable Diffusion?
The five blocks stay the same everywhere; the packaging changes.
Midjourney (v7). Write the blocks as flowing prose, then append parameters: --ar 2:3 for a portrait crop, and --style raw if you want your palette and style words respected instead of Midjourney's default beautification. Keep --stylize at or below the default 100 when exact anchors matter. For repeat generations of the same character, v7 uses --oref (Omni Reference) with an image URL plus --ow for strength — the old --cref from v6 no longer applies. Full parameter coverage lives in the Midjourney D&D prompt guide.
DALL-E / ChatGPT. No parameters — you write conversationally and can specify things like 'portrait orientation' in plain words. One caveat: ChatGPT rewrites and expands your prompt before the image model sees it. Handing it a complete five-block spec and asking it to follow the description faithfully gives you far more control than a one-line request it will embellish for you. The ChatGPT portrait guide covers this plus content-policy workarounds.
Stable Diffusion. Convert prose to comma-separated tags, front-load the anchors, and stay under 75 tokens: fantasy character portrait, female half-orc mercenary, olive-green skin, small tusks, amber eyes, .... Add a short negative prompt and, for non-human races, consider a checkpoint or LoRA trained on D&D ancestries.
This is why one saved trait spec beats one saved prompt string: keep the blocks as your source of truth and re-render the packaging per tool.
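A short sketch of that idea: one set of blocks, rendered two ways. The function names and the `BLOCKS` dictionary are hypothetical; the Midjourney flags are the ones discussed above, and the tag output follows the half-orc example rather than any tool's required format.

```python
BLOCKS = {
    "role": "female half-orc mercenary",
    "anchors": ["olive-green skin", "small tusks", "amber eyes"],
    "gear": ["scratched steel breastplate"],
    "mood": "guarded, tired expression",
    "look": ["bust portrait", "cold overcast light"],
}


def as_prose(b: dict) -> str:
    """Prose packaging for Midjourney, DALL-E/ChatGPT, Flux, Leonardo."""
    return (
        f"A fantasy character portrait of a {b['role']} with {', '.join(b['anchors'])}. "
        f"Wearing {', '.join(b['gear'])}. "
        f"{b['mood'].capitalize()}. "
        f"{', '.join(b['look']).capitalize()}."
    )


def as_midjourney(b: dict) -> str:
    """Prose plus the portrait-crop and raw-style parameters."""
    return as_prose(b) + " --ar 2:3 --style raw"


def as_sd_tags(b: dict) -> str:
    """Comma-separated tags with role and anchors front-loaded."""
    tags = ["fantasy character portrait", b["role"], *b["anchors"], *b["gear"], b["mood"], *b["look"]]
    return ", ".join(tags)


print(as_midjourney(BLOCKS))
print(as_sd_tags(BLOCKS))
```

Changing a trait in `BLOCKS` updates every packaging at once, which is the practical payoff of keeping the spec rather than a finished string.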
What does a finished prompt look like for five example characters?
Five worked examples in the prose format (Midjourney, DALL-E, Flux, Leonardo). Note the identical block order in each; only the content changes. For Midjourney, append --ar 2:3 --style raw.
Wood elf ranger — muted and naturalistic; elf ranger was the second most common race-class combo in FiveThirtyEight's 2017 analysis of D&D Beyond character data:
A fantasy character portrait of a weathered wood elf ranger in her middle years, sun-browned skin, sharp green eyes, ash-blonde hair braided back from long pointed ears, a thin scar through one eyebrow. She wears a moss-green hooded cloak over hardened leather armor, a yew longbow across her back. Calm, watchful expression. Bust portrait, soft overcast daylight, muted natural palette, painterly digital art.
Tiefling warlock — exact anchors doing the anti-default work:
A fantasy character portrait of a tiefling warlock with deep crimson skin, solid gold eyes with no visible pupils, and smooth black horns curving back over dark swept hair. He wears a high-collared coat of embroidered midnight-blue brocade, a brass amulet glowing faintly at his chest. Sly, knowing half-smile. Head-and-shoulders close-up, dramatic rim lighting against a dark background, rich saturated colors, oil painting style.
Female half-orc mercenary — tusk size dialed explicitly so the model doesn't drift monstrous:
A fantasy character portrait of a female half-orc mercenary with olive-green skin, amber eyes, and small lower tusks just visible over her lip, black hair shaved on one side. She wears a scratched steel breastplate over rough wool, a notched greatsword hilt rising over one shoulder. Guarded, tired expression. Bust portrait, cold overcast light, muted desaturated palette, gritty digital painting.
Halfling rogue — age markers stacked to force adult proportions:
A fantasy character portrait of a middle-aged halfling rogue with adult proportions, laugh lines, graying stubble, and curly brown hair silvering at the temples. He wears a patched linen shirt under a dark leather jerkin, lockpicks tucked into a chest strap. Wry, confident smirk. Half-body framing, warm candlelit tavern glow, golden and warm palette, storybook gouache style.
Silver dragonborn paladin — race anchors front-loaded, everything else minimal:
A fantasy character portrait of a silver dragonborn paladin with a prominent reptilian snout, full facial scales, and pale slit-pupil eyes. Polished plate armor engraved with sun motifs, a white wool half-cape at one shoulder. Solemn, resolute bearing. Three-quarter portrait, golden hour sunlight breaking through clouds, high-contrast dramatic palette, cinematic digital painting.
Each lands in the 50-75 word band. Race-specific deep dives — dragonborn, tieflings, drow, halflings — get their own guides with per-generator variants.
Frequently asked questions
- Can I just paste my D&D character sheet into an AI image generator?
- It usually backfires. Character sheets are mostly invisible information: stats, skills, alignment, and backstory that a portrait can't show. Image models will latch onto random visible fragments and ignore the rest. Instead, pull only the visual facts off the sheet — race, build, coloring, gear — and rewrite them as a 40-80 word description of what the character looks like.
- Should I name my character's class in the prompt?
- Yes, but don't rely on it alone. Class words like ranger or warlock are useful shorthand that nudges pose, gear, and mood, but models interpret them loosely and generically. Pair the class name with two or three concrete signals — a yew longbow, glowing sigils on a forearm, a holy symbol — so the class reads even if the model treats the word itself weakly.
- Do I need to write 'D&D' or 'Dungeons and Dragons' in the prompt?
- No, and it rarely helps. 'A fantasy character portrait of...' does the same genre-setting work without pulling the output toward any one franchise look. Some tools also treat trademarked names inconsistently. Describe the race and gear directly — 'tiefling warlock with crimson skin and black horns' — and the D&D flavor comes through on its own.
- How do I stop the AI from making my character too young or too pretty?
- State age and texture explicitly, because models default to smooth, idealized twenty-somethings. Use phrases like 'in her fifties,' 'weathered face,' 'crow's feet,' 'graying at the temples,' or 'crooked nose, broken twice.' In Midjourney, adding --style raw and keeping --stylize low also reduces the automatic beautification that erases the imperfections you asked for.
- What aspect ratio is best for a character portrait?
- Vertical ratios suit portraits: 2:3 is the common choice in Midjourney (--ar 2:3), and portrait orientation works in DALL-E if you request it in plain words. In Stable Diffusion, render near your model's native resolution — for SDXL that's around 832x1216 for a 2:3 portrait. Square 1:1 is better only when you plan to crop into a VTT token.
- Why does the AI ignore parts of my prompt?
- Usually the prompt is too long or the detail sits too late in it. Models weight early words more heavily, and past roughly 80-100 words of prose (or 75 tokens in Stable Diffusion) extra clauses dilute each other. Move the ignored detail earlier, cut filler words like 'beautiful' and 'detailed,' and drop any clause that wouldn't visibly change the image.
- Is there a template I can reuse for every new character?
- Yes — keep the block order fixed and swap the contents: role and ancestry, then permanent features like skin, eyes, hair, and race traits, then clothing and one prop with materials named, then expression, then framing, lighting, and art style. Arcane Portraits composes prompts in this structure for free from trait pickers, and the output text pastes into any generator.
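The SDXL size quoted in the aspect-ratio answer can be reproduced with a little arithmetic. This sketch assumes a pixel budget of about 1024x1024 and sides rounded down to multiples of 64; both figures are assumptions about common SDXL practice rather than limits the tools enforce, and `portrait_size` is a hypothetical helper.

```python
from math import isqrt


def portrait_size(ratio_w: int, ratio_h: int, budget: int = 1024 * 1024, step: int = 64) -> tuple[int, int]:
    """Find width and height near the pixel budget for a given aspect ratio."""
    # Ideal sides satisfy w * h = budget and w / h = ratio_w / ratio_h.
    width = isqrt(budget * ratio_w // ratio_h)
    height = isqrt(budget * ratio_h // ratio_w)
    # Round each side down to the nearest multiple of `step`.
    return (width // step * step, height // step * step)


print(portrait_size(2, 3))
print(portrait_size(1, 1))
```

The result for 2:3 is slightly wider than an exact 2:3 rectangle because of the rounding, which is why the answer above says "around".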