Stable Diffusion for D&D Portraits: Best Checkpoints, LoRAs, and Prompt Settings

Quick answer: Stock Stable Diffusion doesn't know D&D races, so start from a fantasy-trained model: Dungeons and Diffusion or A-Zovya RPG Artist Tools on SD 1.5, or an SDXL checkpoint plus the Fantasy Races XL LoRA. Render portraits at 832x1216 (SDXL) or 512x768 (SD 1.5), front-load race anchors into the first 75 tokens, and keep negative prompts short.

Most Stable Diffusion prompt guides hand you a comma-separated string and skip the part that actually decides whether your dragonborn looks like a dragonborn: which model you run it on. The base models were never trained on D&D races — that's why community checkpoints like Dungeons and Diffusion exist at all — and no prompt phrasing fully compensates for a model that has never seen a tabaxi.

This guide covers the Stable Diffusion path end to end: the Civitai checkpoints and LoRAs built for D&D character portraits, how the 75-token CLIP limit changes prompt order, the exact portrait resolutions SDXL and SD 1.5 were trained for, which negative prompts still earn their place, and how to convert a descriptive prose prompt into the comma-tag format SD 1.5 models prefer.

Why do D&D races fail on stock Stable Diffusion?

Stable Diffusion's base models learned from broad web image-caption pairs, and D&D races are vanishingly rare in that data. Ask a stock checkpoint for a dragonborn and you'll usually get a human in scale-patterned armor — the model knows "dragon" and "born" but has no visual concept binding them. A tiefling tends to come back as a human with costume-shop horns pasted on, and drow prompts drift toward ordinary elves with light skin because "dark elf" reads as a lighting note, not an anatomy spec.

Careful prompting narrows the gap. Spelling out the anatomy ("prominent reptilian snout, full facial scales, slit-pupil eyes") instead of writing just "dragonborn" pushes the model toward features it does know from dragon and lizard imagery. But you're steering against the training data on every generation, and the failure rate stays high for the least-human races.

The community's answer was to fix the model instead of the prompt. Fine-tuned checkpoints and LoRAs trained on tagged D&D character art give the model real visual concepts for each race, triggered by the race name itself. Dungeons and Diffusion — one of the earliest and best-known — exists precisely because stock SD can't draw these races. On the right model, "dragonborn" stops being a gamble and becomes a reliable tag.

Which checkpoints and LoRAs are built for D&D portraits?

All of these live on Civitai, the main hub for community Stable Diffusion models. Check the base model before you download: an SDXL LoRA won't work on an SD 1.5 checkpoint, and vice versa.

  • Dungeons and Diffusion (SD 1.5 checkpoint) — fine-tuned specifically on D&D character art; v3 was trained from Protogen 3.4. It answers to 29 race trigger words, including dragonborn, drow, tiefling, tabaxi, aarakocra, kenku, goliath, firbolg, and warforged, plus class tags. Put the race word near the front of the prompt.
  • A-Zovya RPG Artist Tools (SD 1.5 checkpoint) — built for RPG art in general: characters, creatures, ruins, book covers. V4 leans more illustrative. Less race-specific than Dungeons and Diffusion, but stronger as an all-round fantasy painter.
  • Fantasy Races XL (SDXL LoRA) — covers 16 TTRPG races, including elf, orc, all four genasi, gith, changeling, shifter, beastfolk, centaur, merfolk, and warforged. Each race name is its own trigger word, and it stacks on top of whatever SDXL checkpoint you already like.
  • Per-race LoRAs — for the hardest races, search Civitai for the race by name. Dedicated dragonborn LoRAs exist solely to fix the head, and they outperform any prompt-only workaround.

A good default pairing: a general-purpose SDXL checkpoint you like the painting style of, plus Fantasy Races XL, plus a per-race LoRA only if the race still won't land.

How do you use a race LoRA without it taking over the image?

A LoRA is a small patch applied on top of the checkpoint at generation time, and its strength is a dial you control. In AUTOMATIC1111 the syntax is <lora:FantasyRacesXL:0.8> in the prompt itself; in ComfyUI you set the weight on the LoRA loader node.

Three rules keep it under control:

  1. Start at 0.7–0.8, not 1.0. At full strength many LoRAs impose their training set's art style and even a recurring face on top of the race features. If every seed produces the same character, the weight is too high.
  2. Use the trigger word, and use it early. The LoRA only fires reliably when its trigger appears in the prompt — orc, water genasi, whatever the model page lists. Put it in the first few tags so it lands in the first CLIP chunk (see the token section below).
  3. Stack style and race deliberately. A race LoRA plus a style LoRA both pull on the output; if you run both, drop each weight (0.6 and 0.6 beats 1.0 and 1.0) and reroll until they balance.
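
As a rough illustration of the three rules, the sketch below assembles an AUTOMATIC1111-style prompt string. The helper name `build_lora_prompt`, the LoRA file names, and the 0.6 stacking cap are assumptions made for this example, not part of any tool's API; only the `<lora:name:weight>` tag format comes from the text above.

```python
def build_lora_prompt(trigger, tags, loras, stack_cap=0.6):
    """Assemble a prompt with the trigger word first and LoRA tags last.

    trigger   -- race trigger word as listed on the model page
    tags      -- remaining comma-tag descriptors, already in priority order
    loras     -- dict of {lora_file_name: requested_weight} (names are hypothetical)
    stack_cap -- assumed ceiling applied to each weight when 2+ LoRAs are stacked
    """
    stacked = len(loras) > 1
    lora_tags = []
    for name, weight in loras.items():
        if stacked:
            # Rule 3: when stacking, lower every weight instead of running all at 1.0.
            weight = min(weight, stack_cap)
        lora_tags.append(f"<lora:{name}:{weight:g}>")
    # Rule 2: the trigger word goes first so it lands early in the first chunk.
    return ", ".join([trigger, *tags, *lora_tags])


single = build_lora_prompt(
    "orc", ["female warrior", "bust portrait"], {"FantasyRacesXL": 0.8}
)
print(single)
# orc, female warrior, bust portrait, <lora:FantasyRacesXL:0.8>

stacked = build_lora_prompt(
    "orc", [], {"FantasyRacesXL": 1.0, "PainterlyStyle": 1.0}
)
print(stacked)
# orc, <lora:FantasyRacesXL:0.6>, <lora:PainterlyStyle:0.6>
```

Treat the cap as a starting point: the right balance between a race LoRA and a style LoRA still comes from rerolling and looking at the results.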

If you need one specific character to stay consistent across many images — not just the race — that's a different problem, solved by training a small LoRA on that character. Our consistent character guide covers when that's worth the effort.

How does the 75-token limit change how you write prompts?

Stable Diffusion reads your prompt through CLIP, whose context window is 77 tokens — two of which are reserved start/end markers, leaving 75 for you. Tokens aren't words: common words are one token, but rare and compound words split ("dragonborn" costs more than "knight"), and every comma counts.

AUTOMATIC1111 and ComfyUI accept longer prompts by splitting them into 75-token chunks and encoding each chunk independently, then concatenating the results. That independence is the trap: a descriptor in chunk two can't reliably modify a noun back in chunk one. If "crimson" lands in a different chunk than "eyes," the binding weakens — which is how you get red armor and brown eyes.
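
To make the chunk boundary concrete, here is a toy sketch of the split. The tokens are placeholder strings rather than real CLIP token IDs, and the function names are invented for this example; actual frontends run the CLIP tokenizer and handle padding themselves.

```python
CHUNK_SIZE = 75  # usable tokens per CLIP window (77 minus the start/end markers)


def split_into_chunks(tokens, chunk_size=CHUNK_SIZE):
    """Split a flat token list into consecutive chunks of at most chunk_size."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def chunk_index(position, chunk_size=CHUNK_SIZE):
    """Return which chunk the token at a 0-based position falls into."""
    return position // chunk_size


# 74 filler tokens, then "crimson" and "eyes" straddling the boundary.
tokens = [f"tok{i}" for i in range(74)] + ["crimson", "eyes"]
chunks = split_into_chunks(tokens)

print(len(chunks))                   # 2
print(chunks[0][-1], chunks[1][0])   # crimson eyes
```

"crimson" is the 75th token and closes the first chunk, while "eyes" opens the second, so the two words are encoded separately even though they sit side by side in the prompt.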

Practical consequences:

  • Front-load identity. Race anchor and character type first, then permanent features (skin, eyes, hair, scars), then gear, then framing, lighting, and style. Words earlier in a chunk also carry somewhat more weight.
  • Keep bound pairs adjacent. "Obsidian black skin" must travel as one phrase, never split across a chunk boundary.
  • Budget around 40–60 tokens. You rarely need the full 75. Cut filler quality tags ("masterpiece, best quality" does little on modern checkpoints) before cutting character detail.
  • Use `BREAK` deliberately. In AUTOMATIC1111, an uppercase BREAK pads the current chunk and starts a fresh one — useful for cleanly separating subject from background instead of letting the split fall mid-phrase.
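
The ordering and `BREAK` rules above can be sketched as a small prompt assembler. The category names and both function names are assumptions for this example, and the token count is only a lower-bound guess (one token per word plus one per comma), since real CLIP tokenization splits rare words further.

```python
# Category order taken from the "front-load identity" rule above.
SUBJECT_ORDER = ("race", "features", "gear")
SCENE_ORDER = ("framing", "lighting", "style")


def assemble_prompt(tags_by_category, use_break=False):
    """Join tags in priority order; optionally put BREAK between subject and scene."""
    subject = [t for key in SUBJECT_ORDER for t in tags_by_category.get(key, [])]
    scene = [t for key in SCENE_ORDER for t in tags_by_category.get(key, [])]
    if use_break and subject and scene:
        return ", ".join(subject) + " BREAK " + ", ".join(scene)
    return ", ".join(subject + scene)


def rough_token_floor(prompt):
    """Lower-bound guess: one token per word plus one per comma.

    Real CLIP counts run higher because rare words split into several tokens.
    """
    words = [w for w in prompt.replace(",", " ").split() if w != "BREAK"]
    return len(words) + prompt.count(",")


drow = {
    "style": ["painterly digital art"],
    "race": ["drow rogue"],
    "lighting": ["moonlight"],
    "features": ["obsidian black skin", "white hair"],
    "gear": ["leather armor"],
    "framing": ["bust portrait"],
}

plain = assemble_prompt(drow)
print(plain)
# drow rogue, obsidian black skin, white hair, leather armor, bust portrait, moonlight, painterly digital art
print(rough_token_floor(plain))  # 21

print(assemble_prompt(drow, use_break=True))
# drow rogue, obsidian black skin, white hair, leather armor BREAK bust portrait, moonlight, painterly digital art
```

Because each tag is kept as a whole string, bound pairs such as "obsidian black skin" never get separated by the reordering.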

What resolution should you render portraits at (SDXL vs SD 1.5)?

Generate at the model's native resolution, then upscale. Sampling far outside the training resolution is where twin heads and stacked torsos come from.

SDXL was trained at 1024x1024 and fine-tuned across a set of aspect-ratio buckets. For portraits, use:

  • 832x1216: roughly 2:3 portrait (13:19 exactly), the standard choice for character art and card-style portraits.
  • 896x1152: a slightly wider 7:9, good for a bust or head-and-shoulders crop where you want more shoulder room.

Stay near one megapixel total; SDXL degrades on far larger or smaller canvases.

SD 1.5 is native 512x512. The workable portrait size is 512x768 — push it to 1024 tall directly and the model tiles the composition, giving your paladin a second head where the empty canvas ran out of training precedent. To get large output from a 1.5 checkpoint, generate at 512x768 and use hires fix (built into AUTOMATIC1111) or a separate upscaler pass.

For final use, upscale rather than regenerate: a 2x upscale of a clean 832x1216 render covers VTT tokens, character sheets, and screen wallpapers. Tight bust framing also concentrates the model's resolution budget on the face — the same pixels spread over a full-body shot leave the face too small to render cleanly, which is the root of most mushy-face results.
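
The arithmetic behind these sizes is easy to check. The sketch below reduces each quoted resolution to its aspect ratio and megapixel count, then computes the 2x upscale target; the dictionary keys and function names are labels invented for this example.

```python
from math import gcd

# Portrait sizes quoted in this section; the keys are hypothetical labels.
PORTRAIT_SIZES = {
    "sdxl": (832, 1216),
    "sdxl_bust": (896, 1152),
    "sd15": (512, 768),
}


def describe(width, height):
    """Return the reduced aspect ratio and the megapixel count for a canvas."""
    d = gcd(width, height)
    return {
        "aspect": f"{width // d}:{height // d}",
        "megapixels": round(width * height / 1_000_000, 2),
    }


def upscaled(width, height, factor=2):
    """Return the pixel dimensions after an integer upscale."""
    return width * factor, height * factor


for label, (w, h) in PORTRAIT_SIZES.items():
    print(label, describe(w, h))
# sdxl {'aspect': '13:19', 'megapixels': 1.01}
# sdxl_bust {'aspect': '7:9', 'megapixels': 1.03}
# sd15 {'aspect': '2:3', 'megapixels': 0.39}

print(upscaled(*PORTRAIT_SIZES["sdxl"]))  # (1664, 2432)
```

Both SDXL sizes land within a few percent of one megapixel, and 832x1216 reduces to 13:19, which is close to 2:3 without being exactly that.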

Which negative prompts still matter in SDXL?

The giant copy-paste negative prompt is an SD 1.5 habit. SDXL handles anatomy and image quality far better on its own, and model creators on Civitai note that some SDXL checkpoints actually perform worse with long 1.5-era negative lists — every negative token steers the image, and fifty of them steer it somewhere muddy.

For SDXL, start minimal and add only what you can see going wrong:

Negative prompt: blurry, lowres, watermark, text, bad anatomy

Add "extra fingers, deformed hands" only when hands are actually in frame. In a bust portrait they aren't, which makes tight framing the cheapest hand fix there is. Add style exclusions ("photo, 3d render") when a painterly checkpoint drifts toward realism you don't want.

For SD 1.5 checkpoints, the anatomy block still earns its place:

Negative prompt: deformed, bad anatomy, extra limbs, extra fingers, fused fingers, blurry, lowres, watermark, text, signature

One D&D-specific negative worth knowing: on race-tuned models, adding "human face" to the negative prompt helps hold a dragonborn snout or orcish jaw when the checkpoint keeps sliding back toward its human default. Use it only for the strongly non-human races; on a half-elf it will fight you.
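
The choices in this section can be collected into one small helper. This is a sketch under assumptions: the function name, its parameters, and the "sdxl"/"sd15" labels are invented for the example, while the tag lists are the ones quoted above.

```python
SDXL_BASE = ["blurry", "lowres", "watermark", "text", "bad anatomy"]
SD15_BASE = [
    "deformed", "bad anatomy", "extra limbs", "extra fingers", "fused fingers",
    "blurry", "lowres", "watermark", "text", "signature",
]


def build_negative(model_family, hands_in_frame=False,
                   non_human_race=False, exclude_styles=()):
    """Build a negative prompt string from the rules in this section."""
    if model_family == "sdxl":
        tags = list(SDXL_BASE)
        if hands_in_frame:
            # Hand tags only earn their place when hands are visible.
            tags.extend(["extra fingers", "deformed hands"])
    elif model_family == "sd15":
        tags = list(SD15_BASE)
    else:
        raise ValueError(f"unknown model family: {model_family!r}")
    if non_human_race:
        # Only for strongly non-human races on race-tuned models.
        tags.append("human face")
    tags.extend(exclude_styles)
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return ", ".join(dict.fromkeys(tags))


print(build_negative("sdxl"))
# blurry, lowres, watermark, text, bad anatomy

print(build_negative("sdxl", hands_in_frame=True, non_human_race=True))
# blurry, lowres, watermark, text, bad anatomy, extra fingers, deformed hands, human face
```

Starting from the short list and adding tags through explicit flags keeps every negative token tied to a problem you have actually seen in the output.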

How do you convert a prose prompt into comma-tag format?

SDXL follows natural language reasonably well, so a descriptive prose prompt — the kind our prompt generator composes — often works pasted in as-is. SD 1.5 checkpoints, though, were trained on tag-style captions and respond better to comma-separated noun phrases. Converting is mechanical once you know the rules:

  1. Keep noun phrases, drop grammar. Verbs, articles, and connectives ("wearing," "with a," "standing in") spend tokens without adding visual information.
  2. One visual fact per tag. Split compound sentences into atomic descriptors.
  3. Never separate a modifier from its noun. "Deep green skin" stays intact as one tag.
  4. Preserve the order: race and type first, permanent features, gear, framing, lighting, style.
  5. Weight the load-bearing anchors. A1111's (tag:1.2) syntax boosts a tag by 20% — spend it on the race anchor, not on "masterpiece."

Here's the same half-orc before and after. Prose, for SDXL or any prose-friendly tool:

A weathered half-orc mercenary with deep green skin, small upward-curving tusks, and a scarred brow, her black hair in a tight braid. She wears scratched steel pauldrons over boiled leather. Bust portrait, candlelit tavern interior, warm amber light with deep shadows, painterly digital art.

Converted for an SD 1.5 checkpoint:

(half-orc:1.2) female mercenary, deep green skin, small tusks, scarred brow, black braided hair, scratched steel pauldrons, boiled leather armor, bust portrait, candlelit tavern, warm amber light, deep shadows, painterly digital art

The tag version spends about 55 tokens — inside one CLIP chunk, identity first. The candlelit lighting and digital painting tags at the end set mood and style without competing with the race anchor at the front.
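
Part of that conversion can be roughed out in code. The sketch below assumes you have already split the prose into one fragment per visual fact by hand; it then strips leading filler words and weights the race anchor. The filler list and function names are assumptions for this example, and unlike the hand-converted prompt above it puts a comma after the weighted anchor.

```python
# Leading words that spend tokens without adding visual information.
# This list is an assumption for the sketch, not an exhaustive rule set.
FILLER = {
    "a", "an", "the", "and", "with", "in", "over",
    "she", "he", "her", "his", "wears", "wearing", "standing",
}


def to_tag(fragment):
    """Strip surrounding punctuation and any leading filler words."""
    words = fragment.strip(" .,").split()
    while words and words[0].lower() in FILLER:
        words.pop(0)
    return " ".join(words)


def prose_to_tags(race_anchor, fragments, anchor_weight=1.2):
    """Weighted race anchor first, then one cleaned tag per fragment."""
    tags = [tag for tag in (to_tag(f) for f in fragments) if tag]
    return ", ".join([f"({race_anchor}:{anchor_weight:g})", *tags])


result = prose_to_tags(
    "half-orc",
    [
        "a weathered mercenary",
        "with deep green skin",
        "and a scarred brow",
        "She wears scratched steel pauldrons.",
    ],
)
print(result)
# (half-orc:1.2), weathered mercenary, deep green skin, scarred brow, scratched steel pauldrons
```

Only leading filler is removed, so a fragment like "her black hair in a tight braid" still needs a manual rewrite to "black braided hair".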

Frequently asked questions

Is Stable Diffusion free for making D&D character portraits?
Yes. The model weights are open and free, and running them locally through AUTOMATIC1111 or ComfyUI costs nothing beyond electricity. Community checkpoints and LoRAs on Civitai are free downloads too. What you need is hardware: SD 1.5 runs on modest GPUs with around 4 GB of VRAM, while SDXL is comfortable from about 8 GB up. Hosted services that run Stable Diffusion for you typically charge per image or by subscription.
What's the difference between a checkpoint and a LoRA?
A checkpoint is a complete model — several gigabytes containing everything the system knows about images. A LoRA is a small add-on file, usually under a few hundred megabytes, that patches extra knowledge onto a checkpoint at generation time, like a specific fantasy race or art style. You always need a checkpoint; LoRAs are optional extras layered on top, and each LoRA only works with checkpoints built on the same base model.
Should I use SD 1.5 or SDXL for fantasy portraits?
SDXL, if your hardware runs it. It produces cleaner faces and anatomy at 1024-class resolution, follows natural-language prompts better, and needs far shorter negative prompts. SD 1.5 remains relevant for two reasons: it runs on weaker GPUs, and some of the best D&D-specific fine-tunes, like Dungeons and Diffusion, were built on it. Many players run SDXL as the default and keep a 1.5 race model for the stubborn non-human races.
Can I reuse my Midjourney prompts in Stable Diffusion?
Partially. Strip all Midjourney parameters first — --ar, --stylize, --oref and the rest mean nothing to Stable Diffusion and just waste tokens. The descriptive text itself works reasonably well in SDXL, which handles prose. For SD 1.5 checkpoints, convert the prose into comma-separated tags and move the race and character type to the front. Expect a different aesthetic either way: Midjourney applies heavy house styling that no SD checkpoint replicates exactly.
Can I train a LoRA on my own character?
Yes, and it's the strongest consistency tool Stable Diffusion has. You collect a set of images of the character — typically 15 to 30, generated or drawn, with varied poses and angles but identical identity features — and train a small LoRA on them. Afterward, a trigger word reproduces that exact character in new scenes, outfits, and lighting. It takes more setup than prompt-only tricks, but nothing else holds a face as reliably.
Where do I download Stable Diffusion models safely?
Civitai is the main community hub for checkpoints and LoRAs, and Hugging Face hosts the official base models. Prefer files in safetensors format, which cannot execute code when loaded, over the older ckpt format, which can. On Civitai, check the listed base model before downloading so it matches your setup, and skim recent user images and comments to confirm the model actually produces what its gallery promises.
Do I need a gaming PC to run Stable Diffusion?
No, but you need access to a GPU somewhere. Locally, SD 1.5 runs on cards with around 4 GB of VRAM and SDXL wants roughly 8 GB or more; Apple Silicon Macs work too, just slower. Without capable hardware, hosted options run the same open models in a browser — Civitai has on-site generation, and various cloud GPU services rent by the hour. You keep the model choice and prompt control; only the computer is elsewhere.