How to Generate a D&D Party Portrait Without AI Mixing Up Your Characters
Quick answer: AI mixes up party members because image models read the whole prompt as one bag of traits, so descriptors attach to the wrong character. Fix it by capping a single prompt at two or three characters with strongly contrasting anchors, using a tool with real multi-character controls, or generating each member separately and compositing.
Ask an image generator for "a tiefling warlock and a dwarf fighter" and there's a good chance you get two tieflings, a dwarf with horns, or one character wearing half of each outfit. This failure has a name — attribute bleeding — and it's baked into how text-to-image models read prompts, which is why rerolling the same prompt twenty times rarely solves it.
The fixes exist, but they're scattered across tool docs, Civitai workflow posts, and Discord threads. This guide consolidates them: how to write a two-character prompt that holds, which positional cues actually work, which generators have genuine multi-character features, and the generate-separately-then-composite workflow that reliably handles a full party of four or five.
Why does AI blend your party members together?
Text-to-image models don't parse your prompt like a person reading a sentence. The text encoder compresses the entire prompt into one fixed-length sequence of token embeddings, and during generation every descriptor can influence every region of the image. "Red skin" doesn't stay attached to "tiefling": it floats in the same soup as "dwarf" and can land on either character. Researchers call this concept bleeding or attribute bleeding, and a 2024 paper on the problem ("Isolated Diffusion") traces it directly to the pre-trained text encoders that squeeze all prompt information into a shared representation.
The practical consequences for party portraits:
- Trait swapping. Character A gets Character B's hair color, horns migrate to the wrong head, armor styles merge.
- Face averaging. With weak individual anchors, the model produces two variations of the same face — your party becomes siblings.
- Count drift. Ask for four characters and get three or five, because the model treats the number as a loose suggestion.
Bleeding gets worse as prompts get longer. Two characters with three or four traits each is roughly the ceiling for a single unstructured prompt; a five-member party described in one paragraph is asking the model to keep twenty-plus attributes correctly paired, and it won't. That's why the real solutions are structural — fewer characters per prompt, tools that isolate each character's text, or separate generations — rather than better adjectives.
How many characters can one prompt reliably handle?
Two. With strong contrast between them, most current models — Midjourney, DALL-E/GPT image generation, SDXL, Flux — will keep two characters distinct most of the time. Three works occasionally if each character is visually loud in a different way. Four or more in one plain prompt is a coin flip on count alone, before you even get to trait accuracy.
The key to a stable duo prompt is contrast on every axis: different race, build, palette, and pose, so no descriptor could plausibly belong to both. A dwarf blacksmith and a slender elf mage hold up far better than two human fighters in similar armor.
Two adventurers side by side in a half-body portrait. On the left, a stocky dwarf woman with copper-red braided hair, soot-streaked freckled skin, and a scarred leather apron over a rust-colored wool tunic, arms crossed. On the right, a slender high elf man with silver hair tied back, pale gold eyes, and deep-blue brocade robes embroidered with moon sigils, holding a closed spellbook. Warm firelight from a hearth to the left, painterly digital painting, muted earthy palette, dark tavern background.
Notice the structure: each character gets one uninterrupted block of description, the blocks never interleave, and shared elements (lighting, style, background) come last. Interleaving (mentioning the dwarf, then the elf, then back to the dwarf's apron) tends to increase bleeding, because the farther a trait sits from its subject in the prompt, the more loosely the model binds the two. Our prompt generator builds single-character blocks in exactly this locked order, which is what makes them safe to place side by side in a duo prompt.
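The locked order can be sketched as a small Python helper. This is a minimal illustration, not the code behind any particular prompt generator: the names CharacterBlock and build_duo_prompt are hypothetical, and the character text is simply the example prompt from this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharacterBlock:
    position: str     # e.g. "On the left"
    description: str  # one uninterrupted description of one character


def build_duo_prompt(opening: str, first: CharacterBlock,
                     second: CharacterBlock, shared: str) -> str:
    """Order: count and framing, each character block whole, shared elements last."""
    parts = [opening]
    for block in (first, second):
        # The position leads the block so it sits next to the character noun.
        parts.append(f"{block.position}, {block.description}")
    parts.append(shared)
    # Give every part exactly one closing period, then join into one prompt.
    return " ".join(p.strip().rstrip(".") + "." for p in parts)


dwarf = CharacterBlock(
    "On the left",
    "a stocky dwarf woman with copper-red braided hair, soot-streaked freckled "
    "skin, and a scarred leather apron over a rust-colored wool tunic, arms crossed",
)
elf = CharacterBlock(
    "On the right",
    "a slender high elf man with silver hair tied back, pale gold eyes, and "
    "deep-blue brocade robes embroidered with moon sigils, holding a closed spellbook",
)
prompt = build_duo_prompt(
    "Two adventurers side by side in a half-body portrait",
    dwarf,
    elf,
    "Warm firelight from a hearth to the left, painterly digital painting, "
    "muted earthy palette, dark tavern background",
)
print(prompt)
```

Because each character lives in its own object, there is no way to accidentally interleave traits, and swapping in a different duo only means replacing the two blocks.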
Do positional cues like "on the left" actually work?
Sometimes — and it depends heavily on the tool. Positional language is a real signal, not a placebo, but no mainstream generator treats it as a hard constraint.
- DALL-E / ChatGPT image generation follows spatial instructions best. Side-by-side prompt comparisons consistently show it sticking closer to literal wording, including placement, than Midjourney. "On the left... on the right..." is worth writing here.
- Midjourney treats position as a weak suggestion. It often honors left/right for two characters, but it prioritizes composition aesthetics over your layout. Don't build a workflow that depends on MJ placement.
- Stable Diffusion (plain prompt) is the weakest — positional words mostly just correlate loosely with layout. If placement matters in SD, use regional prompting instead of prose (next section).
Three tips that raise the hit rate everywhere:
- Lead with the count and arrangement: "Two adventurers standing side by side" before any individual description primes the model to allocate two figures.
- Bind position to the anchor, not the garnish: "On the left, a dwarf woman..." works better than burying "standing on the left" mid-description.
- Use the environment as a separator: "...the dwarf beside the forge, the elf by the window" gives the model scene logic, which models follow more readily than abstract left/right.
Even when positions land correctly, positional cues do nothing to stop trait bleeding — they solve layout, not identity. Treat them as one layer of the fix, not the fix.
Which tools have real multi-character features?
A few generators have moved beyond "hope the prompt works" with structural multi-character support:
- NovelAI (V4/V4.5) has the most direct implementation: separate character prompt boxes — up to six per image — each processed as its own conditioning stream, plus a position selector per character. Traits in one character's box can't easily leak into another's. It's anime-focused, so it suits parties in an anime style far better than gritty oil-painting realism.
- Stable Diffusion + Regional Prompter (AUTOMATIC1111/Forge extension) divides the canvas into regions and applies a separate prompt to each, split by the BREAK keyword: the left column gets your barbarian's tags, the right column gets your wizard's. ComfyUI users get the same result with regional conditioning, and Forge users with the Forge Couple extension. This is the most controllable option, at the cost of running SD locally.
- Flux Kontext accepts multiple reference images (or a stitched pair) and can place two already-generated characters into one new scene. It is genuinely useful as a compositing step: generate each member solo, then hand Kontext both portraits and ask for the group shot.
- Midjourney has no per-character prompt isolation. Omni Reference (--oref, a v7 parameter) takes exactly one reference image per prompt; the community workaround is stitching two character references side by side into a single file, which works but sometimes merges them. What MJ does offer is repair: the Editor's Vary Region tool lets you select one botched party member (selections of roughly 20-50% of the image work best) and re-prompt just that region with Remix enabled.
- ChatGPT accepts multiple uploaded reference images in one conversation ("put these two characters at a campfire") and follows the request reasonably well, with the usual caveat that faces drift from the references.
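For the Regional Prompter route, the prompt text is just one block of tags per region joined by BREAK. The sketch below only assembles that text. The function name regional_prompt and the barbarian and wizard tags are hypothetical, repeating the style tags in every region is a simplifying assumption, and the region layout itself (for example two columns) still has to be set in the extension's own interface.

```python
def regional_prompt(region_prompts: list[str], shared_style: str = "") -> str:
    """Join one prompt per region with BREAK; style tags are repeated per region."""
    regions = []
    for tags in region_prompts:
        tags = tags.strip().rstrip(",")
        if shared_style:
            # Assumption: repeating the style in each region keeps the look uniform.
            tags = f"{tags}, {shared_style}"
        regions.append(tags)
    # BREAK sits on its own line, surrounded by whitespace.
    return "\nBREAK\n".join(regions)


two_column_prompt = regional_prompt(
    [
        "half-orc barbarian woman, olive-green skin, fur-trimmed leather harness",
        "high elf wizard man, silver hair, deep-blue brocade robes",
    ],
    shared_style="painterly digital painting, muted palette",
)
print(two_column_prompt)
```

Keeping each region's tags in a separate list entry mirrors the isolation the extension provides: a trait can only appear in the region whose entry contains it.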
When should you generate members separately and composite?
The moment your party hits four, or the moment any two members share a race or palette. Separate generation is the only approach where each character's traits are guaranteed to survive, because each prompt contains exactly one character.
The workflow:
- Write one prompt per member with identical shared settings. Same framing, same lighting direction, same style block, same background treatment. Only the character block changes. A half-body framing is the sweet spot — enough costume to read class and status, no feet or complex leg poses to align later.
- Give every prompt the same light source and direction. If one portrait is lit from the left and another from the right, no amount of editing hides the seam. Something directional but soft, like window light from the upper left, composites cleanly.
- Keep backgrounds plain and dark. A flat, neutral backdrop makes cutting out each figure trivial.
- Composite in any editor. Photopea (free, in-browser) or GIMP handles it: remove backgrounds, arrange figures with consistent scale — remember your halfling stands chest-high to the humans — and drop in one shared background.
- Optional unifying pass. Run the composite through img2img at low denoising (SD/Flux), or hand it to Flux Kontext or ChatGPT with "repaint this as one cohesive scene," to blend edges and unify grain.
Half-body portrait of a half-orc barbarian woman, olive-green skin, small lower tusks, black hair shaved on one side, a jagged scar through her right eyebrow, wearing a fur-trimmed leather harness and iron arm rings, confident scowl. Body angled slightly left, plain dark neutral background, soft window light from the upper left, painterly digital painting, muted desaturated palette.
Generate her three companions with the same closing sentence, swap only the character block, and the four portraits will sit together like panels from one artist.
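The scale-and-arrange step can be planned with a little arithmetic before opening an editor. The sketch below is one possible way to compute where each cutout goes, assuming the figures stand in a line with their bottom edges aligned. The names Cutout, Placement, and layout_party are hypothetical, and the pixel sizes and relative heights (such as 0.6 for a halfling) are illustrative guesses, not measured values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cutout:
    name: str
    width: int              # pixel size of the background-removed portrait
    height: int
    relative_height: float  # 1.0 = tallest member of the party


@dataclass(frozen=True)
class Placement:
    name: str
    x: int       # left edge on the canvas
    y: int       # top edge on the canvas
    width: int
    height: int


def layout_party(cutouts: list[Cutout], canvas_width: int,
                 canvas_height: int, fill: float = 0.9) -> list[Placement]:
    """Scale each cutout to its relative height and centre it in an equal slot."""
    slot = canvas_width / len(cutouts)
    placements = []
    for i, c in enumerate(cutouts):
        target_h = round(canvas_height * fill * c.relative_height)
        target_w = round(c.width * target_h / c.height)  # keep aspect ratio
        x = round(slot * i + (slot - target_w) / 2)      # centred in its slot
        y = canvas_height - target_h                     # bottom edges aligned
        placements.append(Placement(c.name, x, y, target_w, target_h))
    return placements


party = [
    Cutout("barbarian", 450, 1000, 1.0),
    Cutout("wizard", 450, 1000, 0.95),
    Cutout("cleric", 450, 1000, 0.9),
    Cutout("halfling rogue", 450, 1000, 0.6),
]
placements = layout_party(party, canvas_width=1920, canvas_height=1080)
for p in placements:
    print(p)
```

The resulting positions and sizes can be typed into Photopea or GIMP by hand, or fed to an image library of your choice for the actual pasting.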
How do you keep one art style across separately generated characters?
Style drift is the composite workflow's failure mode: four technically correct portraits that look like four different artists. Style lives in the trailing third of your prompt, so lock that block word-for-word across every member.
A reusable style block names four things explicitly:
- Medium: "painterly digital painting with visible brushwork" — never just "fantasy art," which each generation reinterprets.
- Palette: a named scheme like a muted, desaturated palette, so one member doesn't come back neon while the others stay somber.
- Lighting: identical source and direction in every prompt, e.g. "warm firelight from the lower left."
- Framing and background: same crop, same backdrop description.
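One way to guarantee the block really is word-for-word identical is to store it once and render every member prompt from it. The sketch below assumes a simple layout with framing first, then the character, then the shared closing sentence. StyleBlock and member_prompt are hypothetical names, and the wording comes from the examples in this guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StyleBlock:
    framing: str
    background: str
    lighting: str
    medium: str
    palette: str

    def render(self) -> str:
        """The shared closing sentence, identical for every party member."""
        text = f"{self.background}, {self.lighting}, {self.medium}, {self.palette}"
        return text[:1].upper() + text[1:] + "."


def member_prompt(character: str, style: StyleBlock) -> str:
    # Only the character block varies; framing and closing come from the style.
    return f"{style.framing} of {character.rstrip('.')}. {style.render()}"


STYLE = StyleBlock(
    framing="Half-body portrait",
    background="plain dark neutral background",
    lighting="soft window light from the upper left",
    medium="painterly digital painting with visible brushwork",
    palette="muted desaturated palette",
)

characters = {
    "barbarian": "a half-orc barbarian woman, olive-green skin, small lower tusks, "
                 "black hair shaved on one side, wearing a fur-trimmed leather harness",
    "wizard": "a slender high elf man with silver hair tied back, pale gold eyes, "
              "and deep-blue brocade robes embroidered with moon sigils",
}
prompts = {name: member_prompt(text, STYLE) for name, text in characters.items()}
for name, text in prompts.items():
    print(f"{name}: {text}\n")
```

Since the dataclass is frozen, the style cannot be edited halfway through a batch, which removes the most common source of accidental drift: a reworded closing sentence.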
Tool-specific reinforcement helps. In Midjourney, generate the first member you're happy with, reuse its style reference (--sref with the same code or image) plus the same --stylize value for the rest, and keep the parameter string identical. In Stable Diffusion, one checkpoint plus one style LoRA at a fixed weight across all runs does the same job. In ChatGPT, generate all members in a single conversation and say "same art style as the previous image" each time — style holds within a session far better than across sessions.
Batch the whole party in one sitting. Models get updated, and default aesthetics shift between versions; a portrait generated months later on a new model version rarely matches, even with an identical prompt. If you later add a player mid-campaign, regenerate the newcomer with your saved prompts and settings — this is exactly the situation where keeping the full prompt text for every member, not just the images, pays off. The same discipline that keeps a single character consistent across images keeps a party consistent across members.
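Keeping the prompts is easiest if they are written to a file at generation time. The sketch below stores the shared style block, the settings, and every member's full prompt as JSON, then rebuilds a matching prompt for a newcomer. The function names, the file name, the settings keys, and the newcomer description are all hypothetical placeholders; record whatever your generator actually exposes.

```python
import json
import tempfile
from pathlib import Path


def save_party(path: Path, style_block: str, settings: dict, members: dict) -> None:
    """Write the shared style, generator settings, and full member prompts to disk."""
    record = {"style_block": style_block, "settings": settings, "members": members}
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")


def newcomer_prompt(path: Path, character_block: str) -> str:
    """Build a prompt for a new member that reuses the saved style block verbatim."""
    record = json.loads(path.read_text(encoding="utf-8"))
    return f"{character_block.rstrip('.')}. {record['style_block']}"


STYLE_TEXT = ("Plain dark neutral background, soft window light from the upper left, "
              "painterly digital painting, muted desaturated palette.")

# A temporary directory is used here only so the example runs anywhere.
party_file = Path(tempfile.gettempdir()) / "party_prompts.json"
save_party(
    party_file,
    style_block=STYLE_TEXT,
    settings={"tool": "placeholder-generator", "model_version": "placeholder"},
    members={
        "barbarian": "Half-body portrait of a half-orc barbarian woman, "
                     "olive-green skin, small lower tusks. " + STYLE_TEXT,
    },
)

new_prompt = newcomer_prompt(
    party_file,
    "Half-body portrait of a halfling rogue woman in a green hooded cloak",
)
print(new_prompt)
```

In practice the file would live next to the campaign's images rather than in a temp folder, so the text and the portraits stay together.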
Frequently asked questions
- Can Midjourney use two character reference images in one prompt?
- No. Omni Reference in Midjourney v7 accepts exactly one image per prompt. The common workaround is stitching both character references side by side into a single image file and describing each character distinctly in the prompt, but results are mixed — Midjourney sometimes merges the two references into one hybrid character.
- What aspect ratio should I use for a D&D party portrait?
- Go wider as the party grows. A duo fits 3:2 or 4:3; four or five characters standing in a line need 16:9 or wider so each figure gets enough horizontal pixels for a readable face. Square formats crowd group shots and push the model to shrink faces, which is where detail errors concentrate.
- Why does everyone in my group portrait have the same face?
- With several characters in one prompt, the model spreads its attention thin and falls back on its default face for each figure, producing a party of near-identical siblings. Give each character one distinctive facial anchor — a broken nose, heavy jaw, facial scar, or distinct age — and keep races and builds contrasting so the model can't average them.
- Can ChatGPT combine two existing character images into one scene?
- Yes. Upload both portraits in the same conversation and ask it to place the two characters together in a described scene. It handles the combination reasonably well, though faces usually drift somewhat from the references. Expect to re-upload the reference images in every new conversation, since it doesn't remember characters across sessions.
- How do I add a new party member to an existing group portrait?
- Don't regenerate the whole group — you'll lose the members that already look right. Generate the newcomer solo with the same framing, lighting direction, and style wording as the original, then composite them in. Alternatively, in Midjourney, use Vary Region to select an empty area of the existing image and prompt the new character into it.
- Is there a free way to make an AI party portrait?
- Yes. Generate each member on ChatGPT's free tier or a free Stable Diffusion service, then composite the figures in Photopea, a free in-browser editor. Writing the prompts costs nothing either: Arcane Portraits composes the per-character prompt text for free, and you paste it into whichever generator you use.
- Do negative prompts help stop character blending?
- Only at the margins. Negative prompts like "extra person, merged faces, duplicate character" can reduce count errors in Stable Diffusion, but they can't tell the model which traits belong to which character, so they don't fix attribute bleeding. Structural fixes — regional prompting, per-character prompt boxes, or separate generations — are what actually solve trait mixing.
- Should party members face the camera or each other in a group portrait?
- Angle each member slightly toward the group's center rather than flat at the camera. It reads as a team instead of a police lineup, and the varied angles give each character a different silhouette, which helps the model keep them distinct. If you composite separately generated portraits, generate members in mirrored angles so they face inward when arranged.
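The aspect-ratio advice in the answers above comes down to simple arithmetic: a wider frame leaves more horizontal pixels for each figure. The sketch below shows the calculation, assuming a fixed image height of 1024 pixels and figures standing in a single line across the full width. Both assumptions, and the function name figure_width, are illustrative only.

```python
from fractions import Fraction


def figure_width(canvas_height: int, ratio: Fraction, party_size: int) -> int:
    """Horizontal pixels available per figure when the party stands in one line."""
    canvas_width = int(canvas_height * ratio)  # truncate to whole pixels
    return canvas_width // party_size


for label, ratio in [("1:1", Fraction(1, 1)),
                     ("3:2", Fraction(3, 2)),
                     ("16:9", Fraction(16, 9))]:
    print(f"{label}: {figure_width(1024, ratio, 5)} px per figure for a party of five")
```

At the same height, a party of five gets roughly 200 pixels each in a square frame and over 360 pixels each at 16:9, which is the difference between a smudged face and a readable one.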