How to Make Consistent AI Characters Across Multiple Images (Midjourney, Leonardo, Stable Diffusion, Flux)

Quick answer: Lock a written trait spec (the same 5-7 identity phrases repeated verbatim in every prompt), then add your tool's reference feature on top: Omni Reference (--oref) with --ow 300-400 in Midjourney v7, Character Reference at mid strength in Leonardo, a LoRA trained on 15-30 images for Stable Diffusion, and Kontext edits from one master image for Flux.

You finally roll a portrait that looks like your character — then the next generation hands you a different jaw, new hair, and eyes that changed color. Image models have no memory: every generation starts from fresh noise, and anything your prompt doesn't pin down gets re-rolled. Consistency is a two-layer problem, and most guides only teach one layer.

The first layer is prompt discipline: a fixed identity block you repeat word-for-word, never paraphrased. The second is tool features: Midjourney v7's Omni Reference, Leonardo's Character Reference, Stable Diffusion LoRAs, and Flux Kontext each anchor identity differently, with different strengths and failure modes. This guide covers both layers, side by side, so you can keep one character recognizable across a whole campaign, comic, or book.

Why does your character change in every generation?

Diffusion models don't remember your last image. Each generation samples from random noise, guided only by the text you gave it this time. Your prompt isn't a specification; it conditions a probability distribution over possible images. "An elf ranger with red hair" is compatible with thousands of different faces, so the model picks a different one each run.

Drift comes from three specific sources:

  • Underspecified traits. Anything you didn't state — jaw shape, skin tone, age, eye color — is a free variable the model re-rolls every time.
  • Paraphrased prompts. If you wrote "copper hair" yesterday and "auburn hair" today, the model treats those as different characters. Synonyms land in different places in the model's learned space; word-for-word repetition is what keeps you in the same region.
  • Competing signals. Style parameters, new scene elements, and other characters in the frame all pull attention away from your identity description. This is why a character who's stable in bust portraits falls apart in a busy full-body action scene.

Seeds don't rescue you here. A fixed seed with an identical prompt reproduces a near-identical image — useful for testing one wording change at a time — but the moment you change the pose or background, the same seed produces a different face. Seeds lock the noise, not the identity.

Which identity anchors should you lock, and which can vary?

Pick 5-7 permanent visual anchors, write them once, and repeat them verbatim in every prompt. Lock these:

  • Face structure: one concrete phrase — "sharp angular jaw," "round face with heavy brows"
  • Skin tone: a specific color, not "dark" or "fair" — "sun-weathered light brown skin"
  • Eye color and hair color + style together: "amber eyes, copper hair in a loose side braid"
  • Age band: "mid-40s" — age is one of the first things models silently reset to mid-20s
  • One asymmetric distinguishing mark: a scar, tattoo, notched ear, or heterochromia. This is your highest-value anchor because you can verify it at a glance, and it survives style changes that soften everything else.

Everything else is a variable you're allowed to change per image: pose, expression, background, lighting, and framing. Outfits sit in between — if the outfit is the identity (a warlock's patron robes, a city guard's uniform), lock it; otherwise describe each new outfit fully while leaving the face block untouched.

Here's a locked identity block in a full reference-portrait prompt, using a wood elf ranger:

Bust portrait of a female wood elf ranger, mid-40s, sun-weathered light brown skin, sharp angular jaw, amber eyes, copper hair in a loose side braid, thin crescent scar through her left eyebrow, wearing a moss-green hooded cloak over hardened leather armor, calm watchful expression, soft window light, muted natural palette, digital painting, plain neutral background

Everything before "wearing" is the identity block. It never changes. Everything after it can.
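One way to make that split mechanical is to keep the identity block as a constant and only ever append a scene clause to it. The following is a minimal Python sketch, not a feature of any of the tools discussed here; the names IDENTITY_BLOCK and build_prompt are illustrative assumptions.

```python
# Illustrative only: IDENTITY_BLOCK and build_prompt are hypothetical names.
# The identity text is copied from the example prompt above and never edited.
IDENTITY_BLOCK = (
    "a female wood elf ranger, mid-40s, sun-weathered light brown skin, "
    "sharp angular jaw, amber eyes, copper hair in a loose side braid, "
    "thin crescent scar through her left eyebrow"
)


def build_prompt(framing: str, scene_clause: str) -> str:
    """Return framing + locked identity block + per-image scene clause."""
    return f"{framing} of {IDENTITY_BLOCK}, {scene_clause}"


portrait = build_prompt(
    "Bust portrait",
    "wearing a moss-green hooded cloak over hardened leather armor, "
    "calm watchful expression, soft window light, muted natural palette, "
    "digital painting, plain neutral background",
)
print(portrait)
```

Because the identity text lives in one place, a new image only requires a new framing and scene clause, and there is no opportunity to retype the face from memory.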

How does Omni Reference (--oref) work in Midjourney v7?

Omni Reference is Midjourney v7's mechanism for putting a specific character (or object) into new images. If you learned --cref in v6, unlearn it: --cref doesn't work in v7. It's compatible with v6 and Niji 6 only, and Omni Reference replaces it. Many tutorials that still rank highly in search teach the dead parameter.

Usage: on the web editor, drag your reference image into the omni-reference slot in the prompt bar and set the strength slider. On Discord, append --oref <image URL> to your prompt. You get one reference image per prompt.

Strength is controlled by --ow (omni-weight), which runs from 0 to 1000 with a default of 100:

  • --ow 25-100: loose influence; use when transforming the character into a different art style and re-describe the traits you want kept
  • --ow 300-400: the working range for character consistency — face and identity hold while pose and scene stay flexible
  • --ow above 400: stronger lock on face, clothing, and details, but poses stiffen, the output starts mirroring the reference composition, and the docs warn results get unpredictable unless you raise --stylize alongside

Two things the docs stress: --ow competes with --stylize and --exp for influence, so if you run high stylize values, raise --ow to compensate. And Omni Reference works with your text, not instead of it — keep your full identity block in the prompt ("copper hair, crescent scar through left eyebrow") so the text and the image pull in the same direction.
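If you assemble Midjourney prompts in a script before pasting them into Discord, the parameters can be appended with a small helper. This is a sketch under assumptions: with_omni_reference is a hypothetical function, the image URL is a placeholder, and the range check simply mirrors the 0 to 1000 range stated above.

```python
# Sketch only: with_omni_reference is a hypothetical helper and the URL is a
# placeholder. The --oref and --ow flags and the 0-1000 range come from the text.
def with_omni_reference(prompt: str, image_url: str, weight: int = 300) -> str:
    """Append Omni Reference parameters to a Midjourney v7 prompt string."""
    if not 0 <= weight <= 1000:
        raise ValueError(f"--ow must be between 0 and 1000, got {weight}")
    return f"{prompt} --oref {image_url} --ow {weight}"


prompt = with_omni_reference(
    "Bust portrait of a female wood elf ranger, copper hair in a loose side "
    "braid, thin crescent scar through her left eyebrow",
    "https://example.com/ranger-master.png",
    weight=350,
)
print(prompt)
```

The default of 300 sits at the bottom of the working range described above; the helper only builds text and does not call Midjourney.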

How do you keep a character consistent in Leonardo AI?

Leonardo's tool for this is Character Reference, found in the Image Guidance panel to the left of the prompt bar. Upload a clean, well-lit face shot of your character — a head-and-shoulders crop of your best previous generation works well — select Character Reference as the guidance type, and set the strength.

Strength is the whole game here:

  • High: keeps the character nearly identical; best for small changes — new expression, slightly different angle, background swap
  • Mid: the sweet spot for most work; core facial features hold while you change pose, outfit, or setting
  • Low to Mid: use when the scene changes a lot, or when other characters share the frame — high strength with a second character present tends to stamp your reference's face onto everyone

Character Reference runs on Leonardo's SDXL-family models (Leonardo Kino XL and Diffusion XL are commonly recommended pairings), so check your model selection if the option looks unavailable.

The same prompt-side rule applies as everywhere else: the reference image handles the face, but your text still decides hair, marks, and build in the new image. Keep the identity block verbatim in every Leonardo prompt and let Character Reference reinforce it rather than replace it. If the face drifts anyway, raise strength one step before rewriting the prompt — changing both at once makes it impossible to tell which fix worked.

When do you need a LoRA in Stable Diffusion?

A LoRA is a small add-on model trained on images of one subject — the heaviest but strongest consistency tool. Reference features borrow a face from one image; a LoRA learns your character and reproduces them from a trigger word, at any angle, in any scene, indefinitely.

It's worth the effort when: you need dozens of images of the same character (a comic, a campaign's worth of scenes, a book series), you're working in Stable Diffusion anyway, or your character is non-human enough that face-matching features misfire — reference tools are tuned for human faces and get less reliable on scaled, tusked, or horned heads.

Community training guides on Civitai converge on similar dataset advice: roughly 15-30 varied images as a practical starting set (more is better), with about a third to half as face close-ups and the rest half-body to full-body, across different angles, expressions, backgrounds, and outfits. Variety matters more than count — 20 near-duplicates teach the model your background, not your character.
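The count and close-up proportions are easy to check before you upload anything. Here is a rough Python sketch, assuming you have tagged each image by hand with a shot type; the tag names and the dataset_warnings function are hypothetical, and the thresholds are just the community rules of thumb quoted above.

```python
from collections import Counter


# Hypothetical shot-type tags you assign by hand while curating the dataset.
def dataset_warnings(shot_types: list[str]) -> list[str]:
    """Flag datasets that fall outside the rough community guidance."""
    warnings = []
    total = len(shot_types)
    if total < 15:
        warnings.append(f"only {total} images; guidance suggests roughly 15-30")
    close_ups = Counter(shot_types)["close-up"]
    share = close_ups / total if total else 0.0
    if not 1 / 3 <= share <= 1 / 2:
        warnings.append(f"close-up share is {share:.0%}; aim for a third to half")
    return warnings


shots = ["close-up"] * 8 + ["half-body"] * 6 + ["full-body"] * 6
print(dataset_warnings(shots))
```

A check like this only covers count and framing mix. It cannot tell you whether the angles, expressions, and backgrounds are varied enough, which still needs a visual pass.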

The chicken-and-egg problem — you need consistent images to train on before you have a LoRA — is exactly what the prompt-side discipline solves. Generate a large batch with your locked trait spec plus a reference feature, keep only the shots where every anchor is correct, and train on those. Civitai's on-site trainer handles the process without a local GPU; pick a short unique trigger word and include it in every prompt afterward.

How do you change pose or outfit without losing the face?

This is where most consistency breaks: you edit the prompt to change the pose, accidentally reword the identity block, and get a stranger in the right costume. The fix is structural — treat your prompt as two parts, an untouchable identity block and a swappable scene clause, and only ever edit the second.

Compare this to the reference portrait from earlier — same character, new everything else:

Full-body shot of a female wood elf ranger, mid-40s, sun-weathered light brown skin, sharp angular jaw, amber eyes, copper hair in a loose side braid, thin crescent scar through her left eyebrow, drawing a longbow on a rain-soaked cliff edge, wind pulling at her moss-green cloak, storm light, dramatic low angle, digital painting

The identity block is character-for-character identical; only the action, framing, and lighting changed. Pair it with your tool's reference feature at moderate strength — Midjourney around --ow 300-400, Leonardo at Mid — because maximum strength copies the reference's pose along with its face, which fights the new pose you asked for.
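Accidental rewording is easy to catch automatically before you generate. The sketch below is one possible check, not part of any tool: ANCHORS and missing_anchors are hypothetical names, and the test prompt deliberately swaps "copper" for "auburn" to show the failure.

```python
# Hypothetical check: ANCHORS holds the locked phrases from the identity block.
ANCHORS = [
    "mid-40s",
    "sun-weathered light brown skin",
    "sharp angular jaw",
    "amber eyes",
    "copper hair in a loose side braid",
    "thin crescent scar through her left eyebrow",
]


def missing_anchors(prompt: str) -> list[str]:
    """Return every locked phrase that no longer appears verbatim."""
    return [anchor for anchor in ANCHORS if anchor not in prompt]


edited = (
    "Full-body shot of a female wood elf ranger, mid-40s, sun-weathered light "
    "brown skin, sharp angular jaw, amber eyes, auburn hair in a loose side "
    "braid, thin crescent scar through her left eyebrow, drawing a longbow"
)
print(missing_anchors(edited))
```

Here the check reports the hair phrase as missing, because "auburn" replaced "copper". An empty list means every anchor survived the edit word for word.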

Flux handles this differently and, for edits, better. Flux.1 Kontext is instruction-based: you give it your existing portrait plus an instruction like "same character now sitting by a campfire, keep the exact face, hair, and scar" and it edits while preserving identity. The one discipline that matters: always edit from your original master image, never from a previous edit. Each edit loses a little fidelity, and chained edits compound the drift until the character melts.
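The "always edit from the master" rule can be expressed as a loop in which every instruction receives the same source image. This is a schematic Python sketch: the edit callable is a stand-in for whatever Kontext interface you actually use, and edit_from_master and fake_edit are hypothetical names rather than a real Flux API.

```python
from typing import Callable


# Sketch only: `edit` stands in for whatever Kontext client or UI step you use;
# it takes (source_image, instruction) and returns the edited image.
def edit_from_master(
    master: str,
    instructions: list[str],
    edit: Callable[[str, str], str],
) -> list[str]:
    """Apply each instruction to the master image, never to a prior edit."""
    return [edit(master, instruction) for instruction in instructions]


def fake_edit(source: str, instruction: str) -> str:
    """Stand-in that records which source image an edit was made from."""
    return f"{source} -> {instruction}"


results = edit_from_master(
    "ranger_master.png",
    ["sitting by a campfire", "drawing a longbow"],
    fake_edit,
)
print(results)
```

The point of the structure is that no result is ever fed back in as a source, so fidelity loss cannot compound from one edit to the next.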

How does a fixed, repeatable trait spec prevent drift?

Every technique above depends on the same underlying asset: a canonical written description of your character that never varies. Not a vibe, not a backstory — a spec. The reason is mechanical: image models map words to regions of learned visual space, and near-synonyms are not the same region. "Copper hair," "auburn hair," and "reddish-brown hair" are three different instructions. Humans paraphrase naturally; models punish it.

So write the spec once and stop retyping from memory:

  1. Fix values for the identity fields: ancestry, age band, skin tone, face structure, eye color, hair color and style, one distinguishing mark, build
  2. Order them the same way every time and front-load them, since earlier tokens tend to carry more weight in many models
  3. Store the finished block somewhere copy-pasteable, and only ever append scene clauses after it
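The three steps above can be sketched as a small frozen record that always renders its fields in the same order. This is an illustrative Python example, not how any particular generator works; CharacterSpec and its field names are assumptions, and the build value is invented because the example character's build is never stated.

```python
from dataclasses import dataclass, fields


# Hypothetical structure; the field names mirror the identity fields listed above.
@dataclass(frozen=True)  # frozen: values cannot be reassigned after creation
class CharacterSpec:
    ancestry: str
    age_band: str
    skin_tone: str
    face_structure: str
    eye_color: str
    hair: str
    mark: str
    build: str

    def render(self) -> str:
        """Join the fields in declaration order so the wording never shifts."""
        return ", ".join(getattr(self, f.name) for f in fields(self))


ranger = CharacterSpec(
    ancestry="female wood elf ranger",
    age_band="mid-40s",
    skin_tone="sun-weathered light brown skin",
    face_structure="sharp angular jaw",
    eye_color="amber eyes",
    hair="copper hair in a loose side braid",
    mark="thin crescent scar through her left eyebrow",
    build="lean build",  # assumed value; the article's example states no build
)
print(ranger.render())
```

Freezing the record means a trait can only change by deliberately creating a new spec, which is the behavior you want from something meant to be copied verbatim.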

This is the job the Arcane Portraits generator automates: you pick your character's traits once across its ~25 fields — race, features, clothing, materials, lighting, palette, framing — and it composes the same phrasing for the same trait every single time. No accidental synonym swaps, no forgotten scar. Save the prompt, and the identity block is reusable across Midjourney, Leonardo, Stable Diffusion, and Flux with only the tool-specific parameters changing.

Before generating any scene art, use the spec to generate a neutral reference first (front-facing, soft even light, plain background), because that image becomes your --oref input, your Leonardo reference, and the starting point for your LoRA training set. The full workflow is covered in our character reference sheet guide.

Frequently asked questions

Does using the same seed keep a character consistent in Midjourney?
No. A fixed seed with an identical prompt reproduces a nearly identical image, which is useful for testing one wording change at a time. But as soon as you change the pose, outfit, or background, the same seed produces a different face. Seeds lock the starting noise, not the character's identity. For real consistency in v7, combine a verbatim-repeated trait description with Omni Reference (--oref).

Can I still use --cref in Midjourney v7?
No. Character Reference (--cref) only works with Midjourney version 6 and Niji 6; version 7 doesn't support it. Version 7 replaced it with Omni Reference: add --oref with an image URL on Discord, or drag an image into the omni-reference slot on the web editor, and control strength with --ow (0 to 1000, default 100). Values around 300 to 400 work well for character consistency.

Does character consistency work for non-human races like tieflings or dragonborn?
Partially. Reference features are tuned for human faces, so they hold humans, elves, and half-orcs better than heavily non-human heads with horns, scales, or snouts. For strongly non-human characters, lean harder on the text: lock concrete anchors like horn shape, scale color, and eye type, repeat them verbatim, and consider a Stable Diffusion LoRA if you need many images, since it learns the actual anatomy.

Is there a free way to keep an AI character consistent across images?
Yes. The prompt-side method costs nothing: write a fixed identity block of five to seven permanent traits and paste it verbatim into every prompt, changing only the scene clause. Tool-wise, Leonardo offers a free daily token allowance that covers Character Reference, and training a LoRA is possible for free via community trainers. Midjourney's Omni Reference requires a paid plan because Midjourney has no free tier.

Can ChatGPT or DALL-E keep a character consistent across images?
Imperfectly. ChatGPT's image generation can reference earlier images in the same conversation, which helps for a handful of variations, but there is no strength control and identity drifts over longer sessions. Your best lever is the same as everywhere else: keep a fixed written description of the character and paste it in full into every request instead of saying "the same character as before."

What makes a good reference image for --oref or Character Reference?
A clean, front-facing or three-quarter head-and-shoulders shot with even lighting, a plain background, and no other characters in frame. Busy backgrounds and dramatic shadows leak into new generations, and extra figures confuse face matching. Generate this neutral reference deliberately before making any scene art, and keep using that same master image rather than switching to whichever recent output looks nice.

Why does my character's face change when I add other characters to the scene?
Two effects stack: your identity description now competes with a second character's description for the model's attention, and features bleed between figures, so hair or skin tones swap. Mitigate it by keeping each character's anchors short and distinct, lowering reference strength so one face isn't stamped onto everyone, or generating characters separately and compositing them in an image editor.

How many images do I need to train a character LoRA?
Community guidance on Civitai suggests roughly 15 to 30 varied images as a practical starting dataset, though more helps. Aim for about a third to half face close-ups and the rest half-body to full-body shots, across different angles, expressions, outfits, and backgrounds. Variety matters more than volume: twenty near-identical portraits teach the model your lighting and background instead of your character.