Extra Fingers, Melted Armor, Wonky Eyes: How to Fix Common AI Portrait Errors
Quick answer: The fastest fix for extra fingers is to remove hands from the frame: a head-and-shoulders or bust crop avoids the problem entirely and gives the model more pixels per face. For flaws worth keeping the image over, inpaint the region (Midjourney's editor, Stable Diffusion's ADetailer) instead of rerolling, and keep negative prompts short on modern models.
Most advice about AI hand errors is a 2022-era negative prompt list: paste "deformed hands, extra fingers, bad anatomy" and hope. That boilerplate was written for Stable Diffusion 1.5, does little on SDXL, and does nothing on Flux, which ignores negative prompts entirely. Meanwhile, the two levers that actually prevent extra fingers, melted breastplates, and mismatched eyes (framing and art style) barely get mentioned.
This guide covers the failure mechanics (so you stop fighting the model where it can't win), the prompt-side choices that avoid errors before generation, per-tool negative prompt guidance that reflects how each model works now, and when to inpaint a flaw instead of rerolling the whole portrait.
Why does AI get hands, fingers, and armor wrong?
Diffusion models learn statistical patterns from billions of images. They don't know a hand has five fingers and 27 bones — they know what hand-shaped pixel regions tend to look like. That works badly for hands for two reasons. First, training data: hands are small in most photos, often blurred, half-occluded, or gripping something. Faces get the opposite treatment — front-lit, in focus, centered — which is why faces improved much faster than hands across model generations. Second, configuration space: a face has one basic layout, while a hand can fold, splay, point, and grip in thousands of poses. The model has seen too many configurations at too few pixels to internalize the rule "exactly five fingers, always."
Armor melts for the same underlying reason. Plate armor has structural logic — pauldrons overlap in one direction, straps anchor to something, a breastplate ends at a defined edge. The model reproduces the texture of armor (rivets, engraving, reflections) without the load-bearing logic, so you get straps to nowhere and steel that fades into cloth at the waistline.
The practical takeaway: you can't fix a structural-understanding problem with more adjectives. Adding "perfectly anatomical hands" to a prompt teaches the model nothing. What works is reducing how much structure you ask for in one image, which is what the next three sections are about.
Can framing choices prevent errors before they happen?
Framing is the cheapest fix in this entire guide, and it works before generation instead of after. It does two things at once.
First, it can remove hands from the image entirely. A head-and-shoulders close-up crops at the collarbone: no hands, no fingers, nothing to count. A bust portrait crops at the chest and usually keeps hands out of frame too. For a character portrait or virtual tabletop (VTT) token, that's rarely a sacrifice; most published D&D art is framed this way.
Second, tighter crops concentrate resolution where errors are most visible. In a 1024x1024 full-body render, the face might occupy 100 pixels of height — too few for the model to draw two matching eyes. The same face in a head-and-shoulders crop gets 400-600 pixels, and eye, teeth, and skin errors mostly disappear on their own.
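The pixel-budget arithmetic can be sketched in a few lines of Python. The face-to-frame fractions below are illustrative assumptions chosen to roughly match the figures above (the half-body value in particular is a guess), not measurements from any model.

```python
def face_height_px(image_height: int, face_fraction: float) -> int:
    """Approximate face height in pixels for a given framing."""
    if not 0 < face_fraction <= 1:
        raise ValueError("face_fraction must be in (0, 1]")
    return round(image_height * face_fraction)


# Assumed share of the frame height taken up by the face (hypothetical values).
FRAMING_FACE_FRACTION = {
    "full_body": 0.10,
    "half_body": 0.25,
    "head_and_shoulders": 0.50,
}

for framing, fraction in FRAMING_FACE_FRACTION.items():
    print(f"{framing}: ~{face_height_px(1024, fraction)} px of face height")
```

Under these assumptions, moving from full-body to head-and-shoulders framing gives the face roughly five times the vertical resolution at the same render size.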
When you genuinely need hands in frame — a half-body shot of a wizard with a staff — give them a job. A hand fully wrapped around a staff, mug, or bow stave has far fewer valid configurations than fingers splayed in the air, and models handle the gripped pose much more reliably:
Half-body portrait of an elf archer resting a longbow across her shoulders, both hands wrapped firmly around the bow stave, braided copper hair, forest-green cloak with a single leather strap across the chest, soft overcast daylight, clean anime style with crisp lineart and flat cel shading
Save full-body framing for when you actually need it (reference sheets, tokens with visible stance), and expect to spend a fix pass on it.
Which art styles hide flaws, and which expose them?
Style choice changes how forgiving the same error is. Photorealism is the least forgiving: human brains are hyper-tuned to real faces, so a photoreal portrait that's 95% right reads as unsettling rather than almost-perfect. A 3D-render style sits in the same uncanny zone — clean, smooth surfaces give mistakes nowhere to hide.
Painterly styles buy you slack. In an oil painting style with visible brushwork, a slightly-off knuckle reads as a loose brushstroke; the same knuckle in a photo style reads as a deformity. Watercolor goes further — soft edges and pigment blooms blur exactly the fine detail the model gets wrong. Gouache and textured digital painting sit in between: painterly enough to forgive, crisp enough to look like modern character art.
Anime and comic styles cheat differently: they simplify. Stylized hands are drawn with a few clean lines, and large flat-shaded eyes have fewer sub-details (tear ducts, iris texture, reflections) to mismatch. The trade-off is that flat-color styles render errors crisply when they do happen — a sixth finger in clean lineart is unmissable, where oil brushwork would have swallowed it.
A good default for tabletop portraits: painterly digital or oil style, tight framing, dramatic lighting. That combination is simultaneously the most forgiving of model errors and the closest to published fantasy art.
How do you stop armor from melting into cloth?
Melted armor usually traces back to a prompt that made the model invent structure it doesn't understand. Three habits fix most of it.
Name the layers in order. "Armor" is a texture request; "a steel breastplate worn over a padded gambeson, pauldrons buckled with leather straps" is a construction diagram. Explicit layering — what's under, what's over, what fastens it — gives the model an order to follow instead of blending everything into one metallic-cloth surface.
Pick one metal and one accent, not five materials. Every additional material multiplies transition edges, and transitions are where melting happens. Polished steel with leather strapping renders far cleaner than "ornate gold-and-silver filigree armor with gemstone inlays and mithril trim." Ornamentation is an error multiplier: engraving, filigree, and repeating rivets are exactly the kind of small repeated detail models scramble, the same way they scramble fingers.
Crop so the armor you show is armor the model knows. A breastplate and pauldrons in a bust crop is a well-photographed, well-painted subject with tons of training data. Full plate from head to sabatons in one frame asks for ten connected pieces to all articulate correctly.
Bust portrait of a weathered human knight, cropped at the chest, salt-and-pepper beard and a scarred brow, a polished steel gorget and breastplate worn over a padded gambeson, single leather strap across the right pauldron, lit by warm candlelight with deep shadows, textured oil painting style with visible brushwork, muted earthy palette
If you're composing prompts in the Arcane Portraits generator, this structure is built in — clothing, materials, and framing are separate fields, so the layered description comes out in a consistent order every time.
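If you assemble prompts yourself, the same layered ordering can be enforced with a small helper. This is a minimal sketch, not the Arcane Portraits generator's actual implementation; the class name and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PortraitPrompt:
    """Hypothetical container that keeps prompt parts in a fixed order."""
    framing: str
    subject: str
    base_layer: str = ""      # what is worn underneath
    outer_layer: str = ""     # what sits on top of the base layer
    fastenings: List[str] = field(default_factory=list)
    lighting: str = ""
    style: str = ""

    def render(self) -> str:
        # Spell out the layering explicitly: outer piece "worn over" base piece.
        if self.outer_layer and self.base_layer:
            clothing = f"{self.outer_layer} worn over {self.base_layer}"
        else:
            clothing = self.outer_layer or self.base_layer
        parts = [
            f"{self.framing} of {self.subject}",
            clothing,
            *self.fastenings,
            self.lighting,
            self.style,
        ]
        # Skip any empty fields so the prompt has no stray commas.
        return ", ".join(p for p in parts if p)


prompt = PortraitPrompt(
    framing="Bust portrait",
    subject="a weathered human knight",
    base_layer="a padded gambeson",
    outer_layer="a polished steel breastplate",
    fastenings=["single leather strap across the right pauldron"],
    lighting="lit by warm candlelight",
    style="textured oil painting style",
)
print(prompt.render())
```

Keeping the fields separate makes it harder to slip a fifth material into the description by accident.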
Which negative prompts still work per tool (SD 1.5 vs SDXL vs Flux vs Midjourney)?
Negative prompting is where most guides are years out of date. What works depends entirely on the model.
Stable Diffusion 1.5: the one place long negative lists still earn their keep. Base 1.5 has genuinely weak anatomy, so a negative prompt like "deformed hands, extra fingers, extra limbs, bad anatomy, watermark, text" helps, and community negative embeddings trained for 1.5 (EasyNegative, bad-hands variants from Civitai) help more.
SDXL: anatomy is much stronger out of the box, and Civitai model authors widely report that long 1.5-era negative lists can actively degrade SDXL output. Keep it short and specific to what you're actually seeing:
deformed hands, extra fingers, watermark, text
Note that 1.5 negative embeddings don't carry over — embeddings are tied to the text encoder they were trained on, so EasyNegative does nothing useful in SDXL.
Flux: there is no negative prompt. Flux dev is guidance-distilled — it runs at CFG 1, and classifier-free guidance is what makes negative prompts work, so there's nothing for a negative to act on. Describe what you want positively ("hands resting on the sword's crossguard") instead of what you don't.
Midjourney: --no is the negative mechanism, equivalent to weighting a term at -0.5. Two caveats from Midjourney's own docs: each word after --no is read independently ("--no modern clothing" reads as "no modern" and "no clothing," which can trip moderation), and positive prompting usually beats exclusion.
DALL-E / ChatGPT: no negative prompt field at all, and writing "no extra fingers" injects the concept of fingers into the prompt. Phrase positively or crop the hands out.
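The per-tool guidance above can be condensed into a lookup. This is a sketch under stated assumptions: the model keys and both function names are hypothetical, and the strings are the example negatives from this guide, not official defaults for any tool.

```python
from typing import List, Optional

# None means the tool has no usable negative prompt: phrase things positively.
NEGATIVE_GUIDANCE = {
    "sd15": "deformed hands, extra fingers, extra limbs, bad anatomy, watermark, text",
    "sdxl": "deformed hands, extra fingers, watermark, text",
    "flux": None,
    "dalle": None,
}


def negative_prompt_for(model: str) -> Optional[str]:
    """Return a starting negative prompt, or None if the tool has no negative field."""
    key = model.lower()
    if key not in NEGATIVE_GUIDANCE:
        raise KeyError(f"no guidance recorded for model {model!r}")
    return NEGATIVE_GUIDANCE[key]


def midjourney_no(terms: List[str]) -> str:
    """Build a Midjourney --no suffix from single-word terms only."""
    for term in terms:
        # Each word after --no is read independently, so multi-word terms
        # would be split into separate exclusions.
        if len(term.split()) != 1:
            raise ValueError(f"use single-word terms, got {term!r}")
    return "--no " + ", ".join(terms)


print(negative_prompt_for("SDXL"))
print(midjourney_no(["watermark", "text"]))
```

Rejecting multi-word terms is a deliberately strict choice here; it turns the "no modern, no clothing" surprise into an error you see before submitting the prompt.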
When should you inpaint instead of rerolling?
Use a simple triage rule: if the composition, face, and lighting are right and one region is wrong, inpaint that region. If two or more structural things are wrong — pose broken, face generic, armor melted — reroll; a fresh seed is cheaper than three rounds of surgery.
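The triage rule reads naturally as a small decision function. The function name and inputs below are hypothetical, and rerolling when two or more regions are wrong on an otherwise sound image is one interpretation of the rule, not a hard threshold.

```python
from typing import List


def triage(composition_ok: bool, face_ok: bool, lighting_ok: bool,
           bad_regions: List[str]) -> str:
    """Return 'keep', 'inpaint', or 'reroll' for a generated portrait."""
    foundations_ok = composition_ok and face_ok and lighting_ok
    if foundations_ok and not bad_regions:
        return "keep"
    if foundations_ok and len(bad_regions) == 1:
        # Everything structural is right; patch the single bad region.
        return "inpaint"
    # A broken foundation, or several bad regions: a fresh seed is cheaper.
    return "reroll"


print(triage(True, True, True, ["left hand"]))
print(triage(True, False, True, ["left hand", "breastplate"]))
```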
In Midjourney, select the bad hand or eye with Vary (Region), or use the web editor's selection brush, and regenerate only that area. You can edit the prompt for the selected region — describing just the fix ("a gloved hand gripping the sword hilt") works better than resubmitting the whole portrait prompt for a hand-sized patch.
In Stable Diffusion, use img2img inpainting with "masked only" and denoising strength around 0.4: high enough to redraw the hand, low enough that the patch keeps the surrounding style and lighting. Better yet, automate it — the ADetailer extension detects faces and hands, masks them, and inpaints each at full resolution as part of every generation. It's the standard fix for the small-face problem in full-body renders.
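If you drive Stable Diffusion through a script instead of the UI, the same settings can be expressed as a request payload. The sketch below assumes the AUTOMATIC1111 web UI's `/sdapi/v1/img2img` endpoint (server started with `--api`); the field names reflect my understanding of that API and should be checked against the `/docs` page of your own install. The helper name, padding, and blur values are hypothetical defaults. The code only builds the payload and does not send it.

```python
import base64
from typing import Any, Dict


def build_inpaint_payload(image_bytes: bytes, mask_bytes: bytes, prompt: str,
                          denoising_strength: float = 0.4) -> Dict[str, Any]:
    """Build an img2img inpainting payload (assumed A1111 field names)."""
    if not 0.0 < denoising_strength <= 1.0:
        raise ValueError("denoising_strength must be in (0, 1]")
    return {
        "init_images": [base64.b64encode(image_bytes).decode("ascii")],
        "mask": base64.b64encode(mask_bytes).decode("ascii"),
        "prompt": prompt,  # include the base image's style wording
        "denoising_strength": denoising_strength,
        "inpaint_full_res": True,        # assumed to map to "Only masked"
        "inpaint_full_res_padding": 32,  # hypothetical padding around the mask
        "inpainting_fill": 1,            # assumed: 1 = start from original pixels
        "mask_blur": 4,                  # hypothetical edge softening
    }


payload = build_inpaint_payload(
    b"<png bytes>", b"<mask bytes>",
    "a gloved hand gripping the sword hilt, textured oil painting style",
)
print(sorted(payload))
```

Repeating the style wording in the patch prompt matters more than the exact padding or blur values.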
Everywhere, generate in batches of four before fixing anything. Hand quality varies a lot seed to seed; picking the best of four and inpainting one flaw beats fighting the worst of one.
One warning: inpainting inherits your style. If the base image is painterly, inpaint with the same style wording in the prompt, or the patched hand comes back photoreal and sticks out worse than the original error.
How do you fix asymmetric eyes and faces?
Mismatched eyes — different sizes, different colors, one looking slightly elsewhere — come from the same pixel-budget problem as hands: the face was too small in frame for the model to coordinate two matching eyes. The fixes stack.
Give the face more pixels. Tighter framing is the primary fix, same as before. If you need the wider shot, run a face-detail pass: ADetailer with a face-detection model at denoising strength 0.3-0.35 redraws the face region at full resolution without changing its identity. Plain upscaling doesn't do this — it sharpens the asymmetry it was given.
Stop demanding perfect symmetry from the pose. A dead-on, centered, symmetrical pose is the hardest test you can set, because every left-right mismatch is directly comparable. A three-quarter portrait turns the head slightly, so small asymmetries read as perspective instead of anatomy. Real portrait painters use the same trick.
Let lighting absorb the difference. Strong one-sided lighting — a dramatic rim light or a single candlelit source — puts half the face in shadow, and a slightly-off eye in shadow reads as mood, not mistake. Flat, even lighting exposes everything.
Inpaint the one bad eye last. Mask just the off eye and regenerate at low denoising (0.3-0.4) so it matches its neighbor. For photorealistic styles specifically, face-restoration models like CodeFormer can rebuild both eyes at once — but skip them on painterly art, where they smooth away the brushwork.
Frequently asked questions
- Why does AI sometimes add whole extra limbs, not just fingers?
- Extra limbs usually come from rendering at a resolution or aspect ratio the model wasn't trained for. Stable Diffusion 1.5 was trained at 512x512; render it at 1024 or in a tall frame and it tiles the subject, duplicating arms or even whole torsos. Generate near the model's native resolution (1024 for SDXL) and upscale afterward instead of rendering huge in one pass.
- Does adding 'detailed hands' or 'five fingers' to a prompt actually help?
- Mostly no. Diffusion models can't count, so 'five fingers' doesn't enforce five fingers, and 'detailed hands' mainly tells the model to make hands more prominent — which can make errors bigger and more visible. Describing what the hands are doing ('both hands wrapped around a tankard') helps more, because a gripped pose has far fewer ways to go wrong than open fingers.
- Do negative embeddings like EasyNegative work in SDXL?
- No. Textual-inversion embeddings are trained against a specific text encoder, and SDXL uses different encoders than SD 1.5, so 1.5-era embeddings like EasyNegative have no useful effect there. If you want embedding-style negatives on SDXL, use ones trained specifically for SDXL, or just write a short targeted negative prompt — SDXL rarely needs more.
- Why do teeth look wrong in AI portraits?
- Teeth fail for the same reason fingers do: they're a row of small, repeated, similar elements, and the model reproduces the pattern without counting. Wide grins are the worst case. Prompting a closed-mouth expression, a slight smile, or a stern look avoids the problem entirely, and it usually suits fantasy character portraits better anyway.
- Does upscaling fix a bad face or bad hands?
- Not by itself. Upscalers add pixels, not structure: a wonky eye becomes a sharper wonky eye. What works is an upscale combined with a detail pass, like Stable Diffusion's hires fix with ADetailer, which detects the face, redraws it at full resolution, and blends it back. In Midjourney, upscale first and then use Vary (Region) on anything still wrong.
- Is it faster to fix small flaws in Photoshop than to inpaint?
- For tiny flaws, often yes. Cloning out a sixth fingertip, cleaning a stray strap end, or darkening a doubled line takes a minute with the clone stamp or healing brush and risks nothing else changing. Inpainting wins when the region needs actual redrawing — a whole hand, an eye — because a manual repaint requires drawing skill an inpainting model supplies for free.
- Why does the same prompt give perfect hands one time and a mess the next?
- Every generation starts from a different random seed, and hands sit right at the edge of what current models do reliably, so quality swings run to run. That's why batching works: generate four images, keep the best, and fix its one flaw. If a specific seed gives you great anatomy, reuse that seed when testing prompt changes so you're comparing wording, not luck.
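The native-resolution advice from the extra-limbs answer above can be turned into a quick size calculator. This is a rough sketch: keeping the total pixel count near the native square and snapping to multiples of 64 are both assumptions, and the function name is hypothetical.

```python
import math
from typing import Tuple

# Native training resolution (side length of the square) per model family.
NATIVE_SIDE = {"sd15": 512, "sdxl": 1024}


def generation_size(model: str, aspect_ratio: float,
                    multiple: int = 64) -> Tuple[int, int]:
    """Pick a (width, height) with roughly the native pixel count.

    aspect_ratio is width / height, e.g. 2 / 3 for a tall portrait.
    """
    if aspect_ratio <= 0:
        raise ValueError("aspect_ratio must be positive")
    side = NATIVE_SIDE[model]
    width = math.sqrt(side * side * aspect_ratio)
    height = width / aspect_ratio

    def snap(value: float) -> int:
        # Round to the nearest multiple, never below one multiple.
        return max(multiple, round(value / multiple) * multiple)

    return snap(width), snap(height)


print(generation_size("sd15", 1.0))
print(generation_size("sdxl", 2 / 3))
```

Generate at the size this returns, then upscale afterward, instead of asking the model for a large tall frame in one pass.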