More Isn't Always Better: Balancing Decision Accuracy and Conformity Pressures in Multi-AI Advice
TL;DR Highlight
A 348-person experiment shows that a 3-AI panel improves decision accuracy over a single AI, that a 5-AI panel adds no further benefit, and that unanimous AI agreement triggers dangerous over-reliance.
Who Should Read
Product developers and UX engineers building advisory services that combine multiple chatbots or AI assistants. Especially teams designing AI decision-support systems in healthcare, legal, or finance.
Core Mechanics
- An AI panel of 3 improves accuracy over a single AI (Income task: 0.706→0.737), but expanding the panel to 5 yields no further gain; "more is better" does not hold here
- When all AIs unanimously agree (CON), users tend to follow blindly, an overreliance pattern in which the Switch Fraction spikes to 0.88
- Even a single dissenting AI in the panel meaningfully reduces conformity pressure and increases the rate of users maintaining their own judgment (RSR)
- When a 5-panel splits 3:2 (DIV_3), it creates confusion with no accuracy gain; a near-even AI split is counterproductive
- Humanizing AIs (photos, names, conversational tone) has no significant effect on average accuracy or reliance, though it does increase perceived usefulness on the Dating task
- The study used GPT-4o to generate SHAP-based natural-language explanations attached to the AI advice, reducing hallucination while improving interpretability
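The reliance metrics used throughout (Agreement Fraction, Switch Fraction, RAIR, RSR) can be computed from per-trial records. The sketch below assumes the standard appropriate-reliance definitions; the paper's exact operationalization may differ in detail, and the `Trial` record and function names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Trial:
    initial: str  # participant's answer before seeing advice
    final: str    # participant's answer after seeing advice
    ai: str       # the panel's (majority) recommendation
    truth: str    # ground-truth label

def _frac(hits, pool):
    # Conditional fraction; NaN when the conditioning set is empty.
    return len(hits) / len(pool) if pool else float("nan")

def reliance_metrics(trials):
    """Agreement/Switch fractions plus RAIR and RSR (assumed definitions)."""
    agree = [t for t in trials if t.final == t.ai]
    disagreed = [t for t in trials if t.initial != t.ai]
    switched = [t for t in disagreed if t.final == t.ai]
    # RAIR: when initially wrong and the AI was right, did the user switch?
    ai_right = [t for t in disagreed if t.ai == t.truth]
    rair_hits = [t for t in ai_right if t.final == t.ai]
    # RSR: when initially right and the AI was wrong, did the user hold firm?
    self_right = [t for t in disagreed if t.initial == t.truth]
    rsr_hits = [t for t in self_right if t.final == t.initial]
    return {
        "agreement_fraction": _frac(agree, trials),
        "switch_fraction": _frac(switched, disagreed),
        "RAIR": _frac(rair_hits, ai_right),
        "RSR": _frac(rsr_hits, self_right),
    }
```

Under these definitions, a Switch Fraction of 0.88 with an RSR of 0.21 is exactly the overreliance signature the CON condition produced.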
Evidence
- Income task: AI_3 significantly outperforms AI_1 (0.706→0.737, p=.012); AI_5 shows no significant difference
- Dating task: AI_3 outperforms AI_1 (median 0.64→0.68, p=.002); AI_5 is borderline (p=.064)
- 5-panel CON condition: Agreement Fraction 0.99, Switch Fraction 0.88 (near-unconditional AI following); DIV_3 condition: Switch Fraction drops to 0.30 with no accuracy improvement
- 3-panel CON vs DIV: RAIR (rate of following correct AI) is higher in CON (Income: 0.90 vs 0.46), RSR (rate of holding correct own answer against wrong AI) is higher in DIV (0.60 vs 0.21, p<.001)
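For teams wanting to reproduce comparisons like AI_1 vs AI_3 on their own logs, a simple permutation test is one option. This is a stand-in for whatever test the paper actually used (not stated here), and the function name and defaults are illustrative.

```python
import random

def perm_test(a, b, n=2000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n):
        rng.shuffle(pooled)
        x, y = pooled[:len(a)], pooled[len(a):]
        if abs(sum(x) / len(x) - sum(y) / len(y)) >= observed:
            hits += 1
    return hits / n  # p-value estimate
```

With per-participant accuracy scores in `a` (single AI) and `b` (3-panel), a small returned p-value indicates the accuracy gap is unlikely under chance relabeling.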
How to Apply
- When showing multiple AI outputs to users simultaneously, default to 3 and highlight minority opinions separately to encourage critical review
- Add a reflection trigger in the UI when the AI panel is unanimous (e.g. "AIs agree, but please review this yourself") to prevent blind over-reliance
- Limit humanization elements (avatars, names, conversational tone) to specific tasks where emotional judgment matters. Average accuracy and reliance are largely unaffected, so avoid unnecessary implementation cost
Code Example
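As a concrete illustration of the "How to Apply" guidance, here is a minimal sketch of panel-presentation logic that distinguishes unanimity, a single dissent, and a near-even split. The function name, message strings, and return shape are assumptions for illustration, not from the paper.

```python
from collections import Counter

def present_panel(advice):
    """Frame a multi-AI panel's advice according to its consensus level.

    advice: list of answer labels, one per AI (e.g. a 3- or 5-panel).
    Returns a recommendation plus a UI note (illustrative strings).
    """
    counts = Counter(advice)
    top, top_n = counts.most_common(1)[0]
    n = len(advice)
    if top_n == n:
        # Unanimity drives overreliance: attach a reflection prompt.
        return {"recommendation": top,
                "note": "All AIs agree, but please review this yourself."}
    if n - top_n == 1:
        # A single dissent reduces conformity pressure: surface it.
        minority = sorted(a for a in advice if a != top)
        return {"recommendation": top,
                "note": f"Minority opinion: {minority[0]}. Consider why it differs."}
    # Near-even splits (e.g. 3:2) create confusion without accuracy gains,
    # so prompt deliberation instead of pushing a vote count.
    return {"recommendation": None,
            "note": "The AIs are split; weigh the arguments rather than the vote."}
```

The design choice here follows the findings directly: never let a unanimous panel pass without a reflection trigger, highlight a lone dissenter, and avoid presenting a 3:2 split as if the majority were meaningful.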
Related Resources
Original Abstract
Just as people improve decision-making by consulting diverse human advisors, they can now also consult with multiple AI systems. Prior work on group decision-making shows that advice aggregation creates pressure to conform, leading to overreliance. However, the conditions under which multi-AI consultation improves or undermines human decision-making remain unclear. We conducted experiments with three tasks in which participants received advice from panels of AIs. We varied panel size, within-panel consensus, and the human-likeness of presentation. Accuracy improved for small panels relative to a single AI; larger panels yielded no gains. The level of within-panel consensus affected participants' reliance on AI advice: High consensus fostered overreliance; a single dissent reduced pressure to conform; wide disagreement created confusion and undermined appropriate reliance. Human-like presentations increased perceived usefulness and agency in certain tasks, without raising conformity pressure. These findings yield design implications for presenting multi-AI advice that preserve accuracy while mitigating conformity.