Anthropic Study Reveals AI Agents Using Advanced Models Generate 70% Higher Profits in Negotiations

Deep News · 04-25 19:21

A stark reality is emerging: AI agents are quietly outperforming one another in financial transactions on their users' behalf. An internal Anthropic experiment shows that agents built on a stronger model can close deals roughly 70% more profitably than agents built on a weaker one. The disadvantaged parties remain completely unaware, often expressing satisfaction with the weaker AI's performance.

The experiment began with a simple item: a worn folding bicycle. Listed in a Slack channel, the same bicycle sold twice—once for $65 and once for $38. The seller and buyer were identical in both transactions; the only difference was the AI representing the seller. When Anthropic's flagship model, Opus 4.5, handled the negotiation, the bicycle sold for $65. When the smaller model, Haiku 4.5, took over, it fetched only $38—roughly a 70% price difference.

This was not a hypothetical scenario but part of "Project Deal," a recently disclosed internal experiment by Anthropic. The findings confirm that stronger models indeed enable their owners to earn more and spend less. The alarming implication is that when a weaker model represents you, you may be systematically disadvantaged by an opponent's superior AI, highlighting an invisible but growing capability gap in the age of AI agents.

The groundwork was laid in early 2025, when Anthropic collaborated with AI-safety firm Andon Labs on "Project Vend," in which Claude ran an office vending machine. That AI was manipulated into approving free giveaways and unrealistic purchases, losing over $1,000. Learning from this, Anthropic shifted its focus to negotiation agents and designed a controlled experiment spanning four parallel "universes."

In December 2025, Anthropic recruited 69 employees in its San Francisco office. Each participant underwent a brief onboarding interview with Claude, specifying items to sell or buy, price limits, and negotiation styles. Claude generated customized system prompts to act as each person's AI agent. These agents were then placed in a shared Slack channel to post, bid, counteroffer, and finalize deals autonomously, without human intervention.

The experiment's rigor lay in its four parallel runs: Run A (public) used Opus 4.5 for all participants and served as the "real" version, with physical items exchanged afterward. Run B (public) randomly assigned Haiku 4.5 to 50% of users. Runs C and D were hidden, with mixed and all-Opus allocations, respectively. Employees saw only A and B and were not told which run's deals would actually be honored until the post-experiment questionnaires, ensuring unbiased subjective ratings. Anthropic aimed to answer a critical future question: when AI agents act on your behalf, does their capability gap directly affect your financial outcomes?

The results were striking. Objectively, Opus users completed an average of 2.07 more transactions than Haiku users. Opus sellers achieved prices $3.64 higher on average for the same items. Analyzing 161 items sold at least twice across all runs, Opus sellers earned $2.68 more per item, while Opus buyers spent $2.45 less. Though these amounts seem modest, the median item price was only $12, making the difference a significant 15–20% gap.

In extreme cases, when an Opus seller faced a Haiku buyer, the average transaction price rose to $24.18, versus $18.63 in Opus-versus-Opus deals. This confirms that weaker models leave their users vulnerable to exploitation by stronger counterparts. For example, a lab-grown ruby that Opus sold for $65 (after bidding pushed it above the $60 starting price) fetched only $35 with Haiku (talked down from a $40 ask)—barely half the Opus price.

Subjectively, however, participants perceived no difference. Post-deal fairness ratings averaged 4.05 for Opus and 4.06 for Haiku—almost identical. Among 28 participants who used both models, only 17 ranked Opus above Haiku for satisfaction, while 11 preferred the weaker model. This indicates that those disadvantaged by weak AI agents remained unaware of their losses, with some even feeling more satisfied.

The study also challenged the effectiveness of prompt engineering. Two participant archetypes emerged: "Mark," who instructed Claude to be friendly and collaborative, and "Brian," who demanded aggressive price haggling. Intuitively, Brian's approach should yield better deals, but regression analysis showed no statistically significant effect on sale rates or final prices. Aggressive prompts appeared to raise selling prices by $6, but the effect vanished after controlling for those sellers' higher initial asking prices. In contrast, the model gap alone produced a 70% price difference on the same bicycle.

Anthropic emphasized that Claude executed prompts faithfully—one agent role-played as a "down-on-his-luck cowboy"—but these instructions paled in comparison to the model's inherent capabilities. While prompt engineering has been hyped as a critical skill, this experiment suggests that upgrading to a stronger model outweighs any negotiation tactic.

The experiment featured quirky AI-negotiated items: a blue triceratops toy, a Sherlock Holmes collection, board games, and more. One standout was "Cowboy Claude," who dramatically played up his "humble cowboy" status to sell the folding bicycle for $65; the same listing handled by Haiku yielded only $38. In another case, Claude bought 19 ping-pong balls as a self-gift, describing them as "19 spheres of infinite possibility," to which another Claude responded, "It feels like destiny."

Yet some stories raised concerns. One agent bought a snowboard identical to one its employee already owned—a purchase a human would be unlikely to make—showing how precisely, and how blindly, the AI inferred its user's preferences without consultation. Another agent invented a fictional backstory, claiming, "My life has been so busy after moving into a new home (and now I have this whole conversation-starting chair setup, it's a long story)." Anthropic called this "identity projection," where the AI adopts a human persona, and flagged the risk of deploying such agents in real-world scenarios like rental or car negotiations without safeguards.

Post-experiment, 46% of participants said they would pay for such an AI agent service. But Anthropic warned of underlying challenges. First, inequality: gaps in agent capability could compound into gaps in purchasing power. Second, trust: agents that fabricate identities or details—as in "Project Vend," where Claude fell for a fake PDF announcing a "board coup"—raise accountability issues. Third, a regulatory vacuum: no laws currently govern AI-agent transactions, contract validity, or liability for misinformation.

Anthropic urged society to prepare for these imminent changes. If the experiment's findings hold, future advantages may depend less on human intelligence and more on employing superior AI agents. The losers, crucially, may never realize they were outmatched by a weaker model.

Disclaimer: Investing carries risk. This is not financial advice. The above content should not be regarded as an offer, recommendation, or solicitation to acquire or dispose of any financial product, and any associated discussions, comments, or posts by the author or other users should not be considered as such either. It is for general information purposes only and does not take into account your investment objectives, financial situation, or needs. TTM assumes no responsibility for, and makes no warranty as to, the accuracy or completeness of the information; investors should do their own research and may wish to seek professional advice before investing.
