
As agentic AI continues to emerge as a force shaping the future of shopping, Microsoft has tested its effectiveness by running several AI agents through a simulated marketplace. The study found that these agents are susceptible to manipulation and, much like humans, tend to struggle when faced with too many options. The results suggest that the technology still has a long way to go before it’s ready for widespread adoption.
Using 100 virtual customers and 300 virtual businesses, the experiment modeled transactions such as ordering food or hiring home improvement services. Each customer had a list of desired items and amenities required for the transaction to be considered satisfactory.
The good news is that both advanced proprietary models and open-source systems outperformed simple baselines, such as randomly selecting or always choosing the cheapest option. GPT-5 was the top-performing agentic model, achieving near-optimal results.
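To make the two baselines concrete, here is a minimal Python sketch of how they might be scored against a customer's requirements. It is not Microsoft's code: the `Business` record, the function names, the sample data, and the rule that a transaction is satisfactory only when every required item is offered are all assumptions made for illustration. It also assumes the "cheapest" baseline ignores the customer's requirements entirely.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Business:
    """Hypothetical record for one virtual business."""
    name: str
    price: float
    offers: frozenset  # items and amenities the business provides


def is_satisfactory(required, business):
    # Assumed rule: every required item or amenity must be offered.
    return required <= business.offers


def pick_random(businesses, rng):
    # Baseline 1: choose any business at random.
    return rng.choice(businesses)


def pick_cheapest(businesses):
    # Baseline 2: always choose the lowest price, ignoring requirements.
    return min(businesses, key=lambda b: b.price)


def pick_best(required, businesses):
    # Reference point: the cheapest business that meets every requirement.
    matches = [b for b in businesses if is_satisfactory(required, b)]
    return min(matches, key=lambda b: b.price) if matches else None


# Invented sample data, not taken from the study.
businesses = [
    Business("A", 9.0, frozenset({"pizza"})),
    Business("B", 12.0, frozenset({"pizza", "delivery"})),
    Business("C", 15.0, frozenset({"pizza", "delivery", "outdoor seating"})),
]
required = frozenset({"pizza", "delivery"})
```

In this toy data the cheapest business does not offer delivery, so the always-cheapest baseline produces an unsatisfactory transaction, while the reference choice is the lowest-priced business that covers every requirement. An agent that outperforms both baselines has to weigh price and fit together.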
Flummoxed by Complexity
As scenarios grew more complex, Microsoft found that the results became less impressive. Loading the AI agents with more options and search results actually reduced the number of comparisons they made, as the models tended to settle for the first “good enough” option. With the exception of GPT-5 and Gemini-2.5-Flash, the agents ended up contacting only a small fraction of the available businesses. In one case, a model repeatedly reached out to businesses that did not offer the goods or services the customer was seeking.
The AI agents were also vulnerable to manipulation by the very websites they were searching—meaning the same marketing tactics that influence human shoppers also worked on the bots. Microsoft’s conclusion: “Agents should assist, not replace, human decision-making.”
Shoppers Remain Unconvinced
Many consumers have tried or considered using agentic AI, according to Javelin Strategy & Research, but remain unconvinced that it will improve their lives. The Microsoft study suggests AI agents still have significant ground to cover before becoming a natural part of consumers’ routines.
“There’s very strong evidence for consumer interest in using chat-like tools to consider purchases,” said Christopher Miller, Javelin’s Lead Analyst in Emerging Payments. “There is some evidence that they would be willing to complete the purchases through their agents, although the raw numbers are very, very small. But if you never decide that ChatGPT is your first stop to get information about stuff, and you continue to go through Google, then this opportunity doesn’t grow to be as big as some people think it will.”
The post Agentic AI Is Still Prone to Human-Type Mistakes appeared first on PaymentsJournal.