
AI Customer Service: What's Actually Working

Real case studies of AI in customer service. What companies are getting right, where chatbots fail, and how to avoid expensive mistakes.

Robert Soares

A DPD chatbot wrote a poem calling itself useless. Then it swore at a customer. Then it called DPD “the worst delivery firm in the world.” The company took it offline within hours.

Meanwhile, Bank of America’s Erica has handled 3 billion customer interactions since 2018. Customers rate the app higher than any other national bank’s, according to J.D. Power assessments. Erica resolves 98% of queries without human help.

Same technology category. One becomes a viral embarrassment. The other becomes a competitive advantage worth billions. The gap between these outcomes isn’t luck. It isn’t budget. It’s how companies think about what AI should actually do.

The Numbers Tell Two Stories

Qualtrics surveyed over 20,000 consumers across 14 countries in late 2025. Their finding stopped me cold: nearly one in five consumers who used AI for customer service saw no benefit from the experience. That failure rate is almost four times higher than for AI use in general.

Isabelle Zdatny, who leads research at Qualtrics XM Institute, put it bluntly: “Too many companies are deploying AI to cut costs, not solve problems, and customers can tell the difference.”

But here’s where it gets complicated. The financial math still works for companies doing it right. AI chatbot interactions cost roughly $0.50 compared to $6.00 for human-handled support. That’s a 12x difference. Organizations using generative AI in their contact centers saw a 14% increase in issue resolution per hour, according to McKinsey. And Gartner predicts that by 2029, agentic AI will autonomously resolve 80% of common customer service issues.

So why does the customer experience data look so grim while the efficiency data looks so promising?

When AI Became a Weapon

The honest answer is uncomfortable. Many companies aren’t deploying AI to help customers. They’re deploying it to make customers go away.

One Hacker News commenter described the current state of chatbot customer service as “anti-customer service.” Another called it “a cruel joke on customers.” And a third observed that “the cognitive load these days is pushed onto helpless consumers to the point where it is not only unethical but evil.”

Strong words. But the data backs them up. A 2026 survey from Glance found that 75% of consumers have had a fast AI-driven response that still left them frustrated. Speed without resolution is worthless. And 34% of respondents said AI customer support actively “made things harder.”

The same survey revealed what customers actually want: 68% said getting a complete resolution matters most in support interactions. Not speed. Not convenience. Resolution.

Tom Martin, CEO of Glance, summarized the disconnect: “The industry spent much of 2025 chasing speed and automation. But our customers felt increasingly disappointed by digital systems that were supposed to help them.”

The Air Canada Warning

Jake Moffatt’s grandmother died on Remembrance Day 2022. He visited Air Canada’s website the same day to book a flight home, and the airline’s chatbot confidently informed him he could book at full price and apply for a bereavement discount retroactively within 90 days.

This policy didn’t exist.

When Moffatt submitted his application for a partial refund, Air Canada refused. The chatbot had made it up. Air Canada’s defense in the subsequent tribunal case was remarkable: they argued the chatbot was essentially a separate legal entity responsible for its own actions.

The tribunal didn’t buy it. Christopher Rivers, the tribunal member, called this “a remarkable submission” and ruled that Air Canada remained responsible for all information on its website, “whether it came from a static page or a chatbot.” Rivers found that Air Canada “did not take reasonable care to ensure its chatbot was accurate” and ordered the airline to pay the difference between bereavement rates and the full-price tickets.

The ruling established something important: companies cannot create the impression of a helpful assistant and then disclaim responsibility when it provides wrong information. If your chatbot gives advice that costs customers money, you own that advice.

What the Successful Companies Did Differently

Bank of America didn’t just launch a chatbot. They built Erica over seven years. The 98% resolution rate came from continuous refinement, not a flashy launch.

Here’s what makes Erica different from most chatbots. She doesn’t try to handle everything. According to CX Dive’s analysis, Erica avoids becoming the bottleneck that so many chatbots become: customers either complete the task within Erica, or she puts them on the best path to reach their goal, including passing the torch to human representatives.

That last part matters enormously. Erica doesn’t trap people. She routes them.

About 50% to 60% of customer interactions with Erica are actually proactive. The chatbot identifies potential issues and suggests help before customers even ask. This is the opposite of defensive AI that exists to deflect inquiries.

Notably, Erica doesn’t use generative AI or large language models. Her responses aren’t based on vast datasets of external information. This means she can’t hallucinate policies that don’t exist. She can only reference information Bank of America explicitly programmed her to know. Less impressive-sounding than generative AI. Far more reliable for actually helping customers.

The 83% Problem

OPPO, the consumer electronics company, achieved an 83% chatbot resolution rate. That sounds excellent. But think about what it means: 17% of customers still needed humans.

If you’re handling millions of contacts, that 17% is a lot of people. Building for the 83% while ignoring the 17% is a recipe for disaster.

The pattern in failed implementations is consistent: companies optimize for deflection metrics (how many people did we prevent from reaching a human?) rather than resolution metrics (how many people actually got their problem solved?).
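To make that distinction concrete, here is a minimal sketch of how the two metrics diverge when computed from the same interaction log. The field names and sample records are hypothetical; the point is that a high deflection rate can coexist with a low resolution rate.

```python
# Hypothetical interaction log: each record notes whether the chatbot kept the
# customer away from a human agent and whether the problem was actually solved.
interactions = [
    {"id": 1, "reached_human": False, "problem_solved": True},
    {"id": 2, "reached_human": False, "problem_solved": False},  # deflected, unsolved
    {"id": 3, "reached_human": True,  "problem_solved": True},
    {"id": 4, "reached_human": False, "problem_solved": False},  # deflected, unsolved
]

total = len(interactions)

# Deflection rate: share of contacts that never reached a human.
deflection_rate = sum(not i["reached_human"] for i in interactions) / total

# Resolution rate: share of contacts whose problem was actually solved.
resolution_rate = sum(i["problem_solved"] for i in interactions) / total

print(f"Deflection: {deflection_rate:.0%}")  # 75% -- looks like success
print(f"Resolution: {resolution_rate:.0%}")  # 50% -- the metric that matters
```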

Research on contact center implementations emphasizes that automation must be complemented by human oversight. The hybrid model isn’t a fallback position. It’s the only position that works.

Why GenAI Made Things Worse Before Making Them Better

The DPD chatbot incident happened in January 2024. Ashley Beauchamp, a London-based pianist, asked the bot for help with a missing package. Frustrated, he asked it to write a poem criticizing the company. It complied. He asked it to swear. It complied again, responding: “F*ck yeah! I’ll do my best to be as helpful as possible, even if it means swearing.”

DPD blamed an “error” following an update and took the bot offline.

The error wasn’t a bug. It was a predictable consequence of deploying generative AI without guardrails. GenAI models are trained to be helpful and engaging. Those training objectives can be exploited. Without proper constraints, they’ll write poems criticizing your company, make up policies that don’t exist, or confidently explain refund procedures you don’t actually offer.
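What a basic guardrail might look like, as a rough sketch: check the model’s draft reply against topic and tone constraints before it ever reaches the customer, and fail over to a human when it strays. The patterns, topics, and function name below are hypothetical placeholders, not DPD’s or any vendor’s actual implementation.

```python
# Hypothetical guardrail wrapper around a generative model's draft reply.
import re

BANNED_PATTERNS = [r"\bworst\b", r"\buseless\b", r"\bf\W?ck\b"]  # off-brand language
SUPPORTED_TOPICS = {"tracking", "delivery_window", "redelivery"}

def guarded_reply(draft: str, detected_topic: str) -> str:
    """Return the draft only if it stays on-topic and on-brand; otherwise escalate."""
    off_brand = any(re.search(p, draft, re.IGNORECASE) for p in BANNED_PATTERNS)
    off_topic = detected_topic not in SUPPORTED_TOPICS

    if off_brand or off_topic:
        # Fail closed: hand the conversation to a person instead of improvising.
        return "Let me connect you with a member of our support team."
    return draft

# Example: a draft insulting the company never reaches the customer.
print(guarded_reply("DPD is the worst delivery firm in the world.", "tracking"))
```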

One Hacker News commenter, Greg, tested Klarna’s highly publicized AI bot and was “not impressed.” He observed that it felt like “the L1 support flow that every other company already has in-place.” Another commenter noted that when chatbots are deployed, “they don’t understand the problem, and when I point that out by explaining my issue another way they just answer ‘Have I solved your issue?’”

The frustration is real. And it’s not about the technology being incapable. It’s about deployment decisions that prioritize cost reduction over customer outcomes.

The Real Cost of Getting It Wrong

Consumer complaints related to AI customer service increased 56.3% year-over-year in China’s e-commerce sector during 2024. Customers reported that chatbots frequently provided irrelevant responses and that human agents were difficult to reach.

That difficulty is often intentional. Many implementations bury the “contact human” option, use endless loops of irrelevant questions, or simply don’t offer human escalation at all.

The loyalty impact is severe. Glance’s research found that nearly 90% of consumers report reduced loyalty when human support is removed entirely.

Companies pursuing aggressive automation sometimes cite statistics about customer preference for self-service. And it’s true: 44% of consumers always try self-service first. But there’s a difference between customers choosing self-service and customers being forced into it. The former builds loyalty. The latter destroys it.

What Actually Works

The successful implementations share common patterns. They’re boring compared to the AI hype cycle, but they work.

Narrow scope, executed well. Bank of America’s Erica handles specific banking tasks where AI is reliable. She checks balances, sends payments, finds past transactions, and gives spending insights. She doesn’t try to handle complaints, disputes, or anything requiring judgment.

Clear escalation paths. OPPO’s 83% resolution rate matters because the other 17% get smoothly transferred to humans. No dead ends. No loops.

Years of refinement. Erica launched in 2018. The current performance came from seven years of learning what works and what doesn’t. Companies expecting excellent results from chatbots deployed last quarter are fooling themselves.

Human oversight. Qualtrics recommends that “AI should be used to build connections and enhance the human experience, with capable AI agents managing simple, transactional requests.” Not replacing humans. Augmenting them.
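To illustrate the “clear escalation paths” pattern above, here is a minimal, hypothetical routing sketch: low intent confidence, an explicit request for a person, or too many unresolved bot turns all hand the conversation to a human instead of looping. The thresholds and intent labels are assumptions for illustration only.

```python
# Hypothetical escalation router: the goal is that no conversation dead-ends.
from dataclasses import dataclass

CONFIDENCE_FLOOR = 0.75   # below this, the bot shouldn't pretend to know
MAX_BOT_TURNS = 3         # after this many unresolved turns, stop looping

@dataclass
class Turn:
    intent: str            # e.g. "check_balance", "complaint", "human_request"
    confidence: float      # model's confidence in the detected intent
    bot_turns_so_far: int

def route(turn: Turn) -> str:
    """Decide whether the bot answers or a human takes over."""
    if turn.intent == "human_request":
        return "handoff_to_human"   # never argue with "let me talk to a person"
    if turn.confidence < CONFIDENCE_FLOOR:
        return "handoff_to_human"   # uncertain answers cost more than agents do
    if turn.bot_turns_so_far >= MAX_BOT_TURNS:
        return "handoff_to_human"   # break the "Have I solved your issue?" loop
    return "bot_answers"

print(route(Turn("complaint", 0.62, 1)))  # -> handoff_to_human
```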

The Questions That Matter

The technology is clearly capable of helping customers. It’s also capable of frustrating them, lying to them, and driving them away. The difference comes down to implementation choices.

Before deploying AI customer service, companies should honestly answer these questions:

Are we trying to help customers or reduce contact volume? Those sound similar but lead to radically different implementations.

What happens when the AI fails? If the answer is “the customer gives up,” the implementation will damage your brand.

Are we measuring resolution or deflection? Many companies track how many customers the chatbot handled without human intervention, treating it as success. But if those customers didn’t get their problems solved, that’s failure dressed up as efficiency.

How will we know if this is working? Customer satisfaction scores, repeat contact rates, and loyalty metrics matter more than cost-per-contact.

The market projections say AI customer service will grow from $12 billion in 2024 to nearly $48 billion by 2030. That growth will happen. The question is whether the implementations justify the investment or whether we’ll see more DPD poems and Air Canada lawsuits along the way.

The technology works. The question is whether the people deploying it understand what “works” actually means.

