The bot that helps no one: Why chatbots and voicebots fail

You type your question into the small chat window on a website. Perhaps on your bank’s site, perhaps on a mail-order retailer’s site, perhaps on your insurance company’s site. You phrase the question clearly and politely, because you’ve got used to ChatGPT. Shortly afterwards, the reply arrives. It has nothing to do with your question. You try again, keeping it simpler and using different words. The next reply is a boilerplate text from the FAQs. You give up and dial the customer service number. This experience has been documented in studies. A 2024 UK survey by Cavell Group shows that around half of UK consumers now prefer human interaction as the quickest way to resolve customer service issues, while 35 per cent say that automated systems and chatbots fail to deliver satisfactory service. And 45 per cent have tolerated a product problem rather than deal with customer service.

Since ChatGPT, customers have come to expect a contact person who understands their question, keeps the context in mind and, when in doubt, asks for clarification rather than making assumptions. What they get is a FAQ machine in new packaging. Disappointment is inevitable.

What’s going wrong structurally

Chaotic knowledge base. The customer asks: ‘When can I cancel my contract?’ The bot’s library contains three answers: one from the current terms and conditions, one from an old FAQ from 2021, and one from a circular on price adjustments from 2023. All three are true. But only two of them apply to this particular customer. The bot doesn’t know this and picks one at random. If it’s the wrong one, either the customer or the company is out of luck.
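One way to avoid the random pick is to give every knowledge entry a validity period and a scope, and to answer only when exactly one entry fits the customer. The following is a minimal sketch of that idea. The `KnowledgeEntry` structure, the contract types and the sample answers are all invented for illustration and are not taken from any real knowledge base.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, List, Optional


@dataclass
class KnowledgeEntry:
    source: str
    answer: str
    valid_from: date
    valid_to: Optional[date]          # None means "still in force"
    contract_types: FrozenSet[str]    # contract types the entry applies to


def applicable_entries(entries: List[KnowledgeEntry], contract_type: str,
                       on_date: date) -> List[KnowledgeEntry]:
    """Keep only entries that are in force and cover this contract type."""
    return [
        e for e in entries
        if contract_type in e.contract_types
        and e.valid_from <= on_date
        and (e.valid_to is None or on_date <= e.valid_to)
    ]


def answer_or_escalate(entries: List[KnowledgeEntry], contract_type: str,
                       on_date: date) -> Optional[str]:
    """Answer only when exactly one entry applies; otherwise return None."""
    matches = applicable_entries(entries, contract_type, on_date)
    if len(matches) == 1:
        return matches[0].answer
    return None  # zero or several candidates: hand over instead of guessing


# Hypothetical sample data, loosely modelled on the three sources above.
ENTRIES = [
    KnowledgeEntry("Terms and conditions", "Cancel with one month's notice.",
                   date(2024, 1, 1), None, frozenset({"standard"})),
    KnowledgeEntry("FAQ 2021", "Cancel at the end of the 24-month term.",
                   date(2021, 1, 1), date(2023, 12, 31), frozenset({"standard"})),
    KnowledgeEntry("Price adjustment circular 2023",
                   "Special right to cancel within six weeks of a price increase.",
                   date(2023, 6, 1), None, frozenset({"legacy"})),
]
```

The important design choice is the `None` branch: if the metadata cannot narrow the candidates down to one, the bot hands the case over rather than picking an answer.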

Language models without a safety net. By its very nature, a language model does not operate deterministically: depending on settings such as the sampling temperature, it can give different answers to the same question. Nor does it have a built-in option to decline to answer. As a result, generative AI ‘hallucinates’ and confidently states wrong information. By design, the model lacks a reliable mechanism for checking itself.
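A safety net has to be added around the model. One simple pattern is to ask the same question several times and to answer only if the replies agree. The sketch below assumes a hypothetical `ask` callable that wraps whatever model is in use; the sample count and the agreement threshold are arbitrary illustrative values.

```python
from collections import Counter
from typing import Callable, Optional


def consistent_answer(ask: Callable[[str], str], question: str,
                      samples: int = 5,
                      min_agreement: float = 0.8) -> Optional[str]:
    """Return the majority answer, or None if the samples disagree too much.

    `ask` is a placeholder for a call to the language model (an assumption).
    """
    answers = [ask(question).strip().lower() for _ in range(samples)]
    top_answer, count = Counter(answers).most_common(1)[0]
    if count / samples >= min_agreement:
        return top_answer
    return None  # no stable answer: abstain and hand over to a human
```

Agreement is not the same as correctness, since a model can be consistently wrong. The check only catches the case where the model contradicts itself, and comparing free-text replies by exact match is a deliberate simplification.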

Caution-first architecture. Out of fear of liability, the bot is configured so restrictively that, for half of all queries, it says: ‘Please contact our customer service.’ The tool is there, but the point is lost. The customer ends up on the helpline anyway. But by then, they’ve wasted time and are annoyed.

Voicebots and language

Voicebots have all the problems of chatbots. Plus a whole host of their own.

Speech-to-text is prone to errors: background noise, fast speech and hesitant phrasing all take their toll. A customer can spot and correct a typo in a chat. A slip of the tongue on the phone cannot be taken back so easily. Furthermore, the customer on the phone has no visual aid. They have to keep everything in mind and express themselves extremely clearly and without error. I myself have often failed to set up a reminder or a timer via Siri. A statement such as ‘Remind me of an email in 5, no, 10 minutes’ is enough to completely confuse the ‘assistant’. A simple human self-correction, which every child understands, overwhelms a system that has cost billions.
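To illustrate what handling such a self-correction could look like, here is a deliberately narrow sketch that resolves the pattern ‘number, correction word, number’ in a transcript before it is interpreted. It assumes the transcript contains digits rather than spelled-out numbers, and the list of correction words is an assumption, not an exhaustive grammar of spoken repairs.

```python
import re

# A number, a spoken correction word, then the corrected number,
# e.g. "5, no, 10". The separators may be commas or plain whitespace.
SELF_REPAIR = re.compile(
    r"\b(\d+)(?:\s*,\s*|\s+)(?:no|sorry|I mean)(?:\s*,\s*|\s+)(\d+)\b",
    re.IGNORECASE,
)


def resolve_self_repair(utterance: str) -> str:
    """Keep only the corrected value when the speaker repairs a number."""
    return SELF_REPAIR.sub(r"\2", utterance)


print(resolve_self_repair("Remind me of an email in 5, no, 10 minutes"))
# prints: Remind me of an email in 10 minutes
```

A rule like this covers one narrow case. Real speech contains many more forms of repair, which is part of why the problem is hard.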

In a 2010 sketch from the Scottish comedy series *Burnistoun*, two men try to operate a voice-activated lift. They say ‘Eleven’. The lift doesn’t understand them and politely asks them to repeat themselves. They try speaking louder, more clearly, with a different accent. The lift still doesn’t understand a word. The sketch is 15 years old. The problem is the same today.

Current speech recognition models achieve around 95 per cent word recognition for Standard British English. For Scottish English, the figure drops to around 75 to 80 per cent, and for broad Glaswegian it falls further still. There are no reliable benchmarks for many strong regional varieties, such as Geordie, Yorkshire, Northern Irish and rural Welsh English, as these dialects are scarcely represented in the training data.

With 90 per cent word recognition and a sentence containing ten words, the probability that all ten words will be understood correctly is around 35 per cent (0.9 to the power of 10), assuming the errors are independent of one another. This is useless for authentication purposes. The customer service representative then has an annoyed customer on the phone who has to provide all the details again.
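The arithmetic behind this figure is simple to reproduce. The sketch below assumes that each word is recognised independently of the others, which is a simplification, and applies the same calculation to the word recognition rates mentioned in this article.

```python
def sentence_accuracy(word_accuracy: float, words: int) -> float:
    """Probability that every word in the sentence is recognised correctly,
    assuming independent errors per word."""
    return word_accuracy ** words


for rate in (0.95, 0.90, 0.80):
    whole = sentence_accuracy(rate, 10)
    print(f"{rate:.0%} per word -> {whole:.0%} for a 10-word sentence")

# prints:
# 95% per word -> 60% for a 10-word sentence
# 90% per word -> 35% for a 10-word sentence
# 80% per word -> 11% for a 10-word sentence
```

Even at 95 per cent per word, roughly four in ten ten-word sentences would contain at least one error under this assumption.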

Even if the bot transcribes the audio correctly, it may semantically understand the opposite.

In Scottish English, ‘aye’ is an affirmative response. To a model trained on Standard English, it can sound like the personal pronoun ‘I’, leaving the sentence structure incomplete. ‘Nae’ means ‘not’ or ‘no’, but the model may parse it as an unknown word or a name. In Yorkshire, ‘nowt’ simply means ‘nothing’, yet the word rarely features in training data. In Geordie, ‘howay’ can mean ‘come on’, ‘let’s go’ or ‘you’re joking’, depending on tone. A Standard English model has no reliable way to distinguish these readings, and the nuance is lost, along with the customer’s actual message.
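One partial mitigation is a lexicon step between transcription and interpretation that maps known dialect words to their standard meaning and flags words whose meaning depends on tone. The sketch below is an assumption about how such a step could look. The lexicon contains only the examples from this article, and mapping ‘nae’ to ‘no’ alone is a simplification. It does nothing about words that were misheard in the first place.

```python
import re
from typing import Tuple

# Hypothetical mini-lexicon built from the examples in the text.
DIALECT_LEXICON = {
    "aye": "yes",
    "nae": "no",
    "nowt": "nothing",
}

# Words whose meaning depends on tone: ask the caller, do not guess.
AMBIGUOUS = {"howay"}


def normalise(transcript: str) -> Tuple[str, bool]:
    """Return the normalised text and whether a clarifying question is needed."""
    tokens = re.findall(r"[\w']+", transcript.lower())
    normalised = [DIALECT_LEXICON.get(token, token) for token in tokens]
    needs_clarification = any(token in AMBIGUOUS for token in tokens)
    return " ".join(normalised), needs_clarification
```

For example, `normalise("Aye, that's right")` yields `("yes that's right", False)`, while any transcript containing ‘howay’ comes back with the clarification flag set.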

[Illustration: A woman on the phone gives a thumbs-up and says ‘Aye!’ to signal agreement. The voicebot on the other end hears the personal pronoun ‘I’, wonders ‘I? What about you? ...’ and gets stuck.]

A voicebot operated by a utility in Glasgow, Newcastle, Belfast or Cardiff has a structural problem that cannot be resolved by marketing promises. Its recognition rate among the company’s own local customers is measurably worse than among speakers of Standard English.

If the process is unclear, the bot becomes less intelligent

In chat and voice bot projects, I have often seen discussions about customer authentication drag on for months before the bot could even utter its first word. The idea is simple: ‘The bot asks for a piece of information and verifies it.’ In practice, this approach fails because of the data models.

Which address should the bot check if a contract lists three? Which identifier is considered primary if the customer has two? And in which system is it even reliably stored?

The cause often lies within the organisation itself. The business department is often unable to state clearly which master data is considered accurate and how to deal with ambiguities. If something isn’t defined in business terms, the bot certainly cannot interpret it.
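In code, the missing business decision shows up as a parameter that nobody can fill in. The sketch below illustrates this for the address question. The `Address` structure, the role names and the `primary_role` parameter are hypothetical and stand in for whatever the business department eventually defines.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Address:
    role: str        # e.g. "billing", "delivery", "supply" (assumed roles)
    postcode: str


def address_to_verify(addresses: List[Address],
                      primary_role: Optional[str]) -> Optional[Address]:
    """Pick the address the bot should check, or None if that is undefined.

    `primary_role` represents the business rule. While it is None, the
    bot has nothing it can verify against.
    """
    if primary_role is None:
        return None
    matches = [a for a in addresses if a.role == primary_role]
    # Exactly one match is required; duplicates are a data problem.
    return matches[0] if len(matches) == 1 else None
```

Every `None` returned here is a case that ends up with a member of staff, which makes the cost of an undefined rule directly visible.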

The discussion reveals a process problem. The bot is merely the catalyst that brings it to light. What was previously buried under workarounds comes to the surface as a result of the bot’s requirements.

Before a bot is introduced

The Cavell figure is worth a second look. Around half of UK customers prefer humans. When a bot stands in their way, everyone loses. Customers become frustrated because they cannot get the information they need. Staff have to deal with the frustration caused by the bot, often whilst working with reduced numbers. And the company itself has paid for a system that fails to deliver any of the promised benefits.

Anyone planning to introduce a bot can therefore start by asking the following questions:

The answers will then reveal whether a chat or voice bot is a suitable solution, or whether an improved FAQ page and IVR menu can already handle 80 per cent of cases.

Frequently Asked Questions

Why do so many chatbots fail in customer service?

Three structural reasons: the knowledge base is too large and poorly maintained; the language model lacks a self-checking mechanism; and, out of fear of liability, the bot is configured so restrictively that it rejects half of all queries. All three problems arise before the bot is rolled out, not as a result of it.

What is the difference between a chatbot and a voicebot?

A chatbot processes written text, whilst a voicebot processes spoken language. Voicebots have all the problems of chatbots, plus their own additional issues: error-prone speech recognition, no visual support and particularly difficult authentication.

Can a voicebot recognise Scottish, Geordie or other regional dialects?

Only to a limited extent. The recognition rate is around 95 per cent for Standard British English, around 75 to 80 per cent for Scottish English, and considerably lower for broad Glaswegian. There are no reliable benchmarks for Geordie, Yorkshire, Northern Irish or rural Welsh English, as these varieties are rarely found in the models’ training data.

When is a chatbot worthwhile?

When the knowledge base is structured and well-maintained, when the underlying processes are clearly defined, and when it is clear how the bot handles questions it cannot answer. Without this preparation, the bot will exacerbate the problems rather than solve them.

What is the alternative to a chatbot?

A well-maintained FAQ page and a clearly structured IVR path can, in many cases, handle 80 per cent of customer enquiries. For the remaining 20 per cent, a well-trained member of staff is often a better choice than a bot that produces standard responses.

How do I check whether my process is suitable for a bot?

A structured suitability assessment will provide the right answer. The HOIKEI methodology evaluates the maturity of the process and its suitability for automation based on clear criteria.

