The other day, an engineer friend of mine asked me a question that has stayed with me ever since. He asked, quite seriously, whether AI could now do everything he could. Here was an experienced professional who has spent decades designing structural systems, suddenly unsettled and afraid that his work would soon become redundant.
We see every day how AI can speak with convincing fluency and answer complex questions with ease. Recently, we’ve been hearing more and more about AI that appears to act independently: placing orders, processing transactions and operating systems. This is known as agentic AI, and it feeds exactly the fear my friend expressed. That is precisely why it is worth taking a sober look at the matter. What can these agents actually do, where do they help, and where do they do more harm than good?
This guide provides a clear overview of the subject, without hype or doom-mongering. It is aimed at anyone involved in making or supporting decisions on AI, regardless of their sector. Those in the energy sector looking for the basics will find them in the introductory guide AI in the Energy Sector.
What is Agentic AI, and how does it differ from ChatGPT?
Generative AI, such as ChatGPT, generates something – a text, an image, a summary. It responds and then stops. Agentic AI goes one step further. An agent pursues a goal, makes decisions along the way and takes action. It doesn’t just read an email; it replies to it, creates a task in an ERP system and triggers the next action.
The difference is enormous. When generative AI writes a misleading sentence, the result is just that: a misleading sentence. A human reads it and decides how to proceed. An agent, by contrast, acts on its decisions, and does so without hesitation. The email is answered immediately; the invoice is approved. The pitfall? A mistake no longer stays on the page; it directly causes real-world damage. That is precisely what makes autonomous AI so appealing and, at the same time, so tricky.
Why does this technology seem so powerful?
My friend’s question is entirely understandable. At first glance, AI seems like magic. A language model exhibits behaviour that we previously associated only with thinking humans: fluent, meaningful speech. Our brains automatically conclude that there must be a mind behind it, because that’s how it’s always been. With AI, for the first time, this conclusion is incorrect. It generates language without understanding.
This gap gives rise to unrealistic expectations, and these expectations lead to projects that are bound to fail. In AI Without the Hype: Why AI seems like magic and how understanding demystifies it, I explain where this impression comes from and how understanding breaks the spell.
Dice or clockwork: why AI doesn’t always do the same thing
A traditional IT system is like clockwork. The same input, the same output, every time. This is known as determinism, and most of our business processes and the software we use rely on this reliability. AI based on large language models works differently. It ‘rolls the dice’. It can give different answers to the same question because it doesn’t calculate what is correct; it estimates which words are likely to come next and chooses among them with an element of chance.
This isn’t a flaw; it’s simply the nature of this technology, and for many applications it’s actually an advantage. But it does mean that AI cannot be used everywhere, especially not where we have traditionally relied on clockwork. Where a task has a single correct answer, the dice are the wrong tool. The article Dice or Clockwork: Why AI Never Says the Same Thing Twice explores what determinism and stochasticity mean in practice.
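To make the contrast between clockwork and dice tangible, here is a deliberately simplified Python sketch. The word probabilities are invented for illustration, and real language models work with vastly larger vocabularies and further settings; the only point is that always choosing the most likely option returns the same answer every time, while drawing from the probabilities can return a different answer to the identical question. The `temperature` parameter is included as an assumption about how such sampling is commonly tuned.

```python
import random

# Toy probabilities for the next word, invented for illustration only.
NEXT_WORD = {"likely": 0.6, "possible": 0.3, "certain": 0.1}

def deterministic_answer(prompt: str) -> str:
    """Clockwork: always pick the most probable word (the prompt is ignored in this toy)."""
    return max(NEXT_WORD, key=NEXT_WORD.get)

def sampled_answer(prompt: str, rng: random.Random, temperature: float = 1.0) -> str:
    """Dice: draw a word in proportion to its probability, sharpened or flattened by temperature."""
    words = list(NEXT_WORD)
    weights = [NEXT_WORD[w] ** (1 / temperature) for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

rng = random.Random()
prompt = "The outcome is"
print("clockwork:", {deterministic_answer(prompt) for _ in range(10)})
print("dice:     ", {sampled_answer(prompt, rng) for _ in range(10)})
```

Run it a few times: the clockwork line never changes, while the dice line usually shows more than one word.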
But don't people make mistakes too?
This is the most common objection raised when someone points out that AI is prone to errors. It sounds fair and contains a grain of truth. Nevertheless, it misses the point, because it addresses only the frequency of errors, not their nature.
Ten case workers make ten different, unrelated clerical errors. By the end of the working day, you have ten flawed cases, each of which can be found and corrected on its own. AI behaves differently. It can repeat the same systematic error in every process at once or, because it is not deterministic, ‘hallucinate’ in a different place each time it runs. There is no stable, predictable pattern of errors that one could work through. With the same average error rate, a tolerable individual slip turns into systemic damage.
Furthermore, a human usually realises when they are reaching their limits and will ask for clarification or escalate the matter. It is precisely this intuition that AI lacks. I discuss this topic in more depth in People make mistakes too: why this statement is misleading.
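Why the nature of errors matters more than their average can be shown with a small toy simulation. The numbers are assumptions chosen for illustration: 100 cases a day over 1,000 days and a 5 per cent error rate in both scenarios. The ‘systematic’ model is deliberately extreme: a single shared flaw that, on an unlucky day, hits every case at once.

```python
import random
from statistics import mean

CASES_PER_DAY = 100   # illustrative assumption
ERROR_RATE = 0.05     # same average error rate in both scenarios
DAYS = 1000

def independent_errors(rng: random.Random) -> int:
    """Each case fails on its own, like unrelated clerical slips by different people."""
    return sum(1 for _ in range(CASES_PER_DAY) if rng.random() < ERROR_RATE)

def systematic_errors(rng: random.Random) -> int:
    """One shared flaw: on an unlucky day it affects every case simultaneously."""
    return CASES_PER_DAY if rng.random() < ERROR_RATE else 0

def simulate(model, seed: int = 42) -> list[int]:
    """Return the number of flawed cases for each simulated day."""
    rng = random.Random(seed)
    return [model(rng) for _ in range(DAYS)]

human = simulate(independent_errors)
ai = simulate(systematic_errors)

for name, days in (("independent", human), ("systematic", ai)):
    print(f"{name:12} mean errors/day: {mean(days):5.2f}   worst day: {max(days)}")
```

Both scenarios average roughly five errors a day, but the worst day looks completely different: a handful of scattered mistakes in one case, every single case in the other.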
Why do we find it so difficult to put the error rate into perspective?
A 95 per cent accuracy rate sounds impressive. Put into absolute terms, however, it means that if you process a thousand transactions a day, fifty of them go wrong, every day. The same statement, yet two completely different perceptions. Instead of feeling reassured, you suddenly find yourself facing fifty angry customers.
It is well established that people struggle to interpret figures of this kind, and this applies even to highly educated experts in their own fields. Anyone wishing to evaluate AI offerings should be aware of this blind spot, both in others and in themselves.
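Translating a relative figure into an absolute one is simple arithmetic, which is exactly why it is worth doing every time. A minimal Python sketch, assuming the daily volume of 1,000 transactions from the example; the higher accuracy levels are added purely for comparison.

```python
def expected_errors(accuracy: float, volume: int) -> float:
    """Expected number of errors for a given accuracy rate and transaction volume."""
    return volume * (1 - accuracy)

DAILY_VOLUME = 1000  # transactions per day, as in the example

for accuracy in (0.95, 0.99, 0.999):
    per_day = expected_errors(accuracy, DAILY_VOLUME)
    print(f"{accuracy:.1%} accurate -> about {per_day:.0f} errors per {DAILY_VOLUME:,} transactions")
```

The same calculation, multiplied by the number of working days, shows what an ‘impressive’ percentage means over a whole year.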
Can AI agents compensate for missing features?
One tempting idea is that an AI agent could bridge the gap where guided processes are incomplete or interfaces are missing by interacting with the user interface just as a human would. No need for complex interface development; the AI understands what needs to be done and clicks its way through. Independent research paints a more nuanced picture. Individual operational steps are handled reasonably well, but multi-step processes that span several systems remain the sticking point, and this is exactly where vendors are concentrating their efforts.
Success rates are rising rapidly. But just because an agent can complete the task under laboratory conditions does not mean it can be relied upon in a production environment without safeguards. For a business-critical process, ‘almost right’ is simply wrong.
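Why multi-step processes are the sticking point becomes clear once per-step reliability is compounded. The sketch below makes a simplifying assumption: every step succeeds independently with the same probability, so the whole process only succeeds if all steps do. Real failures are rarely that tidy, but the direction of the effect holds.

```python
def workflow_success(step_success: float, steps: int) -> float:
    """Chance that every step succeeds, assuming steps fail independently of one another."""
    return step_success ** steps

for step_success in (0.95, 0.99):
    for steps in (1, 10, 20):
        overall = workflow_success(step_success, steps)
        print(f"{step_success:.0%} per step, {steps:2d} steps -> {overall:.0%} end to end")
```

A step that is ‘almost always right’ becomes a process that fails a large share of the time once it is chained ten or twenty times.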
Won’t all this sort itself out in the end?
The strongest objection to any critical assessment is that the technology is constantly improving. This is only half true. The models are becoming more capable, but the reliability gap is not simply a sign of immaturity. It is structurally linked to the very characteristics that make the technology strong, and some of these limitations cannot be overcome in principle.
This is an important insight for any investment decision. It is risky to bet that the next generation of models will simply overcome today’s limitations.
How can I tell if a provider is promising more than they can deliver?
Nowadays, many products are marketed as ‘agent-based’ when, at their core, they are nothing of the sort and often don’t need an agent in the first place. This phenomenon has been given a name: ‘agent washing’. The good news is that you can spot it within minutes in a meeting with a supplier if you ask the right questions.
When asked whether the agent can handle a task, most providers reply with a confident ‘yes’. And often that’s actually true.
The better question is: what happens when it gets something wrong, and how does it realise that it needs help? A reputable provider will openly discuss limitations, provide clear figures and name the use cases for which their tool is unsuitable.
Why people must be taken into account in this equation
AI is only half the picture. The other half is the person working with it, and that person is more prone to blind trust than they realise. Research from recent decades shows that people often trust automated systems more than their own senses, other people, or the facts right in front of them. This phenomenon is known as automation bias and is well documented. In practice, anyone introducing AI into a company without taking this effect into account is introducing a risk that appears on no data sheet. The machine on its own may have a known error rate; how machine and human perform together is a completely different matter.
This is precisely the subject of the in-depth article Death by GPS and automation bias: why we blindly trust AI, which brings together three studies spanning three decades and derives concrete recommendations for everyday business practice.
How do I decide whether a process is suitable for AI?
Ultimately, it all boils down to a single key question that should be asked before any meeting with a supplier. Not what the AI can do, but what your own process requires. Three criteria can help you assess this.
Does the task allow for some degree of variation, or does it require a single, correct answer? How much damage would an error cause? And does a person check the result before it takes effect? Where a process tolerates variation, the potential damage is small and a check is carried out, AI is a useful tool. Otherwise, a deterministic system is the better choice.
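The three criteria can be written down as a simple checklist. This Python sketch is only an illustration: the field names and the two example processes are hypothetical, and in reality each criterion calls for judgement rather than a yes/no answer. What it captures is the rule above: all three conditions must hold before AI is the right tool.

```python
from dataclasses import dataclass

@dataclass
class Process:
    name: str
    tolerates_variation: bool    # several acceptable results rather than one correct answer
    low_damage_if_wrong: bool    # an error would cause only minor harm
    checked_before_effect: bool  # a person reviews the result before it takes effect

def assess(process: Process) -> tuple[str, list[str]]:
    """Apply the three criteria; AI is recommended only if none of them fails."""
    failed = [
        reason
        for ok, reason in (
            (process.tolerates_variation, "needs a single correct answer"),
            (process.low_damage_if_wrong, "errors would be costly"),
            (process.checked_before_effect, "no human check before it takes effect"),
        )
        if not ok
    ]
    verdict = "AI is a useful tool" if not failed else "prefer a deterministic system"
    return verdict, failed

# Two hypothetical examples
draft_replies = Process("Drafting customer replies for review", True, True, True)
invoice_approval = Process("Automatic invoice approval", False, False, False)

for p in (draft_replies, invoice_approval):
    verdict, failed = assess(p)
    print(f"{p.name}: {verdict}" + (f" ({'; '.join(failed)})" if failed else ""))
```

Listing the failed criteria, rather than returning a bare verdict, makes it easier to see what would have to change, for example adding a human review step.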
A personal experiment currently underway
Writing about autonomous AI is one thing. Building it yourself, running it and seeing for yourself, through your own data, where it works and where it doesn’t, is quite another. That is exactly what I am doing right now. I am building a small research agent that regularly scans the web for new publications on my topics and sends me a short digest.
The lesson behind this entire series is evident right from the design stage. What works reliably and safely is not a free-acting agent but, deliberately, a tightly controlled process with a clear beginning and end. It searches and makes suggestions; I make the decisions. Once the agent has been running for a while, I will write up what I have learned and publish the report as a separate post in this series.
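What ‘a tightly controlled process with a clear beginning and end’ can look like is sketched below in Python. This is not the code of my agent; every function name is hypothetical, and the search step returns invented placeholder entries instead of querying the web. The point is the shape: a fixed sequence of search, filter and format, with no freedom to choose further actions. The output is a suggestion for a human to read, never an action.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Publication:
    title: str
    url: str
    topic: str

def search(topics: set[str]) -> list[Publication]:
    """Hypothetical search step; returns invented placeholder entries, not real publications."""
    sample = [
        Publication("Placeholder entry A", "https://example.org/a", "automation bias"),
        Publication("Placeholder entry B", "https://example.org/b", "agentic ai"),
        Publication("Placeholder entry C", "https://example.org/c", "gardening"),
    ]
    return [p for p in sample if p.topic in topics]

def select_new(candidates: list[Publication], already_seen: set[str], limit: int = 5) -> list[Publication]:
    """Drop anything already reported and cap the length of the digest."""
    return [p for p in candidates if p.url not in already_seen][:limit]

def run_once(topics: set[str], already_seen: set[str]) -> str:
    """One fixed pass from start to finish: search, filter, format. Nothing is acted upon."""
    digest = select_new(search(topics), already_seen)
    if not digest:
        return "No new publications."
    return "\n".join(f"- {p.title} ({p.url})" for p in digest)

print(run_once({"agentic ai", "automation bias"}, already_seen={"https://example.org/a"}))
```

Because each run follows the same three steps and ends with a text for a person to read, a faulty search result costs at most a wasted minute of reading, not a wrong action in another system.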
Where do you start?
We can now answer the engineer’s question from the beginning. No, AI cannot do everything he can. It can do some things faster and others not at all. And the crucial thing, namely taking responsibility for whether a structure will hold, is entirely beyond it. What it can do is serve as a tool, provided you know what it is for.
That is precisely the crux of the matter. The real question is not what the technology can do, but what we decide to use it for. What problem do we want to solve, and is autonomous AI the right way to solve it? Whoever asks this question first turns the balance of power on its head. It is not the provider who defines what is possible, but the problem that defines what is necessary.