Getting real about agentic AI projects - what's working, and what's not?

Jon Reed - March 24, 2026
Summary:
The AI project lessons are boiling over - but so is the agentic AI keynote hyperbole. How will customers find their way? Grab your beverage of choice, and have a look at my spicy review of what's working on agentic AI projects, and what's not. I guarantee some surprises.


In my tarmac adventures, I always find myself knee-deep in enterprise AI. Too often, I see a wide gap between vendors' aspirational AI statements and customer realities. (We see that gap in our diginomica network findings as well: Why 93% of enterprises use AI but most aren't seeing the ROI they were promised).

But then there are those priceless conversations, where AI practitioners talk candidly about what's working, and what's not. 

In well-designed settings, AI can deliver new things: new forms of automation, new business models, new ways of interacting with customers, new ways of decisioning, drawing on hybrids of structured and unstructured information. 

Three things enterprise AI is getting right

Enterprise AI is shifting away from the Big AI frontier model mentality. Despite the relentless agentic AI keynote carnival, there are three things enterprise AI is getting right.

Context at the time of inference - getting AI systems the best information at any moment in time, for that particular company and user, in a well-governed way. (This is a work in progress, of course; we don't even agree on what that real-time data layer should look like, and so-called context graphs are setting off buzzword alerts right and left). 

Constraining LLMs in a "compound systems" architecture - combining them with other forms of machine learning, deterministic systems, and external tool calls to verifiers, rules-based automation, or sources of database truth (a minimal sketch of this pattern follows below).

A promising distinction between off-the-shelf frontier models and domain-specific/smaller models, informed by relevant data.

Smaller fit-for-purpose models change the AI cost discussion. Why? The cost of inference with large-scale frontier models - and all their "reasoning" bells and whistles - isn't coming down anytime soon. 
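To make that "compound systems" point concrete, here's a minimal sketch - every name in it is illustrative, not a real API. The LLM drafts; deterministic components fetch governed context at inference time, verify the draft against a source of truth, and fall back to rules-based automation when verification fails:

```python
# Hypothetical sketch only - every name here is illustrative, not a real API.
SOURCE_OF_TRUTH = {"invoice-001": 1200.00, "invoice-002": 450.50}  # toy system of record

def fetch_context(user_id: str) -> dict:
    # Stand-in for a governed, per-user retrieval layer: context at inference time
    return {"user": user_id, "records": SOURCE_OF_TRUTH}

def llm_draft(question: str, context: dict) -> dict:
    # Stand-in for a model call; the draft carries the records it cites
    return {"answer": "invoice-001 totals 1200.00", "cites": ["invoice-001"], "amount": 1200.00}

def verify(draft: dict) -> bool:
    # Deterministic verifier: cited records must exist, and figures must match the database
    return all(
        cite in SOURCE_OF_TRUTH and SOURCE_OF_TRUTH[cite] == draft["amount"]
        for cite in draft["cites"]
    )

def answer(question: str, user_id: str) -> str:
    draft = llm_draft(question, fetch_context(user_id))
    if verify(draft):
        return draft["answer"]
    return "escalated to rules-based workflow"  # deterministic fallback gets the last word

print(answer("What does invoice-001 total?", "jdoe"))
```

The toy code isn't the point - the architecture is: the LLM never gets the last word. The verifier and the fallback do.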

As Constellation's Esteban Kolsky wrote, we're on the verge of a notable shift. Instead of just pounding AI as a blunt instrument in our 'office productivity' suites, savvy companies are operationalizing it:

Intelligence is becoming infrastructure. For decades, enterprise software executed predefined rules and workflows. AI introduces reasoning capability into the system itself. Instead of simply processing transactions, applications can interpret context, generate outputs, and assist decision-making... When a capability becomes pervasive across systems, it becomes infrastructure.

AI market noise obscures clear thinking

So if we're getting three foundational things right, why is so much going wrong? 

Too often, executives join this conversation with unrealistic/untested ideas about building enterprise AI at scale. What Claude Code can do on your couch does not set proper expectations for enterprise-grade workflows. Caution ahead: "AI layoffs" that spin business mistakes into visionary stances may cause stock market bumps, but they do not validate AI's current abilities, which must be put to the industry test.

Headcount reductions are not reducible to one variable, but most of them point to freeing up capital for AI investments, rather than AI capabilities themselves. But there are exceptions, and we should learn from them. In a monster survey of 4,000 executives, PwC found that only 10 to 12 percent of AI projects are delivering cost or revenue benefits. But I was equally drawn to the 10 percent that are working. Understanding where success does lie is the more urgent task.

Too many of these surveys draw on results from off-the-shelf LLMs, or brute force productivity tools like Microsoft Copilot that are, surprise surprise, coming up short. The AI and productivity conversation is also complex (see this useful AI productivity stat roundup). 

What is an executive to make of so many conflicting headlines? AI is supposedly transforming industries, but AWS is knocking systems offline with vibe coding. We are in dire need of jugular context that weighs the upside and the downside. 

We are in dire need of candid/sober AI dialogue

Every now and then, I find the right conversation partner for a candid AI dialogue. For the last nine months, one of my best partners has been Andreas Welsch, an enterprise AI practitioner with multiple books out on this topic. I tend to be a bit incendiary in my takes; comparing my research with Welsch's field views gets us to a more balanced place. 

In his December podcast, we hashed out what's working (and what's not) in agentic AI (What Enterprise AI Actually Wins At). Some of my takes might ruffle feathers, but avoiding what's not working seems like a good idea, no? 

Agentic AI - what's not working? Start with "AI First"

1. "AI First" isn't working. If you have to impose your AI tools on your workforce, something is wrong with your tools. During my DisrupTV appearance on the paradox of AI leadership, I questioned the link between "AI First" and "outcome thinking." 

Imposing tools on your people is not outcome thinking, right? Because outcome thinking means however we get the outcome, whatever that outcome is: better serving our customers, building better products... When you talk about 'outcome thinking,' you're actually challenging yourself to change how you relate to your employees - and to not force AI down their throats.

Who, at this point in time, would not use a tool that helped them do their jobs better?  Almost no one. So if they don't like your AI tools, it means your AI tools suck. 

Okay - gasket blown. So what's the alternative? A culture of experimentation: 

As a leader, what I want to see is: are you creating a culture of experimentation and innovation, instead of imposing AI use mandates on your people?

Are you creating safe sandboxes for them to play with these tools, to play around with real data in a safe environment - not the kind of OpenClaw stuff that got exposed this week, for basically giving someone complete access to all of your authorizations, but a secure environment where you can actually experiment and come to your team and say, 'Hey, look what I built.'

Bonus: secure enterprise sandboxes also reduce that Shadow AI risk.  What else isn't working? 

2. Agentic for the sake of agentic. Who would advocate using a jackhammer for a project when an old school, acoustic hammer would work just fine? Agentic AI has a particular set of pros and cons - and it's far from the cheapest tool in your toolbox. I recently heard about a lead gen situation where agentic AI was mistakenly being used to send a survey to each customer after intake - and not doing it reliably. Meanwhile, the rules-based survey-send worked just fine. Or: rethink your entire survey process; invite users to interact with an agentic bot to talk through their experiences and submit them, then use AI to prioritize and parse the long-form survey results. AI shines in the process rethink.

Why are we hyping agent-to-agent science experiments when customers need wins now? 

3. Multi-agent protocols. Vendors talked way too magically about agent-to-agent protocols last fall. I blew another small gasket with Welsch on this one: 

 Almost anything 'multi' isn't working right now. Not at scale - so stop it. Yes, standards are important, but vendors talked too much about the A2A protocol this fall. Putting agents in the same room and hoping they will understand and talk with each other is not working. 

MCP, on the other hand, I'm much more bullish on, as long as MCP security issues are accounted for (MCP orchestrates access to multiple data sources/enterprise systems for AI agents). There are two notable exceptions to my multi-agent rant. Exception one: a specialized workflow sharing the same data context, e.g. a series of supplier management agents. As I said to Welsch: 

Within a specialized workflow, you could have an orchestration agent and then a handful of specialized agents that are very task-specific, and that type of orchestration does generally seem to be working pretty well. But again, this is a very focused set of parent and children agents, if you will. And why is it working? Because they share the same context on the same data platform, and that's why it works... Customers need wins right now, not science experiments. 

Another potential exception? Decision intelligence/support, via a harmonized data layer: 

In a decisioning context, it's a little easier, because you can take different data from different vendors and harmonize it onto one data platform, and then you can have your agents feeding off of that data... What's more difficult right now is executing transactions end-to-end from different vendors. I've talked with ISV partners who are doing this now. [Without an end-to-end data/process platform], it's much more like classic integration projects at the moment. That will change over time, and the standards are important. But I would say vendors over-emphasized standards this fall, versus how to get started.
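For the curious, here's a toy sketch of that first exception - a parent orchestrator dispatching narrow, task-specific agents that all read from the same data context. The agents, shared context, and routing are all invented for illustration; no vendor framework implied:

```python
# Toy illustration - agent functions, shared context, and routing are all invented.
SHARED_CONTEXT = {  # the single data platform every agent feeds off
    "supplier": "Acme Corp",
    "open_invoices": 3,
    "risk_score": 0.2,
}

def invoice_agent(ctx: dict) -> str:
    return f"{ctx['open_invoices']} open invoices for {ctx['supplier']}"

def risk_agent(ctx: dict) -> str:
    return "low risk" if ctx["risk_score"] < 0.5 else "flag for review"

AGENTS = {"invoices": invoice_agent, "risk": risk_agent}  # narrow, task-specific children

def orchestrate(tasks: list[str]) -> dict:
    # The parent stays deliberately dumb: route each task, collect results.
    return {task: AGENTS[task](SHARED_CONTEXT) for task in tasks}

print(orchestrate(["invoices", "risk"]))
# -> {'invoices': '3 open invoices for Acme Corp', 'risk': 'low risk'}
```

Note what's missing: agent-to-agent negotiation. The parent routes and collects; the shared data platform does the 'understanding.'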

What else isn't working? Layering agents onto bad processes, crappy data and custom code

I guess it's obvious by now, but you can't do this and expect a good result. AI is icing on your architectural cake. If your cake is half-baked, your AI is going to stink. (Yes, AI shows promise for automating metadata cleanup, and AI can do pretty well ingesting unstructured data at times, but for auditable use cases, data quality is non-negotiable).

Reckoning with AI workslop and downstream impacts

Oh, and AI workslop isn't working. AI deployments should govern the surge of machine-generated content - as well as the downstream impact of cleaning up flawed AI materials/code. Downstream impacts don't always negate upstream productivity gains, but they are certainly part of any ROI equation. As I said to Welsch:

We are accelerating imperfect work inside the organization that's machine-generated. So there's a lot of work that needs to be done there to figure out how we're not piling on noise onto our colleagues. A classic example my colleague Brian Sommer hammers is actually not from internal employees, it's being bombarded with [AI-generated] job applications, from ambitious job seekers who are applying to hundreds (or thousands) of places at the same time. They're justifying it by saying, 'Well, you're kind of doing the same to us. You're using the same tools to screen us out.' 

What's working on agentic AI projects? The list might surprise you

1. Granular autonomy - giving customers the level of AI autonomy they want, and the ability to dial it up and down at a per-process level, is the winner:

Vendors need to stop talking about the fully autonomous enterprise and fantasies about that, and give people control over how much autonomy they want, and you're going to get a result. 
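What could a per-process autonomy dial look like? A minimal sketch, with made-up process names and levels - the dial is the point, not the specifics:

```python
# Hypothetical autonomy dial - process names and levels are made up for illustration.
AUTONOMY = {
    "draft_email": "auto",       # agent may act on its own
    "issue_refund": "approve",   # agent proposes, a human approves
    "close_account": "suggest",  # agent only recommends
}

def run_step(process: str, action: str, human_approves=lambda action: False) -> str:
    level = AUTONOMY.get(process, "suggest")  # unknown process? least autonomy wins
    if level == "auto":
        return f"executed: {action}"
    if level == "approve" and human_approves(action):
        return f"executed after approval: {action}"
    return f"suggestion only: {action}"

print(run_step("draft_email", "send follow-up"))               # executed
print(run_step("issue_refund", "refund $40"))                  # suggestion only
print(run_step("issue_refund", "refund $40", lambda a: True))  # executed after approval
```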

2. Evaluation and observability. As I said to Welsch: 

The next thing that's working is evaluation and observability of agents. It's incredibly important to be able to evaluate agents and be able to, ideally, make real-time adjustments in their behavior, but definitely to audit them afterwards. Look at audit trails, see what went wrong. Vendors aren't doing a great job on this topic overall, but the technology is there to help here - including open source tools. And, there are some wonderful exceptions: evaluation vendors that help maximize agentic workflow design and accuracy.  

Welsch added traceability to the mix: 

I think that's absolutely critical. From what I've seen over the past couple of weeks, whether it's pharma or it's financial services, certainly industries with heavy regulation and the need for clear documentation - the same is true there as well. 'What did the agent do? Why did it do it? What data did it use in the process of making a decision or proposing a decision? Where did the data come from?' Things like that. Traceability is really, really important in that same context as well, to be able to show to your regulators, to your compliance, to your risk departments, and say, 'Hey, this is what the agent did. And here's the whole trace of why it happened, and what exactly has happened.'
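As a rough illustration of that traceability point: log every agent step with the data it used and the rationale behind it, so risk and compliance can replay the whole trace afterwards. The record fields here are assumptions for the example, not a standard schema:

```python
import json
import time

# Illustrative trace record - the fields are assumptions, not a standard schema.
TRACE: list[dict] = []

def log_step(agent: str, action: str, inputs: dict, rationale: str) -> None:
    TRACE.append({
        "ts": time.time(),
        "agent": agent,
        "action": action,
        "inputs": inputs,        # what data did it use, and where did it come from?
        "rationale": rationale,  # why did it do it?
    })

log_step(
    "credit-check",
    "propose_decision",
    inputs={"source": "bureau_feed", "score": 712},
    rationale="score above the 700 threshold",
)

print(json.dumps(TRACE, indent=2))  # the trace you hand to risk and compliance
```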

3. Explainability - this one may surprise. I hold the view that "reasoning" doesn't take us much further towards LLM explainability; LLMs are still largely black boxes. But: explainability - via the so-called "context window" - helps to mitigate that:

The next thing that's working is explainability via tools, RAG/knowledge graphs. In a lot of enterprise contexts, AI explainability gets a little bit better, because when you see the demos, you see the source documents that are being pulled for various workflows, and questions you might have of your agent, or your assistant, or whatever you want to call it... That's working right now, because it's taking a bit of the edge off the mysterious 'Where did you get this information from?' issue.
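A toy sketch of that pattern - the documents are invented and the retrieval is naive keyword matching, purely for brevity - but the shape is what matters: the answer always ships with the sources that were pulled into the context window:

```python
# Toy sketch - documents are invented, and retrieval is naive keyword overlap.
DOCS = {
    "policy-14": "Refunds are honored within 30 days of purchase",
    "faq-02": "Shipping takes five to seven business days",
}

def retrieve(question: str) -> dict:
    words = set(question.lower().split())
    return {doc_id: text for doc_id, text in DOCS.items()
            if words & set(text.lower().split())}

def answer_with_sources(question: str) -> dict:
    sources = retrieve(question)
    # A real system would feed `sources` into the model's context window;
    # the key move is returning them alongside the generated answer.
    return {"answer": " ".join(sources.values()), "sources": list(sources)}

print(answer_with_sources("what is the refunds policy"))
# -> {'answer': 'Refunds are honored within 30 days of purchase', 'sources': ['policy-14']}
```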

4. Use case design - more from the category of the bloody obvious, but it bears a mention: you can have two identical customer service agentic AI deployments - even using the same technology - and one can succeed, while the other fails. It's all in the design: how easy/intuitive are the escalations to humans? How effective is the risk mitigation for outliers? Was the system designed to make the overall customer experience better, or is it really about headcount reductions, the heck with customer satisfaction scores? (When you deploy AI properly, CSAT should go up, not down!)

5. AI readiness is working - this is a monster topic, too big for one article, but it belongs on this list. AI readiness means acting on the organizational implications of proper use of AI. How? By breaking down data silos - and building cross-departmental teams tasked with rethinking processes, and establishing governance frameworks. Obviously, data and process excellence is a core aspect of AI readiness. Do it right, and you should stack up wins along the way.

I rounded up some of these "AI readiness" projects in my 2025 retrospective, The Year in Review - AI in the real world. Two strong examples: 

Mark Samuels' roundup is compelling because it shows even organizations on tight budgets can stack up data/analytics wins, while improving data quality. And Derek du Preez's process intelligence use cases share eye-watering numbers companies have achieved, prior to even considering agentic scenarios.

The other crucial aspect of AI readiness? Data readiness for the needs of agentic AI systems. This is a different set of requirements than a data lakehouse, or any other data repository that has come before. The good news? We know a lot more about the kinds of annotations and metadata agentic systems need, but: there is still plenty of debate on the best type of AI data layer, and how to roll it out in an economical way. 

And yes, AI can help with the data quality pursuit, but from what I've seen, the fantasy of removing humans from data cleaning initiatives is just that. Yes, you may be able to reduce the data analysts needed, but human domain experts - too often unwisely downgraded - can still spot problems in the data that no machine can, just as machines can surface anomalies across vast data sets humans would struggle to identify. (For more on humans thriving amidst machines, and vice versa, see my session recap: CCE 2025 - finding human purpose amidst the AI noise vortex.)

My take - generic AI productivity tools fall short, but better ways are emerging

I've already mentioned the final three things that are working: context, compound systems > standalone LLMs, and the push for domain-specific and/or right-sized language models.

I used to scathingly refer to context engineering as a bandaid to compensate for the inherent limitations of LLMs. Whether we loathe the buzzword or not, the art/science of context engineering has come a long way. No, it's not the path to intelligence as some vendors exuberantly imply, but it's absolutely a worthwhile discipline (Why context engineering - not prompting - is the key to building reliable AI agents). 

These approaches don't just get you better LLM output. You can apply them to building industry apps. For enterprises, obsessions with model power are a distraction. As per this talk on vertical AI by Christopher Lovejoy:

Our bet really is that when it comes to vertical AI applications, the system that you build for incorporating your domain insights is far more important than the sophistication of your models and your pipelines. So the limitation these days is not how powerful your model is, and whether it can reason to the level you need it to. It's more: can your model understand the context in that industry for that particular customer...

Of course, it's not necessarily easy for smaller companies with limited tech resources to manage a modern data context architecture. For now, customers may be better off relying on a third party to provide their AI apps/data platform. But as Esteban Kolsky and I discussed in our podcast on operationalizing AI, you might rely on a third party platform, but you never outsource your AI vision - or your creative applications with the technology.

So what should customers do in 2026? They can start by detoxing from AI keynote theater and tracking process results, not AI features. The right AI service provider matters. A recent episode of the Business of Tech nailed this down:

The winners will not be the providers with the most AI features turned on. They'll be the ones who can show in a QBR that a workflow changed, a metric moved, and an outcome improved. That is the proof standard the market is moving towards, and the time to build it is before clients demand it... Audit every AI deployment against a workflow change, not a feature checklist. If the process did not change, the AI did not deliver value. 

Pick three to five service metrics, e.g. ticket resolution, escalation rate or approval exceptions: 

Establish a baseline before rollout. If you cannot show movement against the baseline, treat the tool as cost, not value... Most clients do not need more features. They need process redesign, role clarity, training and measurement.
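That discipline is almost embarrassingly simple to operationalize. A minimal sketch with invented numbers - snapshot the metrics before rollout, then judge the tool by movement against that baseline:

```python
# Invented numbers, purely illustrative - the discipline is the point.
baseline = {"ticket_resolution_hrs": 18.0, "escalation_rate": 0.22, "approval_exceptions": 31}
after_90_days = {"ticket_resolution_hrs": 13.5, "escalation_rate": 0.17, "approval_exceptions": 30}

for metric, before in baseline.items():
    after = after_90_days[metric]
    delta_pct = (after - before) / before * 100
    verdict = "moved" if abs(delta_pct) >= 10 else "treat as cost, not value"
    print(f"{metric}: {before} -> {after} ({delta_pct:+.0f}%) - {verdict}")
```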

In 2024, I contrasted AI project success and failure: Attention enterprises - your AI project success in 2024 is not a given. Though the technology has advanced, the eight project success principles haven't changed much. Number eight? 

 AI projects are not auto-magical; they should be subject to the same project disciplines as any other technology.

I argued for three key project components: 

Systems of accountability,  maturity models to track your evolution, and customer-driven KPIs: "How is this project improving your service to customers, suppliers, and other stakeholder groups (yes, employees count as a stakeholder group also)."

Some will protest that thinking of AI as "projects" is legacy. Shouldn't AI be truly embedded if you want it to have impact? Probably. AI works best via an iterative push for improvement across processes and stakeholder groups. But projects still need accountability. 

Successful projects build momentum, while lukewarm projects are, well - lukewarm. I don't believe the reams of studies documenting AI's ROI problems are as bleak as they seem. PwC, which issued those fairly dire AI project revenue findings I cited earlier, also asserted, in the exact same study: 

The companies gaining transformational value from AI aren’t confining its use to small efficiency gains. Instead, they’re using it to transform end-to-end workflows and redefine how they create value. 

That rings true - but you have to start somewhere. The pitfalls are everywhere, but if you dig, the successes are there too - we've documented a number of them on these pages. Some are modest, but modest successes add up, even as markets demand exponentialism and dreamy 10x productivity gains. How about we start with repeatable results instead, and go from there? 

If you want to dig a bit deeper, check out this hot seat appearance with the CRM Konvos team, Where do we go from here? A candid AI enterprise recap with CRMKonvos - also embedded below. 
