Why the boundary problem is the biggest challenge in agentic AI
Summary: AI rockstar Andrej Karpathy has launched a self-learning agent that shines a light on the boundary problem. Unremarkable at first read, but a deeper analysis reveals the biggest challenges, and even bigger opportunities, in agentic AI.
Andrej Karpathy, who led the computer vision team behind Tesla's self-driving brain (among many other AI things), recently demonstrated a proof of concept for a very simple self-learning agent called autoresearch. He claims this is the future of agentic AI, and that it is at least a decade away from delivering real value. That lands in stark contrast to the popular narrative that trillions of agents running on trillions of dollars of AI infrastructure are just around the corner, a narrative Karpathy dismisses as scaling agentic slop.
I have a felt sense he is right, despite the trillions of dollars betting against his thesis. I'll unpack in more detail why the bet on the conventional agentic AI narrative may mostly prove to be an expensive and unhelpful lesson in the long run. But suffice it to say that today's agentic AI mania has eerie parallels to the microservices hype of a decade ago: a similarly expensive bet that mostly never paid off in a big way, except for the hyperscalers with billion-dollar IT budgets.
To be honest, Karpathy's fascination with autoresearch is not an easy story to explain, since appreciating what this research points toward requires a paradigm shift. Most of the explanations he has given don't land easily in the conventional frame of optimizing for extraction. You have to shift your frame to optimizing for learning, or for learning how to learn.
Also, the results thus far have been interesting yet meager, and their significance is frankly hard to explain. For example, autoresearch evaluates everything against one scalar, a single number that tells it whether a change helped or not. This is a deliberate constraint, not a limitation. While everyone else is excited about optimizing across multi-dimensional vectors, Karpathy is pointing at something more fundamental: if you can't define improvement as a single clear number, you don't yet understand the problem well enough to automate it.
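To make the single-scalar idea concrete, here is a minimal sketch of the loop it implies (my own illustration, not Karpathy's code): propose a change, evaluate one number, keep the change only if the number improved.

```python
import random

def optimize(candidate, evaluate, mutate, steps=2000, seed=0):
    """Greedy single-scalar optimization: a change survives only if
    the one evaluation number improves."""
    rng = random.Random(seed)
    best, best_score = candidate, evaluate(candidate)
    for _ in range(steps):
        proposal = mutate(best, rng)
        score = evaluate(proposal)
        if score > best_score:  # one scalar decides everything
            best, best_score = proposal, score
    return best, best_score

# Toy usage: the "candidate" is just a number, and the scalar
# rewards closeness to a target value of 42.
best, score = optimize(
    candidate=0.0,
    evaluate=lambda x: -abs(x - 42.0),
    mutate=lambda x, rng: x + rng.uniform(-1.0, 1.0),
)
```

The diagnostic value lives in the `evaluate` function: if you can't write one for your problem, that is the signal Karpathy is pointing at.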
Early pioneers who have taken Karpathy up on the invitation buried in this project have seen promising results. Shopify CEO Tobias Lütke pointed autoresearch at Liquid, the template engine algorithm he created in 2005. In one night, he improved Liquid's performance on an important process by more than half and more than halved its memory overhead. These are big wins when applied to reducing the costs of running the infrastructure powering Shopify, and even more timely now that skyrocketing memory costs impose new constraints.
An even more important question buried in this news is what moved the CEO of a multibillion-dollar company to devote his limited free time to exploring a new algorithm. This stands in stark contrast to the popular fad of CEOs focusing on more "important" things, like developing headcount-reduction schemes to make room for more AI budget so shareholders can milk the stock gains a bit longer.
What Karpathy is pointing towards will actually require creating the conditions for your best and most expensive staff to attune to the learning-how-to-learn mindset implicit in the new paradigm. In this frame, the value comes from applying the paradigm to the boundary between humans and AI, not from replacing the humans with AI.
Rhymes with microservices
About ten years ago, I walked into a microservices talk at the Java World conference in San Francisco and was surprised to see developers packed shoulder to shoulder. They all seemed very excited about the prospect of autonomous services, each doing one thing, independently deployable, and communicating through lightweight APIs. Netflix and Uber were doing amazing things with them. And my editor at the since-renamed SearchSOA news blog began assigning me stories at an increasing cadence to the effect of "Is SOA dead?"
Well, today SOA is well and truly dead, but the thing that took its place was not quite microservices. The site went through multiple name changes, starting with SearchMicroservices and pivoting on from there as the intervening fads evolved. I don't think the fundamental problem was that microservices were entirely unhelpful. It's just that they weren't helpful for most enterprises, which lacked the multi-billion-dollar IT budgets and chaos-engineering talent for testing how they broke.
The boundary problem
I really struggled to understand, much less explain, why. It all sounded really good on paper. After a few years, esteemed software luminary Martin Fowler observed that the fundamental challenge enterprises were bumping up against was what I'd call the boundary problem. Teams would try to decompose their applications into lots of services, but these ended up being tightly coupled in new and mostly unexpected ways. For example, the services themselves were stateless, meaning they could theoretically operate on their own.
But they were still dependent on the state stored across the boundaries that emerged. They often required synchronous communication, which meant they stopped working whenever another microservice had a hiccup. They also shared database schemas through the back door, which limited the cadence of updates. I saw one recent statistic claiming that 90% of organizations were still deploying microservices in batches, like the old monolithic apps they were supposed to replace.
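That coupling can be sketched in a few lines; the class and field names here are illustrative, not from any real system. A "stateless" service still falls over the moment a synchronous dependency across the boundary hiccups:

```python
class Downstream:
    """A service across the boundary that holds the actual state."""
    def __init__(self, healthy=True):
        self.healthy = healthy

    def fetch_state(self, key):
        if not self.healthy:
            raise ConnectionError("downstream hiccup")
        return {"key": key, "value": 1}

class StatelessService:
    """Holds no state of its own, so it is 'independent' on paper..."""
    def __init__(self, downstream):
        self.downstream = downstream

    def handle(self, key):
        # ...but every request blocks synchronously on state that lives
        # across the boundary, so its uptime is coupled to the
        # downstream service's uptime.
        return self.downstream.fetch_state(key)

ok = StatelessService(Downstream(healthy=True)).handle("order-1")
try:
    StatelessService(Downstream(healthy=False)).handle("order-1")
    failed = False
except ConnectionError:
    failed = True  # the "autonomous" service just went down with it
```

Nothing in `StatelessService` looks coupled, which is exactly why teams kept being surprised.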
Flash forward to today, and the fully decentralized autonomous agents everyone is talking about have eerie parallels to microservices in 2016. All of the consultancies have produced white papers explaining, in minute technical detail and five to nine catchphrase-laden bullet points, why this new thing is an important architectural shift. It is certainly important for the consultancies' engagement sales, but only time will tell whether it plays out differently for enterprise early adopters.
Some things are actually quite new and different. The new agentic APIs like MCP can take advantage of the strength of LLMs (large language models) in translation to support fuzzy interfaces rather than the rigid, brittle contracts implicit in microservices. The agents can communicate through natural language to manage ambiguity. The observability infrastructure has also evolved so agents can observe their own performance and adjust. And the application layer is much richer, with more capable, connected, and contextually aware cloud platforms. End-to-end data management, and now context management, have become first-class priorities for most cloud providers.
My take
The current agentic AI hype directs attention towards agents as the important thing: how capable they are, how many you can deploy, how autonomously they can operate. But the lesson from microservices is that the agents aren't the hard part. The hard part is the boundary.
In Karpathy's setup, the human focuses on improving the prose document that tells the agent what to learn about. The value lies in making the boundary between human craftsmanship and machine execution explicit, learnable, and improvable.
Karpathy believes it could take a decade to shift the conversation from optimizing the agents to optimizing the boundary in a meaningful and useful way. In the meantime, he expects AI to blend into the roughly 2% GDP growth trajectory that has held for two and a half centuries, not the explosive transformation the hype cycle promises.
His first prototype is pretty simple, and it's not entirely clear how this might scale to millions of self-learning agents the way DNA-based systems have scaled learning over billions of years, from bacteria all the way up to brains. For example, how is it that the programs running on zebra DNA can spit out a brain fully capable of walking a zebra body right out of the gate, while human brains take a few years to learn the same feat?
The question Karpathy seems to be inviting us to sit with is what might be possible if we can learn to point AI back on itself and on the boundary between itself and humans. Autoresearch is actually a small part of Eureka Labs, a much bigger project to optimize the boundary between AI and humans learning about AI.
Karpathy gives one example of how this shift showed up for him personally while learning Korean. He started with online courses, moved to group classes, and finally found a one-on-one tutor he resonated with. The magic she introduced him to, and what he is most excited about, was the quality of attention and attunement between them in guiding the learning process. In a discussion on the Dwarkesh Podcast he recalls:
"Instantly from a very short conversation, she understood where I am as a student, what I know and don't know. She was able to probe exactly the kinds of questions or things to understand my world model. No LLM will do that for you 100% right now, not even close. But a tutor will do that if they're good. Once she understands, she really served me all the things that I needed at my current sliver of capability. I need to be always appropriately challenged. I can't be faced with something too hard or too trivial, and a tutor is really good at serving you just the right stuff. I felt like I was the only constraint to learning. I was always given the perfect information. I'm the only constraint. I felt good because I'm the only impediment that exists. It's not that I can't find knowledge or that it's not properly explained or etc. It's just my ability to memorize and so on. This is what I want for people."
The excitement that Karpathy is pointing us towards lies in imagining how we might bring this quality to AI systems observing the boundary between them and us, and then scale this up into a kind of AI academy for cultivating flourishing for eight billion people learning about this boundary. He is not talking about learning the technical details of optimizing LLMs, which are hitting their limits in distilling what happens inside of us. It's more analogous to optimizing the molecular machinery that lives in the space between the raw DNA and whatever informs zebra brains to walk the day they are born.