DORA the Explorer - how to unpack AI capability paradoxes - and why you need to know about them if you're in coding
- Summary:
- Google’s DORA group’s first report on AI coding maturity provides a valuable starting point and a bevy of helpful metrics. It also points to some interesting paradoxes that require additional context and might benefit from an expanded HR role.
Google’s DevOps Research and Assessment (DORA) group has released its first-ever analysis of how companies are succeeding or struggling with AI-driven coding. The research group has been advancing the academic rigor of software development maturity frameworks since 2012, and was acquired by Google in 2018.
They began alluding to AI’s impact on software development in 2023 and, in 2025, introduced a mid-cycle special report on the Impact of AI in Software Development. This latest report introduces the inaugural DORA AI Capabilities Model, which identifies seven AI-related organizational capabilities associated with high-performing organizations; these have varying statistical associations with ten outcomes worth measuring.
This feels like an excellent starting point for organizations to assess their current use of AI tools and identify opportunities for improvement, regardless of where they sit on the AI maturity curve. The report frames AI as an amplifier, for better and for worse:
AI’s primary role in software development is to amplify. It magnifies the strengths of high-performing organizations and the dysfunctions of struggling ones. The greatest returns come not from the tools themselves, but from investing in the foundational systems that enable success.
It also introduces several curious paradoxes that may require new metrics, context, and roles. Here are two to consider: 1) Why is the perceived developer experience of individual productivity correlated with higher rates of burnout? 2) Why does breaking work into smaller chunks improve product performance but decrease the perceived experience of developer effectiveness? Hint: HR is not mentioned even once in the report.
I'll get to that in a moment, but first, let's unpack the high-level framework.
High-level framework
The report starts with the capabilities and then walks through how an organization’s level of competence in each is associated with the following outcomes:
- Organizational performance: The overall success of the organization, based on characteristics like profitability, market share, and customer satisfaction.
- Team performance: The perceived effectiveness and collaborative strength of an individual’s immediate team.
- Product performance: The success and quality of the products or services the team is building, based on characteristics like helping users accomplish important tasks and keeping information safe, and performance metrics such as latency.
- Software delivery throughput: The speed and efficiency of the software delivery process.
- Software delivery instability: The degree of instability in the software delivery process. Lower instability is generally considered a positive outcome.
- Code quality: An individual’s assessment of the quality of code underlying the primary application or service they work on.
- Individual effectiveness: An individual’s self-assessed effectiveness and sense of accomplishment at work.
- Valuable work: The self-assessed amount of time an individual spends doing work they feel is valuable and worthwhile.
- Friction: The extent to which friction hinders an individual’s work. Lower friction is generally considered a positive outcome.
- Burnout: Feelings of exhaustion and cynicism related to one’s work. Lower levels of burnout are generally considered a positive outcome.
And here are the capabilities that are associated with these outcomes in varying degrees:
- Clear and communicated AI stance: Ambiguity creates risk. A clear policy provides the psychological safety needed for effective experimentation.
- Strong version control practices: As AI increases the velocity of change, version control becomes the critical safety net that enables confident experimentation.
- Quality internal platforms: A platform provides the automated, secure pathways that allow AI’s benefits to scale across the organization.
- Working in small batches: This discipline counteracts the risk of AI generating large, unstable changes, ensuring that speed translates to better product performance.
- Healthy data ecosystems: The benefits of AI are significantly amplified by high-quality, accessible, and unified internal data.
- AI-accessible internal data: Connecting AI to your internal documentation and codebases moves it from a generic assistant to a specialized expert.
- User-centric focus: A focus on user needs is essential to ensure that AI-accelerated teams are moving quickly in the right direction.
Putting it into practice
The report dives into a lot of nuance on the capabilities and outcomes that would be difficult to summarize in this brief dispatch. The high-level advice for putting the two concepts to work is to organize a team to undertake a value stream mapping (VSM) exercise. This is a tactical exercise to get the ball rolling on understanding how work flows through the system from idea to customer. Think of value stream mapping as a complement to specific metrics and experience data, helping explain where and why problems are occurring.
In the long run, it can inform an organization-wide practice of value stream management, a more extensive, continuous process that is a much bigger topic beyond the scope of the report. That said, even the more tactical value stream mapping involves an interactive, iterative process. Unfortunately, both the tactical and strategic variants share the same three initials, so I will follow the report’s convention of using VSM to mean value stream mapping. The report says:
In an era of rapid AI adoption, the greatest risk is pouring massive investment into chaotic activity that doesn’t move the needle. DORA research shows that AI acts as an amplifier of positive and negative behaviors and outcomes, so it’s essential to identify and address dysfunction in the flow of value. VSM is what separates disorganized activity from focused improvement, allowing you to target the most impactful capability.
Things to consider:
- Prioritize the act of mapping over delivering artifacts.
- Get process details out of individual heads into a shared space.
- Prioritize making work flow smoothly and predictably rather than optimizing steps.
- Teams should be empowered to experiment, learn, and adapt without fear of reprisals.
Other tidbits
Interestingly, even an extremely high user-centric focus is associated with only a small increase in team performance. Things that get in the way include a feature factory mindset that is more focused on velocity than user value, adopting new technologies that don’t solve user problems, and organizational silos that prevent developers from gaining the deep context and empathy required to align their efforts with end users.
Quality internal platforms were associated with large increases in organizational performance. Think about how to reduce developers’ cognitive load. Start with a minimum viable platform and identify a golden path that demonstrably improves the developer experience and can be extended over time with feedback and observability. They also point to the HEART acronym: Happiness, Engagement, Adoption, Retention, and Task Success.
It's also important to craft a shared understanding of a clear, communicated AI stance, which can have a large impact on AI effectiveness and a moderate positive effect on organizational performance. The stance needs input from across the organization, and it needs to be a living document rather than a one-and-done static policy; if it changes too frequently, though, it can create confusion. As the report states:
That leadership vision, however, can’t be implemented from a single silo. A policy created only by legal or security is unlikely to work in reality. The stance should be authored by a cross-functional working group with representatives from engineering, legal, security, IT, and product leadership. A group of this kind is uniquely positioned to balance risk management with the practical realities of developer workflows.
Another example is the notion of a healthy data ecosystem, which is also associated with a large increase in organizational performance. Metrics include timeliness, data incidents, and data quality (accuracy, completeness, and timeliness). A related capability is AI-accessible internal data, which makes it easier for AI to capture the full context and can have a significant impact on individual effectiveness and code quality.
One good practice is to assign a team to a pilot project on a specific app or service, then work through the data quality issues. You want to avoid polluting AI with bad examples, such as deprecated projects, experimental code, or code that violates best practices. You might even consider including new developers on this. One interesting measure is how long it takes new developers to learn to use a company’s data infrastructure, not to mention its tooling.
Version control works better when it goes beyond pure code management to include explainable commit messages, prompt management, agent configuration files, and so on.
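As a purely hypothetical illustration (the layout and file names below are mine, not the report’s), treating prompts and agent configuration as versioned artifacts alongside the code they influence could look something like this:

```
repo/
├── src/                     # application code
├── prompts/
│   └── refactor-service.md  # reusable prompts, reviewed like code
├── .agent/
│   └── config.yaml          # assistant/agent settings: model, tools, guardrails
└── docs/adr/                # decision records explaining why, not just what
```

Commit messages then explain intent and note AI involvement, for example: "Refactor billing retries (assistant-generated, verified by unit tests)."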
The paradox of batch size
This is where it gets interesting. Very small batches are associated with a medium increase in product performance and a medium decrease in friction (a good thing), but with no substantiated effect on individual effectiveness. However, large and very large batch sizes can both lead to substantial increases in individual effectiveness.
Keep in mind that when they say 'individual effectiveness,' we are talking about how it feels to the developer. As per DORA:
More importantly, we argue that individual effectiveness should not necessarily be pursued as a goal in and of itself. Rather, individual effectiveness is a means to realize greater organizational, team, and product performance, as well as improved developer well-being.
While this necessary shift from raw code generation to thoughtful decomposition and verification may feel like a loss of individual speed, it is precisely this discipline that can unlock sustainable team-level performance, and help prevent downstream chaos.
The report has several theories on why this might be the case:
- Overhead – the human process of breaking a problem into many smaller ones shifts focus from writing code to decomposing, prompting and verifying.
- Review friction – the cognitive load to review a small chunk of unfamiliar code increases the effort of contextualizing it.
- Tooling mismatch – existing AI tools might add layers of manual overhead.
New skills and processes might be required to align the developer's perceived experience with organizational performance, not to mention customer experience. Here are some examples:
- Refining story slicing skills – training product managers and developer teams to break larger stories into smaller slices of value. They argue that the common feeling that a feature is too big to be broken down is rarely true.
- Learning how to commit new updates at least once per day.
- Learning how to use feature flags to decouple new updates from a broader release (see the sketch after this list). This can help build trust with product managers who want to see the whole set of features before release.
- Developing managerial habits to limit work in progress.
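To make the feature flag idea concrete, here is a minimal sketch in Python. The flag name, the environment-variable mechanism, and the checkout functions are all hypothetical, and most teams would use a dedicated flag service rather than raw environment variables; the point is simply that small batches can merge continuously while staying invisible to users until the flag is enabled.

```python
import os

def flag_enabled(name: str) -> bool:
    # Hypothetical mechanism: read the flag from an environment variable,
    # e.g. FLAG_NEW_CHECKOUT=1. A real setup would likely use a flag service.
    return os.environ.get(f"FLAG_{name.upper()}", "0") == "1"

def checkout(cart):
    # New code ships in small, frequently merged batches but stays dark
    # until the flag is turned on; the legacy path remains the default.
    if flag_enabled("new_checkout"):
        return new_checkout_flow(cart)
    return legacy_checkout_flow(cart)

def new_checkout_flow(cart):
    ...  # incremental replacement, built up across many small commits

def legacy_checkout_flow(cart):
    ...  # unchanged behavior until rollout completes and this path is removed
```

Decoupling deployment from release this way also lets product managers review the assembled feature behind the flag before it is exposed to users, which supports the trust-building point above.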
My take
This seems like a helpful framework for organizations at all stages of their AI development journey. One additional dimension that could make it more meaningful would be to carve out a new role for HR across this process. As the report observes:
To achieve a return on investments made in acquiring and adapting to AI, teams should also attend to how they communicate, collaborate, and operate across their broader socio-technical context.
Now, let's come back to this paradox between individual effectiveness and burnout. In their framing, individual effectiveness is measured by a sense of accomplishment, while burnout is measured by feelings of exhaustion and cynicism related to work.
Both can be associated with increased AI usage. And remember: just because someone feels more personally effective does not mean that software quality or the end-customer experience is improving as a result.
The report posits that most tasks can be decomposed into smaller, more manageable ones, which requires not just learning new skills but also making those skills feel familiar and easy. There is also the notion that the time it takes new developers to reach full productivity can provide valuable insight into data quality, software tool infrastructure, and other important capabilities.
Getting to the bottom of these issues and working through them could benefit from expanding the role of HR experts from compliance watchdogs to human empowerment advocates. Why are developers more prone to burnout with AI, and how do you translate these insights into meaningful conversations across the organization? How might organizations develop learning pathways and cultural norms that make writing and committing smaller batches of work as effortless as writing and committing larger ones?
For sure, this will require collaboration across the organization. For example, small batches might feel challenging because developers lack the broader context, which might call for new processes or better tools that make it easier to see the bigger picture while working on small batches. A starting point could include defining the desired outcomes, organizing a cross-functional team, mapping the flow of value, and then working through a proof of concept to prioritize the AI capabilities that might help.