
How Bayesian-inspired uncertainty management could shape the future of trustworthy AI

George Lawton, March 13, 2026
Summary:
Concr CEO Irina Babina and CTO Matthew Griffiths unpack how Bayesian foundation models can excel at uncertainty management to build trust in medical AI digital twins. Their insight has important implications for user experience design that augments rather than replaces human context and expertise.

Business person staring at concrete wall covered in question marks, uncertainty concept © peshkov - Canva.com

Bayesian approaches and foundation models that treat uncertainty as a first-class primitive are helping to close a critical gap on the path to more trustworthy and efficient health care. This stands in stark contrast to the Large Language Model (LLM)-based statistical approaches driving current AI hype and investment, and it has important implications for other industries bumping up against the limits of LLMs in handling edge cases and exceptions.

One example of this shift in medicine: three of the four talks at the Royal Society of Medicine's (RSM) Digital Twins and Emerging Technologies in Clinical Practice conference addressed the role of Bayesian techniques. One of those speakers, Irina Babina, CEO of Concr, an AI oncology startup, says:

At RSM, there were several presentations, and out of four talks, three have sort of come away with a common denominator of Bayesian digital twins of the future. Those approaches are definitely the ones that capture biology better, that we can use to start to unpick certain challenges that we're facing in different indications.

Babina did not set out to build an AI company. She trained as a geneticist, spent twelve years in academic research, and watched with increasing frustration as good science failed to translate into clinical benefit. That frustration prompted a move into research funding, where she eventually helped direct hundreds of millions of pounds into UK research.

In the midst of managing funding, she reconnected with a former colleague who sparked her interest in a new approach: building digital twins to simulate the risks of cancer treatment interventions. The idea echoed her day-to-day work modeling the risks of capital allocation, and it also aligned with her previous research interests. She explains:

Before deploying large capital, wouldn't you run the simulation first? Similarly, before giving somebody a very toxic therapy, surely you should run some simulations about how they are likely to respond. So that really resonated. 

Bayesian frame of mind

The Bayesian specifics came later, through an iterative process of asking better questions until the answers became compelling. Strip away the mathematics, and the Bayesian approach amounts to a willingness to sit with uncertainty and to frame better questions. Babina says this sparked her enthusiasm for improving a process she already knew well from medical research:

I didn't know anything about Bayesian frameworks, I didn't know much about Data Science, and the more that was explained to me, the more I was asking questions, the more I realized how powerful it could be.

Concr’s founding team was inspired by recent progress applying a similar Bayesian foundation in astrophysics research, where it was used to characterize galaxies from incomplete data. Those challenges are structurally similar to building patient models from fragmented and noisy patient records. Babina says:

Real-world and research data is very messy, and it's inherently unreliable, because we don't even have information, or correct information that we collect about a given biology of response, or given patient response to a therapy, or why the patient has actually responded? Why has the patient survived? Inherently, there is no certainty about the data that we collect.

In her view, the LLM paradigm runs into a structural problem trying to make sense of this. LLMs struggle when the inputs themselves are uncertain or in question. That can be dangerous in oncology, where the data are structurally incomplete and the stakes of overconfidence are severe.

Babina found three compelling properties in Bayesian approaches: explainability, flexibility, and computational efficiency. On explainability, Bayesian models can tell you not only what they predict but why. The framework shows which components of a model contributed to which aspect of a prediction, and with what degree of confidence. This also helps researchers trace causal chains through complex, interconnected biological systems, which matters because scientists do not fully understand the underlying biology.

Flexibility in the face of dynamic change is also important because cancer is not a static disease. As patients progress through therapy, the tumor's biology changes. A model that requires full retraining for each disease shift would be very difficult to use as a clinical tool. A supporting aspect of this flexibility is that Bayesian approaches provide an efficient framework for updating specific prior beliefs (priors) without the overhead of starting again. Babina explains:

Rather than re-computing everything and retraining a whole model, you're only changing one component of it, or the priors, and then you get an adjusted output. And that is really efficient in terms of compute power, and it's really effective in that you're utilizing all available information to make a decision.

Critically, the output is not a single definitive answer. Rather, it shines a light on the probability of response across all available standards of care and emerging therapies.
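The efficiency Babina describes can be illustrated with the simplest possible case: a conjugate Beta-Binomial update, where new response data refreshes a single prior belief directly, with no retraining step at all. This is a minimal sketch of the general idea, not Concr's actual model, and the numbers are made up for illustration.

```python
# Conjugate Beta-Binomial update: refresh one prior with new evidence
# instead of retraining a whole model. Illustrative sketch only.

def update_beta_prior(alpha, beta, responders, non_responders):
    """Posterior parameters after observing new treatment-response data."""
    return alpha + responders, beta + non_responders

def mean_response_rate(alpha, beta):
    """Expected response rate under a Beta(alpha, beta) belief."""
    return alpha / (alpha + beta)

# Weakly held prior belief: response rate around 50%.
alpha, beta = 2.0, 2.0

# New trial data arrives: 8 responders out of 10 patients.
alpha, beta = update_beta_prior(alpha, beta, responders=8, non_responders=2)

print(round(mean_response_rate(alpha, beta), 3))  # → 0.714
```

The update is a constant-time arithmetic operation, which is the core of the compute-efficiency argument: only the belief touched by the new evidence changes.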

New processes required

One of the challenges of working with Bayesian approaches is the need to decompose a question or goal into multiple supporting processes. Concr’s approach was to architect three core components that work together:

Biological components describe the relationships within the cancer data, represent the state of the tumor, and impute missing data.

Intervention components model the molecular and clinical interactions of a therapy with the cancer biology.

Outcomes components simulate the results of applying a given intervention to a specific biology.

This three-part structure enables selective updating, which is helpful in a domain characterized by sparse, novel, and fragmented data. Concr CTO Matthew Griffiths elaborates:

When we receive new data, it will likely only address a small aspect of one of these components. For example, when modeling a novel therapy, we would only need to fine-tune the interventions component to refine the model's understanding of the method of action of the therapy. So we can rapidly learn from small and sparse datasets on novel therapies. When receiving new individual patient data, we can also further fine-tune the biological model to account for batch effects that affect the data inputs.
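The selective fine-tuning Griffiths describes can be sketched structurally: new therapy data touches only the intervention component, leaving the others untouched. The class and method names below are illustrative stand-ins, not Concr's implementation.

```python
# Sketch of a three-component decomposition with selective updating.
# Component names follow the article; internals are hypothetical stand-ins.

from dataclasses import dataclass, field

@dataclass
class Component:
    name: str
    updates: int = 0  # how many times this component has been fine-tuned

    def fine_tune(self, new_data):
        # Stand-in for a real Bayesian update of this component alone.
        self.updates += 1

@dataclass
class DigitalTwin:
    biology: Component = field(default_factory=lambda: Component("biology"))
    intervention: Component = field(default_factory=lambda: Component("intervention"))
    outcomes: Component = field(default_factory=lambda: Component("outcomes"))

    def learn_novel_therapy(self, therapy_data):
        # Data on a novel therapy only refines the interventions component;
        # the biology and outcomes components are left as-is.
        self.intervention.fine_tune(therapy_data)

    def learn_patient_batch(self, patient_data):
        # New individual patient data refines the biological model,
        # e.g. to account for batch effects in the inputs.
        self.biology.fine_tune(patient_data)

twin = DigitalTwin()
twin.learn_novel_therapy({"therapy": "example-drug"})
print(twin.intervention.updates, twin.biology.updates, twin.outcomes.updates)  # → 1 0 0
```

The point of the structure is the update boundary: each kind of new data maps to exactly one component, which is what makes learning from small, sparse datasets tractable.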

One challenge was that although oncology data is highly structured, no data standards are adhered to across datasets. The Concr team had to do substantial work to define internal standards and formats that are simple enough to be compatible with the diversity of datasets they receive, yet rich enough to capture the biology driving the response.

The Concr team refines the model by running simulations prospectively and then comparing the predictions against real-world outcomes as they emerge. This requires a stepwise approach, iterated one dataset at a time. They have now done this across 17 clinical trials and have modeled over 5,000 patients.

Another challenge lies in assessing performance. Relevant technical and clinical metrics operate in distinct domains, and no single number or metric resolves all of the differences. Griffiths explains:

Assessing the performance of these models is a complex topic, and there isn't a perfect metric to capture how these models work. For example, in treatment selection/optimization, we might care more about relative vs absolute accuracy (i.e., the ability to predict the difference in outcome between treatment choices, rather than the outcome itself). There is often a tension between assessing the technical performance of the model to replicate reality (e.g. if the model predicts an 80% chance of complete response, does that correspond to an 80% in real life), vs the clinical utility of the model (i.e. if you use the model to identify a responder cohort/change treatment choice, how much does that change your response rate/survival).
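The calibration question Griffiths raises (does a predicted 80% chance of response correspond to 80% in real life?) can be sketched as a simple binned comparison of predicted probabilities against observed outcomes. The helper and data below are illustrative assumptions, not Concr's metric or results.

```python
# Sketch of a calibration check: do predicted response probabilities
# match observed response rates? Data is fabricated for illustration.

def calibration_by_bin(predicted, observed, n_bins=5):
    """Group predictions into probability bins and compare the mean
    predicted probability with the observed response rate in each bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(predicted, observed):
        idx = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into top bin
        bins[idx].append((p, y))
    report = []
    for b in bins:
        if b:  # skip empty bins
            mean_pred = sum(p for p, _ in b) / len(b)
            obs_rate = sum(y for _, y in b) / len(b)
            report.append((mean_pred, obs_rate, len(b)))
    return report

predicted = [0.1, 0.15, 0.4, 0.45, 0.8, 0.85, 0.9, 0.82]
observed  = [0,   0,    1,   0,    1,   1,    1,   0]
for mean_pred, obs_rate, n in calibration_by_bin(predicted, observed):
    print(f"predicted ~{mean_pred:.2f}, observed {obs_rate:.2f} (n={n})")
```

A well-calibrated model keeps the two columns close; a clinically useful model is a different test, measured by how much using the predictions changes response rates or survival in a selected cohort.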

Closing the UX gap

Another challenge is that clinicians currently make treatment decisions by assembling clinical, imaging, molecular, and diagnostic information across multiple platforms and integrating it in their heads. Concr is working with them to develop a layer that brings all of this together, in concert with the Bayesian models, to improve their workflow. Babina says:

Many clinicians across geographies say that we just want to have everything in one place. We want to integrate clinical, imaging, molecular, and diagnostic data in one place. And they currently do it in their head, and very often they have to use different platforms to be able to gather that information.

Griffiths observes that what a CTO cares about and what a clinician cares about differ not just in degree but in kind, an interdisciplinary gap that has to be bridged for collaboration to work:

I've learnt that the clinicians and translational scientists are generally interested in surprisingly different information and outputs from me. As the person in charge of developing the models and producing the predictions, I'm primarily focused on verifying the model, and ensuring that the predictions are as accurate as possible, whereas the clinicians are interested in how the results can be used to change and justify their decision-making. 

The kinds of data, figures and metrics that are relevant to each process are fairly distinct. Over the course of developing the software, we have had to work hard to bridge this multi-disciplinary gap in mindset and approach to make the software work for clinicians. It's been satisfying to see a consistent and widely applicable core workflow emerge as we've refined our combined understanding of the translational problem.

One concern raised about clinical AI systems is their potential to erode existing skills. For example, early research suggests AI-powered colonoscopy tools may have reduced some doctors' ability to identify early tumors unassisted, while the tools themselves did not catch as many as the best doctors working on their own. Babina says it is important to surface information to clinicians in ways that refine and improve their skills over time rather than replace them:

What you can do is almost double click on, ‘well, why is that the case?’ Rather than focusing entirely on the top line, you want to understand the underpinning ‘why’ for every single patient. And if you had a tool to be able to do that in a really easy and accessible way, then you're beginning to almost add value to the knowledge rather than subtracting from the skill.

My take

The prevailing thesis driving both Big AI development trajectories and the investment mania is that value can be delivered by scaling data center construction and models alone to discover emergent capability. This thesis also drives narratives that more capable models will trigger a mass destruction of existing business models, characterized as the 'SaaSpocalypse', 'SecPocalypse', and 'COBOL-acolypse'. A side implication is that businesses could eliminate front-line experts, such as programmers, security analysts, and subject-matter experts, to cut costs.

But this paradigm conflicts with the messy edge cases that require experts, including vendors on the front lines and engineers steeped in their craft, to work with uncertainty to solve new problems. One prominent example of this mania is Elon Musk’s insistence, beginning over a decade ago, that full self-driving was right around the corner. Financial markets are betting on that aspiration even more strongly now than they did back then, while the reality on the ground has been declining year-over-year sales. Maybe it will be different with the robots...

In contrast, the Bayesian paradigm championed by Concr suggests that the way to deliver meaningful outcomes that create new value lies in improving feedback loops between humans and AI to navigate uncertainty. In this model, people are actively in the loop, thinking about how to structure the problem better, defining what uncertainty means in context, and validating that representation in a way that can be calibrated back to reality. 

That said, LLMs could also play a supporting role in this emerging paradigm by improving aspects of this process they are good at, such as organizing language, structuring unstructured input, and connecting disparate data pipelines. Complementary techniques such as Bayesian models could handle other aspects, such as representing uncertainty in complex, sparse, and high-stakes domains, in ways that support rather than replace human judgment.  

It's telling that the majority of speakers at the RSM conference arrived at the same conclusion in a space as high-stakes as healthcare. That’s a different kind of signal than AI benchmark scores or data center growth strategies. It points to something that cannot be scaled blindly with money and data centers alone. It will require scaling iterative feedback loops between experts in different domains and AI systems, a process that includes, and even requires, people operating at their most curious and most patient, with a willingness to sit with uncertainty at ever larger scales.
