What if your security problem is really your data problem? Here's why Databricks is shifting the blame as it enters cybersecurity

Alyx MacQueen, December 18, 2025
Summary:
Cybersecurity principles are simple, says Databricks' Field CISO. The hard part is doing them at scale across thousands of users and hundreds of applications. The company thinks its data platform – not another security tool – is the answer.

Omar Khawaja spent nine years as CISO of a $26 billion organization responsible for 12 hospitals. He joined data and AI platform vendor Databricks after concluding that security problems can't be solved with security tools alone. That same thinking underpins Data Intelligence for Cybersecurity, the company's new platform. 

Khawaja, now VP of Security and Field CISO, makes a straightforward argument: cybersecurity principles are well-established. The difficulty is applying them at scale across thousands of users and hundreds of applications. And that, he says, is a data problem – not a security one. He explains:

I could boil all of cyber down to maybe seven or eight principles and say, if you just do these eight simple things, you will have protected your organization. However, when you go to do this practically in the real world, very little about it actually feels easy.

He compares it to medicine. Doctors know that if patients ate well, exercised, slept enough and avoided harmful substances, a swathe of health problems could disappear. The principles are simple, but human reality makes them hard. Security operations face the same problem. Logging events, detecting anomalies, investigating incidents and hunting threats are straightforward activities that become unwieldy when multiplied across tens of thousands of users, hundreds of applications and operations in multiple countries.

The agent architecture

Databricks' answer is Agent Bricks, a framework for building AI agents that can automate the analytical work currently done by human analysts. The platform integrates with the company's Lakehouse architecture, which combines data warehouse and data lake capabilities to unify security telemetry.

The governance model for these agents comes from hard-won experience with enterprise AI. Khawaja describes a four-stage approach covering data preparation, agent building, deployment and evaluation. Each stage has specific controls:

You've got the world of data, the world of models, and then you have to have some way of serving and embedding into an application the actual AI, whether it's agentic or non-agentic. And then, of course, you need governance across each of those three.

The evaluation stage uses Large Language Model (LLM) judges to monitor agent outputs in real time, checking for correctness against defined parameters. These can include quality thresholds, deviation from ground truth, and practical constraints such as cost per response – Khawaja notes that preventing a single agent response from costing hundreds of pounds is itself a governance concern.
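To make the shape of that evaluation stage concrete, here is a minimal sketch of a governance gate combining a judge-assigned quality score with a per-response cost cap. All names, thresholds, and the `AgentResponse` structure are illustrative assumptions, not Databricks' actual API:

```python
from dataclasses import dataclass

@dataclass
class AgentResponse:
    text: str
    quality_score: float  # score assigned by an LLM judge, assumed in [0, 1]
    cost_usd: float       # estimated cost of generating this response

def passes_governance(resp: AgentResponse,
                      min_quality: float = 0.8,
                      max_cost_usd: float = 1.0) -> bool:
    # A response is released only if it clears the quality bar
    # and stays under the per-response cost cap.
    return resp.quality_score >= min_quality and resp.cost_usd <= max_cost_usd
```

In a real deployment the judge score would come from a separate evaluation model and the cost from token accounting; the point is that both quality and spend are checked before an agent's output is acted on.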

Databricks has documented this approach in its AI Security Framework (DASF), a 121-page open source document the company developed over seven months with input from government organizations, regulators, standards bodies and researchers. It identifies 62 risks across 12 components of AI systems, mapping each to specific controls.

Tackling the false positive problem

False positives are a persistent drain on security operations. Analysts receive alerts, investigate them, and determine whether threats are genuine – a process that eats up time and wears people down.

Khawaja breaks down why this happens:

If you think about what happened, the human did some analysis, and based on that analysis, they were able to determine that this is a false positive. The beauty of Agent Bricks and agentic AI is the ability to take the analysis that the human did, create a playbook, and then have the playbook run in an automated way.

The distinction matters. Deterministic systems can only execute precisely defined recipes. Humans can navigate ambiguity but don't scale. Traditional automation failed because the variation and complexity of real-world security scenarios made programming responses impractical – you'd spend more time writing the rules than just doing the work. Agentic AI sits in the middle, capable of handling ambiguity while operating at machine speed.
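The "capture the analysis as a playbook" idea can be illustrated with a toy triage rule. Suppose an analyst determined that "impossible travel" alerts sourced from the corporate VPN egress range are benign; that judgment, once encoded, runs automatically, and anything it doesn't match still goes to a human. The alert fields, network range, and verdict strings here are all hypothetical:

```python
import ipaddress

# Assumed analyst finding: "impossible travel" alerts whose source IP
# falls inside the corporate VPN egress range are false positives.
KNOWN_VPN_NETS = [ipaddress.ip_network("203.0.113.0/24")]  # placeholder range

def triage(alert: dict) -> str:
    """Apply the encoded analysis; unmatched alerts escalate to a human."""
    src = ipaddress.ip_address(alert["source_ip"])
    if alert["type"] == "impossible_travel" and any(src in net for net in KNOWN_VPN_NETS):
        return "close:false_positive"
    return "escalate:human_review"
```

An agentic system would generalize this pattern beyond hand-written conditionals, but the division of labor is the same: the human supplies the judgment once, the machine applies it at scale.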

The result, as Databricks sees it, is that analysts shift from doing repetitive investigative tasks to supervising AI agents that execute playbooks. As Khawaja notes: 

Every analyst in the Security Operations Center (SOC) now is a SOC manager or director and has 20 people reporting to them.

The platform's data ingestion reflects Databricks' roots as a data company rather than a security vendor. The company says customers are processing data at petabyte scale daily – far beyond what typical security platforms handle.

Lakeflow Connect, the company's data pipeline tool, handles ingestion, transformation, normalization and orchestration. Khawaja highlights a problem that security teams will know well: when upstream log sources change format, they can silently break detection pipelines. Teams only discover the problem during incident investigation, realizing their detections haven't been working for months. Lakeflow Connect monitors for these breaks.
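The silent-break problem comes down to schema drift: a detection expects certain fields, and an upstream format change quietly removes them. A minimal sketch of the check (field names and record shapes are assumptions for illustration, not the vendor's implementation):

```python
# Assumed expected schema for one log source; field names are illustrative.
EXPECTED_FIELDS = {"timestamp", "user", "src_ip", "action"}

def missing_fields(record: dict) -> set:
    """Fields the detection pipeline expects but the record no longer carries."""
    return EXPECTED_FIELDS - record.keys()

def batch_has_drift(records: list) -> bool:
    """True if any record in the batch has drifted from the expected schema."""
    return any(missing_fields(r) for r in records)
```

Alerting on a non-empty result at ingestion time is what surfaces the break when it happens, rather than months later during an incident investigation.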

The platform supports both internal telemetry and external threat intelligence feeds, with connectors for streaming, batch, structured and unstructured data sources.

The open source angle

Databricks makes much of its open source foundations in a market where vendor lock-in worries buyers. The core technologies underpinning the platform – Spark, Delta Lake, MLflow and Unity Catalog – are open source projects, several founded by Databricks' own leadership. Khawaja observes:

Our customers don't want to go to a technology where they are going to get locked in. I hear this frequently from my CISO peers – they don't want to be held hostage by a specific vendor.

Customer deployments cited by Databricks include Arctic Wolf, which processes over eight trillion security events weekly on the platform; Palo Alto Networks, reporting three times faster AI-powered threat detection; and SAP Enterprise Cloud Services, claiming 80% reduction in engineering time alongside five times faster rule deployment.

My take

The idea that cybersecurity is really a data problem isn't new, but Databricks appears to have built something real here. The company has started from genuine platform capabilities rather than bolting AI features onto a security product, and open-sourcing the AI Security Framework suggests confidence that transparency won't hurt its competitive position. The conundrum for security teams is whether they can actually use this. Databricks isn't selling a box that replaces your Security Information and Event Management (SIEM) system; it's selling infrastructure for organizations that can build sophisticated security operations on a data platform. That's a genuine opportunity for teams with the right skills – but it also means the hard part hasn't gone away. It's just moved from "our security tools can't keep up" to "can we build and run this ourselves?" Plenty of vendors promise autonomous security response, but not many explain how they stop those agents doing damage. Databricks' four-stage framework with built-in evaluation is at least an answer to the question of what "safety-governed action" actually means. 
