Agentic Security Orchestration: Where Agents Fit and Where Humans Still Matter

Michelle Dufty

May 27, 2026

At 30 minutes per investigation, a SOC analyst can meaningfully review about 15 alerts in an eight-hour shift — a number most teams blow past before lunch. AI agents are starting to close that gap. Production deployments now show triage agents processing millions of alerts a year and compressing per-alert analysis from 30 minutes to under 5 minutes.

The same technology produces both the strong outcomes and the disappointing ones. The variable is how deliberately a team designs the boundary between what an agent decides and what a human still owns.

This article covers what agentic security orchestration actually means, the SOC work agents handle well, the work that still belongs to humans, how to design the handoff between them, the failure modes to plan for, and a staged rollout that earns autonomy through evidence.

Key Takeaways:

Agentic security orchestration replaces scripted SOAR playbooks with goal-driven AI agents that dynamically select tools and investigation paths as incidents evolve.
Agents excel at high-volume, repetitive SOC work (alert triage, multi-source investigation pivots, detection rule drafting), while humans remain essential for novel threats, organizational context, and irreversible actions.
The handoff between agents and analysts requires deliberate engineering: confidence thresholds per alert type, full audit trails of agent reasoning, and human-in-the-loop approval for hard-to-reverse actions.
A staged rollout earns trust through measurable evidence rather than granting autonomy upfront, and agent findings should compound into detection rules and runbooks.

What Agentic Security Orchestration Actually Means

Agentic security orchestration is an architectural approach in which AI agents dynamically select, sequence, and execute security tools based on the evolving context of an incident. The change is structural: rather than adding one more agent to the stack, it replaces fixed automation paths with systems that choose among tools and steps as an investigation develops.

From scripted SOAR playbooks to goal-driven agents

Traditional SOAR platforms automate known, repeatable workflows: if-this-then-that logic chains for predictable scenarios like phishing triage or failed-login lockout. A real investigation rarely follows a straight line. Looking into a suspected data exfiltration event typically means querying the SIEM, correlating EDR process trees, cross-referencing threat intelligence, and weighing the asset's business criticality, with each query reshaping the next one.

That chain can't be pre-scripted, which is exactly where fixed playbooks break.

Agentic orchestration changes the model. Instead of following a fixed decision tree, the agent receives a high-level objective and uses an LLM reasoning layer to determine what tools to call, in what order, based on what it has learned so far.
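A minimal sketch of that loop, assuming three hypothetical tools and a hand-written `choose_next_tool` function standing in for the LLM reasoning layer. None of these names come from a real product; the point is only that the next call depends on the evidence gathered so far, not on a fixed script.

```python
from typing import Callable, Optional

# Hypothetical tools: each takes the evidence so far and returns new facts.
def query_siem(evidence: dict) -> dict:
    return {"large_upload": True, "host": "laptop-42"}

def query_edr(evidence: dict) -> dict:
    return {"process": "rclone.exe"}

def lookup_threat_intel(evidence: dict) -> dict:
    return {"destination_known_bad": False}

TOOLS: dict[str, Callable[[dict], dict]] = {
    "siem": query_siem,
    "edr": query_edr,
    "threat_intel": lookup_threat_intel,
}

def choose_next_tool(goal: str, evidence: dict) -> Optional[str]:
    """Stand-in for the LLM reasoning layer (an assumption, not a real model call)."""
    if "large_upload" not in evidence:
        return "siem"
    if evidence["large_upload"] and "process" not in evidence:
        return "edr"
    if "process" in evidence and "destination_known_bad" not in evidence:
        return "threat_intel"
    return None  # nothing further to ask

def investigate(goal: str, max_steps: int = 10) -> tuple[dict, list[str]]:
    evidence: dict = {}
    steps: list[str] = []
    for _ in range(max_steps):  # hard cap so the loop always terminates
        tool = choose_next_tool(goal, evidence)
        if tool is None:
            break
        evidence.update(TOOLS[tool](evidence))
        steps.append(tool)
    return evidence, steps

evidence, steps = investigate("Determine whether laptop-42 exfiltrated data")
print(steps)  # ['siem', 'edr', 'threat_intel']
```

The `max_steps` cap is a deliberate design choice: a goal-driven loop needs an explicit bound that a scripted playbook gets for free.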

Why the orchestration layer matters more than any single agent

The orchestration layer is what turns isolated AI components into a usable SOC workflow: without it, individual components can't form a unified decision-and-execution loop. In most SOCs today, SIEMs, AI SOC tools, SOAR platforms, and ITSM systems still operate in isolation.

Analysts switch contexts between them during a single investigation, and the agent layer inherits that fragmentation unless something coordinates it. The production deployments that actually work use multi-agent designs with a dedicated orchestration agent coordinating specialized roles.
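One way to picture a dedicated orchestration agent is as a coordinator that owns a single shared case record and decides which specialist runs next. The sketch below assumes three hypothetical specialist roles implemented as plain functions; real ones would wrap SIEM, EDR, or ITSM integrations.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Case:
    """Shared record that every specialist reads from and writes to."""
    alert_id: str
    facts: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

# Hypothetical specialists with canned results, for illustration only.
def enrichment_agent(case: Case) -> None:
    case.facts["asset_criticality"] = "high"

def investigation_agent(case: Case) -> None:
    case.facts["verdict"] = "suspicious"

def ticketing_agent(case: Case) -> None:
    case.facts["ticket"] = f"TICKET-{case.alert_id}"

class Orchestrator:
    """Picks the next specialist and keeps one case record across all of them."""

    def __init__(self) -> None:
        self.specialists = {
            "enrich": enrichment_agent,
            "investigate": investigation_agent,
            "ticket": ticketing_agent,
        }

    def next_role(self, case: Case) -> Optional[str]:
        if "asset_criticality" not in case.facts:
            return "enrich"
        if "verdict" not in case.facts:
            return "investigate"
        if case.facts["verdict"] != "benign" and "ticket" not in case.facts:
            return "ticket"
        return None

    def run(self, case: Case) -> Case:
        while (role := self.next_role(case)) is not None:
            self.specialists[role](case)
            case.log.append(role)  # record which specialist ran, in order
        return case

case = Orchestrator().run(Case(alert_id="1001"))
print(case.log)  # ['enrich', 'investigate', 'ticket']
```

Because every specialist writes to the same `Case`, context gathered in one step is available to the next, which is the fragmentation problem the orchestration layer exists to solve.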

The Security Work Agents Do Well

Agents are most useful on security tasks that are high-volume, evidence-based, and easy to verify after the fact. Three categories of SOC work have strong evidence supporting agent involvement, though broader practitioner sentiment remains mixed:

1. Alert triage and enrichment at machine speed

Alert triage is the clearest production use case for AI agents today. Cresta's security team cut triage time by at least 50% with Panther's AI SOC analyst, especially on complex investigations. Other production deployments show the same pattern — triage agents processing more than five million alerts in a year and reducing per-alert analysis from roughly 30 minutes to about 60 seconds.

As Jacob DePriest, CISO at 1Password, puts it, "I think we're going to see more as well. And things I'm excited about in the security space are things like on the incident response side of things, maybe increasing the speed of our triage."

2. Investigation pivots that don't fit a static playbook

Beyond triage, agents help when an investigation needs many context-dependent pivots, running hypothesis-driven investigation across multiple telemetry sources. Reported production examples illustrate the pattern: to reach a single investigation conclusion, an AI system may query six or more data sources, generate well over a dozen investigative hypotheses, and execute hundreds of correlated queries, with every step surfaced for analyst review.

GitGuardian's security team, for example, reports gaining "absolute certainty about an alert in less than 20 minutes" after centralizing investigations into Panther, where unified data and cross-source queries replaced manual pivots across disconnected tools.

3. Drafting detection rules, queries, and reports from natural language

Draft generation is another strong fit because analysts can review the output before it changes production behavior. AI-assisted generation of detection rules from natural-language descriptions is now a common pattern across the industry, including community tooling that translates between formats like SIGMA, SPL, and YARA.

Within Panther AI, the AI Detection Builder lets analysts describe a behavior in natural language and generates complete Python detection rules with test cases and metadata. Analysts review and ship from there.

The Security Work That Still Belongs to Humans

Humans still own the parts of security work that depend on novelty, business context, and accountability. Three types of security work still need a human behind the keyboard, and no amount of agent tuning changes that:

1. Novel threats and genuinely ambiguous signals

Historical pattern matching helps with familiar attacks, but it does not solve genuinely new ones. AI security systems are trained on historical data, and that's what makes them effective at recognizing variants of known attacks.

Anomaly-based approaches can sometimes catch genuinely new behavior by flagging deviations from a learned baseline, but the further a threat sits from anything the model has seen, the more it depends on a human analyst to recognize it for what it is.

2. Decisions that depend on organizational and business context

Security decisions often depend on facts your telemetry never captured. Agents don't know whether the CFO is traveling internationally, whether a critical batch process legitimately runs on an anomalous server, or whether a vendor's IP range was just authorized but the change hasn't propagated.

Organizational context includes tacit knowledge, informal authority structures, and business priorities that are not captured in any log.

3. Accountability for irreversible actions

Irreversible or high-impact actions still need a person who owns the decision. These actions can't be delegated to an agent at any reliability threshold, because someone has to own the consequences when they go wrong. The principle is now codified in federal AI guidance: a human is assigned responsibility for the actions of an AI system.

An automated agent that revokes credentials, blocks network segments, or isolates hosts without a human check can cause more damage than the threat itself. James Nettesheim, CISO at Block, makes the same point: "We still want a human in the loop overall. We're extremely bullish on adopting agentic coding and analysis."

Designing the Handoff Between Agents and Analysts

Whether the platform is a SOAR, an MCP server, or any other orchestration system, the responsibility for initiating an action belongs to people, not the model. Everything else in the design (confidence routing, audit trails, approval gates) exists to make that responsibility workable in production.

Operationally, that means defining how confidence routing works, what evidence the agent must show, and where human approval is mandatory:

Confidence thresholds and escalation logic

Confidence routing works best when thresholds are explicit, alert-specific, and revisited over time. Teams typically use confidence tiers to route work: clearly benign alerts may be closed after enrichment, uncertain alerts go to human review, and high-confidence critical threats can trigger containment steps with stakeholder notification.

Tune those thresholds by alert type, and revisit them as you learn where the model is reliable and where it is not.
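The tiering described above can be sketched as a small routing function. The alert types, threshold values, and field names here are illustrative assumptions, not recommended settings; the real numbers should come from your own review data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Thresholds:
    auto_close_below: float  # malicious-confidence below this: close after enrichment
    contain_at_or_above: float  # malicious-confidence at or above this: containment path

# Hypothetical per-alert-type values, meant to be revisited over time.
THRESHOLDS = {
    "phishing": Thresholds(auto_close_below=0.10, contain_at_or_above=0.95),
    "impossible_travel": Thresholds(auto_close_below=0.05, contain_at_or_above=0.98),
}

# Unknown alert types can never be auto-closed or auto-contained.
DEFAULT = Thresholds(auto_close_below=0.0, contain_at_or_above=1.01)

def route(alert_type: str, malicious_confidence: float, severity: str) -> str:
    t = THRESHOLDS.get(alert_type, DEFAULT)
    if malicious_confidence < t.auto_close_below:
        return "close_after_enrichment"
    if malicious_confidence >= t.contain_at_or_above and severity == "critical":
        return "contain_and_notify"
    return "human_review"  # everything uncertain goes to an analyst

print(route("phishing", 0.02, "low"))        # close_after_enrichment
print(route("phishing", 0.50, "high"))       # human_review
print(route("phishing", 0.97, "critical"))   # contain_and_notify
print(route("new_alert_type", 0.99, "critical"))  # human_review
```

Defaulting unknown alert types to human review keeps the fail-safe direction explicit: an alert type earns automation only once someone has set thresholds for it.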

Showing the work: explainability and audit trails

Agents need to show their work clearly enough for an analyst to verify the disposition, which means every AI disposition needs a visible evidence chain. Agentic systems operating as black boxes make it difficult for analysts to understand why a particular decision was made, and that opacity is one of the main barriers practitioners cite when evaluating these systems.

Within Panther AI, Human in the Loop Tool Approval pauses before the AI SOC analyst executes sensitive actions and shows the proposed action, including the tool name and parameters, for review and approval. All decisions log to audit trails, addressing SOC 2, PCI-DSS, and ISO 27001 requirements.
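As a generic illustration of what an evidence chain might look like as data (this is not Panther's schema; every field name below is an assumption), a disposition can refuse to serialize into an audit record unless it carries at least one evidence step:

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

@dataclass
class EvidenceStep:
    tool: str     # which tool the agent called
    query: str    # what it asked
    finding: str  # what it concluded from the result

@dataclass
class Disposition:
    alert_id: str
    verdict: str
    confidence: float
    evidence: list = field(default_factory=list)
    decided_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def add_step(self, tool: str, query: str, finding: str) -> None:
        self.evidence.append(EvidenceStep(tool, query, finding))

    def to_audit_record(self) -> str:
        # A verdict with no supporting steps is not reviewable, so reject it.
        if not self.evidence:
            raise ValueError("disposition has no evidence chain")
        return json.dumps(asdict(self), sort_keys=True)

d = Disposition(alert_id="a-1", verdict="benign", confidence=0.93)
d.add_step("siem", "logins for user in last 24h", "all from known devices")
print(d.to_audit_record())
```

Serializing to JSON with sorted keys keeps records diff-friendly, which helps when an analyst compares two dispositions for the same alert type.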

Human-in-the-loop approval for sensitive actions

Human approval gates protect privileged actions; treat them as a security control, not a governance preference. Agents that both ingest untrusted input and execute high-privilege actions represent a demonstrated attack path, not a theoretical one.

In high-risk or less predictable settings, agents can suggest or prepare actions, but final decisions belong behind an explicit human approval gate.
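A minimal sketch of such a gate, assuming a hypothetical `approver` callback that presents the tool name and parameters to a human and returns their decision. The tool names are illustrative. With no approver wired in, sensitive actions stay pending rather than running.

```python
from typing import Callable, Optional

# Hypothetical set of hard-to-reverse tools that never run without approval.
SENSITIVE_TOOLS = {"revoke_credentials", "isolate_host", "block_network_segment"}

Approver = Callable[[str, dict], bool]

def execute_tool(name: str, params: dict, tools: dict,
                 approver: Optional[Approver] = None) -> dict:
    if name not in tools:
        return {"status": "rejected", "reason": "unknown tool"}
    if name in SENSITIVE_TOOLS:
        # The human sees exactly what would run: tool name plus parameters.
        approved = approver(name, params) if approver is not None else False
        if not approved:
            return {"status": "pending_approval", "tool": name, "params": params}
    return {"status": "executed", "result": tools[name](**params)}

# Stub tools for the example.
def isolate_host(host: str) -> str:
    return f"{host} isolated"

def search_logs(query: str) -> str:
    return f"results for {query}"

TOOLS = {"isolate_host": isolate_host, "search_logs": search_logs}

print(execute_tool("search_logs", {"query": "failed logins"}, TOOLS)["status"])
# executed
print(execute_tool("isolate_host", {"host": "laptop-42"}, TOOLS)["status"])
# pending_approval
print(execute_tool("isolate_host", {"host": "laptop-42"}, TOOLS,
                   approver=lambda tool, params: True)["status"])
# executed
```

The gate lives in the execution path rather than in the prompt, so a manipulated model cannot talk its way past it.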

Failure Modes to Plan for Before You Deploy Agents

Hallucinations, over-broad tool access, and declining analyst attention compound when teams trust an agent before constraining it. Plan for all three failure modes before you give an agent autonomy.

Hallucinated context and overconfident conclusions

Hallucinations become much more dangerous when the model can act on them through tools. Tool use with too much autonomy or permission is one of the best-documented agent risk patterns, classified as "Excessive Agency" in the OWASP Top 10 for LLM Applications. The failure mode is straightforward: the agent acts on a confident hallucination before a human sees it.

Mitigation starts with regular review of AI-closed work and with keeping evidence gathering distinct from consequential decisions wherever possible.

Tool sprawl and unbounded agent action

Every new tool grant expands the blast radius of a bad model decision or a manipulated integration. In one reported incident, a compromised postmark-mcp npm package injected a BCC field into email tool calls, silently exfiltrating outgoing email content. The agent's own logging showed authorized behavior while the exfiltration proceeded.

Apply least-privilege to every agent tool grant before deployment: triage agents get read-only access, and write actions are scoped to narrowly defined workflows.
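One way to make that policy checkable before deployment is to declare grants as data and validate them. The tool names, agent names, and read/write labels below are hypothetical.

```python
READ, WRITE = "read", "write"

# Hypothetical tool catalog: each tool is labeled by whether it changes state.
TOOL_ACCESS = {
    "search_logs": READ,
    "get_process_tree": READ,
    "isolate_host": WRITE,
    "send_email": WRITE,
}

# Hypothetical grants: triage is read-only, containment gets one scoped write tool.
AGENT_GRANTS = {
    "triage_agent": {"search_logs", "get_process_tree"},
    "containment_agent": {"search_logs", "isolate_host"},
}

READ_ONLY_AGENTS = {"triage_agent"}

def grant_violations(grants: dict = AGENT_GRANTS) -> list:
    """Return policy problems; an empty list means the grants pass."""
    problems = []
    for agent, tools in grants.items():
        for tool in sorted(tools):
            if tool not in TOOL_ACCESS:
                problems.append(f"{agent}: unknown tool {tool}")
            elif agent in READ_ONLY_AGENTS and TOOL_ACCESS[tool] == WRITE:
                problems.append(f"{agent}: write tool {tool} on a read-only agent")
    return problems

def is_allowed(agent: str, tool: str) -> bool:
    # Deny by default: unknown agents and ungranted tools are both refused.
    return tool in AGENT_GRANTS.get(agent, set())

print(grant_violations())                          # []
print(is_allowed("triage_agent", "isolate_host"))  # False
```

Running `grant_violations()` in CI turns a new tool grant into a reviewed change instead of a silent expansion of blast radius.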

Skill atrophy when analysts stop reviewing the easy alerts

When agents handle all routine triage, analysts lose the repetitive practice that builds investigative judgment and pattern recognition. Decades of automation research show that extended error-free automation periods can reduce vigilance and erode operator skills.

Preserve opportunities for analysts to work directly from raw alerts and operate without AI assistance at least some of the time.

A Staged Rollout That Earns Trust Before Granting Autonomy

The safest way to deploy agents is to earn autonomy through measured performance in your environment. Deploying broadly before validating on your specific environment creates avoidable risk, especially for lean teams. Trust in agent decisions has to be measured, not assumed.

Start where outcomes are easy to compare, expand only when reliability is stable, and feed the useful patterns back into durable detection rules and runbooks.

1. Start narrow on high-volume, low-risk alert types

Start with alert classes where humans can easily compare the agent's recommendation to a known-good review process. Run agents in shadow mode first: the agent produces verdicts alongside human triage, but humans make all actual decisions. Begin with phishing alerts, identity anomalies (impossible travel, new device logins), and known-bad indicator matches.

Clean up the detection layer before layering agents on top, because agents will inherit and amplify whatever false positives already exist. Docker's security team, for example, reported managing a 3x increase in log volume while reducing false positive alerts by 85% year over year after deploying Panther.

Build human-in-the-loop oversight into the workflow from day one, not as an afterthought once the agent is already in production.
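Shadow mode produces pairs of verdicts for the same alert, one from the agent and one from the analyst who actually made the call. A small report over those pairs, sketched here with assumed verdict labels of "benign" and "malicious", shows where the two disagree and in which direction.

```python
from collections import Counter

def shadow_report(pairs) -> dict:
    """pairs: iterable of (agent_verdict, human_verdict) for the same alerts."""
    counts = Counter()
    for agent, human in pairs:
        if agent == human:
            counts["agree"] += 1
        elif agent == "benign" and human == "malicious":
            counts["missed_threat"] += 1  # the costly direction
        elif agent == "malicious" and human == "benign":
            counts["false_alarm"] += 1
        else:
            counts["other_disagreement"] += 1
    total = sum(counts.values())
    return {
        "total": total,
        "agreement_rate": counts["agree"] / total if total else 0.0,
        "missed_threat": counts["missed_threat"],
        "false_alarm": counts["false_alarm"],
        "other_disagreement": counts["other_disagreement"],
    }

pairs = [
    ("benign", "benign"),
    ("malicious", "malicious"),
    ("benign", "malicious"),
    ("malicious", "benign"),
]
print(shadow_report(pairs))
# {'total': 4, 'agreement_rate': 0.5, 'missed_threat': 1, 'false_alarm': 1, 'other_disagreement': 0}
```

Separating missed threats from false alarms matters because an overall agreement rate hides which kind of error the agent makes.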

2. Expand scope as reliability becomes measurable

Expand scope only after the current scope has demonstrated stable performance, not out of enthusiasm about what the model might do next. The metrics that matter are the false negative rate on auto-closed alerts, the direction of the false positive rate, the alert backlog, and analyst hours reclaimed. Review them over time rather than relying on a single measurement.
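A sketch of an expansion gate built on those metrics follows. It assumes that "false negative rate on auto-closed alerts" means the share of auto-closed alerts later confirmed malicious, and the limits (1% and four review periods) are placeholders rather than recommendations.

```python
from dataclasses import dataclass

@dataclass
class PeriodMetrics:
    auto_closed: int
    auto_closed_confirmed_malicious: int  # found later through review or incidents
    false_positives: int
    backlog: int

def false_negative_rate(m: PeriodMetrics) -> float:
    if m.auto_closed == 0:
        return 0.0
    return m.auto_closed_confirmed_malicious / m.auto_closed

def ready_to_expand(history: list, max_fn_rate: float = 0.01,
                    min_periods: int = 4) -> bool:
    """Require several consecutive good periods, not one good measurement."""
    if len(history) < min_periods:
        return False
    recent = history[-min_periods:]
    fn_ok = all(false_negative_rate(m) <= max_fn_rate for m in recent)
    fp_not_rising = all(
        later.false_positives <= earlier.false_positives
        for earlier, later in zip(recent, recent[1:])
    )
    backlog_not_rising = recent[-1].backlog <= recent[0].backlog
    return fn_ok and fp_not_rising and backlog_not_rising

history = [
    PeriodMetrics(1000, 5, 120, 300),
    PeriodMetrics(1100, 4, 110, 280),
    PeriodMetrics(1200, 3, 100, 250),
    PeriodMetrics(1300, 2, 90, 200),
]
print(ready_to_expand(history))      # True
print(ready_to_expand(history[:2]))  # False
```

Analyst hours reclaimed is left out of the gate on purpose: it measures benefit, while the gate only checks that reliability has not degraded.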

3. Codify what the agent learns back into detection rules and runbooks

The long-term value of agent-based workflows comes from turning repeated patterns into durable operational assets. Feed agent findings back into version-controlled detection rules and human-readable runbooks. When an agent consistently closes alerts matching specific attribute patterns, those patterns are candidates for suppression rules that eliminate false positives before they reach the agent.

When an agent consistently follows the same investigation sequence for a class of alerts, formalize that sequence as an explicit runbook. Detection-as-code makes this workflow natural: a new rule derived from agent findings becomes a pull request with tests and CI/CD integration.
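The feedback loop can be sketched in two parts: mining agent closures for attribute patterns that are almost always benign, then encoding a reviewed pattern as an exclusion in a rule that ships with its own tests. The field names, thresholds, and the `backup-service` value are hypothetical, and the rule uses plain dictionaries instead of any specific platform's event API.

```python
from collections import defaultdict

def suppression_candidates(closures, keys=("rule_id", "source_app"),
                           min_count=20, min_benign_ratio=0.99) -> list:
    """Find attribute patterns the agent closes as benign almost every time."""
    stats = defaultdict(lambda: [0, 0])  # pattern -> [benign_count, total_count]
    for closure in closures:
        pattern = tuple(closure.get(k) for k in keys)
        stats[pattern][1] += 1
        if closure.get("verdict") == "benign":
            stats[pattern][0] += 1
    return [
        pattern for pattern, (benign, total) in stats.items()
        if total >= min_count and benign / total >= min_benign_ratio
    ]

# After human review, a candidate becomes an explicit, version-controlled exclusion.
SUPPRESSED_SOURCE_APPS = {"backup-service"}  # hypothetical approved value

def rule(event: dict) -> bool:
    """Alert on large outbound transfers, except from suppressed sources."""
    if event.get("source_app") in SUPPRESSED_SOURCE_APPS:
        return False
    return event.get("bytes_out", 0) > 1_000_000_000

def test_rule() -> None:
    # Tests travel with the rule in the same pull request.
    assert rule({"source_app": "unknown-tool", "bytes_out": 5_000_000_000})
    assert not rule({"source_app": "backup-service", "bytes_out": 5_000_000_000})
    assert not rule({"source_app": "unknown-tool", "bytes_out": 10})

test_rule()
```

Candidates are only suggestions: a person still approves each suppression, so the exclusion list stays an accountable decision and not a side effect of agent behavior.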

Building a SOC Where Agents and Analysts Amplify Each Other

This model works best when humans spend their time on work that genuinely requires human judgment: novel threats, ambiguous signals, decisions tied to organizational context, and actions you can't easily undo.

Agents handle the volume; humans handle the judgment. Orchestration is what connects them, and a detection-as-code pipeline is what compounds it — every investigation, whether an agent or an analyst ran it, feeds the next one.

The teams that get this right won't be the ones with the most autonomous agents. They'll be the ones who design the sharpest boundary between what agents handle and what humans handle, then move that boundary deliberately as evidence accumulates.

Panther follows the same model: specialist agents for speed, human-in-the-loop approval for sensitive actions, and full audit trails so lean teams stay in control of the decisions that matter.

See how Panther can help your team build this balance.