AI Incident Response: Where Agents Help and Where Analysts Still Lead

Michelle Dufty

Jun 14, 2026

Every SOC running AI agents in production hits the same question eventually: which parts of incident response can agents actually own, and which parts blow up the moment you hand them over?

The volume pressure is real. Most teams can't keep pace with incoming alerts, and LLM-assisted triage cuts ticket completion time by roughly 40% in tested workflows. Those numbers are pulling security leaders toward broader automation.

But the teams getting it right aren't automating broadly. They're drawing clear lines between work agents can absorb (alert triage, log correlation, incident summaries) and work that still requires analyst judgment (novel scoping, legal calls, adversarial reasoning).

This article walks through where AI agents reliably help during an incident, where analysts still need to lead, the predictable failure modes when agents touch IR, and how to build a workflow that gets the division right.

Key Takeaways:

AI agents have shown promising results in alert triage, log correlation, and remediation recommendations.
Human analysts remain essential for scoping novel incidents, making legal and reputational calls, adversarial reasoning, and translating lessons learned into better detection rules.
Predictable failure modes (hallucinated conclusions, siloed data, cascading automation) require explicit governance: permission tiers, circuit breakers, and human escalation paths.
Building an AI-augmented IR workflow starts with telemetry depth and data quality, not the AI layer itself.

Where AI Fits Into the Incident Response Lifecycle

Before mapping where agents help and where analysts lead, it's worth scoping what this article covers. The focus here is on using AI agents inside SOC workflows: speeding up detection, triage, investigation, containment, and remediation, with humans still designing the procedural logic. This is force multiplication. Agents operate continuously, without the staffing constraints that prevent small teams from maintaining 24/7 coverage.

A separate problem, responding to incidents where AI systems are the target or vector (prompt injection, model poisoning, adversarial inputs, model theft), is covered by frameworks like MITRE ATLAS and is out of scope here.

Where AI Agents Reliably Help During an Incident

AI agents help most when the work is data-heavy, repeatable, and high-volume. In practice, that usually means tasks where the evidence is structured, the workflow is familiar, and the cost of delay is higher than the cost of drafting a first pass.

Alert triage and enrichment at machine speed

Alert triage is where AI agents deliver the most measurable ROI. LLM-assisted triage reduces ticket completion time, with the most pronounced gains on tickets that require cross-tool investigation. Use agents for bounded, well-defined triage tasks where the evidence is structured and the workflow is familiar, and keep analysts on higher-consequence decisions that require organizational context.

Log correlation across high-volume telemetry

AI agents correlate logs across disparate cloud telemetry faster than any analyst can manually. The value compounds when correlation spans identity, network, and workload sources in a single query, surfacing a credential anomaly in Okta alongside the unusual API call it enabled in AWS without an analyst pivoting between consoles.
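A minimal sketch of that kind of cross-source correlation, assuming simplified event records. The field names (`user`, `ts`, `event`), the 30-minute window, and the sample data are hypothetical and do not reflect actual Okta or AWS log schemas.

```python
from datetime import datetime, timedelta

def correlate(identity_events, cloud_events, window=timedelta(minutes=30)):
    """Pair each identity anomaly with cloud API calls made by the same
    user within `window` after the anomaly."""
    pairs = []
    for anomaly in identity_events:
        for call in cloud_events:
            same_user = call["user"] == anomaly["user"]
            delta = call["ts"] - anomaly["ts"]
            if same_user and timedelta(0) <= delta <= window:
                pairs.append((anomaly, call))
    return pairs

# Hypothetical sample data for illustration only.
identity_events = [
    {"user": "alice", "ts": datetime(2026, 1, 5, 9, 0), "event": "login_from_new_country"},
]
cloud_events = [
    {"user": "alice", "ts": datetime(2026, 1, 5, 9, 10), "event": "CreateAccessKey"},
    {"user": "alice", "ts": datetime(2026, 1, 5, 12, 0), "event": "ListBuckets"},
    {"user": "bob", "ts": datetime(2026, 1, 5, 9, 5), "event": "CreateAccessKey"},
]

matches = correlate(identity_events, cloud_events)
for anomaly, call in matches:
    print(anomaly["user"], anomaly["event"], "->", call["event"])
```

Only the call made by the same user shortly after the anomaly is paired. At production volume this join would normally run as a query in the data lake rather than as nested loops in application code.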

Drafting incident summaries, timelines, and stakeholder updates

Drafting incident documentation is the highest-confidence AI use case in SOC workflows. Letting LLMs handle first drafts of incident reports delivers significant time savings: the AI surfaces indicators, suggests next steps, and summarizes findings, and practitioners review, edit, and approve the draft rather than writing from scratch.

Recommending remediation paths from historical case data

AI remediation recommendations work best when they draw on your organization's historical incident data, threat intelligence feeds, and MITRE ATT&CK mappings. When a new phishing campaign hits, the agent surfaces how your team contained similar incidents last quarter and which containment steps resolved fastest.
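As a rough illustration, the lookup can be as simple as filtering past cases by category and ranking them by time to contain. The record fields and sample incidents below are hypothetical; a real agent would also weigh threat intelligence and ATT&CK mappings, which this sketch leaves out.

```python
def recommend(history, category, top_n=2):
    """Return containment steps from past incidents of the same category,
    ordered from fastest to slowest time to contain."""
    similar = [h for h in history if h["category"] == category]
    similar.sort(key=lambda h: h["minutes_to_contain"])
    return [h["containment_step"] for h in similar[:top_n]]

# Hypothetical case history for illustration only.
history = [
    {"category": "phishing", "containment_step": "purge message from mailboxes", "minutes_to_contain": 25},
    {"category": "phishing", "containment_step": "reset affected credentials", "minutes_to_contain": 90},
    {"category": "malware", "containment_step": "isolate host", "minutes_to_contain": 15},
    {"category": "phishing", "containment_step": "block sender domain", "minutes_to_contain": 40},
]

suggestions = recommend(history, "phishing")
print(suggestions)
```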

Where Analysts Still Lead

Human analysts still lead when the work depends on judgment, context, and reasoning beyond known patterns. The four areas below cover where that line falls in practice.

Scoping novel incidents that don't match prior patterns

Novel incident scoping still depends on analyst judgment. When attackers operate through legitimate tools and credentials, no pattern-matching engine can fully scope the intrusion. Pattern-matching systems struggle to generalize under rapid environmental change, especially when signals are sparse, noisy, or incomplete.

Scoping such an intrusion requires an analyst who can reason about what a human attacker would have done given their access level and dwell time: forward-looking, hypothetical reasoning that pattern-matching can't replicate.

Decisions with organizational, legal, or reputational weight

Decisions around breach notification, evidence preservation, and public disclosure carry liability and reputational consequences no AI agent can be held accountable for. Determining which jurisdictions apply and when the notification clock started requires organizational context no AI agent has.

Adversarial reasoning and threat hunting beyond the known

Agentic AI can carry a hunting hypothesis across tools, pull evidence autonomously, and surface patterns across millions of events. What it can't do is reason about a specific adversary's intent against your specific environment, the part of threat hunting that depends on human curiosity and organizational context.

As Brandon Kovitz, Senior Manager of Detection Response at Outreach, puts it, "The human understanding of intent is something that AI is never going to replace."

Codifying lessons learned back into detection rules

Detection engineering remains a human-led task because writing a detection rule requires deciding what to detect and why that behavior is meaningful. When this work gets skipped, out-of-the-box AI/ML tooling reflects generic assumptions instead of your environment.

Failure Modes to Plan For When Agents Touch Incident Response

AI agents introduce predictable failure modes in incident response, and you can plan for them in advance. Bad output in isolation is the surface risk. The bigger problem is how weak reasoning, partial evidence, or excessive permissions propagate through a live workflow.

Hallucinated conclusions and overconfident triage

LLMs can produce confident, false conclusions during active investigations, a failure mode formally called confabulation, where the model produces false content that can contradict its own prior analysis. Hallucinations and other inaccurate outputs require human review before action, even when the model sounds confident.

In one documented case, an LLM misrepresented a commit ID as a SHA1 hash IOC, sending analysts down an incorrect investigative path.
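One way to catch this class of error before an analyst acts on it is a grounding check: every hash the model cites must already exist in the collected evidence. A format check alone would not help here, because a full Git commit ID in a SHA-1 repository is also 40 hexadecimal characters. The sketch below is an assumption about how such a check could look, with hypothetical function names and sample data.

```python
import re

# Matches any standalone 40-character hexadecimal string.
HEX40 = re.compile(r"\b[0-9a-fA-F]{40}\b")

def ungrounded_hashes(llm_summary, evidence_hashes):
    """Return 40-hex strings cited in the summary that never appear
    in the evidence the investigation actually collected."""
    known = {h.lower() for h in evidence_hashes}
    cited = {m.lower() for m in HEX40.findall(llm_summary)}
    return sorted(cited - known)

# Hypothetical evidence set and model output for illustration only.
evidence_hashes = {"da39a3ee5e6b4b0d3255bfef95601890afd80709"}
summary = (
    "Malicious payload da39a3ee5e6b4b0d3255bfef95601890afd80709 was dropped; "
    "related IOC 0123456789abcdef0123456789abcdef01234567 also observed."
)

flagged = ungrounded_hashes(summary, evidence_hashes)
print(flagged)
```

Anything in `flagged` goes to a human for review instead of being treated as an indicator.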

Missing context from siloed data and shallow log retention

AI agents produce weaker conclusions when they reason over incomplete evidence. An AI investigation often depends on chained queries across SIEM, EDR, CTI, and asset context. When one link is missing, the agent can still return an answer from partial evidence, just a less reliable one.
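A simple guard is to make the missing links explicit, so a conclusion built on partial evidence is labeled as such. This is a sketch under assumptions: the source names and result structure are hypothetical, and an empty result is treated as a gap even though it can sometimes be a legitimate "nothing found".

```python
REQUIRED_SOURCES = ("siem", "edr", "cti", "asset_context")

def evidence_gaps(query_results):
    """List required sources that returned no data (missing key or empty result)."""
    return [s for s in REQUIRED_SOURCES if not query_results.get(s)]

def label_conclusion(conclusion, query_results):
    """Attach evidence gaps to a conclusion and flag it for review if any exist."""
    gaps = evidence_gaps(query_results)
    return {
        "conclusion": conclusion,
        "missing_sources": gaps,
        "needs_review": bool(gaps),
    }

# Hypothetical chained-query results: EDR returned nothing, CTI was never queried.
results = {
    "siem": [{"alert": "impossible_travel"}],
    "edr": [],
    "asset_context": [{"host": "laptop-042"}],
}

labeled = label_conclusion("likely benign VPN use", results)
print(labeled)
```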

Data supply chain vulnerabilities, maliciously modified data, and data drift all degrade AI system accuracy. Data integrity is the foundation under every output your agent produces.

Cascading automation without circuit breakers

Automated systems without staged rollback controls can turn a single error into a broad outage. On July 19, 2024, a faulty CrowdStrike Falcon sensor content update affected Windows systems on a global scale. A single automated change propagated with nothing in place to catch it.

For AI agents with remediation authority, the risk is sharper. A traditional automation typo is deterministic and traceable, but an AI agent "evaluates context, makes a probability decision, and executes it", and failures can occur in milliseconds.
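A circuit breaker for agent actions can be sketched as a rate limit that trips and stays tripped until a human resets it. The class name, thresholds, and the decision to require a manual reset are assumptions for illustration, not a description of any specific product.

```python
from collections import deque

class ActionCircuitBreaker:
    """Trips when more than `max_actions` automated actions are attempted
    inside a sliding window, then blocks everything until reset by a human."""

    def __init__(self, max_actions, window_seconds):
        self.max_actions = max_actions
        self.window_seconds = window_seconds
        self.timestamps = deque()
        self.tripped = False

    def allow(self, now):
        """Return True if the action may run at time `now` (in seconds)."""
        if self.tripped:
            return False
        # Drop actions that have aged out of the sliding window.
        while self.timestamps and now - self.timestamps[0] >= self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.max_actions:
            self.tripped = True
            return False
        self.timestamps.append(now)
        return True

    def reset(self):
        """Manual reset after a human has reviewed what the agent was doing."""
        self.timestamps.clear()
        self.tripped = False

breaker = ActionCircuitBreaker(max_actions=3, window_seconds=60)
results = [breaker.allow(t) for t in (0, 1, 2, 3, 100)]
print(results)
```

In this sketch the fourth action within the window trips the breaker, and the fifth is still blocked even though the window has passed, because only `reset()` reopens it. That design choice trades some availability for a guaranteed human checkpoint.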

Building an AI-Augmented Incident Response Workflow

An AI-augmented incident response workflow depends on process and data foundations before automation. The order matters because agents amplify whatever inputs, permissions, and review controls you already have.

1. Establishing telemetry depth as the foundation for agent reasoning

Telemetry depth sets the ceiling for agent reasoning quality. If your logs don't capture an action, your agent can't reason about it. And the gaps a skilled analyst might recognize and compensate for will silently degrade AI outputs.

You need coverage across control plane, identity, network, and workload telemetry. Without that foundation, you're automating garbage in, garbage out, at accelerated speed.

2. Defining agent boundaries, tool permissions, and human escalation points

Agent boundaries determine whether automation stays useful and safe. Agentic AI reasons through a problem and acts on it, creating a different trust surface than scripted SOAR playbooks. Tier your agent permissions. Read-only and enrichment tasks can run without pre-approval: pulling threat intel, correlating logs, enriching alerts.

Reversible containment actions, like isolating a host or disabling a session, should be weighed against false positive risk and business impact, with incident handlers notified the moment they execute. Irreversible or high-impact actions, such as wiping endpoints, revoking credentials at scale, or modifying production policy, require explicit human approval before execution.

That matches the operating model Matt Muller, Field CISO at Tines, argues for: "AI assisted humans are going to be the ones who are most successful. AI with guard rails is going to be, I think, the path forward for the foreseeable future."
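The three tiers described above can be expressed as a small policy table. This is a minimal sketch: the action names are hypothetical, the false positive and business impact weighing for reversible actions is omitted, and treating unknown actions as the strictest tier is an assumption of this example.

```python
from enum import Enum

class Tier(Enum):
    READ_ONLY = "read_only"
    REVERSIBLE = "reversible"
    IRREVERSIBLE = "irreversible"

# Hypothetical action names mapped to the tiers from the text.
ACTION_TIERS = {
    "enrich_alert": Tier.READ_ONLY,
    "correlate_logs": Tier.READ_ONLY,
    "isolate_host": Tier.REVERSIBLE,
    "disable_session": Tier.REVERSIBLE,
    "wipe_endpoint": Tier.IRREVERSIBLE,
    "modify_production_policy": Tier.IRREVERSIBLE,
}

def decide(action, human_approved=False):
    """Return whether an agent action may execute and whether handlers are notified."""
    # Assumption: anything not explicitly listed is treated as irreversible.
    tier = ACTION_TIERS.get(action, Tier.IRREVERSIBLE)
    if tier is Tier.READ_ONLY:
        return {"execute": True, "notify_handlers": False}
    if tier is Tier.REVERSIBLE:
        return {"execute": True, "notify_handlers": True}
    return {"execute": human_approved, "notify_handlers": True}

print(decide("enrich_alert"))
print(decide("isolate_host"))
print(decide("wipe_endpoint"))
print(decide("wipe_endpoint", human_approved=True))
```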

3. Mapping agent actions to the IR lifecycle

Mapping agent actions to the incident response lifecycle keeps automation constrained to the phases where it performs well. The PICERL framework (Preparation, Identification, Containment, Eradication, Recovery, Lessons Learned) remains the operational backbone. AI agents fit cleanly into Identification (alert normalization, correlation, enrichment) and portions of Containment (executing reversible actions on high-confidence alerts). Preparation and Lessons Learned remain human-led.

4. Measuring agent performance with the same rigor as analyst performance

Agent measurement should use the same rigor you apply to analyst measurement. Track the same core incident metrics for agents and analysts: mean time to detect, mean time to contain, false positive rate. Then add agent-specific metrics: auto-close accuracy (validated through weekly human review), escalation precision, and agent action reversal rate.

Those three agent-specific numbers tell you whether your agents are earning the trust you've given them, or whether they're quietly accumulating errors that will surface during your next real incident.
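The agent-specific metrics reduce to three ratios. The definitions in this sketch are assumptions, since the terms can be defined in more than one way: auto-close accuracy is the share of human-reviewed auto-closed alerts the reviewer agreed with, escalation precision is the share of escalations that turned out to be real, and reversal rate is the share of executed agent actions that were later undone. All counts are made up for illustration.

```python
def rate(numerator, denominator):
    """Safe ratio: returns None when there is nothing to measure yet."""
    return numerator / denominator if denominator else None

def agent_metrics(reviewed_auto_closes, correct_auto_closes,
                  escalations, true_positive_escalations,
                  actions_executed, actions_reversed):
    """Compute the three agent-specific metrics from raw counts."""
    return {
        "auto_close_accuracy": rate(correct_auto_closes, reviewed_auto_closes),
        "escalation_precision": rate(true_positive_escalations, escalations),
        "action_reversal_rate": rate(actions_reversed, actions_executed),
    }

# Hypothetical weekly counts.
weekly = agent_metrics(
    reviewed_auto_closes=200, correct_auto_closes=190,
    escalations=50, true_positive_escalations=40,
    actions_executed=80, actions_reversed=4,
)
print(weekly)
```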

How Panther Supports AI Incident Response

Panther approaches AI incident response by connecting telemetry, detection logic, and review controls in one workflow. Panther is a complete AI SOC and security analytics platform. Its closed-loop architecture connects three components: a Security Data Lake (on your own Snowflake or Databricks instance), a detection-as-code engine with version control and CI/CD, and Panther AI, which has access to detection logic, the security data lake, and organizational context. Confirmed investigation outcomes feed back into the detection rules that fired them.

In practice, Panther's approach centers on three controls:

AI SOC analyst investigates alerts by pulling context across the security data lake and connected tools via MCP.
AI: Human in the Loop Tool Approval pauses high-stakes actions for explicit analyst approval, with all decisions logged in audit trails.
AI Detection Builder turns a plain-language behavior description into a production-ready rule with test cases.

Panther emphasizes flexible data retention and storage options, including deployments in customer-managed Snowflake environments. Retention is often the limiting factor: Cockroach Labs hit that limit with their legacy SIEM, which forced retention down from 90 to 30 days. After moving to Panther, they ingested 5x more logs while cutting SecOps costs by $200K+, with 365 days of hot storage. That is the telemetry foundation described in step 1 of the workflow above.

Designing a SOC Where Agents and Analysts Each Do What They're Best At

The division of labor between AI agents and human analysts is durable. Novel incidents, legal decisions, adversarial reasoning, and detection engineering feedback all operate above the pattern-matching boundary by definition. The goal is a workflow where agents handle data-heavy, repeatable work at machine speed, and analysts focus their judgment where it actually changes outcomes.

Panther's architecture supports this split by giving agents deep telemetry access while keeping analysts in control of high-stakes decisions.

Get both sides right: measure agent and analyst performance with equal rigor, plan for the failure modes documented above, and build on telemetry foundations deep enough to support reliable AI reasoning. The teams that do will run faster and miss less.