How AI is changing the SOC operating model. Listen now →

close

How AI is changing the SOC operating model. Listen now →

close

BLOG

Agentic AI Security Risks: What Changes When AI Can Take Action

Your team deployed an AI agent two weeks ago to handle routine cloud operations: reading logs, calling APIs, modifying configurations, escalating tickets. No one clicks "approve" on any of it. This morning, you notice the agent touched a production database it was never scoped to access. Valid credentials, authorized API calls, nothing in your SIEM flagged it.

That's the pattern attackers are exploiting. The agent wasn't compromised in the traditional sense. It did what it was asked to do, with permissions it technically had, across systems that each logged their piece but stitched none of it together. This is no longer theoretical: attackers are targeting agent identities, orchestration layers, and supply chains, not just model outputs.

This article covers what makes agentic AI security different from generative AI security, the attack surface that opens up when models can take action, the risks already showing up in production, and the identity, logging, and detection controls that actually work for autonomous agents.

Key Takeaways:

  • Agentic AI creates fundamentally different risks because agents act autonomously, persist memory across sessions, and chain operations across systems without human approval.

  • The new attack surface includes exploitable MCP servers, tool poisoning, memory injection, and inter-agent message manipulation.

  • Traditional IAM, detection logic, and logging weren't built for autonomous decision-makers and miss the reasoning chains behind agent actions.

  • Defense starts with centralized agent logging, scoped agent identities, human-in-the-loop controls for high-impact actions, and detection rules targeting agent abuse patterns.

What Sets Agentic AI Apart From Generative AI

Agentic AI changes your threat model because agents can act, persist context, and operate across connected systems. That difference shows up in three places: action-taking authority, persistent memory, and multi-step autonomy.

Action-taking authority replaces passive output

This is the first security change that matters, and probably the most important one. Agents call APIs, write to databases, modify files, and send emails autonomously through tool use. The practical difference is straightforward. A successful prompt injection against a chatbot produces a bad answer.

The same attack against an agent can execute commands in connected systems.

Persistent identity and memory across sessions

Compromise lasts longer when memory does. Agents maintain short-term working memory, long-term memory stores, and persistent credentials across sessions, turning memory into a durable attack surface. A single corrupted memory entry can persist across sessions and propagate to other agents, compounding errors over time.

It's enough of a problem that agent memory and context manipulation is now treated as its own vulnerability class. Tool orchestration and memory management are the two areas where multi-agent attacks concentrate, because both are shared surfaces that any agent in the cluster can read from or write to.

Multi-step autonomy across connected systems

When agents chain steps, one compromised step doesn't stay contained. Agents break goals into subtasks, call tools in sequence, spawn sub-agents, and adapt based on intermediate results, all without human checkpoints. As integrations grow, a compromised step can cascade through interconnected systems.

Traditional threat modeling methods like STRIDE and PASTA were built for systems with fixed components and predictable data flows. Multi-agent systems break both assumptions: components spawn dynamically, and decisions emerge from interactions between agents rather than from a single execution path.

Existing frameworks are useful starting points, but the threat modeling work itself has to be done against your specific agent topology.

The New Attack Surface When AI Can Act

When AI can act, the attack surface shifts into the systems that give agents autonomy. Two surfaces are where exposure appears first: the tools and protocols agents call, and the data and messages they treat as trusted context.

Tools, APIs, and MCP servers as exploitable entry points

Tools and MCP servers create direct trust boundaries that attackers can target. MCP (Model Context Protocol) servers connect agents to tools, APIs, and data sources. Each server is an independently attackable trust boundary. Multiple critical CVEs have already landed, including CVE-2025-49596 and CVE-2025-59528. The attack surface has expanded to include attacks like prompt injection that target the agent layer directly, not just the underlying model.

Tool poisoning exploits how agents process tool descriptions. A malicious tool description can influence or override how the model uses other trusted tools, even ones the agent has used safely before.

Memory, context, and inter-agent messages as injection vectors

Memory and message channels let attacker instructions blend into normal context. Indirect prompt injection works because agents can't tell the difference between legitimate data and embedded attacker instructions in the content they retrieve. The risk is highest when an agent combines private data access, untrusted content, and external communication paths.

The first real-world instance of malicious indirect prompt injection was documented against an AI-based ad review system. Spoofed inter-agent messages can misdirect entire agent clusters, because most current frameworks don't enforce separation between them.

The Most Pressing Agentic AI Security Risks Today

The most urgent risks are already showing up in deployed systems. The list below groups the main abuse patterns by operational impact so you can prioritize controls and detection coverage.

1. Prompt injection that triggers real-world actions

Unlike prompt injection against a chatbot, prompt injection against an agent doesn't just produce a bad answer, it can trigger writes, API calls, or external requests in connected systems. CVE-2026-21520 is an information disclosure vulnerability in Microsoft Copilot Studio that allows an unauthenticated attacker to view sensitive information over a network.

GitHub Copilot (CVE-2025-53773, CVSS v3 7.8) was confirmed vulnerable to RCE via indirect prompt injection.

2. Excessive agency and over-permissioned agents

Excessive permissions turn agent mistakes or manipulation into broader system impact. There's a name for it: excessive agency, agents with too much functionality, too many permissions, or too much autonomy, where manipulated outputs trigger damaging real-world actions. Hardcoded shared API credentials and overbroad agent privileges in production deployments turn a single compromise into a much larger one.

The broader problem is privilege escalation through scope creep, every additional permission an agent picks up over time becomes a permission an attacker inherits if the agent is compromised.

3. Identity sprawl from non-human agent accounts

Non-human agent identities create a governance problem at scale. AI agents authenticate with API keys, service accounts, and OAuth tokens, and there are more of them every month than your IAM team can keep track of.

As agent deployments scale, every new agent adds another set of long-lived credentials, another permission scope to audit, and another identity that doesn't fit cleanly into human-centric IAM workflows.

4. Supply chain risks in agent frameworks and MCP servers

Framework and MCP supply chains can expose agents to classic software weaknesses through autonomous execution paths. The frameworks and servers powering agent toolchains carry the same classic vulnerabilities, path traversal, command injection, code injection, but now they're exposed to autonomous callers that won't pause to question suspicious behavior.

Multiple critical CVEs were disclosed within months of MCP's introduction, and the supply chain and execution layers remain the most active areas of exposure.

5. Memory poisoning and context manipulation

Memory poisoning lets an attacker plant a single instruction that silently influences agent behavior across future sessions, without needing to maintain continuous access.

Some agent platforms automatically inject memory contents into every new session's context, meaning memory poisoned in one session causes malicious behavior to execute silently in subsequent sessions. In sleeper scenarios, the compromise doesn't surface for weeks.

Why Traditional Security Controls Fall Short

Traditional controls miss agent abuse because they were built around human users and fixed automation. The mismatch shows up first in three places: identity, detection logic, and auditability.

IAM wasn't built for autonomous decision-makers

IAM was built for two kinds of identities: humans, who authenticate once and stay relatively static, and service accounts, which run fixed code paths with fixed permissions. Agents are neither. Their access needs emerge at runtime based on task context, and they shift across systems within a single execution.

Coarse-grained role assignments, single-entity accounts, and limited non-human identity coverage all break under that model.

Detection logic struggles to capture intent, not just behavior

Authorized actions are not enough to tell you whether an agent acted within scope. SIEM detection asks whether an action is anomalous for an identity. For agents, the real question is whether the action matches the agent's authorized intent, and intent isn't a field in any log format.

Researchers documented a case where a malicious agent exploited an established cross-agent session to deliver covert instructions. All traffic was authenticated, all API calls authorized, and no individual action was anomalous. Traditional detection would classify the entire attack as normal.

Logging gaps hide agent reasoning chains

API-level logs do not capture the reasoning path behind agent actions. When an agent touches multiple SaaS apps and cloud services in a single task chain, each system logs independently, with no way to stitch events into a causally linked trail. Without a record of inputs, outputs, and the decision chain that connected them, you can't answer the most important post-incident question: why did the agent do that?

How to Detect and Defend Against Agentic AI Threats

You need visibility first, then controls that match how agents actually operate. The steps below move from log collection to identity, approvals, and detection rules that focus on agent-specific abuse patterns.

1. Centralize agent activity logs in your SIEM

Centralized logging is the foundation for every other control in this article. You can't detect what you can't see. At minimum, you need: every agent decision, every tool call and its outcome, token usage per session, and a full audit trail. That's the floor for any meaningful detection or post-incident review. None of this works without the underlying detection and alerting pipelines doing their job first.

Start with tool_call and session_id events before you try behavioral baselining. Panther supports this with 60+ native connectors and a direct HTTP ingestion API, so you can pipe agent activity logs into the same data lake as your cloud and identity telemetry.

Docker case study shows a similar challenge scaling cloud log ingestion across multiple providers. By implementing detection-as-code workflows, they reduced false positives by 85% while tripling log ingestion and improving cross-cloud visibility. That same foundation, centralized, queryable logs, is the prerequisite for every detection strategy that follows.

2. Treat agents as first-class identities, not service accounts

Agents need their own identities and scopes. Each agent should have scoped permissions, tied to the specific tasks it's authorized to perform. Think in two dimensions: access TO the agent (who can configure or modify it) and access BY the agent (which systems it can reach during execution).

Use short-lived certificates or workload identity federation rather than long-lived API keys. Start every agent at the most restrictive autonomy level, human approval required on every action, and require documented governance sign-off before expanding scope.

There's a four-scope model you can point at if you need a formal structure.

3. Keep humans in the loop for high-impact actions

Human approval is the safest default for the actions that can cause the most damage. For a lean team, the most defensible starting point is tool-level human-in-the-loop controls on three categories: any outbound network call to an external endpoint, any write or delete on production data, and any credential or permission change. 

These map to the highest-severity stages of agent abuse. Successful security operations rely on humans assisted by AI, operating within defined guardrails, not AI acting unsupervised on high-impact decisions. Human in the Loop Tool Approval pauses AI execution before sensitive actions and logs every approval decision for audit purposes.

4. Write detection rules for agent abuse patterns

Agent abuse creates patterns you can detect if you log the right events. Once an agent's tool invocation is compromised, the attack expands wherever those tools reach: connected SaaS apps, downstream APIs, sub-agents that inherit the same context. The blast radius of a single bad tool call is rarely contained to one system.

There are documented case studies of this pattern. Watch for anomalous tool call sequences (like a file read immediately followed by an external HTTP POST), first-time tool invocations, and recursive behavior where agents spawn sub-agents or enter repetitive loops.

Indirect prompt injection is hard to catch with input-layer filters because the malicious instruction is embedded inside legitimate-looking content the agent retrieves. The reliable signal shows up later, in what the agent does with that content.

Detection logic should run on tool output events, not just tool calls. Writing those rules in Python or YAML through Panther docs means they're version-controlled, testable in CI/CD, and auditable.

Securing Agentic AI Starts With Visibility Into What Agents Actually Do

Visibility is the starting point for securing agentic AI in production. Centralize agent activity logs. Give every agent a scoped identity. Keep humans in the loop where it counts. Write detection rules that target agent-specific abuse patterns instead of generic anomalies.

Panther's approach maps to that need: centralizing agent and cloud activity in a Security Data Lake and writing detection-as-code rules. That puts agent activity, identity events, and tool calls in one queryable place, with detection logic you can write, test, and version.


See it in action

Most AI closes the alert. Panther closes the loop.

Share:

Bolt-on AI closes alerts. Panther closes the loop.

See how Panther compounds intelligence across the SOC.

Bolt-on AI closes alerts. Panther closes the loop.

See how Panther compounds intelligence across the SOC.

Bolt-on AI closes alerts. Panther closes the loop.

See how Panther compounds intelligence across the SOC.

Bolt-on AI closes alerts. Panther closes the loop.

See how Panther compounds intelligence across the SOC.

Get product updates, webinars, and news

By submitting this form, you acknowledge and agree that Panther will process your personal information in accordance with the Privacy Policy.

Get product updates, webinars, and news

By submitting this form, you acknowledge and agree that Panther will process your personal information in accordance with the Privacy Policy.

Get product updates, webinars, and news

By submitting this form, you acknowledge and agree that Panther will process your personal information in accordance with the Privacy Policy.