PODCAST

How AI is changing the SOC operating model. Listen now →

Platform

Solutions

Resources

Company

Book a demo

Platform

Solutions

Resources

Company

Book a demo

How AI is changing the SOC operating model. Listen now →

See all blogs

BLOG

Adversarial AI: Attacks, Risks, and Defenses for Security Teams

May 11, 2026

Your detection program covers ATT&CK. Your IAM policies lock down production. Your SIEM catches the usual suspects. Then your engineering team ships a customer-facing LLM feature, an internal RAG assistant, and an AI agent with access to your ticketing system — and suddenly there's an entire attack surface your detection rules don't touch.

Meanwhile, the phishing emails hitting your employees are getting harder to spot, the malware samples your EDR catches are starting to make live API calls to OpenAI, and attackers are stealing Bedrock credentials to offload their compute costs onto your account.

Adversarial AI is a two-sided problem: defending the AI systems you deploy, and defending against AI-enabled attacker tradecraft. This article covers both. We'll walk through the attack categories that matter, how threat actors are already using AI in the wild, and a defensive playbook you can start executing this week without adding headcount.

Key Takeaways:

Adversarial AI is a dual-sided problem. Security teams must defend both against attacks on their own AI/ML systems (evasion, poisoning, prompt injection) and from AI-powered offensive techniques (deepfake phishing, LLMjacking, AI-assisted malware).
Key attack categories for AI/ML systems include evasion attacks, data poisoning, model extraction/inference, and prompt injection. Prompt injection ranks #1 among LLM application risks.
Threat actors are already operationalizing AI at scale. Some malware samples now make direct API calls to LLM services for command-and-control and detection evasion.
Lean teams can build an adversarial AI defense program in phases by starting with logging and identity hardening, then layering in AI-specific detection rules and model-layer guardrails.

What Is Adversarial AI?

Adversarial AI covers two related problems: attacks that target AI systems themselves, and attacks that use AI as a weapon against defenders. Both are active, documented, and accelerating. And both now fall inside the SOC's scope of responsibility.

Those two directions create different defensive jobs for the same team.

Attacks Targeting AI and ML Models

Attacks targeting AI and ML models exploit the statistical, data-driven nature of how these systems make decisions. The attack surface runs from training data pipelines through inference APIs to the model artifacts themselves.

A single adversarial attack on a facial recognition system used by a foreign tax authority caused over $77 million in losses. Google, Amazon, and Tesla have all had machine learning systems compromised in documented cases.

AI Weaponized Against Defenders

Threat actors already use AI to move faster across phishing, social engineering, malware adaptation, and reconnaissance. Active ransomware groups are up 49% year over year, partly because AI automation is lowering the skill barrier for new operators.

As John Hubbard, Cyber Defense Curriculum Lead, SANS, warns, "Attackers are absolutely using AI; they are moving faster. If we don't match their speed and ideally go faster than them now, we're going to be stuck."

Why Security Teams Own Both Sides of This Problem

Security teams need coverage for both sides: securing deployed AI and defending against AI-enabled attacker behavior. These are distinct workstreams with different tooling, different detection rules, and different skill sets. That separation is also codified across Secure, Defend, and Thwart focus areas.

Adversarial ML emulation tooling is lowering the barrier. You don't need an ML PhD on your team to test AI system resilience anymore. What you need is detection coverage for when those tests catch real attackers.

Common Attacks on AI and ML Systems

Four attack categories account for the most operationally relevant threats to your AI/ML deployments. MITRE ATLAS catalogs 16 tactics, with reported counts varying by version for techniques, mitigations, and case studies.

These categories map to different layers of your AI stack, from training data to inference APIs to prompt handling.

Evasion Attacks

Evasion attacks target a deployed model at inference time by crafting inputs that cause misclassification or bypass detection. The adversary doesn't need to modify the model or its training data; they just need query access.

Documented real-world examples include bypassing Cylance's AI malware detection and evading deep learning detectors for malware C&C traffic. No ATLAS case studies were found describing live deepfake image injection to evade mobile KYC verification (such as ProKYC).

Data Poisoning Attacks

Data poisoning corrupts the training pipeline. Attackers either degrade overall model accuracy or implant backdoors that trigger attacker-specified outputs on specific inputs while the model behaves normally otherwise. Research shows that poisoning rates as low as 0.001% of an uncurated training dataset can be effective.

The supply chain variant is especially dangerous. MITRE ATLAS documents multiple Hugging Face compromises and an AI supply chain compromise pattern, while separate sources describe a PyTorch supply chain attack and a rules file backdoor targeting AI coding assistants like Cursor and GitHub Copilot.

Model Extraction and Inference Attacks

Model extraction attacks steal intellectual property or extract private information from deployed models via their APIs, with no access to model internals required.

A related and increasingly common variant is LLMjacking: attackers steal API keys or cloud credentials and use them to access LLM inference APIs, transferring compute costs to the compromised account.

Researchers have tracked campaigns that exploit vulnerable Laravel instances to exfiltrate cloud credentials and abuse LLM services like Anthropic's Claude.

Prompt Injection and Jailbreaking

Prompt injection ranks #1 among LLM application risks, and it's the attack category most likely to bypass a team's existing detection rules.

Indirect prompt injection is the more dangerous variant. Malicious instructions are embedded in external content the LLM processes: documents, emails, Jira tickets, calendar invites. A Copilot zero-click attack called EchoLeak enabled remote, unauthenticated data exfiltration via a single crafted email. There's no evidence of in-the-wild exploitation yet, but the technique exists.

How Adversaries Use AI to Attack Your Organization

Threat actors are integrating AI across the attack lifecycle, from initial access through lateral movement. The main operational patterns are phishing and social engineering, cloud resource abuse, and malware adaptation. Each has a different detection and control footprint, so treat them as distinct workstreams.

AI-Generated Phishing and Social Engineering

AI-generated phishing and social engineering already improve attacker speed and plausibility. AI-enhanced phishing can boost attack profitability up to 50x, and 28% of breaches are initiated through phishing or social engineering, making it one of the leading initial access vectors.

Muddled Libra goes further. The group has used generative AI in social engineering, including cloned executive voices in callback scams.

LLM Hijacking and Cloud Resource Abuse

LLMjacking turns exposed AI credentials into direct cloud cost and access abuse. Threat researchers have documented multi-stage attacks against SageMaker environments that can lead to broader cloud compromise and downstream impacts on ML pipelines in production.

If your team has API keys or cloud credentials scattered across development environments without strict IAM governance, LLMjacking is one of the easier attack paths into your account.

AI-Assisted Malware and Reconnaissance

AI-assisted malware is already showing up in production samples. Documented malware samples have been observed calling OpenAI GPT-3.5-turbo with logged functions for evasion generation and obfuscation. Other reports describe malware that incorporates LLM capabilities, but it's not always clear whether those were production samples or proofs of concept.

Defensive Strategies for Security Teams

The most practical adversarial AI defense program starts with identity and logging, then moves into AI-specific detection rules and guardrails. Building this program doesn't require a massive budget or a 20-person SOC.

Later controls depend on earlier visibility and access control, so start from the top.

1. Harden Identity and Access for AI Services

Identity hardening is your first line of defense against LLMjacking and unauthorized model access.

On AWS, scope IAM least-privilege policies to specific model ARNs and monitor Amazon Bedrock Guardrails for configuration changes. On Azure, apply Azure policy on production OpenAI resources so Entra ID becomes the only access method. On GCP, create dedicated service accounts per AI application and block key creation via Organization Policy.

Do not store LLM API keys in code; prefer a dedicated secrets manager over plaintext storage, including environment variables when possible. Use cloud-native secrets managers with automated rotation and access logging.

2. Centralize AI Telemetry and Logs

You need centralized AI telemetry before you can reliably detect attacks on your AI services or abuse of your AI credentials. AI service logs are often scattered across CloudTrail, Azure Monitor, and Cloud Audit Logs with no centralized view. The minimum viable telemetry for every LLM API call includes: caller identity, model name and version, token count (input and output), source IP, and HTTP status code.

For AWS, enable Bedrock logging to CloudWatch Logs (a single console toggle per account). For Azure, deploy the diagnostic policy across all AI Services resources. For GCP, activate SCC Agent Platform Threat Detection for Vertex AI Agent Engine workloads.

Then centralize those logs somewhere you can actually query them. Panther, a cloud-native SIEM, supports ingestion from S3, CloudWatch, Azure Blob Storage, and GCS with automatic schema inference.

AI triage and AI-assisted rule authoring speed up the work, but the fundamentals still matter. You still need clean log pipelines, structured schemas, and sound detection logic underneath. Stephen Gubenia, Head of Detection Engineering for Threat Response at Cisco Meraki, makes the same point: "AI isn't the silver bullet; you still have to have processes in place, good logging and alerting pipelines, sound detection logic."

Docker faced a similar centralization challenge across AWS, GCP, and Azure; the team tripled ingestion volume while cutting false positives by 85% after consolidating telemetry.

3. Write Detection Rules for AI-Specific Threats

ATT&CK-only detection programs miss ATLAS-specific techniques by default, so you need AI-specific detection rules. You need detection rules for:

Anomalous LLM API token consumption
Prompt injection patterns (phrases like "ignore previous instructions," Unicode invisible character sequences)
Credential access anomalies on AI service API keys
AI agent lateral movement (an AI service role calling IAM APIs or assuming another role)

GCP Security Command Center includes two AI-specific detection rules: one for AI agents investigating their own IAM permissions and another for agents fetching credentials from the metadata service. These map to Excessive Agency patterns, but cover only GCP-hosted agents. You'll still need SIEM-level detection rules for cross-cloud and cross-service coverage.

For custom detection rules, a code-driven approach scales well on lean teams. Panther supports Python rules, SQL, or YAML, and Detection Builder lets you describe an AI-specific attack pattern in plain language to generate a working detection rule with test cases for analyst review. That reduces the authoring burden for novel AI-specific threats where you don't have a template to start from.

4. Apply Guardrails at the Model Layer

Model-layer guardrails reduce prompt injection and unsafe output risk before content reaches users or downstream systems. Cloud-native guardrail services (AWS Bedrock Guardrails, Azure Content Safety, GCP guardrails) provide content filtering, topic denial, and PII redaction at the model layer. Enable these on all production AI endpoints.

For input filtering:

Structurally separate system instructions from user content (never concatenate user input directly into system prompts)
Normalize Unicode before passing to the model
Maintain a regex-based blocklist of known injection patterns

For output filtering, scan all model responses for PII patterns and system prompt content before returning to users.

Apply per-identity rate limits at the API Gateway layer with token-based thresholds, not just global aggregate limits.

Building an Adversarial AI Defense Program with Centralized Detection

An adversarial AI defense program uses the same plays as the rest of your detection program: centralize visibility, harden identity, write detection rules, iterate. What changes is the scope. You now have AI services to monitor, AI credentials to protect, and ATLAS techniques to cover on top of ATT&CK.

Start with AI service logging this week, IAM audits next week, and your first AI-specific detection rules next month.

AI-related vulnerabilities are the fastest-growing cyber risk in 2026, named as the top concern by 87% of security leaders. The organizations that manage this well treat adversarial AI as a detection engineering problem, centralize AI telemetry alongside every other log source, and write code-driven detection rules against the ATLAS threat model.

Panther's detection-as-code framework and Security Data Lake support that workflow by giving lean teams centralized visibility and AI-augmented triage. Those AI-assisted workflows still need analyst review. Judgment calls about organizational context, intent, and risk tolerance don't belong in an LLM.