Cloud Infrastructure Monitoring: Best Practices for Modern SOC Teams

How to Build AWS, GCP, and Azure Coverage That Catches Evasion and Drift Without Noise

Katie Campisi

Jun 5, 2026

You can have CloudTrail fully enabled and still miss events that matter. It records nearly every management API call in the account, so the things worth catching – a stopped trail, activity in a region you don't operate in, an IAM change that skipped your pipeline – sit buried under millions of routine calls. Too often, no one notices the one that should have been escalated until it surfaces during an incident review.

Why CloudTrail Becomes the Backbone of Cloud Detection

Nearly every meaningful action in an AWS account lands in CloudTrail. Console logins, IAM changes, API calls, resource creation, network rule edits. That makes it the single richest source you have for understanding what happened in your cloud, and it's table stakes for almost every team running on AWS.

The honest version of the multi-cloud story is that most teams are AWS-primary. GCP and Azure coverage tends to follow actual workloads rather than the org chart, so it's worth instrumenting the equivalents (Cloud Audit Logs in GCP, Activity and Entra ID logs in Azure) where you genuinely run infrastructure, and not pretending you have parity before you do. Coverage you don't trust is worse than coverage you know is partial.

The harder problem is that CloudTrail volume is enormous and most of it is routine. Collecting it isn't the same as detecting on it. The detections that matter are the ones that catch an attacker trying to operate quietly: tampering with logging, working in regions you don't use, drifting your IAM away from its known state, and changing what's exposed to the network. Those are low-volume, high-confidence signals that get lost if your cloud coverage is just a firehose of API calls.

The Signals You Shouldn’t Ignore

Logging tampering and evasion. This is the highest-priority cloud signal because it's what an attacker does to buy time. A StopLogging call, a deleted or modified trail, or a disabled GuardDuty detector should fire immediately. Pair it with what happens next: logging stopped followed by a resource being made public is a different event than a single config change in isolation.
Activity in unused regions. Attackers spin up resources where you aren't looking, often for crypto mining or to stage access away from your monitored regions. If your workloads run in three regions, API activity in a fourth is worth a look regardless of what the call is. This is one of the most reliable cloud signals precisely because it's so cheap to define against your own footprint.
IAM drift. Most IAM changes in a mature environment should come through your infrastructure-as-code pipeline. A role, policy, or access key created or modified outside that path, by a human identity rather than your deploy automation, is drift worth investigating. New access key creation for a user that normally only acts through a CI/CD role is a classic precursor to persistence.
Network exposure changes. A security group opened to 0.0.0.0/0, an RDS instance with an egress rule to the entire internet, or an S3 bucket flipped to public are the changes that turn a foothold into an exfiltration path. These map cleanly to outcomes, so they make good high-confidence detections rather than noisy config alerts.
External recon against your edge. Port scans and HTTP anomalies that target known CVEs are the early, pre-exploitation signal. On their own they're common background noise, but correlated with a subsequent config change or login they help you catch an attack while it's still at the perimeter.
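Two of the signals above, logging tampering and unused-region activity, are cheap to express as code. The sketch below is a minimal illustration over plain dictionaries shaped like CloudTrail records (`eventSource`, `eventName`, `awsRegion`). The approved-region set and the `classify` function are assumptions made for this example, not a specific product's rule API, and the list of tampering calls is a starting point rather than a complete inventory.

```python
from typing import Optional

# Assumption: the regions your workloads actually run in.
APPROVED_REGIONS = {"us-east-1", "us-west-2", "eu-west-1"}

# (eventSource, eventName) pairs that reduce logging or detection coverage.
TAMPERING_CALLS = {
    ("cloudtrail.amazonaws.com", "StopLogging"),
    ("cloudtrail.amazonaws.com", "DeleteTrail"),
    ("cloudtrail.amazonaws.com", "UpdateTrail"),
    ("guardduty.amazonaws.com", "DeleteDetector"),
}


def classify(event: dict) -> Optional[str]:
    """Return a signal name for a CloudTrail-style record, or None if routine."""
    call = (event.get("eventSource"), event.get("eventName"))
    if call in TAMPERING_CALLS:
        return "logging_tampering"

    # Any API activity outside the approved footprint is worth a look,
    # regardless of which call it is. Records without a region are skipped.
    region = event.get("awsRegion")
    if region and region not in APPROVED_REGIONS:
        return "unused_region"

    return None
```

Tampering is checked first on purpose: a StopLogging call in an unapproved region should surface as the higher-priority signal.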

Why Cloud Detection Needs Identity Context

Single-source cloud detection has a ceiling. CloudTrail tells you what API call was made and which principal made it. It doesn't tell you whether that principal should have been making it, or what the human behind it did to get there.

Here's what that looks like in practice. Three operations land in three different log sources within ninety minutes. On an endpoint, a credential-dumping tool reads from LSASS. In your identity provider, someone resets another user's MFA and assigns a Global Administrator role to a service account. In CloudTrail, an assumed role suspends versioning on your backup bucket and deletes EBS snapshots, all without MFA. Any one of those is a moderate alert that could sit in a queue. Together, they're a confirmed ransomware precursor chain, and only correlation on the shared identity tells that story.
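The correlation step in that scenario can be sketched in a few lines. This is a simplified illustration, not any product's correlation engine: it assumes each alert has already been normalized to a dictionary with a resolved `identity`, a `source` name, and a `time`, and those field names are hypothetical. In practice, resolving an endpoint user, an IdP account, and a cloud principal to one identity is the hard part, and it is taken as given here.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=90)


def correlate(alerts, min_sources=3, window=WINDOW):
    """Group alerts by identity and return the identities whose alerts span
    at least `min_sources` distinct log sources within `window`.

    Each alert is a dict with hypothetical keys:
    "identity" (str), "source" (str), "time" (datetime).
    """
    by_identity = defaultdict(list)
    for alert in alerts:
        by_identity[alert["identity"]].append(alert)

    chains = {}
    for identity, items in by_identity.items():
        items.sort(key=lambda a: a["time"])
        # Slide a window that starts at each alert in turn.
        for i, first in enumerate(items):
            in_window = [a for a in items[i:] if a["time"] - first["time"] <= window]
            if len({a["source"] for a in in_window}) >= min_sources:
                chains[identity] = in_window
                break
    return chains
```

Each moderate alert stays moderate on its own. The chain only appears when the three sources are compared on the same identity inside the same window.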

The reason this is hard for most tools is that they treat each log source independently, and cross-source correlation either requires brittle multi-source rules or a place where the data already sits together. In Panther, CloudTrail, your identity provider, and your EDR feed into one security data lake. A detection written in Python can query CloudTrail and Okta in the same rule, correlate on the principal and the human behind it, and surface the full chain as a single alert with the evidence attached. Panther AI triages the related alerts together, connects them on the shared identity and IP, and lays out the reasoning before anyone manually pivots between the AWS console, Okta, and your pipeline logs.

Snyk is a good example of what happens when cloud coverage and noise reduction move together rather than against each other. By building detections as code and consolidating their sources, they expanded infrastructure coverage while reducing the alert load their team had to work through.

Building Cloud Coverage That Doesn't Bury You

The failure mode for cloud monitoring isn't missing data; it's generating so many low-value config alerts that the team learns to ignore the channel. Alert fatigue is the single most common thing we hear from teams as they add cloud sources, because volume compounds with every new account and detector. A few principles hold up:

Encode your environment's normal once, and apply it everywhere. Approved regions, your IaC deploy identities, service accounts that operate at odd hours by design, and maintenance windows. Panther's Organization Profiles hold this context, so every cloud detection and every triage decision uses it, instead of it living in your senior engineer's head.
Scope detections to drift and exposure. The signal worth catching is an IAM change that bypassed your pipeline, or a config edit that opened a resource to the world. Start from the behavior you want to detect, then write the rule backward from there.
Enrich at detection time. A cloud alert should already carry the principal's recent activity, whether the identity is human or automation, and whether the region or IP has been seen before. Context gathered after the alert fires is what turns a five-minute triage into a forty-minute one.
Route CSPM and posture findings into the same place as detections. Many teams already run Wiz or a similar tool and send its findings into Panther, so cloud posture, identity, and infrastructure activity get triaged in one queue rather than another standalone inbox. One place to investigate beats one more tab to check.
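To make the first three principles concrete, here is a rough sketch of an IAM drift check that reads environment context from one place and attaches enrichment when the alert is created. The `EnvironmentProfile` dataclass is a stand-in invented for this example and is not the format of Panther's Organization Profiles. The event fields (`userIdentity`, `sessionContext.sessionIssuer.arn`) follow the CloudTrail record layout, and the list of IAM write calls is illustrative rather than exhaustive.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EnvironmentProfile:
    """Hypothetical container for the environment's known-normal context."""
    approved_regions: frozenset
    deploy_role_arns: frozenset  # IaC / CI-CD roles allowed to change IAM


IAM_WRITE_CALLS = {
    "CreateUser", "CreateRole", "CreateAccessKey", "CreatePolicy",
    "AttachRolePolicy", "AttachUserPolicy", "PutRolePolicy", "PutUserPolicy",
}


def acting_role_arn(event: dict) -> str:
    """Prefer the underlying role ARN for assumed-role sessions."""
    identity = event.get("userIdentity") or {}
    issuer = (identity.get("sessionContext") or {}).get("sessionIssuer") or {}
    return issuer.get("arn") or identity.get("arn", "")


def iam_drift_alert(event: dict, profile: EnvironmentProfile) -> Optional[dict]:
    """Alert on IAM writes that did not come through the deploy pipeline."""
    if event.get("eventSource") != "iam.amazonaws.com":
        return None
    if event.get("eventName") not in IAM_WRITE_CALLS:
        return None

    principal = acting_role_arn(event)
    if principal in profile.deploy_role_arns:
        return None  # expected path: change came from deploy automation

    # Enrich at detection time so triage starts with context attached.
    return {
        "signal": "iam_drift",
        "action": event["eventName"],
        "principal": principal,
        "principal_type": (event.get("userIdentity") or {}).get("type"),
        "region_approved": event.get("awsRegion") in profile.approved_regions,
    }
```

Keeping the profile separate from the rule means a new deploy role or region is a one-line change that every detection picks up.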

This is also where cost and coverage stop being a trade-off. Docker onboarded Okta and CloudTrail as their first sources, tuned detections with Python-based rules, and cut their false positive rate by 85% year over year while tripling ingestion, without adding headcount. Tealium saw a similar pattern, reducing alert volume by 80 to 90% and bringing detection build time down from four or five hours to roughly ten minutes, which let them monitor close to 30% more log sources.

That's the compounding part. Every cloud detection you tune, and every triage outcome you confirm, feeds back into the logic, so adding a fourth and fifth source sharpens your coverage rather than multiplying your noise. Cloud infrastructure is where most of your environment now lives. Getting the detection coverage right and keeping it consistent enough to trust is what makes the rest of your program hold up.

Want to see how Panther handles cloud infrastructure monitoring in your environment? Book a demo or explore how teams are building detection coverage across CloudTrail, identity, and beyond.

What CloudTrail events should I be monitoring for security?

The highest-value events are logging tampering (StopLogging, trail deletion or modification, disabled GuardDuty detectors), IAM changes made outside your infrastructure-as-code pipeline, network exposure changes (security groups opened to 0.0.0.0/0, public S3 buckets, broad RDS egress rules), root and console activity, and API calls in regions you don't normally use. Monitor patterns and context, not raw call volume. A single IAM change is routine; an IAM change from a human identity that bypasses your deploy automation is worth investigating.
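As one example of a network exposure check, the sketch below looks for security group ingress rules opened to the whole internet. It assumes the nested `requestParameters` shape commonly seen in CloudTrail records for `AuthorizeSecurityGroupIngress` (`ipPermissions.items[].ipRanges.items[].cidrIp`). Treat that shape as an assumption and verify it against your own logs before relying on it.

```python
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def open_ingress_rules(event: dict) -> list:
    """Return (from_port, to_port, cidr) for each rule opened to the world."""
    if event.get("eventName") != "AuthorizeSecurityGroupIngress":
        return []

    params = event.get("requestParameters") or {}
    permissions = (params.get("ipPermissions") or {}).get("items", [])

    exposed = []
    for perm in permissions:
        # Collect both IPv4 and IPv6 ranges attached to this permission.
        cidrs = [r.get("cidrIp") for r in (perm.get("ipRanges") or {}).get("items", [])]
        cidrs += [r.get("cidrIpv6") for r in (perm.get("ipv6Ranges") or {}).get("items", [])]
        for cidr in cidrs:
            if cidr in OPEN_CIDRS:
                exposed.append((perm.get("fromPort"), perm.get("toPort"), cidr))
    return exposed
```

Returning the port range alongside the CIDR lets the alert distinguish an intentionally public port 443 from an exposed database or SSH port.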

How do I monitor AWS, GCP, and Azure together without parity gaps?

Start where your workloads actually run, which for most teams is AWS and CloudTrail. Instrument the equivalents where you have real infrastructure: Cloud Audit Logs in GCP, Activity logs and Entra ID in Azure. Avoid claiming coverage you can't trust. A consistent detection layer across the clouds you genuinely use is more useful than shallow collection everywhere. Bringing the sources into one data lake lets you write detections once and correlate activity across providers and identity.

How do I detect an attacker disabling CloudTrail logging?

Alert immediately on StopLogging, trail deletion, and trail configuration changes, then enrich with who made the change and what followed it. Logging being stopped by your IaC pipeline during a planned change is expected. The same call from a human identity, especially one that recently created a new access key or logged in from an unusual location, is an anti-forensics signal. Correlating the logging change with the surrounding identity and config activity is what separates a real evasion attempt from routine maintenance.
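The distinction between planned maintenance and evasion can be sketched as a small severity function. Everything here is illustrative: the severity labels, the `recent_signals` names, and the idea of a `planned_change` flag are assumptions for the example, and how you derive them (change tickets, pipeline metadata, prior alerts) depends on your environment.

```python
# Hypothetical labels for recent activity tied to the same principal.
RISK_SIGNALS = {"new_access_key", "unusual_login_location"}


def tampering_severity(principal, deploy_principals, planned_change, recent_signals):
    """Rank a logging-tampering event using who made it and what preceded it."""
    if principal in deploy_principals:
        # Pipeline-driven changes are expected only inside a planned change.
        return "info" if planned_change else "medium"

    # A human or unknown principal with recent risky activity is the
    # anti-forensics pattern described above.
    if RISK_SIGNALS & set(recent_signals):
        return "critical"

    return "high"
```

For instance, a pipeline role stopping a trail during a planned change comes back as "info", while the same call from a user who created a new access key shortly before comes back as "critical".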

How do I reduce false positives in cloud detection without missing real threats?

Baseline your environment and build that baseline into your detection logic. Define approved regions, your deploy identities, and service accounts that behave unusually by design, then write rules that fire on deviations from your normal rather than on generic config changes. Scope detections to drift and exposure outcomes instead of every event, and enrich alerts with principal and history at detection time so triage is faster and more consistent. Closing the loop, where confirmed triage outcomes feed back into the rules, is what keeps the false positive rate down as you add sources.

What cloud detections do teams build first vs. discover later?

Teams usually start with logging tampering, root activity, and obvious network exposure changes. What many discover after deployment is that IAM drift and unused-region activity catch a class of quiet, persistence-oriented behavior the early detections miss, and that the most useful cloud detections depend on identity context the cloud logs alone don't carry. Correlating CloudTrail with your identity provider and your deploy pipeline tends to surface attack chains that single-source cloud monitoring leaves invisible.