TL;DR For cost-effective CloudTrail log ingestion, invest in filtering, enrichment, and tuning, alongside adopting a cloud-native SIEM with zero infrastructure management.
Cybersecurity practitioners label log sources as “noisy” when they produce large quantities of logs at a rapid pace. The label is a negative one: it marks log sources that you must manage to prevent serious problems, such as excessive ingestion costs and a threat detection environment clogged with irrelevant information.
This post explains how to tackle these problems with AWS CloudTrail logs. You’ll learn how the limitations of SIEM infrastructure and licensing can prevent cost-effective ingestion. You’ll also learn practical methods to control costs and improve threat detection: filtering out irrelevant log data and boosting your signal-to-noise ratio with log enrichment and detection-as-code.
AWS CloudTrail logs enable you to identify actions taken within your AWS infrastructure, including who or what performed the action, on which resource, and when. CloudTrail logs provide this information by recording events that occur in the AWS Management Console, AWS Command Line Interface, and AWS SDKs and APIs. This broad logging scope makes CloudTrail a goldmine of critical security data. Let’s dig into a few examples of the security information that CloudTrail logs report:
This is just a short list of examples that highlight the critical security information reported in CloudTrail logs. If you want visibility into your AWS infrastructure, ingesting CloudTrail logs into your Security Information and Event Management (SIEM) platform is not optional; it is a basic necessity.
CloudTrail is well known for producing a very large volume of logs.
Just as popups, ads, and notifications degrade your concentration and productivity, noisy logs overwhelm practitioners and congest the threat detection environment. This leads to well-known issues that increase risk: alert fatigue, false positives, and a slow mean time to resolve (MTTR). The March 2023 attack on 3CX’s supply chain is a solid example of how dangerous alert fatigue can be.
Equally problematic is the issue of cost. The sheer volume of AWS CloudTrail logs makes ingestion very expensive, if not cost-prohibitive, which challenges the basic requirement for visibility in threat detection. While more logs always require greater storage capacity, the real culprits driving up spending are SIEM infrastructure and licensing structures that aren’t friendly to high-volume cloud logs.
To maintain a baseline of performance, traditional SIEMs require ongoing database management to adjust how logs are indexed based on the log write frequency and infrastructure hardware profile. Despite this management overhead, only a small subset of your data remains in hot storage for rapid search. Most data resides in warm or cold storage, which uses fewer resources but takes longer to query during incident response and threat hunting. When faced with these challenges on top of additional licensing fees for cloud logs, security teams often don’t ingest CloudTrail logs into their SIEM, resorting instead to siloing their log data in an S3 bucket, if they retain it at all.
If you are still using a traditional SIEM, migrating to a cloud-based SIEM will have the most significant impact on your ability to cost-effectively ingest AWS logs and gain full system visibility.
Here’s why: a cloud-based SIEM leverages serverless architecture to eliminate infrastructure management; further, log data is saved in a security data lake that uses modern cloud database technology designed to not only handle cloud-scale data, but guarantee query performance. This enables cloud-based SIEM platforms to deliver exactly what modern security teams need:
Short of adopting a cloud-native SIEM, the best way to manage AWS CloudTrail noise and cost is to filter out unwanted data. There are a few ways to do this depending on how your log pipeline is set up.
Within AWS, you can configure CloudTrail using event selectors and advanced event selectors to identify which events you want a trail to log. The AWS docs explain, “for each trail, if the event matches any event selector, the trail processes and logs the event.” Any other events are filtered out.
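To make the "matches any event selector" behavior concrete, here is a minimal local sketch in Python. The selector list uses the shape that CloudTrail advanced event selectors take (a `Name` plus `FieldSelectors`), but the matching functions are a simplified simulation written for illustration, not AWS's implementation, and the flattened event dictionaries are an assumption made to keep the example short.

```python
# Advanced event selectors in the shape CloudTrail accepts.
# The specific events chosen here are illustrative, not a recommendation.
ADVANCED_EVENT_SELECTORS = [
    {
        "Name": "Log write-type S3 object operations",
        "FieldSelectors": [
            {"Field": "eventCategory", "Equals": ["Data"]},
            {"Field": "resources.type", "Equals": ["AWS::S3::Object"]},
            {"Field": "eventName", "Equals": ["DeleteObject", "PutObject"]},
        ],
    }
]

# Simplified local versions of a few selector operators (assumption:
# events are flattened so each selector field is a top-level key).
OPERATORS = {
    "Equals": lambda value, options: value in options,
    "NotEquals": lambda value, options: value not in options,
    "StartsWith": lambda value, options: any(value.startswith(o) for o in options),
}


def selector_matches(selector, event):
    """Return True when every field condition in one selector holds."""
    for field_selector in selector["FieldSelectors"]:
        value = event.get(field_selector["Field"], "")
        for operator, options in field_selector.items():
            if operator == "Field":
                continue
            if not OPERATORS[operator](value, options):
                return False
    return True


def trail_logs_event(selectors, event):
    """An event is logged if it matches any selector; otherwise it is filtered out."""
    return any(selector_matches(s, event) for s in selectors)


delete_event = {
    "eventCategory": "Data",
    "resources.type": "AWS::S3::Object",
    "eventName": "DeleteObject",
}
read_event = dict(delete_event, eventName="GetObject")

print(trail_logs_event(ADVANCED_EVENT_SELECTORS, delete_event))
print(trail_logs_event(ADVANCED_EVENT_SELECTORS, read_event))
```

If you manage trails with boto3, a list like `ADVANCED_EVENT_SELECTORS` is the kind of value you would pass as `AdvancedEventSelectors` to the CloudTrail client's `put_event_selectors` call, together with a `TrailName`.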
However, if your log pipeline uses AWS CloudWatch to aggregate and forward logs, you may prefer to use subscription filters to control which logs are sent to your SIEM.
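As a rough sketch of the subscription filter approach, the Python below builds the parameters for a filter that forwards every event except a list of excluded event names. The helper `build_subscription_filter`, the filter name, the log group, and the destination ARN are all hypothetical placeholders; the filter pattern uses CloudWatch Logs JSON filter syntax.

```python
def build_subscription_filter(log_group, destination_arn, excluded_event_names):
    """Build parameters for a subscription filter that skips the given events.

    Hypothetical helper for illustration; names and ARNs are placeholders.
    """
    if not excluded_event_names:
        raise ValueError("provide at least one event name to exclude")
    # CloudWatch Logs JSON filter pattern, for example:
    # { ($.eventName != "HeadBucket") && ($.eventName != "ListBuckets") }
    conditions = " && ".join(
        f'($.eventName != "{name}")' for name in excluded_event_names
    )
    return {
        "logGroupName": log_group,
        "filterName": "forward-security-relevant-events",
        "filterPattern": "{ " + conditions + " }",
        "destinationArn": destination_arn,
    }


params = build_subscription_filter(
    log_group="example-cloudtrail-log-group",
    destination_arn="arn:aws:kinesis:us-east-1:123456789012:stream/example-siem-stream",
    excluded_event_names=["HeadBucket", "ListBuckets"],
)
print(params["filterPattern"])
```

With boto3, these keyword arguments correspond to the CloudWatch Logs client's `put_subscription_filter` call; depending on the destination type you may also need to supply a `roleArn`.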
Within your SIEM, you can create filters by log source to ignore—or throw out—log data before it’s analyzed for threats and saved to your data lake. Two common methods are raw data filters and normalized data filters. With both methods, any logs that are dropped during the filtering process should not contribute to your overall ingestion quota; verify pricing with your SIEM vendor.
A raw data filter processes and filters log data before the SIEM normalizes the data according to its schemas. A raw data filter specifies a pattern to match. If a log matches the pattern, the entire log is ignored and not processed by the SIEM. The available tools for filtering raw data depend on the SIEM you are working with, but these are the two most common:
In contrast, a normalized data filter processes and filters log data after the SIEM normalizes the data. The benefit here is a granular filter that can throw out individual fields—or “keys”, pieces of data within a log—instead of the entire log.
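To illustrate the field-level granularity, here is a minimal Python sketch that removes individual keys from an already-normalized event while keeping the rest of the log. The function `drop_fields`, the dotted-path convention, and the sample event are assumptions made for illustration; a real SIEM exposes this through its own filter configuration.

```python
import copy


def drop_fields(event: dict, paths: list[str]) -> dict:
    """Return a copy of the event with the given dotted key paths removed."""
    filtered = copy.deepcopy(event)
    for path in paths:
        *parents, leaf = path.split(".")
        node = filtered
        for key in parents:
            node = node.get(key)
            if not isinstance(node, dict):
                break  # The path does not exist in this event; nothing to drop.
        else:
            node.pop(leaf, None)
    return filtered


normalized_event = {
    "eventName": "GetObject",
    "sourceIPAddress": "203.0.113.10",
    "requestParameters": {"bucketName": "example-bucket", "Host": "example-host"},
    "additionalEventData": {"bytesTransferredOut": 1024},
}

slim_event = drop_fields(
    normalized_event, ["requestParameters.Host", "additionalEventData"]
)
print(slim_event)
```

The original event is left untouched, so the same filter can be tested against sample logs before it is applied in a pipeline.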
To create a normalized data filter, you’ll typically need the following information:
Filtering starts by determining what information is irrelevant to security. As an example, let’s work with S3 buckets.
You can perform a variety of actions on an S3 bucket that are reported as CloudTrail events. The HeadBucket action is useful when you need to check whether a bucket exists and whether you have permission to access it, but tracking these events is not vital to security. To prevent raw log data that contains HeadBucket events from being processed in your SIEM, create a filter with a regex such as \"eventName\":\"HeadBucket\" that matches the event name in the raw log.
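The effect of such a raw data filter can be sketched in a few lines of Python. This is only an illustration of the matching logic, assuming raw logs arrive as JSON strings; the pattern below also tolerates optional whitespace around the colon, and `keep_raw_log` is a hypothetical name rather than any SIEM's API.

```python
import re

# Matches the eventName field of a HeadBucket event in a raw JSON log line.
HEADBUCKET_PATTERN = re.compile(r'"eventName"\s*:\s*"HeadBucket"')


def keep_raw_log(raw_line: str) -> bool:
    """Return False for raw logs that the filter should drop entirely."""
    return HEADBUCKET_PATTERN.search(raw_line) is None


raw_logs = [
    '{"eventSource":"s3.amazonaws.com","eventName":"HeadBucket"}',
    '{"eventSource":"s3.amazonaws.com","eventName":"DeleteBucket"}',
    '{"eventSource": "s3.amazonaws.com", "eventName": "HeadBucket"}',
]

kept = [line for line in raw_logs if keep_raw_log(line)]
print(kept)
```

Because the whole log is discarded on a match, it is worth testing a pattern like this against sample data to confirm it does not also match events you want to keep.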
With an AWS CloudTrail event selector, do the opposite: identify the events that you want your trail to process, such as DeleteBucket.
Alongside filtering, enriching your log data with context and threat intelligence increases the fidelity of your alerts and speeds up investigation and incident response. A classic example is adding information about business assets to log data, like a user-to-hardware mapping. Another example is mapping numeric IDs or error codes to human-readable information.
To enrich logs, create a lookup table, a custom data set that you upload to your SIEM and configure to enrich one or more specified log types. The next image shows how this process works, where a lookup table of known bad actors enriches an incoming log by the matching IP address 1.1.1.1.
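In code terms, the enrichment step amounts to a keyed lookup. The following Python sketch assumes a lookup table keyed by IP address and a CloudTrail-style event with a `sourceIPAddress` field; the table contents, the `enrich` function, and the `enrichment` key are hypothetical and only illustrate the idea.

```python
# Hypothetical lookup table of known bad actors, keyed by IP address.
KNOWN_BAD_ACTORS = {
    "1.1.1.1": {"label": "known bad actor", "source": "example-threat-feed"},
}


def enrich(event, lookup_table, selector="sourceIPAddress"):
    """Attach lookup data to the event when the selector value has a match."""
    match = lookup_table.get(event.get(selector))
    if match is None:
        return event  # No match: the log passes through unchanged.
    return {**event, "enrichment": {selector: match}}


incoming = {"eventName": "GetObject", "sourceIPAddress": "1.1.1.1"}
enriched = enrich(incoming, KNOWN_BAD_ACTORS)
print(enriched)
```

A detection can then alert on the presence of the enrichment data instead of repeating the lookup logic in every rule.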
Your SIEM may also partner with third-party threat intelligence providers for pre-configured log enrichment. For example, GreyNoise is a threat intelligence provider that collects, analyzes, and labels Internet-wide data on IP addresses to identify noise (irrelevant or harmless activity) that saturates security tools. When AWS S3 object-level logging is enabled for a given bucket, GreyNoise can identify S3 operations that originate from IP addresses it classifies as malicious.
Filtering enables you to reduce noise and control cost, and log enrichment improves the fidelity of your alerts. But another essential task is to increase your signal-to-noise ratio by tuning detections.
Detection tuning is the process of tailoring detections so they are optimized for your specific environment. This ensures that detections are specific, informative, and cover relevant security threats, so that you can accurately identify threats and resolve them faster. This approach controls alert fatigue, reduces false positives, and improves two key performance metrics for threat detection and response: mean time to detect (MTTD) and mean time to resolve (MTTR).
But the ability to customize detections varies across SIEMs. Legacy SIEMs often rely on inflexible tools that limit how far you can tune and optimize detections for your environment. Platforms that offer detection-as-code (DaC) make customization and flexibility fundamental to detection development by letting you write, manage, and deploy detections through code. The goal is to make threat detection consistent, reliable, reusable, and scalable, all while controlling cost and providing the flexibility and customization you need to increase alert fidelity:
Check out the next image to get a sense of what a code-based detection looks like.
The image shows an excerpt from a Python detection for Identity and Access Management (IAM), one of Panther’s 500+ pre-built detections. The logic in this rule triggers an alert when a user successfully assumes a role ARN (roleArn) that’s defined in ASSUME_ROLE_BLOCKLIST, a predefined list of blocklisted roles.
In the code excerpt, notice aws_cloudtrail_success (line 12) and deep_get (line 16); these are custom helper functions that encapsulate routine processes so they can be reused across your detections, one of the benefits of writing detections in code discussed earlier. What’s not shown in this excerpt is the logic that defines what information goes into the alert, tests for the detection, and other ways to customize how and when the alert is triggered. To get a closer look at DaC, including no-code workflows, check out how to create a code-based detection.
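For readers without access to the image, the logic described above can be approximated with the self-contained Python sketch below. It is not Panther's actual rule: the two helper functions are simplified local stand-ins for the helpers named in the text, and the blocklisted role ARN and sample events are made up for illustration.

```python
# Hypothetical blocklist entry; replace with the role ARNs you care about.
ASSUME_ROLE_BLOCKLIST = ["arn:aws:iam::123456789012:role/FullAdminRole"]


def deep_get(dictionary, *keys, default=None):
    """Local stand-in: safely read a nested key, returning default if missing."""
    for key in keys:
        if not isinstance(dictionary, dict):
            return default
        dictionary = dictionary.get(key)
    return default if dictionary is None else dictionary


def aws_cloudtrail_success(event):
    """Local stand-in: treat an event without error fields as successful."""
    return not event.get("errorCode") and not event.get("errorMessage")


def rule(event):
    """Alert when a blocklisted role is successfully assumed."""
    if event.get("eventName") != "AssumeRole" or not aws_cloudtrail_success(event):
        return False
    return deep_get(event, "requestParameters", "roleArn") in ASSUME_ROLE_BLOCKLIST


blocked = {
    "eventName": "AssumeRole",
    "requestParameters": {"roleArn": "arn:aws:iam::123456789012:role/FullAdminRole"},
}
denied = dict(blocked, errorCode="AccessDenied")

print(rule(blocked))
print(rule(denied))
```

Because the detection is plain code, tuning it for your environment can be as simple as editing the blocklist or adding a condition, and sample events like the two above can double as unit tests.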
To summarize, detection-as-code is all about efficiency and reliability; it gives security practitioners the flexibility to optimize their detections and the agility to stay on top of threats in a dynamic cybersecurity landscape.
Traditional SIEMs struggle with the demands of cloud workloads, often compromising on timeliness and cost. Whether you control cost and noise with filtering, or boost your signal-to-noise ratio with detection-as-code, choose a SIEM that is built to handle cloud-scale data, without compromise.
For a comprehensive review on managing AWS logs, read the ebook Keep AWS Logs from Running Wild by Putting Panther in Charge. Panther is a cloud-native SIEM that empowers modern security teams with real-time threat detection, log aggregation, incident response, and continuous compliance, at cloud-scale.
Ready to try Panther? Get started by requesting a demo.