Welcome to the 3rd and final part of our series on managing pipelines and infrastructure for high-fidelity detections. Part 1 outlined strategic considerations, and Part 2 drilled into detailed functional requirements. Here we wrap up with common pitfalls and how to avoid them. We also recommend checking out this on-demand webinar to see the full series in action.
So you’ve reviewed strategic considerations for your project in Part 1 and specific pipeline functionality and processes in Part 2. Or maybe you’re the type who skips right to the ending… Either way, learning all the ways pipelines can fail will help keep your project on the rails.
Whether you’re just beginning to implement your strategy, or you’re mid-implementation and starting to experience challenges, let’s review how to avoid common pitfalls so your project stays on track.
False positives, alert storms, team fatigue, burnout… sound familiar? If you haven’t experienced this directly, you’ve probably heard stories from the trenches. These issues creep in when you’re not being choosy enough about the data you’re ingesting.
Data quality is a key prerequisite for high-fidelity detections and alerts, and it’s a lot easier to ensure quality when the data set is focused on high-value logs. You could have flawless detection rules with elegant logic perfectly aligned to your use cases, plus an army of expert threat hunters investigating your data. None of it will matter if your data set is so large that detections and search queries take forever.
So how do you address the needle in a haystack problem? Collect less hay.
In practical terms, this could mean simply not collecting logs generated by noncritical systems that have little to no security value. Or it could mean parsing high-volume logs for noisy components, filtering those out, and preserving the high-value components. You might focus on collecting only write and data transfer events (filtering out the reads), rejected network traffic (dropping expected, allowed traffic), and logs from production environments (filtering out test and development).
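To make that concrete, here’s a minimal filtering sketch in Python. The field names and the read-only action list are illustrative placeholders; the real ones depend on how your pipeline normalizes each log source.

```python
# Minimal filtering sketch: keep high-value events, drop the noise.
# Field names ("env", "action", "log_type", "outcome") and the read-only
# action list are illustrative placeholders, not a real schema.
READ_ONLY_ACTIONS = {"GetObject", "ListBuckets", "DescribeInstances"}

def keep_event(event: dict) -> bool:
    """Return True if the event should be forwarded downstream."""
    # Drop logs from test and development environments.
    if event.get("env") not in {"prod", "production"}:
        return False
    # Drop read-only events; keep writes and data transfers.
    if event.get("action") in READ_ONLY_ACTIONS:
        return False
    # For network logs, keep rejected traffic and drop expected, allowed flows.
    if event.get("log_type") == "network" and event.get("outcome") == "ACCEPT":
        return False
    return True

# Usage:
# filtered = [e for e in parsed_events if keep_event(e)]
```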
Don’t go overboard on the filtering though, or you’ll run into another pitfall…
The flip side of the above is limiting your visibility into potential threats by not ingesting enough. If your team is still recovering from alert storms and the onslaught of false positives they create, bringing in very high-volume logs like AWS CloudTrail, VPC Flow Logs, or GuardDuty may give you pause. But these logs contain critical information that helps pinpoint high-severity attacks.
It’s not just about ingesting enough data, it’s about ingesting enough of the right data.
The Goldilocks scenario of having just the right volumes from your most valuable logs can be achieved by designing pipeline functionality to support your use cases.
So how do you get there? If you read Part 2, you already know. If you skipped ahead, well… you really should read it. Lucky for you, the relevant parts are right here:
If you’re not validating that your security pipelines work as expected before they’re in production, you’re risking failures that could limit visibility and severely weaken your security posture. Data getting routed to the wrong destination, potentially blowing up your budget… Transformation errors causing your detections to fail… Filters breaking down and flooding the system with noisy logs and alert storms… Not ideal.
Testing and QA are the security engineering equivalent of never skipping leg day. They’re not fun or sexy, but they pay consistent dividends throughout your entire SecOps program.
So get that squat rack ready: stand up a non-production environment to verify that your pipeline functionality works as designed. Send a variety of sample logs through and validate that your routing, filters, and transformations are functioning properly. See how they impact your detection efficacy and alert fidelity. When something breaks, conduct a root cause analysis and fix the underlying issue.
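As a rough illustration, here’s what that validation might look like as pytest-style checks against a hypothetical `pipeline` module exposing the same keep_event, transform, and route functions the production pipeline runs. The module name and sample schema are assumptions for the sketch.

```python
# Pytest-style sketch; `pipeline` is a hypothetical module exposing the same
# keep_event, transform, and route functions the production pipeline uses.
from pipeline import keep_event, transform, route

SAMPLE_EVENTS = [
    {"env": "prod", "eventTime": "2024-05-01T12:00:00Z",
     "userIdentity": {"arn": "arn:aws:iam::111122223333:user/alice"},
     "eventName": "PutObject"},
    {"env": "dev", "eventTime": "2024-05-01T12:00:05Z",
     "userIdentity": {"arn": "arn:aws:iam::111122223333:user/bob"},
     "eventName": "PutObject"},
]

def test_filter_drops_non_production_events():
    kept = [e for e in SAMPLE_EVENTS if keep_event(e)]
    assert all(e["env"] == "prod" for e in kept)

def test_transform_preserves_fields_detections_depend_on():
    normalized = transform(SAMPLE_EVENTS[0])
    # Detections fail quietly when required fields go missing.
    assert {"timestamp", "actor", "action"} <= set(normalized)

def test_high_value_sources_route_to_hot_storage():
    assert route({"source": "cloudtrail", "env": "prod"}) == "hot"
```

Run checks like these on every pipeline change, the same way you would any other CI suite.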
There’s detection-as-code, infrastructure-as-code… why not ingestion-as-code too? Treat your pipelines like high-performance software, and carry that over to all other aspects of your SecOps program while you’re at it.
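In practice, that might look like a declarative pipeline definition kept in version control and rolled out through the same review and CI process as your detection rules. The structure below is purely illustrative:

```python
# Hypothetical pipeline-as-code definition: version it, review it, test it,
# and deploy it like any other piece of software.
PIPELINE = {
    "sources": ["cloudtrail", "vpc_flow_logs", "okta"],
    "filters": ["drop_non_production", "drop_read_only_events"],
    "transforms": ["normalize_to_common_schema"],
    "routes": {"high": "hot", "moderate": "warm", "low": "cold"},
}
```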
If you’re using the same security infrastructure across all logs, regardless of their relative value, you’re likely spending way more on compute and storage than you actually need. Without prioritizing logs and routing different categories to separate use cases, you end up paying for computing power and storage you never fully use.
Luckily, this pitfall is becoming less prevalent as teams adopt more flexible strategies. Still, it lingers as a symptom of the monolithic SIEM philosophy that encourages kitchen-sink log ingestion and one-size-fits-all approaches to compute and storage.
First step: break down your tech stack into modular components. Then decouple the processes and supporting technical functionality required for data collection, analytical processing, and storage.
Revisit your threat models and log priorities and categorize your logs into high-, moderate-, and low-value buckets. These categories align to real-time analysis in hot storage, near-term correlation analysis in warm storage, and historical analysis and compliance reporting in cold storage, respectively. Operationalize these three workflows using a combination of routing, filtering, and transformation processes based on your log formats and detection logic.
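Here’s a bare-bones sketch of that priority-based routing. The source-to-tier assignments are illustrative; swap in the priorities your own threat model produces.

```python
# Illustrative source-to-tier assignments; your threat model drives the real ones.
HOT_SOURCES = {"cloudtrail", "okta", "edr"}    # high value: real-time detection
WARM_SOURCES = {"vpc_flow_logs", "dns"}        # moderate: near-term correlation

def route(event: dict) -> str:
    """Map an event to hot, warm, or cold storage based on its source."""
    source = event.get("source", "").lower()
    if source in HOT_SOURCES:
        return "hot"
    if source in WARM_SOURCES:
        return "warm"
    # Everything else: historical analysis and compliance reporting.
    return "cold"
```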
So you don’t skip leg day anymore, great. But after a few months you might wonder, are you mixing it up enough to challenge yourself? Not just the standard back squats and deadlifts, but front, box, split, and overhead squats, plus sumo, stiff leg, and trap bar deadlifts?
Think of all these variations as more advanced data transformations and refinement techniques. They could be the missing piece that tightly aligns your pipeline to your detection strategy and takes your SecOps program from good to great.
Revisit your priority logs (yes, again), and this time take a good hard look at their native formats. Then flip over to your threat model and detection rules. Now you have the necessary context to think through all the ways you can transform those native logs into standardized formats that are perfectly optimized for your detections. There are hundreds, maybe thousands, of ways transformations can be combined with different log sources to deliver maximum-fidelity detections and alerts.
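As one small example, a transformation that reshapes a raw AWS CloudTrail record into the standardized fields your detection rules query might look something like this; the target schema is an assumption for the sketch.

```python
# Reshape a raw CloudTrail record into a standardized schema that detection
# rules can query directly. The target field names are illustrative.
def transform(raw: dict) -> dict:
    return {
        "timestamp": raw.get("eventTime"),
        "actor": (raw.get("userIdentity") or {}).get("arn"),
        "action": raw.get("eventName"),
        "source_ip": raw.get("sourceIPAddress"),
        "region": raw.get("awsRegion"),
    }
```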
Bonus points for reviewing your threat model (I know, again) and seeing if your logs are missing key pieces of information that will unlock even more security insights. From there you can look into enrichment providers to fill in the gaps.
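A lightweight enrichment step might bolt geolocation and asset ownership onto each event, for example. The lookup tables below are placeholders for whatever GeoIP database, CMDB, or threat intel feed you integrate.

```python
# Minimal enrichment sketch; the lookup tables stand in for a real GeoIP
# database, CMDB, or threat intel feed.
GEOIP = {"203.0.113.7": {"country": "NL", "asn": 64500}}   # documentation IP/ASN
ASSET_OWNERS = {"web-prod-01": "platform-team"}

def enrich(event: dict) -> dict:
    event["geo"] = GEOIP.get(event.get("source_ip"), {})
    event["asset_owner"] = ASSET_OWNERS.get(event.get("host"), "unknown")
    return event
```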
This series has covered a lot of ground on pipeline strategy and functional considerations. If there’s one takeaway, it’s this: high-performance SecOps programs start with data quality.
It’s simple in theory, yet two decades of practical experience have proven that achieving quality data sets from high-performance pipelines is deceptively complex. Armed with a thoughtful strategy that aligns pipeline functionality to your threat model and detections, success is well within your reach.