If you’ve ever been around security operations, you can probably relate to at least one of these:
- As a security practitioner, you create a new detection, turn it on, grab a coffee, and wait for the adversary. Five minutes later, your new alert has generated hundreds, if not thousands, of false positives that you have to clean up.
- A pen test report details the successful Tactics, Techniques, and Procedures (TTPs), including use of a technique you know is covered by an existing detection. An alert never fired. After investigating, you discover that the logic was erroring out, and no one knew.
- An existing detection starts generating false positives after being quiet for months. You try to figure out whether the detection logic was changed, who changed it, and what changed. You can’t. Audit logs are difficult to find and aren’t detailed enough to tell the story.
These are just a few of the challenges teams face when managing threat detection content. Detection-as-Code principles are the long-needed missing link that lets security practitioners apply more than 50 years of lessons from software engineering to building and maintaining threat detections.
What is Detection-as-Code?
Detection-as-Code is the application of software engineering best practices to detection engineering. By adopting this new paradigm, teams can build scalable, repeatable processes for writing, maintaining, testing, and deploying detection content.
Threat detection is not a new concept. Detection and response teams have been around for decades, though the term “detection engineer” is relatively new; before 2018, most organizations called the same position a “SOC use case analyst.” What has changed is the scale at which detection and response teams must operate. Teams face more data, more diverse technologies, and more advanced and tailored threats than ever, which drives the need for environment-specific, custom threat detection logic.
Anything that helps your organization detect threats can be considered detection content. We can categorize detection content into three major buckets: custom logic, vendor-defined logic, and end-users.
- Custom logic is any detection content the security team builds from scratch or modifies in some way. Logic in this category tailors detection content to your environment. Examples include a detection that alerts on use of an emergency-only admin account, or tuning a service account out of an impossible-travel alert provided by your Security Information and Event Management (SIEM) vendor.
- Vendor-defined logic is any detection capability you receive from a vendor. Examples include Endpoint Detection and Response (EDR) tools, cloud threat detection tools, threat intel Indicators of Compromise (IOCs), and built-in SIEM rules. By definition, a security practitioner cannot modify the logic in this category; you trust the vendor to provide detection as a service (e.g., the machine learning-based threat engine in your EDR).
- Let us not forget about your end-users. They can often be your best source of detection in an unexpected scenario.
For the remainder of this article, we will focus on custom logic.
Copy and Paste Deployment
As a security practitioner, what would you do if you found out your org’s SREs were deploying the latest production code base by hand? That sounds crazy, right? Now think about how you’ve deployed detection logic in the past. I would be willing to bet nearly everyone reading this has had to copy and paste logic into a SIEM or, at a minimum, has changed production detection logic directly in the SIEM.
Over time, these manual processes lead to inconsistent results. Many things can go wrong because a human is so directly involved in every change. Manual mechanisms for managing detection logic increase the likelihood of mistakes, leading to more false positives and fewer true positives.
The good news is that the industry is starting to recognize these problems, and vendors are providing native solutions to fix them.
Five Domains of Detection-as-Code
Detection-as-code is a framework of five domains, each applied to detection engineering, that help address the problems we have covered so far.
- Agile workflows: Agile processes give detection engineers a structured approach to the development lifecycle of threat detection content, with defined phases for planning, development, testing, documentation, and deployment. With a prioritized backlog and explicit steps for testing and documentation, engineers can see more clearly what they should be working on.
- Expressive languages & code reuse: The expressiveness of a SIEM’s language directly determines how much flexibility an engineer has when writing detections. Modern platforms use general-purpose languages like Python, so the language itself does not restrict an engineer’s detection logic. SIEM-specific functionality such as lookup tables and data models also enables code reuse by letting a single piece of logic cover events from many log sources.
- Version control: Using version control (e.g., Git repositories hosted on GitHub or GitLab) as the source of truth for detection logic streamlines code reviews, enforces approvals, simplifies rollbacks, and provides a history of who changed what and when. Version control is also the foundation that enables CI/CD workflows and test-driven development.
- Continuous integration & delivery (CI & CD): When detection logic lives in version control, CI tools can enforce automated testing and linting before changes are pushed to production. Linting is a static analysis method used to identify syntax errors and violations of coding conventions. Applied to detection logic, linting can enforce both code quality and detection metadata quality (e.g., is a response runbook linked to the detection, and are relevant log fields included in the alert output?). After a change is merged, CD infrastructure can automatically deploy the updated detection to the SIEM.
- Test-driven development (TDD): In detection engineering, our test cases are log events. Malicious log events, together with log events representing regular activity, make up the test coverage for each detection use case. Historically, detection engineers gathered these test cases during development and then discarded them once the logic was deployed. Following a TDD methodology, we retain these log events and rerun them against the detection logic whenever it changes, or periodically on a schedule, to ensure it still works.
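To make the TDD idea concrete, here is a minimal sketch in plain Python, independent of any particular SIEM. It assumes a hypothetical rule that alerts on use of an emergency-only admin account; the account name break-glass-admin and the event shape are invented for illustration. Malicious and benign log events are kept next to the rule with their expected results, so the same cases can be replayed on every change or on a schedule.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical emergency-only account name, used only for illustration.
BREAK_GLASS_USER = "break-glass-admin"

TestCase = Tuple[str, Dict[str, Any], bool]


def rule(event: Dict[str, Any]) -> bool:
    """Alert whenever the emergency-only admin account is used."""
    identity = event.get("userIdentity") or {}
    return identity.get("userName") == BREAK_GLASS_USER


# Retained test cases: (description, log event, expected rule result).
TEST_CASES: List[TestCase] = [
    (
        "break-glass account logs in",
        {"eventName": "ConsoleLogin", "userIdentity": {"userName": "break-glass-admin"}},
        True,
    ),
    (
        "regular engineer logs in",
        {"eventName": "ConsoleLogin", "userIdentity": {"userName": "alice"}},
        False,
    ),
    (
        "event without identity information",
        {"eventName": "ConsoleLogin"},
        False,
    ),
]


def run_tests(detection: Callable[[Dict[str, Any]], bool], cases: List[TestCase]) -> List[str]:
    """Replay every retained event and return descriptions of the failing cases."""
    failures = []
    for description, event, expected in cases:
        try:
            result = detection(event)
        except Exception as exc:  # a crashing rule counts as a failed test, not a silent gap
            failures.append(f"{description}: raised {exc!r}")
            continue
        if result != expected:
            failures.append(f"{description}: expected {expected}, got {result}")
    return failures


if __name__ == "__main__":
    problems = run_tests(rule, TEST_CASES)
    print("all tests passed" if not problems else "\n".join(problems))
```

Treating an exception as a test failure is a deliberate choice: it is exactly the kind of check that would have caught the silently erroring detection from the pen test scenario.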
Implementing Detection-as-Code with Panther
From the beginning, Panther was architected around detection-as-code principles. Because of this design, Panther natively provides workflows and tools that enable teams to easily integrate external systems like version control and CI/CD pipelines into their detection engineering processes. In addition, Panther uses Python to define detection logic and bundles pre-defined test cases with each detection use case in our public GitHub repo.
At a high level, the detection-as-code workflow in Panther looks like this:
- Security engineers edit or build a detection rule on their laptops and run tests locally.
- They commit their changes to a new branch and open a pull request against their Panther detection repository. Linting and tests run automatically.
- A peer reviews the change and either provides feedback or approves it.
- Once all tests have passed, and approval is received, the authoring engineer merges the pull request.
- Upon merge, CI/CD workflows deploy the detection into Panther.
Example: Editing a detection
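Let’s walk through what it looks like to make a small change to an existing detection in Panther while following detection-as-code principles. The detection below alerts when a user successfully deletes an S3 bucket. Because it matches any event name beginning with DeleteBucket, it also fires when a bucket’s policy or website configuration is deleted: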
```python
from panther import aws_cloudtrail_success


def rule(event):
    # Capture DeleteBucket, DeleteBucketPolicy, DeleteBucketWebsite
    return event.get("eventName", "").startswith("DeleteBucket") and aws_cloudtrail_success(event)
```
In this example, let us say we know the operations team regularly deletes specific temporary buckets. To keep those deletions from generating unnecessary alerts, we add an allowlist.
```python
from panther import aws_cloudtrail_success
from panther_base_helpers import deep_get, pattern_match_list

ALLOWED_BUCKETS = [
    "temp-*",  # S3 bucket names are lowercase, so patterns should be too
]


def rule(event):
    # Capture DeleteBucket, DeleteBucketPolicy, DeleteBucketWebsite
    if event.get("eventName", "").startswith("DeleteBucket") and aws_cloudtrail_success(event):
        # Don't alert on buckets that we know will be deleted
        if pattern_match_list(deep_get(event, "requestParameters", "bucketName"), ALLOWED_BUCKETS):
            return False
        # Alert on all other buckets
        return True
    return False
```
A few notes:
- Code reuse (lines 1-2): We import three helper functions, each written once to solve a common problem and reusable across any detection use case.
  - aws_cloudtrail_success (line 11): checks whether an action in AWS succeeded.
  - pattern_match_list (line 13): determines whether an event value matches any pattern in a list.
  - deep_get (line 13): safely retrieves a field value from nested JSON, returning a default instead of raising an error when a key is missing.
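The helpers above ship with Panther’s detection repository, so the rule will not run on its own outside that environment. The following is a minimal, self-contained sketch that approximates what the three helpers do, so the rule’s behavior can be tried locally. These stand-ins are simplified assumptions, not Panther’s actual implementations; in particular, treating a CloudTrail event as successful when it has no errorCode or errorMessage field is our own approximation of aws_cloudtrail_success, and the sample bucket names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, Iterable, Optional

ALLOWED_BUCKETS = ["temp-*"]


# --- Simplified stand-ins for the Panther helpers (assumptions, not the real code) ---
def deep_get(dictionary: Dict[str, Any], *keys: str, default: Any = None) -> Any:
    """Walk nested dicts key by key, returning `default` if any key is missing."""
    current: Any = dictionary
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def pattern_match_list(value: Optional[str], patterns: Iterable[str]) -> bool:
    """Return True if `value` matches any shell-style wildcard pattern (case-sensitive)."""
    if value is None:
        return False
    return any(fnmatchcase(value, pattern) for pattern in patterns)


def aws_cloudtrail_success(event: Dict[str, Any]) -> bool:
    """Approximation: treat a CloudTrail event as successful if it has no error fields."""
    return "errorCode" not in event and "errorMessage" not in event


# --- The allowlisted rule from the article, unchanged apart from using the stand-ins ---
def rule(event: Dict[str, Any]) -> bool:
    # Capture DeleteBucket, DeleteBucketPolicy, DeleteBucketWebsite
    if event.get("eventName", "").startswith("DeleteBucket") and aws_cloudtrail_success(event):
        # Don't alert on buckets that we know will be deleted
        if pattern_match_list(deep_get(event, "requestParameters", "bucketName"), ALLOWED_BUCKETS):
            return False
        # Alert on all other buckets
        return True
    return False


if __name__ == "__main__":
    sample = {"eventName": "DeleteBucket", "requestParameters": {"bucketName": "prod-logs"}}
    print(rule(sample))  # True: prod-logs is not on the allowlist
```

Swapping in stand-ins like these is only for local experimentation; in a real repository the rule should keep importing the shared helpers so every detection benefits from the same tested code.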
Once the detection engineer has finished the change locally, they submit a pull request to their Panther detection repo. Opening the pull request notifies team members that a review is needed, and the diff view in version control lets the reviewing engineer see exactly what changed.
Tests and linting run automatically in CI to validate that all detection logic is functional, giving engineers confidence that their changes won’t break production logic.
Finally, when the pull request is merged, the detection logic is automatically uploaded to Panther. These CI/CD workflows are enabled by Panther Analysis Tool (PAT), an open-source command-line tool maintained by Panther that lets security practitioners quickly leverage their organization’s existing CI/CD infrastructure to test and lint their detection logic.
By following detection-as-code principles, security practitioners can make the lifecycle of detection content more consistent. Learn how Cedar, one of Panther’s customers, scaled its security program with detection-as-code.
Panther is a modern SIEM platform that solves the challenges of security operations at scale. Panther can analyze log data from hundreds of systems, including AWS, GCP, O365, G Suite, CrowdStrike, osquery, and more. Check out a list of all our supported integrations. Try Panther for 30 days to get started or request a personalized demo. If you are interested in learning more about detection-as-code, ask us about a workshop in your area.
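Linting in CI can cover detection metadata as well as code. Here is a minimal sketch of a metadata check in plain Python. It assumes each detection’s metadata has already been loaded into a dictionary (for example, parsed from a YAML spec file stored next to the rule); the required field names RuleID, Severity, Runbook, and Tests, the ExpectedResult key, and the allowed severity values are illustrative assumptions, not a definitive schema.

```python
from typing import Any, Dict, List

# Illustrative policy; adapt to your own detection metadata schema.
REQUIRED_FIELDS = ("RuleID", "Severity", "Runbook", "Tests")
ALLOWED_SEVERITIES = {"Info", "Low", "Medium", "High", "Critical"}


def lint_detection(metadata: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the spec passes."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not metadata.get(field):
            problems.append(f"missing or empty field: {field}")

    severity = metadata.get("Severity")
    if severity and severity not in ALLOWED_SEVERITIES:
        problems.append(f"unknown severity: {severity}")

    # Require both a malicious (alerting) and a benign (non-alerting) test event.
    tests = metadata.get("Tests") or []
    expectations = {test.get("ExpectedResult") for test in tests}
    if tests and not {True, False} <= expectations:
        problems.append("tests should include at least one alerting and one non-alerting event")
    return problems


if __name__ == "__main__":
    # Hypothetical spec that forgot its runbook and its benign test case.
    spec = {
        "RuleID": "Example.S3.BucketDeleted",
        "Severity": "Info",
        "Tests": [{"Name": "Prod bucket deleted", "ExpectedResult": True}],
    }
    for problem in lint_detection(spec):
        print(problem)
```

A CI job could run a check like this over every detection in the repository and fail the build whenever any problems are returned, so a missing runbook is caught in review rather than during an incident.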