Best practices for running faster SQL queries

An announcement comes out that there has been a security breach in a tool your company relies on heavily. You are tasked with checking the AWS CloudTrail logs to see if your company has been compromised as a result. Faced with a mountain of logs spanning just the last 30 days, how do you get an answer to this question as fast as possible?

In this blog, we’ll explore the power of querying normalized data in Panther efficiently and precisely by following these principles:

  1. Include a time range
    Including a time range in your SQL query is the easiest way to reduce your search space (and consequently your query time), since it excludes events from time ranges you’re not targeting. Panther indexes log data using the p_event_time field, the timestamp identified in the schema as when the event occurred. To pull only the last day’s worth of events, add this WHERE clause with a time range macro to your query:
    WHERE p_occurs_since('24 hours')
  2. Add a LIMIT
    A LIMIT clause gives a broad query a stopping point for pulling data. If a query would normally return thousands of results, a result comes back more quickly when the engine only needs to find the first instance. For example, if you want to confirm whether there has been any activity at all for a particular log source in the past 30 days, you can use the following clause to get a single quick result:
    WHERE p_occurs_since('30 days') LIMIT 1
  3. SELECT only the fields you need
    The SELECT statement lets you pull only the fields relevant to your investigation without having to sift through everything else. Snowflake, the data platform that powers Panther’s data lake, uses a columnar format, which means queries return faster when fewer fields are selected. For AWS CloudTrail events, we can create a timeline of resources accessed in the last 30 days with the following statement:
    SELECT eventTime, resources FROM panther_logs.public.aws_cloudtrail WHERE p_occurs_since('30 days')
  4. Summarize
    Summarizing commands like GROUP BY, COUNT(), and DISTINCT can help surface outliers in a dataset. Using GROUP BY username with p_occurs_since('30 days') and COUNT(username) gives us a list of users and the number of actions each has taken in the last 30 days, making it easier to spot a username that looks out of place (see the sketch just after this list).
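
As a minimal sketch of that summarization, assuming the acting user is exposed at userIdentity:userName (the exact field path may differ in your schema), a per-user activity count over the last 30 days could look like this:

-- Sketch only: count actions per user over the last 30 days.
-- Assumes the user's name lives at userIdentity:userName; adjust the
-- field path to match your log schema.
SELECT
  userIdentity:userName AS username,
  COUNT(*) AS n_actions
FROM
  panther_logs.public.aws_cloudtrail
WHERE
  p_occurs_since('30 days')
  AND userIdentity:userName IS NOT NULL
GROUP BY userIdentity:userName
ORDER BY n_actions DESC;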

When all the above recommendations are put together, we can form a slick AWS CloudTrail query that looks for AWS IAM roles that have been assumed by multiple users, which could be a sign of compromised credentials. 

SELECT
  requestParameters:roleArn,
  userIdentity:principalId,
  COUNT(DISTINCT userAgent) AS n_agents
FROM
  panther_logs.public.aws_cloudtrail
WHERE
  eventSource = 'sts.amazonaws.com'
  AND eventName = 'AssumeRole'
  AND p_occurs_since('30 days')
  AND userIdentity:principalId != 'null'
  AND userAgent != 'AWS Internal'
  AND requestParameters:roleArn != 'null'
GROUP BY requestParameters:roleArn, userIdentity:principalId
HAVING n_agents > 1
ORDER BY n_agents DESC
LIMIT 100;

For more information on how Panther enables speedy investigations, check out our Documentation to get set up, explore our Knowledge Base for tips, tricks, and troubleshooting, and join the Panther Community to see how other security professionals are getting the most out of Panther! Hope to see you there!
