How to Write Queries in PantherFlow, a Piped Search Language

PantherFlow is Panther’s piped query language that supports filtering, transforming, and aggregating data in a way that should feel familiar to those who have experience with other pipelined languages. In our last blog, we talked about why we created PantherFlow, including how it improves search with an intuitive, logical syntax that retains the power of SQL, without inheriting its complexities.

In this blog, we’ll deep dive into PantherFlow’s syntax and features, showing you how to write PantherFlow queries and how it integrates into Panther’s search experience.

Fully Integrated into Search

In Panther’s Search interface, you can switch into PantherFlow mode by selecting the toggle button </>, as shown in the following image. In this mode, you type queries in PantherFlow while still retaining all of the features of the existing Search interface, including:

  • Data is returned in time order by default, though you can override the ordering with PantherFlow’s sort operator
  • Limits are handled transparently, although PantherFlow’s limit operator is available for extra control
  • Summary statistics of the “Top N” values of a given field can be generated
  • Additional filter criteria can be added to your PantherFlow query with the click of a button
  • The Database, Table, and Time Range dropdown fields can be used to select your input data, or you can override these values directly in the PantherFlow query
Panther's Search interface with the toggle button to go into PantherFlow mode.

It Starts with a Table

A table name is a valid query. Subsequent filters, transforms, or joins are added on new lines starting with the pipe | symbol. Comments begin with //. Read more about PantherFlow statements in the docs.

// Query all data in the aws_alb table

aws_alb
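If the table lives in another database, you can refer to it by its fully qualified name, and a time filter scopes the range in the query itself. A minimal sketch, combining the fully qualified table names and the time.ago function shown later in this post:

// Query the aws_alb table by its fully qualified name, scoped to the last day

panther_logs.public.aws_alb
| where p_event_time > time.ago(1d)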

Expressive Filtering

The where operator lets you write complex filter expressions. You can combine expressions with typical boolean logic (and, or, not), use typical comparisons (==, !=, <, <=, >, >=, and in), and call a growing set of functions to produce complex queries.

// Query the aws_alb table for events within the last day that have the field elbStatusCode set to 200 or 404, or containing 'safari'

aws_alb
| where p_event_time > time.ago(1d) and (elbStatusCode in [200, 404] or strings.contains(userAgent, 'safari'))
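The remaining operators compose in the same way. A hedged sketch using not and != with the same aws_alb fields as above:

// Exclude successful responses and events with no user agent

aws_alb
| where not (elbStatusCode == 200) and userAgent != null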

Transform your Data

You can transform your data with extend and project. Use the extend operator to add fields based on any expression or function call you wish. Fields you add can be referred to later in the query:

// Add new fields "sentKB" and "clientIp"

aws_alb
| extend sentKB=sentBytes/1024, clientIp
| where sentKB > 2

Similarly, the project operator keeps only the fields you specify, dropping all others:

// Only sentKB and clientIp remain

aws_alb
| project sentKB = sentBytes / 1024, clientIp

Aggregate

No self-respecting query language would be complete without the ability to aggregate data, and PantherFlow is no exception. The summarize operator lets you aggregate data:

// Count all results by clientIp and clientPort

aws_alb
| summarize agg.count() by clientIp, clientPort

You can rename fields, use arbitrary expressions, and, of course, refer to added fields later in the query:

// Count results grouped by whether elbStatusCode equals 200, storing the set of unique client IPs in `uniq_ips`, the count in `num_connections`, and the grouping expression `elbStatusCode == 200` in `isOK`.
// Filter for `num_connections > 5` and sort by num_connections.

aws_alb
| summarize num_connections=agg.count(), uniq_ips=agg.make_set(clientIp) by isOK = elbStatusCode == 200
| where num_connections > 5
| sort num_connections

Data Types that Are a Superset of JSON

PantherFlow’s data types are a superset of JSON, so anything that is valid JSON can be used in PantherFlow as an expression. This means PantherFlow has support for both arrays [1, "foo", 2.3] and objects {"a": 1, "b": 2}.

// Add animal object to each row

| extend animal={
	"name": "panther", 
	"likes": ["long walks on the beach", "picnics in the park"]
}

Values are evaluated as expressions, so they can contain function calls and refer to fields in the query:

// Add user_info object to each row

| extend user_info={
	"source": strings.cat(clientIp, ":", clientPort),
	"userAgent": userAgent
}

Multi-Table Queries

Because PantherFlow is designed for semi-structured data, you can search across multiple tables with the union operator, even when they contain different columns. No special handling is needed: fields that do not exist in a given table are simply null.

// Search in both the aws_alb and aws_cloudtrail tables using the same filter criteria

union panther_logs.public.aws_alb, panther_logs.public.aws_cloudtrail
| where "1.1.1.1" in p_any_ip_addresses

First Class Time Intervals

Time intervals are treated as first-class types, for example: 1d (one day), 15m (15 minutes), 60s (60 seconds), and so on. Time intervals can be passed to functions or combined arithmetically.

| extend one_day_ago = time.ago(1d)
| extend also_one_day_ago = time.now() - 1d
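Because intervals are ordinary values, they combine with timestamps using plain arithmetic. A minimal sketch using the 15m interval from above, assuming addition works symmetrically with the subtraction shown:

| extend fifteen_min_from_now = time.now() + 15m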

Deeply Nested Object Paths

It is straightforward to refer to deeply nested values and arrays. Object attributes can be accessed with a period (.). Square brackets ([]) can be used to access an attribute by its string name or an element by its array index.

| where some.totally["(un)reasonable"][0].path == "coffee"

Named Subqueries

You can name queries with the let keyword and treat them like first-class tables. This incurs no extra cost: only the subqueries that are actually needed are computed.

// Referred to below as if it were a table
let mysubquery = union aws_alb, aws_cloudtrail, aws_guardduty
| where p_event_time > time.ago(1d);

// Ignored
let unneeded = some_other_query;

// Refers to the above subquery, just like a table
mysubquery 
| where '1.1.1.1' in p_any_ip_addresses

Null Handling

The way SQL handles NULL is confusing, and arguably requires a lecture on three-valued logic. By comparison, PantherFlow treats null as a value, so comparisons like foo == null and foo != null do exactly what you expect. There is no IS keyword in PantherFlow.

| where some_field != null

Join Support

PantherFlow supports joins with the join operator, including the ability to join on a subquery:

// Join aws_alb data on tor_exit_nodes, storing the data in the `tor` field

aws_alb
| join kind:inner tor=(panther_lookups.public.tor_exit_nodes) on $left.clientIp == $right.ip

Text Searching

If you’ve ever had to search for a particular string in your logs but don’t know what field it is in, we feel your pain. Arbitrary strings can be found anywhere in your data with the search operator:

aws_alb
| search 'big cat'

You can combine strings with boolean logic, and wildcards are supported too:

aws_alb
| search ('GET' or 'POST') and not 'HTTP/1.1' and 'mozilla*chrome'

Inject Data into your Query

You can inject data into a query with the datatable operator. And because JSON is valid PantherFlow, you can copy and paste JSON directly from the source.

datatable [
    {
     "name": "paul.paulson", 
     "favorite_food": "pineapple"
    },
    {
     "name": "samantha.samson", 
     "favorite_food": "carrot"
    }
]
| where favorite_food == 'pineapple'

You can also use expressions within the datatable:

// time.now() always evaluates to the current time, so this query always returns both rows

datatable [
    {
     "eventName": "Panther roar", 
     "p_event_time": time.now()
    },
    {
     "eventName": "Panther sleep", 
     "p_event_time": time.now()
    }
]
| where p_event_time > time.ago(1m)

You can combine the union operator with datatable to inject values into your query results:

// Inject data into your query that contains up-to-date timestamps

aws_alb
| union (datatable [{"p_event_time": time.ago(1m), "clientIp": "1.1.1.1", "clientPort": "12345"}])
| where p_event_time > time.ago(20m)

Unicode Support!

And finally, a fun feature: PantherFlow has full support for Unicode in strings.

| extend panther = "✨🐈‍⬛✨"

Forward-looking

In designing PantherFlow, we emphasized simplicity, ease of use, and productivity. Whether it’s the ability to search for a known-bad indicator string anywhere in your logs, pivot and enrich with join, combine multiple tables into a single query, or generate informative summaries, we’ve built PantherFlow to be a search language that you actually enjoy using.

We think this set of features is already compelling, but we are continuing to improve and invest in PantherFlow. One upcoming feature is the visualize operator, which will produce a custom bar or line chart of your query results that can be saved for later reuse or added to a dashboard!

Panther leverages streaming analysis, detection-as-code, and a high-performance security data lake to filter security signal from noise, at cloud scale. Request a demo to learn how Panther can improve your threat detection and response.
