PantherFlow is Panther’s piped query language that supports filtering, transforming, and aggregating data in a way that should feel familiar to those who have experience with other pipelined languages. In our last blog, we talked about why we created PantherFlow, including how it improves search with an intuitive, logical syntax that retains the power of SQL, without inheriting its complexities.
In this blog, we’ll deep dive into PantherFlow’s syntax and features, showing you how to write PantherFlow queries and how it integrates into Panther’s search experience.
In Panther’s Search interface, you can switch into PantherFlow mode by selecting the toggle button </>
, as shown in the following image. In this mode, you type queries in PantherFlow while still retaining all of the features of the existing Search interface, including:
sort
operatorlimit
operator is available for extra controlA table name is a valid query. Subsequent filters, transforms, or joins are added on new lines starting with the pipe |
symbol. Comments begin with //
. Read more about PantherFlow statements in the docs.
// query all data in the aws_alb table
aws_alb
Code language: plaintext (plaintext)
The where
operator lets you write complex expressions. You can use typical boolean logic (and
, or
, not
) to combine expressions, typical comparisons (==
, <
, <=
, >
, >=
and !=
, and in
) and a growing set of function calls to produce complex queries.
// Query the aws_alb table for events within the last day that have the field elbStatusCode set to 200 or 404, or containing 'safari'
aws_alb
| where p_event_time > time.ago(1d) and (elbStatusCode in [200, 404] or strings.contains(userAgent, 'safari'))
Code language: plaintext (plaintext)
You can transform your data with extend
and project
. Use the extend
operator to add fields based on any expression or function call you wish. Updates to your data can be referred to later on in the query:
// Add new fields "sentKB" and "clientIp"
aws_alb
| extend sentKB=sentBytes/1024, clientIp
| where sentKB > 2
Code language: plaintext (plaintext)
Similarly, the project
operator adds new fields after dropping all existing fields:
// Only sentKB and clientIp remain
aws_alb
| project sentKB = sentBytes / 1024, clientIp
Code language: plaintext (plaintext)
No self respecting query language would be complete without the ability to aggregate data, and PantherFlow is no different. The summarize
operator lets you aggregate data:
// Count all results by clientIp and clientPort
aws_alb
| summarize agg.count() by clientIp, clientPort
Code language: plaintext (plaintext)
You can rename fields, use arbitrary expressions and, of course, you can refer to added fields later in the query:
// Count all results by whether elbStatusCode equals 200, store all IPs for each case in the uniq_ips field, the count in `num_connections`, and the expression `elbStatusCode == 200` in `isOK`.
// Filter for `num_connections > 5` and sort by num_connections.
aws_alb
| summarize num_connections=agg.count(), uniq_ips=agg.make_set(clientIp) by isOK = elbStatusCode == 200
| where num_connections > 5
| sort num_connections
Code language: plaintext (plaintext)
PantherFlow’s data types are a superset of JSON, so anything that is valid JSON can be used in PantherFlow as an expression. This means PantherFlow has support for both arrays [1, "foo", 2.3]
and objects {"a": 1, "b": 2}
.
// Add animal object to each row
| extend animal={
"name": "panther",
"likes": ["long walks on the beach", "picnics in the park"]
}
Code language: plaintext (plaintext)
Data types are evaluated as expressions so they can contain function calls and refer to fields in the query:
// Add user_info object to each row
| extend user_info={
"source": strings.cat(clientIP, ":", clientPort),
"userAgent": userAgent,
}
Code language: plaintext (plaintext)
As a result of PantherFlow’s semi-structured nature, you can search across multiple tables with the union
operator, even when they contain different columns. No special handling is needed. Data that does not exist in a table is null
.
// Search in both the aws_alb and aws_cloudtrail tables using the same filter criteria
union panther_logs.public.aws_alb, panther_logs.public.aws_cloudtrail
| where "1.1.1.1" in p_any_ip_addresses
Code language: plaintext (plaintext)
Time intervals are treated as first class types, for example: 1d
(one day), 15m
(15 minutes), 60s
(60 seconds), etc. Time intervals can be passed to functions or computed arithmetically.
| extend one_day_ago = time.ago(1d)
| extend also_one_day_ago = time.now() - 1d
Code language: plaintext (plaintext)
It is straightforward to refer to deeply nested values and arrays. Object attributes can be accessed with a period (.
). Square brackets ([]
) can be used to access an attribute as a string or array index.
| where some.totally["(un)reasonable"][0].path == "coffee"
Code language: plaintext (plaintext)
You can name queries with the let
keyword and treat them like a first class table. This incurs zero cost overhead: only subqueries that are needed are computed.
// Referred to below as if it were a table
let mysubquery = union aws_alb, aws_cloudtrail, aws_guardduty
| where p_event_time > time.ago(1d);
// Ignored
let unneeded = some_other_query;
// Refers to the above subquery, just like a table
mysubquery
| where '1.1.1.1' in p_any_ip_addresses
Code language: plaintext (plaintext)
The way that SQL handles NULL
is confusing, and arguably requires a lecture in three-valued logic. In comparison, PantherFlow treats null
as a value so comparisons like foo == null
and foo != null
do exactly what you think. There is no IS
keyword in PantherFlow.
| where some_field != null
Code language: plaintext (plaintext)
PantherFlow supports joins with the join
operator, including the ability to join on a subquery:
// Join aws_alb data on tor_exit_nodes, storing the data in the `tor` field
aws_alb
| join kind:inner tor=(panther_lookups.public.tor_exit_nodes) on
$left.clientIp == $right.ip
Code language: plaintext (plaintext)
If you’ve ever had to search for a particular string in your logs but don’t know what field it is in, we feel your pain. Arbitrary strings can be found anywhere in your data with the search
operator:
aws_alb
| search 'big cat'
Code language: plaintext (plaintext)
You can combine strings with boolean logic, and wildcards are supported too:
aws_alb
| search ('GET' or 'POST') and not 'HTTP/1.1' and 'mozilla*chrome'
Code language: plaintext (plaintext)
You can inject data into a query with the datatable
operator. And because JSON is valid PantherFlow, you can copy and paste JSON directly from the source.
datatable [
{
"name": "paul.paulson",
"favorite_food": "pineapple"
},
{
"name": "samantha.samson",
"favorite_food": "carrot"
}
]
| where favorite_food == 'pineapple'
Code language: plaintext (plaintext)
You can also use expressions within the datatable
:
// time.now() always evaluates to the current time, so the query always returns the data
datatable [
{
"eventName": "Panther roar",
"p_event_time": time.now()
},
{
"eventName": "Panther sleep",
"p_event_time": time.now()
}
]
| where p_event_time > time.ago(1m)
Code language: plaintext (plaintext)
You can combine the union
operator with datatable
to inject values into your query results:
// Inject data into your query that contains up-to-date timestamps
aws_alb
| union (datatable [{"p_event_time": time.ago(1m), "clientIp": "1.1.1.1", "clientPort": "12345"}])
| where p_event_time > time.ago(20m)
Code language: plaintext (plaintext)
And finally, a bit of a fun feature, we have full support for unicode in strings.
| extend panther = "✨🐈⬛✨"
Code language: plaintext (plaintext)
In designing PantherFlow we emphasized simplicity, ease of use, and productivity. Whether it’s the ability to search for a known bad indicator string anywhere in your logs, pivot and enrich with join
, combine multiple tables into a single query, or generate informative summaries, we’ve created PantherFlow to be a search language that you actually enjoy using.
We think this set of features is already compelling but we are continuing to improve and invest in PantherFlow. One such feature is our upcoming visualize
operator that will produce a custom bar or line chart of your query results, which can be saved for later reuse or to a dashboard!
Panther leverages streaming analysis, detection-as-code, and a high-performance security data lake to filter security signal from noise, at cloud scale. Request a demo to learn how Panther can improve your threat detection and response.