Request a demo

Request a demo

pen

Blog

Why PantherFlow: How Our Piped Query Language Simplifies Search

Doug

Miller

Nov 12, 2024

9

min read

Avoiding the complexities of SQL while retaining its powerful search capabilities is exactly why we created PantherFlow, Panthers piped query language. Building PantherFlow is a big step for us, so we wanted to spend some time talking about how we got here and why we think PantherFlow is great.

Yes, PantherFlow is indeed another vendor-specific query language, and well explain why it was the best decision to improve Panthers search experience. Well cover our key design goals for developing an intuitive query syntax that simplifies threat investigation, why piped query languages are so effective, and the flexibility of modeling security data as semi-structured JSON. Lets get into it.

How We Got Here

Before PantherFlow there were two ways to interact with your data in Panther: SQL or an interactive Search feature. Each has pros and cons. With SQL you have the raw power to do exactly what you want. However, this comes with little hand-holding. For example, you must provide an ORDER BY or your data will not be sorted. And without a LIMIT you could wait minutes for your query to complete. These are only two common examples of SQLs complexitiesthere are many more. And of course, you actually need to know how to write SQL, which is not a common skill in the security space.

Panthers SQL interface

Panthers SQL interface

With our Search feature you get a lot of bells and whistles:

  • Data is returned in time order

  • Limits are handled transparently

  • An autocomplete feature assists you in writing filter criteria

  • A timeline view shows result frequency

  • Tooltips help you add new search criteria with one click

  • You can search across multiple tables easily

However, there is one major drawback: you can only perform basic filtering operations. You cannot transform, aggregate, join, or make function calls.

Panthers Search interface

Panthers Search interface

This created a gap for users who found SQL too cumbersome, yet considered the Search interface too limiting. The core challenge became: How can we provide users with the expressive power they desire without inheriting SQLs complexities?

Ultimately we decided the best way to resolve this was by embedding our own pipelined query language into Panthers Search experience. PantherFlow was the result.

Why Pipelined Search Is Better

Our experience with other query languages showed us that using pipelined query languages results in a more intuitive and approachable experience. In piped search, your data logically flows through the pipeline of operators; the output from one is the input to the next. This allows for complex queries to be built up from short, digestible statements that flow from top to bottom. Reading and writing queries in this way is more accessible to users of all skill levels.

To demonstrate this, well look at the same query written in both PantherFlow and SQL, starting with PantherFlow:

In PantherFlow, each step in the query is applied in order. This makes it simple to understand what the query is doing by breaking it down into discrete, logical steps.

In the above query, the first step references the source of the data: the aws_cloudtrail table. We then apply some filters with where to scope the date range and events. We extract the username into a new field using extend, which we use to join with some enrichment data. Afterward, we filter our search further by job family. Finally, we reduce the results to a count of actions by username.

Now lets look at the SQL for the same query:

With SQL, the query is written in a different order than the steps that are logically applied. To start, the data source is listed in the FROM clause in the middle of the query. Furthermore, the results of the aggregation are governed by both the SELECT clause at the top and the GROUP BY clause at the bottom. Also, the first two conditions in the WHERE clause are applied before the join, and the final condition is applied after.

The next image visualizes the difference in execution order between PantherFlow and SQL. On the left, each step in the PantherFlow query is colored in the order of the ROYGBIV rainbow. On the right, these same colors are applied to the corresponding steps in the SQL query.

Comparing the same PantherFlow and SQL query, side by side.

Of course, any query language you dont know is going to be foreign, but what makes PantherFlow and other pipelined languages approachable is the way the query is structured: a logical ordering, starting with the data source and applying operators like filters, joins, and transformations as sequential building blocks. This way of structuring queries makes them intuitive to write and easy to read.

Under the Covers Its (Schema-Less) SQL

We built PantherFlow to wrap our existing SQL data lake, but one of our key design goals was to allow users to easily query multiple tables, even those with different structures. In SQL this is typically quite a challenge, but in PantherFlow the union operator lets you query many tables at once simply by listing the names of the tables, even those with wildly different schemas.

To make this work, PantherFlow presents a semi-structured data model. In contrast to a SQL querywhere data is retrieved from structured tables and returned as a tabular resultPantherFlow retrieves semi-structured data and returns semi-structured data. As a result, PantherFlow doesnt require data to have a fixed schema.

This flexibility powers the union operator mentioned above, but it also maps more naturally to security data which is almost ubiquitously JSON. In fact, in PantherFlow you can even use JSON data as input to your query with the datatable operator.

As we look to expand PantherFlow to more areas of our product, its semi-structured nature gives us the flexibility to apply PantherFlow to places where schemas may not be practical or available.

Yet Another Query Language

We know, PantherFlow is yet another vendor-specific query language. There have been many popping up recently including: ES|QL, PQL, and TQL. This begs the question, are pipelined query languages just in vogue or is the industry coming to some kind of shared recognition that they are excellent tools to interact with and explore data? We feel that pipelined query languages power, flexibility, and ease of use put them firmly in the latter camp and that they have staying power.

With all these pipelined query languages coming onto the scene, could we have adopted an existing one instead of building our own? We tried but nothing met our use case. We looked into several including:

None of these had everything we were looking for, namely: a Go toolkit for processing queries and generating SQL, support for semi-structured data, and ease-of-use; PQL, which is very similar to PantherFlow, was not available at the time we started development.

Thats not to say we did not take inspiration from these sources. In fact, many of PantherFlows operators and overall syntax were heavily inspired by KQL.

Despite all of the effort that goes into designing and developing our own language, building from the ground up allows us to take it in the direction that we think is best for threat hunting and incident response. Lets walk through just a few examples of how PantherFlow improves search.

Simple Is Better than Complex

When designing PantherFlow we wanted to have oneand only oneobvious way of doing things. For example, unlike some languages, we do not allow for operator aliases, such as using both sort and order to do the same thing. While some might argue that aliases give more flexibility to users we believe that this increases the mental overhead when reading unfamiliar code.

Easy-to-Use Raw Text Search

Its really important to us that users have a way to quickly perform text searching across their logs for those times when you just dont know what field a value might be in. For this use case we created the search operator which lets you perform a raw text search against logs without specifying an explicit column.

Null Handling Is Intuitive

In SQL, null handling is a perennial source of confusion, for example the expression foo = NULL will always evaluate to False; instead, you must remember to say foo IS NOT NULL. In contrast, PantherFlow treats null as just another value that you can compare with, so foo == null does exactly what you think it should.

Consistent and Clean Syntax

Some languages use a mixture of both infix operators (e.g. foo contains 'str') and function calls. In our experience, this increases the mental overhead of writing queries because you need to remember whether the thing you are trying to do is a function call or an infix. In PantherFlow, the only infix operations are in and between, everything else is a function call. To lower this mental overhead even more, we group PantherFlows functions into namespaces like agg for aggregations and strings for string functions.

See PantherFlow for Yourself

There are many more examples of PantheFlows flexibility and expressive syntax. So now that you know why we created our own pipelined search language and some of our major design goals and decisions, continue reading in our next blog How to Write Queries in PantherFlow. Well deep dive into the specifics of PantherFlow queries, with plenty of code examples for you to digest!

    Table of contents will appear here.

Recommended Resources

Detection-as-Code

Escape Cloud Noise. Detect Security Signal.

Request a Demo

Escape Cloud Noise. Detect Security Signal.

Request a Demo

escape cloud noise. detect security signal

Request a Demo

Escape Cloud Noise. Detect Security Signal.

Request a Demo

Product
Resources
Support
Company