Pivot

StreamLab’s Pivot analytics let you unfold and fold streams of data to make them simpler to process. (See http://en.wikipedia.org/wiki/Pivot_table for more information on pivot tables.) Because streams continually update, in working with streaming data, you always need to choose a window over which to query.

In other words it will turn this:

ROWTIME=12:00:00,type=Temperature,value=33
ROWTIME=12:00:00,type=Wind Speed,value=12

into this:

ROWTIME=12:00:00,Temperature=33,Wind Speed=12

pivot operations are currently experimental. At the moment they will only work if there’s an input row containing each of the key values (the columns in the unfolded result) that arrives during the time window. The input should contain all of the keys each time window.

Unfold

Unfold reshapes a table into columns of key-value sets. Selected rows map to keys, and selected columns map to values.

The unfold operation proceeds as follows.

Given a stream of the form:

| Rowtime | Browser | Count | |1.– |1.– |1.– | | 1:01 | IE | 110| | 1:03 | Chrome | 290| | 2:01 | IE | 90| | 2:03 | Chrome | 300 |

An unfold operation generates columns like the following:

| rowtime window | IE | Firefox | Chrome | Safari | |1.– |1.– |1.– |1.– | | <hour 1> | 110 | 230 | 290 | 100 | | <hour 2> | 90 | 220 | 300 | 90 |

In the example above, “user-agent” serves as the key column, and “count” serves as the value column. For each value in the key column, StreamLab creates a new column and fills in this column with values from the value column. Because you may have multiple values for each key, you need to use an operation to aggregate those values to produce a single value for the unfolded row. For example, if incoming values are counts (as in the example above), you’d want to SUM over the time window. If they’re measurements of a continuous value, like a current temperature reading from a sensor, you would likely want to use AVG.

As rows come in, the StreamLab scrutinizer watches the input stream and lists unique values in the key column (in this example, the user-agent column). Once you select the Unfold operation and pick user-agent as the key column, it will make a suggestion that fills in the ‘key list’ parameter with the list of unique values observed in the stream. You can then set the rest of the parameters either using the widgets up top or by selecting the fields in the suggestion. You can also enter the key list manually.

A window on rowtime is a common use case.

To implement an unfold operation:

  1. Select a column to be used as a key column. This column has all the column names for the new columns. This column must contain text. StreamLab will automatically identify values from this column for use as keys (new column names).
  2. Select a column to be used as a value column. This column must be numerical. This column provides values for the new columns. This might be counts, temperature readings, pressure readings, and so on.
  3. If desired, enter a column suffix for the new columns. This suffix will make it easy to identify the new columns.
  4. Select an operation to aggregate values for the new columns. This might be an average temperature, or a running total. See Operations Used in Commands for more details on these operations.
  5. Select a length and unit for the aggregation window, such as “1 hour” or “15 minutes”.
  6. Choose rolling or tumbling window type. See below for more information on these.
  7. If desired, either enter the keys manually or let StreamLab fill them in.
  8. Click the + icon to add the command to the Guide script. The results of the script appear in the Output View window.

Sliding Window

A sliding window is a time-based or row-based analytic window. With a sliding window, you apply an analytic to a given amount of rows that changes incrementally as rows stream in. For example, you might take the sum of a column called Orders for the past hour. At 12:01, the analytic would apply to all rows with timestamps between 11:01 and 12:01. Sliding window queries do not contain a flooring function, which means that rows are not batched into intervals. The query’s results slide forward one row at a time. In other words, for every row that the query finds, it emits one row. This is an example of a sliding window in a query:

SELECT STREAM , SUM(amt) OVER w
FROM sales
WINDOW w AS (PARTITION BY store_id
ORDER BY s.ROWTIME
RANGE BETWEEN INTERVAL '1' HOUR PRECEDING AND CURRENT ROW);

You can specify sliding windows in two ways:

As ROWS N PRECEDING, such as ROWS 10 PRECEDING

This analytic window uses the last N rows, such as the last 10 rows. When the query starts up, the window fills up to N rows. From then on, as a new row enters the window, an old row is excluded from the analytic. For all cases where N is a number, the number of rows in the analytic window will always be N+1. You can also specify ROWS UNBOUNDED PRECEDING, which includes all available previous rows in its analytic window.

As RANGE BETWEEN INTERVAL, such as RANGE BETWEEN INTERVAL ‘1’ HOUR PRECEDING AND CURRENT ROW

This analytic window uses rows from a specified interval of time. In this case, all rows whose rowtimes precede the current rowtime by one hour will be excluded from the analytic. In other words, rows that precede the previous hour are “dumped” or “discarded” from the analytic window.

Tumbling Window

A “tumbling window” is a a collection of rows that are aggregated to produce a fewer number of output rows, such as “the sum of the last twenty rows” or “the sum of the rows in the last hour”. One row is returned for every group of rows.

As a result, the total number of output rows are limited: many rows in, one row out. Tumbling windows are actually not “windows” in the SQL sense, but rows that are aggregated with SELECT STREAM … GROUP BY.

Fold

Fold lets you convert multiple columns with like values into a single row. For fold, select multiple columns with like values (for example, a series of columns that contain counts for different categories, such as product types or employees). Fold creates two new columns: a key column with values drawn from the column names of selected columns, and a value column with values drawn from selected column values.

To implement a fold:

  1. Enter a column name for the key column. This will contain the selected columns' names.
  2. Enter a column name for the value column. This column will contain values drawn from the selected columns.
  3. Choose or enter source columns. These columns will be combined into the new column. All of these columns should contain the same type of values (counts, temperatures, and so on).
  4. If desired, enter a column suffix for the new columns. This suffix will make it easy to identify the new columns.
  5. Click the + icon to add the command to the Guide script. The results of the script appear in the Output View window.