WINDOW clause

Contents:

Windows Overview

The WINDOW clause in a SELECT query specifies a subset of rows within the stream, measuring backward from the current row. Typically the interval will be either a time range (e.g. data timestamped during the preceding 5 minutes) or else a particular number of rows (e.g. the previous 100 rows).

Alternatively, you can define the upper bound for a WINDOW using an offset relative to the current row. In other words, the cutoff for the WINDOW’s most recent data would be placed not at the current row but rather at an earlier point. For example, you can establish a WINDOW that contains data beginning 60 minutes prior to the current row and ends 30 minutes prior.

Summary information can be derived from rows within a WINDOW. That includes calculations such as AVG, MAX, MIN, COUNT, and SUM, among others. In a conventional database, you would not need a WINDOW to perform such calculations because the query would see 100% of the available rows in advance. With streaming data, on the other hand, new rows are continually created. So you must identify a subset of rows on which to act.

In RDBMS tables, calculations like MAX, COUNT, and SUM occur as aggregate functions, often using GROUP BY. With streaming data, GROUP BY is frequently used to define tumbling windows, including use of these familiar aggregate functions. Note: Despite the name, “tumbling windows” are not true windows in the SQL sense. Remember, WINDOW defines a subset of rows within the stream. Meanwhile, tumbling windows are created without using the WINDOW clause.

For a WINDOW, s-Server performs similar calculations (MAX, COUNT, SUM, etc.) as analytic functions. Both aggregate functions and analytic functions take multiple rows as inputs and calculate summary information. But there is a crucial difference: Aggregate functions output a single row that encapsulates all input rows, whereas analytic functions produce 1 output for every input row.

Consider why this is so. The WINDOW clause implies a sliding window frame relative to the current row. As new rows arrive, the window frame “moves”, meaning that each single row is associated with its own set of prior rows. Therefore each single row receives an output value from analytic functions acting on that changing set. For example, the AVG of the previous 100 rows may well change, going from one row to another; so a separate AVG output value is given for every row.

You can use the PARTITION BY clause to limit the scope of analytic functions, based on the value in a particular column. For example, MAX would apply to all rows within the WINDOW interval. But if PARTITION BY key_column is used, then MAX will consider only rows having the same key_column value as the current row. This achieves an effect similar to GROUP BY, since distinct values in key_column produce separate MAX calculations.

Some use cases demand a WINDOW whose duration is not a predefined constant value and whose start / end points cannot necessarily be known in advance. For such applications, you can configure a partitioned WINDOW as a session. That means the data stream itself will trigger the beginning and end of a row set.

You can use the ORDER BY clause to specify ordering within a WINDOW, even making use of FLOOR() or STEP() to group rows into larger timeframes. For more information, see the subtopic on hopping windows.

Windows of all kinds, including partitioned windows and sessions, can be used to perform JOINs. You can define named windows (using WINDOW) or use inline window specifications (by applying OVER to one or more columns in the query). Note: s-Server does not support using the WINDOW clause on foreign tables.

When you specify a window, s-Server applies calculations for each output row (such as AVG or MAX) to a sliding set of rows, known as the window frame. You can define window frames based on a logical interval or a physical number of rows. Logical intervals express windows in terms of time (e.g. “the last 5 minutes”) or another monotonically increasing expression (e.g. “the last 5 millimeters”). Having said that, logical intervals are almost always based on input.ROWTIME. For logical windows, you can define windows that start in the past and end at the present, or that start in the past and end in the past. The latter are known as offset windows.

A “physical number of rows” means something like “the last 200 rows that arrived”. These are often called row-based windows. These windows use ROWTIME, but only implicitly: s-Server determines the last 200 rows that arrived based on their ROWTIME. Physical windows always include the current row. So, actually, this window would contain 201 rows–the 200 rows that arrived prior to the current row and the current row itself.

Using Windows in a Query

You reference named windows in the OVER clause of a query. The following example references a window called 10min, which is defined in the WINDOW clause of the query:

SELECT STREAM ROWTIME,
     driver_no,
     AVG(speed) OVER 10min AS avg_speed_10_min
     --references window defined below
   FROM buses_stream
 WINDOW 10min   AS (PARTITION BY driver_no RANGE INTERVAL '10' MINUTE PRECEDING);

This example defines a window frame based on a logical interval, as indicated by RANGE. If the window frame were defined as a number of rows, it would use ROWS in place of range, as in ROWS 20 PRECEDING.

You can also use the OVER clause to define windows inline, without using the WINDOW clause, as in:

SSELECT STREAM ROWTIME,
     driver_no,
     AVG(speed) OVER (PARTITION BY driver_no RANGE INTERVAL '10' MINUTE PRECEDING)
      AS "avg_speed_10_min"
 FROM buses_stream;

Calculating Windows

s-Server calculates windows in the following order:

  1. A PARTITION BY clause divides the input rows into partitions, similar to GROUP BY but without actually combining rows with the same key.
  2. An ORDER BY clause specifies the ordering within each partition.
  3. A WINDOW clause defines the window frame within the current partition.

s-Server now features offset and hopping windows.

Syntax for Window Statement

<window-clause> ::=
WINDOW <window_name> AS (
   [ <partition-clause> ]
   [ ORDER BY <MonotonicExpression> ]
   <window-frame> )
   <window-frame> ::=  <range-definition> | <rows-definition> )
   <range-definition> ::=  RANGE [ BETWEEN ] INTERVAL 'x' <timeunit> PRECEDING -- lower bound
​                                 [ AND INTERVAL 'y' <timeunit> PRECEDING ] -- upper bound
​   <rows-definition>  ::=  ROWS  [ BETWEEN ] number PRECEDING | UNBOUNDED PRECEDING | CURRENT ROW -- upper bound
                                 [ AND number PRECEDING | CURRENT ROW ] -- lower bound
   <partition-clause> ::= PARTTION BY [<partition-list>] [SESSION ON <session-key-list> [START WHEN <session-open-expr>]
   [END WHEN <session-close-expr>] [TIMEOUT AFTER <idle-timeout-expr>]]
   <partition-list> ::= <value-expression> {, <value-expression> }...
   <session-key-list> ::= <value-expression> {, <value-expression> }...
   <idle-timeout-expr> ::= <interval-expression>
   <session-open-expr> ::= <boolean-expression>
   <session-close-expr> ::= <boolean-expression>

Note: Including <partition_list> after PARTITION BY is mandatory except when using SESSION ON.

Intervals for RANGE

For RANGE, you can use any valid interval including complex intervals such as

INTERVAL ‘2:30’ MINUTE TO SECOND
INTERVAL ‘2:45’ HOUR TO MINUTE

s-Server does not support MONTH or YEAR as an interval. To express longer intervals, you can use intervals of days or hours:

INTERVAL ‘1000’ DAY(4)

DAY(4) because by default the precision is (2).

For more information on intervals, see Interval Expressions in the SQLstream SQL Reference Guide.

Examples:

WINDOW w AS (ORDER BY FLOOR(s.ROWTIME TO HOUR)
​    RANGE BETWEEN INTERVAL '10' HOUR PRECEDING
​          AND INTERVAL '1' HOUR PRECEDING)

WINDOW "aWindow" AS (PARTITION BY "device_key" RANGE INTERVAL '30' MINUTE PRECEDING)

Window Clause and Endpoints

CURRENT ROW is the default upper bound of a window frame in the WINDOW clause. Unless otherwise specified, all windows include CURRENT ROW. As of version 6.0, s-Server supports offset windows, where the upper bound of a window frame can precede CURRENT ROW.

Windows can have, then, two upper bounds:

CURRENT ROW - The upper bound of the window has an offset 0. This is the default upper bound. PRECEDING - The upper bound of the window has a negative offset relative to the current row.

Note: s-Server does not support FOLLOWING.

Streaming SQL follows the SQL Standard for windows over a range. This means, for example that the syntax

WINDOW HOUR AS (RANGE INTERVAL '1' HOUR PRECEDING)

will include the end points of the hour.

To ensure that the endpoint of the previous hour is not included, you need to use the following syntax for the window:

WINDOW HOUR AS (RANGE INTERVAL '59:59.999' MINUTE TO SECOND(3) PRECEDING);

See Allowed and Disallowed Window Specifications below for more details.

Window Types

Various WINDOW configurations are possible:

Example Stream

To illustrate these different window types, we provide an underlying example based on stock-market purchases. By using SQLLine to execute the queries yourself, you can see the streaming results and gain familiarity with the WINDOW clause. Instructions:

  1. Open 2 SQLLine terminals.
  2. In the first terminal, execute queries to create the data structure: SCHEMA, STREAM, and VIEW. (See below to copy and paste.)
  3. In the first terminal, define the WINDOW of interest through a CREATE VIEW query. (Various examples are provided in the subsections below.)
  4. In the first terminal, execute another query that acts on that VIEW — for example, SELECT * FROM <view_name>.
  5. In the second terminal, paste an INSERT query to load simulated values into the stream. (See below to copy and paste.)
  6. Results will appear in the first terminal.

Structure for Example Stream

Copy the following code into SQLLine to create the basic structure for our later WINDOW examples:

CREATE OR REPLACE SCHEMA stocks;
SET SCHEMA 'stocks';

CREATE OR REPLACE STREAM ticker (
   order_time TIMESTAMP, --Time order was reported.
   amount INTEGER, --Amount of order.
   ticker VARCHAR(4096) --Stock ticker (identifier).
 )
DESCRIPTION 'native stream ticker';

CREATE OR REPLACE VIEW ticker_rt
DESCRIPTION 'ticker with rowtime' AS
SELECT STREAM
    order_time as ROWTIME,
    amount,
    ticker
FROM ticker;

The first block of SQL above creates a schema called stocks. The second block creates a simple native stream called ticker, which contains stock purchase information: what, when, and how much. The third block promotes the column order_time to ROWTIME. Thus, the actual timestamps for each stock purchase order will be used for ROWTIME instead of the internal stream clock, which reflects the moment data is received.

Values for Example Stream

Now that we have created the structure for our STREAM and VIEW, we need to supply values for simulated stock purchases. The following INSERT command loads values into the ticker_rt stream created above. To see this example in action, open a second SQLline terminal and paste the following code. (See instructions above.)

INSERT INTO stocks.ticker (order_time, ticker, amount)
VALUES
(CAST('2019-03-30 03:02:00.000' as TIMESTAMP), 'ORCL', 20),
(CAST('2019-03-30 03:02:10.000' as TIMESTAMP), 'ORCL', 20),
(CAST('2019-03-30 03:03:00.000' as TIMESTAMP), 'IBM', 30),
(CAST('2019-03-30 03:04:00.000' as TIMESTAMP), 'ORCL', 15),
(CAST('2019-03-30 03:04:30.000' as TIMESTAMP), 'IBM', 40),
(CAST('2019-03-30 03:04:45.000' as TIMESTAMP), 'IBM', 10),
(CAST('2019-03-30 03:05:00.000' as TIMESTAMP), 'MSFT', 15),
(CAST('2019-03-30 03:05:30.000' as TIMESTAMP), 'MSFT', 55),
(CAST('2019-03-30 03:59:45.000' as TIMESTAMP), 'IBM', 20),
(CAST('2019-03-30 04:02:00.000' as TIMESTAMP), 'GOOGL', 100),
(CAST('2019-03-30 04:04:00.000' as TIMESTAMP), 'GOOGL', 100),
(CAST('2019-03-30 04:06:00.000' as TIMESTAMP), 'ORCL', 5),
(CAST('2019-03-30 04:08:00.000' as TIMESTAMP), 'IBM', 15),
(CAST('2019-03-30 04:18:00.000' as TIMESTAMP), 'IBM', 40),
(CAST('2019-03-30 04:18:00.000' as TIMESTAMP), 'GOOGL', 100),
(CAST('2019-03-30 04:18:00.000' as TIMESTAMP), 'GOOGL', 100),
(CAST('2019-03-30 04:18:00.000' as TIMESTAMP), 'IBM', 15),
(CAST('2019-03-30 04:43:00.000' as TIMESTAMP), 'IBM', 60),
(CAST('2019-03-30 04:44:00.000' as TIMESTAMP), 'ORCL', 1000),
(CAST('2019-03-30 05:46:00.000' as TIMESTAMP), 'ORCL', 3000)
;

Time-Based Sliding Windows

Sliding windows are the simplest type of windows. Like all windows, sliding windows emit one row for every row that enters the query (through a simple INSERT, a pump or an external source accessed by a foreign stream.

Time-based sliding windows identify groups of rows based on a specified time period in relation to the current row.

You define time-based windows in terms of 1) a time when the window begins. 2) a time when the window ends. 3) whether or not the window should “hop”. Hopping windows accumulate data in a specified time period (1 minute, 5 minutes, 1 hour) and then reset once the time period has elapsed.

The window frame of a sliding window moves along one row at a time. Rows exit the window as they move out of the defined frame. If you define a window as beginning five minutes ago, rows move out of the window once their timestamp is greater than 5 minutes past the current row. If you define a window as 200 rows, rows move out of the window once the 201st row enters the window.

The view below defines a window called 5min. The start frame bound is the current row’s rowtime minus 5 minutes. The end frame bound is the current row’s rowtime. The calculation AVG(amount) will apply to only those rows whose rowtimes are between (current rowtime minus 5 minutes) and (current rowtime). The row is calculated once once the stream has flowed forward past the current rowtime.

Every row emitted will have a slightly different value for AVG(amount) and MAX(amount), since the window is continually moving forward.

The code below creates a view with a simple time-based window with a window frame of 5 minutes. Calculations for AVG and MAX include the all rows that fall within a 5 minute window frame from the current row.

Try the following VIEW yourself, observing its output in SQLLine. Instructions are given above.

CREATE OR REPLACE VIEW sliding
DESCRIPTION 'Avg and max order over 5 minutes preceding' AS
SELECT STREAM ROWTIME,
      s.ROWTIME as "rowtime",
      amount as "order",
      AVG(amount) OVER "5min"       AS "avg_order",
      MAX(amount) OVER "5min"       AS "max_order",
ticker as "ticker"
FROM ticker_rt AS s
WINDOW "5min"   AS (
        RANGE INTERVAL '5' MINUTE PRECEDING);

The illustration below shows the results of a query run on the view defined above. In the illustration below, all rows from ‘2019-03-30 03:02:00.0’ to ‘2019-03-30 03:05:30.0’ are part of the same window. At ‘2019-03-30 03:59:45.0’, the window has slid to a 5 minute window frame that stretches back to ‘2019-03-30 03:54:45.0’.

Row-Based Sliding Windows

Row-based sliding windows identify groups of rows based on N number of rows in relation to the current row.

You define row-based windows in terms of 1) when the window begins, either unbounded (all rows received up until the current row) or a specified number of rows (such as 200 rows). 2) when the window ends, either the current row or N rows preceding (such as “until 10 rows ago).

Row-based windows are defined as the current row plus X number of rows preceding. A window such ROWS ‘3’ PRECEDING will actually contain 4 rows: 3 preceding rows plus the current row.

Because of how they’re defined, row-based windows can be slightly tricky to work with at the start of a stream. That’s because s-Server doesn’t require that the exact number of rows are present: if you specify ROWS ‘3’ PRECEDING and the stream has only emitted one row, s-Server runs calculations anyway. This means that at the start of a stream, a row-based sliding window window such as ROWS ‘3’ PRECEDING window begins with a single row and then grows up to the eventual window size. It first includes row 1 of a stream, then rows 1 and 2, then 1-2-3, and finally 1-2-3-4 before sliding to 2-3-4-5, 3-4-5-6, and so on. In some cases, you may want to disregard those initial smaller windows by filtering out rows by using a statement along the lines of WHERE COUNT(*) OVER theWindow = 4 (in the case of a 4 row window).

The code below creates a simple row-based window of 5 rows. Calculations for AVG and MAX include the current row and the previous 5 rows.

Try the following VIEW yourself, observing its output in SQLLine. Instructions are given above.

CREATE OR REPLACE VIEW avg_5rows
DESCRIPTION 'Avg and max order over 5 rows' AS
SELECT STREAM ROWTIME,
     s.ROWTIME as "rowtime",
     amount as "order",
     AVG(amount) OVER "5rows"       AS "avg_order",
     MAX(amount) OVER "5rows"       AS "max_order",
ticker as "ticker"
FROM ticker_rt AS s
WINDOW "5rows"   AS (
     ROWS 5 PRECEDING)
;

The illustration below shows the results of a query run on the view defined above. In the illustration below, for the first row, only that row is included in the window. The 6th row calculates values using data from rows 1, 2, 3, 4, 5, and 6. The 10th row calculates values using data from rows 5, 6, 7, 8, 9, and 10.

Partitioned Windows

Partitioned windows contain rows that meet the WINDOW criteria and whose value in the PARTITION BY column matches the value in the current row. It is also possible for multiple columns to be listed after PARTITION BY, in which case all values in those columns must match. Note: PARTITION BY is evaluated before ORDER BY and before window frames.

Try the following VIEW yourself, observing its output in SQLLine. Instructions are given above.

CREATE OR REPLACE VIEW ticker_prt
DESCRIPTION 'Avg and max order partitioned by ticker' AS
SELECT STREAM ROWTIME,
     s.ROWTIME as "rowtime",
     amount as "order",
     AVG(amount) OVER "partition"       AS "avg_order",
     MAX(amount) OVER "partition"       AS "max_order",
     ticker as "ticker"
FROM ticker_rt AS s
WINDOW "partition" AS
    (PARTITION BY ticker
    RANGE INTERVAL '5' HOUR PRECEDING)
;

The code above creates a window frame that partitions on the column ticker. Calculations for AVG and MAX include all rows that fall within the 5-hour interval immediately prior to current row and whose value for ticker matches that of the current row. The output is shown in the table below, accompanied by an explanation.

Partitioned Windows Using SESSION

Sessions are a particular kind of partitioned window whose beginning and end are determined by the input data itself. This mechanism is especially important when window duration is not constant or knowable in advance.

For example, you might want to define a session that lasts exactly as long as a mobile network connection, based on a customer ID. Then you can aggregate summary information about purchases and services used during each customer session.

Sessions can make use of START WHEN, END WHEN, TIMEOUT AFTER, or any combination of these rules:

  • You can set Boolean conditions for when the session starts (START WHEN) and ends (END WHEN). (See below.)
  • You can define expiration based on inactivity with TIMEOUT AFTER <interval_expression>. (See below.)

Within s-Server, sessions are defined by session keys. Behavior is as follows:

  • If a row arrives without any session key, then a new session key is returned.
  • Whenever the START WHEN clause evaluates to true, a new session key is returned.
  • Whenever the END WHEN clause evaluates to true, the next call with the previous session key will return a new session key.
  • Whenever the interval defined by TIMEOUT AFTER has elapsed, a new session key will be returned.

In other words, a session can begin even without the START WHEN condition being met. In fact, often there will be no START WHEN condition specified at all. When the first row arrives with a given value for the SESSION ON column, a session begins.

If a row arrives that meets the END WHEN condition, this row belongs to the pre-existing session, assuming one exists. A new session begins only after this row.

Both END WHEN and TIMEOUT AFTER will end a session. But, somewhat counterintuitively, so will START WHEN. Suppose there is an active session, and a new row arrives that meets the START WHEN condition. Immediately, the pre-existing session is terminated; and the new row is assigned a new session key.

Syntax Options

Ordinarily the PARTITION BY subclause must be followed by a value expression (1 or more columns). However, when using SESSION ON, this expression may either be included or omitted. In other words, PARTITION BY can be followed immediately by SESSION ON, as follows:

PARTITION BY
SESSION ON trader
START WHEN (s.action='START')
END WHEN (s.action='END')

That is equivalent to naming the same column(s) after both PARTITION BY and SESSION ON:

PARTITION BY trader
SESSION ON trader
START WHEN (s.action='START')
END WHEN (s.action='END')

It is also possible to designate separate columns for PARTITION BY and for SESSION ON. Notice that ticker is not trader:

PARTITION BY ticker
SESSION ON trader
START WHEN (s.action='START')
END WHEN (s.action='END')

Note: In the case above, some limitations apply — but only if including the session in a JOIN.

SESSION() Window Function

Recall the first specimen above:

PARTITION BY
SESSION ON trader
START WHEN (s.action='START')
END WHEN (s.action='END')

This could also be written with the window function, SESSION(), rather than SESSION ON, producing equivalent results:

PARTITION BY
SESSION( trader
START WHEN (s.action='START')
END WHEN (s.action='END')
)

Constructions using the SESSION() function, as opposed to SESSION ON, are common for sessionized GROUP BY. But this is also supported within the WINDOW clause. See the section on the GROUP BY clause for more examples.

You can include SESSION() within the SELECT list to retrieve the session key associated with a window. (See the examples below.)

Session Defined with START WHEN / END WHEN

In the simple example below, a session for the column trader begins when the column action equals ‘START’ and ends when action equals ‘END’.

CREATE OR REPLACE VIEW stocks.start_end_session
DESCRIPTION 'test session' AS
SELECT STREAM
     AVG(order) OVER "awindow" AS "avg_order",
     SESSION() OVER "awindow" AS "session_id",
     ticker AS "ticker",
     action AS "action",
     trader AS "trader"
FROM ticker_rt AS s
WINDOW "awindow" AS
    (PARTITION BY ticker
    --code for SESSION begins here. SESSION ON must be used with PARTITION BY
    SESSION ON trader
    --uses simple flags for START WHEN and END WHEN
    START WHEN (s.action='START')
    END WHEN (s.action='END')
    RANGE INTERVAL '24' HOUR PRECEDING)
;

Session Defined with TIMEOUT AFTER

Compare the code above with the following modified version. Because no START WHEN logic is defined in the example below, a SESSION ON the column trader begins at the very start of the stream. Or, to be more precise, the first row that arrives for a given value of trader will mark the start of a session. If 2 hours elapse without new rows arriving for a given value of trader, then its current session ends; and a new session begins.

CREATE OR REPLACE VIEW stocks.timeout_session
DESCRIPTION 'test session' AS
SELECT STREAM
     ROWTIME,
     s.ROWTIME AS "rowtime",
     MAX(amount) OVER "awindow" AS "max_order",
     AVG(amount) OVER "awindow" AS "avg_order",
     SESSION() OVER "awindow" AS "session_id",
     ticker AS "ticker",
     trader AS "trader"
FROM ticker_rt AS s
WINDOW "awindow" AS
    (PARTITION BY ticker
    SESSION ON trader
    --indicates that session times out after 2 hours.
    TIMEOUT AFTER interval '2' HOUR
    RANGE INTERVAL '24' HOUR PRECEDING)
;

In this example, based on 2-hour timeout, if 3 rows arrive at time 04:00:00.000, 05:00:00.000, and 08:00:00.000, then the session on trader ends once ROWTIME reaches 07:00:00.001. As a result, the session for trader starts anew; and the value for max_order will be evaluated based on rows arriving at 2019-03-30 07:00:00.001 or after. Hence the value of max_order is seen to decrease at time 08:00:03 below, due to crossing a session boundary:

'rowtime','max_order','ticker','trader'
'2019-03-30 04:00:00.000','1000','IBM','trader2'
'2019-03-30 05:00:00.000','1000','IBM','trader2'
'2019-03-30 08:00:03.000','200','IBM','trader2'

Offset Windows

By default, windows have an upper bound of the current row. For some calculations, you will want to exclude the current row by using an offset window. Otherwise, a given calculation may miss additional rows that come in with the same ROWTIME as the current row, since s-Server begins emitting output rows as soon as the current row hits its system. Using offset windows ensures that you have captured all rows with the same ROWTIIME in a calculation.

However, offset windows introduce some latency. In cases where it is not important to calculate rows with the same rowtime, you can use the default upper bound.

Offset windows enable the exclusion of the “current” row (or the current time interval as expressed in the order by clause) from computation of aggregates such as AVG, VAR_POP, or STDDEV. This makes statistical tools like Bollinger bands or the naive bayes classifier, statistically valid.

In the code below, calculations for AVG and MAX include all rows that fall within a window frame that begins 5 minutes ago from the current row and ends 1 minute ago from the current row.

Try the following VIEW yourself, observing its output in SQLLine. Instructions are given above.

CREATE OR REPLACE VIEW offset
DESCRIPTION 'Avg and max order over 5 rows' AS
SELECT STREAM ROWTIME,
     s.ROWTIME AS "rowtime",
     amount AS "order",
     AVG(amount) OVER "1min_ago" AS "avg_order",
     MAX(amount) OVER "1min_ago" AS "max_order",
ticker AS "ticker"
FROM ticker_rt AS s
WINDOW "1min_ago"   AS (PARTITION BY id
       RANGE BETWEEN INTERVAL '5' MINUTE PRECEDING
       AND INTERVAL '1' MINUTE PRECEDING)
;

The illustration below shows the results of a query run on the view defined above. In the illustration, values for the row with a rowtime of ‘2019-03-30 03:05:30.0’ include rows from ‘2019-03-30 03:00:30.0’ to ‘2019-03-30 03:04:30.0’, but do not include the current row. If no data is available for the offset window, as in the first row, values return null.

Hopping Windows

You can use ORDER BY in the window clause to create a hopping window. Hopping windows slide forward by an set interval of time, such as a minute, 10 minutes, an hour, or 2 hours. To achieve this effect, you use a function that rounds down ROWTIME, either FLOOR() or STEP().

For example, using ORDER BY with ORDER BY FLOOR(s.ROWTIME TO HOUR, you specify that rows will be calculated in hourly chunks.

This window will slide forward at the top of every hour, that is, “hopping” the window forward by 1 hour every hour. Rows ROWTIMEs in the 3:00 hour will accumulate in the window until a row arrives with a ROWTIME of 4:00 or later. At that point, the window starts over, and the first calculation will include only one row, the next two rows, and so on, until a row arrives with a ROWTIME of 5:00 or later. When you use a hopping window with COUNT, for example, counts will go up for the entire hour, then start at zero at the beginning of the new hour. Data from the previous hour is “dropped” from the aggregation.

Hopping windows are particularly useful for aggregations like AVG, VARIANCE, STDDEV on large time windows (such as hours, days, or weeks) on high datarate streams.

Hopping windows have significant benefits.

  • They reduce the amount of system resources required to maintain the window, since the window only needs to maintain aggregated history for each time interval (1 hour in the example above).
  • When used with an offset, such as RANGE BETWEEN INTERVAL ‘60’ MINUTE PRECEDING AND INTERVAL ‘1’ MINUTE PRECEDING, the results for each input row are computed on a window that excludes data from the current row. This is highly desirable since any anomaly in streaming data does not adversely impact computation of Bollinger bands or Naive Bayes classifiers. We recommend using hopping windows with an offset. The offset makes both ends of a window frame hop forward in a 1 minute interval. That is, as each row arrives for the current minute interval, it does not contribute to the aggregation results for that row since its not in the window frame for the current row.
  • When used without an offset, results for hopping windows are computed for each input row with zero latency.

The code below creates a hopping window that resets at the top of the hour. It does so by grouping all rows by hour, using ORDER BY FLOOR(s.ROWTIME TO HOUR). This means “convert rowtimes to the lower bound of the hour”. That is, a rowtime ‘2019-03-30 04:18:00.0’ is converted to ‘2019-03-30 04:00:00.0’. Window calculations include all rows that match the converted rowtime: all rows that evaluate to ‘2019-03-30 04:00:00.0’.

Try the following VIEW yourself, observing its output in SQLLine. Instructions are given above.

CREATE OR REPLACE VIEW hopping
DESCRIPTION 'Avg and max order over hopping window' AS
SELECT STREAM
    s.ROWTIME AS "rowtime",
    amount as "order",
    AVG(amount) OVER "hopping" AS "avg_order",
    MAX(amount) OVER "hopping" AS "max_order",
    ticker AS "ticker"
FROM ticker_rt AS s
WINDOW "hopping"   AS (
    ORDER BY FLOOR(s.ROWTIME TO HOUR)
    RANGE INTERVAL '1' HOUR PRECEDING)
;

The illustration below shows the results of a query run on the view defined above. In the illustration, calculations for the first 9 rows include rows from ‘2019-03-30 03:02:00.0’ to ‘2019-03-30 03:59:45.0’. Once a row arrives later than ‘2019-03-30 03:59:59.999’, calculations reset to newly arriving rows. In the last 5 rows, calculations include rows from ‘2019-03-30 04:02:00.0’ to ‘2019-03-30 04:18:00.0’.

Zero-Width Windows

At times, you may want to return calculations for the current row only, excluding other rows with the same rowtime. You can do so by creating zero-width windows. The row-based window lastZeroRows includes just the current row, and therefore always contains precisely one row. Note that ROWS CURRENT ROW is equivalent to ROWS 0 PRECEDING.

The time-based window lastZeroSeconds contains all rows with the same timestamp, of which there may be several. Note that RANGE CURRENT ROW is equivalent to RANGE INTERVAL ‘0’ SECOND PRECEDING.

Try the following VIEW yourself, observing its output in SQLLine. Instructions are given above.

CREATE OR REPLACE VIEW zero_width
DESCRIPTION 'Avg and max order over zero width window' AS
SELECT STREAM ROWTIME,
     s.ROWTIME as "rowtime",
     amount as "order",
     AVG(amount) OVER lastZeroRows       AS "avg_order",
     MAX(amount) OVER lastZeroSeconds       AS "max_order",
ticker as "ticker"
FROM ticker_rt AS s
WINDOW
     lastZeroRows AS (ROWS CURRENT ROW),
     lastZeroSeconds AS (RANGE CURRENT ROW)
;

In the illustration below, all rows contain calculations for their data only.

Allowed and Disallowed Window Specifications

SQLstream s-Server supports nearly all windows that end with the current row or a specified interval preceding the current row.

You cannot define an infinite window, a negative-sized window, or use negative integers in the window specification.

Infinite windows are windows with no bounds. Typically these point into the future, which for streams is infinite. For example “ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING” is not supported, because in a streaming context such a query would not produce a result, since streams are continually expanding as new data arrives. All uses of UNBOUNDED FOLLOWING are unsupported.

Negative windows. For example, “ROWS BETWEEN 0 PRECEDING AND 4 PRECEDING” is a window of negative size and is therefore illegal. Instead, you would use: “ROWS BETWEEN 4 PRECEDING AND 0 PRECEDING” in this case.

Offset windows that end with FOLLOWING are supported only for tables. For example, “ROWS BETWEEN UNBOUNDED PRECEDING AND 4 FOLLOWING” is not supported. (Window spans CURRENT ROW rather than starting or ending there.)

Windows defined with negative integers. For example, “ROWS BETWEEN -4 PRECEDING AND CURRENT ROW” is invalid because negative integers are disallowed.

Also, the special case of … 0 PRECEDING (and … 0 FOLLOWING) cannot be used for windowed aggregation; instead, the synonym CURRENT ROW can be used.

For windowed aggregation, partitioned windows are allowed, but ORDER BY must not be present.

For windowed join, partitioned windows are NOT allowed, but ORDER BY can be present if it sorts by the ROWTIME column of one of the inputs.