Pumps, Streams, and Processing Pipelines

Streaming Query Execution

By default, each stream query in SQLstream is run independently from all other stream queries. However, it is often desirable for either performance or consistency reasons to share the output of one query as input to one or more other queries. A potential complexity arises from three facts:

  1. that queries can be inserted or closed dynamically while SQLstream is processing streams,
  2. that data can arrive at varying rates, and
  3. that data can be emitted at varying rates after being processed.

In this context, ensuring that all such queries receive identical input from the time each of them becomes active requires some forethought.

Defining STREAM objects as a source for muliple subscribers

In SQLstream this goal is accomplished by defining a stream that all such queries will listen for, and then creating a pump to feed that stream. The pump is based on the source views or queries. Using the pump compensates for the variations in the timing of the data sources; using the stream that the pump feeds ensures that every query listening for that stream sees the same set of results. This procedure enables “processing pipelines”, that is, modular sequences of processing steps, where each step performs filtering, aggregation, and transformation, providing its results to downstream consumers. Each such step thus also provides a public junction where its results may be

  • inspected for debugging purposes,
  • analyzed for SLAs or regulatory compliance,
  • selected and repurposed by streams in other processing pipelines,
  • pumped into sink adapters or other streams, or
  • subscribed by JDBC client applications.

Example of using a View / Stream / Pump pipeline

This example uses the BIDS and ASKS streams already defined in the SALES schema included in the First SQLstream s-Server distributed with the SQLstream product. The view matches bids and asks by ticker, shares, and price within a sliding ten-second window by using a windowed join. Such a join is a streaming join over a time-based subset of records ordered by row timestamps.

DESCRIPTION 'match bids and asks for each ticker over a 10-second time window' AS
B.ROWTIME AS "bidTime",
A.ROWTIME AS "askTime",
B."shares" AS "bidShares",
B."price" AS "bidPrice",
A."shares" AS "askShares",
A."price" AS "askPrice"
JOIN SALES.ASKS AS A ON A."ticker" = B."ticker"
AND A."shares" = B."shares"
AND A."price" = B."price";

After defining the view, you create a related stream to receive the view results and a pump to insert those results into that stream, as follows:

CREATE STREAM "MatchBidsAndAsks"
"bidTime" TIMESTAMP,
"askTime" TIMESTAMP,
"ticker" varchar(5),
"bidShares" INTEGER,
"bidPrice" REAL,
"askShares" INTEGER,
"askPrice" REAL

CREATE PUMP "MatchBidsAndAsksPump" STARTED as
INSERT INTO "MatchBidsAndAsks" SELECT * FROM "MatchBidsAndAsksView";

At this point, any queries that SELECT from “MatchBidsAndAsks” will all see the same data stream at the same time.