Using Telemetry to Monitor Performance

Telemetry provides information on the structure, status, and progress of data flowing in s-Server, such as the net input and output bytes for a particular stream, when a stream started processing data, and when a stream stopped processing data. Telemetry allows you to see the overall structure of your data pipelines, how your data is flowing and where it is actually transformed, and the progress and throughput of data moved through pipelines, so that, for example, you can identify performance-slowing bottlenecks. The management views SESSIONS_VIEW and STATEMENTS_VIEW may both be useful in identifying graphs and nodes related to your streaming application.

This topic contains the following information:

Background Concepts: Stream-graphs, Nodes, and s-Server

To understand telemetry and related concepts, it helps to have a clear picture of how s-Server computes the results of a query.

s-Server consist of two layers:

  1. A query engine that maintains a collection of SQL definitions and translates a SQL statement into a specified data-flow.
  2. A data engine that executes and coordinates these data-flows.

The specification of a data-flow is called a physical query plan. The actual data flow is called a stream graph, which is a chain of stream nodes, each of which is a small software machine that applies some relational operation to its input data stream(s) to produce is output data stream(s). Stream graphs are also referred to as stream operators. Nexus nodes are defined by a CREATE STREAM statement. They are freestanding nodes to which other stream graphs can connect and disconnect, in order to receive or deliver data.

Nodes Stream Graph

In the most common case, a node has 1 input and 1 output, and so the graph is a linear chain of operations. But data flows can merge and split, and the graph of a new query can connect to the graphs of existing queries as they run.

Reading Telemetry Data

Telemetry data is available through system functions that you can query using SQL. Functions are available as both tables and streams. Tables provide a single snapshot of the server, while streams return continually-updating information.

Category Table Function Stream Function
Stream Graph Virtual Table/Stream. Contains a row to describe each stream-graph. Stream-graphs represent one SQL statement running in s-Server, usually either a SELECT, INSERT, or a running PUMP. getStreamGraphInfo(session integer, recent float) getStreamGraphInfoForever(session integer, period float)
Stream Operator (Node) Virtual Table/Stream. Contains a row to describe each stream-node. Stream-nodes represent one step in the running implementation of a SQL statement. getStreamOperatorInfo(session integer, recent float) getStreamOperatorInfoForever(session integer, period float)
Server Info Virtual Table/Stream. Contains a single row, which describes s-Server as a whole. Each column is some attribute (status or statistic) of the overall server. getServerInfo(recent float) getServerInfoForever(period float)

These functions take the following arguments:

Argument Definition
SESSION getStreamGraphInfo and getStreamOperatorInfo only. Sessions refer to a single connection to s-Server. You can designate a session ID, or use 0 to mean all sessions. To determine session, you can query sys_boot.mgmt.sessions_view: select session_name, id from sys_boot.mgmt.sessions_view (or use the SQL script at $SQLSTREAM_HOME/support/sql/showSessions.sql). This query will return results along the following lines:

Session_Name: LOCALDB.StreamLab_Output_Buses.pipeline_2_step_3-to-pipeline_2_out_sink_1-Pump ID: 7338* You should be able to identify your desired session by session_name. Use the corresponding ID to get information on this session. See the topic s-Server Features for Managing and Monitoring for more details on sys_boot.mgmt.sessions_view.
RECENT In seconds, how recent a snapshot is required. 0 will force a new snapshot, 10 means use a extant snapshot no more than 10 secs old. You can use RECENT to get a pair of results that match the same snapshot. For more detail, see How RECENT Works below.Note: It's a good idea to add 0.5 sec to RECENT for the second query. This allows room for the 2nd query to be processed slightly later than the 1st.
PERIOD In seconds, how often to fetch and emit new data.

You query these using a statement along the following lines:

SELECT NAME_IN_QUERY_PLAN, NODE_ID, NET_INPUT_ROWS, NET_INPUT_BYTES, NET_OUTPUT_ROWS, NET_OUTPUT_BYTES FROM TABLE (sys_boot.mgmt.getStreamOperatorInfo(0, 10));

Note: Columns do not require quotation marks.

Note that the above query uses '0' for SESSION. This tells s-Server "give me information on all sessions". This query returns a result along the following lines:

NAME_IN_QUERY_PLAN NODE_ID NET_INPUT_ROWS NET_INPUT_BYTES NET_OUTPUT_ROWS NET_OUTPUT_BYTES
[LOCALDB, StreamLab_Output_Meetup, dashboard_map_step_2] 325006 36684 733688 25216 504312
[LOCALDB, StreamLab_Output_Meetup, dashboard_map_step_2] 325006 40779 815584 29310 586208
[LOCALDB, StreamLab_Output_Meetup, us_rsvps] 325008 41870 837408 113269 2265384
[LOCALDB, StreamLab_Output_Meetup, dashboard_bar_CHARt_step_2] 325010 42614 852288 0 0
[LOCALDB, StreamLab_Output_Meetup, us_rsvps] 325008 46490 929792 127127 2542536
[LOCALDB, StreamLab_Output_Meetup, dashboard_bar_CHARt_step_2] 325010 47316 946328 0 0

To see all the info about all the statements that belong to session 7338, run the query:

SELECT * FROM TABLE(SYS_BOOT.MGMT.getStreamGraphInfo(7338, 0));

As of version 6.0, you no longer need to use the preface SYS_BOOT.MGMT.

Filtering Data

You can restrict your results using a SELECT list or WHERE clauses, and you can compute expressions based on the telemetry data. That is, you can use SQL to limit or filter telemetry data. The management views SESSIONS_VIEW and STATEMENTS_VIEW may both be useful in identifying graphs and nodes related to your streaming application.

All streams created by users have a NAME_IN_QUERY_PLAN that corresponds to the fully qualified sql name of the stream, which has the format [LOCALDB, , ].

Therefore, you could filter for only named streams by running the following query:

SELECT STREAM * FROM STREAM(sys_boot.mgmt.getStreamOperatorInfoForever(0, 10)) WHERE NAME_IN_QUERY_PLAN LIKE '%LOCALDB%';

To filter only streams from a particular schema, you could run a query along the following lines:

SELECT SELECT STREAM * FROM STREAM(sys_boot.mgmt.getStreamOperatorInfoForever(0, 10)) WHERE NAME_IN_QUERY_PLAN LIKE '%LOCALDB, my-schema%';

You can also query individual columns, as in

SELECT STREAM NAME_IN_QUERY_PLAN, NODE_ID, NET_INPUT_ROWS, NET_INPUT_BYTES,  NET_OUTPUT_ROWS, NET_OUTPUT_BYTES FROM STREAM(sys_boot.mgmt.getStreamOperatorInfoForever(0, 10));

A full list of columns for each virtual table/stream appears at the end of this page.

As of version 6.0, you no longer need to use the preface SYS_BOOT.MGMT.

Moving Telemetry Data into a Native Stream

You can use a pump to move telemetry data into a stream that you've created. You first need to set up a stream that will accept columns, in order, from the telemetry table/stream that you want to use. The example below uses columns from getStreamOperatorInfoForever().

CREATE OR REPLACE SCHEMA "telemetry-results";
SET schema '"telemetry-results"';

CREATE OR REPLACE STREAM "telemetry-results-stream"

(NODE_ID VARCHAR(32),
LAST_EXEC_RESULT VARCHAR(32),
NET_INPUT_ROWS BIGINT,
INPUT_ROWTIME_CLOCK TIMESTAMP,
NET_INPUT_BYTES BIGINT,
NAME_IN_QUERY_PLAN VARCHAR(32),
NET_OUTPUT_ROWS BIGINT,
OUTPUT_ROWTIME_CLOCK TIMESTAMP,
NET_OUTPUT_BYTES BIGINT
)
;

CREATE OR REPLACE PUMP "telemetry-results"."my-Pump" STOPPED AS
INSERT INTO "telemetry-results"."telemetry-results-stream"
(NODE_ID, LAST_EXEC_RESULT, NET_INPUT_ROWS, INPUT_ROWTIME_CLOCK, NET_INPUT_BYTES, NAME_IN_QUERY_PLAN, NET_OUTPUT_ROWS, OUTPUT_ROWTIME_CLOCK, NET_OUTPUT_BYTES)
SELECT STREAM NODE_ID, LAST_EXEC_RESULT, NET_INPUT_ROWS, INPUT_ROWTIME_CLOCK, NET_INPUT_BYTES, NAME_IN_QUERY_PLAN, NET_OUTPUT_ROWS, OUTPUT_ROWTIME_CLOCK, NET_OUTPUT_BYTES FROM STREAM (sys_boot.mgmt.getStreamOperatorInfoForever(0, 1));
--selects columns from telemetry Stream Operator stream

By default, pumps are created as stopped. That means when you initially create them, they don't move any data.

To start the pump, enter the following line:

ALTER PUMP "telemetry-results"."my-Pump" START;

To query the new stream, enter the following line:

SELECT STREAM * from "telemetry-results"."telemetry-results-stream";

How RECENT Works

Here the 2 requests session 2, and the 0 require a snapshot no older than 0 secs. That is, it forces a new snapshot.

If you asked instead:

SELECT * FROM TABLE(SYS_BOOT.MGMT.getStreamGraphInfo(2, 5));

the system would use a snapshot it already had if this snapshot was no older than 5 seconds. (If that data is good enough, it saves a little work to use the existing snapshot.)

Or consider:

SELECT * FROM TABLE(SYS_BOOT.MGMT.getStreamGraphInfo(0, 0));

Session = 0 means report on all sessions (since the lowest session ID is 1).

You can use RECENT to match queries:

SELECT * FROM TABLE(SYS_BOOT.MGMT.getServerInfo(0, 0));
SELECT * FROM TABLE(SYS_BOOT.MGMT.getStreamGraphInfo(0, 2));
SELECT * FROM TABLE(SYS_BOOT.MGMT.getStreamOperatorInfo(0, 2));

We want all 3 parts to come from the same snapshot, so that all the data is consistent. Note that the 1st query forces a new snapshot, and the following 2 queries accept stale data (up to 2 seconds state). They will thus re-use the snapshot from the 1st query.

As of version 6.0, you no longer need to use the preface SYS_BOOT.MGMT.

Table/Stream Descriptions

Stream Graph Virtual Table/Stream

(getStreamGraphInfo/getStreamGraphInfoForever)

This table/stream returns rows for each stream graph. A stream graph generally corresponds to one SQL statement (DML or query). Stream graph data offers you an overview of how a particular node in a stream graph is processing data, including information on how much data it has processed since inception (in bytes and rows), how quickly it is processing data, and how much data it has output since inception.

We recommend looking at StreamGraphInfo first. Once you have identified a stream graph with issues, you can then use StreamOperatorInfo to see how the operation is doing in more detail, likely by filtering on graph_id, which serves as a foreign key between the two tables/streams.

The set of columns fall into four groups: identification, description, status, and statistics. Identification and description are stable identifiers and CHARacteristics for the node, including numerical identifiers for the node and stream_graph, and so on. Status and statistics offer information on what the stream node is doing now, how long it has been doing it, how much data it has processed, how quickly it is doing so, and so on. The former are useful for determining information about a node, and the latter are useful for understanding how the stream node is performing.

Identification and Descriptive Columns

  • node_id is a VARCHAR. It consists of a dotted pair of integers, such as 1.0 or 2.3. The first number represents the stream graph, and the second the stream graph node. 1.0 means node #0 in stream graph #1, and 4.3 means node #3 in stream graph #4.
  • graph_id is the same as the first (graph) part of node_id. It is provided as a column to serve as a foreign key.
  • name_in_query_plan is for advanced debugging. It is a generated name that matches the logical and physical query plans as reported in the s-Server trace log. The overall structure of the data flow graph is indicated by listing the node_id of the input neighbor and the output neighbor.

Note: The s-Server query makes a unique identifier for each stream node.

Note: All errors that are logged to the trace log are also available in a system management stream called sys_boot.mgmt.all_trace. The latest (up to 1000) error messages are also available in a table called sys_boot.mgmt.saved_trace. You can use these to create visualizations of errors; they can be accessed from SQLline or anywhere that has a JDBC connection to s-Server. See the Global Error Stream for more details.

Status Fields

These fields are helpful in answering questions like "why is my pipeline stuck?" or "why won't the scheduler do what I want it to do next?" Important columns include the following:

  • sched_state. Whether node is blocked (needs more input / needs room to output), suspended, finished, or ready to run. The values of sched_state are:
    • R* running
    • R ready to run
    • N not runnable, because a neighbor is running
    • B blocked for data (underflow or overflow)
    • T suspended externally,
    • E finished (has reached end of input stream)
    • C stream graph is closed (but can be re-opened and re-executed)
    • Z stream graph was removed (SQL statement is closed).
  • last_exec_result. Indicates why the node is blocked. UND means underflow, and OVF means overflow. EOS means the node has seen the end-of-stream marker in its input; it is finished.
  • num_busy_neighbors. When the data engine puts a DOUBLE-buffer between two adjacent data nodes, they both can run at the same time. But when a single-buffer lies between two adjacent data nodes, only one can run at a time. A value num_busy_neighbors > 0 means the node is blocked because of a busy buffer.
  • input_rowtime_clock. The rowtime field of the next available input row, or else the next input rowtime bound.
  • output_rowtime_clock. The rowtime field of the latest row output by this node. The clock rowtimes show how far the data has progressed down the pipeline. This shows latency, overall and between nodes. This is important for diagnosing when a pipeline is stuck because a node is waiting for an earlier row from an empty input buffer.

Input and output are measured in bytes and bytes/sec. Input and output are also measured in rows and rows/sec when possible. Most stream operators process row by row, and so can count rows. But a few process a whole buffer at a time, and since row-length can vary, we don't pay the cost of converting, and sometimes report null for the stats in terms of rows.

These are the columns returned by the functions getStreamGraphInfo and getStreamGraphInfoForever:

Column Name Type Definition
measured_at TIMESTAMP NOT NULL Time of query snapshot.
graph_id INT NOT NULL Numerical identifier for stream graph. Numbered from 1 as graphs are added to scheduler. The same column appears in the Stream Node Virtual Table, serving as a foreign key. You can use graph_id in conjunction with the Stream Node table to learn more about individual nodes in a graph.
statement_id INT 0 if not from a SQL statement. These are listed in the monitoring view called STATEMENTS_VIEW. See s-Server Features for Managing and Monitoring for more details.
session_id INT Numerical identifier for the session, which is equates to JDBC Connection to s-Server. These are listed in the monitoring view called SESSIONS_VIEW. See s-Server Features for Managing and Monitoring for more details.
source_sql VARCHAR(2048) SQL for statement.
sched_state CHAR(2) State of the stream graph in the scheduler. See sched_state above for values.
close_mode CHAR(4) When the stream graph was closed, and how it was closed.
is_global_nexus BOOLEAN Indicates a named stream. These are defined by a CREATE STREAM statement. They are freestanding nodes to which other stream graphs can connect and disconnect, in order to receive or deliver data.
is_auto_close BOOLEAN Whether or not the stream will close automatically at state E
num_nodes INT NOT NULL Number of nodes in the stream graph. These may be shared with other stream graphs, as in the case of named streams.
num_live_nodes INT NOT NULL Number of open nodes not at state E.
num_data_buffers INT NOT NULL Number of data buffers between nodes.Note: Two adjacent nodes share a data buffer; the upstream node writes rows to the buffer, and the downstream node reads rows from the buffer.
total_execution_time DOUBLE Time spent executing this stream graph in seconds.
total_opening_time DOUBLE Time spent setting up this stream graph in seconds.Note: When a user sends a SQL query to the server, the query engine sends a query plan to the data engine layer. The data engine then "sets up" the query, by performing a number of steps, including allocating and initializing software objects that do the data operations, allocating data buffers to stand between the operators, adding to internal control & bookkeeping data structures, and connecting the graph to one or more global nexuses.
total_closing_time DOUBLE Time spent tearing down this graph in seconds.Note: When the user ends the query, the data engine stops executing the stream graph and does the set-up in reverse.
net_input_bytes BIGINT Overall input read by the stream graph since it started, in bytes.
net_input_rows BIGINT Overall input read by the stream graph since it started, in rows.
net_input_rate DOUBLE Overall input rate, averaged since the stream graph started, as bytes/sec.
net_input_row_rate DOUBLE Overall input rate, averaged since the stream graph started, as rows/sec.
net_output_bytes BIGINT Overall output written by the stream graph since it started, in bytes.
net_output_rows BIGINT Overall output written by the stream graph since it started, in rows
net_output_rate DOUBLE Overall output rate, averaged since the stream graph started, as bytes/sec.
net_output_row_rate DOUBLE Overall output rate, averaged since the stream graph started, as rows/sec.
net_memory_bytes BIGINT Bytes of working memory now for whole stream graph.
max_memory_bytes BIGINT Maximum bytes of working memory since stream graph opened.
when_opened timestamp System time when stream graph was opened.
when_started timestamp System time of first execution of stream graph.
when_finished timestamp System time of latest execution of stream graph.
when_closed timestamp System time when stream graph was closed.

Stream Operator (Node) Virtual Table/Stream

(getStreamOperatorInfo/getStreamOperatorInfoForever)

This table/stream offers you an overview of how a node (also known as a stream operator) is processing data.

Columns for these rows fall into four groups: identification, description, status, and statistics. Identification and description are stable identifiers and CHARacteristics for the stream, including numerical identifiers for the stream graph, its session number, and so on. Status and statistics offer information on what the stream graph is doing now, how long it has been doing it, how much data it has processed, how quickly it is doing so, and so on. The former are useful for determining information about a stream graph, and the latter are useful for understanding how the stream graph is performing.

Identification columns consist of graph_id and statement_id. These are both unique numerical identifiers.

Description columns include session_id, source_sql, is_global_nexus, is_auto_close, num_nodes, and num_data_buffers. These tell you what the stream looks like before any processing occurs.

Status columns include sched_state, num_live_nodes, and num_data_buffers. These columns tell you how the scheduler is handling the stream, as well as how many active nodes the stream is engaging.

Statistics columns include net_execution_time, net_schedule_time, net_input_bytes, net_input_rows, net_input_rate, net_input_row_rate, net_output_bytes, net_output_rows, net_output_rate, and net_output_row_rate.

These are the columns returned by getStreamOperatorInfo / getStreamOperatorInfoForever:

Column name Type Definition
measured_at TIMESTAMP NOT NULL Time of query snapshot.
node_id VARCHAR(8) NOT NULL Node identifier, such as 4.6 for the 6th node in the 4th stream graph
graph_id INT NOT NULL Graph identifier. This column also appears in the Stream Graphs table/stream.
source_sql VARCHAR(2048) SQL for stream graph.
query_plan VARCHAR The query plan executed by this node. See EXPLAIN PLAN examples.
name_in_query_plan VARCHAR A generated name that matches the logical and physical query plans as reported in the s-Server trace log. All streams created by users have a NAME_IN_QUERY_PLAN that corresponds to the fully qualified name of the stream, which has the format [LOCALDB, , ].
num_inputs INT NOT NULL Count of input neighbors.
input_nodes VARCHAR(64) NOT NULL node_id numerical identifier of input neighbors.
num_outputs INT NOT NULL Count of output neighbors.
output_nodes VARCHAR(64) NOT NULL node_id numerical identifier of output neighbors.
sched_state CHAR(2) State of the stream graph in the scheduler. See sched_state above for values.
last_exec_result CHAR(3) Latest code returned by ExecStream::execute() -- e.g. UND (underflow), OVF (overflow). See last_exec_result above.
num_busy_neighbors INT When positive, indicates that this node is blocked because some neighboring nodes are busy.
input_rowtime_clock timestamp ROWTIME of the latest input row read by this node. See input_rowtime_clock above.
output_rowtime_clock timestamp ROWTIME of the latest output row written by this node. See input_rowtime_clock above.
execution_count BIGINT Number of times this node has been executed by the scheduler.
started_at timestamp System time of first execution.
latest_at timestamp System time of latest execution.
net_execution_time DOUBLE Total time node has executed in seconds.
net_schedule_time DOUBLE Total overhead scheduling for operator in seconds.
net_input_bytes BIGINT Overall input read by the stream node since it started, in bytes.
net_input_rows BIGINT Overall input read by the stream node since it started, in rows.
net_input_rate DOUBLE Overall input rate, averaged since the stream node started, as bytes/sec.
net_input_row_rate DOUBLE Overall input rate, averaged since the stream node started, as rows/sec.
net_output_bytes BIGINT Overall output written by the stream node since it started, in bytes.
net_output_rows BIGINT Overall output written by the stream node since it started, in rows
net_output_rate DOUBLE Overall output rate, averaged since the stream node started, as bytes/sec.
net_output_row_rate DOUBLE Overall output rate, averaged since the stream node started, as rows/sec.
net_memory_bytes BIGINT Bytes of working memory now for this node.
max_memory_bytes BIGINT Maximum bytes of working memory since node opened.

Server Info Virtual Table/Stream

(getServerInfo/getServerInfoForever)

The Server Info Virtual Table provides a summary of the entire server. This includes information on the number of currently active sessions, the number of currently active SQL statements, the number of currently executing threads and the number of stream graphs currently open. Because it summarizes the entire server, this table (getServerInfo) has only one row; the stream (getServerInfoForever) returns one row per snapshot.

These are the columns returned by getServerInfo / getServerInfoForever:

Column Type Meaning
measured_at TIMESTAMP NOT NULL Time of snapshot.
is_running BOOLEAN NOT NULL Whether or not server is running.
is_licensed BOOLEAN NOT NULL Whether or not server has some kind of license.
license_kind VARCHAR(32) Type of license.
license_version VARCHAR(32) Version number for license.
is_throttled BOOLEAN NOT NULL Whether or not the server is currently throttled.
num_sessions INT Number of currently active sessions.
num_statements INT Number of currently active SQL statements.
started_at timestamp Time server started.
throttled_at timestamp Time throttling started. If server is not throttled, value is null.
throttle_level DOUBLE Throughput limit enforced by throttling. If server is not throttled, value is null.
num_exec_threads INT Number of currently executing threads.
num_stream_graphs_open INT Number of stream graphs currently open.
num_stream_graphs_closed INT Number of stream graphs currently closed.
num_stream_operators INT Number of stream operated.
num_stream_graphs_open_ever INT Number of stream graphs opened since server started.
num_stream_graphs_closed_ever INT Number of stream graphs closed since server started.
net_memory_bytes BIGINT Total bytes of working memory.
max_memory_bytes BIGINT Max bytes of working memory since server started.
usage_at timestamp Timestamp of last usage snapshot.
usage_since timestamp Usage data is from usage_since to usage_at.
usage_reported_at timestamp Timestamp of last successful upload of usage data.
net_input_bytes_today BIGINT Overall input read by the server since midnight, in bytes.
net_input_bytes BIGINT Overall input read by the server since it started, in bytes.
net_output_bytes BIGINT Overall output written by the server since it started, in bytes.

Common Use Cases

The example queries below mostly show examples of single queries to the finite table functions. If you want to monitor over time, just switch to using the equivalent streaming functions.

Using Telemetry to Track Memory Use

Telemetry tracks two kinds of memory:

  • Some stream operators/nodes require a significant amount of auxiliary memory. For example, windowed aggregation requires remembering data from past rows, and this is kept in a window structure that can be large. The amount of auxiliary memory can vary over time. The Stream Node Table/Stream (getStreamOperatorInfo) keeps track of the current memory used by a stream operator/node in the net_memory_bytes column. It also tracks the maximum value since the operator started in the max_memory_bytes column. Each node, represented by row in getStreamOperatorInfo has its own pair of values. You can query these columns with a statement along the following lines:

      !outputformat vertical
          
      select graph_id,net_memory_bytes,net_input_bytes,net_input_rows,
      from table(stream(getStreamOperatorInfoForever(0, 2)) 
      where sched_state <> 'Z' 
      order by net_memory_bytes desc 
      fetch first 10 rows only;
  • The other kind of memory that can be significant is that used by the data buffers that s-Server uses to pass rows from node to node. You can track this memory using the Stream Graph Table/Stream. In the Stream Graph Table, keeps track of the total of all the buffers in the graph plus all the auxiliary memory required by all the nodes in the graph in the net_memory_bytes column. It also tracks the maximum over the lifetime of the stream graph in the max_memory_bytes column. You can query these columns with a statement along the following lines:

    !outputformat vertical
        
    select graph_id,net_memory_bytes,net_input_bytes,net_input_rows,source_sql
    from table(getStreamGraphInfo(0,2)) 
    where sched_state <> 'Z' 
    order by net_memory_bytes desc 
    fetch first 10 rows only;

    Rather than tracking net_memory_bytes - the current memory use - you may want to track max_memory_bytes - the peak amount of memory ever used by this operator or graph.

Looking for Memory Leaks

A memory leak would show as the net_memory_bytes growing over time for one or more graphs. You cannot see it in a single poll of the telemetry data. Instead, you can either poll multiple times using the query above, or you can use the streaming api with a slow polling frequency, and build a pipeline which detects rising memory allocation over time.

CREATE OR REPLACE SCHEMA "telemetry";
SET SCHEMA '"telemetry"';

CREATE OR REPLACE VIEW "graph_5mins" AS
SELECT STREAM * FROM STREAM(getStreamGraphInfoForever(0,300))
WHERE sched_state <> 'Z';

-- pick up memory from 5 and 60 minutes ago
-- If you have hourly aggregations you may want to check every 1 and 3 or 4 hours

CREATE OR REPLACE VIEW "graph_memory_history" AS
SELECT STREAM graph_id, source_sql, net_memory_bytes
       , LAG(net_memory_bytes, 1, 0) OVER "60min" as memory_5mins_ago
       , LAG(net_memory_bytes, 11, 0) OVER "60min" as memory_60mins_ago
FROM "graph_5mins"
WINDOW "60min" AS (PARTITION BY graph_id RANGE INTERVAL '1' HOUR PRECEDING);

CREATE OR REPLACE VIEW "graph_memory_delta" AS
SELECT STREAM *
     , net_memory_bytes - memory_5mins_ago AS memory_growth_5mins
     , net_memory_bytes - memory_60mins_ago AS memory_growth_60mins
FROM "graph_memory_history";

-- now focus on the graphs which have rising memory
-- we could insist on short term rising as well
-- we could add more periods

CREATE OR REPLACE VIEW "graph_memory_rising" AS
SELECT STREAM *
FROM "graph_memory_delta"
WHERE memory_growth_60mins > 0;

Now monitor the final view; of course ignoring the first cycle of data:

select stream * from "telemetry"."graph_memory_rising";

Using Telemetry to Track CPU

We can track execution time (which approximates to the CPU used):

  • for graphs we track total_execution_time
  • for operators we track net_execution_time

Monitoring Throughput

You can get information on bytes or rows read by a stream graph or operator by calling getStreamGraphInfo() or getStreamOperatorInfo() for a session in your application.

To identify your application's session, we recommend looking at sys_boot.mgmt.sessions_view. You may also be able to identify your application's stream graph by looking at the column NAME_IN_QUERY_PLAN. For named streams, these correspond to the fully qualified sql name of the stream, which has the format [LOCALDB, , ].

To get information on bytes or rows read into a stream graph or operator, query the following columns:

Column Type Description
net_input_bytes BIGINT Overall input read by the stream graph since it started, in bytes.
net_input_rows BIGINT Overall input read by the stream graph since it started, in rows.
net_input_rate DOUBLE Overall input rate, averaged since the stream graph started, as bytes/sec.
net_input_row_rate DOUBLE Overall input rate, averaged since the stream graph started, as rows/sec.
net_output_bytes BIGINT Overall output written by the stream graph since it started, in bytes.
net_output_rows BIGINT Overall output written by the stream graph since it started, in rows
net_output_rate DOUBLE Overall output rate, averaged since the stream graph started, as bytes/sec.
net_output_row_rate DOUBLE Overall output rate, averaged since the stream graph started, as rows/sec.

Overall input means the total input that enters the stream graph or stream operatorfrom the outside.

At server level, s-Server does not record rows or row rates; instead it records

Column Type Description
net_input_bytes BIGINT Overall input read by the server since it started, in bytes.
net_output_bytes BIGINT Overall output written by the server since it started, in bytes.
net_input_bytes_today BIGINT Overall input read by the server since midnight, in bytes.

Monitoring Input to Source Foreign Streams

Source foreign streams are managed by an XO of type AnonymousJavaUdxRel. This XO handles any Java UDX; we can recognize the input foreign streams nodes because they have no input nodes.

The telemetry stats show zero data in (because there are no input nodes); whatever data is read by the plugin is recorded as net_output_rows, net_output_bytes etc as the data is delivered downstream.

Example: Using the StreamLab Gallery Apps - Sydney Buses and CDN, we see:

!outputformat vertical

select xo.node_id,xo.name_in_query_plan,xo.query_plan,xo.net_output_rows, xo.net_output_row_rate 
     , g.source_sql 
from table(getStreamOperatorInfo(0,2)) xo 
join table(getStreamGraphInfo(0,2)) g using (graph_id) 
where xo.name_in_query_plan like 'AnonymousJavaUdxRel%' 
and  xo.num_inputs = 0 
order by xo.net_output_rows desc 
fetch first 5 rows only;

NODE_ID              83.0
NAME_IN_QUERY_PLAN   AnonymousJavaUdxRel.#44650:21684
QUERY_PLAN           AnonymousJavaUdxRel(param0=['KAFKA10-source;137;#
NET_OUTPUT_ROWS      8700
NET_OUTPUT_ROW_RATE  765.7860186133535
SOURCE_SQL           INSERT INTO "LOCALDB"."StreamLab_Output_cdn"."source_1_ns"
(SELECT STREAM "source_1_fs"."ReportTime", "source_1_fs"."ip", "source_1_fs"."clientOS", "source_1_fs"."clientVersion", "source_1_fs"."sessionId", "source_1_fs"."contentId", "source_1_fs"."bufferingDelay", "source_1_fs"."bitRate", "source_1_fs"."cacheMisses"
FROM "LOCALDB"."StreamLab_Output_cdn"."source_1_fs" AS "source_1_fs")

NODE_ID              39.1
NAME_IN_QUERY_PLAN   AnonymousJavaUdxRel.#25825:11569
QUERY_PLAN           AnonymousJavaUdxRel(param0=['org.postgresql.Driver'], param1=['jdbc:postgresql://localhost/demo'], param2=['demo'], param3=['demodemo'], param4=['SELECT *
NET_OUTPUT_ROWS      2422
NET_OUTPUT_ROW_RATE  19708.201443532176
SOURCE_SQL           INSERT INTO "LOCALDB"."StreamLab_Output_buses"."sink_1"
(SELECT STREAM "input"."pixel", "input"."avg", "input"."avgB1_0u", "input"."avgB1_0l", "input"."avgB2_0u", "input"."avgB2_0l", "input"."speed_cat", "input"."id", "input"."reported_at", "input"."speed", "input"."driver_no", "input"."prescribed", "input"."gps_bearing", "input"."gps_lon", "input"."gps_lat", "input"."highway", "input"."name"
FROM "LOCALDB"."StreamLab_Output_buses"."pipeline_1_step_9" AS "input")

NODE_ID              41.1
NAME_IN_QUERY_PLAN   AnonymousJavaUdxRel.#27194:12159
QUERY_PLAN           AnonymousJavaUdxRel(param0=['FILE-source;46;#
NET_OUTPUT_ROWS      2340
NET_OUTPUT_ROW_RATE  10.289060740533513
SOURCE_SQL           INSERT INTO "LOCALDB"."StreamLab_Output_buses"."source_1_ns"
(SELECT STREAM "EXPR$0"."id", "EXPR$0"."reported_at", "EXPR$0"."speed", "EXPR$0"."driver_no", "EXPR$0"."prescribed", "EXPR$0"."gps", "EXPR$0"."highway"
FROM STREAM(SPECIFIC "LOCALDB"."StreamLab_Output_buses"."source_1_throttlefunc"(CURSOR ((SELECT STREAM "source_1_fs"."id", "source_1_fs"."reported_at", "source_1_fs"."speed", "source_1_fs"."driver_no", "source_1_fs"."prescribed", "source_1_fs"."gps", "source_1_fs"."highway"
FROM "LOCALDB"."StreamLab_Output_buses"."source_1_fs" AS "source_1_fs")), 100)) AS "EXPR$0")

3 rows selected (0.336 seconds)
  • The !outputformat vertical command makes it easier to read the long fields like SOURCE_SQL.
  • The fetch first 5 rows only clause ensures we only look at the top 5 inputs (by row)
  • The QUERY_PLAN column identifies the plugin type, but does not directly identify the foreign stream itself.
    • We see two ECDA plugins contain references to the source plugin type, such as AnonymousJavaUdxRel(param0=['FILE-source;46;#.
    • We see a SQL/MED JDBC plugin which contains references to the source driver and URL: AnonymousJavaUdxRel(param0=['org.postgresql.Driver'], param1=['jdbc:postgresql://localhost/demo'], ....
  • The SOURCE_SQL column identifies the SQL being run (in a running application, this is executed by a pump).
    • In simple cases, the FROM clause directly identifies the input foreign stream - for example node 41.1 includes _FROM "LOCALDB"."StreamLab_Output_buses"."source_1fs".
    • If there is a longer pipeline, you may see a view in the FROM clause - for example node 39.1 includes _FROM "LOCALDB"."StreamLab_Output_buses"."pipeline_1_step9". You need to look at the chain of views in the pipeline to discover the underlying foreign stream.

Monitoring Output to Sink Foreign Streams

Sink foreign streams are managed by an XO of type FarragoJavaUdxRel. We can recognize the output foreign streams nodes because they have no input nodes.

!outputformat vertical
         
select xo.node_id,xo.name_in_query_plan,xo.query_plan,xo.net_input_rows, xo.net_input_row_rate 
     , g.source_sql 
from table(getStreamOperatorInfo(0,2)) xo 
join table(getStreamGraphInfo(0,2)) g using (graph_id) 
where xo.name_in_query_plan like 'FarragoJavaUdxRel%' 
and  xo.num_outputs = 0 
order by xo.net_input_rows desc 
fetch first 5 rows only;


NODE_ID             35.1
NAME_IN_QUERY_PLAN  FarragoJavaUdxRel.#25324:11208
QUERY_PLAN          NetworkRel
NET_INPUT_ROWS      0
NET_INPUT_ROW_RATE  0.0
SOURCE_SQL          MERGE INTO "LOCALDB"."StreamLab_Output_buses"."external_table_out_1" "sink"
USING (SELECT STREAM ROWTIME AS "ROWTIME", CAST("pipeline_2_step_2"."id" AS VARCHAR(128)) AS "id", CAST("pipeline_2_step_2"."driver_no" AS INTEGER) AS "driver_no", CAST("pipeline_2_step_2"."speed" AS DOUBLE) AS "speed"
FROM "LOCALDB"."StreamLab_Output_buses"."pipeline_2_step_2" AS "pipeline_2_step_2") AS "input"
ON "sink"."position_time" = "input"."ROWTIME" AND "sink"."bus" = "input"."id"
WHEN MATCHED THEN UPDATE SET "driver_no" = "input"."driver_no"
, "speed" = "input"."speed"
WHEN NOT MATCHED THEN INSERT ("position_time", "bus", "driver_no", "speed") VALUES ("input"."ROWTIME", "input"."id", "input"."driver_no", "input"."speed")

NODE_ID             274.1
NAME_IN_QUERY_PLAN  FarragoJavaUdxRel.#99060:53343
QUERY_PLAN          NetworkRel
NET_INPUT_ROWS      0
NET_INPUT_ROW_RATE  0.0
SOURCE_SQL          INSERT INTO "LOCALDB"."StreamLab_Output_IoT_Weather"."sink_2_fs"
(SELECT STREAM "sink_2"."Color", "sink_2"."Wind_Size", "sink_2"."Feels_Like", "sink_2"."Wind_Chill", "sink_2"."Dew_Point", "sink_2"."Barometric_Pressure", "sink_2"."Rainrate", "sink_2"."Rainfall", "sink_2"."Humidity", "sink_2"."Wind_Direction", "sink_2"."Wind_Speed_KPH", "sink_2"."Wind_Speed_MPH", "sink_2"."Temperature_F", "sink_2"."Temperature_C", "sink_2"."readings_uuid", "sink_2"."device_key", "sink_2"."model_code", "sink_2"."latitude", "sink_2"."longitude", "sink_2"."recorded_at"
FROM "LOCALDB"."StreamLab_Output_IoT_Weather"."sink_2" AS "sink_2")

2 rows selected (0.359 seconds)
  • For the sink streams, we don't get any help identifying the sink type from the QUERY_PLAN column - use SOURCE_SQL as a guide.
  • In these examples, node 35.1 is writing to a PostgreSQL sink table, and node 274.1 is writing to a Kafka topic.

Monitoring throughput in SQLstream (native) streams

User streams are identified by having a name_in_query_plan like '[LOCALDB, , <streamname.]'.

select name_in_query_plan,net_input_rows,net_input_row_rate, net_output_rows, net_output_row_rate 
from table(getStreamOperatorInfo(0,2)) 
where name_in_query_plan like '[LOCALDB%'
order by net_input_rows desc
fetch first 10 rows only

Monitoring Throughput and Identifying Bottlenecks using DATA_RATE_RPS

The telemetry functions tell you the throughput of each stream graph, and of each stream operator within each graph.

If you want to know how each of the SQL views in a pipeline contribute to the overall pipeline's performance, you can use the SYS_BOOT.MGT.DATA_RATE_RPS stream function to read rows and rowtime bounds from each view in a pipeline, one by one in turn.

For more information see Looking for Performance Bottlenecks using DATA_RATE_RPS in the General Management Functions section of the SQL Reference Guide.

Monitoring the entire s-Server

To determine total throughput, you can use getServerInfo()..

!outputformat vertical

select * 
from table(getServerInfo(2)) 
;

MEASURED_AT                    2021-05-07 11:30:16.691
IS_RUNNING                     true
IS_LICENSED                    false
LICENSE_KIND                   
LICENSE_VERSION                
IS_THROTTLED                   false
NUM_SESSIONS                   12
NUM_STATEMENTS                 14
STARTED_AT                     2021-05-07 10:13:38.919
THROTTLED_AT                   
THROTTLE_LEVEL                 0.0
NUM_EXEC_THREADS               2
NUM_STREAM_GRAPHS_OPEN         24
NUM_STREAM_GRAPHS_CLOSED       50
NUM_STREAM_OPERATORS           84
NUM_STREAM_GRAPHS_OPEN_EVER    389
NUM_STREAM_GRAPHS_CLOSED_EVER  365
NET_MEMORY_BYTES               72332088
MAX_MEMORY_BYTES               373293056
USAGE_AT                       2021-05-07 10:13:38.917
USAGE_SINCE                    
USAGE_REPORTED_AT              
NET_INPUT_BYTES                0
NET_OUTPUT_BYTES               0
NET_INPUT_BYTES_TODAY          0

1 row selected (0.278 seconds)
  • If there are throughput problems, first check for IS_LICENSED and IS_THROTTLED.
    • This example was run against an unlicensed server.
  • For a licensed server, usage data will be reported only if the license vendor string requires it

Identifying Blockages

To identify a blockage, we are looking for an operator in OVF (overflow) state immediately followed by an operatpr in UND (underflow).

SELECT ovf.node_id as blocked_node_id, ovf.sched_state as blocked_state
     , und.node_id as blocking_node_id, und.sched_state as blocking_state
     , gund.source_sql blocking_sql
FROM (
    SELECT * from TABLE(getStreamOperatorInfo(0,1)) 
    WHERE last_exec_result in ('UND', 'YLD', 'RUN')
) und
JOIN (
    SELECT * FROM TABLE(getStreamOperatorInfo(0,1))
    WHERE last_exec_result in ('OVF', 'OVR')
    ) ovf
ON ','||ovf.output_nodes||',' LIKE '%,'||cast(und.node_id AS VARCHAR(5))||',%' 
JOIN (
    SELECT * FROM TABLE(getStreamGraphInfo(0,1)) 
     ) gund 
ON gund.graph_id = und.graph_id
;

Visualizing Telemetry as a Directed Graph

You can use a WebAgent telemetry API to present the telemetry information as a directed graph. This allows developers to view the execution plan for their applications and to drill down to see what stream operators are running, which are processing the most data or using the most resources and in particular to identify blockages. By default the API shows the graph as an SVG image – it can also be presented in other formats (including png, jpg or pdf). The telemetry data may also be provided in json format so that a set of historical snapshots can be collected and analysed off-line.

The developer can see all graphs and operators, and may also drill down on specific sessions (either listing one or more SESSION.IDs or using a pattern match on SESSION.NAME). Each running pump corresponds to an open session whose name is based on the pump name.