Error Handling

Errors can occur while the SQLstream s-Server is evaluating queries. This topic describes the kinds of errors and the different contexts in which they occur, and discusses how the s-Server indicates and handles errors. Of particular interest are errors that occur while a long-running statement executes.

All errors are recorded in the s-Server trace log. You can also access errors through the Global Error Stream.

For a list of error messages, see Appendix B - List of Error Messages.

This topic contains information on the following:

  • Kinds of Errors
  • Error Handling
  • Error Logging
  • Using the Global Error Stream
  • Summarizing Error Messages
  • Explanation of Nodes and Graphs

Kinds of Errors

An error indicates that a requested operation has failed. Here "operation" means evaluating or preparing a SQL statement, but can also mean fetching each result from the result set of a query, or each execution of a prepared statement.

Errors can be classified by severity and by context.

  • A warning means the operation has completed correctly, but in some unexpected way.
  • An ordinary error means the operation has failed safely: a DDL or DML command has had no effect, a query has produced no results, and the system is in the state it was in before the requested operation.
  • A fatal error means that the s-Server has detected an illegal internal state, and must restart itself from its last checkpoint. This is abnormal, and should be reported to SQLstream Inc as a bug.

The error context itself has several dimensions:

  • the kind of operation that failed: a DDL statement, a DML statement, a table query, or a streaming query.
  • the stage of statement processing where the error occurs: preparation (divided into parsing, validation, and planning), or execution.
  • whether the error is detected immediately when the request is made, or later (possibly much later, for a long-running query).

Error Handling

The s-Server's SQL parser and validator detect invalid statements at prepare time. To illustrate this, consider some simple statements that act on the stream orders, defined by

CREATE STREAM orders (
   order_id INTEGER NOT NULL,
   customer_id INTEGER NOT NULL,
   item_count INTEGER,
   total_cost DECIMAL(8,2),
   comment VARCHAR(128));

Some examples of prepare time errors are:

  • a syntax error: SELECT * FORM ORDERS;
  • an invalid column: SELECT ITEMS_CT FROM ORDERS;
  • an invalid datatype in an expression: SELECT * FROM ORDERS WHERE (comment > 40);

Errors like these are detected immediately by the s-Server, which records the error in the s-Server log and signals the JDBC client driver, causing the JDBC call to fail by throwing a java.sql.SQLException. The s-Server prepares a statement in a context that knows about the JDBC client; this is feasible because preparation is quick.

Errors that occur during statement execution generally result from invalid data, and are tied to a specific row of input. Some examples are:

  • an inserted row rejected for having a null value for a not-null column.
  • an inserted row rejected for violating some other column constraint.
  • an arithmetic error (like divide by zero) evaluating an expression.
  • arithmetic overflow or underflow evaluating an expression.
  • arithmetic overflow in the accumulator for an aggregate function, such as SUM().

Handling such run-time errors can be tricky, due to the very nature of a distributed, streaming data system. With SQLstream, independent data-source clients insert data into streams, and independent data-reading clients receive result rows computed from the streaming inputs by applying (potentially complex) chains of relational operators.

Suppose an error occurs in execution as one of these operators processes an "offending row". It is easy for the operator to detect the error, but not so easy to determine whom to tell, and how. The prepare-time context, which pointed to a live JDBC client thread waiting for news, is gone. Running on a RAMP, the relational operator performs one step in a large computation: it has no idea which SQL statement has failed or which clients are affected.

The Control Node has that information, so the error message is relayed from the relational operator to the Control Node to the client.

However, this mechanism is not complete in the current s-Server release. Consequently, some kinds of execution errors are not reported directly to a client, namely:

  • an error while executing a prepared INSERT EXPEDITED to insert a row.
  • an error receiving a row from the ResultSet obtained from a SELECT STREAM.

The same problem holds with a pump: the client context which created or started the pump is gone, and the pump is executing as a "server daemon" thread.

The workaround for this defect is to check the s-Server log for the pertinent error messages or warnings. (See Using the Global Error Stream below for how to see errors from a remote client via streams.)
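For example, a monitoring client with a JDBC connection can poll the saved trace table for recent pump errors. This is a minimal sketch; the table and its columns are described under Using the Global Error Stream below:

SELECT error_time, error_level, pump_schema, pump_name, message
FROM sys_boot.mgmt.saved_trace
WHERE error_level = 'SEVERE'
  AND pump_name IS NOT NULL;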

Error Logging

The s-Server records every error and warning in the s-Server trace log, using the standard Java logging mechanism. Configuration of system tracing is explained in the topic Using and Managing Trace Files in the SQLstream Administrator Guide. Fatal errors have level FATAL, regular errors have level SEVERE, and warnings have level WARNING. In its default configuration the s-Server also logs major events -- such as the start of execution of any SQL statement -- at level INFO. Thus INFO messages in the log appearing shortly before a SEVERE message often indicate the context of an error.

Example of Error Logging

As a very simple illustration, consider the stream:

orders (order_id INTEGER NOT NULL, customer_id INTEGER NOT NULL, item_count INTEGER,
   total_cost DECIMAL(8,2), comment VARCHAR(128));

One client thread runs the query:

SELECT STREAM ROWTIME, *, (total_cost / item_count) AS avg_item_cost FROM orders;

Another thread prepares INSERT EXPEDITED INTO orders VALUES(?, ?, ?, ?, ?); and executes it, binding the following series of values:

#  ROWTIME  order_id  customer_id  item_count  total_cost  comment
1  9:00:00  1         100          2           1.98        a valid row
2  9:01:00  2         100          two         1.98        a type error
3  9:02:00  3         null         5           3.00        invalid null
4  9:03:00  4         102          0           0.00        zeros are trouble
5  9:04:00  5         101          7           0.84        a valid row

Here rows 2 and 3 are rejected when the insert is executed. Row 4 requires evaluating (0/0), a run-time error, and so is dropped from the result set. The query returns only rows 1 and 5:

ROWTIME  order_id  customer_id  item_count  total_cost  comment      avg_item_cost
9:00:00  1         100          2           1.98        a valid row  0.99
9:04:00  5         101          7           0.84        a valid row  0.12

All errors are recorded in the s-Server log. Here is an excerpt, including the INFO lines that show context but omitting many irrelevant trace messages:

Aug 10, 2008 4:20:00 AM net.sf.farrago.db.FarragoDbSession prepare
INFO: INSERT INTO sales.orders (ROWTIME, order_id, customer_id, item_count,
        total_cost, comment)
 VALUES (TIMESTAMP '1999-09-09 9:00:00', 1, 100, 2,
        1.98, 'a valid row')

Aug 10, 2008 4:20:01 AM net.sf.farrago.db.FarragoDbSession prepare
INFO: INSERT INTO sales.orders (ROWTIME, order_id, customer_id, item_count,
        total_cost, comment)
VALUES (TIMESTAMP '1999-09-09 9:01:00',  2,   100,    'two',
        1.98, 'a type error')
Aug 10, 2008 4:20:01 AM org.eigenbase.sql.validate.SqlValidatorException <init>
SEVERE: org.eigenbase.sql.validate.SqlValidatorException:
Cannot assign to target field 'ITEM_COUNT' of type INTEGER from
        source field 'EXPR$3' of type CHAR(3)
Aug 10, 2008 4:20:01 AM net.sf.farrago.db.FarragoDbSession prepare
INFO: INSERT INTO sales.orders (ROWTIME, order_id, customer_id, item_count,
        total_cost, comment)
 VALUES (TIMESTAMP '1999-09-09 9:02:00', 3, null, 5, 3.00, 'invalid null')
Aug 10, 2008 4:20:02 AM net.sf.fennel.xo.FennelCalcRel.# 340:671 <native>
WARNING: calculator error [0]:PC=15 Code=22004
Aug 10, 2008 4:20:02 AM net.sf.farrago.db.FarragoDbSession prepare
INFO: INSERT INTO sales.orders (ROWTIME, order_id, customer_id, item_count, total_cost, comment)
 VALUES (TIMESTAMP '1999-09-09 9:03:00',  4, 102,  0,  0.00, 'zeros are trouble')
Aug 10, 2008 4:20:03 AM org.eigenbase.util.EigenbaseException <init>
SEVERE: org.eigenbase.util.EigenbaseException: Overflow during calculation or cast.
INFO: INSERT INTO sales.orders (ROWTIME, order_id, customer_id, item_count, total_cost, comment)
 VALUES (TIMESTAMP '1999-09-09 9:04:00', 5, 101, 7,  0.84, 'a valid row')

Using the Global Error Stream

All errors that are logged to the trace log are also available in a system management stream called SYS_BOOT.MGMT.ALL_TRACE. The latest error messages (up to 1000 by default) are also available in a table called SYS_BOOT.MGMT.SAVED_TRACE. You can use these to create visualizations of errors; they can be accessed from sqlline or anywhere that has a JDBC connection to s-Server.

By default, SYS_BOOT.MGMT.SAVED_TRACE retains the last 1000 rows. You can increase the number of rows cached in SYS_BOOT.MGMT.SAVED_TRACE by setting a property in Trace.properties:

com.sqlstream.aspen.trace.TraceStreamHandler.history=100000

Increasing this size may be helpful, for example, if you want to analyze large amounts of log data retrospectively.
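To check how many trace rows are currently retained, you can count them directly:

SELECT COUNT(*) AS retained_rows
FROM sys_boot.mgmt.saved_trace;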

More information about the trace log can be found in the topic Using Trace Files in the Building Applications Guide. As described in that topic, you can control the level of trace messages -- error, informative, debugging -- for each component of the server by modifying AspenTrace.properties. You can also change whether or not s-Server summarizes error messages (reports a periodic summary of errors), as described below.

To query the error stream, enter something like the following in sqlline:

SELECT STREAM ROWTIME, * from SYS_BOOT.MGMT.ALL_TRACE;

This will produce a continuous stream of trace messages and errors as they occur. You can substitute specific columns for the asterisk; the available columns are listed below and are the same for both the table and the stream.

To query the error table, enter something like the following:

select * from sys_boot.mgmt.saved_trace where error_level = 'SEVERE';
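To watch only error rows, and only the columns of interest, you can narrow the streaming query. For example, this sketch uses a few of the columns described in the table below:

SELECT STREAM ROWTIME, error_level, error_name, message
FROM SYS_BOOT.MGMT.ALL_TRACE
WHERE is_error;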

Error Stream Columns

Column               Type                   Description
ROWTIME              TIMESTAMP              Timestamp of the error row itself.
error_time           TIMESTAMP              Time the error was logged. Differs from the ROWTIME of the error row itself, which is slightly later than the time the error occurred, and from the ROWTIME of the input row associated with the error (data_rowtime).
error_level          VARCHAR(10) NOT NULL   Java trace level: SEVERE, WARNING, and so on. See https://docs.oracle.com/javase/7/docs/api/java/util/logging/Level.html for details.
is_error             BOOLEAN NOT NULL       True for errors, false for other trace messages.
error_class          INTEGER                Code for the error category: 0: other error, 1: ingest error, 2: egest error, 3: syntax error, 4: validation error, 5: planner error, 6: late rowtime, 7: calculator error, 8: other execution error.
sql_state            VARCHAR(5)             An alphanumeric error code. See Appendix B - List of Error Messages for more details.
error_name           VARCHAR(32)            Explanation of sql_state. See Appendix B - List of Error Messages for more details.
message              VARCHAR(4096)          The entire trace message.
thread_id            INTEGER                Number of the execution thread where the error happened.
session_id           INTEGER                Number of the client session where the error happened.
statement_id         INTEGER                The id number of the SQL statement where the error happened.
graph_id             INTEGER                Together with node_id, identifies the execution object that had the error. See Explanation of Nodes and Graphs below.
node_id              INTEGER                Together with graph_id, identifies the execution object that had the error. See Explanation of Nodes and Graphs below.
xo_name              VARCHAR(128)           A unique name for the execution object.
error_reporter       VARCHAR(256) NOT NULL  The name of the Java logger that reported the error.
error_sql            VARCHAR(256)           The SQL text of the statement.
error_backtrace      VARCHAR(4096)          If available, the Java backtrace at the point of the error.
data_row             VARBINARY(32768)       The input row that was being processed when the error occurred.
data_rowtime         TIMESTAMP              ROWTIME of data_row.
source_position_key  VARCHAR(4096)          Identifies the location in the input source, such as a log file or Kafka topic.
pump_db              VARCHAR(128)           Database that contains the pump running the statement; null if not in a pump.
pump_schema          VARCHAR(128)           Schema that contains the pump running the statement; null if not in a pump.
pump_name            VARCHAR(128)           Name of the pump running the statement; null if not in a pump.
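For example, the following sketch counts the saved errors in each category, using the error_class codes listed above:

SELECT error_class, COUNT(*) AS error_count
FROM sys_boot.mgmt.saved_trace
WHERE is_error
GROUP BY error_class;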



Summarizing Error Messages

By default, s-Server reports a periodic summary of errors, in order to avoid bloating the trace log. However, this means that the global error stream contains only the summary counts and a few sampled bad rows. If you turn off sampling, you get all the bad rows traced and reported to the error stream.

To adjust sampling, you can change properties in the aspen.properties or aspen.custom.properties files. For more information on modifying aspen.properties, see the topic Managing Java Parameters.

Property                      Default  Description
aspen.trace.sampleLateRows    true     When set to false, s-Server reports every late row in both the trace log and the error stream. When set to true, s-Server reports the first late row and then a summary every 15 seconds.
aspen.trace.sampleFailedRows  true     When set to false, s-Server reports every row that causes a calculator error in both the trace log and the error stream. When set to true, s-Server reports the first calculator error and then a summary of calculator errors every 15 seconds.
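For example, to have s-Server report every late row and every failed row individually, rather than periodic summaries, you could set both properties to false:

aspen.trace.sampleLateRows=false
aspen.trace.sampleFailedRows=false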

Note: All messages (including errors) that are logged to the trace log are also available in the system management stream SYS_BOOT.MGMT.ALL_TRACE, and the latest log messages (by default up to 1000) in the table SYS_BOOT.MGMT.SAVED_TRACE. You can use these to create visualizations of errors; they can be accessed from sqlline or anywhere that has a JDBC connection to s-Server. See Using the Global Error Stream above for more details.

Explanation of Nodes and Graphs

s-Server consists of two layers:

  • A query engine that maintains a collection of SQL definitions and translates a SQL statement into the specification of a data flow.
  • A data engine that executes and coordinates these data flows.

The specification of a data flow is called a physical query plan. The actual data flow is called a stream graph: a graph of stream nodes, each of which is a small software machine that applies some relational operation to its input data stream(s) to produce its output data stream(s). Nexus nodes are defined by a CREATE STREAM statement. They are freestanding nodes to which other stream graphs can connect and disconnect, in order to receive or deliver data.

In the most common case, a node has 1 input and 1 output, and so the graph is a linear chain of operations. But data flows can merge and split, and the graph of a new query can connect to the graphs of existing queries as they run.
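For example, the orders stream defined earlier is a nexus node: one client's stream graph connects to it to deliver rows, while another client's graph connects to receive them. A minimal sketch:

-- Writer graph: delivers rows to the orders nexus.
INSERT INTO orders (order_id, customer_id, item_count, total_cost, comment)
VALUES (6, 103, 3, 2.97, 'another row');

-- Reader graph: connects to the same nexus to receive rows.
SELECT STREAM * FROM orders;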

Note: A node does not read data directly from its upstream neighbor. Instead, the upstream neighbor puts its output rows into a block of memory called a buffer, which it shares with the downstream neighbor. The upstream node writes rows to the buffer, and the downstream node reads rows from the buffer.

Because there are more stream nodes than CPU cores, not all the nodes can actually run simultaneously. Instead, a software component called the scheduler manages the stream nodes, running them as needed to push data down the pipeline.

The scheduler decides when to execute a node based on its status, which consists of 1) its state (ready/sleeping/blocked/finished), 2) the availability of data for input or room for output, and 3) the - rowtime of the next input row. The scheduler also keeps track of statistics for each execution of each node, measuring the time spent, the amount of data input and output, and so on.