Integrating RDBMS Systems

Using s-Server, you can read from and write to RDBMS systems.

This topic contains the following subtopics:

Defining SQL/MED Servers and Foreign Tables

You can read data from and write data to RDBMS sources using the SQL/MED plugin for s-Server. This plugin is pre-installed with s-Server.

In order to access data from an RDBMS system, you need a JDBC driver. Currently, these are preinstalled for Microsoft SQL Server, Oracle, and Teradata, and can be installed for MySQL.

Note: In order to use other JDBC drivers with SQLstream s-Server, you need to 1) copy all required jar files to the $SQLSTREAM_HOME/lib directory, 2) adjust their permissions so that they are readable by the user who runs s-Server (typically the sqlstream user), and 3) restart s-Server.

To read or write data, you first define a server object with connection information, including the URL, user name, and password. Once you define this server object, you can read from the database's tables by either:

  • Defining a foreign table to map onto a table in the foreign database (the preferred method)
  • Using the SQL-standard three-level qualifier scheme for the names of tables: catalog.schema.table. In this case, the catalog name will be the name given to the foreign server in your CREATE SERVER command. See the topic CREATE SERVER in the s-Server Streaming SQL Reference Guide for more details.

See also the CREATE SERVER topic in the Streaming SQL Reference Guide.

Foreign Server Definition

You define a foreign server in SQL using the CREATE OR REPLACE SERVER command. (You can also define servers through s-Studio.) This foreign server always uses a foreign data wrapper called SYS_JDBC. This wrapper is preinstalled with s-Server.

For example, the following code block defines a connection for a MySQL database.

CREATE OR REPLACE SERVER mysql_reader
FOREIGN DATA WRAPPER SYS_JDBC
OPTIONS (
  DRIVER_CLASS 'com.mysql.jdbc.Driver',
  URL 'jdbc:mysql://localhost:3306/sample',
  USER_NAME 'sqlstream',
  PASSWORD 'sqlstream'
DIALECT 'MYSQL',
  );

Once defined, you can reference this table as mysql_reader.schema.table, where "schema" is a schema defined in the MySQL database and "table" is a table defined in this schema.

The following code block defines a connection for a Teradata database:

CREATE OR REPLACE SERVER "Teradata_DB"
FOREIGN DATA WRAPPER "SYS_JDBC"
OPTIONS (
  URL 'jdbc:teradata://localhost/',
  USER_NAME 'sqlstream',
  PASSWORD '',
  DIALECT 'Teradata',
  TERADATA_QUERY_BAND 'org=Finance;report=EndOfYear;universe=west;',
  DRIVER_CLASS 'com.teradata.jdbc.TeraDriver'
);

Note the presence of a TERADATA_QUERY_BAND option. This option is specific to Teradata.

You reference tables in the Teradata database defined above using the following qualifier: Teradata_DB.schema.table, where "schema" is a schema defined in the Teradata database and "table" is a table defined in that schema.

Options for RDBMS Server

Name Input / Output Description
DRIVER Both Class name of the JDBC Driver for the remote database. Fully-qualified name of the JDBC driver class to load. This must be available on the classpath.
URI Both JDBC URI
USER_NAME Both Remote database user.
PASSWORD Both Remote database password
DIALECT Both Type of database, generally auto detected. Valid values are: Oracle, POSTGRES, MYSQL, TERADATA, SQL SERVER
TERADATA_QUERY_BAND Input Optional, and for Teradata connections only. Teradata query bands "tag" the query. For more information, see the Teradata website. This is run before the actual query. The submitted SQL looks like the following: SET QUERY BAND = 'ApplicationName=SQLstream_s-Server;Version=;' FOR SESSION
queryCol Tailing Input Name of the column to use for a highwater mark, such as ROWTIME. No default. See Tailing tables.
pollingInterval Tailing Input In milliseconds, how often to sleep when now new rows are available. Default is 1000. See Tailing tables.
txInterval Tailing Input How many rows to wait before reading a row, in order to ensure that no rows are missed as a result of rollbacks or dirty reads. If queryCol is ROWTIME, you should consider setting txInterval to 1000. Larger values add latency to the query, but make the stream more consistent (safer) in the face of crashes and multiple inserters to the remote database's table. Smaller values add less latency, may result in a missed row, in the remote chance of a dirty read. See Tailing tables.
TRANSACTION_ROWTIME_LIMIT Output The number of milliseconds which may elapse between commits. Defaults to 1000 ms (1 second). See Managing Commits. May also be overridden by an INSERT or MERGE ... WITH OPTIONS clause.
TRANSACTION_ROW_LIMIT Output The number of rows to batch up before committing. Defaults to 0. See Managing Commits. May also be overridden by an INSERT or MERGE ... WITH OPTIONS clause.
JNDI_WRITEBACK Both True/False. Defaults to "false". When you specify the JNDI_WRITEBACK option as "true" in the DDL or in the .properties file, s-Server writes back consolidated options (including options inherited from the parent as well as from the properties file) to the same properties file under $SQLSTREAM_HOME/plugin/jndi. This lets you configure the JNDI file using DDL.
SCHEMA_NAME Both Name of the remote schema.
TABLE_NAME Both Name of the remote table. Normally this is specified only at FOREIGN TABLE / FOREIGN STREAM level, not against the SERVER.
OPTIONS_QUERY Both Optional. Lets you query a table to set one or more of the required options at runtime. You could use this, for example, to set suitable connection options using select URI, "SCHEMA", USERNAME, PASSWORD from TEST.rdbms_options. For more details see the topic Using the Options Query Property. This is usually applied to the FOREIGN TABLE rather than the SERVER



Defining a Foreign Table

When you define an RDBMS foreign table, you

  • Optionally declare columns that match those in the target database table.
  • Indicate the (foreign) SERVER with which this foreign table is associated
  • Indicate a table and schema in the foreign database by setting options for the foreign table.
  • Define any other options that may be needed, or inherit them from the SERVER.

You can set these options when you declare a server, and override these options in the table definition (or by using the OPTIONS_QUERY option to set options at run time).

For more information about RDBMS (SQL/MED) foreign tables see Integrating RDBMS Systems; for the options for ECD adapters, see the adapter documentation in Integrating Guavus SQLstream with Other Systems.

Sample SQL

Like all tables, foreign tables need to be created within schemas. The following code first creates and sets a schema called "MOCHI", then creates a foreign table called "regional_activity" which uses the server defined above and indicates a schema in the foreign database called "public" and a table in the foreign database called "regional_activity".

We assume that the server "Postgres_DB" defines the options needed to access the remote schema.

NOTE there is no need to define the foreign table columns explicitly - they are established by reading from the remote RDBMS's catalog.

CREATE OR REPLACE SCHEMA "MOCHI"
SET SCHEMA 'MOCHI';

CREATE OR REPLACE FOREIGN TABLE "regional_activity"
SERVER "Postgres_DB"
OPTIONS (
( SCHEMA_NAME 'public'            --schema name in the foreign database
, TABLE_NAME 'regional_activity'  --table name in the foreign database
)
DESCRIPTION 'per-city summary of suspicious activity';

JDBC Foreign Data Wrapper Definition

Because this plugin is included as part of the standard s-Server distribution, a corresponding foreign data wrapper instance named SYS_JDBC is predefined by SQLstream s-Server's initialization scripts. Normally, there is no need to define additional instances.

However, it may be useful to define additional wrapper instances corresponding to specific DBMS types. For the JDBC plugin, all SQL/MED options declared on the wrapper are propagated to the server, allowing common option settings to be factored out of individual server definitions. (Note that this is not necessarily true for other plugins.) Options specifically set on a server definition always take precedence over settings inherited from a wrapper.

Reading from RDBMS Sources

Querying RDBMS Tables

You can query JDBC foreign tables in one of two ways:

  • By defining foreign tables within s-Server that map onto tables in the foreign database, such as "my_s-ServerSchema"."foreign_customers". This has the advantage of hiding connection details. You can also restrict privileges to this table. This is the recommended technique for accessing foreign tables.

  • By using the three name qualifier scheme, such as "Oracle_DB".""."customers". Merge currently does not function for PostgreSQL and MySQL.

Using a Foreign Table

See the topic CREATE FOREIGN TABLE in the Streaming SQL Reference Guide for more details. Once you create a foreign table, you can select from it like any other table in s-Server, with the contents of the foreign table automatically fetched.

select_rdbms_source_for_table
Using the Three Name Qualifier Schema

To read from a foreign table use the schema/table as you would for any other table and preface the server name as the catalog.

select_rdbms_source_direct

Tailing Foreign Tables with SQL/MED

In using the SQL/MED adapter to query remote database tables, you will often want to "tail" an external database table. "Tailing" refers to setting up an open-ended query that returns any newly-added rows from the external database table. When new rows are added, s-Server automatically adds these to a streaming query. In order to tail a database, you will need to make use of the STREAM keyword. Before discussing this process, this document will briefly cover a more limited use case of querying a remote database table--a static query on the current contents of the table.

Foreign tables are created automatically once you create a server object using the SQL/MED plugin. This process is described in the topic Reading Data from RDBMS Sources, and involves code along the following lines:

CREATE OR REPLACE SCHEMA "ForeignTableData";
SET SCHEMA '"ForeignTableData"';

CREATE OR REPLACE SERVER mysql_reader
FOREIGN DATA WRAPPER SYS_JDBC
OPTIONS
   (
  DRIVER_CLASS 'com.mysql.jdbc.Driver',
  URL 'jdbc:mysql://localhost:3306/sample',
  USER_NAME 'sqlstream',
  PASSWORD 'sqlstream',
  txInterval '50'
   );

Performing a Query on a Remote Database Table

SELECT * FROM <FOREIGN TABLE> is a relational table read. Returns all the current rows and then an end of statement. You do not need to create a special foreign table object in order to execute this query. You can do so using the the SQL-standard three-level qualifier scheme for the names of tables.

Options for Performing a Tailing Query on a Remote Database Table

SELECT STREAM * FROM <FOREIGN TABLE> (note the use of the STREAM keyword) is a private polling table read. This query returns all current rows and rows in the future. There is no need for a highwater mark when using this option.

SELECT STREAM * FROM <FOREIGN STREAM> (note the use of the STREAM keyword and the FOREIGN STREAM object) is a shared/public polling table read. This option requires you to create a foreign stream object for the remote database table, as described below. This query returns all current rows and rows in the future based upon the current state of the highwater mark. A highwater mark is a data point that defines the current row being read by an s-Server query (this is important only when multiple readers are querying the same SQL/MED foreign stream).

Initially this query will work the same way as SELECT STREAM * FROM <FOREIGN TABLE> (it returns all rows). However, if another query starts a read on the same foreign stream, then this query will pick up at the same point (the highwater mark) as the other query. If no query is currently being run against the foreign stream, the foreign stream does not discard rows. As a result, querying the foreign stream a second time will initially return only new tuples as the stream "catches up" to the current state of the table. Any new query will update the highwater mark to the current highest rowtime or key.

Defining a Tailing Query Using the STREAM Keyword/Table Object Option

In order to tail a remote database table using this option, you prepare a SELECT query with a lower and upper bound on a monotonically increasing column of the foreign stream. For example, given a remote table called TEST_TABLE and a monotonically increasing column called ID, you might use a query along the following lines:

SELECT STREAM * FROM TEST_TABLE WHERE ID > -1 AND ID <= (SELECT MAX(ID) FROM TEST_TABLE) - 2;

The id > ? is the lower bound and id <= (select max(id) from TEST_TABLE) - 2 is the upper bound. The lower bound excludes rows that we have already seen. The upper bound excludes rows that may not have been committed yet.

The 2 in the upper bound is called the txInterval and gives a fudge factor for the inserters to this table. If there will only be 1 inserter to TEST_TABLE, txInterval can safely be set to 0 (it must always be positive). txInterval defaults to 2, but you should set it equal to the maximum total number of concurrent inserters into the foreign table (that is, the maximum number of insert queries that could simultaneously be executing at once).

Another table property, pollingInterval, determines how long to wait between executing the prepared statement against the remote DB table. The MED/JDBC reader only waits if no rows were found in the last execution pass.

Multiple monotonically increasing columns where clause expressions in the select query are OR'ed together (ie (ID > ? AND ID <= (select max(id) from TEST_TABLE)) OR (ID2 > ? AND ID2 <= (select max(id) from TEST_TABLE))).

You can use this process with both foreign streams and foreign tables in a select stream query. In foreign stream use cases, a highwater mark will be stored.

Again, this query only works for private polling. If you want to create a tailing query that lets multiple users read from a remote database table, you will need to create a foreign stream object.

Tailing a Remote Database Table with a Foreign Stream Object

When you tail a remote database table with a foreign stream object, you allow multiple readers to query the remote database table. In order to do so, you need to create a foreign stream and make use of a highwater mark.

You pass highwater marks using an option called "SQLSTREAM_POSITION_KEY" in the OPTIONS clause of FOREIGN STREAM definition.

Highwater marks can be passed dynamically through an OPTIONS_QUERY option on the foreign stream. This lets you change the highwater mark if the foreign stream is being queried by different users. See Using the OPTIONS_QUERY Option below.

SQL for Creating Foreign Stream to Tail Foreign Table

The following SQL first creates a schema in which the foreign stream will reside, called ForeignTableData. The next block of SQL creates a server object.

CREATE OR REPLACE SCHEMA "ForeignStreamData";
SET SCHEMA '"ForeignStreamData"';

CREATE OR REPLACE SERVER mysql_reader
FOREIGN DATA WRAPPER SYS_JDBC
OPTIONS
  (
 DRIVER_CLASS 'com.mysql.jdbc.Driver',
 URL 'jdbc:mysql://localhost:3306/sample',
 USER_NAME 'sqlstream',
 PASSWORD 'sqlstream',
 txInterval '1000'
  );

CREATE OR REPLACE FOREIGN STREAM "ForeignTableDataStream"
OPTIONS
(
"id" BIGINT NOT NULL,
"col1" VARCHAR(10),
"col2" DOUBLE,
"col3" INTEGER
)
SERVER "mysql_reader"
  (
   TABLE_NAME 'sample_data',
   queryCol 'id',
   --highwater mark
   "SQLSTREAM_POSITION_KEY" '1000',
   SCHEMA_NAME 'public',
   DESCRIPTION 'Foreign stream to deliver rows appended to
      public.sample_data. Foreign streams are a shared or public polling table
      read.'
  )
;

Using the OPTIONS_QUERY Option for Highwater Marks

When you use the OPTIONS_QUERY, you can pass options to the foreign stream dynamically without the need to change the FOREIGN STREAM statement itself.

In the following example, each user can pass her own watermark to the foreign stream while running the query. When the SELECT query is run on ForeignTableDataStream, s-Server also runs OPTIONS_QUERY to fetch new values of for options.

CREATE OR REPLACE SCHEMA "ForeignStreamData";
SET SCHEMA '"ForeignStreamData"';

CREATE OR REPLACE SERVER mysql_reader
FOREIGN DATA WRAPPER SYS_JDBC
OPTIONS
  (
 DRIVER_CLASS 'com.mysql.jdbc.Driver',
 URL 'jdbc:mysql://localhost:3306/sample',
 USER_NAME 'sqlstream',
 PASSWORD 'sqlstream',
 txInterval '1000'
  );

CREATE OR REPLACE FOREIGN STREAM "ForeignTableDataStream"
OPTIONS (
"id" BIGINT NOT NULL,
"col1" VARCHAR(10),
"col2" DOUBLE,
"col3" INTEGER
)
SERVER "mysql_reader"
  (
   TABLE_NAME 'sample_data',
   queryCol 'id',
   "OPTIONS_QUERY" 'SELECT SQLSTREAM_POSITION_KEY, USER_NAME, PASSWORD FROM my_config_table WHERE STREAM_NAME = ''ForeignTableDataStream''',
   SCHEMA_NAME 'public',
   DESCRIPTION 'Foreign stream to deliver rows appended to
      public.sample_data.'
  )
;

Monotonically increasing columns are indicated either using the queryCol table property or by MONOTONIC_=true table property. For example the following are equivalent:

queryCol 'ID'

MONOTONIC_ID 'true'

queryCol and the Foreign Stream's High Water Mark

Since table entries from multiple insertion sources can arrive without being in strict order, a newly arriving row can have a queryCol value higher than a prior row. If earlier rows with lower queryCol values were already streamed, a new arriving row with a higher queryCol value is seen as "out of order" and discarded.

You can minimize this data loss by specifying a querytSort parameter. This parameter establishes an interval in which rows with out-of-order queryCol can arrive. The querytSort value you choose enables the streaming of rows whose queryCol values are lower than "high-water mark minus querytSort". As incoming rows establish a new high-water mark, earlier rows outside that new interval can then be streamed.

Querying the Foreign Stream as a Tail

In order to tail a remote database table using this option, you prepare a SELECT query with a lower and upper bound on a monotonically increasing column of the foreign stream. For example, given a foreign stream called TEST_STREAM and a monotonically increasing column called ID, you might use a query along the following lines:

SELECT STREAM * FROM TEST_STREAM WHERE ID > <HIGHWATER MARK> AND ID <= (SELECT MAX(ID) FROM TEST_TABLE) - 2;
  • The id > is the lower bound and id <= (select max(id) from TEST_STREAM) - 2 is the upper bound. The lower bound excludes rows that we have already seen. The upper bound excludes rows that may not have been committed yet.

  • The 2 in the upper bound is called the txInterval and allows flexibility (a "fudge factor") for the inserters to this table. txInterval defaults to 2, but should be set to the maximum total number of concurrent inserters into the foreign table (ie the maximum number of insert queries that could simultaneously be executing at once). If you know that there will only be one inserter for TEST_TABLE, you can safely set txInterval to 0 (it cannot be negative).

  • Another table property, pollingInterval, determines how long to wait between executing the prepared statement against the remote DB table. The MED/JDBC reader only waits if no rows were found in the last execution pass.

  • Multiple monotonically increasing columns where clause expressions in the select query are OR'ed together. For example: (ID > ? AND ID <= (select max(id) from TEST_TABLE)) OR (ID2 > ? AND ID2 <= (select max(id) from TEST_TABLE)).

  • You can use this process with both foreign streams and foreign tables in a select stream query. In foreign stream use cases, a highwater mark will be stored.

Writing to RDBMS Sinks

To write data to an RDBMS destination, you first define a server object with connection information, including the URL for the RDBMS, user name, and password.

This topic contains the following subtopics:

Overview

In order to write data to an RDBMS system, you need to define a (Foreign) SERVER, as defined in Defining SQL/MED Servers and Foreign Tables above.

Once you define this server object, you can insert into the database's tables by either:

  • Defining a foreign table to map onto a table in the foreign database (the preferred method)
  • Using the SQL-standard three-level qualifier scheme for the names of tables: catalog.schema.table. In this case, the catalog name will be the name given to the foreign server in your CREATE SERVER command. See the topic CREATE SERVER in the s-Server Streaming SQL Reference Guide for more details.

The CREATE SERVER topic in the SQLstream Streaming SQL Reference Guide has a complete list of options for creating SQL/MED server objects.

You can use two options, TRANSACTION_ROWTIME_LIMIT and TRANSACTION_ROW_LIMIT, to manage when rows commit. See Managing Commits below. You can also determine how s-Server handles write errors using options on the INSERT and MERGE statements, including managing retries. See Errors for Writing to RDBMS Tables below.

Inserting into Foreign Tables

You can insert into JDBC foreign tables in one of two ways:

  • By defining foreign tables within s-Server that map onto tables in the foreign database, such as "my_s-ServerSchema"."foreign_customers". This has the advantage of hiding connection details. You can also restrict privileges to this table. This is the recommended technique for accessing foreign tables.
  • By using the three name qualifier scheme, such as "Oracle_DB".""."customers". Note: Merge currently does not function for PostgreSQL and MySQL.

Using a Foreign Table

See the topic CREATE FOREIGN TABLE in the SQLstream Streaming SQL Reference Guide for more details.

Using the Three Name Qualifier Schema

To write to a foreign table use the schema/table as you would for any other table and preface the server name as the catalog.

CREATE OR REPLACE SCHEMA "RDBMS_schema";

SET SCHEMA '"RDBMS_schema"';

CREATE OR REPLACE PUMP "writerPump" STOPPED AS
INSERT INTO "PostgresDB".my_schema.my_table 
SELECT STREAM * FROM "MyStream";   --where "MyStream" is a currently existing stream

To start writing data, use the following code:

ALTER PUMP 'writerPump' START;

Note On Writing Pumps

Because of the nature of streaming data, you will need to set up a pump in order to move rows continually from an s-Server stream to another stream, file, Kafka topic, RDBMS table or other location. Pumps are INSERT macros that continually pass data from one point in a streaming pipeline to the other. A model for setting up a pump is provided below. See the topic CREATE PUMP in the s-Server Streaming SQL Reference Guide for more details.

Managing Commits

In defining the foreign server or table, you can set parameters that manage how frequently the foreign database table is updated.

Option Defnition
TRANSACTION_ROWTIME_LIMIT The number of milliseconds which may elapse between commits. Defaults to 1000 ms (1 second).
TRANSACTION_ROW_LIMIT The number of rows to batch up before committing. Defaults to 0.

Say that

TRANSACTION_ROWTIME_LIMIT = M

and

TRANSACTION_ROW_LIMIT = N

Whenever the rowtime advances due to a new row or rowtime bound, we commit if either of the following conditions holds:

a) M > 0 and M milliseconds have elapsed since our last commit.

or

b) N > 0 and N rows have flowed by since our last commit..

Handling Errors when Writing to RDBMS Tables

Writes into an RDBMS sink can raise two kinds of errors:

Error type Description
Transient errors These are errors which might go away if we tried to write the row again. These problems include connection resets and java.sql.SQLTransientExceptions raised by the foreign database.
Fatal errors All other errors are considered fatal. The result won't change if we try to write the row again.

You can manage how s-Server behaves when it encounters these errors by setting options in the INSERT and MERGE statements using the WITH OPTIONS clause.

Using PRESERVE_PARTIAL_WRITES and BULK_LOAD for fatal errors

s-Server writes data into RDBMS tables in batches. The size of each batch is determined by TRANSACTION_ROWTIME_LIMIT and TRANSACTION_ROW_LIMIT. If one of the rows in the batch fails to write, s-Server conforms to the following error policy.

  • ABORT - The statement terminates and all writes are rolled back. This is the default behavior for finite DML (an INSERT or MERGE that writes from a table).
  • STUMBLE - The current batch of rows is rolled back but the statement continues to run. This is the default behavior for infinite DML (an INSERT or MERGE that writes from a stream) and for finite bulk-load.
  • SKIP - The failed row is discarded, but the statement continues to run. Other rows in the batch may be committed, depending on how the database's JDBC driver handles java.sql.Statement.executeBatch(). You can enable this behavior by setting PRESERVE_PARTIAL_WRITES to true.

The error policy is determined by two options in the INSERT and MERGE statements: PRESERVE_PARTIAL_ROWS and BULK_LOAD.

When BULK_LOAD is set to true, s-Server will commit rows whenever a batch of buffered rows is flushed to the external database.

When PRESERVE_PARTIAL_WRITES, s-Server will commit rows whenever a batch of buffered rows is flushed to the external database.

See DML options for the INSERT statement for more details (these are the same for both INSERT and MERGE).

INSERT and MERGE into med jdbc tables with streams as a source (finite inserts and merges) behave as follows for the indicated values of (PRESERVE_PARTIAL_WRITES, BULK_LOAD):

PRESERVE_PARTIAL_WRITES BULK_LOAD Result
false false STUMBLE
false true Illegal
true false SKIP
true true Illegal

INSERT and MERGE with tables as a source (finite inserts and merges) into med jdbc tables behave as follows for the indicated values of (PRESERVE_PARTIAL_WRITES, BULK_LOAD):

PRESERVE_PARTIAL_WRITES BULK_LOAD Result
false false ABORT
false true STUMBLE
true false illegal
true true skip

Examples

-- ABORT (default for finite DML)
insert into CAMPAIGNS
values ('aaa', 1), ('bbb', 2), ('ccc', 3), ('aaa', -1), ('ddd', 4), ('eee', 5);

-- STUMBLE (default for infinite DML and for finite bulk-load)
insert into CAMPAIGNS with options (bulk_load 'true')
values ('aaa', 1), ('bbb', 2), ('ccc', 3), ('aaa', -1), ('ddd', 4), ('eee', 5);

merge into CAMPAIGNS t
using (select stream CAMPAIGN_ID, TOTAL_CLICKS from CONSOLIDATED_STREAM) as s
on t.TOTAL_CLICKS = s.TOTAL_CLICKS
when matched then update set CAMPAIGN_ID = s.CAMPAIGN_ID
when not matched then insert (CAMPAIGN_ID, TOTAL_CLICKS) values
(s.CAMPAIGN_ID, s.TOTAL_CLICKS)
;

-- SKIP
insert into CAMPAIGNS
with options (preserve_partial_writes 'true', bulk_load 'true')
values ('aaa', 1), ('bbb', 2), ('ccc', 3), ('aaa', -1), ('ddd', 4), ('eee', 5);

merge into CAMPAIGNS with options (preserve_partial_writes 'true') t
using (select stream CAMPAIGN_ID, TOTAL_CLICKS from CONSOLIDATED_STREAM) as s
on t.TOTAL_CLICKS = s.TOTAL_CLICKS
when matched then update set CAMPAIGN_ID = s.CAMPAIGN_ID
when not matched then insert (CAMPAIGN_ID, TOTAL_CLICKS) values
(s.CAMPAIGN_ID, s.TOTAL_CLICKS)
;

Dealing with Transient Errors using Retries

When s-Server encounters a transient error -- an error that might be resolved if s-Server retries inserting or merging the row, what will happen depends on the following:

  • Whether s-Server is writing from a stream or table. Writing from a stream is known as infinite DML, while writing from a table is known as finite DML. By default, when s-Server is writing from a stream, it retries writing rows that raise transient errors. When s-Server is writing from a table, it does not retry writing rows that raise transient errors.
  • How you have configured retries using the WITH OPTIONS clause of an INSERT or MERGE statement.

You can adjust how and when s-Server retries inserting/merging rows in the case of transient errors. Note: Fatal errors ABORT, STUMBLE, or SKIP immediately, depending on how you have configured options above.

You do so by setting the following options in the WITH OPTIONS clause of an INSERT or MERGE statement. These options can only be set in the INSERT or MERGE statement (or in a PUMP using an INSERT or MERGE); they cannot be defined against the server or foreign stream / table.

Option Definition
RETRY_COUNT The number of times that s-Server tries rewriting a batch of rows that fails because of a transient error. -1 means "no maximum number of retries". Other negative values (and non-integer values) are illegal. For finite DML, RETRY_COUNT defaults to 0. For infinite DML, it defaults to -1.Each retry attempt will pause longer than the previous attempt.
RETRY_MAX_TIME The maximum number of milliseconds that s-Server will spend retrying a failed batch. It defaults to -1, which means "no maximum time". RETRY_MAX_TIME = 0 means the same thing as RETRY_COUNT = 0: don't retry at all.
RETRY_PAUSE_MIN The minimum number of milliseconds to pause between retries. It defaults to 10,000 (10 seconds). The value must be a positive integer--anything else raises an error.
RETRY_PAUSE_MAX The maximum number of milliseconds to pause between retries. It defaults to 300,000 (5 minutes). The value must be a positive integer--anything else raises an error.

s-Server continues retrying batches of rows until it reaches either RETRY_COUNT or RETRY_MAX_TIME. Once it encounters either, then the statement will go on to enforce its core error policy: ABORT, STUMBLE, or SKIP.

We also raise an error if RETRY_COUNT is set to a positive value on a finite DML statement but BULK_LOAD is not set to true. This is because the first step of retrying a batch is to roll back the previous attempt of the batch. For finite statements without BULK_LOAD, that would mean we would have to cache a potentially huge number of rows--all the way back to the beginning of the statement.

Here's an example. The following INSERT statement retries batches of rows up to 5 times, pausing for 100 milliseconds before the first retry and pausing up to a maximum of 500 milliseconds before subsequent retries:

INSERT INTO foreignTable
with options
(
retry_count '5',
retry_pause_min '100',
retry_pause_max '500'
)
SELECT STREAM * from sourceStream;

Known Issues with RDBMS sinks

MERGE into PostgreSQL fails for versions < 9.5.0

PostgreSQL does not directly support the SQL MERGE statement. So when you use the SQLstream MERGE statement to upsert into a PostgreSQL foreign table, SQLstream generates a PostgreSQL INSERT INTO ... ON CONFLICT ... statement to deliver the data.

For PostgreSQL versions before 9.5.0, you will see error messages in the Trace.log file:

ERROR: syntax error at or near "ON"

INSERT INTO ... ON CONFLICT ... - and therefore the SQLstream MERGE - is only supported for PostgreSQL versions from 9.5.0 (January 2016) onwards. Please make sure to use an up-to-date version of PostgreSQL.