Reading Data into s-Server

Before you can query, analyze, or set alerts on your data, you need to read this data into s-Server. This integration guide describes a number of ways to ingest data. All data ingested ends up in an s-Server foreign stream. A foreign stream is an s-Server object that serves as a “tank” to hold streaming data. Once you have established a foreign stream, you can pump data from the foreign stream to native streams, where you can analyze data, set alerts on data, visualize data, and so on.

There are two basic categories for reading data into s-Server:

  • Reading data from RDBMS sources. You can use s-Server to read directly from RDBMS tables. Since these data are already in rows and columns, no additional parsing of data is necessary to get it into an s-Server stream. See Reading Data from RDBMS Sources in this guide.
  • Reading data from all other sources using the Extensible Common Data framework. This framework lets you identify a data source, such as a file from the file system, a network socket feed, an Apache Kafka topic, an Amazon Kinesis stream, an AMQP message. s-Server needs to parse data from these sources. In s-Server, parsing means to translate data into rows and columns. s-Server can automatically process data in a number of common formats: CSV, XML, JSON, key pair, and Google Protobuf. For other formats, you can make a custom parser using Regex (these are “regular expressions” for creating search strings). See Reading Data from Other Sources in this guide, as well as the specific topics Reading from the File System, Reading from Network Sockets, Reading from AMQP, Reading from Kafka, Reading from Kinesis, Reading over HTTP, and Reading from Web Sockets.

Configuring Input

For all of the above sources, you can configure input in three ways. This guide focuses on the third option, creating a foreign stream by hand in SQL, but the principles described within apply to both StreamLab and s-Studio.

  1. Using StreamLab, a graphical interface, to set up a connection with a source. For more details, see Adding Sources in StreamLab in the StreamLab Guide. (You can still connect with sources that you’ve created in StreamLab using SQL.)
  2. Creating a foreign stream in s-Studio, SQLstream’s Integrated Development Environment. For more details, see Creating Objects in s-Studio in the SQLstream s-Studio Guide.
  3. Creating a foreign stream by hand in SQL, then running this SQL in s-Server using sqlline. This process is described below under topics for each source. The Streaming SQL Reference Guide provides configuration details in the topic CREATE FOREIGN STREAM. For information on using sqlline, see the Using SQLline to Run SQL in s-Server.

The Extensible Common Data framework reads from a variety of sources and parses data from these sources into rows in an s-Server stream or table. Currently supported sources are:

s-Server parses data from these sources into rows of columns in streams or tables. You can read data from these streams or tables by running a SELECT statement against them.

To parse files from these sources using an adapter, you need to set up both a server for the source and set up a foreign stream for the source. One server object can be referenced by multiple foreign streams. To parse files with the Extensible Common Data agent, you launch the agent from its directory at the command line, using the –input flag. See the topic Reading Files in Remote Locations for more information on the Extensible Common Data Agent.

File Formats

s-Server can parse a number of file formats from these sources:

Topic Parameter Name
Parsing CSV CSV
Parsing XML XML
Parsing Key Value Pairs KV
Parsing ProtoBuf PROTOBUF
Parsing Avro AVRO

If you do not have information about the file format to be read, or need help filling in foreign stream or agent properties for a file format, you can use the Discovery parser to determine information about a file’s format. This parser reads a sample of the file and returns information on file format options. Currently, the Discovery parser works for CSV, XML, JSON, and Avro files.

You can use the None Parser to pull in a single column of data.

Handling Unparsable Rows

Occasionally, s-Server will encounter rows that it cannot parse. By default, in these cases s-Server fills in default values for columns which don’t parse and then logs an informational message to its trace log. (See the topic Using Trace Files in the Building Applications Guide for more details on the trace log.)

For CSV and JSON only, you can enable a stream option called DISCARD_ROW_ON_PARSE_ERROR. The option causes s-Server to throw away rows with parsing errors to log the surrounding payload to the global error stream. (See the topic Global Error Stream in Streaming SQL Reference Guide for more information.)

To enable this option, add the following line to an ECD foreign stream definition that uses CSV or JSON:

discard_row_on_parse_error 'true',

Special Columns Generated by Parsers

Some input adapters can have “special” columns which are part of the foreign stream definition but not part of the underlying data format. For example, a column called PARSE_POSITION is produced for all file formats. If you want to include this column in your stream, you need to define it as a BIGINT. This column will not appear in the data format but will be in the result set from selecting the foreign stream. The Extensible Common Data Adapter will supply the value for this column instead of parsing its value from the underlying data. This is used to supply meta information about what is being parsed and where it came from.

The Extensible Common Data Adapter Agent generates four special row columns when parsing files which provide information on how the row was derived from the file. You can declare any of these columns to make them part of a foreign stream or table.

Special Column Meaning
PARSE_POSITION Position in the file where this row originated.
PARSE_LINE_NUMBER Line number in file where this row originated.
PARSE_ERROR Usually null. Otherwise provides an error message explaining what wrong when parsing this row
SOURCE_FILE The file this row originated.