Before you can query, analyze, or set alerts on your data, you need to read this data into s-Server. This integration guide describes a number of ways to ingest data. All data ingested ends up in an s-Server foreign stream. A foreign stream is an s-Server object that serves as a “tank” to hold streaming data. Once you have established a foreign stream, you can pump data from the foreign stream to native streams, where you can analyze data, set alerts on data, visualize data, and so on.
There are two basic categories for reading data into s-Server: adapters, which run within s-Server itself, and agents, which run outside of s-Server, typically at the remote location where the data originates.
For all of the above sources, you can configure input in three ways: through StreamLab, through s-Studio, or by writing SQL directly. This guide focuses on the third option, creating a foreign stream by hand in SQL, but the principles described here apply to StreamLab and s-Studio as well.
The Extensible Common Data framework reads from a variety of sources and parses data from these sources into rows in an s-Server stream or table. Currently supported sources are:
s-Server parses data from these sources into rows of columns in streams or tables. You can read data from these streams or tables by running a SELECT statement against them.
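A minimal sketch of such a query, assuming a foreign stream named "BusData" with columns "id" and "reported_at" has already been defined (the stream and column names are illustrative, not taken from this guide):

```sql
-- Streaming SELECT against a foreign stream: rows appear in the
-- result set as s-Server parses them from the source.
SELECT STREAM "id", "reported_at" FROM "BusData";
```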
To parse files from these sources using an adapter, you need to set up both a server object for the source and a foreign stream that references it. One server object can be referenced by multiple foreign streams. To parse files with the Extensible Common Data agent, you launch the agent from its directory at the command line, using the --input flag. See the topic Reading Files in Remote Locations for more information on the Extensible Common Data Agent.
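The two objects for a file source might be sketched as follows; the server name, directory, filename pattern, and stream columns here are illustrative assumptions, not values from this guide:

```sql
-- Server object for the source; TYPE identifies the kind of source
-- being read (here, files in a local directory).
CREATE OR REPLACE SERVER "FileReaderServer"
TYPE 'FILE'
FOREIGN DATA WRAPPER ECDA;

-- Foreign stream that references the server. Reader and parser
-- options (directory, filename pattern, parser) are set here.
CREATE OR REPLACE FOREIGN STREAM "BusData"
(
    "id" VARCHAR(16),
    "reported_at" TIMESTAMP,
    "speed" DOUBLE
)
SERVER "FileReaderServer"
OPTIONS (
    "PARSER" 'CSV',
    "DIRECTORY" '/tmp/bus_data',
    "FILENAME_PATTERN" 'bus.*\.csv'
);
```

Because one server object can be shared, additional foreign streams reading other files of the same source type can reference "FileReaderServer" without redefining it.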
s-Server can parse a number of file formats from these sources:
| Topic | Parser |
| --- | --- |
| Parsing Key Value Pairs | KV |
If you do not have information about the file format to be read, or need help filling in foreign stream or agent properties for a file format, you can use the Discovery parser to determine information about a file’s format. This parser reads a sample of the file and returns information on file format options. Currently, the Discovery parser works for CSV, XML, JSON, and Avro files.
You can use the None Parser to pull in a single column of data.
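A sketch of what a None-parser stream might look like; the option value 'NONE', the server name, and the column definition are assumptions based on the description above:

```sql
-- With the None parser, each incoming record lands unparsed in a
-- single character column.
CREATE OR REPLACE FOREIGN STREAM "RawLines"
("line" VARCHAR(4096))
SERVER "FileReaderServer"
OPTIONS (
    "PARSER" 'NONE',
    "DIRECTORY" '/tmp/raw',
    "FILENAME_PATTERN" '.*\.log'
);
```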
Occasionally, s-Server will encounter rows that it cannot parse. By default, in these cases s-Server fills in default values for the columns that fail to parse and logs an informational message to its trace log. (See the topic Using Trace Files in the Building Applications Guide for more details on the trace log.)
For CSV and JSON only, you can enable a stream option called DISCARD_ROW_ON_PARSE_ERROR. This option causes s-Server to discard rows with parsing errors and log the surrounding payload to the global error stream. (See the topic Global Error Stream in the Streaming SQL Reference Guide for more information.)
To enable this option, add the following line to an ECD foreign stream definition that uses CSV or JSON:
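A sketch of a CSV foreign stream definition with the option set; the stream, server, column names, and file options are hypothetical:

```sql
CREATE OR REPLACE FOREIGN STREAM "CsvInput"
(
    "ts" TIMESTAMP,
    "msg" VARCHAR(256)
)
SERVER "FileReaderServer"
OPTIONS (
    "PARSER" 'CSV',
    "DIRECTORY" '/tmp/csv_in',
    "FILENAME_PATTERN" 'input.*\.csv',
    -- Discard rows that fail to parse and log the surrounding
    -- payload to the global error stream.
    "DISCARD_ROW_ON_PARSE_ERROR" 'true'
);
```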
Some input adapters support “special” columns, which are part of the foreign stream definition but not part of the underlying data format. For example, a column called PARSE_POSITION is available for all file formats. To include this column in your stream, you need to define it as a BIGINT. The column does not appear in the underlying data, but it does appear in the result set when you select from the foreign stream: the Extensible Common Data Adapter supplies its value rather than parsing it from the data. Special columns supply meta information about what is being parsed and where it came from.
The Extensible Common Data Adapter Agent generates four special columns when parsing files; these provide information on how each row was derived from the file. You can declare any of these columns to make them part of a foreign stream or table.
| Column | Description |
| --- | --- |
| PARSE_POSITION | Position in the file where this row originated. |
| PARSE_LINE_NUMBER | Line number in the file where this row originated. |
| PARSE_ERROR | Usually null. Otherwise provides an error message explaining what went wrong when parsing this row. |
| SOURCE_FILE | The file this row originated from. |
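Declaring the special columns might look like the following sketch; PARSE_POSITION is defined as BIGINT per the text above, while the types of the other three columns, and all stream, server, and file options, are assumptions:

```sql
CREATE OR REPLACE FOREIGN STREAM "CsvWithMeta"
(
    "ts" TIMESTAMP,
    "msg" VARCHAR(256),
    -- Special columns: values are supplied by the adapter/agent,
    -- not parsed from the file itself.
    "PARSE_POSITION" BIGINT,
    "PARSE_LINE_NUMBER" BIGINT,
    "PARSE_ERROR" VARCHAR(1024),
    "SOURCE_FILE" VARCHAR(256)
)
SERVER "FileReaderServer"
OPTIONS (
    "PARSER" 'CSV',
    "DIRECTORY" '/tmp/csv_in',
    "FILENAME_PATTERN" 'input.*\.csv'
);
```

Selecting from "CsvWithMeta" then returns the file-derived metadata alongside the parsed data columns.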