Parser UDX

The Parser UDX lets you call s-Server’s predefined parsers in a function. You can use this function to call any of the parsers described in the topic Input Formats for Reading. This parser takes as input a cursor with data to be parsed.

See the topic CREATE FUNCTION in the SQLstream Streaming SQL Reference Guide* for more details on functions.

The Parser UDX is useful for scenarios in which data cannot be fully parsed on input, such as a case where JSON data is embedded within CSV data, or when data arrives through a JDBC connection instead of the ECD adapter or agent.

To use the Parser UDX, you need to do the following:

  1. Identify or create a table, stream, foreign table or foreign stream that contains the data to be parsed (such as a file containing comma separated values. JSON, XML, ProtoBuf, or keypair values) For more information on declaring foreign streams for these sources, see the topic Reading from Other Sources in this guide. You will use this stream or table as an input cursor for the Parser UDX function.
  2. Declare a function that RETURNS TABLE with the columns that you will parse out of the data, with the following parameters:
  • inputcursor. This is the cursor that contains a SELECT statement for the the foreign stream that you declared above.
  • columnName varchar(256). This is the name of the column in the foreign stream that you declared above. This holds the raw data to be parsed.
  • parserType varchar(256). This is the name of the parser type. Options are CSV, XML, JSON, KV, PROTOBUF, VCLP, FCLP, FastRegex, and WSC. For more information on these parsers, see the topic - Input Formats for Reading in this guide.
  • options cursor. This is a cursor that passes in options for the parser. See the topic Input Formats for Reading in this guide for more information.
  • Run a SELECT statement against the function that passes in the parameters above.

Example

The following example first creates and sets a schema for the function, sets a path for the function, creates a foreign stream to input data, creates a function using the Parser UDX, creates a view to input options for the parser, and then creates a view to select from the function. (The view makes it easier to select from this function in the future.)

CREATE OR REPLACE SCHEMA PARSERUDXTEST;
SET SCHEMA 'PARSERUDXTEST';
--Sets path so that this function can be called by
--function name when this path is set.
SET PATH 'PARSERUDXTEST';

--Server for foreign stream

CREATE OR REPLACE SERVER FileReaderServer type 'FILE'
FOREIGN DATA WRAPPER ECDA;

--Stream with raw data
CREATE OR REPLACE FOREIGN STREAM test_table (
  MESSAGE VARCHAR(1024)
)
SERVER FileReaderServer
--options for data source
OPTIONS(
   DIRECTORY '/tmp/',
   FILENAME_PATTERN 'animals.csv',
   CHARACTER_ENCODING 'ISO-8859-1',
   PARSER 'CSV',
   STATIC_FILES 'true',
   SKIP_HEADER 'true',
   SEPARATOR '@' --we ignore separators

   --so that data is not parsed on input
);

CREATE OR REPLACE FUNCTION parseColumn(
   input cursor,
   columnName varchar(256),
   parserType varchar(256),
   options cursor
  )
  --columns need to be declared with types that match
  --data to be parsed
  RETURNS TABLE(
      ROWTIME timestamp not null,
      ANIMAL_ID VARCHAR(64),
      COMMON_NAME VARCHAR(64),
      GENUS VARCHAR(64),
      SPECIES VARCHAR(64),
      FAMILY VARCHAR(64),
      ORDO VARCHAR(64),
      CLASSIS VARCHAR(64)
)
  LANGUAGE JAVA
  PARAMETER STYLE SYSTEM DEFINED JAVA
  NO SQL
  EXTERNAL NAME 'class com.sqlstream.aspen.namespace.common.ParserUdx.parseColumn';

--These are options for the parser
CREATE VIEW CSV_OPTIONS(SEPARATOR) AS VALUES(',');

--This selects from the function.
CREATE OR REPLACE VIEW readParsed as
SELECT STREAM * FROM STREAM(parseColumn(
      CURSOR(SELECT stream * FROM test_table),
      'MESSAGE',
      'CSV',
      CURSOR(select * from CSV_OPTIONS)));

SELECT STREAM * from readParsed;