XML Parse UDX

The XMLParser UDX is a User Defined Transform (UDX that uses an XPATH (the XML Path Language) definition to transform the contents of a single column into distinct columns for each attribute. The basic function is to match XML attributes to SQL column names, extract the data values, and place each attribute as an individual column value.

The inputs to the XMLParse UDX are an input stream supplied as a CURSOR argument, the name of a column to be parsed by the XMLParse, and the XPATH definition of the XML.

The XMLParse output is a new stream containing the individual attribute values as column values.

XMLParse.jar UDX file

The XMLParse UDX is implemented in a Java JAR file located in the $SQLSTREAM_HOME/plugin directory. As with other jar files, it is loaded with the CREATE JAR statement.

CREATE SCHEMA "test";
  SET SCHEMA '"test"';
  CREATE JAR "test"."XmlParse"
      LIBRARY 'file:plugin/XmlParse.jar'
      OPTIONS(0);

Alternately, you can create a link to the XMLParse UDX in the autocp directory.

cd $SQLSTREAM_HOME/plugin/autocp
ln -s ../XmlParse.jar

Creating a SQL Function to use the XMLParse UDX

The XMLParse input parameters specify the following information:

  • The set of input rows,
  • The column name of the column containing the XML data,
  • The file name of the XPATH configuration file. The location of this file is $SQLSTREAM_HOME/plugin/jndi/xmlparse/.properties.

The Table returned by the XMLParse UDX is a set of columns. Each column in the input stream named in the list of output Table columns receives the corresponding value passed through to the output. The column name of the column containing the XML data is parsed according to the XPATH configuration and any attributes corresponding to XML tags appear in the output for each column named in the list of output columns. XML tags that have no matching column names in the list of output Table columns are silently dropped. If the XML data has an array of repeating values, each repeating value will correspond to one UDX output row.

CREATE FUNCTION "test"."ParseXml"
            (c CURSOR,
             parseColName VARCHAR(20),
             confName VARCHAR(20))
      RETURNS TABLE
      (
                  -- Input values passed to the output
                   "location" VARCHAR(30),
                   "city" VARCHAR(32),
                   "region" VARCHAR(32),
                  -- Output columns parsed from the XML
                   id INTEGER,
                   name VARCHAR(60),
                   title VARCHAR(60),
                   years VARCHAR(60)
    )
    LANGUAGE JAVA
    PARAMETER STYLE SYSTEM DEFINED JAVA
    NO SQL
    EXTERNAL NAME
      '"XmlParse":com.sqlstream.plugin.xmlparse.XmlParse.parseXml';

Calling the Function

The following code calls the UDX.

 CREATE STREAM "InputStream" (
 "location" VARCHAR(30),
 "city" VARCHAR(32),
 "region" VARCHAR(32),
 id int,
      XML_DATA VARCHAR(32000));
 SELECT STREAM *
     FROM TABLE("test"."ParseXml"(
         -- First parameter is input stream
            CURSOR(SELECT STREAM * FROM "InputStream"),
         -- Second parameter is the name of the column in the input data that
            contains the XML data
            'XML_DATA',
         -- Third parameter is the name of the XPath definition file.
            'xpath_defn'));

Note: When a table function returns an infinite result, it needs to be wrapped with a STREAM(…) marker when invoked. s-Server will return an error otherwise.

Example Properties File and Input

The content of the Properties file passed as the third parameter of the function above is written in XPath. XPath is used for selecting parts of an XML document and is defined by the World Wide Web Consortium, W3C. An example of the content of the properties file which could be used for the above code would be:

 NAME.xpath=/testxml/row/name
 TITLE.xpath=@title
 YEARS.xpath=@years
 repeatingElement.xpath=/testxml/row/jobfunctions/function

Some Sample Input

SET SCHEMA '"test"';
 INSERT INTO "InputStream" ("location", "city", "region", id, xml_data)
   VALUES ('US', 'Oakland', 'California', 12345,
    '<testxml><row><name>joe</name><jobfunctions><function alt="truck driver" years="1"/>
     <function alt="clerk" years="5"/></jobfunctions></row></testxml>');

Troubleshooting

Set the appropriate trace level in /var/log/sqlstream/Trace.propertiesTrace.properties for:

com.sqlstream.plugin.xmlparse.XmlParse.parseXml.level=<trace_level>
trace_level Meaning/Usage
WARNING Error Conditions which disturb successful processing
INFO unused
FINE Logs the mapping input columns and output columns
FINER Logs the mapping between XML attributes and output columns
FINEST unused

Constraints

The maximum size of the XML document is constrained by the maximum row and column size of 65536.