The XMLParser UDX is a User Defined Transform (UDX that uses an XPATH (the XML Path Language) definition to transform the contents of a single column into distinct columns for each attribute. The basic function is to match XML attributes to SQL column names, extract the data values, and place each attribute as an individual column value.
The inputs to the XMLParse UDX are an input stream supplied as a CURSOR argument, the name of a column to be parsed by the XMLParse, and the XPATH definition of the XML.
The XMLParse output is a new stream containing the individual attribute values as column values.
The XMLParse UDX is implemented in a Java JAR file located in the $SQLSTREAM_HOME/plugin directory. As with other jar files, it is loaded with the CREATE JAR statement.
CREATE SCHEMA "test";
SET SCHEMA '"test"';
CREATE JAR "test"."XmlParse"
LIBRARY 'file:plugin/XmlParse.jar'
OPTIONS(0);
Alternately, you can create a link to the XMLParse UDX in the autocp directory.
cd $SQLSTREAM_HOME/plugin/autocp
ln -s ../XmlParse.jar
The XMLParse input parameters specify the following information:
The Table returned by the XMLParse UDX is a set of columns. Each column in the input stream named in the list of output Table columns receives the corresponding value passed through to the output. The column name of the column containing the XML data is parsed according to the XPATH configuration and any attributes corresponding to XML tags appear in the output for each column named in the list of output columns. XML tags that have no matching column names in the list of output Table columns are silently dropped. If the XML data has an array of repeating values, each repeating value will correspond to one UDX output row.
CREATE FUNCTION "test"."ParseXml"
(c CURSOR,
parseColName VARCHAR(20),
confName VARCHAR(20))
RETURNS TABLE
(
-- Input values passed to the output
"location" VARCHAR(30),
"city" VARCHAR(32),
"region" VARCHAR(32),
-- Output columns parsed from the XML
id INTEGER,
name VARCHAR(60),
title VARCHAR(60),
years VARCHAR(60)
)
LANGUAGE JAVA
PARAMETER STYLE SYSTEM DEFINED JAVA
NO SQL
EXTERNAL NAME
'"XmlParse":com.sqlstream.plugin.xmlparse.XmlParse.parseXml';
The following code calls the UDX.
 CREATE STREAM "InputStream" (
 "location" VARCHAR(30),
 "city" VARCHAR(32),
 "region" VARCHAR(32),
 id int,
   XML_DATA VARCHAR(32000));
 SELECT STREAM *
   FROM TABLE("test"."ParseXml"(
     -- First parameter is input stream
      CURSOR(SELECT STREAM * FROM "InputStream"),
     -- Second parameter is the name of the column in the input data that
      contains the XML data
      'XML_DATA',
     -- Third parameter is the name of the XPath definition file.
      'xpath_defn'));
Note: When a table function returns an infinite result, it needs to be wrapped with a STREAM(…) marker when invoked. s-Server will return an error otherwise.
The content of the Properties file passed as the third parameter of the function above is written in XPath. XPath is used for selecting parts of an XML document and is defined by the World Wide Web Consortium, W3C. An example of the content of the properties file which could be used for the above code would be:
 NAME.xpath=/testxml/row/name
 TITLE.xpath=@title
 YEARS.xpath=@years
 repeatingElement.xpath=/testxml/row/jobfunctions/function
SET SCHEMA '"test"';
 INSERT INTO "InputStream" ("location", "city", "region", id, xml_data)
  VALUES ('US', 'Oakland', 'California', 12345,
  '<testxml><row><name>joe</name><jobfunctions><function alt="truck driver" years="1"/>
   <function alt="clerk" years="5"/></jobfunctions></row></testxml>');
Set the appropriate trace level in /var/log/sqlstream/Trace.propertiesTrace.properties for:
com.sqlstream.plugin.xmlparse.XmlParse.parseXml.level=<trace_level>
trace_level | Meaning/Usage |
---|---|
WARNING | Error Conditions which disturb successful processing |
INFO | unused |
FINE | Logs the mapping input columns and output columns |
FINER | Logs the mapping between XML attributes and output columns |
FINEST | unused |
The maximum size of the XML document is constrained by the maximum row and column size of 65536.