You can use DataRobot’s analytic models to build a UDX that you can call in s-Server. The UDX takes stream columns as input and outputs the same columns plus a predictive column, such as “isspeeding” or “isagoodbuy”.
In order to do so, you will need to:
Notes:
You will need to do two things with this JAR:
You then write a UDX that incorporates the downloaded JAR. To pass columns to the UDX, you use cursor parameters: a UDX cursor parameter corresponds to a Java argument of type java.sql.ResultSet, and a streaming UDX cursor parameter corresponds to a Java argument of type com.sqlstream.jdbc.StreamingResultSet.
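As a rough illustration of how a cursor parameter carries the stream columns, the sketch below reads rows from a java.sql.ResultSet, copies every input column, and appends one extra score column. The RowScorer interface and the Consumer<Object[]> sink are hypothetical stand-ins, not s-Server API: in a real UDX the scorer would wrap the DataRobot model and the sink would write to the UDX's output. The sketch also assumes that a com.sqlstream.jdbc.StreamingResultSet can be handled through the java.sql.ResultSet interface; on a streaming cursor the loop would presumably keep running for as long as the stream produces rows.

```java
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.function.Consumer;

public class ScorePassThrough {

    /** Hypothetical scoring hook: computes a score for the cursor's current row. */
    public interface RowScorer {
        double score(ResultSet currentRow) throws SQLException;
    }

    /**
     * Copies every column of each input row and appends the score as an extra
     * last column. Returns the number of rows emitted.
     */
    public static int scoreAll(ResultSet in, RowScorer scorer, Consumer<Object[]> sink)
            throws SQLException {
        ResultSetMetaData md = in.getMetaData();
        int columnCount = md.getColumnCount();
        int emitted = 0;
        while (in.next()) {
            Object[] out = new Object[columnCount + 1];
            for (int c = 1; c <= columnCount; c++) {
                out[c - 1] = in.getObject(c); // JDBC column indexes start at 1
            }
            out[columnCount] = scorer.score(in); // the prediction goes last
            sink.accept(out);
            emitted++;
        }
        return emitted;
    }
}
```

Copying columns by position through ResultSetMetaData keeps the sketch independent of the stream's schema; a real UDX would more likely read only the columns its model needs, by name.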
To write this UDX, you follow the same steps as in Writing a Java UDX.
When you compile, the classpath must include the DataRobot JAR along with aspen.jar and the s-Server JDBC driver (SqlStreamJdbc_Complete.jar). You then package the compiled class as SimpleUdx.jar:
$ javac -cp $SQLSTREAM_HOME/lib/SqlStreamJdbc_Complete.jar:$SQLSTREAM_HOME/lib/aspen.jar:$SQLSTREAM_HOME/lib/<mydatarobot>.jar SimpleUdx.java
$ jar cvf SimpleUdx.jar /path/to/SimpleUdx.class
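In the DataRobot JAR, find the long ID that follows com.datarobot.prediction in the model's package name; the model class is com.datarobot.prediction.<id>.DRModel. You can find it by listing the JAR's contents (for example, with jar tf) or by inspecting the JAR with Java reflection. In your UDX, adapt the following sample code, replacing the ID in modelName with your own: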
String modelName = "com.datarobot.prediction.dr595665fdc808913c829601be.DRModel";
Predictor model =
(Predictor) Class.forName(modelName).getDeclaredConstructor().newInstance();
ArrayList<String> doubleKeys =
new ArrayList<>(Arrays.asList(model.get_double_predictors()));
ArrayList<String> stringKeys =
new ArrayList<>(Arrays.asList(model.get_string_predictors()));
// For debugging only: print the column names that the model accepts.
System.out.println(Arrays.toString(doubleKeys.toArray()));
System.out.println(Arrays.toString(stringKeys.toArray()));
Row r = new Row();
// The Row object stores String and double values separately.
r.d = new double[doubleKeys.size()];
r.s = new String[stringKeys.size()];
r.s[stringKeys.indexOf("DATE")] = "2013-11-07T06:20:48";
r.s[stringKeys.indexOf("COMMENT_ID")] = "LZQPQhLyRh80UYxNuaDWhIGQYNQ96IuCg-AYWqNPjpU";
r.s[stringKeys.indexOf("AUTHOR")] = "Julius NM";
r.s[stringKeys.indexOf("CONTENT")] = "Huh, anyway check out this you[tube] channel: testi02";
// Example of a double parameter (it is not used for this predictor).
//r.d[doubleKeys.indexOf("NUMBER")] = 123;
double score = model.score(r);
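If you would rather not inspect the JAR by hand, the model class name can also be recovered by listing the JAR's entries. The sketch below is a hypothetical helper, not part of DataRobot or s-Server: it uses java.util.jar.JarFile to look for an entry of the form com/datarobot/prediction/<id>/DRModel.class and turns it into the dotted name that modelName expects. It assumes the JAR contains exactly one such class and returns the first match otherwise.

```java
import java.io.IOException;
import java.util.Enumeration;
import java.util.Optional;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class FindDataRobotModel {
    private static final String PREFIX = "com/datarobot/prediction/";
    private static final String SUFFIX = "/DRModel.class";

    /** Maps an entry such as com/datarobot/prediction/<id>/DRModel.class to its class name. */
    public static Optional<String> modelClassFromEntry(String entryName) {
        if (entryName.startsWith(PREFIX) && entryName.endsWith(SUFFIX)
                && entryName.length() > PREFIX.length() + SUFFIX.length()) {
            // The model ID is the single path segment between the prefix and the suffix.
            String id = entryName.substring(PREFIX.length(), entryName.length() - SUFFIX.length());
            if (!id.contains("/")) {
                return Optional.of("com.datarobot.prediction." + id + ".DRModel");
            }
        }
        return Optional.empty();
    }

    /** Returns the first DRModel class name found in the JAR, if any. */
    public static Optional<String> findModelClass(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                Optional<String> match = modelClassFromEntry(entries.nextElement().getName());
                if (match.isPresent()) {
                    return match;
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java FindDataRobotModel /path/to/<mydatarobot>.jar
        System.out.println(findModelClass(args[0]).orElse("no DRModel class found"));
    }
}
```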
In the code above, replace the following hard-coded sample values with calls to the UDX result set, so that they populate Row r:
r.s[stringKeys.indexOf("DATE")] = "2013-11-07T06:20:48";
r.s[stringKeys.indexOf("COMMENT_ID")] = "LZQPQhLyRh80UYxNuaDWhIGQYNQ96IuCg-AYWqNPjpU";
r.s[stringKeys.indexOf("AUTHOR")] = "Julius NM";
r.s[stringKeys.indexOf("CONTENT")] = "Huh, anyway check out this you[tube] channel: testi02";
Use the result set's getter methods (for example, getString for string predictors and getDouble for double predictors) to read each row's values. The UDX output is then the same set of columns, plus a score column that holds the DataRobot prediction.
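The replacement can be sketched as follows, assuming each predictor name returned by get_string_predictors() and get_double_predictors() matches a stream column name exactly (as DATE, COMMENT_ID, AUTHOR, and CONTENT would in the sample). SketchRow is a hypothetical stand-in that mirrors the d and s fields of the DataRobot Row used above; in a real UDX you would fill the DataRobot Row instead.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.List;

public class RowFromResultSet {

    /** Hypothetical stand-in for the DataRobot Row: doubles and strings are stored separately. */
    public static class SketchRow {
        public double[] d;
        public String[] s;
    }

    /**
     * Fills a row from the cursor's current position. For lists without duplicate
     * names, position i in each array is the same index that
     * stringKeys.indexOf(name) or doubleKeys.indexOf(name) would return.
     */
    public static SketchRow fromCurrentRow(ResultSet rs, List<String> doubleKeys,
                                           List<String> stringKeys) throws SQLException {
        SketchRow r = new SketchRow();
        r.d = new double[doubleKeys.size()];
        r.s = new String[stringKeys.size()];
        for (int i = 0; i < stringKeys.size(); i++) {
            r.s[i] = rs.getString(stringKeys.get(i));
        }
        for (int i = 0; i < doubleKeys.size(); i++) {
            // getDouble returns 0 for SQL NULL; check rs.wasNull() if the model
            // needs to distinguish missing values.
            r.d[i] = rs.getDouble(doubleKeys.get(i));
        }
        return r;
    }
}
```

Inside the UDX's read loop you would call a helper like this once per input row, after next() returns true, and pass the filled row to model.score.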