Writing a C++ UDF

This topic contains information on the following:

SQLstream’s support for C++ UDFs may provide better performance than UDFs written in Java. The C++ framework also allows you to benefit from many pre-existing native libraries - for example in the fields of Mathematics, Statistics, Geographics and more.

Using C++ UDFs, the SQLstream user can define a function that takes a few parameters as input and returns a result. The API provaides an INSTALL_LLVM_UDF macro which provides the glue so that SQLstream can connect it’s generated LLVM with the function. A context variable can also listed as a parameter if the function needs to cache helper variables, allocate memory for returned array types or access calculator state.

Prerequisites

Please ensure that the SQLstream C++ SDK has been installed. See Installing the C++ SDK.

Creating a C++ UDF

The following examples represent the method of creating the C++ component for these UDFs. You can find all this codein the sampleUdfs.cpp file.

All these examples and more are contained in a single C++ file sampleUdfs.cpp, which can be found in the tarball above.

Include / using

At the top of each C++ UDF module, include at least these lines

#include "sqlstream/Udf.h" 
using namespace sqlstream; 
using fennel::SqlState; 

Simple UDF implementing the boolean XOR operator

Now define a function for each UDF. There may be many functions in a single .cpp file.

// simple UDF that takes two booleans and returns a boolean
bool xorOp(bool x, bool y) {
    return x ^ y;
}

INSTALL_LLVM_UDF(xorOp, xorOp) 

NOTE:

  • The INSTALL_UDF macro is used to export the UDF, so that it can be installed through SQL.
  • Parameters used are:
    • <external name to use in sql CREATE FUNCTION>
    • <name of class>.
  • Valid result and parameter types and their SQL equivalents are:
    • int8_t (TINYINT)
    • int16_t (SMALLINT)
    • int32_t (INTEGER)
    • int64_t (BIGINT,DECIMAL or TIMESTAMP)
    • double (FLOAT or DOUBLE)
    • bool (BOOLEAN),char_t (CHAR)
    • varchar_t (VARCHAR)
    • binary_t (BINARY)
    • varbinary_t(VARBINARY)
    • Any parameter type can be wrapped with Nullable for functions that need explicit NULL handling.
  • For CHAR, VARCHAR, BINARY and VARBINARY, SQLstream supports “reference”, which returns a value without copying it. While using refrence, ensure that no transient data is passed as an input. It should only be used for persistant data. Alternatively, data can be maintained within field of a class using “=” operator. TODO move this somewhere else

Templated UDF that supports multiple types

template<typename TYPE>
// uses gcc builtin to return true on overflow
bool doAdd(TYPE x, TYPE y, TYPE &result) {
    if constexpr(std::is_integral<TYPE>::value) {
        return __builtin_add_overflow(x, y, &result);
    } else {
        // otherwise should be DOUBLE
        result = x + y;
        return true; // floating point overflows will be detected by the calling LLVM 
    }
}

template<typename TYPE>
TYPE addOp(LlvmCalculatorContext *context, TYPE x, TYPE y) {
    TYPE result;
    if (doAdd(x, y, result)) {
        // log to trace stream
        context->logInfo("this is an info message zzzzzz");
        context->logWarning("this is a warning message zzzzzz");
        context->logSevere("this is a severe message zzzzzz");
        // throw an exception
        throw SqlState::instance().code22003();
    } else {
        return result;
    }
};
// Install once for each type supported
INSTALL_LLVM_UDF(addLongs, addOp<int64_t>)
INSTALL_LLVM_UDF(addInts, addOp<int32_t>)
INSTALL_LLVM_UDF(addShorts, addOp<int16_t>)
INSTALL_LLVM_UDF(addTinys, addOp<int8_t>)
INSTALL_LLVM_UDF(addDoubles, addOp<double>)

NOTES:

  • In this example we have used INSTALL_UDF 5 times to export type-specific versions of the UDF.
  • If there is an overflow, you will see log messages in the /var/log/sqlstream/Trace.log.0 file or ALL_TRACE stream.

Returning NULL explicitly

Nullable<int64_t> negateLongOrNullOp(int64_t x) {
    if (x == 0) {
        return sqlNull; // this is how you explicitly return NULL - result will always be NULL if any arguments are NULL
    } else {
        return -x;
    }
}
INSTALL_LLVM_UDF(negateLong, negateLongOrNullOp)

Handling NULLs explicitly

Nullable<int64_t> negateLongOrNullOp(Nullable<int64_t> x) {
    if (x == 0) {
        return sqlNull; // this is how you explicitly return NULL
    } else if (x.isNull()) // this is how you explicitly test for NULL
        return 0;
    } else {
        return -x;
    }
}
INSTALL_LLVM_UDF(negateLong, negateLongOrNullOp)

Concatenate two strings and return VARCHAR

varchar_t concatOp(VarDatumContext *context, varchar_t x, varchar_t y) {
    auto newSize = x.size +y.size;
    varchar_t result = context->getStorage<varchar_t>(newSize);
    memcpy(result.data, x.data, x.size);
    memcpy(result.data + x.size, y.data, y.size);
    return result; // assumes return type is declared large enough.
}
INSTALL_LLVM_UDF(concat, concatOp)

Compiling the C++ UDF shared object library

To build the example sampleUdfs.cpp use the script supplied in the tarball:

./build.sh

This will create build/plugin/libsampleUdfs.so

Installing a C++ UDF

After creating a library plugin, the corresponding SQL function would be created as follows. These examples are from the install.sql script in the SDK tarball, and they assume that your sampleUdfs.so object library has been copied (deployed) to the plugin folder under $SQLSTREAM_HOME. You may choose any convenient location either relative to s-Server’s working directory (usually $SQLSTREAM_HOME) or use an absolute path. Then invoke sqllineClient –run=install.sql (or whatever your install script is called).

Install XorOp

CREATE OR REPLACE SCHEMA sampleUdf; 
SET SCHEMA 'sampleUdf'; 
SET PATH 'sampleUdf'; 

create or replace function xor(i boolean, j boolean) 
returns boolean 
language c 
parameter style general 
no sql 
external name 'plugin/libsampleUdfs.so:xorOp'; 

The SQL function name xor is mapped to the xorOp C++ functor.

Install AddOp

create or replace function addLongs(i bigint, j bigint) 
returns bigint 
language c 
parameter style general 
no sql 
external name 'plugin/libsampleUdfs.so:addLongs'; 

Calling a C++ UDF

Call each function SQL code like this:

SELECT STREAM a, b, add(A,B) as "Total" FROM myStream S; 

Input

A B
21 45
12 73

Output

A B Total
21 45 66
12 73 85