Monitoring s-Server and Components

You can use a number of techniques to monitor s-Server and its components. This topic provides an overview of those techniques.

You may also find it useful to incorporate generic system monitoring tools, including:

  • SNMP for verifying system presence and monitoring processing load, disk and memory usage.
  • JMX for remote monitoring of suitably enabled Java VMs, such as for monitoring Kafka brokers.

For information on these tools, please see your operating system’s documentation. To ensure that your system remains operational, you should monitor all servers in some way (see Monitoring the Monitors below for information on managing monitoring systems themselves).

Monitoring SQLstream Products

Services / Processes

SQLstream service PIDs are recorded in pid files under /var/run/sqlstream. For each service to be monitored, verify that:

  • The pid file exists (if not, the service has been cleanly shut down)
  • The process corresponding to the pid is running (if not, the service has died unexpectedly)

s-Server itself has two pid files: s-serverd.pid for the core and jvm.pid for the Java VM. Here is a simple bash fragment to check process status, and report UP, BROKEN or DOWN for each:

for s in $SERVICES
do
  pid=0
  message=

  pidfile=/var/run/sqlstream/$s.pid

  if [ "$s" = "snmpd" ]
  then
    # this is not owned by sqlstream, so use simpler method
    sudo service snmpd status > /dev/null
    if [ $? -eq 0 ]
    then
      status=UP
      message="service running normally"
    else
      status=DOWN
      message="service not running"
    fi
  elif [ -e $pidfile ]
  then
    pid=`cat $pidfile`
    pgrep --pidfile $pidfile > /dev/null
    if [ $? -ne 0 ]
    then
      # there is no matching process
      status=BROKEN
      message="process not running"

      # should we restart the service?
      # restart $s
    else
      # all good - process running
      status=UP
      message="service running normally"
    fi

    if [ -z "$pid" ]
    then
      pid=0
    fi
  else


    status=DOWN
    message="service not running"
  fi

  # send the service status in all cases, as we are using it to drive alerts as well as status monitor
  echo "$server,$s,$status,$pid,$message" >> $serviceStatusFile
  statustext="$statustext - $s:$status"
done
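The status file written by this fragment can also drive alerting directly. Here is a minimal sketch (the alertFromStatusFile helper name is an assumption; the CSV layout matches the echo line in the fragment above):

```shell
# Sketch: scan a service-status CSV (server,service,status,pid,message)
# as written by the fragment above, and print one alert line for each
# service that is not UP.
alertFromStatusFile () {
  awk -F, '$3 != "UP" { printf "ALERT: %s on %s is %s (%s)\n", $2, $1, $3, $5 }' "$1"
}
```

This could run from cron after each monitoring pass, with its output mailed or forwarded to an alerting system.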

Process Memory

We recommend using your operating system’s utilities and/or JDK tools such as jstat to monitor heap size for the various SQLstream processes.
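As a sketch of such a heap check (an illustration, not a supported script): parameter 1 is the name of the service (such as jvm or webagentd), the PID is returned in jspid, and the current and committed heap sizes (in KB) in jstats. It assumes the JDK's jstat tool is on the PATH, pid files under /var/run/sqlstream, and the classic jstat -gc column layout.

```shell
# parseHeap: read `jstat -gc` output on stdin and print "<used> <committed>" in KB.
# Assumes the classic column order: S0C S1C S0U S1U EC EU OC OU ...
# (committed capacity is a lower bound on the configured maximum; for the
# configured maximum itself, see `jstat -gccapacity`).
parseHeap () {
  awk 'NR==2 { printf "%.0f %.0f\n", $3+$4+$6+$8, $1+$2+$5+$7 }'
}

# checkHeap: parameter 1 is the service name; sets jspid and jstats.
checkHeap () {
  local s=$1
  jspid=$(cat "/var/run/sqlstream/$s.pid" 2>/dev/null)
  jstats=
  if [ -n "$jspid" ]
  then
    jstats=$(jstat -gc "$jspid" 2>/dev/null | parseHeap)
  fi
}
```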

The heap size required will vary substantially depending on the nature of the processing pipelines installed. The important thing is to establish a normal pattern, then identify exceptional conditions when they occur.

Log Files

The default locations for trace/log files are:

Service     Log Directory                                          Trace Properties File    Log Filename
s-serverd   /var/log/sqlstream (a link to $SQLSTREAM_HOME/trace)   Trace.properties         Trace.log.[0-99]
webagentd   $SQLSTREAM_HOME/../clienttools/WebAgent                trace.properties         trace.log.[0-99]

Monitoring WebAgent

If you are using WebAgent, you can monitor it using its /status API, which is served on port 5580 of whatever host is running WebAgent. If you launch WebAgent manually (that is, if you are not running it as a service), you will need to enable the -a option when you launch WebAgent in order to make the API available. If there is no response at all, the agent may be down, or the agent may be up but currently unable to connect to s-Server. Here is a very simple bash function to check the WebAgent status (returned as waStatus):

function checkwebAgentStatus () {
WAPORT=5580

mkdir -p /tmp/sqlstream
rm -f /tmp/sqlstream/wastatus.json /tmp/sqlstream/wastatus.log 2>/dev/null

if [ -f /var/run/sqlstream/webagentd.pid ]
then
  # apparently running
  # TODO - check service is not broken
  # wget localhost:$WAPORT/status -O /tmp/sqlstream/wastatus.json -o /tmp/sqlstream/wastatus.log 2>&1 >/dev/null
  # seems curl is more likely to be available
  curl --silent -o /tmp/sqlstream/wastatus.json localhost:$WAPORT/status >/dev/null 2>&1
  if [ $? -ne 0 ]
  then
    # there was some problem with the fetch
    waStatus="UNREACHABLE"
  else
    # we got a status back
    grep "\"message\":\"OK\"" /tmp/sqlstream/wastatus.json >/dev/null 2>&1
    if [ $? -eq 0 ]
    then
        # status was apparently OK
        waStatus="UP"
    else
      waStatus="BROKEN"
      # save a timestamped copy of the file ($tstamp is assumed to be set by the caller)
      mv /tmp/sqlstream/wastatus.json /tmp/status/wastatus.$tstamp
    fi
  fi
else
  waStatus="DOWN"
fi
}

Self Monitoring of SQLstream s-Server

When the SQLstream server and its supporting components such as WebAgent are to be deployed as part of a production system, it is important to understand which self-monitoring capabilities can be integrated into a monitoring and scaling framework. The following mechanisms can be used to detect an unhealthy server, or to recognize when scaling may be needed.

WebAgent Monitoring

WebAgent (usually deployed with a base URL of http://<host>:5580) has a number of useful entry points. See Using Telemetry to Monitor Performance for more information about the telemetry URLs, and Websocket Requests for WebAgent status.

API Description
/status Returns a JSON document, which proves that WebAgent is running but doesn’t indicate whether s-Server is up or down.
/telemetry/graph?format=json&nowait=true Confirms that both WebAgent and s-Server are up. The nowait parameter ensures that any problem is revealed rapidly; if s-Server is down, an error status is returned.
/telemetry/graph?format=json Returns detailed telemetry data, which confirms that the scheduler is running and can also be used to tell which pumps are currently running, blocked, and so on. The latest rowtimes recorded against stream operators can be used to identify lags; if lag is increasing beyond acceptable thresholds, it may be time to scale up the deployment. Conversely, if lags are below threshold and CPU usage is acceptable, it may be possible to scale down.
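The nowait telemetry probe can be scripted in the same style as the other checks on this page. A minimal sketch (the checkTelemetry name is an assumption; 5580 is the default WebAgent port):

```shell
# Sketch: probe WebAgent's telemetry API to confirm that both WebAgent
# and s-Server are up. With --fail, curl exits non-zero on HTTP errors,
# so a DOWN result covers both an unreachable WebAgent and an s-Server
# that returned an error status.
checkTelemetry () {
  local host=${1:-localhost} port=${2:-5580}
  if curl --silent --fail --max-time 5 \
      "http://$host:$port/telemetry/graph?format=json&nowait=true" >/dev/null 2>&1
  then
    echo "UP"
  else
    echo "DOWN"
  fi
}
```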

Server-side command-line monitoring

  • The $SQLSTREAM_HOME/bin/serverReady script can be used to verify that s-Server is up and accepting connections. To do that remotely, call it from a wrapper script placed in a well-known directory (such as /etc/sqlstream) that sets the SQLSTREAM_HOME environment variable by calling /etc/sqlstream/environment.
  • As long as your container includes the required wrapper script, this can be called remotely using ssh, docker exec, or other remote call mechanisms.
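A minimal sketch of such a wrapper script (the script name and location are assumptions; /etc/sqlstream/environment is assumed to set and export SQLSTREAM_HOME):

```shell
#!/bin/bash
# /etc/sqlstream/checkServerReady - hypothetical wrapper script.
# Sources the environment file (assumed to export SQLSTREAM_HOME),
# then runs serverReady; its exit status indicates whether s-Server
# is up and accepting connections.
. /etc/sqlstream/environment
exec "$SQLSTREAM_HOME/bin/serverReady"
```

It can then be invoked remotely, for example via ssh <host> /etc/sqlstream/checkServerReady or docker exec <container> /etc/sqlstream/checkServerReady.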

Open Ports

See Configuring Ports for Guavus SQLstream for a list of (default) ports used by Guavus SQLstream.

Note: Some of these port allocations can be modified either during the installation process, or when starting services / processes. For more information, see the documentation or contact SQLstream Support.

Application-Specific Monitoring

Checking that data is being processed

A very simple way of telling that data is being processed is to arrange for some sort of summary data to be delivered to a file every minute (or at some other suitable frequency). As well as giving application-level statistics (such as counting input rates) this provides immediate assurance that the end-to-end process is running.

  • Use the ECDA adapter for writing to a file (see Writing to the File System). Rotate the file at least every day (FILE_ROTATION_TIME).
  • Periodically monitor the directory for the latest file (as defined by ORIGINAL_FILENAME), and raise an exception if it is more than 2 aggregation periods old.
  • Arrange to purge files older than a few days from the output directory.

Of course, in a complex system there may be many inputs and outputs; each of these could be monitored by tapping off the processing pipeline in this way.
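The freshness check described above can be sketched as follows (the checkLatestFile name and thresholds are assumptions; this uses GNU stat, so it is Linux-specific):

```shell
# Sketch: verify that the newest file in an output directory is no older
# than the given number of seconds (e.g. 2 aggregation periods).
checkLatestFile () {
  local dir=$1 maxAgeSecs=$2
  local latest
  latest=$(ls -t "$dir" 2>/dev/null | head -1)
  if [ -z "$latest" ]
  then
    echo "NO DATA"
    return 1
  fi
  local age=$(( $(date +%s) - $(stat -c %Y "$dir/$latest") ))
  if [ "$age" -gt "$maxAgeSecs" ]
  then
    echo "STALE ($age seconds old)"
    return 1
  fi
  echo "OK"
}
```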

Using SQLstream as a Monitoring Solution

It is of course possible to build a monitoring system using SQLstream itself. Data flows can be aggregated and analysed; changes to data flow patterns can be detected and alerted on. Long term storage of the data into a database is simple.

Monitoring Options

File-based monitoring on the individual SQLstream servers may be inconvenient, unless there is already a standardised monitoring agent present on the application nodes (for example, Logstash as a companion to Kibana). As an alternative, consider pushing the periodic messages directly to a central monitoring service (or using agents there to receive or pull them). Wherever possible, the system being monitored should be unaware of the monitoring framework, and the monitoring framework should be able to scale easily from monitoring a single application node to monitoring one or more clusters.

Push vs Pull

One of the first decisions to be taken is whether the application should push monitoring data to the monitor, or the monitor should poll data from the application.

Push Mechanisms Available

  • Network sockets: the lowest common denominator, very simple to implement. There is no protection against loss of data, and the flow is point-to-point (so there is no failover).
  • Kafka: provides a “lingua franca” transport mechanism that includes data store and forward, replay, and replication. The SQLstream application service doesn’t need to be aware of where the monitor is running - it only needs to have a list of (at least some) Kafka brokers.
  • Other MQs: as for Kafka.

A big advantage of the push approach is that the monitor doesn’t have to worry about which application nodes to monitor; it can passively accept data from any running system (as long as messages are all identified by their originating server).

Pull Mechanisms Available

  • SQLstream JDBC (SELECT STREAM … FROM …)
  • SQLstream ECDA Agents
  • SQLstream Federation

The monitor can pull data (without polling) by setting up streaming queries at the monitored application nodes. These can pull data into a SQLstream processing pipeline for further monitoring, or ECDA agents can hand off the data into a local flat-file on the monitor server, assuming that will be loaded/processed by the monitoring software.

The drawback with pull-based approaches is that the monitor needs to set up queries to all application nodes (SQLstream s-servers). This can substantially increase the complexity of the monitor. The monitor of course also needs to know exactly which application nodes are expected to be running at any time.

If the SQLstream monitor needs to access the SQLstream application service directly, they both need to use the same version of the SQLstream JDBC driver. If you want to be able to upgrade the monitor independently of the application, you should consciously break this dependency; instead use message transport mechanisms that are not SQLstream version dependent, such as Kafka or network sockets.

Conclusion

Generally, a push-based approach is likely to be the easiest to build and deploy. If Kafka is in wide use, consider benefiting from its store-and-forward and replay capabilities.

Location of monitor analysis processing

Analytical processing can take place either in the application node or at the central monitor. The choice will depend on many factors:

  • Analytical capability of the monitor compared to the SQLstream application.

  • Processing loads at application and monitor servers.

  • Preference for encapsulation of application-specific functionality (the monitor does not, and should not, need detailed knowledge of the application).

  • Can the necessary metrics be calculated at the source? Of course some application-specific final stage of aggregation is always required at the monitor - but where possible make this as application-neutral as possible.

  • For example: use a single stream for counting transaction volumes for all transactions across all application nodes. Then new transactions can be added to the application without modifying the monitor.

  • This also allows higher level aggregations (eg count of total volume, as well as volume by transaction).

Monitoring the Monitors

There is always a risk that monitoring systems and subsystems can themselves fail. To mitigate that, we recommend that you always include some basic stand-alone monitoring or alerting on every application and monitor node. This may be as simple as some bash scripts and email alerters, or it could use common network monitoring tools such as Nagios. These tools should be completely independent of the main application components (such as Kafka and SQLstream).

s-Server Features for Managing and Monitoring

This section describes system views and system procedures useful to a system administrator for managing and monitoring a running SQLstream system. All system views and system procedures described here are located in the schema SYS_BOOT.MGMT. The code for the system procedures is in the Java package net.sf.farrago.syslib. The SQL command ALTER SYSTEM can also modify various system parameters. For more details, see the topic ALTER SYSTEM in the s-Server Streaming SQL Reference Guide.

Monitoring Views

View Description
OBJECTS_IN_USE_VIEW Displays which objects are in use by which statements.
PUMP_OPTIONS Lists values of each session variable associated with a pump (if any)
SESSION_PARAMETERS_VIEW Displays private parameter settings for the current session
SESSIONS_VIEW Displays all active sessions. Each row describes a session, giving a unique session id, and such attributes as the jdbc connection URL, the user, the transaction status, etc.
STATEMENTS_VIEW Displays all executing statements. Each statement has a row, which gives a unique statement id, and such attributes as the containing session (as its session id, from the sessions_view), the start time, the SQL being executed, the current value of any dynamic parameters, etc.

Additional Metadata Views

View Description
HISTOGRAM_BARS_VIEW Displays statistics about tables, columns, and indexes
HISTOGRAMS_VIEW Displays statistics about tables, columns, and indexes
PAGE_COUNTS_VIEW Displays statistics about tables, columns, and indexes
REPOSITORY_PROPERTIES_VIEW Displays properties of the MDR repository used to store Farrago’s catalog.
ROW_COUNTS_VIEW Displays statistics about tables, columns, and indexes
SEQUENCES_VIEW Displays the state of generated columns

Scripts

Some of these views are used in scripts, such as the support scripts (with extension .sql) found in the $SQLSTREAM_HOME/support/sql directory, which you can use to query your database schemas and configurations. Other example scripts appear in the demo subdirectories under $SQLSTREAM_HOME/demo.

In SQLline you can use the !run command to execute any such script, using the following template:

!run <support script name>.sql

For more details on using SQLline, see the topic SQLline in the s-Server Getting Started Guide. Most script names describe what they do, and some scripts use the views named above. Here are three examples:

!run showSessions.sql

This command uses the SESSIONS_VIEW in running the following command:

select id, system_user_name, session_user_name, url from sys_boot.mgmt.sessions_view;

!run showStatements.sql

This command uses the STATEMENTS_VIEW in running the following command:

SELECT * from sys_boot.mgmt.statements_view;

!run showSessionParams.sql

This command uses the SESSION_PARAMETERS_VIEW in running the following command:

SELECT * from table(sys_boot.mgmt.session_parameters());

The $SQLSTREAM_HOME/support/sql directory includes the following support scripts:

Name Name Name
showFennelConfig.sql showFarragoConfig.sql showForeignServers.sql
showForeignStreams.sql showProcedureColumns.sql showProcedures.sql
showPumps.sql showSchemas.sql showSessionParams.sql
showSessions.sql showStatements.sql showSystemInfo.sql
showTableColumns.sql showTableColumnsAll.sql showUsers.sql

Management Table Functions

These are all functions in the SYS_BOOT.MGMT schema; like other SYS_BOOT.MGMT objects they can be referenced without specifying the catalog and schema.

UDX Description
REPOSITORY_INTEGRITY_VIOLATIONS() Checks for integrity violations in the MDR repository.
THREADS() Lists threads running in the JVM.
THREAD_STACK_ENTRIES() Dumps the stack for each thread
SYSTEM_INFO() Retrieves information about the environment in which SQLstream is running (e.g., environment variables, system properties, and OS and hardware information).
PERFORMANCE_COUNTERS() Monitors performance counters such as cache hit ratios.

Example usage:

   SELECT * FROM TABLE(thread_stack_entries()) WHERE thread_id = 18;

Utility functions

Function Description
SLEEP Can be used to introduce an artificial delay into a SQL statement.

Management Procedures

You can call these procedures using code along the following lines:

CALL kill_statement(39);

Procedure Description
FLUSH_CODE_CACHE Discards all entries from the global code cache.
STAT_SET_ROW_COUNT Synthesizes an artificial row count for a table
STAT_SET_PAGE_COUNT Synthesizes an artificial page count for an index
STAT_SET_COLUMN_HISTOGRAM Synthesizes an artificial histogram for a column
KILL_STATEMENT(stmt_id) An administrator can kill an executing statement, specified by id (as seen in STATEMENTS_VIEW). Killing a statement aborts its execution and frees all resources allocated to the statement. A corresponding JDBC Statement will be invalidated, and any JDBC ResultSet will be terminated (with an SQLException to indicate that the statement was killed from outside).
KILL_STATEMENT_MATCH('a sql fragment') Terminates all statements whose sql commands contain the provided text.
KILL_SESSION(session_id) An administrator can kill an active session by calling this procedure. The session is specified by the id that appears in the sessions_view. Killing a session kills all statements executing in the session, aborts any transactions, and frees all allocated resources. A corresponding jdbc Connection is invalidated: any further use gets an SQLException. (However, if the optional boolean cancel_only parameter is specified as true for any of the kill calls, then the effect is just to request an asynchronous cancel and return immediately; no wait or resource release is performed, and the session or statement remains valid.)

Getting a Stack Dump

For troubleshooting purposes, you may at times need to get a stack dump for s-Server. To get a stack dump, we recommend using the following command:

jstack <process id>

This is more useful if you redirect the stack dump to a file, as in the following:

jstack 12345 > stackdump.txt