StreamLab Pipeline Guides Overview

Pipeline guides are collections of commands, suggestions, and scripts that let you create views (SQL CREATE VIEW statements) on data sources. These views are composed of SQL, which you generate through the Pipeline Guide interface. You use pipeline guides to prepare sources for dashboards, and you can also use them to run analytics on streaming sources.

For example, you might have a guide that extracts the year (often the first four characters) from a timestamp column and adds it to a new column called YEAR. Other guides might split a column, rename it, merge two columns, and so on. You might also have a guide that adds a running average column.
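The YEAR example can be sketched in plain Python to show what such a guide step does to each row. This is a minimal illustration, not StreamLab's generated SQL; it assumes rows arrive as dictionaries and that the timestamp is a string beginning with a four-digit year (as in "2013/05/28"). The column name `ts` is hypothetical.

```python
# Hypothetical rows; in StreamLab these would come from the guide's source.
rows = [
    {"ts": "2013/05/28 09:30:00", "close": 881.27},
    {"ts": "2014/01/02 16:00:00", "close": 1100.0},
]

def add_year_column(rows, source_col="ts", new_col="YEAR"):
    """Return copies of rows with a new column holding the first four characters of source_col."""
    out = []
    for row in rows:
        new_row = dict(row)                      # leave the input row untouched
        new_row[new_col] = row[source_col][:4]   # assumes the year leads the string
        out.append(new_row)
    return out

result = add_year_column(rows)
```

Copying each row mirrors how a view leaves its source unchanged and only adds a derived column on top of it.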


Commands

Commands are sets of operations that you can perform on the pipeline guide's data source. These include, for example, commands to parse the source as a W3C log, parse a timestamp, split a column at a given character, remove a column, and rename a column. Commands are grouped by functionality, and you can switch between command sets by clicking the Select Command Set button in the top-left corner of the pipeline guide.

<img src="/images/sl/sl_guide_commands.png"/> 

Suggestions

Suggestions appear in the Suggestions list in the middle left of the Guide interface. Suggestions change depending on your data source and your selection in the Output view.

<img src="/images/sl/sl_guide_suggestions.png" width="60%"/> 

Scripts

As you add suggestions to the script, these suggestions are implemented as SQL, with changes visible in the Output view. You can remove items from the script by clicking the minus (-) button. You can also visualize changes by clicking the View Dashboard button <img src="/images/sl/sl_guide_view_dashboard_button.png"/>.

<img src="/images/sl/sl_guide_script.png" width="60%"/> 
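One way to picture a script is as an ordered list of steps, each applied to the output of the previous one, where removing an item simply re-runs the remaining steps. The sketch below is a hypothetical model of that behavior in Python; the step names `rename` and `remove` and the `run_script` helper are illustrative only, since StreamLab itself implements script items as SQL.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Step = Callable[[Row], Row]  # one script item: takes a row, returns a new row

def rename(old: str, new: str) -> Step:
    def step(row: Row) -> Row:
        out = dict(row)
        out[new] = out.pop(old)
        return out
    return step

def remove(col: str) -> Step:
    def step(row: Row) -> Row:
        out = dict(row)
        out.pop(col, None)
        return out
    return step

def run_script(script: List[Step], row: Row) -> Row:
    """Apply each script item in order; the result is what the Output view would reflect."""
    for step in script:
        row = step(row)
    return row

source_row = {"MESSAGE_1": "2013/05/28", "MESSAGE_2": "881.2700"}
script = [rename("MESSAGE_1", "date"), remove("MESSAGE_2")]
full = run_script(script, source_row)

# Clicking "-" on the second item is modeled as deleting it and re-running the script.
del script[1]
partial = run_script(script, source_row)
```

Because every step works on a copy, the source row is never modified, so removing a step always yields a result consistent with the remaining steps.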

Adding a Pipeline Guide

Guides are collections of scripts that let you manipulate SQL objects in StreamLab.

To add a pipeline guide:

  1. Drag a pipeline guide from the left-hand column into the middle column.

  2. Click the pipeline guide to edit its properties. The Edit Guide page opens.

  3. Select a source or sink for the pipeline guide's input on the left side of the page. Select an output schema and prefix for the output view on the right side of the page. By default, StreamLab uses the project schema for the pipeline guide. The source or sink must already exist to be available here.

Opening a Pipeline Guide

To open a pipeline guide, click it. The Pipeline Guide page opens.

Here, you:

  • Select commands.
  • Implement the suggestions that these commands generate; implemented suggestions are added to the Script.
  • View the results of the Script in the Output view.

Using Pipeline Guides

Once you create a pipeline guide and select a source for it, you can begin adding steps to its script. To do so:

  • Choose a command category.
  • Choose a command from the command tabs and enter criteria for the command. The command appears as a suggestion.
  • Click the + icon to add the command to the Pipeline Guide script.
  • The results of the script appear in the Output View window.

Sample Pipeline Guide Process

Suppose you start with a simple log file containing a stock ticker feed, where each line holds date, close, volume, open, high, and low values:

"2013/05/28","881.2700","2257410.0000","883.5000","892.1400","880.4000"

The log file initially appears in the Output view with two columns: ROWTIME, and a column called MESSAGE that contains all the values separated by commas:

<img src="/images/sl/sl_sample_process_initial_output_view.png"/> 

Your first step here is to separate out the values.
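Outside StreamLab, the same CSV split can be illustrated with Python's standard `csv` module, which handles the double quotes around each value. This is only an illustration of what "split using Comma-Separated Values (CSV)" means for the sample line; the MESSAGE_1 through MESSAGE_6 names match those used in the steps that follow.

```python
import csv

line = '"2013/05/28","881.2700","2257410.0000","883.5000","892.1400","880.4000"'

# csv.reader splits on commas and strips the surrounding double quotes.
fields = next(csv.reader([line]))

# Name the pieces MESSAGE_1 ... MESSAGE_6, as the Split command does.
row = {f"MESSAGE_{i}": value for i, value in enumerate(fields, start=1)}
```

Note that every field is still a string at this point; casting to a numeric type is a separate step.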

  1. Begin by selecting the MESSAGE column. When you do, StreamLab offers several suggestions in the Suggestions list.

    At this point, you can simply click the + to the right of the first suggestion, "Split column MESSAGE using the automatic pattern Comma-Separated Values (CSV)".

  2. You can also select the Basic:Split command to view more information on the splitting operation.

  3. Note that the Split command has MESSAGE preselected for the column. That is because you have selected MESSAGE in the Output view. Also note that the options Auto and Comma-Separated Values (CSV) have been preselected. That is because these are the most likely selections for the column.

  4. Click the + sign to the right of the Split...CSV suggestion. The MESSAGE column splits into six separate columns, MESSAGE_1 through MESSAGE_6.

  5. Next, rename the new columns to date, close, volume, open, high, and low. You can do so using the Basic:Rename List command. First, select columns MESSAGE_1 through MESSAGE_6. If it is not already selected, select Basic from the Commands popup menu, then select the Rename List tab.

  6. Next, enter the following into the To List field: "date,close,volume,open,high,low"

  7. Click the + sign to the right of the Rename Columns... suggestion. The columns are renamed.

  8. Next, cast the columns close, volume, and open as type DOUBLE. You can do so using the Basic:Cast command. First, select the close, volume, and open columns. Next, select the Cast command and select DOUBLE as the type. Note that the three columns appear in the Columns field. Click the + sign to the right of the Cast Columns... suggestion. The columns are cast as DOUBLE.

  9. Next, you can create a new column with a running average of close. To do so, select the close column and choose the Analytics:Running Average command.

  10. Click the + sign to the right of the Add column named... suggestion. A new column called Avg appears.
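Steps 5 through 10 can be sketched in Python to make the data flow concrete: rename the split columns, cast three of them to floating point, and append an Avg column. This is a minimal sketch, not StreamLab's implementation. The first row is the sample line from this section; the second row is invented purely for illustration. The running average here is assumed to be a cumulative mean over all rows seen so far; StreamLab's Running Average command may use a window, which this sketch does not model.

```python
NAMES = ["date", "close", "volume", "open", "high", "low"]

# Rows after the Split step (values are still strings).
# The second row is a made-up example, not real ticker data.
split_rows = [
    ["2013/05/28", "881.2700", "2257410.0000", "883.5000", "892.1400", "880.4000"],
    ["2013/05/29", "877.0000", "1900000.0000", "880.0000", "885.0000", "870.0000"],
]

def rename(values, names):
    """Rename List: pair each split value with its new column name."""
    return dict(zip(names, values))

def cast_double(row, cols):
    """Cast: convert the selected columns to Python floats (double precision)."""
    out = dict(row)
    for col in cols:
        out[col] = float(out[col])
    return out

def add_running_average(rows, col, new_col="Avg"):
    """Append a cumulative mean of col (assumed semantics, see lead-in)."""
    total = 0.0
    out = []
    for n, row in enumerate(rows, start=1):
        total += row[col]
        new_row = dict(row)
        new_row[new_col] = total / n
        out.append(new_row)
    return out

typed = [cast_double(rename(v, NAMES), ["close", "volume", "open"]) for v in split_rows]
result = add_running_average(typed, "close")
```

Columns that were not cast (date, high, low) remain strings, just as they would stay VARCHAR in the guide.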

Viewing Column Details and Suggestions

As rows stream into StreamLab, a component called the Scrutinizer continually checks these rows for patterns, offering suggestions for commands that you can apply to the source.

To view details and suggestions for a column, mouse over the column's heading. StreamLab offers notes and suggestions about the column. For example, StreamLab notes that the column below might contain hostnames or IP addresses, longitudes, or bearings. These suggestions can help you identify a column's contents.

<img src="/images/sl/sl_column_info.png" width="60%"/> 
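The Scrutinizer's internal rules are not described in this guide. As a purely hypothetical illustration of the idea, the sketch below checks a column's sample values against two simple patterns (IP address and longitude range) and returns notes. The function names and the checks themselves are assumptions, not StreamLab behavior.

```python
import ipaddress

def looks_like_ip(values):
    """True if every sample value parses as an IPv4 or IPv6 address."""
    try:
        for value in values:
            ipaddress.ip_address(value)
    except ValueError:
        return False
    return True

def looks_like_longitude(values):
    """True if every sample value is a number in the range -180 to 180."""
    try:
        return all(-180.0 <= float(value) <= 180.0 for value in values)
    except ValueError:
        return False

def column_notes(values):
    """Collect hypothetical notes for a column, given a sample of its values."""
    notes = []
    if looks_like_ip(values):
        notes.append("IP address")
    if looks_like_longitude(values):
        notes.append("longitude")
    return notes
```

Checking a sample rather than every row keeps such heuristics cheap on a fast stream, at the cost of occasional false positives.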

Column Color Coding

Columns are coded according to the following color scheme:

  • White columns indicate text types, such as VARCHAR(1024).
  • Yellow columns indicate time types, such as TIMESTAMP.
  • Green columns indicate numerical types, such as DOUBLE and INTEGER.

For example, in the screen grab below, columns ROWTIME and when are time columns, columns id and title are text columns, and columns magnitude, latitude, and longitude are numerical.
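The color scheme amounts to a lookup from a column's SQL type to a display color. The sketch below encodes only the four types named in this section; treating any other type as "unknown" is an assumption, as is the helper name `column_color`.

```python
# Only the types named in this guide; any other type is an unconfirmed case.
COLOR_BY_TYPE = {
    "VARCHAR": "white",
    "TIMESTAMP": "yellow",
    "DOUBLE": "green",
    "INTEGER": "green",
}

def column_color(sql_type: str) -> str:
    """Map a type string such as 'VARCHAR(1024)' to its display color."""
    base = sql_type.split("(", 1)[0].strip().upper()  # drop length/precision
    return COLOR_BY_TYPE.get(base, "unknown")
```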

Using the Output View

The Output View displays the results of the current Script. When you initially open a pipeline guide, the Output View displays the raw data from the source. (Because sources may have a large, fast-moving set of rows, the Output View displays a representative sample of the source.)

Once you implement Script items, the Output View changes to display the results of the Script.

StreamLab displays information on the source in two ways:

Viewing the Input View

You can view the input (original) stream from the source by dragging the blue bar at the top of the Output view window.

Viewing Input Statistics

While StreamLab performs calculations on all rows in a source, it displays only a subset of rows in the Output View. This is because sources can have massive numbers of rows, which would flow by too fast to view meaningfully. The Output View header bar displays statistics for both rows processed and rows displayed.

  • The number of rows per second processed by the StreamLab Scrutinizer appears to the right of the Output View name.
  • The percentage of rows displayed in the Output View appears to the right of the rows per second number.
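The two header statistics are simple ratios. The sketch below shows the arithmetic under a hypothetical sampling rule (display every n-th row); the guide does not say how StreamLab chooses its representative sample, so `sample_for_display` and its every-n-th rule are assumptions.

```python
def sample_for_display(rows, every_nth):
    """Hypothetical sampling rule: keep every n-th row for display."""
    return rows[::every_nth]

def header_stats(rows_processed, rows_displayed, seconds):
    """Return (rows per second, percentage of rows displayed)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    rate = rows_processed / seconds
    percent = 100.0 * rows_displayed / rows_processed if rows_processed else 0.0
    return rate, percent

rows = list(range(10_000))           # stand-in for 5 seconds of streamed rows
shown = sample_for_display(rows, 50)
rate, percent = header_stats(len(rows), len(shown), seconds=5)
```

With 10,000 rows over 5 seconds and every 50th row shown, the header would read 2,000 rows per second with 2% displayed.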

Suggestions Overview

Suggestions appear in the Suggestions list in the middle left of the Pipeline Guide interface. Each suggestion describes the action it performs. Suggestions change depending on your data source and your selection in the Output view. Every command that you set up appears as a suggestion; to implement it, click the + icon to the right of the suggestion.

<img src="/images/sl/sl_guide_suggestions.png"/> 

Suggestions also change depending on the column you have selected in Output view. To get suggestions for a column, select it.

<img src="/images/sl/sl_guide_suggestion_column_selected.png"/> 

Selecting Within Cells

You can make cells active in the Output View by double-clicking them. Once a cell is active, you can select text within it. When you select characters in a cell, StreamLab automatically fills in command fields and updates the suggestions based on your selection.

<img src="/images/sl/sl_guide_suggestion_selecting_cell.png" width="60%"/> 

Viewing SQL Generated by Pipeline Guide

As you add commands to guides, the guides generate SQL. When you execute this SQL, the view is created or modified in your selected schema with the changes shown in the Output View window. To view or export SQL, click the View Log button in the upper right corner of the Pipeline Guides page:

<img src="/images/sl/sl_guide_view_log_button.png" width="80%"/> 

The Log window opens, listing all the SQL that you have generated. You can also run the SQL again by clicking the Execute button.

<img src="/images/sl/sl_guide_view_log.png"/> 
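The SQL that StreamLab generates targets its own streaming SQL engine and is not reproduced in this guide. As a rough analogy only, the sketch below uses Python's built-in `sqlite3` module to show how script steps could map to a chain of views, each defined over the previous one. The table and view names are hypothetical, and the one-view-per-step layout is an assumption about the general idea, not a description of StreamLab's output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for a source that has already been split (hypothetical name).
conn.execute('CREATE TABLE ticker_split ("MESSAGE_1" TEXT, "MESSAGE_2" TEXT)')
conn.execute("INSERT INTO ticker_split VALUES ('2013/05/28', '881.2700')")

# Rename step, expressed as a view over the source.
conn.execute(
    'CREATE VIEW step_rename AS '
    'SELECT "MESSAGE_1" AS "date", "MESSAGE_2" AS "close" FROM ticker_split'
)

# Cast step, expressed as a view over the previous view.
# SQLite gives DOUBLE the REAL affinity, so the cast yields a float.
conn.execute(
    'CREATE VIEW step_cast AS '
    'SELECT "date", CAST("close" AS DOUBLE) AS "close" FROM step_rename'
)

rows = conn.execute('SELECT "date", "close" FROM step_cast').fetchall()
conn.close()
```

Layering views this way means re-running the SQL only redefines views; the underlying source data is never modified.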

Switching Sources for a Pipeline Guide

In the Guide interface, you can switch sources. Doing so runs the pipeline guide script on another source.

To switch sources, click the Source button at the top right of the guide page, under the colored icons.

This brings up the source selection page, which shows all sources and potential sources. You can select a different source, but if you have already added steps to the guide, make sure that the new source contains all the column names that the pipeline depends on.
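Before switching, you can think of the check as a set difference between the columns the script reads and the columns the new source provides. In the sample process above, the script starts by splitting MESSAGE, so MESSAGE is the column it depends on. The helper below is a hypothetical sketch; it compares names case-sensitively, which is an assumption based on quoted SQL identifiers being case-sensitive.

```python
from typing import Iterable, List

def missing_columns(required: Iterable[str], available: Iterable[str]) -> List[str]:
    """Return required column names absent from the new source, sorted for stable output."""
    return sorted(set(required) - set(available))

# The sample script's first step splits MESSAGE, so the script depends on it.
required = ["MESSAGE"]

ok_source = ["ROWTIME", "MESSAGE"]
other_source = ["ROWTIME", "TICKER", "PRICE"]

gaps_ok = missing_columns(required, ok_source)
gaps_other = missing_columns(required, other_source)
```

An empty result means the existing script can run unchanged on the new source; any names returned point to steps that would fail.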