
SQLStreamBuilder Job Management

Every time you execute SQL, it becomes a job that runs on the deployment as an Apache Flink job. You can manage running jobs from the SQL Jobs tab.



Starting a job

To run a Continuous SQL job, first compose a SQL statement in the Compose tab and select the Execute button. A job is spawned that performs the SQL continuously on the stream of data (the Source Virtual Table) and pushes the results to a Sink Virtual Table (if configured), as well as materializing the results into a Materialized View (if configured).

SQL

A job cannot be started without SQL in the SQL editor window; otherwise you will get the message [ERROR] You must provide a SQL query. Specify the SQL you would like to run and select Execute. For help with creating valid SQL, see the Getting Started Guide, the SQL Syntax Reference, and the Fraud example.
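
For illustration, a minimal Continuous SQL statement might look like the following sketch. The source virtual table payments and its columns are hypothetical, not part of any deployment:

    -- Continuously filter a stream for large transactions.
    -- `payments` is a hypothetical Source Virtual Table.
    SELECT card_id, amount, event_time
    FROM payments
    WHERE amount > 100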

There are two logical phases to running a job:

  1. Parse: The SQL is parsed and checked for validity and then compared against the virtual table schema(s) for correct typing and key/columns. This is done via a bespoke Eventador service.

  2. Execution: If the parse phase is successful, a job is dynamically created, and run on an open slot on your cluster. The job is a valid Apache Flink job.

The job creation phases can be monitored via the Logs tab. The job runs on the cluster specified in the Environment Cluster drop-down.

As results are sampled, they appear in the Results tab in a tabular/columnar format. It's important to remember that these results are a sample, not the entire result set of the new stream produced by the query. The entire result set is sent to the Sink Virtual Table and/or a Materialized View (if configured).

Sink Virtual Table

The Sink Virtual Table is an optional argument to a job that specifies the destination for the continuous results. You can configure various sink types in the Virtual Tables tab, then the Sink Virtual Tables sub-tab. If the selection is None then results are not sent to a sink but are available as sampled to the screen or to a materialized view (if configured).

Materialized Views

Materialized Views (MVs) can be created as mutating snapshots of the query result data, updated by a given key. The data is always the latest representation of itself, by key (analogous to a primary key in most RDBMSs). To configure an MV, select the Materialized View sub-tab on the Compose tab.

  • Select a Primary Key for the MV. If this list is empty, then no SQL is specified on the SQL sub-tab, or that SQL is invalid. All data in the view is updated by this key (see the sketch after this list).
  • Select a Retention Period. Data that is not mutated during this period is removed from the view.
  • Enable or disable Recreate on Job Start. If enabled, the MV is deleted when a job is started or restarted.
  • Enable or disable Ignore NULLS. If enabled, NULL values will not overwrite existing non-null values; they are ignored.
  • Select an API Key.
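
Because an MV is keyed, the SQL for the job should produce one logical row per key. As a hedged sketch, again using the hypothetical payments table, an aggregation grouped by card_id yields results for which card_id is a natural Primary Key:

    -- Hypothetical aggregation whose results update by card_id;
    -- card_id is then the natural Primary Key for the MV.
    SELECT card_id,
           COUNT(*)    AS txn_count,
           SUM(amount) AS total_amount
    FROM payments
    GROUP BY card_id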

You can then configure a REST endpoint to query this MV. Multiple query conditions can be created to allow for various ways to query the same data. This is sometimes referred to as a 'pull query'.

Resource allocation and capacity planning

When starting a job, the number of slots consumed on the specified cluster is equal to the Parallelism setting (see Advanced Job Management below). The default is 1 slot. If there are insufficient available slots to start a job, you will see the message [ERROR] Job execution failed. The StreamBuilder server returned an error: Insufficient task manager slots available. You can add slots using the Cluster menu on the left pane, then selecting Workers->Add.
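
As a worked example: a cluster with 8 available task manager slots could run eight jobs at the default Parallelism of 1, or a single job at a Parallelism of 8, but not both at once.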


Monitoring a job

  • Click on SQLStreamBuilder on the side menu.
  • Click on the SQL Jobs tab.
  • Select the job you would like to monitor.
  • Select the Logs tab on the bottom pane.

The Logs tab shows detailed logging from the Flink job itself - any exceptions will be logged here. As the job processes messages, a count of messages processed is spooled to the log every 5 seconds:

8/13/2019, 9:42:49 AM - INFO - 2019-08-08 19:24:17.752157 : 5 Second Count: 2502

Stopping a job

  • Click on SQLStreamBuilder on the side menu.
  • Click on the SQL Jobs tab.
  • Click on the red stop button for the job you would like to stop.

Restarting a job

  • Click on SQLStreamBuilder on the side menu.
  • Click on the SQL Jobs tab.
  • Click on the select box below the State column and select Cancelled or Failed to see stopped jobs.
  • Select the job you would like to restart.
  • Select the Details tab at the bottom.
  • Select the Edit Selected Job button - this will bring up the SQL window in Edit Mode.
  • Select Restart to restart the job.

Editing a job

Editing a job is similar to restarting a job.

  • Click on SQLStreamBuilder on the side menu.
  • Click on the SQL Jobs tab.
  • Select the job you would like to edit.
  • Select the Details tab at the bottom.
  • Select the Edit Selected Job button - this will bring up the SQL window in Edit Mode.
  • Edit/alter the Target Deployment, the Sink Virtual Table, and the SQL itself as needed, then select Restart to restart the job. The job will be stopped and restarted.

Sampling data for a running job

You can sample data from a running job. This is useful if you want to inspect the data to make sure the job is producing the results you expect. To sample the data from a running job:

  • Click on SQLStreamBuilder on the side menu.
  • Click on the SQL Jobs tab.
  • Select the job you would like to edit.
  • Select the Details tab at the bottom.
  • Select the Edit Selected Job button - this will bring up the SQL window in Edit Mode.
  • Click the Sample button. Results will be sampled and displayed in the results window. If there is no data meeting the SQL query, sampling will give up after a few tries.

Advanced Job Management

When starting a job, a number of more advanced features can be selected to configure various properties of the job. To configure these features, select the Show Advanced Settings button on the Compose tab.

Target SSB Version

This is the SQLStreamBuilder (SSB) version that the job runs under. Choosing the latest version is typically the best option. If you have an older job running on a previous version, simply restarting the job with the new version selected will upgrade it to the latest.

Restart Strategy & Restart Retry Time

If the Restart Strategy is set to Always, the job is restarted after Restart Retry Time seconds; it is not restarted if the user selects Stop from the SQL Jobs tab. If set to Never, the job is not restarted unless the user selects Restart from the Compose tab.

Job Parallelism

The number of parallel threads started to process the job. Each thread consumes a slot on the cluster.

Sample Interval

How often to sample data (in milliseconds) from the output stream. 1000ms is common and recommended.