
SQLStreamBuilder Concepts Guide


Jobs

SSB runs in an interactive fashion where you can quickly see the results of your query and iterate on your SQL syntax. It also works in a persistent fashion, where the SQL queries you execute run as jobs on the Flink cluster, operating on unbounded streams of data until canceled. This allows you to author, launch, and monitor stream processing jobs within SSB. Every SQL query is a job. The Job Management Guide has details on managing jobs.
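
As a minimal sketch, assuming a virtual table named payments has been defined as a source (the table name and column are hypothetical), even a simple query like the following runs as its own job until canceled:

    -- Runs continuously as a job on the Flink cluster until canceled
    SELECT *
    FROM payments
    WHERE amount > 100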

Virtual Tables

SSB processes data from a source to a sink using the SQL you specify (or simply to the browser). When a source or sink is created, you assign it a virtual table name. You then use that virtual table name to address the FROM table in your query (source) and specify the destination (sink) in the interface. This allows you to create powerful aggregations, filters, or any other SQL expression against the stream. For more information, see Virtual Tables.

It's important to note that because this is streaming SQL, the query does not end until canceled, and results may not show up immediately if there is a long time window, or if data matching the criteria isn't being streamed at that moment.
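
For instance, a windowed aggregation only emits results when each window closes, so a long window means a longer wait before anything appears. A sketch, assuming a source virtual table named clicks with an event-time column event_time (both hypothetical; the exact window syntax depends on the underlying Flink version):

    -- Emits one row per user per one-minute window, only when the window closes
    SELECT user_id,
           COUNT(*) AS click_count
    FROM clicks
    GROUP BY TUMBLE(event_time, INTERVAL '1' MINUTE), user_id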

Materialized Views

SSB can materialize results from a Continuous SQL query into a persistent view of the data that is read via REST. Applications can use this mechanism to 'query streams' of data without deploying expensive, slow, and maintenance-intensive database systems. Materialized Views (MVs) are built into the platform and require no configuration or maintenance. They act as a special kind of sink and may be used in place of one.

MVs are kept current with the mutating stream: they are updated by a primary key as data flows through the system. Data is updated by key and represents the latest view of the data for that key. For example, vehicle Z's latest latitude and longitude are X and Y; as the vehicle moves, its latitude and longitude are updated by vehicleID. The primary key is defined at creation time and is immutable.
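
A sketch of a query that could back such a Materialized View, assuming a source virtual table named vehicle_positions (hypothetical). With vehicleID chosen as the primary key when the MV is created, the MV retains only the latest latitude and longitude per vehicle:

    -- The MV keeps the most recent row for each vehicleID (the MV's primary key)
    SELECT vehicleID,
           latitude,
           longitude
    FROM vehicle_positions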

Querying MVs is done via a GET request over REST, which returns a JSON response with Content-Type: application/json. It's important to note that queries aren't defined at query time; rather, they are curated, saved, and granted access via the Eventador Platform.

Routing the query result data to a sink

SQLStreamBuilder is unique in that it lets you iterate on your SQL while also letting you build robust data pipelines based on that SQL. When you execute a query, the results go to the Virtual Table Sink that you selected in the SQL window. This allows you to create aggregations, filters, joins, and so on, and then "pipe" the results to a sink. The schema for the results is the schema that you created when you ran the query (see Schema below).
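
As an illustrative sketch, assuming two source virtual tables named orders and customers (hypothetical names and columns), a join like the following could be routed to whichever Virtual Table Sink you select in the interface:

    -- Results flow to the Virtual Table Sink selected in the SQL window
    SELECT o.order_id,
           o.amount,
           c.customer_name
    FROM orders o
    JOIN customers c
      ON o.customer_id = c.customer_id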

Results are also sampled to your browser so you can inspect the data and iterate on your query. Results are sampled back 100 rows at a time, and you can sample 100 more rows by clicking the Sample button. If you select Results in Browser as the Sink Virtual Table, results are only sampled to the browser.

Kafka Sinks

A Kafka sink is an output virtual table that receives the data resulting from your query. Kafka sink data is written in JSON format.

Amazon S3 Sinks

An Amazon S3 sink is an empty S3 bucket that is filled with objects named using the following filename layout:

2019-09-17T19:24:41.333Z

Each of these files contains one JSON message per line. The S3 sink is currently supported only in the same region as your cluster.

Schema

A schema is defined for a given source when the source is created. You define the schema for the incoming data using JSON, and each virtual table has its own schema. For instance, if the source is Apache Kafka, you specify a topic name and define a schema for the data in that topic.

For sinks, the schema is defined by the table structure of the SQL statement itself, both column names and datatypes. SSB supports aliases for column names, like SELECT bar AS foo FROM baz, as well as CAST(value AS type) for type conversions.
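
A hedged sketch, assuming a source virtual table named events with a string field amount_str (both hypothetical); the resulting sink schema would consist of the columns event_id and amount:

    -- The alias and CAST determine the sink's column name and datatype
    SELECT event_id,
           CAST(amount_str AS DOUBLE) AS amount
    FROM events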

SSB supports a couple of meta commands, like show tables and describe <table>, that make it easy to inspect the schema and will feel familiar if you have used other database platforms.
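
For example (payments here is a hypothetical virtual table name):

    show tables
    describe payments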

Null Keys

When a key is removed from the messages coming from a source but is still present in the schema, SSB will happily continue to consume messages; however, upon sinking, it will mark the missing key as NULL. Similarly, when a key is removed from the source schema but not from the messages coming from the source, SSB will ignore that key on the incoming stream.

Source and Sink availability

Available Sources

Source     Availability
Kafka      Yes
Kinesis    TBA
MongoDB    Yes (Debezium)

Available Sinks

Sink                  Availability
Kafka                 Yes
Amazon S3             Yes
ElasticSearch         Yes
JDBC (Relational DB)  TBA
MongoDB               TBA
Webhook               Yes
Google GCS            Yes

User Functions

SQLStreamBuilder User Functions allow you to write powerful functions in JavaScript that enhance the functionality of SQL. User Functions can be simple translation functions like Celsius to Fahrenheit, more complex business logic, or even lookups of data from external sources.

User Functions, once created, can be referenced in any SQL statement as a SQL function. You can build a library of functions, and your team can use them as needed. User Functions also expose the entire Java 8 API, further enhancing the usefulness of SQLStreamBuilder. For more information on User Functions, see the User Function Guide.
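
As a sketch, assuming you have created a User Function named CTOF that converts Celsius to Fahrenheit (a hypothetical name), it could then be referenced in SQL like any built-in function, here against a hypothetical sensors virtual table:

    -- CTOF is a hypothetical User Function written in JavaScript
    SELECT sensor_id,
           CTOF(temp_c) AS temp_f
    FROM sensors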