SQLStreamBuilder Input Transforms

Introduction

Input Transforms are a powerful way to clean, mutate, and arrange data that is poorly organized, has changing format, has data that is junk or otherwise hard to use. You create a javascript function to transform the data after it’s been consumed from the source and before queries happen in SQLStreamBuilder. You define your schema on the output of the transform. It’s great for sources that:

  • Aren’t in your control, perhaps a data feed from a 3rd party provider
  • Are some hard to change format (maybe a legacy feed, other teams of feeds within your organization)
  • Are inconsistent message to message
  • Have no uniform keys, or data without keys (like a nested arrays), but are still valid JSON.
  • The schema you want doesn’t match the incoming topic exactly.

input_transforms

Simply put, if your input data is JSON, and it’s hard to express with a schema - Input Transformations are for you. With SQLStreamBuilder + Input Transforms your schema doesn’t have to be perfect for you to start writing SQL stream processors.

  • Input Transformations are defined for a virtual table source.
  • One transformation per source.
  • Takes record as an JSON formatted string input variable. The input is always named record.
  • Emits the output of the last line to the calling JVM. It could be any variable name. In our examples we use out and emit it as a JSON formatted string. It must be JSON formatted.

A basic input transformation looks like this:

var out = JSON.parse(record);     // record is input, parse JSON formatted string to object
                                  // do whatever you want here..
JSON.stringify(out);              // emit JSON formatted string of object

Creating an input transform

  • Log into SQLStreamBuilder.
  • Select SQLStreamBuilder
  • Select the Virtual Tables tab.
  • Select the edit button (pencil icon) for the Virtual Table Name you would like to add a transform to. If you haven’t created a source yet, see the getting started guide.
  • Select the Display/Edit Transformations link.
  • Add your transform code.
  • Be sure the output of your transform matches the Manual Schema definition.

An example transform

For example, perhaps the input data has mixed case keys, and we want to normalize it to an uppercase key.

trans

The input data looks like this:

{"temp": 12}
{"Temp": 44}

The input transform would look like:

var parsedVal = JSON.parse(record);
out = {};

Object.keys(parsedVal).forEach(function(key) {
    out[key.toUpperCase()] = parsedVal[key]
});

JSON.stringify(out);

and the resulting data would look like:

{"TEMP": 12}
{"TEMP": 44}

and we would make a schema in for my_virtual_table that looks like:

{
  "namespace": "temps",
  "type": "record",
  "name": "com.eventador.temps",
  "fields": [
  {
    "name": "TEMP",
    "type": "int"
  }
}

Finally we would query it like:

SELECT temp FROM my_virtual_table WHERE temp > 20;

The resulting datastream would look like:

{"TEMP": 44}