Built-in Operations

script

This operation allows languages other than JavaScript to process data. It is not meant to be highly efficient, because it creates a child process that runs the specified script in the job. Teraslice and the script communicate over stdin and stdout, and the data is expected to be JSON. If another language is needed, it may be a better idea to write a module in C++ or Rust and expose it through native Node bindings, so that the code can be required like a regular JavaScript module.
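The stdin/stdout exchange described above can be sketched as follows. This is only an illustration and not the Teraslice implementation: the child script is a hypothetical inline Node program standing in for an external script, `runThroughScript` is an invented helper name, and `spawnSync` is used instead of `spawn` to keep the example short.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical stand-in for an external script: it reads a JSON array from
// stdin, marks each record, and writes the JSON result to stdout.
const childSource = `
let raw = '';
process.stdin.setEncoding('utf8');
process.stdin.on('data', (chunk) => { raw += chunk; });
process.stdin.on('end', () => {
    const records = JSON.parse(raw);
    const out = records.map((r) => Object.assign({}, r, { processed: true }));
    process.stdout.write(JSON.stringify(out));
});
`;

// Hypothetical helper: sends records to the child as JSON on stdin and
// parses whatever JSON the child prints on stdout.
function runThroughScript(records) {
    const result = spawnSync(process.execPath, ['-e', childSource], {
        input: JSON.stringify(records),
        encoding: 'utf8'
    });
    if (result.error) {
        throw result.error;
    }
    if (result.status !== 0) {
        throw new Error(`script failed: ${result.stderr}`);
    }
    return JSON.parse(result.stdout);
}

const output = runThroughScript([{ foo: 'bar' }, { foo: 'baz' }]);
console.log(output);
```

Every batch pays for process startup plus JSON serialization in both directions, which is the overhead the description warns about.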

| Configuration | Description | Type | Notes |
| --- | --- | --- | --- |
| _op | Name of operation, it must reflect the exact name of the file | String | required |
| command | Command to run | String | required |
| args | Arguments to pass along with the command | Array | optional |
| options | Object containing options to pass into the process env | Object | optional |
| asset | Name of asset containing command to run | String | optional |

Note: see the Node.js 8.x child_process.spawn documentation for the supported options.

Example configuration:

{
    "_op": "script",
    "command": "someFile.py",
    "args": ["-someFlag1", "-someFlag2"],
    "asset": "someAsset",
    "options": {}
}

Example Job: examples/jobs/script/test_script_job.json

{
    "name": "ES DataGen test script",
    "lifecycle": "persistent",
    "workers": 1,
    "assets": ["standard"],
    "operations": [
        {
            "_op": "data_generator",
            "size": 100000,
            "stress_test": true
        },
        {
            "_op": "script",
            "command": "test_script.py",
            "asset": "test_script",
            "args": [""],
            "options": {}
        },
        {
            "_op": "noop"
        }
    ]
}

Script usage example:

  • Create and upload asset
    cd examples/jobs/script
    zip -r test_script.zip test_script
    curl -XPOST -H "Content-Type: application/octet-stream" localhost:5678/assets --data-binary @test_script.zip
  • Submit job
    curl -XPOST localhost:5678/jobs -d@test_script_job.json

stdout

This is primarily used for development purposes. It logs the incoming data to the console and is meant for inspecting data between operations or at the end of a job.

| Configuration | Description | Type | Notes |
| --- | --- | --- | --- |
| limit | Specify a number > 0 to limit the number of results printed to the console log. Default is to print all results. | Number | optional |

Example configuration:

{
    "_op": "stdout"
}
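The effect of `limit` can be sketched as below. This is a hypothetical illustration of the documented behaviour, not the actual Teraslice source; the function names are invented, and the sketch assumes that every record is still passed on to the next operation regardless of how many are printed.

```javascript
// Hypothetical helper: picks the records to print. A limit greater than 0
// caps the count; any other value means "print everything".
function selectForPrinting(records, limit = 0) {
    return limit > 0 ? records.slice(0, limit) : records;
}

// Hypothetical sketch of the processor: log the selected records, then
// return the full batch unchanged (an assumption, see the lead-in).
function stdoutSketch(records, limit = 0) {
    console.log(selectForPrinting(records, limit));
    return records;
}

const batch = [{ id: 1 }, { id: 2 }, { id: 3 }];
const passed = stdoutSketch(batch, 2);
```

A configuration with `"limit": 2` would correspond to the second argument used here.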

noop

This processor simply passes the data through, unmodified. It is primarily used for development purposes.

Example configuration:

{
    "_op": "noop"
}

There is no configuration for this processor.

delay

Waits a specified amount of time, then passes the data through.

| Configuration | Description | Type | Notes |
| --- | --- | --- | --- |
| ms | Milliseconds to delay before passing data through | duration | optional, defaults to 100 |

Example configuration:

{
    "_op": "delay",
    "ms": 1000
}
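As a rough sketch of the behaviour described above (not the actual processor code), a delay step can be modelled as a promise that resolves with the unchanged records after `ms` milliseconds. The name `delaySketch` is hypothetical, and the sketch assumes `ms` is a plain number of milliseconds.

```javascript
// Hypothetical sketch: wait `ms` milliseconds (default 100, as documented),
// then resolve with the records exactly as they were received.
function delaySketch(records, ms = 100) {
    return new Promise((resolve) => {
        setTimeout(() => resolve(records), ms);
    });
}

const started = Date.now();
const pending = delaySketch([{ foo: 'bar' }], 50).then((records) => {
    const elapsed = Date.now() - started;
    console.log(`received ${records.length} record(s) after about ${elapsed}ms`);
    return records;
});
```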

test-reader

Slice and fetch data specified in a file. Useful for testing processors in teraslice-test-harness.

| Configuration | Description | Type | Notes |
| --- | --- | --- | --- |
| passthrough_slice | If set to true, the fetcher returns the slice it is given; the value is expected to be an array | Boolean | optional |
| fetcher_data_file_path | Path to a file containing a JSON array of data records | File Path | optional |
| slicer_data_file_path | Path to a file containing a JSON array of slice requests | File Path | optional |

Example configuration for reading from a file:

{
    "_op": "test-reader",
    "fetcher_data_file_path": "/path/to/fetcher-data-file.json"
}

/path/to/fetcher-data-file.json

[
    {
        "foo": "bar"
    },
    {
        "foo": "baz"
    }
]
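When the data file is generated inside a test rather than checked in, one possible approach is sketched below. It uses only Node built-ins; the temporary directory prefix and variable names are hypothetical, and the resulting operation config would still need to be passed to a job config as in the example configuration above.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical fixture records, matching the data file shown above.
const records = [{ foo: 'bar' }, { foo: 'baz' }];

// Write the records as a JSON array into a fresh temporary directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'test-reader-'));
const fetcherDataFilePath = path.join(dir, 'fetcher-data-file.json');
fs.writeFileSync(fetcherDataFilePath, JSON.stringify(records, null, 4));

// Operation config pointing test-reader at the generated file.
const readerOp = {
    _op: 'test-reader',
    fetcher_data_file_path: fetcherDataFilePath
};

console.log(JSON.stringify(readerOp, null, 4));
```

Serializing with `JSON.stringify` guarantees the file is valid JSON, which hand-written fixtures can easily miss.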

Example test for passthrough_slice:

const { WorkerTestHarness, newTestJobConfig } = require('teraslice-test-harness');

describe('Pass Through Test', () => {
    const job = newTestJobConfig({
        operations: [
            {
                _op: 'test-reader',
                passthrough_slice: true
            },
            { _op: 'noop' }
        ],
    });

    const harness = new WorkerTestHarness(job, {});

    beforeAll(() => harness.initialize());
    afterAll(() => harness.shutdown());

    it('should be able to run a slice', async () => {
        const input = [
            { foo: 'bar' },
            { foo: 'baz' }
        ];

        const output = await harness.runSlice(input);
        expect(output).toEqual(input);
    });
});