sample_random

Given an array of JSON documents, this processor returns an array containing a random subset of those documents. It iterates through the array and generates a random number between 1 and 100 for each record; a record is kept if its number is less than or equal to probability_to_keep. The value of probability_to_keep must be between 0 and 100: 100 keeps all records and 0 rejects all records.
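The sampling rule described above can be sketched in a few lines. This is an illustrative re-implementation, not the processor's actual source: the names `sampleRandom` and `randomPercent` are hypothetical, and plain objects stand in for the records.

```javascript
// Hypothetical helper: returns a random integer from 1 to 100 inclusive.
function randomPercent() {
    return Math.floor(Math.random() * 100) + 1;
}

// Illustrative version of the rule described above (not the real processor).
// `roll` is injectable so the behavior can be checked deterministically.
function sampleRandom(records, probabilityToKeep, roll = randomPercent) {
    if (probabilityToKeep < 0 || probabilityToKeep > 100) {
        throw new RangeError('probability_to_keep must be between 0 and 100');
    }
    // Keep a record only when its roll is <= the configured probability.
    return records.filter(() => roll() <= probabilityToKeep);
}

const records = [
    { name: 'lilly', otherField: 1 },
    { name: 'willy', otherField: 2 },
    { name: 'billy', otherField: 3 },
    { name: 'dilly', otherField: 4 },
];

const all = sampleRandom(records, 100); // every roll (1..100) is <= 100
const none = sampleRandom(records, 0);  // no roll (1..100) is <= 0
const some = sampleRandom(records, 50); // varies from run to run
```

Because each roll falls between 1 and 100, a setting of 100 always keeps a record and a setting of 0 never does, which matches the two boundary cases stated in the description.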

Usage

Reduce the returned array

Example of a job using the sample_random processor

{
    "name": "testing",
    "workers": 1,
    "slicers": 1,
    "lifecycle": "once",
    "assets": [
        "standard"
    ],
    "operations": [
        {
            "_op": "test-reader"
        },
        {
            "_op": "sample_random",
            "probability_to_keep": 50
        }
    ]
}

Example of the data and the expected results

const data = [
    DataEntity.make({ name: 'lilly', otherField: 1 }),
    DataEntity.make({ name: 'willy', otherField: 2 }),
    DataEntity.make({ name: 'billy', otherField: 3 }),
    DataEntity.make({ name: 'dilly', otherField: 4 }),
];

const results = await processor.run(data);

// One possible outcome; which records are kept varies from run to run
results === [
    { name: 'lilly', otherField: 1 },
    { name: 'billy', otherField: 3 },
]
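Since the selection is random, a test cannot rely on an exact expected array. One option is to check only that the results are drawn from the input. The sketch below assumes the processor preserves input order, which the source does not state explicitly, and matches records by `name`, which works only because the names in this sample data are unique. `isOrderedSubset` is a hypothetical helper, and plain objects stand in for DataEntity instances.

```javascript
// Hypothetical test helper: true when `results` contains only records from
// `input`, in the same relative order, with nothing added.
function isOrderedSubset(results, input) {
    let cursor = 0;
    for (const record of results) {
        // Advance through the input until the matching record is found.
        while (cursor < input.length && input[cursor].name !== record.name) {
            cursor += 1;
        }
        // Ran off the end: the record is missing or out of order.
        if (cursor === input.length) return false;
        cursor += 1;
    }
    return true;
}

const input = [
    { name: 'lilly', otherField: 1 },
    { name: 'willy', otherField: 2 },
    { name: 'billy', otherField: 3 },
    { name: 'dilly', otherField: 4 },
];

const possibleResults = [
    { name: 'lilly', otherField: 1 },
    { name: 'billy', otherField: 3 },
];

const inOrder = isOrderedSubset(possibleResults, input);
const reordered = isOrderedSubset([possibleResults[1], possibleResults[0]], input);
```

Here `inOrder` is true and `reordered` is false; an empty result array also passes, which is a valid outcome for any probability below 100.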

Parameters

| Configuration | Description | Type | Notes |
| ------------- | ----------- | ---- | ----- |
| _op | Name of operation, it must reflect the exact name of the file | String | required |
| probability_to_keep | The probability of the record being kept. It iterates through the array and generates a random number between 1 and 100, and if the number <= probability it is kept. Must be between 0 and 100, with 100 keeping all records and 0 rejecting all records | Number, defaults to 100 | required |