sample_exact

given an array of JSON documents will return an array containing a shuffled subset of those input documents. The size of the subset will be the percentage multiplied against the length of the array rounded down.

Usage

Reduce and shuffle the returned array

Example of a job using the sample_exact processor

{
    "name" : "testing",
    "workers" : 1,
    "slicers" : 1,
    "lifecycle" : "once",
    "assets" : [
        "standard"
    ],
    "operations" : [
        {
            "_op": "test-reader"
        },
        {
            "_op": "sample_exact",
            "percent_kept": "50",
        }
    ]
}

Example of the data and the expected results

const data = [
    DataEntity.make({ name: 'lilly', otherField: 1 }),
    DataEntity.make({ name: 'willy', otherField: 2 }),
    DataEntity.make({ name: 'billy', otherField: 3 }),
    DataEntity.make({ name: 'dilly', otherField: 4 }),
]

const results = await processor.run(data);

results === [
    { name: 'dilly', name_again: 'dilly', otherField: 4  },
    { name: 'willy', name_again: 'willy', otherField: 2  },
]

Parameters

Configuration	Description	Type	Notes
_op	Name of operation, it must reflect the exact name of the file	String	required
percent_kept	The percentage of documents to be kept from the input. Must be between 0 and 100	Number	required, defaults to 100

Usage​

Reduce and shuffle the returned array​

Parameters​

Usage

Reduce and shuffle the returned array

Parameters