
file_exporter

The file_exporter processor exports data to the filesystem local to the workers running this job.

For this processor to run, a path must be set in the configuration. All intermediate directories must already exist, and the workers must have adequate permissions to write to that directory.

Usage

Write ldjson to file and restrict fields

This example converts the input data to ldjson and writes it to the worker's /app/data/test_files directory. Since fields is specified, only the fields listed will be allowed through. Since file_per_slice is set to true, each slice will create a new file, named with the workerId followed by the slice order number.

Example Job

{
  "name": "file_exporter",
  "lifecycle": "once",
  "workers": 1,
  "max_retries": 0,
  "assets": [
    "file",
    "standard"
  ],
  "operations": [
    {
      "_op": "test-reader"
    },
    {
      "_op": "file_exporter",
      "path": "/app/data/test_files",
      "format": "ldjson",
      "line_delimiter": "\n",
      "file_per_slice": true,
      "fields": ["name"]
    }
  ]
}

Here is a representation of what the processor will do with the configuration listed in the job above:

const firstSlice = [
{ name: 'chilly', age: 33 },
{ name: 'willy', age: 31 },
{ name: 'philly', age: 43 },
];

const results = await process.run(firstSlice);

// the processor will always return the input records.
results === firstSlice;

// file made at /app/data/test_files/{WORKER_ID}.0
`
{"name":"chilly"}\n
{"name":"willy"}\n
{"name":"philly"}\n
`

const secondSlice = [
{ name: 'fred', age: 33 },
{ name: 'art', age: 31 },
{ name: 'herbert', age: 43 },
];

const secondResults = await process.run(secondSlice);

// the processor will always return the input records.
secondResults === secondSlice;

// file made at /app/data/test_files/{WORKER_ID}.1
`
{"name":"fred"}\n
{"name":"art"}\n
{"name":"herbert"}\n
`
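The field restriction and per-slice file naming shown above can be sketched as follows. This is an illustrative approximation, not the actual file_exporter implementation:

```javascript
// Sketch (not the actual implementation): how a slice becomes the ldjson
// output shown above when `fields` restricts which keys pass through.
function toLdjson(records, fields, lineDelimiter) {
    return records
        .map((record) => {
            // keep only the configured fields
            const picked = {};
            for (const field of fields) {
                if (field in record) picked[field] = record[field];
            }
            return JSON.stringify(picked);
        })
        .join(lineDelimiter) + lineDelimiter;
}

// With file_per_slice enabled, each slice is written to its own file,
// named from the worker id and the zero-based slice order number.
function sliceFilePath(path, workerId, sliceOrder) {
    return `${path}/${workerId}.${sliceOrder}`;
}
```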

Write csv to file

This example job writes the data to csv files, including a column header row, in the worker's /app/data/test_files directory.

Example Job

{
  "name": "file_exporter",
  "lifecycle": "once",
  "workers": 1,
  "max_retries": 0,
  "assets": [
    "file",
    "standard"
  ],
  "operations": [
    {
      "_op": "test-reader"
    },
    {
      "_op": "file_exporter",
      "path": "/app/data/test_files",
      "format": "csv",
      "include_header": true
    }
  ]
}

Here is a representation of what the processor will do with the configuration listed in the job above:

const firstSlice = [
{ name: 'chilly', age: 33 },
{ name: 'willy', age: 31 },
{ name: 'philly', age: 43 },
];

const results = await process.run(firstSlice);

// the processor will always return the input records.
results === firstSlice;

// file made at /app/data/test_files/{WORKER_ID}.0
`
"name","age"\n
"chilly",33\n
"willy",31\n
"philly",43\n
`
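The csv output above can be sketched like this. The quoting rules here (strings quoted, numbers bare) are an assumption for illustration; the library's exact quoting behavior may differ:

```javascript
// Sketch (assumed behavior): render records as csv, with a header row when
// include_header is true. Strings are quoted; numbers are left bare.
function toCsv(records, includeHeader) {
    const fields = Object.keys(records[0]);
    const rows = records.map(
        (record) => fields.map((f) => JSON.stringify(record[f])).join(',')
    );
    if (includeHeader) {
        rows.unshift(fields.map((f) => JSON.stringify(f)).join(','));
    }
    return rows.join('\n') + '\n';
}
```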

Parameters

| Configuration | Description | Type | Notes |
| --- | --- | --- | --- |
| _op | Name of operation, it must reflect the exact name of the file | String | required |
| _api_name | Name of api used for file_exporter | String | required |

API usage in a job

In file_assets v4, teraslice apis must be set within the job configuration; teraslice will no longer automatically set up the api for you. All fields related to the api that were previously allowed on the operation config must now be specified in the api config, and any api configuration set on the operation will be ignored. The api's _name must match the operation's _api_name.

{
  "name": "file_exporter",
  "lifecycle": "once",
  "workers": 1,
  "max_retries": 0,
  "assets": [
    "file",
    "standard"
  ],
  "apis": [
    {
      "_name": "file_sender_api",
      "path": "/app/data/test_files",
      "format": "tsv",
      "file_per_slice": true,
      "include_header": true
    }
  ],
  "operations": [
    {
      "_op": "data_generator",
      "size": 500000
    },
    {
      "_op": "file_exporter",
      "_api_name": "file_sender_api"
    }
  ]
}
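The required pairing between the operation's _api_name and the api's _name can be sketched as a simple lookup. This is a hypothetical helper for illustration, not teraslice's actual resolution code:

```javascript
// Sketch: resolve the api config an operation refers to. The operation's
// _api_name must match the _name of an entry in the job's apis list,
// otherwise the job cannot be wired up.
function resolveApi(job, opConfig) {
    const api = (job.apis || []).find((a) => a._name === opConfig._api_name);
    if (api == null) {
        throw new Error(`No api named "${opConfig._api_name}" found in job.apis`);
    }
    return api;
}
```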