# file_exporter

The `file_exporter` processor exports data to the filesystem local to the worker processes running the job.

For this processor to run, a `path` is required in the configuration. All intermediate directories must already exist, and the workers must have adequate permissions to write to that directory.
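Because the processor does not create missing directories, it can be worth verifying the output path before submitting the job. Below is a minimal sketch of such a check, assuming a Node.js environment with access to the same filesystem as the workers; the path is the one used in the examples that follow.

```javascript
const fs = require('fs');

const outputPath = '/app/data/test_files';

// the exporter will not create intermediate directories,
// so the full path must already exist...
if (!fs.existsSync(outputPath)) {
    throw new Error(`output path ${outputPath} does not exist`);
}

// ...and the process must have permission to write into it;
// accessSync throws if the check fails
fs.accessSync(outputPath, fs.constants.W_OK);
```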
## Usage

### Write ldjson to file and restrict fields

This example converts the input data to ldjson and writes it to the worker's `/app/data/test_files` directory. Since `fields` is specified, only the listed fields are allowed through. Since `file_per_slice` is set to `true`, each slice creates a new file, named with the worker id followed by the slice order number.
Example Job
```json
{
    "name": "file_exporter",
    "lifecycle": "once",
    "workers": 1,
    "max_retries": 0,
    "assets": [
        "file",
        "standard"
    ],
    "operations": [
        {
            "_op": "test-reader"
        },
        {
            "_op": "file_exporter",
            "path": "/app/data/test_files",
            "format": "ldjson",
            "line_delimiter": "\n",
            "file_per_slice": true,
            "fields": ["name"]
        }
    ]
}
```
Here is a representation of what the processor will do with the configuration listed in the job above:

```javascript
const firstSlice = [
    { name: 'chilly', age: 33 },
    { name: 'willy', age: 31 },
    { name: 'philly', age: 43 },
];

const results = await processor.run(firstSlice);

// the processor will always return the input records
results === firstSlice;

// file created at /app/data/test_files/{WORKER_ID}.0 with the contents:
// {"name":"chilly"}
// {"name":"willy"}
// {"name":"philly"}
const secondSlice = [
    { name: 'fred', age: 33 },
    { name: 'art', age: 31 },
    { name: 'herbert', age: 43 },
];

const secondResults = await processor.run(secondSlice);

// the processor will always return the input records
secondResults === secondSlice;

// file created at /app/data/test_files/{WORKER_ID}.1 with the contents:
// {"name":"fred"}
// {"name":"art"}
// {"name":"herbert"}
```
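Because the output is plain ldjson, the files are easy to inspect or consume outside of Teraslice. Here is a minimal sketch of reading one back, assuming the data directory is readable from where the script runs; the file name is hypothetical, standing in for a real `{WORKER_ID}.0`.

```javascript
const fs = require('fs');

// hypothetical name; actual files are named {WORKER_ID}.{SLICE_ORDER}
const file = '/app/data/test_files/worker-1.0';

// each record is one JSON document, separated by the configured line_delimiter
const records = fs.readFileSync(file, 'utf8')
    .split('\n')
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line));

// with the job above: [ { name: 'chilly' }, { name: 'willy' }, { name: 'philly' } ]
console.log(records);
```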
### Write csv to file

This example job writes data to csv files that include column headers in the worker's `/app/data/test_files` directory.
Example Job
```json
{
    "name": "file_exporter",
    "lifecycle": "once",
    "workers": 1,
    "max_retries": 0,
    "assets": [
        "file",
        "standard"
    ],
    "operations": [
        {
            "_op": "test-reader"
        },
        {
            "_op": "file_exporter",
            "path": "/app/data/test_files",
            "format": "csv",
            "include_header": true
        }
    ]
}
```
Here is a representation of what the processor will do with the configuration listed in the job above:

```javascript
const firstSlice = [
    { name: 'chilly', age: 33 },
    { name: 'willy', age: 31 },
    { name: 'philly', age: 43 },
];

const results = await processor.run(firstSlice);

// the processor will always return the input records
results === firstSlice;

// file created at /app/data/test_files/{WORKER_ID}.0 with the contents:
// "name","age"
// "chilly",33
// "willy",31
// "philly",43
```
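Since `include_header` is set, the first line of each file names the columns. Below is a quick way to sanity-check the output, again with a hypothetical file name; for anything beyond a spot check, a real csv parser is a better choice than splitting on delimiters.

```javascript
const fs = require('fs');

// hypothetical name standing in for a real {WORKER_ID}.0 file
const file = '/app/data/test_files/worker-1.0';

const [header, ...rows] = fs.readFileSync(file, 'utf8')
    .trim()
    .split('\n');

console.log(header); // "name","age"
console.log(rows);   // [ '"chilly",33', '"willy",31', '"philly",43' ]
```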
## Parameters

| Configuration | Description | Type | Notes |
|---|---|---|---|
| _op | Name of the operation; it must reflect the exact name of the file | String | required |
| _api_name | Name of the api used by the file_exporter | String | required |
## API usage in a job

In file-assets v4, Teraslice apis must be set within the job configuration; Teraslice will no longer automatically set up the api for you. All api-related fields that were previously allowed on the operation config must now be specified in the api config. Api configurations should no longer be set on the operation, as they will be ignored there. The api's `_name` must match the operation's `_api_name`.
```json
{
    "name": "file_exporter",
    "lifecycle": "once",
    "workers": 1,
    "max_retries": 0,
    "assets": [
        "file",
        "standard"
    ],
    "apis": [
        {
            "_name": "file_sender_api",
            "path": "/app/data/test_files",
            "format": "tsv",
            "file_per_slice": true,
            "include_header": true
        }
    ],
    "operations": [
        {
            "_op": "data_generator",
            "size": 500000
        },
        {
            "_op": "file_exporter",
            "_api_name": "file_sender_api"
        }
    ]
}
```
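For contrast, here is a sketch of a pre-v4 style operation config that carried these settings directly. Under v4, api fields like `path` and `format` set on the operation are ignored; they must live on the `file_sender_api` entry as shown above.

```json
{
    "_op": "file_exporter",
    "path": "/app/data/test_files",
    "format": "tsv"
}
```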