Data-Mate: `DataFrame`
data-frame/DataFrame.DataFrame
An immutable columnar table with APIs for data pipelines.
Note
null/undefined values are treated the same
Type parameters
Name | Type |
---|---|
T | extends Record <string , unknown > = Record <string , any > |
Table of contents
Constructors
Properties
Accessors
Methods
- [iterator]
- aggregate
- appendAll
- appendOne
- assign
- compact
- concat
- countEmptyRows
- createTupleFrom
- deepSelect
- distinct
- entries
- filterBy
- filterDataFrameRows
- fork
- forkWithBuilders
- getColumn
- getColumnAt
- getColumnIndex
- getColumnOrThrow
- getRow
- hasEmptyRows
- hasNilValues
- limit
- orderBy
- removeEmptyRows
- rename
- renameDataFrame
- require
- rows
- rowsWithoutDuplicates
- search
- select
- selectAt
- serialize
- serializeIterator
- slice
- sort
- toArray
- toJSON
- unique
- deserialize
- deserializeIterator
- empty
- fromJSON
Constructors
constructor
• new DataFrame<T
>(columns
, options?
): DataFrame
<T
>
Type parameters
Name | Type |
---|---|
T | extends Record <string , unknown > = Record <string , any > |
Parameters
Name | Type |
---|---|
columns | readonly Column <any , keyof T >[] \| Column <any , keyof T >[] |
options? | DataFrameOptions |
Returns
DataFrame
<T
>
Defined in
data-mate/src/data-frame/DataFrame.ts:161
Properties
_fieldToColumnIndexCache
• Protected
Optional
_fieldToColumnIndexCache: Map
<keyof T
, number
>
Use this to cache the column index needed; this should speed things up.
Defined in
data-mate/src/data-frame/DataFrame.ts:159
columns
• Readonly
columns: readonly Column
<any
, keyof T
>[]
The list of columns
Defined in
data-mate/src/data-frame/DataFrame.ts:135
fields
• Readonly
fields: readonly keyof T
[]
An array of the column names
Defined in
data-mate/src/data-frame/DataFrame.ts:140
metadata
• Readonly
metadata: Record
<string
, any
>
Metadata about the Frame
Defined in
data-mate/src/data-frame/DataFrame.ts:145
name
• Optional
name: string
The name of the Frame
Defined in
data-mate/src/data-frame/DataFrame.ts:130
size
• Readonly
size: number
Size of the DataFrame
Defined in
data-mate/src/data-frame/DataFrame.ts:150
Accessors
config
• get
config(): DataTypeConfig
Generate the DataType config from the columns.
Returns
DataTypeConfig
Defined in
data-mate/src/data-frame/DataFrame.ts:301
id
• get
id(): string
A unique ID for the DataFrame. The ID will only change if the columns or data change.
Returns
string
Defined in
data-mate/src/data-frame/DataFrame.ts:273
Methods
[iterator]
▸ [iterator](): IterableIterator
<DataEntity
<T
, _DataEntityMetadata
<Record
<string
, any
>>>>
Iterate over each row, this returns the JSON compatible values.
Returns
IterableIterator
<DataEntity
<T
, _DataEntityMetadata
<Record
<string
, any
>>>>
Defined in
data-mate/src/data-frame/DataFrame.ts:194
aggregate
▸ aggregate(): AggregationFrame
<T
>
Create an AggregationFrame instance which can be used to run aggregations
Returns
AggregationFrame
<T
>
Defined in
data-mate/src/data-frame/DataFrame.ts:380
appendAll
▸ appendAll(frames
, limit?
): DataFrame
<T
>
Append one or more data frames to the end of this DataFrame. Useful for incrementally building a DataFrame since the cost of this is relatively low.
This is more efficient than using DataFrame.concat but comes with less data type checking and may be less safe, so use with caution.
Parameters
Name | Type |
---|---|
frames | DataFrame <T >[] \| readonly DataFrame <T >[] |
limit? | number |
Returns
DataFrame
<T
>
Defined in
data-mate/src/data-frame/DataFrame.ts:824
appendOne
▸ appendOne(frame
): DataFrame
<T
>
Append one data frame to the end of this DataFrame. This does less than appendAll, so it is faster.
Useful for incrementally building a DataFrame since the cost of this is relatively low.
This is more efficient than using DataFrame.concat but comes with less data type checking and may be less safe, so use with caution.
Parameters
Name | Type |
---|---|
frame | DataFrame <T > |
Returns
DataFrame
<T
>
Defined in
data-mate/src/data-frame/DataFrame.ts:804
assign
▸ assign<R
>(columns
): DataFrame
<T
& R
>
Assign new columns to a new DataFrame. If a given column already exists, the new column will replace the existing one.
Type parameters
Name | Type |
---|---|
R | extends Record <string , unknown > = Record <string , any > |
Parameters
Name | Type |
---|---|
columns | readonly Column <any , string >[] |
Returns
DataFrame
<T
& R
>
Defined in
data-mate/src/data-frame/DataFrame.ts:733
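The replace-or-append behaviour described for `assign` can be sketched on plain objects. This is only an illustration, not the library's implementation: `NamedCol` and `assignColumns` are hypothetical names, and keeping a replaced column in its original position is an assumption.

```typescript
// Hypothetical stand-in for a column; the real data-mate Column class is richer.
interface NamedCol {
    readonly name: string;
    readonly values: readonly unknown[];
}

// Returns a new list, leaving the inputs untouched (mirroring the immutable API).
function assignColumns(existing: readonly NamedCol[], incoming: readonly NamedCol[]): NamedCol[] {
    const result = [...existing];
    for (const column of incoming) {
        const index = result.findIndex((col) => col.name === column.name);
        if (index === -1) {
            // New column name: append it
            result.push(column);
        } else {
            // Existing column name: replace it in place (assumed positioning)
            result[index] = column;
        }
    }
    return result;
}

const assigned = assignColumns(
    [{ name: 'a', values: [1] }, { name: 'b', values: [2] }],
    [{ name: 'b', values: [20] }, { name: 'c', values: [3] }]
);
console.log(assigned.map((col) => col.name));
```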
compact
▸ compact(): DataFrame
<T
>
Reduce the amount of noise in a DataFrame by removing duplicates, including duplicate objects in array values
Returns
DataFrame
<T
>
Defined in
data-mate/src/data-frame/DataFrame.ts:718
concat
▸ concat(arg
): DataFrame
<T
>
Concatenate rows or columns to the end of the existing columns
Parameters
Name | Type |
---|---|
arg | readonly Column <any , keyof T >[] \| Column <any , keyof T >[] \| Partial <T >[] \| readonly Partial <T >[] |
Returns
DataFrame
<T
>
Defined in
data-mate/src/data-frame/DataFrame.ts:753
countEmptyRows
▸ countEmptyRows(): number
Count the number of empty rows
Returns
number
Defined in
data-mate/src/data-frame/DataFrame.ts:598
createTupleFrom
▸ createTupleFrom<R
, V
>(fields
, as
): DataFrame
<T
& Record
<R
, V
>>
Merge two or more columns into a Tuple
Type parameters
Name | Type |
---|---|
R | extends string |
V | unknown [] |
Parameters
Name | Type |
---|---|
fields | readonly keyof T [] |
as | R |
Returns
DataFrame
<T
& Record
<R
, V
>>
Defined in
data-mate/src/data-frame/DataFrame.ts:911
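To make "merge two or more columns into a Tuple" concrete, here is a minimal sketch on plain arrays. The names `ColumnValues` and `createTupleColumn` are hypothetical, and mapping missing values to `null` is an assumption, not confirmed library behaviour.

```typescript
// Hypothetical columnar layout: one array of values per field name.
type ColumnValues = Record<string, readonly unknown[]>;

// Build one tuple per row from the selected fields, in the order given.
function createTupleColumn(columns: ColumnValues, fields: readonly string[], size: number): unknown[][] {
    const tuples: unknown[][] = [];
    for (let i = 0; i < size; i++) {
        // Assumption: a missing column or value becomes null in the tuple
        tuples.push(fields.map((field) => columns[field]?.[i] ?? null));
    }
    return tuples;
}

const tupleColumn = createTupleColumn({ lat: [1, 2], lon: [3, 4] }, ['lat', 'lon'], 2);
console.log(tupleColumn);
```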
deepSelect
▸ deepSelect<R
>(fieldSelectors
): DataFrame
<R
>
Select fields in a data frame, this will work with nested object fields.
Fields that don't exist in the data frame are safely ignored, which makes this function more suitable for production environments
Type parameters
Name | Type |
---|---|
R | extends Record <string , unknown > |
Parameters
Name | Type |
---|---|
fieldSelectors | string [] \| readonly string [] |
Returns
DataFrame
<R
>
Defined in
data-mate/src/data-frame/DataFrame.ts:322
distinct
▸ distinct(...fieldArg
): DataFrame
<T
>
Alias for unique
Parameters
Name | Type |
---|---|
...fieldArg | FieldArg <keyof T >[] |
Returns
DataFrame
<T
>
Defined in
data-mate/src/data-frame/DataFrame.ts:652
entries
▸ entries(json
, options?
): IterableIterator
<[index: number, row: DataEntity<T, _DataEntityMetadata<Record<string, any>>>]>
Iterate over each index and row, this returns the internal stored values.
Parameters
Name | Type |
---|---|
json | true |
options? | SerializeOptions |
Returns
IterableIterator
<[index: number, row: DataEntity<T, _DataEntityMetadata<Record<string, any>>>]>
Defined in
data-mate/src/data-frame/DataFrame.ts:201
▸ entries(json?
, options?
): IterableIterator
<[index: number, row: T]>
Parameters
Name | Type |
---|---|
json? | false |
options? | SerializeOptions |
Returns
IterableIterator
<[index: number, row: T]>
Defined in
data-mate/src/data-frame/DataFrame.ts:204
filterBy
▸ filterBy(filters
, json?
): DataFrame
<T
>
Filter the DataFrame by fields. All field functions must return true for a given row to be returned in the filtered DataFrame.
Parameters
Name | Type |
---|---|
filters | FilterByFn <T > \| Partial <{ [P in string \| number \| symbol]: Function }> |
json? | boolean |
Returns
DataFrame
<T
>
Example
dataFrame.filterBy({
    name(val) {
        return val != null;
    },
    age(val) {
        return val != null && val >= 20;
    }
});
Defined in
data-mate/src/data-frame/DataFrame.ts:485
filterDataFrameRows
▸ filterDataFrameRows(fn
, stopAtMatch?
): DataFrame
<T
>
This allows you to filter each row more efficiently, since the rows aren't pulled from the data frame unless they match.
This was designed to be used in DataFrame.search.
Parameters
Name | Type |
---|---|
fn | FilterByRowsFn |
stopAtMatch? | number |
Returns
DataFrame
<T
>
See
DataFrame.search
Defined in
data-mate/src/data-frame/DataFrame.ts:533
fork
▸ fork<R
>(columns
): DataFrame
<R
>
Create a new DataFrame with the same metadata but with different data
Type parameters
Name | Type |
---|---|
R | extends Record <string , unknown > = T |
Parameters
Name | Type |
---|---|
columns | Column <any , keyof R >[] \| readonly Column <any , keyof R >[] |
Returns
DataFrame
<R
>
Defined in
data-mate/src/data-frame/DataFrame.ts:289
forkWithBuilders
▸ forkWithBuilders(builders
, limit?
): DataFrame
<T
>
Create a new data frame from the builders
Parameters
Name | Type |
---|---|
builders | Iterable <[name: keyof T, builder: Builder<any>]> |
limit? | number |
Returns
DataFrame
<T
>
Defined in
data-mate/src/data-frame/DataFrame.ts:703
getColumn
▸ getColumn<P
>(field
): undefined
| Column
<T
[P
], P
>
Get a column by name
Type parameters
Name | Type |
---|---|
P | extends string \| number \| symbol |
Parameters
Name | Type |
---|---|
field | P |
Returns
undefined
| Column
<T
[P
], P
>
Defined in
data-mate/src/data-frame/DataFrame.ts:948
getColumnAt
▸ getColumnAt<P
>(index
): undefined
| Column
<T
[P
], P
>
Get a column by index
Type parameters
Name | Type |
---|---|
P | extends string \| number \| symbol |
Parameters
Name | Type |
---|---|
index | number |
Returns
undefined
| Column
<T
[P
], P
>
Defined in
data-mate/src/data-frame/DataFrame.ts:988
getColumnIndex
▸ getColumnIndex(field
): number
This returns -1 if not found. The column index will be cached. In the case of duplicate column names, the first one found wins.
Parameters
Name | Type |
---|---|
field | keyof T |
Returns
number
Defined in
data-mate/src/data-frame/DataFrame.ts:958
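The lookup rules for `getColumnIndex` (-1 when missing, cached results, first duplicate wins) can be sketched as follows. `NamedColumn` and `ColumnIndexLookup` are hypothetical names used only for illustration; caching a miss as -1 is an assumption that is safe here because the column list never changes.

```typescript
// Hypothetical stand-in for a column; only the name matters for this sketch.
interface NamedColumn {
    readonly name: string;
}

class ColumnIndexLookup {
    // Cache of field name -> column index, filled lazily on first lookup
    private readonly cache = new Map<string, number>();
    private readonly columns: readonly NamedColumn[];

    constructor(columns: readonly NamedColumn[]) {
        this.columns = columns;
    }

    getColumnIndex(field: string): number {
        const cached = this.cache.get(field);
        if (cached !== undefined) return cached;

        // findIndex stops at the first match, so the first duplicate wins,
        // and it returns -1 when nothing matches
        const index = this.columns.findIndex((col) => col.name === field);
        this.cache.set(field, index);
        return index;
    }
}

const lookup = new ColumnIndexLookup([{ name: 'id' }, { name: 'name' }, { name: 'id' }]);
console.log(lookup.getColumnIndex('id'), lookup.getColumnIndex('missing'));
```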
getColumnOrThrow
▸ getColumnOrThrow<P
>(field
): Column
<T
[P
], P
>
Get a column by name or throw if not found
Type parameters
Name | Type |
---|---|
P | extends string \| number \| symbol |
Parameters
Name | Type |
---|---|
field | P |
Returns
Column
<T
[P
], P
>
Defined in
data-mate/src/data-frame/DataFrame.ts:975
getRow
▸ getRow(index
, json?
, options?
): undefined
| DataEntity
<T
, _DataEntityMetadata
<Record
<string
, any
>>>
Get a row by index, if the row has only null values, returns undefined
Parameters
Name | Type |
---|---|
index | number |
json? | true |
options? | SerializeOptions |
Returns
undefined
| DataEntity
<T
, _DataEntityMetadata
<Record
<string
, any
>>>
Defined in
data-mate/src/data-frame/DataFrame.ts:995
▸ getRow(index
, json?
, options?
): undefined
| T
Parameters
Name | Type |
---|---|
index | number |
json? | false |
options? | SerializeOptions |
Returns
undefined
| T
Defined in
data-mate/src/data-frame/DataFrame.ts:1000
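A rough sketch of how a row might be assembled from columnar storage, returning undefined when every value is nil as `getRow` is documented to do. `Columns` and `getRowSketch` are hypothetical names, and omitting nil fields from the returned row is an assumption.

```typescript
// Hypothetical columnar layout: one array of values per field name.
type Columns = Record<string, readonly unknown[]>;

function getRowSketch(columns: Columns, index: number): Record<string, unknown> | undefined {
    const row: Record<string, unknown> = {};
    let hasValue = false;
    for (const [name, values] of Object.entries(columns)) {
        const value = values[index];
        // `!= null` rejects both null and undefined, which are treated the same
        if (value != null) {
            row[name] = value;
            hasValue = true;
        }
    }
    // A row made up only of nil values is reported as undefined
    return hasValue ? row : undefined;
}

const columnsExample: Columns = { name: ['a', null], age: [1, undefined] };
const firstRow = getRowSketch(columnsExample, 0);
const secondRow = getRowSketch(columnsExample, 1);
console.log(firstRow, secondRow);
```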
hasEmptyRows
▸ hasEmptyRows(): boolean
Check if there are any empty rows at all
Returns
boolean
Defined in
data-mate/src/data-frame/DataFrame.ts:617
hasNilValues
▸ hasNilValues(): boolean
Check if there are any columns with nil values
Returns
boolean
Defined in
data-mate/src/data-frame/DataFrame.ts:633
limit
▸ limit(num
): DataFrame
<T
>
Returns a DataFrame with a limited number of rows.
A negative value will select from the ending indices.
Parameters
Name | Type |
---|---|
num | number |
Returns
DataFrame
<T
>
Defined in
data-mate/src/data-frame/DataFrame.ts:1083
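The sign convention for `limit` can be illustrated on a plain array. This sketch assumes that a negative value keeps the last `|num|` rows; `limitRows` is a hypothetical helper, not the library's code.

```typescript
// Illustrative only: mimics the documented limit() behaviour on a plain array.
function limitRows<T>(rows: readonly T[], num: number): T[] {
    // Assumption: a negative num selects from the end of the rows
    if (num < 0) return rows.slice(num);
    // Otherwise keep the first num rows
    return rows.slice(0, num);
}

const sampleRows = [1, 2, 3, 4, 5];
const firstTwo = limitRows(sampleRows, 2);
const lastTwo = limitRows(sampleRows, -2);
console.log(firstTwo, lastTwo);
```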
orderBy
▸ orderBy(...fieldArgs
): DataFrame
<T
>
Order the rows by fields. The format is `field:asc` or `field:desc`; it defaults to `asc` if no direction is specified.
Parameters
Name | Type |
---|---|
...fieldArgs | FieldArg <string >[] |
Returns
DataFrame
<T
>
Defined in
data-mate/src/data-frame/DataFrame.ts:391
▸ orderBy(...fieldArgs
): DataFrame
<T
>
Parameters
Name | Type |
---|---|
...fieldArgs | FieldArg <keyof T >[] |
Returns
DataFrame
<T
>
Defined in
data-mate/src/data-frame/DataFrame.ts:392
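The `field:asc` / `field:desc` argument format accepted by `orderBy` could be interpreted roughly like this. It is a sketch over plain numeric records, not the library's parser: `parseFieldArg` and `orderRows` are hypothetical helpers, and treating earlier arguments as higher priority is an assumption.

```typescript
type Direction = 'asc' | 'desc';

interface ParsedFieldArg {
    field: string;
    direction: Direction;
}

// Hypothetical helper: parse "field", "field:asc" or "field:desc"
function parseFieldArg(arg: string): ParsedFieldArg {
    const separator = arg.lastIndexOf(':');
    // No direction given, so fall back to ascending
    if (separator === -1) return { field: arg, direction: 'asc' };

    const field = arg.slice(0, separator);
    const direction = arg.slice(separator + 1);
    if (direction === 'asc' || direction === 'desc') return { field, direction };
    throw new Error(`Unknown sort direction "${direction}" in "${arg}"`);
}

// Sort plain numeric records; earlier field args take priority (assumed)
function orderRows(rows: readonly Record<string, number>[], ...fieldArgs: string[]): Record<string, number>[] {
    const parsed = fieldArgs.map(parseFieldArg);
    // Copy before sorting so the input is left untouched
    return [...rows].sort((a, b) => {
        for (const { field, direction } of parsed) {
            const difference = a[field] - b[field];
            if (difference !== 0) return direction === 'asc' ? difference : -difference;
        }
        return 0;
    });
}

const ordered = orderRows(
    [{ age: 30, score: 1 }, { age: 20, score: 2 }, { age: 30, score: 3 }],
    'age:desc',
    'score'
);
console.log(ordered);
```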
removeEmptyRows
▸ removeEmptyRows(): DataFrame
<T
>
Remove the empty rows from the data frame. This is an optimization that won't require moving around as much memory.
Returns
DataFrame
<T
>
Defined in
data-mate/src/data-frame/DataFrame.ts:566
rename
▸ rename<K
, R
>(name
, renameTo
): DataFrame
<Omit
<T
, K
> & Record
<R
, T
[K
]>>
Rename an existing column
Type parameters
Name | Type |
---|---|
K | extends string \| number \| symbol |
R | extends string |
Parameters
Name | Type |
---|---|
name | K |
renameTo | R |
Returns
DataFrame
<Omit
<T
, K
> & Record
<R
, T
[K
]>>
Defined in
data-mate/src/data-frame/DataFrame.ts:887
renameDataFrame
▸ renameDataFrame(renameTo
): DataFrame
<T
>
Rename the data frame
Parameters
Name | Type |
---|---|
renameTo | string |
Returns
DataFrame
<T
>
Defined in
data-mate/src/data-frame/DataFrame.ts:900
require
▸ require(...fieldArg
): DataFrame
<T
>
Require specific columns to exist on every row
Parameters
Name | Type |
---|---|
...fieldArg | FieldArg <keyof T >[] |
Returns
DataFrame
<T
>
Defined in
data-mate/src/data-frame/DataFrame.ts:455
rows
▸ rows(json
, options?
): IterableIterator
<DataEntity
<T
, _DataEntityMetadata
<Record
<string
, any
>>>>
Iterate each row
Parameters
Name | Type |
---|---|
json | true |
options? | SerializeOptions |
Returns
IterableIterator
<DataEntity
<T
, _DataEntityMetadata
<Record
<string
, any
>>>>
Defined in
data-mate/src/data-frame/DataFrame.ts:221
▸ rows(json?
, options?
): IterableIterator
<T
>
Parameters
Name | Type |
---|---|
json? | false |
options? | SerializeOptions |
Returns
IterableIterator
<T
>
Defined in
data-mate/src/data-frame/DataFrame.ts:222
rowsWithoutDuplicates
▸ rowsWithoutDuplicates(json
, options?
): IterableIterator
<T
| DataEntity
<T
, _DataEntityMetadata
<Record
<string
, any
>>>>
This is more expensive and a little more complicated. In the future we may pass json=false to each column and then call toJSONCompatibleValue after generating each hash, to be consistent with the hash.
Parameters
Name | Type |
---|---|
json | true |
options? | SerializeOptions |
Returns
IterableIterator
<T
| DataEntity
<T
, _DataEntityMetadata
<Record
<string
, any
>>>>
Defined in
data-mate/src/data-frame/DataFrame.ts:245
▸ rowsWithoutDuplicates(json?
, options?
): IterableIterator
<T
>
Parameters
Name | Type |
---|---|
json? | false |
options? | SerializeOptions |
Returns
IterableIterator
<T
>
Defined in
data-mate/src/data-frame/DataFrame.ts:248
search
▸ search(query
, variables?
, overrideParsedQuery?
, stopAtMatch?
): DataFrame
<T
>
Search the DataFrame using an xLucene query
Parameters
Name | Type |
---|---|
query | string |
variables? | xLuceneVariables |
overrideParsedQuery? | Node |
stopAtMatch? | number |
Returns
DataFrame
<T
>
Defined in
data-mate/src/data-frame/DataFrame.ts:442
select
▸ select<K
>(...fieldArg
): DataFrame
<Pick
<T
, K
>>
Get a column, or columns by name, returns a new DataFrame
Type parameters
Name | Type |
---|---|
K | extends string \| number \| symbol |
Parameters
Name | Type |
---|---|
...fieldArg | FieldArg <K >[] |
Returns
DataFrame
<Pick
<T
, K
>>
Defined in
data-mate/src/data-frame/DataFrame.ts:308
selectAt
▸ selectAt<R
>(...indices
): DataFrame
<R
>
Get a column, or columns by index, returns a new DataFrame
Type parameters
Name | Type |
---|---|
R | extends Record <string , unknown > = T |
Parameters
Name | Type |
---|---|
...indices | number [] |
Returns
DataFrame
<R
>
Defined in
data-mate/src/data-frame/DataFrame.ts:371
serialize
▸ serialize(): string
Converts the DataFrame into an optimized serialized format, including the metadata. This returns a string that includes the data frame header and all of the columns joined with a newline.
There is a 1GB limit for the whole data frame using this method. To achieve a 1GB limit per column, use serializeIterator.
Returns
string
Defined in
data-mate/src/data-frame/DataFrame.ts:1153
serializeIterator
▸ serializeIterator(): Iterable
<string
>
Converts the DataFrame into an optimized serialized format, including the metadata. This returns an iterator and requires external code to join the yielded chunks with a newline.
There is a 1GB limit per column using this method.
Returns
Iterable
<string
>
Defined in
data-mate/src/data-frame/DataFrame.ts:1130
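Because `serializeIterator` leaves the newline joining to the caller, the consuming code might look something like the sketch below. It writes each chunk to a sink with a newline between chunks, so the full output never has to exist as one string. `writeSerialized` and the `write` callback are hypothetical; a plain string array stands in for the iterator's chunks.

```typescript
// Write each chunk to a sink, separating chunks with a newline.
function writeSerialized(chunks: Iterable<string>, write: (data: string) => void): void {
    let first = true;
    for (const chunk of chunks) {
        // The separator goes between chunks, not after the last one
        if (!first) write('\n');
        write(chunk);
        first = false;
    }
}

// Stand-in chunks; real chunks would come from dataFrame.serializeIterator()
const written: string[] = [];
writeSerialized(['header', 'column-a', 'column-b'], (data) => written.push(data));
console.log(written.join(''));
```

In practice the `write` callback would typically forward to a file or network stream rather than an in-memory array.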
slice
▸ slice(start?
, end?
): DataFrame
<T
>
Returns a DataFrame with a specific set of rows
Parameters
Name | Type |
---|---|
start? | number |
end? | number |
Returns
DataFrame
<T
>
Defined in
data-mate/src/data-frame/DataFrame.ts:1100
sort
▸ sort(...fieldArgs
): DataFrame
<T
>
Sort the records by a field, an alias of orderBy.
Parameters
Name | Type |
---|---|
...fieldArgs | FieldArg <string >[] |
Returns
DataFrame
<T
>
See
orderBy
Defined in
data-mate/src/data-frame/DataFrame.ts:433
▸ sort(...fieldArgs
): DataFrame
<T
>
Parameters
Name | Type |
---|---|
...fieldArgs | FieldArg <keyof T >[] |
Returns
DataFrame
<T
>
Defined in
data-mate/src/data-frame/DataFrame.ts:434
toArray
▸ toArray(): T
[]
Convert the DataFrame to an array of objects (the output may not be JSON compatible)
Returns
T
[]
Defined in
data-mate/src/data-frame/DataFrame.ts:1119
toJSON
▸ toJSON(options?
): DataEntity
<T
, _DataEntityMetadata
<Record
<string
, any
>>>[]
Convert the DataFrame to an array of objects (the output is JSON compatible)
Parameters
Name | Type |
---|---|
options? | SerializeOptions |
Returns
DataEntity
<T
, _DataEntityMetadata
<Record
<string
, any
>>>[]
Defined in
data-mate/src/data-frame/DataFrame.ts:1112
unique
▸ unique(...fieldArg
): DataFrame
<T
>
Remove duplicate rows that have the same values for the selected fields
Parameters
Name | Type |
---|---|
...fieldArg | FieldArg <keyof T >[] |
Returns
DataFrame
<T
>
Defined in
data-mate/src/data-frame/DataFrame.ts:643
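One way to picture what `unique` does with selected fields is a keyed de-duplication over plain records. This is a sketch under assumptions, not the library's algorithm: `uniqueBy` is a hypothetical helper, the first occurrence is assumed to win, and a JSON string is used as the comparison key.

```typescript
// Keep the first row seen for each distinct combination of the selected fields.
function uniqueBy<T extends Record<string, unknown>>(rows: readonly T[], ...fields: (keyof T)[]): T[] {
    const seen = new Set<string>();
    const result: T[] = [];
    for (const row of rows) {
        // null and undefined are normalised to null so they compare as equal
        const key = JSON.stringify(fields.map((field) => row[field] ?? null));
        if (seen.has(key)) continue;
        seen.add(key);
        result.push(row);
    }
    return result;
}

const people = [
    { name: 'a', age: 1 },
    { name: 'a', age: 2 },
    { name: 'b', age: 1 },
];
const uniqueByName = uniqueBy(people, 'name');
console.log(uniqueByName);
```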
deserialize
▸ deserialize<R
>(data
): Promise
<DataFrame
<R
>>
Create a DataFrame from a serialized format. The first row is the data frame metadata, and all of the subsequent rows are serialized columns. The rows should be joined with a newline.
When using this method, the whole serialized file should be passed in.
For more advanced stream-like processing, see DataFrame.deserializeIterator. Using that method may be required for deserializing a buffer or string greater than 1GB.
Type parameters
Name | Type |
---|---|
R | extends Record <string , unknown > = Record <string , any > |
Parameters
Name | Type |
---|---|
data | string \| Buffer |
Returns
Promise
<DataFrame
<R
>>
Defined in
data-mate/src/data-frame/DataFrame.ts:119
deserializeIterator
▸ deserializeIterator<R
>(data
): Promise
<DataFrame
<R
>>
Create a DataFrame from a serialized format, the first row is data frame metadata, all of the subsequent rows are serialized columns.
When using this method, the input should be split by a new line.
Type parameters
Name | Type |
---|---|
R | extends Record <string , unknown > = Record <string , any > |
Parameters
Name | Type |
---|---|
data | Iterable <string \| Buffer > \| AsyncIterable <string \| Buffer > |
Returns
Promise
<DataFrame
<R
>>
Defined in
data-mate/src/data-frame/DataFrame.ts:78
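Since `deserializeIterator` expects its input already split on newlines, data arriving in arbitrary chunks (for example from a stream) needs to be re-split first. The generator below is a minimal sketch of that step; `linesFromChunks` is a hypothetical helper and is not part of data-mate.

```typescript
// Re-split arbitrarily sized string chunks into newline-delimited lines.
function* linesFromChunks(chunks: Iterable<string>): Generator<string> {
    let buffered = '';
    for (const chunk of chunks) {
        buffered += chunk;
        let newline = buffered.indexOf('\n');
        while (newline !== -1) {
            // Emit every complete line and keep the remainder buffered
            yield buffered.slice(0, newline);
            buffered = buffered.slice(newline + 1);
            newline = buffered.indexOf('\n');
        }
    }
    // Whatever is left after the last newline is the final line
    if (buffered.length > 0) yield buffered;
}

// Chunk boundaries deliberately fall in the middle of lines
const lines = Array.from(linesFromChunks(['head', 'er\ncol', 'A\ncolB']));
console.log(lines);
```

The resulting iterable yields one line per serialized row, which matches the input shape described above.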
empty
▸ empty<R
>(config
, options?
): DataFrame
<R
>
Create an empty DataFrame
Type parameters
Name | Type |
---|---|
R | extends Record <string , unknown > = Record <string , any > |
Parameters
Name | Type |
---|---|
config | DataTypeConfig \| Readonly <Overwrite <DataTypeConfig , { fields : ReadonlyDataTypeFields }>> |
options? | DataFrameOptions |
Returns
DataFrame
<R
>
Defined in
data-mate/src/data-frame/DataFrame.ts:61
fromJSON
▸ fromJSON<R
>(config
, records?
, options?
): DataFrame
<R
>
Create a DataFrame from an array of JSON objects
Type parameters
Name | Type |
---|---|
R | extends Record <string , unknown > = Record <string , any > |
Parameters
Name | Type |
---|---|
config | DataTypeConfig \| Readonly <Overwrite <DataTypeConfig , { fields : ReadonlyDataTypeFields }>> |
records? | R [] |
options? | DataFrameOptions |
Returns
DataFrame
<R
>