Tables and DataFrames
Tables.jl provides an ecosystem-wide interface to tabular data in Julia, ensuring interoperability with DataFrames.jl, CSV.jl, and hundreds of other packages that implement the standard.
Dimensional data are tables
DimensionalData.jl implements the Tables.jl interface for AbstractDimArray and AbstractDimStack. DimStack layers are unrolled so they are all the same size, and dimensions loop to match the length of the largest layer.
Columns are given the name of the array or stack layer, and the result of DD.name(dimension) for Dimension columns.
Looping of dimensions and stack layers is done lazily, and does not allocate unless collected.
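As a minimal sketch of what this interface provides, the snippet below inspects a small array through Tables.jl. It assumes Tables.jl is installed in the same environment as DimensionalData.jl; the array A and its name :data are illustrative and not part of the example that follows.

```julia
using DimensionalData
import Tables

# A small 3×2 array with two dimensions and a named data layer.
A = rand(X(1:3), Y(1:2); name=:data)

# The array itself is recognised as a Tables.jl table.
Tables.istable(A)

# Tables.columns gives column access without materializing the table.
cols = Tables.columns(A)

# One column per dimension, plus one column named after the array.
Tables.columnnames(cols)

# Materialize the columns as a NamedTuple of vectors, one row per array element.
nt = Tables.columntable(A)
length(nt.data) == length(A)
```

Any package that consumes Tables.jl sources should be able to take A in the same way, without an intermediate conversion step.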
Materializing tables to DimArray or DimStack
DimArray and DimStack have fallback methods to materialize any Tables.jl-compatible table.
By default, columns with names such as X, Y, Z, and Band are treated as dimensions, and other columns as data. Pass a name keyword argument to choose which data column(s) are used.
You have full control over which columns are dimensions, and over exactly what those dimensions look like. If you pass a Tuple of Symbols or dimension types (e.g. X) as the second argument, those columns are treated as dimensions. Passing a Tuple of dimensions preserves those dimensions, with values matched to the corresponding columns.
Materializing tables works even if the table rows are not ordered, and can handle missing values.
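The point about unordered tables can be sketched as a round trip: convert a small array to a DataFrame, shuffle its rows, and materialize it again. The array A, the variable names, and the use of Random.shuffle are illustrative assumptions; the DimArray call follows the second-argument and name keyword forms described above.

```julia
using DimensionalData
using DataFrames
using Random

# A small array and its table form (one row per array element).
A = rand(X(1:3), Y(1:4); name=:data)
df = DataFrame(A)

# Shuffle the rows so the table is no longer in array order.
shuffled = df[shuffle(1:nrow(df)), :]

# Name the dimension columns explicitly and pick the data column by name.
B = DimArray(shuffled, (X, Y); name=:data)

# If the coordinate columns are used to place each value,
# the result should hold the same values as the original array.
parent(B) == parent(A)
```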
Example
using DimensionalData
using Dates
using DataFrames
Define some dimensions:
julia> x, y, c = X(1:10), Y(1:10), Dim{:category}('a':'z')
(↓ X 1:10,
→ Y 1:10,
↗ category 'a':1:'z')
julia> A = rand(x, y, c; name=:data)
┌ 10×10×26 DimArray{Float64, 3} data ┐
├────────────────────────────────────┴────────────── dims ┐
↓ X Sampled{Int64} 1:10 ForwardOrdered Regular Points,
→ Y Sampled{Int64} 1:10 ForwardOrdered Regular Points,
↗ category Categorical{Char} 'a':1:'z' ForwardOrdered
└─────────────────────────────────────────────────────────┘
[:, :, 1]
↓ → 1 2 3 … 8 9 10
1 0.599241 0.518938 0.341133 0.486559 0.162516 0.934189
2 0.192192 0.336376 0.636476 0.840593 0.687569 0.294486
3 0.607291 0.963657 0.353968 0.120955 0.434286 0.887294
⋮ ⋱ ⋮
7 0.194849 0.975511 0.612828 0.888721 0.890574 0.436622
8 0.364097 0.163103 0.142055 0.049689 0.259847 0.570725
9 0.394448 0.755939 0.54624 0.156388 0.210664 0.966517
10 0.828604 0.359421 0.51621 … 0.828161 0.107233 0.74172
Converting to DataFrame
Arrays will have a column for each dimension and a single data column:
julia> DataFrame(A)
2600×4 DataFrame
Row │ X Y category data
│ Int64 Int64 Char Float64
──────┼───────────────────────────────────
1 │ 1 1 a 0.599241
2 │ 2 1 a 0.192192
3 │ 3 1 a 0.607291
4 │ 4 1 a 0.921958
5 │ 5 1 a 0.449491
6 │ 6 1 a 0.581131
7 │ 7 1 a 0.194849
8 │ 8 1 a 0.364097
⋮ │ ⋮ ⋮ ⋮ ⋮
2594 │ 4 10 z 0.852872
2595 │ 5 10 z 0.0958843
2596 │ 6 10 z 0.315302
2597 │ 7 10 z 0.236866
2598 │ 8 10 z 0.894053
2599 │ 9 10 z 0.350024
2600 │ 10 10 z 0.417756
2585 rows omitted
Converting to CSV
We can also write arrays and stacks directly to a CSV file with CSV.jl, or to any other data type supporting the Tables.jl interface.
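The stack st used in the rest of this page is not defined in this section. A stack consistent with the CSV header and the DataFrame shown further down could be built as in the sketch below; this is an assumption, and in particular the choice of a two-dimensional data1 layer and a three-dimensional data2 layer is a guess based on the printed values.

```julia
using DimensionalData

# The same dimensions as defined earlier on this page.
x, y, c = X(1:10), Y(1:10), Dim{:category}('a':'z')

# Assumed definition: two layers named data1 and data2.
# data1 only has X and Y, so its values repeat across category in table form.
st = DimStack((data1 = rand(x, y), data2 = rand(x, y, c)))
```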
using CSV
CSV.write("dimstack.csv", st)
readlines("dimstack.csv")
2601-element Vector{String}:
"X,Y,category,data1,data2"
"1,1,a,0.55560637324799,0.845200516911609"
"2,1,a,0.10276733254788795,0.9104238640380062"
"3,1,a,0.22237128922242078,0.8268020755919178"
"4,1,a,0.5501481631111826,0.9447511416331498"
"5,1,a,0.09300753748828394,0.15945803739833375"
"6,1,a,0.48952511607945026,0.6146564273146751"
"7,1,a,0.7938317326707394,0.9770663775826343"
"8,1,a,0.0019198597596568057,0.798655984630017"
"9,1,a,0.44833963865079907,0.40268027828179853"
⋮
"2,10,z,0.9675326879984427,0.41940525122635797"
"3,10,z,0.5099922507050859,0.07986058669268159"
"4,10,z,0.3053673139967894,0.4496996354823414"
"5,10,z,0.8146121812750928,0.9452913850518949"
"6,10,z,0.38167574879167476,0.24524306337289326"
"7,10,z,0.17977958441149666,0.1985699519321249"
"8,10,z,0.7044663405368152,0.694278906020718"
"9,10,z,0.5697400488168892,0.20636222545147498"
"10,10,z,0.8560905731682101,0.8428656510212863"Converting a DataFrame to a DimArray or DimStack
The Dataframe we use will have 5 columns: X, Y, category, data1, and data2
julia> df = DataFrame(st)2600×5 DataFrame
Row │ X Y category data1 data2
│ Int64 Int64 Char Float64 Float64
──────┼──────────────────────────────────────────────
1 │ 1 1 a 0.555606 0.845201
2 │ 2 1 a 0.102767 0.910424
3 │ 3 1 a 0.222371 0.826802
4 │ 4 1 a 0.550148 0.944751
5 │ 5 1 a 0.0930075 0.159458
6 │ 6 1 a 0.489525 0.614656
7 │ 7 1 a 0.793832 0.977066
8 │ 8 1 a 0.00191986 0.798656
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮
2594 │ 4 10 z 0.305367 0.4497
2595 │ 5 10 z 0.814612 0.945291
2596 │ 6 10 z 0.381676 0.245243
2597 │ 7 10 z 0.17978 0.19857
2598 │ 8 10 z 0.704466 0.694279
2599 │ 9 10 z 0.56974 0.206362
2600 │ 10 10 z 0.856091 0.842866
2585 rows omitted
Converting this DataFrame to a DimArray without other arguments treats only X and Y as dimensions, so the category column is read as the data and data1 and data2 are ignored:
julia> DimArray(df)
┌ 10×10 DimArray{Char, 2} category ┐
├──────────────────────────────────┴────────────────── dims ┐
↓ X Sampled{Int64} 1:1:10 ForwardOrdered Regular Points,
→ Y Sampled{Int64} 1:1:10 ForwardOrdered Regular Points
└───────────────────────────────────────────────────────────┘
↓ → 1 2 3 4 5 6 7 8 9 10
1 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z'
2 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z'
3 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z'
4 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z'
⋮ ⋮ ⋮
7 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z'
8 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z'
9 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z'
10 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z'
Specify dimension names to ensure these columns are treated as dimensions. Now data1 is read in instead.
julia> DimArray(df, (X,Y,:category))
┌ 10×10×26 DimArray{Float64, 3} data1 ┐
├─────────────────────────────────────┴──────────────── dims ┐
↓ X Sampled{Int64} 1:1:10 ForwardOrdered Regular Points,
→ Y Sampled{Int64} 1:1:10 ForwardOrdered Regular Points,
↗ category Categorical{Char} ['a', …, 'z'] ForwardOrdered
└────────────────────────────────────────────────────────────┘
[:, :, 1]
↓ → 1 2 3 … 8 9 10
1 0.555606 0.140713 0.44195 0.8843 0.224013 0.182997
2 0.102767 0.119252 0.561738 0.654229 0.339039 0.967533
3 0.222371 0.645555 0.647965 0.719804 0.333765 0.509992
⋮ ⋱ ⋮
7 0.793832 0.840653 0.267666 0.515599 0.662927 0.17978
8 0.00191986 0.438789 0.349911 0.0595599 0.594161 0.704466
9 0.44834 0.796331 0.566217 0.691596 0.639773 0.56974
10 0.42684 0.0427491 0.752123 … 0.698195 0.757712 0.856091
You can also pass in the actual dimensions.
julia> DimArray(df, dims(st))
┌ 10×10×26 DimArray{Float64, 3} data1 ┐
├─────────────────────────────────────┴───────────── dims ┐
↓ X Sampled{Int64} 1:10 ForwardOrdered Regular Points,
→ Y Sampled{Int64} 1:10 ForwardOrdered Regular Points,
↗ category Categorical{Char} 'a':1:'z' ForwardOrdered
└─────────────────────────────────────────────────────────┘
[:, :, 1]
↓ → 1 2 3 … 8 9 10
1 0.555606 0.140713 0.44195 0.8843 0.224013 0.182997
2 0.102767 0.119252 0.561738 0.654229 0.339039 0.967533
3 0.222371 0.645555 0.647965 0.719804 0.333765 0.509992
⋮ ⋱ ⋮
7 0.793832 0.840653 0.267666 0.515599 0.662927 0.17978
8 0.00191986 0.438789 0.349911 0.0595599 0.594161 0.704466
9 0.44834 0.796331 0.566217 0.691596 0.639773 0.56974
10 0.42684 0.0427491 0.752123 … 0.698195 0.757712 0.856091
Pass in a name argument to read in data2 instead.
julia> DimArray(df, dims(st); name = :data2)
┌ 10×10×26 DimArray{Float64, 3} data2 ┐
├─────────────────────────────────────┴───────────── dims ┐
↓ X Sampled{Int64} 1:10 ForwardOrdered Regular Points,
→ Y Sampled{Int64} 1:10 ForwardOrdered Regular Points,
↗ category Categorical{Char} 'a':1:'z' ForwardOrdered
└─────────────────────────────────────────────────────────┘
[:, :, 1]
↓ → 1 2 3 … 8 9 10
1 0.845201 0.94177 0.490656 0.482435 0.386324 0.859582
2 0.910424 0.427826 0.862728 0.248024 0.293925 0.0595812
3 0.826802 0.512368 0.938823 0.655281 0.33373 0.675824
⋮ ⋱ ⋮
7 0.977066 0.628265 0.887326 0.502325 0.9802 0.308634
8 0.798656 0.471819 0.88707 0.944636 0.751848 0.441872
9 0.40268 0.021849 0.121545 0.249602 0.793648 0.779541
10 0.0522006 0.933603 0.228074 … 0.289268 0.788237 0.548612
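The examples above only materialize a DimArray. Since DimStack is described as having the same fallback methods, reading several data columns back at once might look like the sketch below. The call DimStack(df, dims(st)) is an assumption made by analogy with the DimArray calls shown above and is not confirmed by this page; the stack definition is also illustrative.

```julia
using DimensionalData
using DataFrames

# Rebuild an illustrative stack and its table form.
x, y, c = X(1:10), Y(1:10), Dim{:category}('a':'z')
st = DimStack((data1 = rand(x, y, c), data2 = rand(x, y, c)))
df = DataFrame(st)

# Assumed to mirror DimArray(df, dims(st)): pass the table and the dimensions.
# The non-dimension columns are expected to become stack layers.
st2 = DimStack(df, dims(st))
keys(st2)
```

If only some layers are wanted, the name keyword described earlier is the documented way to select data columns.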