Tables and DataFrames
Tables.jl provides an ecosystem-wide interface to tabular data in Julia, ensuring interoperability with DataFrames.jl, CSV.jl, and hundreds of other packages that implement the standard.
Dimensional data are tables
DimensionalData.jl implements the Tables.jl interface for AbstractDimArray and AbstractDimStack. DimStack layers are unrolled so they are all the same size, and dimensions loop to match the length of the largest layer.
Columns are given the name of the array or stack layer, and the result of DD.name(dimension) for Dimension columns.
Looping of dimensions and stack layers is done lazily, and does not allocate unless collected.
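As a rough illustration of the column naming and row layout described above, the following sketch views a small named array through the generic Tables.jl API. It assumes Tables.jl is installed alongside DimensionalData.jl; the array size and the name `:data` are chosen only for this illustration.

```julia
using DimensionalData
import Tables

# A small named array; the array name becomes the data column name.
A = rand(X(1:2), Y(1:3); name=:data)

# View the array as a NamedTuple of columns through the Tables.jl interface.
# The columns may still be lazy; call `collect` on one to allocate it.
nt = Tables.columntable(A)

keys(nt)          # column names: dimension names first, then the array name
collect(nt.X)     # dimension values, repeated to cover every array element
length(nt.data)   # one row per array element
```

Any Tables.jl sink can consume the array in the same way, without the dimension columns being built up front.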
Materializing tables to DimArray or DimStack
DimArray and DimStack have fallback methods to materialize any Tables.jl-compatible table.
By default, columns such as X, Y, Z, and Band are treated as dimensions, and other columns as data. Pass a name keyword argument to choose which column(s) are used as data.
You have full control over which columns are dimensions, and over exactly what those dimensions look like. If you pass a Tuple of Symbols or dimension types (e.g. X) as the second argument, those columns are treated as dimensions. Passing a Tuple of dimensions preserves those dimensions, with values matched to the corresponding columns.
Materializing tables works even if the table is not ordered, and can handle missing values.
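A minimal sketch of the unordered case, assuming a DimensionalData.jl version that provides the table constructors described above, and that DataFrames.jl is installed. The array size, the name `:data`, and the variable names are illustrative only.

```julia
using DimensionalData
using DataFrames
using Random

# Build a small array and flatten it to a table.
A = rand(X(1:3), Y(1:4); name=:data)
df = DataFrame(A)

# Shuffle the rows so the table is no longer in array order.
shuffled = df[shuffle(1:nrow(df)), :]

# Rebuild the array, passing the original dimensions so that rows are
# matched to positions by their X and Y values rather than by row order.
B = DimArray(shuffled, dims(A); name=:data)
```

If rows are matched as described, `B` should hold the same values as `A` in the same positions.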
Example
using DimensionalData
using Dates
using DataFrames
Define some dimensions:
julia> x, y, c = X(1:10), Y(1:10), Dim{:category}('a':'z')
(↓ X 1:10,
→ Y 1:10,
↗ category 'a':1:'z')
julia> A = rand(x, y, c; name=:data)
┌ 10×10×26 DimArray{Float64, 3} data ┐
├────────────────────────────────────┴────────────── dims ┐
↓ X Sampled{Int64} 1:10 ForwardOrdered Regular Points,
→ Y Sampled{Int64} 1:10 ForwardOrdered Regular Points,
↗ category Categorical{Char} 'a':1:'z' ForwardOrdered
└─────────────────────────────────────────────────────────┘
[:, :, 1]
↓ → 1 2 3 … 8 9 10
1 0.960754 0.73427 0.71403 0.0450694 0.685225 0.66882
2 0.0965086 0.122976 0.731753 0.474659 0.391502 0.0648408
3 0.889194 0.356028 0.550553 0.348197 0.495366 0.433724
⋮ ⋱ ⋮
7 0.122571 0.245564 0.431383 0.258165 0.351907 0.99726
8 0.418412 0.939201 0.666574 0.0908083 0.802274 0.747231
9 0.224351 0.240351 0.0933704 0.773992 0.99531 0.365215
10 0.767136 0.390515 0.782823 … 0.91991 0.605097 0.113556
Converting to DataFrame
Arrays will have columns for each dimension, and only one data column:
julia> DataFrame(A)
2600×4 DataFrame
Row │ X Y category data
│ Int64 Int64 Char Float64
──────┼───────────────────────────────────
1 │ 1 1 a 0.960754
2 │ 2 1 a 0.0965086
3 │ 3 1 a 0.889194
4 │ 4 1 a 0.685603
5 │ 5 1 a 0.0987646
6 │ 6 1 a 0.191188
7 │ 7 1 a 0.122571
8 │ 8 1 a 0.418412
⋮ │ ⋮ ⋮ ⋮ ⋮
2594 │ 4 10 z 0.227142
2595 │ 5 10 z 0.635786
2596 │ 6 10 z 0.210417
2597 │ 7 10 z 0.849817
2598 │ 8 10 z 0.261216
2599 │ 9 10 z 0.0459272
2600 │ 10 10 z 0.434794
2585 rows omitted
Converting to CSV
We can also write arrays and stacks directly to CSV files with CSV.jl, or convert them to any other data type that supports the Tables.jl interface.
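The stack `st` used in the rest of this page is not defined in the text above. Judging by the `data1` and `data2` columns in the output, it is presumably a DimStack with two layers over the same three dimensions. A minimal sketch under that assumption (the values are random, so they will not match the printed output):

```julia
using DimensionalData

# The same dimensions as in the example above.
x, y, c = X(1:10), Y(1:10), Dim{:category}('a':'z')

# Assumed definition: two random layers sharing the dimensions x, y and c.
st = DimStack((data1 = rand(x, y, c), data2 = rand(x, y, c)))
```

Each layer becomes one data column when the stack is converted to a table.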
using CSV
CSV.write("dimstack.csv", st)
readlines("dimstack.csv")
2601-element Vector{String}:
"X,Y,category,data1,data2"
"1,1,a,0.2674330482715843,0.5501481631111826"
"2,1,a,0.5992407552660244,0.09300753748828394"
"3,1,a,0.19219227965820063,0.48952511607945026"
"4,1,a,0.6072910004472037,0.7938317326707394"
"5,1,a,0.9219584479428687,0.0019198597596568057"
"6,1,a,0.449490631413745,0.8612776980335002"
"7,1,a,0.5811306546643178,0.20758428874582302"
"8,1,a,0.1948490023468078,0.023646798570656102"
"9,1,a,0.20144095329862288,0.11925244363082943"
⋮
"2,10,z,0.9341886269251364,0.6005065544080029"
"3,10,z,0.29448593792551514,0.36851882799081104"
"4,10,z,0.8872944242976297,0.23350386812772128"
"5,10,z,0.012096736709184541,0.7959265671836858"
"6,10,z,0.26634216134156385,0.3777991041100621"
"7,10,z,0.4858762080349691,0.2276004407628871"
"8,10,z,0.27135422404853515,0.1132529224292641"
"9,10,z,0.25236585444042137,0.25073570045665916"
"10,10,z,0.9656269833042522,0.40747087988600206"
Converting a DataFrame to a DimArray or DimStack
The DataFrame we use will have 5 columns: X, Y, category, data1, and data2:
julia> df = DataFrame(st)
2600×5 DataFrame
Row │ X Y category data1 data2
│ Int64 Int64 Char Float64 Float64
──────┼───────────────────────────────────────────────
1 │ 1 1 a 0.267433 0.550148
2 │ 2 1 a 0.599241 0.0930075
3 │ 3 1 a 0.192192 0.489525
4 │ 4 1 a 0.607291 0.793832
5 │ 5 1 a 0.921958 0.00191986
6 │ 6 1 a 0.449491 0.861278
7 │ 7 1 a 0.581131 0.207584
8 │ 8 1 a 0.194849 0.0236468
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮
2594 │ 4 10 z 0.887294 0.233504
2595 │ 5 10 z 0.0120967 0.795927
2596 │ 6 10 z 0.266342 0.377799
2597 │ 7 10 z 0.485876 0.2276
2598 │ 8 10 z 0.271354 0.113253
2599 │ 9 10 z 0.252366 0.250736
2600 │ 10 10 z 0.965627 0.407471
2585 rows omitted
Converting this DataFrame to a DimArray without other arguments will read the category column as data and ignore data1 and data2:
julia> DimArray(df)
┌ 10×10 DimArray{Char, 2} category ┐
├──────────────────────────────────┴────────────────── dims ┐
↓ X Sampled{Int64} 1:1:10 ForwardOrdered Regular Points,
→ Y Sampled{Int64} 1:1:10 ForwardOrdered Regular Points
└───────────────────────────────────────────────────────────┘
↓ → 1 2 3 4 5 6 7 8 9 10
1 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z'
2 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z'
3 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z'
4 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z'
⋮ ⋮ ⋮
7 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z'
8 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z'
9 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z'
10 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z' 'z'
Specify dimension names to ensure these columns are treated as dimensions. Now data1 is read in instead.
julia> DimArray(df, (X,Y,:category))
┌ 10×10×26 DimArray{Float64, 3} data1 ┐
├─────────────────────────────────────┴──────────────── dims ┐
↓ X Sampled{Int64} 1:1:10 ForwardOrdered Regular Points,
→ Y Sampled{Int64} 1:1:10 ForwardOrdered Regular Points,
↗ category Categorical{Char} ['a', …, 'z'] ForwardOrdered
└────────────────────────────────────────────────────────────┘
[:, :, 1]
↓ → 1 2 3 … 8 9 10
1 0.267433 0.828604 0.359421 0.285943 0.967824 0.107233
2 0.599241 0.518938 0.341133 0.486559 0.162516 0.934189
3 0.192192 0.336376 0.636476 0.609166 0.687569 0.294486
⋮ ⋱ ⋮
7 0.581131 0.364945 0.450703 0.0325131 0.645678 0.485876
8 0.194849 0.975511 0.612828 0.888721 0.890574 0.271354
9 0.201441 0.163103 0.142055 0.049689 0.391894 0.252366
10 0.394448 0.755939 0.54624 … 0.156388 0.210664 0.965627
You can also pass in the actual dimensions.
julia> DimArray(df, dims(st))
┌ 10×10×26 DimArray{Float64, 3} data1 ┐
├─────────────────────────────────────┴───────────── dims ┐
↓ X Sampled{Int64} 1:10 ForwardOrdered Regular Points,
→ Y Sampled{Int64} 1:10 ForwardOrdered Regular Points,
↗ category Categorical{Char} 'a':1:'z' ForwardOrdered
└─────────────────────────────────────────────────────────┘
[:, :, 1]
↓ → 1 2 3 … 8 9 10
1 0.267433 0.828604 0.359421 0.285943 0.967824 0.107233
2 0.599241 0.518938 0.341133 0.486559 0.162516 0.934189
3 0.192192 0.336376 0.636476 0.609166 0.687569 0.294486
⋮ ⋱ ⋮
7 0.581131 0.364945 0.450703 0.0325131 0.645678 0.485876
8 0.194849 0.975511 0.612828 0.888721 0.890574 0.271354
9 0.201441 0.163103 0.142055 0.049689 0.391894 0.252366
10 0.394448 0.755939 0.54624 … 0.156388 0.210664 0.965627
Pass in a name argument to read in data2 instead.
julia> DimArray(df, dims(st); name = :data2)
┌ 10×10×26 DimArray{Float64, 3} data2 ┐
├─────────────────────────────────────┴───────────── dims ┐
↓ X Sampled{Int64} 1:10 ForwardOrdered Regular Points,
→ Y Sampled{Int64} 1:10 ForwardOrdered Regular Points,
↗ category Categorical{Char} 'a':1:'z' ForwardOrdered
└─────────────────────────────────────────────────────────┘
[:, :, 1]
↓ → 1 2 3 … 8 9 10
1 0.550148 0.508679 0.989861 0.493978 0.901304 0.305367
2 0.0930075 0.940407 0.533085 0.00946023 0.722128 0.814612
3 0.489525 0.908901 0.84334 0.266004 0.295747 0.381676
⋮ ⋱ ⋮
7 0.207584 0.0427491 0.752123 0.698195 0.258356 0.00349739
8 0.0236468 0.44195 0.351499 0.0513387 0.593542 0.876841
9 0.119252 0.561738 0.406564 0.895753 0.967533 0.0455456
10 0.645555 0.647965 0.84088 … 0.977185 0.509992 0.626845