
Tables and DataFrames

Tables.jl provides an ecosystem-wide interface to tabular data in Julia, ensuring interoperability with DataFrames.jl, CSV.jl, and hundreds of other packages that implement the standard.

Dimensional data are tables

DimensionalData.jl implements the Tables.jl interface for AbstractDimArray and AbstractDimStack. DimStack layers are unrolled so they are all the same size, and dimensions loop to match the length of the largest layer.

Data columns are named after the array or stack layer. Dimension columns are named with the result of DD.name(dimension).

Looping of dimensions and stack layers is done lazily, and does not allocate unless collected.
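As a minimal sketch of the interface described above, the generic Tables.jl accessors can be called on a DimArray directly. It assumes Tables.jl is installed in the active environment alongside DimensionalData.jl; the small array `B` and its dimension sizes are made up for illustration.

```julia
using DimensionalData
using Tables  # assumed to be available in the active environment

# A small named array: 3 X values by 2 Y values (illustrative sizes).
B = rand(X(1:3), Y(1:2); name=:data)

# DimArrays advertise themselves as Tables.jl tables.
Tables.istable(B)

# Column view of the array: one column per dimension, plus one data column.
tbl = Tables.columns(B)
Tables.columnnames(tbl)   # dimension names first, then the array name

# Columns are looked up by name; each has one entry per array element.
xs = Tables.getcolumn(tbl, :X)
vals = Tables.getcolumn(tbl, :data)
length(xs) == length(B)
```

Any package that consumes Tables.jl sources goes through these same accessors, which is why no DimensionalData-specific glue code is needed.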

Materializing tables to DimArray or DimStack

DimArray and DimStack have fallback methods to materialize any Tables.jl-compatible table.

By default, columns such as X, Y, Z, and Band are treated as dimensions, and all other columns as data. Pass a name keyword argument to choose which column(s) are used as data.

You have full control over which columns are dimensions - and what those dimensions look like exactly. If you pass a Tuple of Symbol or dimension types (e.g. X) as the second argument, those columns are treated as dimensions. Passing a Tuple of dimensions preserves these dimensions - with values matched to the corresponding columns.

Materializing tables works even if the table is not ordered, and can handle missing values.
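The following is a small sketch of materializing an unordered table, not a definitive recipe. The DataFrame `tbl`, its column `value`, and the numbers in it are hypothetical; the call form mirrors the one described above, with dimension types as the second argument and the data column chosen by `name`.

```julia
using DimensionalData
using DataFrames

# A hypothetical 2×2 table whose rows are deliberately out of order.
tbl = DataFrame(
    X     = [2, 1, 2, 1],
    Y     = [1, 1, 2, 2],
    value = [20.0, 10.0, 40.0, 30.0],
)

# Name the dimension columns explicitly and pick the data column by name.
M = DimArray(tbl, (X, Y); name=:value)

# Look values up by coordinate rather than by row position.
M[X(At(2)), Y(At(1))]
```

Indexing with `At` selectors checks the result by coordinate, so it does not depend on the order in which the rows were supplied.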

Example

julia
using DimensionalData
using Dates
using DataFrames

Define some dimensions:

julia
julia> x, y, c = X(1:10), Y(1:10), Dim{:category}('a':'z')
(X 1:10,
Y 1:10,
category 'a':1:'z')
julia
julia> A = rand(x, y, c; name=:data)
10×10×26 DimArray{Float64, 3} data
├────────────────────────────────────┴────────────── dims ┐
X Sampled{Int64} 1:10 ForwardOrdered Regular Points,
Y Sampled{Int64} 1:10 ForwardOrdered Regular Points,
category Categorical{Char} 'a':1:'z' ForwardOrdered
└─────────────────────────────────────────────────────────┘
[:, :, 1]
  1         2         3      …  8         9         10
  1    0.599241  0.518938  0.341133     0.486559  0.162516   0.934189
  2    0.192192  0.336376  0.636476     0.840593  0.687569   0.294486
  3    0.607291  0.963657  0.353968     0.120955  0.434286   0.887294
  ⋮                                  ⋱                       ⋮
  7    0.194849  0.975511  0.612828     0.888721  0.890574   0.436622
  8    0.364097  0.163103  0.142055     0.049689  0.259847   0.570725
  9    0.394448  0.755939  0.54624      0.156388  0.210664   0.966517
 10    0.828604  0.359421  0.51621   …  0.828161  0.107233   0.74172

Converting to DataFrame

Arrays are converted to a table with one column for each dimension and a single data column:

julia
julia> DataFrame(A)
2600×4 DataFrame
  Row │ X      Y      category  data
      │ Int64  Int64  Char      Float64
──────┼───────────────────────────────────
    1 │     1      1  a         0.599241
    2 │     2      1  a         0.192192
    3 │     3      1  a         0.607291
    4 │     4      1  a         0.921958
    5 │     5      1  a         0.449491
    6 │     6      1  a         0.581131
    7 │     7      1  a         0.194849
    8 │     8      1  a         0.364097
  ⋮   │   ⋮      ⋮       ⋮          ⋮
 2594 │     4     10  z         0.852872
 2595 │     5     10  z         0.0958843
 2596 │     6     10  z         0.315302
 2597 │     7     10  z         0.236866
 2598 │     8     10  z         0.894053
 2599 │     9     10  z         0.350024
 2600 │    10     10  z         0.417756
                         2585 rows omitted
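Stacks convert the same way, with one column per dimension and one per layer. The sketch below uses small, made-up dimension sizes and a hypothetical stack `small_st` whose first layer has fewer dimensions than its second, to show how the smaller layer is repeated to match the larger one.

```julia
using DimensionalData
using DataFrames

# Illustrative dimensions: 3 × 2 × 2.
sx, sy, sc = X(1:3), Y(1:2), Dim{:category}('a':'b')

# data1 is 2-dimensional, data2 is 3-dimensional.
small_st = DimStack((data1 = rand(sx, sy), data2 = rand(sx, sy, sc)))

t = DataFrame(small_st)
names(t)   # dimension columns first, then one column per layer
nrow(t)    # one row per element of the largest layer

# The 2-D layer repeats for every category value.
t.data1[1:6] == t.data1[7:12]
```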

Converting to CSV

We can also write arrays and stacks directly to CSV files with CSV.jl, or to any other data type supporting the Tables.jl interface.

julia
using CSV
# `st` is not defined earlier on this page. This definition is an assumption that is
# consistent with the columns and values shown below: a 2-D layer and a 3-D layer.
st = DimStack((data1 = rand(x, y), data2 = rand(x, y, c)))
CSV.write("dimstack.csv", st)
readlines("dimstack.csv")
2601-element Vector{String}:
 "X,Y,category,data1,data2"
 "1,1,a,0.55560637324799,0.845200516911609"
 "2,1,a,0.10276733254788795,0.9104238640380062"
 "3,1,a,0.22237128922242078,0.8268020755919178"
 "4,1,a,0.5501481631111826,0.9447511416331498"
 "5,1,a,0.09300753748828394,0.15945803739833375"
 "6,1,a,0.48952511607945026,0.6146564273146751"
 "7,1,a,0.7938317326707394,0.9770663775826343"
 "8,1,a,0.0019198597596568057,0.798655984630017"
 "9,1,a,0.44833963865079907,0.40268027828179853"
 ⋮
 "2,10,z,0.9675326879984427,0.41940525122635797"
 "3,10,z,0.5099922507050859,0.07986058669268159"
 "4,10,z,0.3053673139967894,0.4496996354823414"
 "5,10,z,0.8146121812750928,0.9452913850518949"
 "6,10,z,0.38167574879167476,0.24524306337289326"
 "7,10,z,0.17977958441149666,0.1985699519321249"
 "8,10,z,0.7044663405368152,0.694278906020718"
 "9,10,z,0.5697400488168892,0.20636222545147498"
 "10,10,z,0.8560905731682101,0.8428656510212863"

Converting a DataFrame to a DimArray or DimStack

The DataFrame we use has 5 columns: X, Y, category, data1, and data2.

julia
julia> df = DataFrame(st)
2600×5 DataFrame
  Row │ X      Y      category  data1       data2
      │ Int64  Int64  Char      Float64     Float64
──────┼──────────────────────────────────────────────
    1 │     1      1  a         0.555606    0.845201
    2 │     2      1  a         0.102767    0.910424
    3 │     3      1  a         0.222371    0.826802
    4 │     4      1  a         0.550148    0.944751
    5 │     5      1  a         0.0930075   0.159458
    6 │     6      1  a         0.489525    0.614656
    7 │     7      1  a         0.793832    0.977066
    8 │     8      1  a         0.00191986  0.798656
  ⋮   │   ⋮      ⋮       ⋮          ⋮          ⋮
 2594 │     4     10  z         0.305367    0.4497
 2595 │     5     10  z         0.814612    0.945291
 2596 │     6     10  z         0.381676    0.245243
 2597 │     7     10  z         0.17978     0.19857
 2598 │     8     10  z         0.704466    0.694279
 2599 │     9     10  z         0.56974     0.206362
 2600 │    10     10  z         0.856091    0.842866
                                    2585 rows omitted

Converting this DataFrame to a DimArray without other arguments treats only X and Y as dimensions. The category column is therefore read as data, and data1 and data2 are ignored:

julia
julia> DimArray(df)
10×10 DimArray{Char, 2} category
├──────────────────────────────────┴────────────────── dims ┐
X Sampled{Int64} 1:1:10 ForwardOrdered Regular Points,
Y Sampled{Int64} 1:1:10 ForwardOrdered Regular Points
└───────────────────────────────────────────────────────────┘
  1     2     3     4     5     6     7     8     9     10
  1     'z'   'z'   'z'   'z'   'z'   'z'   'z'   'z'   'z'    'z'
  2     'z'   'z'   'z'   'z'   'z'   'z'   'z'   'z'   'z'    'z'
  3     'z'   'z'   'z'   'z'   'z'   'z'   'z'   'z'   'z'    'z'
  4     'z'   'z'   'z'   'z'   'z'   'z'   'z'   'z'   'z'    'z'
  ⋮                            ⋮                              ⋮
  7     'z'   'z'   'z'   'z'   'z'   'z'   'z'   'z'   'z'    'z'
  8     'z'   'z'   'z'   'z'   'z'   'z'   'z'   'z'   'z'    'z'
  9     'z'   'z'   'z'   'z'   'z'   'z'   'z'   'z'   'z'    'z'
 10     'z'   'z'   'z'   'z'   'z'   'z'   'z'   'z'   'z'    'z'

Specify the dimension names to ensure these columns are treated as dimensions. Now data1 is read in instead.

julia
julia> DimArray(df, (X,Y,:category))
10×10×26 DimArray{Float64, 3} data1
├─────────────────────────────────────┴──────────────── dims ┐
X Sampled{Int64} 1:1:10 ForwardOrdered Regular Points,
Y Sampled{Int64} 1:1:10 ForwardOrdered Regular Points,
category Categorical{Char} ['a', …, 'z'] ForwardOrdered
└────────────────────────────────────────────────────────────┘
[:, :, 1]
  1           2          3         …  8          9         10
  1    0.555606    0.140713   0.44195      0.8843     0.224013   0.182997
  2    0.102767    0.119252   0.561738     0.654229   0.339039   0.967533
  3    0.222371    0.645555   0.647965     0.719804   0.333765   0.509992
  ⋮                                     ⋱                        ⋮
  7    0.793832    0.840653   0.267666     0.515599   0.662927   0.17978
  8    0.00191986  0.438789   0.349911     0.0595599  0.594161   0.704466
  9    0.44834     0.796331   0.566217     0.691596   0.639773   0.56974
 10    0.42684     0.0427491  0.752123  …  0.698195   0.757712   0.856091

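You can also pass in the actual dimensions, which are then preserved as they are. Note the 1:10 and 'a':1:'z' lookups in the output, rather than the 1:1:10 and ['a', …, 'z'] lookups reconstructed from the columns.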

julia
julia> DimArray(df, dims(st))
10×10×26 DimArray{Float64, 3} data1
├─────────────────────────────────────┴───────────── dims ┐
X Sampled{Int64} 1:10 ForwardOrdered Regular Points,
Y Sampled{Int64} 1:10 ForwardOrdered Regular Points,
category Categorical{Char} 'a':1:'z' ForwardOrdered
└─────────────────────────────────────────────────────────┘
[:, :, 1]
  1           2          3         …  8          9         10
  1    0.555606    0.140713   0.44195      0.8843     0.224013   0.182997
  2    0.102767    0.119252   0.561738     0.654229   0.339039   0.967533
  3    0.222371    0.645555   0.647965     0.719804   0.333765   0.509992
  ⋮                                     ⋱                        ⋮
  7    0.793832    0.840653   0.267666     0.515599   0.662927   0.17978
  8    0.00191986  0.438789   0.349911     0.0595599  0.594161   0.704466
  9    0.44834     0.796331   0.566217     0.691596   0.639773   0.56974
 10    0.42684     0.0427491  0.752123  …  0.698195   0.757712   0.856091

Pass a name keyword argument to read in data2 instead.

julia
julia> DimArray(df, dims(st); name = :data2)
10×10×26 DimArray{Float64, 3} data2
├─────────────────────────────────────┴───────────── dims ┐
X Sampled{Int64} 1:10 ForwardOrdered Regular Points,
Y Sampled{Int64} 1:10 ForwardOrdered Regular Points,
category Categorical{Char} 'a':1:'z' ForwardOrdered
└─────────────────────────────────────────────────────────┘
[:, :, 1]
  1          2         3        …  8         9         10
  1    0.845201   0.94177   0.490656     0.482435  0.386324   0.859582
  2    0.910424   0.427826  0.862728     0.248024  0.293925   0.0595812
  3    0.826802   0.512368  0.938823     0.655281  0.33373    0.675824
  ⋮                                   ⋱                       ⋮
  7    0.977066   0.628265  0.887326     0.502325  0.9802     0.308634
  8    0.798656   0.471819  0.88707      0.944636  0.751848   0.441872
  9    0.40268    0.021849  0.121545     0.249602  0.793648   0.779541
 10    0.0522006  0.933603  0.228074  …  0.289268  0.788237   0.548612