
Group By

DimensionalData.jl provides a groupby function for dimensional grouping. This guide covers:

  • simple grouping with a function

  • grouping with Bins

  • grouping with another existing AbstractDimArray or Dimension

Grouping functions

Let's look at the kinds of functions that can be used to group DateTime values. Other types follow the same principles, but are usually simpler.

First, load some packages:

julia
using DimensionalData
using Dates
using Statistics
const DD = DimensionalData

Now create a demo DateTime range with hourly steps:

julia
julia> tempo = range(DateTime(2000), step=Hour(1), length=365*24*2)
DateTime("2000-01-01T00:00:00"):Hour(1):DateTime("2001-12-30T23:00:00")

Let's see how some common functions work.

The hour function transforms each value to its hour of the day, one of the integers 0:23:

julia
julia> hour.(tempo)
17520-element Vector{Int64}:
  0
  1
  2
  3
  4
  5
  6
  7
  8
  9
  ⋮
 15
 16
 17
 18
 19
 20
 21
 22
 23
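Because tempo starts at midnight and spans a whole number of days (17520 / 24 = 730), every hour of the day occurs equally often. The following is a quick check using only the Dates standard library (no DimensionalData involved) of the group sizes that hour would produce:

```julia
using Dates

tempo = range(DateTime(2000), step=Hour(1), length=365*24*2)
hours = hour.(tempo)

# Count how many timestamps fall into each hour-of-day group
hour_counts = Dict(h => count(==(h), hours) for h in 0:23)

# Every hour appears once per day, so each group holds 730 timestamps
hour_counts[0]   # 730
```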

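Tuple groupings

Functions that return tuples group by every component at once; yearmonth, for example, returns (year, month) pairs: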

julia
julia> yearmonth.(tempo)
17520-element Vector{Tuple{Int64, Int64}}:
 (2000, 1)
 (2000, 1)
 (2000, 1)
 (2000, 1)
 (2000, 1)
 (2000, 1)
 (2000, 1)
 (2000, 1)
 (2000, 1)
 (2000, 1)
 ⋮
 (2001, 12)
 (2001, 12)
 (2001, 12)
 (2001, 12)
 (2001, 12)
 (2001, 12)
 (2001, 12)
 (2001, 12)
 (2001, 12)
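The practical difference from grouping by month alone is the number of groups: month merges the two years, while yearmonth keeps them apart. A small stdlib-only sketch (plain Dates, not DimensionalData) that counts the distinct keys each function produces for tempo:

```julia
using Dates

tempo = range(DateTime(2000), step=Hour(1), length=365*24*2)

# month merges both years into 12 keys; yearmonth keeps them as (year, month) pairs
month_keys = unique(month.(tempo))
yearmonth_keys = unique(yearmonth.(tempo))

length(month_keys)      # 12
length(yearmonth_keys)  # 24
```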

Grouping and reducing

Let's define an array with a time dimension of the times used above:

julia
julia> A = rand(X(1:0.01:2), Ti(tempo))
101×17520 DimArray{Float64, 2}
├────────────────────────────────┴─────────────────────────────────────── dims ┐
X  Sampled{Float64} 1.0:0.01:2.0 ForwardOrdered Regular Points,
Ti Sampled{DateTime} DateTime("2000-01-01T00:00:00"):Hour(1):DateTime("2001-12-30T23:00:00") ForwardOrdered Regular Points
└──────────────────────────────────────────────────────────────────────────────┘
    2000-01-01T00:00:00   2000-01-01T01:00:00  …  2001-12-30T23:00:00
 1.0   0.89757               0.795755                 0.905858
 1.01  0.969026              0.785993                 0.477727
 1.02  0.106472              0.646867                 0.807257
 1.03  0.283631              0.905428                 0.0958593
 ⋮                                                 ⋱  ⋮
 1.96  0.0536623             0.11609                  0.219831
 1.97  0.830655              0.673995                 0.244589
 1.98  0.445628              0.54935                  0.00358622
 1.99  0.571899              0.310328              …  0.355619
 2.0   0.488519              0.359731                 0.328946

Group by month, using the month function:

julia
julia> groups = groupby(A, Ti=>month)
12-element DimGroupByArray{DimArray{Float64,1},1}
├───────────────────────────────────────────────────┴──────────────────── dims ┐
Ti Sampled{Int64} [1, 2, …, 11, 12] ForwardOrdered Irregular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{Symbol, Any} with 1 entry:
  :groupby => :Ti=>month
├────────────────────────────────────────────────────────────────── group dims ┤
X, Ti
└──────────────────────────────────────────────────────────────────────────────┘
  1  101×1488 DimArray
  2  101×1368 DimArray
  3  101×1488 DimArray
  ⋮
 11  101×1440 DimArray
 12  101×1464 DimArray
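The group sizes follow directly from the calendar. January holds 2 × 31 × 24 = 1488 hourly steps, February only (29 + 28) × 24 = 1368 because 2000 is a leap year, and December (31 + 30) × 24 = 1464 because the range stops at 2001-12-30T23:00. A stdlib-only sketch that reproduces the second dimension of each group without DimensionalData:

```julia
using Dates

tempo = range(DateTime(2000), step=Hour(1), length=365*24*2)
months = month.(tempo)

# Number of hourly timestamps in each month group (both years combined)
group_sizes = [count(==(m), months) for m in 1:12]
```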

We can take the mean of each group by broadcasting over them:

julia
julia> mean.(groups)
12-element DimArray{Float64, 1}
├─────────────────────────────────┴────────────────────────────────────── dims ┐
Ti Sampled{Int64} [1, 2, …, 11, 12] ForwardOrdered Irregular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{Symbol, Any} with 1 entry:
  :groupby => :Ti=>month
└──────────────────────────────────────────────────────────────────────────────┘
  1  0.49998
  2  0.499823
  3  0.499881
  4  0.500808
  ⋮
 10  0.499447
 11  0.500349
 12  0.499943
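Conceptually, grouping and reducing selects the time steps that share a key and applies the reduction to each selection. A rough plain-Julia equivalent is sketched below, using a random Matrix as a hypothetical stand-in for A (rows play the role of X, columns of Ti); it is an illustration of the idea, not how groupby is implemented:

```julia
using Dates, Statistics

tempo = range(DateTime(2000), step=Hour(1), length=365*24*2)
M = rand(101, length(tempo))   # stand-in for A: 101 rows (X) × 17520 columns (Ti)

# Select the columns belonging to each month, then reduce each selection to its mean
months = month.(tempo)
monthly_means = [mean(M[:, months .== m]) for m in 1:12]
```

Unlike this plain vector, the groupby result keeps the month keys as a Ti lookup, so each mean stays labelled with its group.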

Binning

Sometimes we want to further aggregate our groups after running a function, or just bin the raw data directly. We can use the Bins wrapper to do this.

For quick analysis, we can split the values returned by the function into N bins of equal width. Here the month values are split into 4 bins:

julia
julia> groupby(A, Ti=>Bins(month, 4))
4-element DimGroupByArray{DimArray{Float64,1},1}
├──────────────────────────────────────────────────┴───────────────────── dims ┐
Ti Sampled{IntervalSets.Interval{:closed, :open, Float64}} [1.0 .. 3.75275 (closed-open), 3.75275 .. 6.5055 (closed-open), 6.5055 .. 9.25825 (closed-open), 9.25825 .. 12.011 (closed-open)] ForwardOrdered Irregular Intervals{Start}
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{Symbol, Any} with 1 entry:
  :groupby => :Ti=>Bins(month, 4)…
├────────────────────────────────────────────────────────────────── group dims ┤
X, Ti
└──────────────────────────────────────────────────────────────────────────────┘
 1.0 .. 3.75275 (closed-open)     101×4344 DimArray
 3.75275 .. 6.5055 (closed-open)  101×4368 DimArray
 6.5055 .. 9.25825 (closed-open)  101×4416 DimArray
 9.25825 .. 12.011 (closed-open)  101×4392 DimArray

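Because the bins are closed-open, the upper edge has to be padded slightly so that the maximum value (month 12) still falls inside the last bin. As a result, the lookup values of the output, such as 3.75275, are less than ideal.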
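The effect of the padding can be illustrated without DimensionalData. This sketch assumes closed-open bins, as printed above, and copies the padded edges from that output; binindex is a hypothetical helper, and none of this is how Bins works internally:

```julia
using Dates

tempo = range(DateTime(2000), step=Hour(1), length=365*24*2)
months = month.(tempo)

# Edges copied from the printed lookup (upper edge padded to 12.011)
padded = [1.0, 3.75275, 6.5055, 9.25825, 12.011]
# Equally spaced edges over the raw range 1..12, with no padding
unpadded = collect(range(1.0, 12.0; length=5))

# Index i of the closed-open bin [edges[i], edges[i+1]) containing x;
# 0 or length(edges) means x lies outside every bin
binindex(edges, x) = searchsortedlast(edges, x)

# Bin sizes with the padded edges match the 4344/4368/4416/4392 columns above
bin_sizes = [count(m -> binindex(padded, m) == i, months) for i in 1:4]

# Without padding, month 12 lands on the last edge and falls outside bin 4
binindex(unpadded, 12)   # 5
```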

Group by Dimension

We can also group by Dimensions, and by any object that has a dims method.

Trivially, grouping by an object's own dimension is similar to eachslice:

julia
julia> groupby(A, dims(A, Ti))
17520-element DimGroupByArray{DimArray{Float64,1},1}
├──────────────────────────────────────────────────────┴───────────────── dims ┐
Ti Sampled{DateTime} DateTime("2000-01-01T00:00:00"):Hour(1):DateTime("2001-12-30T23:00:00") ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{Symbol, Any} with 1 entry:
  :groupby => :Ti=>[DateTime("2000-01-01T00:00:00"), DateTime("2000-01-01T01:00…
├────────────────────────────────────────────────────────────────── group dims ┤
X, Ti
└──────────────────────────────────────────────────────────────────────────────┘
 2000-01-01T00:00:00  101×1 DimArray
 2000-01-01T01:00:00  101×1 DimArray
 2000-01-01T02:00:00  101×1 DimArray
 ⋮
 2001-12-30T22:00:00  101×1 DimArray
 2001-12-30T23:00:00  101×1 DimArray
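For comparison, Base's eachslice over the second dimension of a plain matrix also yields one slice per column. In this stdlib-only sketch a tiny random matrix stands in for A. Note that eachslice drops the sliced dimension and returns vectors, whereas the groupby output above keeps each group as a 101×1 DimArray:

```julia
# Tiny stand-in for A: 3 rows (X) × 5 columns (Ti)
M = rand(3, 5)

# One slice per column, analogous to grouping A by its own Ti dimension
slices = collect(eachslice(M; dims=2))

length(slices)        # 5
size(first(slices))   # (3,)
```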

TODO: Apply custom function (e.g. normalization) to grouped output.
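As a rough illustration of what such a normalization could look like, the sketch below computes z-scores within each month group on a plain matrix. It uses only the Dates and Statistics standard libraries, works on a hypothetical stand-in for A rather than through groupby, and is not a DimensionalData API:

```julia
using Dates, Statistics

tempo = range(DateTime(2000), step=Hour(1), length=365*24*2)
M = rand(11, length(tempo))   # small stand-in for A
months = month.(tempo)

# Normalize within each month group: subtract the group mean, divide by its std
normalized = similar(M)
for m in 1:12
    cols = months .== m
    group = M[:, cols]
    normalized[:, cols] = (group .- mean(group)) ./ std(group)
end
```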