Group By

DimensionalData.jl provides a groupby function for dimensional grouping. This guide will cover:

  • simple grouping with a function

  • grouping with Bins

  • grouping with another existing AbstractDimArray or Dimension

Grouping functions

Let's look at the kinds of functions that can be used to group DateTime values. Other types follow the same principles, but are usually simpler.

First load some packages:

julia
using DimensionalData
using Dates
using Statistics
const DD = DimensionalData
DimensionalData

Now create a demo DateTime range:

julia
julia> tempo = range(DateTime(2000), step=Hour(1), length=365*24*2)
Dates.DateTime("2000-01-01T00:00:00"):Dates.Hour(1):Dates.DateTime("2001-12-30T23:00:00")

Let's see how some common functions work.

The hour function transforms each value to its hour of the day, an integer in 0:23:

julia
julia> hour.(tempo)
17520-element Vector{Int64}:
  0
  1
  2
  3
  4
  5
  6
  7
  8
  9
  ⋮
 15
 16
 17
 18
 19
 20
 21
 22
 23

Tuple groupings

Some functions return a tuple for each value. For example, yearmonth returns a (year, month) tuple:

julia
julia> yearmonth.(tempo)
17520-element Vector{Tuple{Int64, Int64}}:
 (2000, 1)
 (2000, 1)
 (2000, 1)
 (2000, 1)
 (2000, 1)
 (2000, 1)
 (2000, 1)
 (2000, 1)
 (2000, 1)
 (2000, 1)
 ⋮
 (2001, 12)
 (2001, 12)
 (2001, 12)
 (2001, 12)
 (2001, 12)
 (2001, 12)
 (2001, 12)
 (2001, 12)
 (2001, 12)
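
Any function that maps a lookup value to a key can play the same role as hour or yearmonth. As a minimal sketch, the hypothetical helper season below (not part of Dates or DimensionalData.jl) labels each timestamp with a three-month season. The labels and the month groupings are assumptions chosen only for illustration.

```julia
using Dates

tempo = range(DateTime(2000), step=Hour(1), length=365*24*2)

# Hypothetical helper (an assumption for illustration, not a library function):
# map a DateTime to a three-month season label.
function season(t)
    m = month(t)
    if m in (12, 1, 2)
        return :DJF
    elseif m in (3, 4, 5)
        return :MAM
    elseif m in (6, 7, 8)
        return :JJA
    else
        return :SON
    end
end

labels = season.(tempo)   # one Symbol per timestamp
unique(labels)            # the distinct group keys, in order of first appearance
```

Like hour and yearmonth, season returns one key per value, and values that share a key belong to the same group.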

Grouping and reducing

Let's define an array whose time dimension uses the range created above:

julia
julia> A = rand(X(1:0.01:2), Ti(tempo))
╭───────────────────────────────╮
101×17520 DimArray{Float64,2}
├───────────────────────────────┴──────────────────────────────────────── dims ┐
X  Sampled{Float64} 1.0:0.01:2.0 ForwardOrdered Regular Points,
Ti Sampled{Dates.DateTime} Dates.DateTime("2000-01-01T00:00:00"):Dates.Hour(1):Dates.DateTime("2001-12-30T23:00:00") ForwardOrdered Regular Points
└──────────────────────────────────────────────────────────────────────────────┘
    2000-01-01T00:00:00   2000-01-01T01:00:00  …  2001-12-30T23:00:00
 1.0   0.591659              0.155785                 0.906121
 1.01  0.260031              0.768952                 0.635174
 1.02  0.339674              0.638798                 0.671752
 1.03  0.501871              0.694452                 0.915702
 ⋮                                                 ⋱
 1.96  0.83551               0.570144                 0.39752
 1.97  0.217929              0.711494                 0.269388
 1.98  0.442426              0.994229                 0.208125
 1.99  0.267023              0.330585                 0.795935
 2.0   0.614788              0.546301              …  0.093981

Group by month, using the month function:

julia
julia> groups = groupby(A, Ti=>month)
╭───────────────────────────────────────────────────╮
12-element DimGroupByArray{DimArray{Float64,1},1}
├───────────────────────────────────────────────────┴───────────── dims ┐
Ti Sampled{Int64} [1, 2, …, 11, 12] ForwardOrdered Irregular Points
├───────────────────────────────────────────────────────────── metadata ┤
  Dict{Symbol, Any} with 1 entry:
  :groupby => :Ti=>month
├─────────────────────────────────────────────────────────── group dims ┤
X, Ti
└───────────────────────────────────────────────────────────────────────┘
  1  101×1488 DimArray
  2  101×1368 DimArray
  3  101×1488 DimArray

 11  101×1440 DimArray
 12  101×1464 DimArray
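
One rough way to picture what this grouping does is to collect, for each distinct key, the indices along Ti that share it. The sketch below does that in plain Julia. The group_indices helper is hypothetical and only illustrates the idea; it is not how DimensionalData.jl implements groupby.

```julia
using Dates

tempo = range(DateTime(2000), step=Hour(1), length=365*24*2)

# Hypothetical helper: for every distinct key returned by `f`,
# collect the indices of the values that produce it.
function group_indices(f, xs)
    groups = Dict{Any,Vector{Int}}()
    for (i, x) in enumerate(xs)
        push!(get!(() -> Int[], groups, f(x)), i)
    end
    return groups
end

idx = group_indices(month, tempo)

# Number of hourly steps that fall in January and in February
length(idx[1]), length(idx[2])   # (1488, 1368)
```

These counts match the second axis of the first two groups shown above (101×1488 and 101×1368): each group keeps all of X and only the time steps belonging to its month.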

We can take the mean of each group by broadcasting over them:

julia
julia> mean.(groups)
╭────────────────────────────────╮
12-element DimArray{Float64,1}
├────────────────────────────────┴──────────────────────────────── dims ┐
Ti Sampled{Int64} [1, 2, …, 11, 12] ForwardOrdered Irregular Points
├───────────────────────────────────────────────────────────── metadata ┤
  Dict{Symbol, Any} with 1 entry:
  :groupby => :Ti=>month
└───────────────────────────────────────────────────────────────────────┘
  1  0.500215
  2  0.500742
  3  0.50073
  4  0.500234

 10  0.499835
 11  0.500627
 12  0.500363
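
The same pattern should also work with the tuple-returning functions shown earlier. The sketch below groups by yearmonth instead of month, so each of the 24 calendar months in the two-year range gets its own group. The variable names are arbitrary, and the values differ on every run because the data is random.

```julia
using DimensionalData
using Dates
using Statistics

tempo = range(DateTime(2000), step=Hour(1), length=365*24*2)
A = rand(X(1:0.01:2), Ti(tempo))

# Group by (year, month) tuples rather than by month alone
monthly = groupby(A, Ti=>yearmonth)

# Reduce each group to a single number by broadcasting mean over the groups
monthly_means = mean.(monthly)

length(monthly_means)   # one mean per (year, month) pair
```

Grouping by month pools January 2000 with January 2001, while grouping by yearmonth keeps them separate.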

Binning

Sometimes we want to further aggregate our groups after running a function, or just bin the raw data directly. We can use the Bins wrapper to do this.

For quick analysis, we can break our groups into N bins by passing an integer as the second argument to Bins:

julia
julia> groupby(A, Ti=>Bins(month, 4))
╭──────────────────────────────────────────────────╮
4-element DimGroupByArray{DimArray{Float64,1},1}
├──────────────────────────────────────────────────┴───────────────────── dims ┐
Ti Sampled{IntervalSets.Interval{:closed, :open, Float64}} [1.0 .. 3.75275 (closed-open), 3.75275 .. 6.5055 (closed-open), 6.5055 .. 9.25825 (closed-open), 9.25825 .. 12.011 (closed-open)] ForwardOrdered Irregular Intervals{Start}
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{Symbol, Any} with 1 entry:
  :groupby => :Ti=>Bins(month, 4)…
├────────────────────────────────────────────────────────────────── group dims ┤
X, Ti
└──────────────────────────────────────────────────────────────────────────────┘
 1.0 .. 3.75275 (closed-open)     101×4344 DimArray
 3.75275 .. 6.5055 (closed-open)  101×4368 DimArray
 6.5055 .. 9.25825 (closed-open)  101×4416 DimArray
 9.25825 .. 12.011 (closed-open)  101×4392 DimArray

Doing this requires slightly padding the bin edges, so the lookup of the output is less than ideal.
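
The edges in the output above can be reproduced by hand. The sketch below assumes the upper edge is padded by 0.001 of the value span (12 - 1 = 11, giving 12.011). That figure is inferred from the printed intervals rather than taken from the DimensionalData.jl source, and edges and bin_of are hypothetical names.

```julia
using Dates

tempo = range(DateTime(2000), step=Hour(1), length=365*24*2)
months = month.(tempo)

first_m, last_m = extrema(months)        # 1 and 12
pad = 0.001 * (last_m - first_m)         # assumed padding: 0.011

# Five equally spaced edges define four bins
edges = range(float(first_m), last_m + pad, length=5)

# Closed-open bins: a month m falls in bin i when edges[i] <= m < edges[i+1]
bin_of(m) = searchsortedlast(edges, m)

# Count the time steps that land in each bin
bins = bin_of.(months)
counts = [count(==(i), bins) for i in 1:4]   # [4344, 4368, 4416, 4392]
```

The counts agree with the group sizes printed above. Without the padding, the last edge would be exactly 12, and the closed-open interval would exclude December.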

Select by Dimension

We can also group by Dimensions and by any object with a dims method.

Trivially, grouping by an object's own dimension is similar to eachslice:

julia
julia> groupby(A, dims(A, Ti))
╭──────────────────────────────────────────────────────╮
17520-element DimGroupByArray{DimArray{Float64,1},1}
├──────────────────────────────────────────────────────┴───────────────── dims ┐
Ti Sampled{Dates.DateTime} Dates.DateTime("2000-01-01T00:00:00"):Dates.Hour(1):Dates.DateTime("2001-12-30T23:00:00") ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{Symbol, Any} with 1 entry:
  :groupby => :Ti=>[DateTime("2000-01-01T00:00:00"), DateTime("2000-01-01T01:00…
├────────────────────────────────────────────────────────────────── group dims ┤
X, Ti
└──────────────────────────────────────────────────────────────────────────────┘
 2000-01-01T00:00:00  101×1 DimArray
 2000-01-01T01:00:00  101×1 DimArray
 2000-01-01T02:00:00  101×1 DimArray

 2001-12-30T22:00:00  101×1 DimArray
 2001-12-30T23:00:00  101×1 DimArray

TODO: Apply a custom function (e.g. normalization) to the grouped output.