Group By
DimensionalData.jl provides a groupby function for dimensional grouping. This guide will cover:
simple grouping with a function
grouping with Bins
grouping with another existing AbstractDimArray or Dimension
Grouping functions
Let's look at the kinds of functions that can be used to group DateTime values. Other types follow the same principles, but are usually simpler.
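The principle can be sketched with the Dates standard library alone: a grouping function maps every value to a key, and values that share a key end up in the same group. The helper `group_indices` and the short `times` range below are hypothetical names made up for this illustration. They are not part of DimensionalData.jl and do not show how groupby is implemented internally.

```julia
using Dates

# A short hourly range spanning a month boundary (illustrative data only).
times = DateTime(2000, 1, 31, 22):Hour(1):DateTime(2000, 2, 1, 1)

# Hypothetical helper: collect the indices of `values` that share a key under `f`.
function group_indices(f, values)
    groups = Dict{Any,Vector{Int}}()
    for (i, v) in enumerate(values)
        push!(get!(() -> Int[], groups, f(v)), i)
    end
    return groups
end

by_month = group_indices(month, times)          # keys are integers
by_yearmonth = group_indices(yearmonth, times)  # keys are (year, month) tuples
# Any function of a single value works, including an anonymous one:
by_daytime = group_indices(t -> 6 <= hour(t) < 18 ? :day : :night, times)
```

Any function that returns a comparable key can play this role, which is why the Dates accessors used in the rest of this guide work as grouping functions.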
First load some packages:
using DimensionalData
using Dates
using Statistics
const DD = DimensionalData
Now create a demo DateTime range:
julia> tempo = range(DateTime(2000), step=Hour(1), length=365*24*2)
Dates.DateTime("2000-01-01T00:00:00"):Dates.Hour(1):Dates.DateTime("2001-12-30T23:00:00")
Let's see how some common functions work.
The hour function transforms values to the hour of the day, the integers 0:23:
julia> hour.(tempo)
17520-element Vector{Int64}:
0
1
2
3
4
5
6
7
8
9
⋮
15
16
17
18
19
20
21
22
23
Tuple groupings
julia> yearmonth.(tempo)
17520-element Vector{Tuple{Int64, Int64}}:
(2000, 1)
(2000, 1)
(2000, 1)
(2000, 1)
(2000, 1)
(2000, 1)
(2000, 1)
(2000, 1)
(2000, 1)
(2000, 1)
⋮
(2001, 12)
(2001, 12)
(2001, 12)
(2001, 12)
(2001, 12)
(2001, 12)
(2001, 12)
(2001, 12)
(2001, 12)
Grouping and reducing
Let's define an array with a time dimension of the times used above:
julia> A = rand(X(1:0.01:2), Ti(tempo))
╭───────────────────────────────╮
│ 101×17520 DimArray{Float64,2} │
├───────────────────────────────┴──────────────────────────────────────── dims ┐
↓ X Sampled{Float64} 1.0:0.01:2.0 ForwardOrdered Regular Points,
→ Ti Sampled{Dates.DateTime} Dates.DateTime("2000-01-01T00:00:00"):Dates.Hour(1):Dates.DateTime("2001-12-30T23:00:00") ForwardOrdered Regular Points
└──────────────────────────────────────────────────────────────────────────────┘
↓ → 2000-01-01T00:00:00 2000-01-01T01:00:00 … 2001-12-30T23:00:00
1.0 0.591659 0.155785 0.906121
1.01 0.260031 0.768952 0.635174
1.02 0.339674 0.638798 0.671752
1.03 0.501871 0.694452 0.915702
⋮ ⋱
1.96 0.83551 0.570144 0.39752
1.97 0.217929 0.711494 0.269388
1.98 0.442426 0.994229 0.208125
1.99 0.267023 0.330585 0.795935
2.0 0.614788 0.546301 … 0.093981
Group by month, using the month function:
julia> groups = groupby(A, Ti=>month)
╭───────────────────────────────────────────────────╮
│ 12-element DimGroupByArray{DimArray{Float64,1},1} │
├───────────────────────────────────────────────────┴───────────── dims ┐
↓ Ti Sampled{Int64} [1, 2, …, 11, 12] ForwardOrdered Irregular Points
├───────────────────────────────────────────────────────────── metadata ┤
Dict{Symbol, Any} with 1 entry:
:groupby => :Ti=>month
├─────────────────────────────────────────────────────────── group dims ┤
↓ X, → Ti
└───────────────────────────────────────────────────────────────────────┘
1 101×1488 DimArray
2 101×1368 DimArray
3 101×1488 DimArray
⋮
11 101×1440 DimArray
12 101×1464 DimArray
We can take the mean of each group by broadcasting over them:
julia> mean.(groups)
╭────────────────────────────────╮
│ 12-element DimArray{Float64,1} │
├────────────────────────────────┴──────────────────────────────── dims ┐
↓ Ti Sampled{Int64} [1, 2, …, 11, 12] ForwardOrdered Irregular Points
├───────────────────────────────────────────────────────────── metadata ┤
Dict{Symbol, Any} with 1 entry:
:groupby => :Ti=>month
└───────────────────────────────────────────────────────────────────────┘
1 0.500215
2 0.500742
3 0.50073
4 0.500234
⋮
10 0.499835
11 0.500627
12 0.500363
Binning
Sometimes we want to further aggregate our groups after running a function, or just bin the raw data directly. We can use the Bins wrapper to do this.
For quick analysis, we can break our groups into N bins.
julia> groupby(A, Ti=>Bins(month, 4))
╭──────────────────────────────────────────────────╮
│ 4-element DimGroupByArray{DimArray{Float64,1},1} │
├──────────────────────────────────────────────────┴───────────────────── dims ┐
↓ Ti Sampled{IntervalSets.Interval{:closed, :open, Float64}} [1.0 .. 3.75275 (closed-open), 3.75275 .. 6.5055 (closed-open), 6.5055 .. 9.25825 (closed-open), 9.25825 .. 12.011 (closed-open)] ForwardOrdered Irregular Intervals{Start}
├──────────────────────────────────────────────────────────────────── metadata ┤
Dict{Symbol, Any} with 1 entry:
:groupby => :Ti=>Bins(month, 4)…
├────────────────────────────────────────────────────────────────── group dims ┤
↓ X, → Ti
└──────────────────────────────────────────────────────────────────────────────┘
1.0 .. 3.75275 (closed-open) 101×4344 DimArray
3.75275 .. 6.5055 (closed-open) 101×4368 DimArray
6.5055 .. 9.25825 (closed-open) 101×4416 DimArray
9.25825 .. 12.011 (closed-open) 101×4392 DimArray
Doing this requires slightly padding the bin edges, so the lookup of the output is less than ideal: the intervals have non-integer bounds such as 3.75275.
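When the grouping function returns integers, one way to avoid the padded edges is to pass explicit bins instead of a bin count. The sketch below assumes that Bins accepts a vector of ranges as its second argument, as it does in recent DimensionalData.jl releases; check the Bins docstring of your installed version before relying on it. The small array `B` and the name `quarters` are made up for this example.

```julia
using DimensionalData
using Dates

# Same hourly range as in this guide, with a much smaller X dimension.
tempo = range(DateTime(2000), step=Hour(1), length=365*24*2)
B = rand(X(1:3), Ti(tempo))

# Assumption: each range is one bin, so months 1:3 form the first group,
# 4:6 the second, and so on.
quarters = groupby(B, Ti => Bins(month, [1:3, 4:6, 7:9, 10:12]))

length(quarters)  # number of groups
```

Because the bins are spelled out, the group labels come from the ranges themselves rather than from computed, padded edges.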
Select by Dimension
We can also select by Dimensions and any objects with dims methods.
Trivially, grouping by an object's own dimension is similar to eachslice:
julia> groupby(A, dims(A, Ti))
╭──────────────────────────────────────────────────────╮
│ 17520-element DimGroupByArray{DimArray{Float64,1},1} │
├──────────────────────────────────────────────────────┴───────────────── dims ┐
↓ Ti Sampled{Dates.DateTime} Dates.DateTime("2000-01-01T00:00:00"):Dates.Hour(1):Dates.DateTime("2001-12-30T23:00:00") ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
Dict{Symbol, Any} with 1 entry:
:groupby => :Ti=>[DateTime("2000-01-01T00:00:00"), DateTime("2000-01-01T01:00…
├────────────────────────────────────────────────────────────────── group dims ┤
↓ X, → Ti
└──────────────────────────────────────────────────────────────────────────────┘
2000-01-01T00:00:00 101×1 DimArray
2000-01-01T01:00:00 101×1 DimArray
2000-01-01T02:00:00 101×1 DimArray
⋮
2001-12-30T22:00:00 101×1 DimArray
2001-12-30T23:00:00 101×1 DimArray
TODO: Apply custom function (i.e. normalization) to grouped output.
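Until that section is written, here is a minimal sketch of one way a custom function could be applied to each group. It uses an ordinary Julia comprehension over the groups rather than any dedicated DimensionalData.jl API. The choice of normalization (subtracting each month's own mean) and the name `anomalies` are assumptions made for illustration.

```julia
using DimensionalData
using Dates
using Statistics

# Same hourly range as in this guide, with a much smaller X dimension.
tempo = range(DateTime(2000), step=Hour(1), length=365*24*2)
A = rand(X(1:0.5:2), Ti(tempo))

groups = groupby(A, Ti => month)

# Illustrative normalization: subtract each group's own mean from its values.
# The result is a plain Vector holding one DimArray per month.
anomalies = [g .- mean(g) for g in groups]
```

This keeps the groups separate; reassembling them into a single array along Ti is left out of the sketch.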