Group By
DimensionalData.jl provides a groupby function for dimensional grouping. This guide will cover:
simple grouping with a function
grouping with Bins
grouping with another existing AbstractDimArray or Dimension
Grouping functions
Let's look at the kinds of functions that can be used to group DateTime values. Other types follow the same principles, but are usually simpler.
First load some packages:
using DimensionalData
using Dates
using Statistics
const DD = DimensionalData
DimensionalData
Now create a demo DateTime range with one value per hour:
julia> tempo = range(DateTime(2000), step=Hour(1), length=365*24*2)
Dates.DateTime("2000-01-01T00:00:00"):Dates.Hour(1):Dates.DateTime("2001-12-30T23:00:00")
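The range ends on 2001-12-30 rather than 2001-12-31 because 2000 is a leap year: 365*24*2 hours cover 730 days, and 2000 alone uses 366 of them. A quick plain-Julia check, using only the Dates standard library:

```julia
using Dates

tempo = range(DateTime(2000), step=Hour(1), length=365*24*2)

# 2000 has 366 days, so 730 days of hourly steps stop one day before 2001-12-31
println(length(tempo))      # number of hourly timestamps
println(last(tempo))        # final timestamp in the range
println(isleapyear(2000))   # February 2000 has 29 days
```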
Let's see how some common functions work.
The hour function transforms each value to its hour of the day, one of the integers 0:23:
julia> hour.(tempo)
17520-element Vector{Int64}:
0
1
2
3
4
5
6
7
8
9
⋮
15
16
17
18
19
20
21
22
23
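Other Dates accessors behave the same way, mapping each DateTime to a small integer key. The sketch below uses plain Julia with only the Dates standard library to list the distinct keys that hour, month and dayofweek produce for tempo. Values that share a key are the ones a grouping function would put in the same group; for example, each hour of the day occurs 730 times, once per day in the range.

```julia
using Dates

tempo = range(DateTime(2000), step=Hour(1), length=365*24*2)

# Each accessor maps a DateTime to an integer key
hour_keys  = sort(unique(hour.(tempo)))       # hour of day, 0:23
month_keys = sort(unique(month.(tempo)))      # month of year, 1:12
dow_keys   = sort(unique(dayofweek.(tempo)))  # day of week, 1 (Monday) to 7 (Sunday)

println(hour_keys == 0:23)
println(month_keys == 1:12)
println(dow_keys == 1:7)

# How many timestamps share the key 0 (midnight)
println(count(==(0), hour.(tempo)))
```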
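Tuple groupings
Functions that return a tuple group by a combination of values. For example, yearmonth maps each DateTime to a (year, month) tuple: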
julia> yearmonth.(tempo)
17520-element Vector{Tuple{Int64, Int64}}:
(2000, 1)
(2000, 1)
(2000, 1)
(2000, 1)
(2000, 1)
(2000, 1)
(2000, 1)
(2000, 1)
(2000, 1)
(2000, 1)
⋮
(2001, 12)
(2001, 12)
(2001, 12)
(2001, 12)
(2001, 12)
(2001, 12)
(2001, 12)
(2001, 12)
(2001, 12)
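Because yearmonth keeps the year, the same calendar month in different years becomes a separate key, so this range yields 24 keys instead of the 12 that month alone gives. A plain-Julia sketch counting the distinct keys with the Dates standard library (Dates also provides monthday and yearmonthday, which follow the same pattern):

```julia
using Dates

tempo = range(DateTime(2000), step=Hour(1), length=365*24*2)

ym_keys = unique(yearmonth.(tempo))   # (year, month) tuples, in order of appearance
m_keys  = unique(month.(tempo))       # month only: the two years are merged

println(length(ym_keys))   # distinct (year, month) pairs
println(length(m_keys))    # distinct months
println(first(ym_keys), " ", last(ym_keys))
```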
Grouping and reducing
Let's define an array with a time dimension made from the times above:
julia> A = rand(X(1:0.01:2), Ti(tempo))
╭───────────────────────────────╮
│ 101×17520 DimArray{Float64,2} │
├───────────────────────────────┴──────────────────────────────────────── dims ┐
↓ X Sampled{Float64} 1.0:0.01:2.0 ForwardOrdered Regular Points,
→ Ti Sampled{Dates.DateTime} Dates.DateTime("2000-01-01T00:00:00"):Dates.Hour(1):Dates.DateTime("2001-12-30T23:00:00") ForwardOrdered Regular Points
└──────────────────────────────────────────────────────────────────────────────┘
↓ → 2000-01-01T00:00:00 2000-01-01T01:00:00 … 2001-12-30T23:00:00
1.0 0.591659 0.155785 0.906121
1.01 0.260031 0.768952 0.635174
1.02 0.339674 0.638798 0.671752
1.03 0.501871 0.694452 0.915702
⋮ ⋱
1.96 0.83551 0.570144 0.39752
1.97 0.217929 0.711494 0.269388
1.98 0.442426 0.994229 0.208125
1.99 0.267023 0.330585 0.795935
2.0 0.614788 0.546301 … 0.093981
Group by month, using the month function:
julia> groups = groupby(A, Ti=>month)
╭───────────────────────────────────────────────────╮
│ 12-element DimGroupByArray{DimArray{Float64,1},1} │
├───────────────────────────────────────────────────┴───────────── dims ┐
↓ Ti Sampled{Int64} [1, 2, …, 11, 12] ForwardOrdered Irregular Points
├───────────────────────────────────────────────────────────── metadata ┤
Dict{Symbol, Any} with 1 entry:
:groupby => :Ti=>month
├─────────────────────────────────────────────────────────── group dims ┤
↓ X, → Ti
└───────────────────────────────────────────────────────────────────────┘
1 101×1488 DimArray
2 101×1368 DimArray
3 101×1488 DimArray
⋮
11 101×1440 DimArray
12 101×1464 DimArray
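Each group holds every hourly slice whose timestamp falls in that month, pooled across both years. The group sizes follow from the calendar: January has 2 × 31 × 24 = 1488 hours, February has (29 + 28) × 24 = 1368, and December has only (31 + 30) × 24 = 1464 because the range stops on 30 December 2001. The minimal plain-Julia sketch below reproduces these sizes by collecting the time indices that share a key. It assumes grouping amounts to that partitioning step and is not DimensionalData's implementation; group_indices is a hypothetical helper.

```julia
using Dates

tempo = range(DateTime(2000), step=Hour(1), length=365*24*2)

# Hypothetical helper: map each key f(x) to the indices of the elements sharing it
function group_indices(f, xs)
    groups = Dict{Any,Vector{Int}}()
    for (i, x) in enumerate(xs)
        push!(get!(groups, f(x), Int[]), i)
    end
    return groups
end

groups = group_indices(month, tempo)
for m in sort(collect(keys(groups)))
    println(m, " => ", length(groups[m]), " time steps")
end
```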
We can take the mean of each group by broadcasting over the groups:
julia> mean.(groups)
╭────────────────────────────────╮
│ 12-element DimArray{Float64,1} │
├────────────────────────────────┴──────────────────────────────── dims ┐
↓ Ti Sampled{Int64} [1, 2, …, 11, 12] ForwardOrdered Irregular Points
├───────────────────────────────────────────────────────────── metadata ┤
Dict{Symbol, Any} with 1 entry:
:groupby => :Ti=>month
└───────────────────────────────────────────────────────────────────────┘
1 0.500215
2 0.500742
3 0.50073
4 0.500234
⋮
10 0.499835
11 0.500627
12 0.500363
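Here mean reduces each group over all of its X and Ti values to a single number, and with uniform random data every monthly mean lands close to 0.5. The plain-Julia sketch below performs the same reduction on an ordinary matrix with X along rows and time along columns. It also shows reducing over time only, which keeps one value per X row for each month. The variable names are illustrative and no DimensionalData types are involved.

```julia
using Dates, Random, Statistics

tempo = range(DateTime(2000), step=Hour(1), length=365*24*2)
Random.seed!(1)
data = rand(101, length(tempo))   # X along rows, time along columns

# Mean over every X value and every time step in the month
monthly_means = Dict(m => mean(data[:, month.(tempo) .== m]) for m in 1:12)

# Mean over time only, keeping one value per X row for each month
monthly_profiles = Dict(m => vec(mean(data[:, month.(tempo) .== m]; dims=2)) for m in 1:12)

println(monthly_means[1])
println(length(monthly_profiles[1]))
```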
Binning
Sometimes we want to further aggregate our groups after running a function, or just bin the raw data directly. We can use the Bins wrapper to do this.
For quick analysis, we can break our groups into N bins. Here the month values 1 to 12 are split into 4 bins:
julia> groupby(A, Ti=>Bins(month, 4))
╭──────────────────────────────────────────────────╮
│ 4-element DimGroupByArray{DimArray{Float64,1},1} │
├──────────────────────────────────────────────────┴───────────────────── dims ┐
↓ Ti Sampled{IntervalSets.Interval{:closed, :open, Float64}} [1.0 .. 3.75275 (closed-open), 3.75275 .. 6.5055 (closed-open), 6.5055 .. 9.25825 (closed-open), 9.25825 .. 12.011 (closed-open)] ForwardOrdered Irregular Intervals{Start}
├──────────────────────────────────────────────────────────────────── metadata ┤
Dict{Symbol, Any} with 1 entry:
:groupby => :Ti=>Bins(month, 4)…
├────────────────────────────────────────────────────────────────── group dims ┤
↓ X, → Ti
└──────────────────────────────────────────────────────────────────────────────┘
1.0 .. 3.75275 (closed-open) 101×4344 DimArray
3.75275 .. 6.5055 (closed-open) 101×4368 DimArray
6.5055 .. 9.25825 (closed-open) 101×4416 DimArray
9.25825 .. 12.011 (closed-open) 101×4392 DimArray
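Doing this requires slightly padding the upper bin edge (here to 12.011) so that the maximum value, month 12, still falls inside the last closed-open interval. As a result, the interval bounds in the lookup of the output are not round numbers.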
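The printed edges are consistent with splitting the range from 1 to 12 into 4 equal-width intervals after extending the upper edge by 0.1% of the span (0.001 × 11 = 0.011). That padding rule is inferred from the output shown here, so check the Bins docstring for the actual behaviour. The plain-Julia sketch below applies the inferred rule, reproduces the printed edges, and recovers the group sizes 4344, 4368, 4416 and 4392. bin_edges and bin_of are hypothetical helpers, not DimensionalData functions.

```julia
using Dates

tempo = range(DateTime(2000), step=Hour(1), length=365*24*2)
mkeys = month.(tempo)

# Hypothetical helper (assumed rule): equal-width bins from lo to hi,
# with the upper edge padded by a fraction `pad` of the span
function bin_edges(lo, hi, n; pad=0.001)
    return collect(range(lo, hi + pad * (hi - lo); length=n + 1))
end

edges = bin_edges(minimum(mkeys), maximum(mkeys), 4)

# Closed-open intervals: key k belongs to bin i when edges[i] <= k < edges[i+1]
bin_of(k) = findlast(e -> e <= k, edges[1:end-1])

counts = zeros(Int, 4)
for k in mkeys
    counts[bin_of(k)] += 1
end

println(edges)
println(counts)
```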
Select by Dimension
We can also group by Dimensions, and by any object that has a dims method.
Trivially, grouping by an object's own dimension is similar to eachslice:
julia> groupby(A, dims(A, Ti))
╭──────────────────────────────────────────────────────╮
│ 17520-element DimGroupByArray{DimArray{Float64,1},1} │
├──────────────────────────────────────────────────────┴───────────────── dims ┐
↓ Ti Sampled{Dates.DateTime} Dates.DateTime("2000-01-01T00:00:00"):Dates.Hour(1):Dates.DateTime("2001-12-30T23:00:00") ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
Dict{Symbol, Any} with 1 entry:
:groupby => :Ti=>[DateTime("2000-01-01T00:00:00"), DateTime("2000-01-01T01:00…
├────────────────────────────────────────────────────────────────── group dims ┤
↓ X, → Ti
└──────────────────────────────────────────────────────────────────────────────┘
2000-01-01T00:00:00 101×1 DimArray
2000-01-01T01:00:00 101×1 DimArray
2000-01-01T02:00:00 101×1 DimArray
⋮
2001-12-30T22:00:00 101×1 DimArray
2001-12-30T23:00:00 101×1 DimArray
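The difference from eachslice is visible in the output: each group above is 101×1, keeping a length-1 Ti dimension, whereas Base eachslice drops the sliced dimension. A plain-Julia comparison on an ordinary matrix, with no DimensionalData types involved:

```julia
M = reshape(collect(1.0:12.0), 3, 4)   # 3 "X" rows, 4 "time" columns

# eachslice drops the sliced dimension: each slice is a 3-element vector
slices = collect(eachslice(M; dims=2))

# Slicing with a 1-element range keeps the dimension, like the 101×1 groups
groups = [M[:, j:j] for j in axes(M, 2)]

println(size(first(slices)))
println(size(first(groups)))
```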
TODO: Apply custom function (i.e. normalization) to grouped output.
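One common custom function is normalization within groups, for example subtracting each month's mean to obtain anomalies. The plain-Julia sketch below illustrates the idea on a vector of hourly values. The names are illustrative, and it does not use the DimensionalData API.

```julia
using Dates, Random, Statistics

tempo = range(DateTime(2000), step=Hour(1), length=365*24*2)
Random.seed!(42)
values_ = rand(length(tempo))   # illustrative data: one value per hour

mkeys = month.(tempo)
# Mean of each month, pooled across both years
group_mean = Dict(m => mean(values_[mkeys .== m]) for m in unique(mkeys))

# Normalize: subtract the mean of the value's own group
anomalies = [v - group_mean[k] for (v, k) in zip(values_, mkeys)]

# Largest absolute monthly mean of the anomalies (close to zero up to rounding)
println(maximum(abs(mean(anomalies[mkeys .== m])) for m in 1:12))
```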