Group By
DimensionalData.jl provides a groupby function for dimensional grouping. This guide covers:
simple grouping with a function
grouping with Bins
grouping with another existing AbstractDimArray or Dimension
Grouping functions
Let's look at the kinds of functions that can be used to group DateTime values. Other types follow the same principles, but are usually simpler.
First, load some packages:
using DimensionalData
using Dates
using Statistics
const DD = DimensionalData
Now create a demo DateTime range:
julia> tempo = range(DateTime(2000), step=Hour(1), length=365*24*2)
DateTime("2000-01-01T00:00:00"):Hour(1):DateTime("2001-12-30T23:00:00")
Let's see how some common functions work.
The hour function will transform values to the hour of the day, the integers 0:23:
julia> hour.(tempo)
17520-element Vector{Int64}:
0
1
2
3
4
5
6
7
8
9
⋮
15
16
17
18
19
20
21
22
23
Tuple groupings
Some functions return a tuple for each value. yearmonth, for example, returns (year, month):
julia> yearmonth.(tempo)
17520-element Vector{Tuple{Int64, Int64}}:
(2000, 1)
(2000, 1)
(2000, 1)
(2000, 1)
(2000, 1)
(2000, 1)
(2000, 1)
(2000, 1)
(2000, 1)
(2000, 1)
⋮
(2001, 12)
(2001, 12)
(2001, 12)
(2001, 12)
(2001, 12)
(2001, 12)
(2001, 12)
(2001, 12)
(2001, 12)
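To see how a grouping function shapes the result, it can help to count its distinct return values. The sketch below uses only the Dates standard library and the same tempo range as above. It assumes, in line with the month example in this guide, that each distinct return value becomes one group key.

```julia
using Dates

# Same demo range as above: two years of hourly timestamps starting in 2000.
tempo = range(DateTime(2000), step=Hour(1), length=365*24*2)

# Each distinct return value is a candidate group key.
n_hours      = length(unique(hour.(tempo)))       # hours of the day
n_months     = length(unique(month.(tempo)))      # calendar months, both years merged
n_yearmonths = length(unique(yearmonth.(tempo)))  # (year, month) tuples, years kept apart
```

Grouping by month merges the same month from both years, while yearmonth keeps them separate, so the tuple form yields twice as many keys for this range.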
Grouping and reducing
Let's define an array with a time dimension of the times used above:
julia> A = rand(X(1:0.01:2), Ti(tempo))
╭────────────────────────────────╮
│ 101×17520 DimArray{Float64, 2} │
├────────────────────────────────┴─────────────────────────────────────── dims ┐
↓ X Sampled{Float64} 1.0:0.01:2.0 ForwardOrdered Regular Points,
→ Ti Sampled{DateTime} DateTime("2000-01-01T00:00:00"):Hour(1):DateTime("2001-12-30T23:00:00") ForwardOrdered Regular Points
└──────────────────────────────────────────────────────────────────────────────┘
↓ → 2000-01-01T00:00:00 2000-01-01T01:00:00 … 2001-12-30T23:00:00
1.0 0.654537 0.418968 0.677549
1.01 0.664038 0.674881 0.578183
1.02 0.00832284 0.475569 0.454715
1.03 0.639212 0.616635 0.875994
⋮ ⋱ ⋮
1.96 0.0106725 0.846581 0.757228
1.97 0.585756 0.485119 0.299692
1.98 0.471877 0.889153 0.336768
1.99 0.428951 0.312976 … 0.948798
2.0 0.921012 0.397575 0.0897302
Group by month, using the month function:
julia> groups = groupby(A, Ti=>month)
╭───────────────────────────────────────────────────╮
│ 12-element DimGroupByArray{DimArray{Float64,1},1} │
├───────────────────────────────────────────────────┴──────────────────── dims ┐
↓ Ti Sampled{Int64} [1, 2, …, 11, 12] ForwardOrdered Irregular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
Dict{Symbol, Any} with 1 entry:
:groupby => :Ti=>month
├────────────────────────────────────────────────────────────────── group dims ┤
↓ X, → Ti
└──────────────────────────────────────────────────────────────────────────────┘
1 101×1488 DimArray
2 101×1368 DimArray
3 101×1488 DimArray
⋮
11 101×1440 DimArray
12 101×1464 DimArray
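The group sizes above can be cross-checked by hand. The following is a minimal sketch of the idea behind grouping by a function, using only Dates. It is not how DimensionalData.jl implements groupby, and group_indices is a name introduced here purely for illustration.

```julia
using Dates

# Same demo range as above.
tempo = range(DateTime(2000), step=Hour(1), length=365*24*2)

# Apply the grouping function to every timestamp to get one key per step.
keys_per_step = month.(tempo)

# Hypothetical helper structure: month number => positions in `tempo` with that month.
group_indices = Dict(m => findall(==(m), keys_per_step) for m in unique(keys_per_step))

n_january  = length(group_indices[1])
n_february = length(group_indices[2])
n_december = length(group_indices[12])
```

January covers 2 × 31 days of hourly steps, February covers 29 + 28 days because 2000 is a leap year, and December is one day short in 2001 because the range ends on 2001-12-30. These counts match the second dimension of the groups printed above.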
We can take the mean of each group by broadcasting over them:
julia> mean.(groups)
╭─────────────────────────────────╮
│ 12-element DimArray{Float64, 1} │
├─────────────────────────────────┴────────────────────────────────────── dims ┐
↓ Ti Sampled{Int64} [1, 2, …, 11, 12] ForwardOrdered Irregular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
Dict{Symbol, Any} with 1 entry:
:groupby => :Ti=>month
└──────────────────────────────────────────────────────────────────────────────┘
1 0.500665
2 0.499693
3 0.500331
4 0.499353
⋮
10 0.499069
11 0.500155
12 0.500136
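Because A holds random values, the means above are hard to verify by eye. As a sanity check, the sketch below repeats the same group-and-reduce step on deterministic data, where every value equals the month number of its timestamp. The array B and the variable names are introduced here for illustration only.

```julia
using DimensionalData, Dates, Statistics

tempo = range(DateTime(2000), step=Hour(1), length=365*24*2)

# Deterministic data: each value is simply the month number of its timestamp.
B = DimArray(Float64.(month.(tempo)), (Ti(tempo),))

# Group by month and reduce each group to its mean.
monthly = mean.(groupby(B, Ti => month))

# The result is indexed by month number, so lookups use the group key.
feb = monthly[Ti=At(2)]
```

Each group contains only copies of its own month number, so the mean of group 2 should be 2.0, and so on for the other months.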
Binning
Sometimes we want to further aggregate our groups after running a function, or just bin the raw data directly. We can use the Bins wrapper to do this.
For quick analysis, we can break our groups into N bins:
julia> groupby(A, Ti=>Bins(month, 4))
╭──────────────────────────────────────────────────╮
│ 4-element DimGroupByArray{DimArray{Float64,1},1} │
├──────────────────────────────────────────────────┴───────────────────── dims ┐
↓ Ti Sampled{IntervalSets.Interval{:closed, :open, Float64}} [1.0 .. 3.75275 (closed-open), 3.75275 .. 6.5055 (closed-open), 6.5055 .. 9.25825 (closed-open), 9.25825 .. 12.011 (closed-open)] ForwardOrdered Irregular Intervals{Start}
├──────────────────────────────────────────────────────────────────── metadata ┤
Dict{Symbol, Any} with 1 entry:
:groupby => :Ti=>Bins(month, 4)…
├────────────────────────────────────────────────────────────────── group dims ┤
↓ X, → Ti
└──────────────────────────────────────────────────────────────────────────────┘
1.0 .. 3.75275 (closed-open) 101×4344 DimArray
3.75275 .. 6.5055 (closed-open) 101×4368 DimArray
6.5055 .. 9.25825 (closed-open) 101×4416 DimArray
9.25825 .. 12.011 (closed-open) 101×4392 DimArray
Doing this requires slightly padding the bin edges (here the upper edge becomes 12.011 rather than 12), so the intervals in the output lookup have awkward, non-round bounds.
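Why padding is needed can be shown in plain Julia. The sketch below is not the library's implementation. It assumes a pad of 0.1% of the value span, which is inferred from the edges printed above, and bin_of is a hypothetical helper.

```julia
# Assumption: pad = 0.1% of the span, inferred from the printed edges.
lo, hi, nbins = 1.0, 12.0, 4
pad = 0.001 * (hi - lo)

# Evenly spaced edges with and without padding of the upper bound.
padded_edges   = collect(range(lo, hi + pad, length=nbins + 1))
unpadded_edges = collect(range(lo, hi, length=nbins + 1))

# Hypothetical helper for closed-open bins:
# value v belongs to bin i when edges[i] <= v < edges[i + 1].
bin_of(v, edges) = searchsortedlast(edges, v)

padded   = [bin_of(m, padded_edges) for m in 1:12]
unpadded = [bin_of(m, unpadded_edges) for m in 1:12]
```

With padding, months 1 to 12 fall into bins 1 to 4 in runs of three. Without it, month 12 sits exactly on the final edge and lands in a nonexistent fifth bin, because the last closed-open interval excludes its upper bound.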
Select by Dimension
We can also select by Dimensions and any objects with dims methods.
Trivially, grouping by an object's own dimension is similar to eachslice:
julia> groupby(A, dims(A, Ti))
╭──────────────────────────────────────────────────────╮
│ 17520-element DimGroupByArray{DimArray{Float64,1},1} │
├──────────────────────────────────────────────────────┴───────────────── dims ┐
↓ Ti Sampled{DateTime} DateTime("2000-01-01T00:00:00"):Hour(1):DateTime("2001-12-30T23:00:00") ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
Dict{Symbol, Any} with 1 entry:
:groupby => :Ti=>[DateTime("2000-01-01T00:00:00"), DateTime("2000-01-01T01:00…
├────────────────────────────────────────────────────────────────── group dims ┤
↓ X, → Ti
└──────────────────────────────────────────────────────────────────────────────┘
2000-01-01T00:00:00 101×1 DimArray
2000-01-01T01:00:00 101×1 DimArray
2000-01-01T02:00:00 101×1 DimArray
⋮
2001-12-30T22:00:00 101×1 DimArray
2001-12-30T23:00:00 101×1 DimArray
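The comparison with eachslice can be made concrete on a smaller array. This is a sketch under the assumption that eachslice with dims=Ti drops the sliced dimension, as it does for ordinary arrays. The array here is deliberately small and is not the A used above.

```julia
using DimensionalData, Dates

# A small array so the comparison is quick: 3 X values by 48 hourly steps.
tempo = range(DateTime(2000), step=Hour(1), length=48)
A = rand(X(1:0.5:2), Ti(tempo))

groups = groupby(A, dims(A, Ti))   # one group per timestamp
slices = eachslice(A; dims=Ti)     # one slice per timestamp

# Both have one entry per time step, but a group keeps a length-1 Ti
# dimension while a slice drops it.
group_size = size(first(groups))
slice_size = size(first(slices))
```

This is why the groups above print as 101×1 arrays rather than 101-element vectors: the grouped dimension is retained in each group.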
TODO: Apply a custom function (e.g. normalization) to grouped output.