Group By
DimensionalData.jl provides a groupby function for dimensional grouping. This guide covers:
simple grouping with a function
grouping with Bins
grouping with another existing AbstractDimArray or Dimension
Grouping functions
Let's look at the kinds of functions that can be used to group DateTime values. Other types follow the same principles, but are usually simpler.
First, load some packages:
using DimensionalData
using Dates
using Statistics
const DD = DimensionalData
Now create a demo DateTime range:
julia> tempo = range(DateTime(2000), step=Hour(1), length=365*24*2)
DateTime("2000-01-01T00:00:00"):Hour(1):DateTime("2001-12-30T23:00:00")
Let's see how some common functions work.
The hour function transforms each value to its hour of the day, one of the integers 0:23:
julia> hour.(tempo)
17520-element Vector{Int64}:
0
1
2
3
4
5
6
7
8
9
⋮
15
16
17
18
19
20
21
22
23
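Other accessors from the Dates standard library follow the same pattern. As a small sketch (the shortened two-day range `sample` and the `*_keys` variable names are introduced here for illustration only), each function below maps a DateTime to an integer that could serve as a grouping key:

```julia
using Dates

# Hypothetical two-day sample, shortened from the range above for readability
sample = range(DateTime(2000), step=Hour(1), length=48)

# Each accessor maps a DateTime to an integer key
hour_keys = hour.(sample)           # hour of the day, 0 to 23
day_keys = day.(sample)             # day of the month
month_keys = month.(sample)         # month of the year
weekday_keys = dayofweek.(sample)   # 1 = Monday through 7 = Sunday
yearday_keys = dayofyear.(sample)   # day of the year

# The distinct keys show how many groups each function would produce
println(unique(day_keys))
println(unique(weekday_keys))
```

The number of distinct keys a function returns is the number of groups it would produce.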
Tuple groupings
Some functions return a tuple for each value. For example, yearmonth returns a (year, month) pair:
julia> yearmonth.(tempo)
17520-element Vector{Tuple{Int64, Int64}}:
(2000, 1)
(2000, 1)
(2000, 1)
(2000, 1)
(2000, 1)
(2000, 1)
(2000, 1)
(2000, 1)
(2000, 1)
(2000, 1)
⋮
(2001, 12)
(2001, 12)
(2001, 12)
(2001, 12)
(2001, 12)
(2001, 12)
(2001, 12)
(2001, 12)
(2001, 12)
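Any function that returns a tuple can define a compound key in the same way. The helpers `yearhour` and `yearday` below are hypothetical names defined here for illustration and are not part of Dates; this is a minimal sketch using only the standard library:

```julia
using Dates

tempo = range(DateTime(2000), step=Hour(1), length=365*24*2)

# Hypothetical helpers: combine two Dates accessors into one tuple key
yearhour(x) = (year(x), hour(x))
yearday(x) = (year(x), dayofyear(x))

# Count the distinct keys each function produces over the range
n_yearmonth = length(unique(yearmonth.(tempo)))
n_yearhour = length(unique(yearhour.(tempo)))
n_yearday = length(unique(yearday.(tempo)))

println((n_yearmonth, n_yearhour, n_yearday))
```

Counting the distinct keys first is a cheap way to check how many groups a custom function would create before applying it to a large array.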
Grouping and reducing
Let's define an array with a time dimension of the times used above:
julia> A = rand(X(1:0.01:2), Ti(tempo))
┌ 101×17520 DimArray{Float64, 2} ┐
├────────────────────────────────┴─────────────────────────────────────── dims ┐
↓ X Sampled{Float64} 1.0:0.01:2.0 ForwardOrdered Regular Points,
→ Ti Sampled{DateTime} DateTime("2000-01-01T00:00:00"):Hour(1):DateTime("2001-12-30T23:00:00") ForwardOrdered Regular Points
└──────────────────────────────────────────────────────────────────────────────┘
↓ → 2000-01-01T00:00:00 2000-01-01T01:00:00 … 2001-12-30T23:00:00
1.0 0.89757 0.795755 0.905858
1.01 0.969026 0.785993 0.477727
1.02 0.106472 0.646867 0.807257
1.03 0.283631 0.905428 0.0958593
⋮ ⋱ ⋮
1.96 0.0536623 0.11609 0.219831
1.97 0.830655 0.673995 0.244589
1.98 0.445628 0.54935 0.00358622
1.99 0.571899 0.310328 … 0.355619
2.0 0.488519 0.359731 0.328946
Group by month, using the month function:
julia> groups = groupby(A, Ti=>month)
┌ 12-element DimGroupByArray{DimArray{Float64,1},1} ┐
├───────────────────────────────────────────────────┴──────────────────── dims ┐
↓ Ti Sampled{Int64} [1, 2, …, 11, 12] ForwardOrdered Irregular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
Dict{Symbol, Any} with 1 entry:
:groupby => :Ti=>month
├────────────────────────────────────────────────────────────────── group dims ┤
↓ X, → Ti
└──────────────────────────────────────────────────────────────────────────────┘
1 101×1488 DimArray
2 101×1368 DimArray
3 101×1488 DimArray
⋮
11 101×1440 DimArray
12 101×1464 DimArray
We can take the mean of each group by broadcasting over them:
julia> mean.(groups)
┌ 12-element DimArray{Float64, 1} ┐
├─────────────────────────────────┴────────────────────────────────────── dims ┐
↓ Ti Sampled{Int64} [1, 2, …, 11, 12] ForwardOrdered Irregular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
Dict{Symbol, Any} with 1 entry:
:groupby => :Ti=>month
└──────────────────────────────────────────────────────────────────────────────┘
1 0.49998
2 0.499823
3 0.499881
4 0.500808
⋮
10 0.499447
11 0.500349
12 0.499943
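Grouping by month merges the same calendar month from both years. To keep the years separate, a tuple-returning function such as yearmonth can be used as the grouping function instead. The following is a sketch under the assumption that tuple keys are handled like the integer keys above; the names `monthly` and `monthly_means` are chosen here for illustration:

```julia
using DimensionalData
using Dates
using Statistics

tempo = range(DateTime(2000), step=Hour(1), length=365*24*2)
A = rand(X(1:0.01:2), Ti(tempo))

# Group on the (year, month) tuple so January 2000 and January 2001 stay separate
monthly = groupby(A, Ti => yearmonth)

# Broadcasting mean reduces each group to a single value
monthly_means = mean.(monthly)

println(length(monthly_means))
```

With two years of hourly data this should give one mean per (year, month) pair rather than one per calendar month.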
Binning
Sometimes we want to further aggregate our groups after running a function, or just bin the raw data directly. We can use the Bins wrapper to do this.
For quick analysis, we can break our groups into N bins:
julia> groupby(A, Ti=>Bins(month, 4))
┌ 4-element DimGroupByArray{DimArray{Float64,1},1} ┐
├──────────────────────────────────────────────────┴───────────────────── dims ┐
↓ Ti Sampled{IntervalSets.Interval{:closed, :open, Float64}} [1.0 .. 3.75275 (closed-open), 3.75275 .. 6.5055 (closed-open), 6.5055 .. 9.25825 (closed-open), 9.25825 .. 12.011 (closed-open)] ForwardOrdered Irregular Intervals{Start}
├──────────────────────────────────────────────────────────────────── metadata ┤
Dict{Symbol, Any} with 1 entry:
:groupby => :Ti=>Bins(month, 4)…
├────────────────────────────────────────────────────────────────── group dims ┤
↓ X, → Ti
└──────────────────────────────────────────────────────────────────────────────┘
1.0 .. 3.75275 (closed-open) 101×4344 DimArray
3.75275 .. 6.5055 (closed-open) 101×4368 DimArray
6.5055 .. 9.25825 (closed-open) 101×4416 DimArray
9.25825 .. 12.011 (closed-open) 101×4392 DimArray
Doing this requires slightly padding the bin edges, so the lookup of the output is less than ideal: the interval edges are non-integer values such as 3.75275 rather than month numbers.
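One possible way to keep integer month values in the lookup is to list the values to keep explicitly instead of asking for N bins. The sketch below assumes that Bins accepts a collection of values as its second argument; check the Bins docstring for the version you have installed. The names `selected` and `selected_means` are illustrative:

```julia
using DimensionalData
using Dates
using Statistics

tempo = range(DateTime(2000), step=Hour(1), length=365*24*2)
A = rand(X(1:0.01:2), Ti(tempo))

# Keep only January, March and May
# (assumption: Bins accepts an explicit vector of values to keep)
selected = groupby(A, Ti => Bins(month, [1, 3, 5]))

# One mean per selected month
selected_means = mean.(selected)

println(length(selected_means))
```

Under that assumption, months not listed are dropped from the result rather than assigned to a neighbouring bin.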
Select by Dimension
We can also select by Dimensions and any objects with dims methods.
Trivially, grouping by an object's own dimension is similar to eachslice:
julia> groupby(A, dims(A, Ti))
┌ 17520-element DimGroupByArray{DimArray{Float64,1},1} ┐
├──────────────────────────────────────────────────────┴───────────────── dims ┐
↓ Ti Sampled{DateTime} DateTime("2000-01-01T00:00:00"):Hour(1):DateTime("2001-12-30T23:00:00") ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
Dict{Symbol, Any} with 1 entry:
:groupby => :Ti=>[DateTime("2000-01-01T00:00:00"), DateTime("2000-01-01T01:00…
├────────────────────────────────────────────────────────────────── group dims ┤
↓ X, → Ti
└──────────────────────────────────────────────────────────────────────────────┘
2000-01-01T00:00:00 101×1 DimArray
2000-01-01T01:00:00 101×1 DimArray
2000-01-01T02:00:00 101×1 DimArray
⋮
2001-12-30T22:00:00 101×1 DimArray
2001-12-30T23:00:00 101×1 DimArray
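To make the comparison with eachslice concrete, the sketch below runs both on a shortened two-day range (an assumption made here to keep the example small) and compares the results. It assumes eachslice accepts a dimension type for its dims keyword on a DimArray:

```julia
using DimensionalData
using Dates

# Hypothetical shortened sample: 48 hourly steps instead of two years
tempo = range(DateTime(2000), step=Hour(1), length=48)
A = rand(X(1:0.01:2), Ti(tempo))

groups = groupby(A, dims(A, Ti))
slices = eachslice(A; dims=Ti)

# One group and one slice per time step
println(length(groups) == length(slices))

# Each group keeps a length-1 Ti dimension, like the 101×1 groups shown above
println(size(first(groups)))
```

The difference is that each group keeps the Ti dimension with a single element, whereas a slice along Ti drops it.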
TODO: Apply a custom function (e.g. normalization) to grouped output.