Group By
DimensionalData.jl provides a groupby function for dimensional grouping. This guide covers:
simple grouping with a function
grouping with Bins
grouping with another existing AbstractDimArray or Dimension
Grouping functions
Let's look at the kinds of functions that can be used to group DateTime values. Other types follow the same principles, but are usually simpler.
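As a rough illustration of the principle, a grouping function is applied to every value, and values that return the same key end up in the same group. The sketch below uses only the Dates standard library and does not call groupby itself; `group_indices` is a hypothetical helper written for this example, not part of DimensionalData.jl.

```julia
using Dates

# Hypothetical helper (not part of DimensionalData): map each key returned by
# the grouping function `f` to the indices of the values that produced it.
function group_indices(f, vals)
    groups = Dict{Any, Vector{Int}}()
    for (i, v) in enumerate(vals)
        push!(get!(groups, f(v), Int[]), i)
    end
    return groups
end

# Two days of data, four values per day.
times = range(DateTime(2000), step=Hour(6), length=8)

# `day` returns the day of the month, so there is one group per day.
by_day = group_indices(day, times)
```

Any function that returns a comparable key (an integer, a tuple, a string) could be passed in place of `day` here.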
First, load some packages:
using DimensionalData
using Dates
using Statistics
const DD = DimensionalData
Now create a demo DateTime range:
julia> tempo = range(DateTime(2000), step=Hour(1), length=365*24*2)
DateTime("2000-01-01T00:00:00"):Hour(1):DateTime("2001-12-30T23:00:00")
Let's see how some common functions work.
The hour function will transform values to the hour of the day - the integers 0:23
julia> hour.(tempo)
17520-element Vector{Int64}:
0
1
2
3
4
5
6
7
8
9
⋮
15
16
17
18
19
20
21
22
23
Tuple groupings
julia> yearmonth.(tempo)
17520-element Vector{Tuple{Int64, Int64}}:
(2000, 1)
(2000, 1)
(2000, 1)
(2000, 1)
(2000, 1)
(2000, 1)
(2000, 1)
(2000, 1)
(2000, 1)
(2000, 1)
⋮
(2001, 12)
(2001, 12)
(2001, 12)
(2001, 12)
(2001, 12)
(2001, 12)
(2001, 12)
(2001, 12)
(2001, 12)
Grouping and reducing
Let's define an array with a time dimension of the times used above:
julia> A = rand(X(1:0.01:2), Ti(tempo))
┌ 101×17520 DimArray{Float64, 2} ┐
├────────────────────────────────┴─────────────────────────────────────── dims ┐
↓ X Sampled{Float64} 1.0:0.01:2.0 ForwardOrdered Regular Points,
→ Ti Sampled{DateTime} DateTime("2000-01-01T00:00:00"):Hour(1):DateTime("2001-12-30T23:00:00") ForwardOrdered Regular Points
└──────────────────────────────────────────────────────────────────────────────┘
↓ → 2000-01-01T00:00:00 2000-01-01T01:00:00 … 2001-12-30T23:00:00
1.0 0.89757 0.795755 0.905858
1.01 0.969026 0.785993 0.477727
1.02 0.106472 0.646867 0.807257
1.03 0.283631 0.905428 0.0958593
⋮ ⋱ ⋮
1.97 0.830655 0.673995 0.244589
1.98 0.445628 0.54935 0.00358622
1.99 0.571899 0.310328 … 0.355619
2.0 0.488519 0.359731 0.328946
Group by month, using the month function:
julia> groups = groupby(A, Ti=>month)
┌ 12-element DimGroupByArray{DimArray{Float64,2},1} ┐
├───────────────────────────────────────────────────┴──────── dims ┐
↓ Ti Sampled{Int64} [1, …, 12] ForwardOrdered Irregular Points
├──────────────────────────────────────────────────────── metadata ┤
Dict{Symbol, Any} with 1 entry:
:groupby => :Ti=>month
├───────────────────────────────────────────────────┴── group dims ┐
↓ X, → Ti
└──────────────────────────────────────────────────────────────────┘
1 101×1488 DimArray
2 101×1368 DimArray
⋮
12 101×1464 DimArray
We can take the mean of each group by broadcasting over them:
julia> mean.(groups)
┌ 12-element DimArray{Float64, 1} ┐
├─────────────────────────────────┴────────────────────────── dims ┐
↓ Ti Sampled{Int64} [1, …, 12] ForwardOrdered Irregular Points
├──────────────────────────────────────────────────────── metadata ┤
Dict{Symbol, Any} with 1 entry:
:groupby => :Ti=>month
└──────────────────────────────────────────────────────────────────┘
1 0.49998
2 0.499823
3 0.499881
⋮
10 0.499447
11 0.500349
12 0.499943
Binning
Sometimes we want to further aggregate our groups after running a function, or just bin the raw data directly. We can use the Bins wrapper to do this.
For quick analysis, we can break our groups into N bins.
julia> groupby(A, Ti=>Bins(month, 4))
┌ 4-element DimGroupByArray{DimArray{Float64,2},1} ┐
├──────────────────────────────────────────────────┴───────────────────── dims ┐
↓ Ti Sampled{IntervalSets.Interval{:closed, :open, Float64}} [1.0 .. 3.75275 (closed-open), …, 9.25825 .. 12.011 (closed-open)] ForwardOrdered Irregular Intervals{Start}
├──────────────────────────────────────────────────────────────────── metadata ┤
Dict{Symbol, Any} with 1 entry:
:groupby => :Ti=>Bins(month, 4)…
├──────────────────────────────────────────────────┴─────────────── group dims ┐
↓ X, → Ti
└──────────────────────────────────────────────────────────────────────────────┘
1.0 .. 3.75275 (closed-open) 101×4344 DimArray
⋮
9.25825 .. 12.011 (closed-open) 101×4392 DimArray
Doing this requires slightly padding the bin edges, so the lookup of the output is less than ideal.
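To see why padding is needed, the bin edges shown above can be reproduced by hand. This is a sketch in plain Julia, not the library's implementation: the padding of 0.1% of the data span is an assumption inferred from the printed intervals, and `bin_of` is a hypothetical helper. Because the intervals are closed-open, the largest month value (12) would fall outside the last bin unless the upper edge is pushed slightly past it.

```julia
# Month values run from 1 to 12; split that span into 4 equal-width bins.
lo, hi, nbins = 1, 12, 4

# Assumed padding: 0.1% of the data span added to the upper bound, so that the
# largest value (12) falls inside the last closed-open interval.
pad = 0.001 * (hi - lo)
edges = range(float(lo), hi + pad; length=nbins + 1)

# Closed-open membership test: lower edge included, upper edge excluded.
bin_of(x) = findfirst(i -> edges[i] <= x < edges[i + 1], 1:nbins)

collect(edges)  # approximately [1.0, 3.75275, 6.5055, 9.25825, 12.011]
bin_of.(1:12)   # months 1:3 in bin 1, 4:6 in bin 2, 7:9 in bin 3, 10:12 in bin 4
```

The padded, non-integer edges are what make the resulting lookup awkward to read, even though each bin still holds exactly three months.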
Select by Dimension
We can also select by Dimensions and any objects with dims methods.
Trivially, grouping by an object's own dimension is similar to eachslice:
julia> groupby(A, dims(A, Ti))
┌ 17520-element DimGroupByArray{DimArray{Float64,2},1} ┐
├──────────────────────────────────────────────────────┴───────────────── dims ┐
↓ Ti Sampled{DateTime} DateTime("2000-01-01T00:00:00"):Hour(1):DateTime("2001-12-30T23:00:00") ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
Dict{Symbol, Any} with 1 entry:
:groupby => :Ti=>[DateTime("2000-01-01T00:00:00"), DateTime("2000-01-01T01:00…
├──────────────────────────────────────────────────────┴─────────── group dims ┐
↓ X, → Ti
└──────────────────────────────────────────────────────────────────────────────┘
2000-01-01T00:00:00 101×1 DimArray
2000-01-01T01:00:00 101×1 DimArray
⋮
2001-12-30T23:00:00 101×1 DimArray
TODO: Apply custom function (e.g. normalization) to grouped output.
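Until that section is written, the idea of per-group normalization can be sketched in plain Julia. This does not use groupby or any DimensionalData API; `normalize_by` is a hypothetical helper, and the data is a made-up stand-in. Each value is rescaled using the mean and standard deviation of its own group.

```julia
using Dates
using Statistics

# Hypothetical helper (not part of DimensionalData): normalize `data` to zero
# mean and unit standard deviation within each group defined by `f`.
function normalize_by(f, keys_source, data)
    ks = f.(keys_source)
    out = similar(data, Float64)
    for k in unique(ks)
        idx = findall(==(k), ks)      # positions belonging to this group
        group = data[idx]
        out[idx] = (group .- mean(group)) ./ std(group)
    end
    return out
end

# Made-up stand-in data: 48 hourly values spanning a month boundary,
# 24 in January and 24 in February.
times = range(DateTime(2000, 1, 31), step=Hour(1), length=48)
data = collect(1.0:48.0)

normalized = normalize_by(month, times, data)
```

After this call, the values belonging to each month have mean zero and standard deviation one, independently of the other month.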