Request - wrapdims - fill missing cells with missing
Currently wrapdims fills missing cells with some very large number. Example below.
Could these be replaced by 'missing' or some specified value.
df = DataFrame( x=[1,1,2], y=[1,2,1], data=[10,11,12])
wrapdims(df, :data, :x, :y )
ka(2,2) #=2800920006512
Use Case
I would then use coalesce to fill missings with some value (unique to each cell). Then convert back to a DataFrame.
This could all be done with DataFrame functions (groupby etc) but a KeyedArray is more elegant.
I think you want this:
julia> wrapdims(df, :data, :x, :y, default=missing)
2-dimensional KeyedArray(NamedDimsArray(...)) with keys:
โ x โ 2-element Vector{Int64}
โ y โ 2-element Vector{Int64}
And data, 2ร2 Matrix{Union{Missing, Int64}}:
(1) (2)
(1) 10 11
(2) 12 missing
I don't recall why this isn't the default. Or if the default wants to avoid Union{Missing, Int}, it doesn't throw an error if some entries are undefined.
Thanks:)
Where you have docstrings on the website. Most other packages have docs.
I didn't realise there was documentation for the package. (Which contains the above).
Maybe change it to docs ?
At present there is no website, but docstrings are accessible at the REPL prompt by pressing ?:
help?> wrapdims(df, :data, :x, :y )
wrapdims(table, value, names...; default=undef, sort=false, force=false)
Construct KeyedArray(NamedDimsArray(A,names),keys) from a table matching the Tables.jl
(https://github.com/JuliaData/Tables.jl) API. (It must support both Tables.columns and
Tables.rows.)
The contents of the array is taken from the column value::Symbol of the table. Each symbol in
names specifies a column whose unique entries become the keys along a dimenension of the array.
If there is no row in the table matching a possible set of keys, then this element of the array
is undefined, unless you provide the default keyword. If several rows share the same set of
keys, then by default an ArgumentError is thrown. Keyword force=true will instead cause these
non-unique entries to be overwritten.
See also populate! to fill an existing array in the same manner.
Setting AxisKeys.nameouter() = false will reverse the order of wrappers produced.
Examples
โกโกโกโกโกโกโกโก
julia> using DataFrames, AxisKeys
julia> df = DataFrame("a" => 1:3, "b" => 10:12.0, "c" => ["cat", "dog", "cat"])
3ร3 DataFrame
Row โ a b c
โ Int64 Float64 String
โโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโ
1 โ 1 10.0 cat
2 โ 2 11.0 dog
3 โ 3 12.0 cat
julia> wrapdims(df, :a, :b, :c; default=missing)
2-dimensional KeyedArray(NamedDimsArray(...)) with keys:
โ b โ 3-element Vector{Float64}
โ c โ 2-element Vector{String}
And data, 3ร2 Matrix{Union{Missing, Int64}}:
("cat") ("dog")
(10.0) 1 missing
(11.0) missing 2
(12.0) 3 missing
julia> wrapdims(df, :a, :b)
1-dimensional NamedDimsArray(KeyedArray(...)) with keys:
โ b โ 3-element Vector{Float64}
And data, 3-element Vector{Union{Missing, Int64}}:
(10.0) 1
(11.0) 2
(12.0) 3
julia> wrapdims(df, :a, :c)
ERROR: ArgumentError: Key ("cat",) is not unique
julia> wrapdims(df, :a, :c, force=true)
1-dimensional NamedDimsArray(KeyedArray(...)) with keys:
โ c โ 2-element Vector{String}
And data, 2-element Vector{Int64}:
("cat") 3
("dog") 2
julia> wrapdims(df, :data, :x, :y, default=missing)
2-dimensional KeyedArray(NamedDimsArray(...)) with keys:
โ x โ 2-element Vector{Int64}
โ y โ 2-element Vector{Int64}
And data, 2ร2 Matrix{Union{Missing, Int64}}:
(1) (2)
(1) 10 11
(2) 12 missing
Would it be possible to add functionality like @rtransform from DataFramesMeta.
Such that you specify a dimension then add new elements (like adding columns to a dataframe).
Below is a model I coded using a KeyedArray, a DataFrame, and finally pseudo code using this proposed functionality
Using KeyedArray
X = wrapdims( data, :rate, :date, :pair, :periodName, :delta )
X = rekey(X, :date => [h1,h2,base] )
y = copy(X)
y = rekey(y, :delta => [10,25,50,75,90])
y(10) .= X(ATM) + X(BF10) + .5X(RR10)
y(25) .= X(ATM) + X(BF25) + .5X(RR25)
y(50) .= X(ATM)
y(75) .= X(ATM) + X(BF25) - .5X(RR25)
y(90) .= X(ATM) + X(BF10) - .5X(RR10)
y(base) .*= y(h2)./ y(h1)
y = y(base)
X = copy(y)
X = rekey(X, :delta => [ATM,BF10,BF25,RR10,RR25] )
X(ATM) .= y(50)
X(BF10) .= -y(50) + .5( y(10) + y(90))
X(BF25) .= -y(50) + .5( y(25) + y(75))
X(RR10) .= y(10) - y(90)
X(RR25) .= y(25) - y(75)
Using DataFrames
@chain data begin
unstack( [:date,:pair,:periodName], :delta, :rate )
@rtransform begin
:ฮ10 = :ATM + :BF10 + .5 * :RR10
:ฮ25 = :ATM + :BF25 + .5 * :RR25
:ฮ50 = :ATM
:ฮ75 = :ATM + :BF25 - .5 * :RR25
:ฮ90 = :ATM + :BF10 - .5 * :RR10
end
@select :pair :date :periodName :ฮ10 :ฮ25 :ฮ50 :ฮ75 :ฮ90
stack( Not([:pair,:date,:periodName]), value_name = :vol, variable_name=:delta )
unstack( [:pair,:delta,:periodName], :hist_point, :vol )
@rtransform :shocked = :Base * :H2 / :H1
@select :pair :delta :periodName :shocked
unstack( [:pair,:periodName], :delta, :shocked)
@rtransform begin
:ATM = :ฮ50
:BF10 = -:ฮ50 + .5( :ฮ10 + :ฮ90 )
:BF25 = -:ฮ50 + .5( :ฮ25 + :ฮ75 )
:RR10 = :ฮ10 - :ฮ90
:RR25 = :ฮ25 - :ฮ75
end
Using Proposed @rtransform on KeyedArray
@chain KA begin
@rtransform (dimension= delta)
:ฮ10 = :ATM + :BF10 + .5 * :RR10
:ฮ25 = :ATM + :BF25 + .5 * :RR25
:ฮ50 = :ATM
:ฮ75 = :ATM + :BF25 - .5 * :RR25
:ฮ90 = :ATM + :BF10 - .5 * :RR10
end
@rtransform (dimension = history )
:shocked = :Base * :H2 / :H1
end
@rtransform (dimension = delta )
:ATM = :ฮ50
:BF10 = -:ฮ50 + .5( :ฮ10 + :ฮ90 )
:BF25 = -:ฮ50 + .5( :ฮ25 + :ฮ75 )
:RR10 = :ฮ10 - :ฮ90
:RR25 = :ฮ25 - :ฮ75
end
end