AxisKeys.jl Request - wrapdims - fill missing cells with missing

Currently wrapdims fills missing cells with some very large number. Example below. Could these be replaced by 'missing' or some specified value.

df = DataFrame( x=[1,1,2], y=[1,2,1], data=[10,11,12])

wrapdims(df, :data, :x, :y )

ka(2,2)   #=2800920006512

Use Case I would then use coalesce to fill missings with some value (unique to each cell). Then convert back to a DataFrame. This could all be done with DataFrame functions (groupby etc) but a KeyedArray is more elegant.

Mar 08 '24 10:03 Lincoln-Hannah

I think you want this:

julia> wrapdims(df, :data, :x, :y, default=missing)
2-dimensional KeyedArray(NamedDimsArray(...)) with keys:
↓   x ∈ 2-element Vector{Int64}
→   y ∈ 2-element Vector{Int64}
And data, 2×2 Matrix{Union{Missing, Int64}}:
      (1)  (2)
 (1)   10   11
 (2)   12     missing

I don't recall why this isn't the default. Or if the default wants to avoid Union{Missing, Int}, it doesn't throw an error if some entries are undefined.

Mar 08 '24 16:03 mcabbott

Thanks:) Where you have docstrings on the website. Most other packages have docs.

I didn't realise there was documentation for the package. (Which contains the above). Maybe change it to docs ?

Mar 09 '24 10:03 lincolnhannah

At present there is no website, but docstrings are accessible at the REPL prompt by pressing ?:

help?> wrapdims(df, :data, :x, :y )
  wrapdims(table, value, names...; default=undef, sort=false, force=false)

  Construct KeyedArray(NamedDimsArray(A,names),keys) from a table matching the Tables.jl
  (https://github.com/JuliaData/Tables.jl) API. (It must support both Tables.columns and
  Tables.rows.)

  The contents of the array is taken from the column value::Symbol of the table. Each symbol in
  names specifies a column whose unique entries become the keys along a dimenension of the array.

  If there is no row in the table matching a possible set of keys, then this element of the array
  is undefined, unless you provide the default keyword. If several rows share the same set of
  keys, then by default an ArgumentError is thrown. Keyword force=true will instead cause these
  non-unique entries to be overwritten.

  See also populate! to fill an existing array in the same manner.

  Setting AxisKeys.nameouter() = false will reverse the order of wrappers produced.

  Examples
  ≡≡≡≡≡≡≡≡

  julia> using DataFrames, AxisKeys
  
  julia> df = DataFrame("a" => 1:3, "b" => 10:12.0, "c" => ["cat", "dog", "cat"])
  3×3 DataFrame
   Row │ a      b        c      
       │ Int64  Float64  String 
  ─────┼────────────────────────
     1 │     1     10.0  cat
     2 │     2     11.0  dog
     3 │     3     12.0  cat
  
  julia> wrapdims(df, :a, :b, :c; default=missing)
  2-dimensional KeyedArray(NamedDimsArray(...)) with keys:
  ↓   b ∈ 3-element Vector{Float64}
  →   c ∈ 2-element Vector{String}
  And data, 3×2 Matrix{Union{Missing, Int64}}:
           ("cat")    ("dog")
   (10.0)   1           missing
   (11.0)    missing   2
   (12.0)   3           missing
  
  julia> wrapdims(df, :a, :b)
  1-dimensional NamedDimsArray(KeyedArray(...)) with keys:
  ↓   b ∈ 3-element Vector{Float64}
  And data, 3-element Vector{Union{Missing, Int64}}:
   (10.0)  1
   (11.0)  2
   (12.0)  3
  
  julia> wrapdims(df, :a, :c)
  ERROR: ArgumentError: Key ("cat",) is not unique
  
  julia> wrapdims(df, :a, :c, force=true)
  1-dimensional NamedDimsArray(KeyedArray(...)) with keys:
  ↓   c ∈ 2-element Vector{String}
  And data, 2-element Vector{Int64}:
   ("cat")  3
   ("dog")  2

julia> wrapdims(df, :data, :x, :y, default=missing)
2-dimensional KeyedArray(NamedDimsArray(...)) with keys:
↓   x ∈ 2-element Vector{Int64}
→   y ∈ 2-element Vector{Int64}
And data, 2×2 Matrix{Union{Missing, Int64}}:
      (1)  (2)
 (1)   10   11
 (2)   12     missing

Mar 09 '24 15:03 mcabbott

Would it be possible to add functionality like @rtransform from DataFramesMeta. Such that you specify a dimension then add new elements (like adding columns to a dataframe). Below is a model I coded using a KeyedArray, a DataFrame, and finally pseudo code using this proposed functionality

Using KeyedArray

X = wrapdims( data, :rate, :date, :pair, :periodName, :delta )
X = rekey(X, :date => [h1,h2,base] )
y = copy(X)
y = rekey(y, :delta => [10,25,50,75,90])

y(10) .= X(ATM)  +  X(BF10)  + .5X(RR10)
y(25) .= X(ATM)  +  X(BF25)  + .5X(RR25)
y(50) .= X(ATM) 
y(75) .= X(ATM)  +  X(BF25)  - .5X(RR25)
y(90) .= X(ATM)  +  X(BF10)  - .5X(RR10)

y(base) .*= y(h2)./ y(h1)

y = y(base)
X = copy(y)
X = rekey(X, :delta => [ATM,BF10,BF25,RR10,RR25] )

X(ATM)  .=  y(50)    
X(BF10) .= -y(50)  + .5( y(10) + y(90))
X(BF25) .= -y(50)  + .5( y(25) + y(75))
X(RR10) .=  y(10)  -     y(90)
X(RR25) .=  y(25)  -     y(75)

Using DataFrames

@chain data begin

    unstack( [:date,:pair,:periodName], :delta, :rate )

    @rtransform begin
        :Δ10 = :ATM + :BF10 + .5 * :RR10
        :Δ25 = :ATM + :BF25 + .5 * :RR25
        :Δ50 = :ATM 
        :Δ75 = :ATM + :BF25 - .5 * :RR25
        :Δ90 = :ATM + :BF10 - .5 * :RR10
    end

    @select :pair :date :periodName :Δ10 :Δ25 :Δ50 :Δ75 :Δ90
    stack( Not([:pair,:date,:periodName]), value_name = :vol, variable_name=:delta )
                                
    unstack( [:pair,:delta,:periodName], :hist_point, :vol )
    @rtransform :shocked = :Base * :H2 / :H1
    @select :pair :delta :periodName :shocked    
    unstack( [:pair,:periodName], :delta, :shocked)
    
    @rtransform begin
        :ATM    = :Δ50 
        :BF10   = -:Δ50 + .5( :Δ10 + :Δ90 )
        :BF25   = -:Δ50 + .5( :Δ25 + :Δ75 )
        :RR10   = :Δ10 - :Δ90 
        :RR25   = :Δ25 - :Δ75
    end

Using Proposed @rtransform on KeyedArray

@chain KA begin

    @rtransform (dimension= delta) 
        :Δ10 = :ATM + :BF10 + .5 * :RR10
        :Δ25 = :ATM + :BF25 + .5 * :RR25
        :Δ50 = :ATM 
        :Δ75 = :ATM + :BF25 - .5 * :RR25
        :Δ90 = :ATM + :BF10 - .5 * :RR10
    end

    @rtransform (dimension = history )
        :shocked = :Base * :H2 / :H1
    end

    @rtransform (dimension = delta )    
        :ATM    = :Δ50 
        :BF10   = -:Δ50 + .5( :Δ10 + :Δ90 )
        :BF25   = -:Δ50 + .5( :Δ25 + :Δ75 )
        :RR10   = :Δ10 - :Δ90 
        :RR25   = :Δ25 - :Δ75
    end

end

Mar 15 '24 09:03 Lincoln-Hannah