AxisKeys.jl icon indicating copy to clipboard operation
AxisKeys.jl copied to clipboard

Request - wrapdims - fill missing cells with missing

Open Lincoln-Hannah opened this issue 1 year ago โ€ข 4 comments

Currently wrapdims fills missing cells with some very large number. Example below. Could these be replaced by 'missing' or some specified value.

df = DataFrame( x=[1,1,2], y=[1,2,1], data=[10,11,12])

wrapdims(df, :data, :x, :y )

ka(2,2)   #=2800920006512

Use Case I would then use coalesce to fill missings with some value (unique to each cell). Then convert back to a DataFrame. This could all be done with DataFrame functions (groupby etc) but a KeyedArray is more elegant.

Lincoln-Hannah avatar Mar 08 '24 10:03 Lincoln-Hannah

I think you want this:

julia> wrapdims(df, :data, :x, :y, default=missing)
2-dimensional KeyedArray(NamedDimsArray(...)) with keys:
โ†“   x โˆˆ 2-element Vector{Int64}
โ†’   y โˆˆ 2-element Vector{Int64}
And data, 2ร—2 Matrix{Union{Missing, Int64}}:
      (1)  (2)
 (1)   10   11
 (2)   12     missing

I don't recall why this isn't the default. Or if the default wants to avoid Union{Missing, Int}, it doesn't throw an error if some entries are undefined.

mcabbott avatar Mar 08 '24 16:03 mcabbott

Thanks:) Where you have docstrings on the website. Most other packages have docs.

I didn't realise there was documentation for the package. (Which contains the above). Maybe change it to docs ?

lincolnhannah avatar Mar 09 '24 10:03 lincolnhannah

At present there is no website, but docstrings are accessible at the REPL prompt by pressing ?:

help?> wrapdims(df, :data, :x, :y )
  wrapdims(table, value, names...; default=undef, sort=false, force=false)

  Construct KeyedArray(NamedDimsArray(A,names),keys) from a table matching the Tables.jl
  (https://github.com/JuliaData/Tables.jl) API. (It must support both Tables.columns and
  Tables.rows.)

  The contents of the array is taken from the column value::Symbol of the table. Each symbol in
  names specifies a column whose unique entries become the keys along a dimenension of the array.

  If there is no row in the table matching a possible set of keys, then this element of the array
  is undefined, unless you provide the default keyword. If several rows share the same set of
  keys, then by default an ArgumentError is thrown. Keyword force=true will instead cause these
  non-unique entries to be overwritten.

  See also populate! to fill an existing array in the same manner.

  Setting AxisKeys.nameouter() = false will reverse the order of wrappers produced.

  Examples
  โ‰กโ‰กโ‰กโ‰กโ‰กโ‰กโ‰กโ‰ก

  julia> using DataFrames, AxisKeys
  
  julia> df = DataFrame("a" => 1:3, "b" => 10:12.0, "c" => ["cat", "dog", "cat"])
  3ร—3 DataFrame
   Row โ”‚ a      b        c      
       โ”‚ Int64  Float64  String 
  โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
     1 โ”‚     1     10.0  cat
     2 โ”‚     2     11.0  dog
     3 โ”‚     3     12.0  cat
  
  julia> wrapdims(df, :a, :b, :c; default=missing)
  2-dimensional KeyedArray(NamedDimsArray(...)) with keys:
  โ†“   b โˆˆ 3-element Vector{Float64}
  โ†’   c โˆˆ 2-element Vector{String}
  And data, 3ร—2 Matrix{Union{Missing, Int64}}:
           ("cat")    ("dog")
   (10.0)   1           missing
   (11.0)    missing   2
   (12.0)   3           missing
  
  julia> wrapdims(df, :a, :b)
  1-dimensional NamedDimsArray(KeyedArray(...)) with keys:
  โ†“   b โˆˆ 3-element Vector{Float64}
  And data, 3-element Vector{Union{Missing, Int64}}:
   (10.0)  1
   (11.0)  2
   (12.0)  3
  
  julia> wrapdims(df, :a, :c)
  ERROR: ArgumentError: Key ("cat",) is not unique
  
  julia> wrapdims(df, :a, :c, force=true)
  1-dimensional NamedDimsArray(KeyedArray(...)) with keys:
  โ†“   c โˆˆ 2-element Vector{String}
  And data, 2-element Vector{Int64}:
   ("cat")  3
   ("dog")  2

julia> wrapdims(df, :data, :x, :y, default=missing)
2-dimensional KeyedArray(NamedDimsArray(...)) with keys:
โ†“   x โˆˆ 2-element Vector{Int64}
โ†’   y โˆˆ 2-element Vector{Int64}
And data, 2ร—2 Matrix{Union{Missing, Int64}}:
      (1)  (2)
 (1)   10   11
 (2)   12     missing

mcabbott avatar Mar 09 '24 15:03 mcabbott

Would it be possible to add functionality like @rtransform from DataFramesMeta. Such that you specify a dimension then add new elements (like adding columns to a dataframe). Below is a model I coded using a KeyedArray, a DataFrame, and finally pseudo code using this proposed functionality

Using KeyedArray

X = wrapdims( data, :rate, :date, :pair, :periodName, :delta )
X = rekey(X, :date => [h1,h2,base] )
y = copy(X)
y = rekey(y, :delta => [10,25,50,75,90])

y(10) .= X(ATM)  +  X(BF10)  + .5X(RR10)
y(25) .= X(ATM)  +  X(BF25)  + .5X(RR25)
y(50) .= X(ATM) 
y(75) .= X(ATM)  +  X(BF25)  - .5X(RR25)
y(90) .= X(ATM)  +  X(BF10)  - .5X(RR10)

y(base) .*= y(h2)./ y(h1)

y = y(base)
X = copy(y)
X = rekey(X, :delta => [ATM,BF10,BF25,RR10,RR25] )

X(ATM)  .=  y(50)    
X(BF10) .= -y(50)  + .5( y(10) + y(90))
X(BF25) .= -y(50)  + .5( y(25) + y(75))
X(RR10) .=  y(10)  -     y(90)
X(RR25) .=  y(25)  -     y(75) 

Using DataFrames

@chain data begin

    unstack( [:date,:pair,:periodName], :delta, :rate )

    @rtransform begin
        :ฮ”10 = :ATM + :BF10 + .5 * :RR10
        :ฮ”25 = :ATM + :BF25 + .5 * :RR25
        :ฮ”50 = :ATM 
        :ฮ”75 = :ATM + :BF25 - .5 * :RR25
        :ฮ”90 = :ATM + :BF10 - .5 * :RR10
    end

    @select :pair :date :periodName :ฮ”10 :ฮ”25 :ฮ”50 :ฮ”75 :ฮ”90
    stack( Not([:pair,:date,:periodName]), value_name = :vol, variable_name=:delta )
                                
    unstack( [:pair,:delta,:periodName], :hist_point, :vol )
    @rtransform :shocked = :Base * :H2 / :H1
    @select :pair :delta :periodName :shocked    
    unstack( [:pair,:periodName], :delta, :shocked)
    
    @rtransform begin
        :ATM    = :ฮ”50 
        :BF10   = -:ฮ”50 + .5( :ฮ”10 + :ฮ”90 )
        :BF25   = -:ฮ”50 + .5( :ฮ”25 + :ฮ”75 )
        :RR10   = :ฮ”10 - :ฮ”90 
        :RR25   = :ฮ”25 - :ฮ”75
    end

Using Proposed @rtransform on KeyedArray

@chain KA begin

    @rtransform (dimension= delta) 
        :ฮ”10 = :ATM + :BF10 + .5 * :RR10
        :ฮ”25 = :ATM + :BF25 + .5 * :RR25
        :ฮ”50 = :ATM 
        :ฮ”75 = :ATM + :BF25 - .5 * :RR25
        :ฮ”90 = :ATM + :BF10 - .5 * :RR10
    end

    @rtransform (dimension = history )
        :shocked = :Base * :H2 / :H1
    end

    @rtransform (dimension = delta )    
        :ATM    = :ฮ”50 
        :BF10   = -:ฮ”50 + .5( :ฮ”10 + :ฮ”90 )
        :BF25   = -:ฮ”50 + .5( :ฮ”25 + :ฮ”75 )
        :RR10   = :ฮ”10 - :ฮ”90 
        :RR25   = :ฮ”25 - :ฮ”75
    end

end

Lincoln-Hannah avatar Mar 15 '24 09:03 Lincoln-Hannah