InMemoryDatasets.jl

what is the type of ds.A?

Open sprmnt21 opened this issue 3 years ago • 1 comments

ds = Dataset(A = ["a", "b", "a", "b"], B = [1, 2, 3, 4])

julia> ds.A==ds[:,:A]
true

julia> typeof(ds.A)
DatasetColumn{Dataset, Vector{Union{Missing, String}}}

julia> typeof(ds[:,:A])
Vector{Union{Missing, String}} (alias for Array{Union{Missing, String}, 1})

julia> ds.A
4-element Vector{Union{Missing, String}}:
 "a"
 "b"
 "a"
 "b"

I tried to concatenate what I thought were two vectors, [ds.A; ds.A]:

julia> [ds.A; ds.A]
2-element Vector{DatasetColumn{Dataset, Vector{Union{Missing, String}}}}:
 DatasetColumn{Dataset, Vector{Union{Missing, String}}}(1, 4×3 Dataset
 Row │ A         B         C        
     │ identity  identity  identity
     │ String?   String?   String?
─────┼──────────────────────────────
   1 │ a         no        low
   2 │ b         yes       low
   3 │ a         no        hi
   4 │ b         no        hi, Union{Missing, String}["a", "b", "a", "b"])
 DatasetColumn{Dataset, Vector{Union{Missing, String}}}(1, 4×3 Dataset
 Row │ A         B         C        
     │ identity  identity  identity
     │ String?   String?   String?
─────┼──────────────────────────────
   1 │ a         no        low
   2 │ b         yes       low
   3 │ a         no        hi
   4 │ b         no        hi, Union{Missing, String}["a", "b", "a", "b"])


sprmnt21, May 29 '22 11:05

It is a DatasetColumn, a customised structure which wraps a column of a data set. It is there because we want to track any changes to a data set column. Changing a value in a column can change the following attributes of a data set:

  • last modified time
  • sorting - grouping
  • format
  • ...

Thus, an abstract vector cannot be used for this purpose, and a customised type is used instead. Generally, we recommend using ds[:, :A] to extract a column, and the provided APIs to manipulate columns.
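To illustrate why [ds.A; ds.A] produced a 2-element vector, here is a minimal sketch in plain Julia. `TrackedColumn` is a hypothetical stand-in for a wrapper type that is not an `AbstractVector`; it is not the package's actual `DatasetColumn` definition, only an assumption made for the example.

```julia
# Hypothetical wrapper type, used only for illustration.
# It holds a vector but is NOT a subtype of AbstractVector.
struct TrackedColumn{T}
    values::Vector{T}
end

col = TrackedColumn(["a", "b", "a", "b"])

# Because TrackedColumn is not an array type, vertical concatenation
# treats each operand as a single element and collects the wrappers.
wrapped = [col; col]

# Concatenating the underlying vectors gives the flat result
# that was probably intended.
flat = [col.values; col.values]

println(length(wrapped))  # number of wrappers collected
println(length(flat))     # number of values after concatenation
```

With a real data set, the counterpart of the last concatenation would be [ds[:, :A]; ds[:, :A]], since ds[:, :A] returns a plain Vector as shown in the question.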

However, if you think a method must be defined for DatasetColumn, you are welcome to open a PR for it. The right location to add such methods is src/abstractdataset/dscol.jl.

Just a side note: for repeating rows you can use repeat! or repeat, and use append! to append data sets.
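As a rough sketch of the append! route mentioned above, the following stacks a data set on top of a copy of itself. It assumes that `copy`, `append!` and `nrow` behave for a `Dataset` the way they do for other tabular types in Julia; the variable name `doubled` is only illustrative.

```julia
using InMemoryDatasets

ds = Dataset(A = ["a", "b", "a", "b"], B = [1, 2, 3, 4])

# Work on a copy so the original data set is left untouched.
doubled = copy(ds)

# Append the rows of ds to the copy (assumed to modify `doubled` in place).
append!(doubled, ds)

# Extract the column as a plain vector, as recommended above.
a_values = doubled[:, :A]

println(nrow(doubled))
println(a_values)
```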

sl-solution, May 30 '22 05:05