Query.jl
@mutate changes other column type
Reposted from https://discourse.julialang.org/t/query-mutate-changes-other-column-type/39292, as it looks like a bug.
Here is the smallest repro I could come up with:
using CSV, DataFrames, Query
contents = """
"5674012","aa66aa66"
"5674012","b036aa66,b036aa67,b036aa68"
""";
batches = CSV.File(IOBuffer(contents); header = ["X1", "Splits"], delim = ',') |> DataFrame;
emptyStringArray = Array{SubString{String},1}()
batchesAny = batches |>
@mutate(Splits = length(_.Splits) > 0 ? split(_.Splits, ',') : emptyStringArray) |> DataFrame
batchesString = batches |>
@mutate(Splits = length(_.Splits) > 0 ? split(_.Splits, ',') : Array{SubString{String},1}()) |> DataFrame
The output looks like this:
julia> batchesAny
2×2 DataFrame
│ Row │ X1 │ Splits │
│ │ Any │ Any │
├─────┼─────────┼──────────────────────────────────────┤
│ 1 │ 5674012 │ ["aa66aa66"] │
│ 2 │ 5674012 │ ["b036aa66", "b036aa67", "b036aa68"] │
julia> batchesString
2×2 DataFrame
│ Row │ X1 │ Splits │
│ │ Int64 │ Array{SubString{String},1} │
├─────┼─────────┼──────────────────────────────────────┤
│ 1 │ 5674012 │ ["aa66aa66"] │
│ 2 │ 5674012 │ ["b036aa66", "b036aa67", "b036aa68"] │
What I don’t understand:
- The :Splits column type differs: Any vs. Array{SubString{String},1}. (I just wanted to save memory, so I stored the value, which can be repeated, in emptyStringArray.)
- Even if I did something wrong with the :Splits column, I think X1's type shouldn't change to Any in batchesAny.
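One way to narrow this down, offered as an unconfirmed guess rather than a diagnosis: the only difference between the two pipelines is that one refers to the non-constant global emptyStringArray and the other constructs the array inline. Julia cannot infer a concrete type for a non-constant global, so the sketch below asks the compiler what it infers for each variant. It uses only Base and does not exercise Query.jl itself; the names f_global, f_literal, f_const, and EMPTY_SPLITS are hypothetical and exist only for this illustration.

```julia
# Hypothetical helpers mirroring the expression passed to @mutate in the repro.
emptyStringArray = Array{SubString{String},1}()    # non-const global, as in the repro
const EMPTY_SPLITS = Array{SubString{String},1}()  # const variant (an assumption, not in the repro)

f_global(s)  = length(s) > 0 ? split(s, ',') : emptyStringArray
f_literal(s) = length(s) > 0 ? split(s, ',') : Array{SubString{String},1}()
f_const(s)   = length(s) > 0 ? split(s, ',') : EMPTY_SPLITS

# Print the return types the compiler infers for a String argument.
for f in (f_global, f_literal, f_const)
    println(f, " => ", Base.return_types(f, (String,)))
end
```

If f_global is inferred less precisely than the other two, that would be consistent with the difference seen in the :Splits column, and declaring the shared array as const might be worth trying in the repro. Whether the same mechanism also explains X1 becoming Any is not something this sketch can show.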