Support appending arbitrary container
This PR implements append!(::Table, source) for arbitrary source table types:
julia> append!(Table(a=[1], b=[2]), (a=[3], b=[4]))
Table with 2 columns and 2 rows:
a b
┌─────
1 │ 1 2
2 │ 3 4
julia> append!(Table(a=[1], b=[2]), [(a=3, b=4)])
Table with 2 columns and 2 rows:
a b
┌─────
1 │ 1 2
2 │ 3 4
julia> append!(Table(a=[1], b=[2]), DataFrame(a=[3], b=[4]))
Table with 2 columns and 2 rows:
a b
┌─────
1 │ 1 2
2 │ 3 4
julia> append!(Table(a=[1], b=[2]), StructVector((a=[3], b=[4])))
Table with 2 columns and 2 rows:
a b
┌─────
1 │ 1 2
2 │ 3 4
Codecov Report
Merging #57 into master will increase coverage by
0.12%. The diff coverage is72.72%.
@@ Coverage Diff @@
## master #57 +/- ##
==========================================
+ Coverage 67.06% 67.19% +0.12%
==========================================
Files 6 6
Lines 498 509 +11
==========================================
+ Hits 334 342 +8
- Misses 164 167 +3
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/Table.jl | 75.67% <72.72%> (-0.33%) |
:arrow_down: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact),ø = not affected,? = missing dataPowered by Codecov. Last update fc1247a...6c09ca2. Read the comment docs.
That's an interesting point. I think I agree. But a pain point is that there is no easy way to query if iterate(table) is row-wise or not. We can use Tables.rows(table) === table although I'm not sure if the compiler can always const-prop this (also, it'd be bad if Tables.rows is not O(1)). Anyway, maybe something like this?:
isrowiterator(t) =
Tables.istable(t) && Tables.rowaccess(t) && Tables.rows(t) === t
function Base.append!(t::Table, t2)
if isrowiterator(t2) && Tables.columnaccess(t2)
return append_columnaccess!(t, t2)
end
return append_rows!(t, t2)
end
append_row!(t::Table, rows) = # `rows` is a generic iterator
mapfoldl(_asnamedtuple(NamedTuple{columnnames(t)}), push!, rows; init = t)
It's nice that now it naturally supports iterators with unknown eltype.
(BTW, it would be nice to add something like isrowiterator to Tables.jl interface if we do this here.)
With this, I think append!(Table(a=[1], b=[2]), Tables.rows((a=[3], b=[4]))) should still work and it's efficient as it uses append_columnaccess! path.
append!(Table(a=[1], b=[2]), Tables.rows((a=[3], b=[4])))should still work and it's efficient as it usesappend_columnaccess!path
This already works. But we need https://github.com/JuliaData/Tables.jl/pull/126 for the optimization (column-oriented memory access).
I'm not sure I quite understand what append_columnaccess! means in:
if isrowiterator(t2) && Tables.columnaccess(t2)
return append_columnaccess!(t, t2)
end
is it hcat? or still vcat, but column-by-column?
Also note that:
Tables.istable(t) && Tables.rowaccess(t) && Tables.rows(t) === t
doesn't return true for all valid row-iterators. For example, Tables.istable(t) returns false for a Generator of NamedTuples, even though it's a perfectly valid "row table", and works fine when you call Tables.rows on it.
still
vcat, but column-by-column?
Yes, vcat done column-by-column as an optimization.
Also note that:
Tables.istable(t) && Tables.rowaccess(t) && Tables.rows(t) === tdoesn't return true for all valid row-iterators.
As append_columnaccess! is just an optimization, it is OK for isrowiterator to have false negative.
For example,
Tables.istable(t)returns false for a Generator of NamedTuples, even though it's a perfectly valid "row table", and works fine when you callTables.rowson it.
They hit append_rows! branch so it's fine.
Yes, vcat done column-by-column as an optimization.
So why not just always call Tables.columns and append the result columns one by one anyway? Any time you're switching based on rowaccess/columnaccess, that's a bit of a code-smell w/ the Tables.jl API; the point is to use Tables.rows or Tables.columns as is more appropriate for your use-case. For example, that's what DataFrames does.
I thought it's nice to avoid allocating unnecessary objects. If performance is not a concern, I think the cleanest approach here would be to add things row-by-row as there is already push!:
https://github.com/JuliaData/TypedTables.jl/blob/018a2d14d4448f3dd3f3e63321fc3bc09a77ae2b/src/Table.jl#L218
Then I would just check Tables.columnaccess(t) and use the column-based path if so, otherwise fallback to the row-based approach. The default is Tables.columnaccess(x) = false, so it will only be true if a table has explicitly provided support for Tables.columns.
I initially had something like that. But then @andyferris pointed out that it might be better to strictly require that src in append!(::TypedTable.Table, src) to be (semantically) an iterator over rows https://github.com/JuliaData/TypedTables.jl/pull/57#pullrequestreview-332289808. Essentially this boils down to that Tables.istable(x) does not imply anything about how iterate(x) works (or not work). (Just to be clear, I'm not saying this is bad. I think Tables.jl has a slick API that demonstrates how much you can do with a flexible trait-based system in Julia.) By the way, this is why I said Tabels.isrowiterator might be a good addition to Tables.jl interface.