Update documentation on nullable variable attributes

Open mamscience opened this issue 3 years ago • 3 comments

Hi team,

Could you update the Python section of the documentation? I have no clue how to approach nullable variable attributes without an example. https://docs.tiledb.com/main/how-to/arrays/writing-arrays/nullable-attributes

Many thanks, Michel

mamscience avatar Sep 22 '22 23:09 mamscience

Hi @mamscience, at the moment, we support pandas nullable datatypes transparently in TileDB-Py. We'll get the docs there updated ASAP, but in the meantime please have a look at these tests. We're also planning to add a new API for writing and querying nullable attributes soon (it will accept or return a numpy bool vector). If you have a particular package/API of interest, please let us know. We may not be able to integrate directly in the short term, but we'll keep it in mind to try to maximize interoperability.

ihnorton avatar Sep 23 '22 01:09 ihnorton

Thanks. Apparently, changing the numpy type to float and passing None also did the trick.

schema = tiledb.ArraySchema(
    domain=dom,
    sparse=True,
    attrs=[
        tiledb.Attr(name="a", dtype=np.int32, nullable=True),
        tiledb.Attr(name="b", dtype=np.float32, nullable=True),
    ],
)

snip

with tiledb.SparseArray(array_name, mode="w") as A:
    I, J = [1, 1, 1], [1, 2, 3]
    a = np.array([60, 65, 64])
    b = np.array([120, 122, None])  # None marks the missing value
    # Keys must match the attribute names declared in the schema
    A[I, J] = {"a": a, "b": b}

mamscience avatar Sep 23 '22 19:09 mamscience
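
For readers following along, here is a small numpy-only sketch of why the float route can carry a missing value while an integer array cannot. It does not touch TileDB at all, and the variable names are illustrative rather than taken from the thread.

```python
import numpy as np

# Without an explicit dtype, a list containing None becomes an object array.
b_obj = np.array([120, 122, None])

# With a float dtype, numpy converts None to NaN, so the value stays "missing".
b_float = np.array([120, 122, None], dtype=np.float32)

# Integers have no NaN equivalent, so the same conversion is rejected.
try:
    np.array([60, 65, None], dtype=np.int32)
    int_conversion_failed = False
except (TypeError, ValueError):
    int_conversion_failed = True

print(b_obj.dtype, b_float.dtype, int_conversion_failed)
```

This is why a plain numpy integer attribute needs some separate way to mark missing cells, such as a validity mask or a pandas nullable dtype.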

The underlying issue, by the way, is that I must pass all attributes when writing to an array, while (my) sparse arrays are mostly empty. Solving this by relaxing the constraints (https://github.com/TileDB-Inc/TileDB/issues/1162#issue-425631917) was proposed before.

But let's say I need to use an integer instead of a float attribute. How should one approach this? Did you mean that I should build and populate a pandas DataFrame and convert it to a TileDB array? Or is there another way to create a TileDB array with pandas datatypes?

thanks in advance

mamscience avatar Sep 23 '22 20:09 mamscience
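
As a rough sketch of the pandas route mentioned earlier in the thread: a pandas nullable integer column keeps its integer dtype and tracks missing cells in a mask, which can also be split into a data array plus a numpy bool validity vector. The column names, the placeholder value 0, and the helper `write_dataframe` are hypothetical. The helper assumes that `tiledb.from_pandas` accepts the nullable column as the maintainer describes, and it has not been verified here.

```python
import numpy as np
import pandas as pd

# Nullable integer column: stays Int32 even though one cell is missing.
a = pd.array([60, 65, None], dtype="Int32")

# Split into plain data plus a bool validity vector (True = value present).
validity = ~np.asarray(a.isna())
values = a.to_numpy(dtype="int32", na_value=0)  # 0 is an arbitrary placeholder

df = pd.DataFrame({"i": [1, 1, 1], "j": [1, 2, 3], "a": a})


def write_dataframe(uri, frame):
    """Hypothetical helper: hand the DataFrame to TileDB-Py's pandas ingestion."""
    import tiledb  # imported lazily so the rest of the sketch runs without it

    tiledb.from_pandas(uri, frame)
```

Calling `write_dataframe("my_array_uri", df)` with a URI of your choice would be the step that actually creates the array. Whether the resulting schema marks `a` as nullable is worth checking with `tiledb.open(uri).schema`.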