client_python

Support unsigned int types in features

Open buhrmann opened this issue 2 years ago • 2 comments

Hi, I'm trying to programmatically upload my first training dataset, but even though the docs say you support missing data in features, your validation in fact prevents missing data in integer columns: https://github.com/Arize-ai/client_python/blob/6f678863c8fe0c15132c7d0651776c669b1349e1/arize/pandas/validation/validator.py#L1027

The list of allowed arrow types only contains non-nullable integer types. Is this an oversight or because you don't really support missing data in features?

Also, and perhaps alternatively, since you support manual upload of parquet and arrow files, do you plan to also support these via the Python SDK? My data is in Arrow to begin with, and so that would save me some manual work of converting to pandas, especially since it'll get converted back to Arrow anyway.

buhrmann avatar Jan 05 '24 11:01 buhrmann

Sorry for the confusion: the problem is not in fact with nullable ints, but with unsigned integers, hahaha.
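For anyone hitting the same validation error, one possible workaround until unsigned types are accepted is to widen unsigned columns to signed ones before upload. The sketch below is only an illustration: `signed_casts` is a hypothetical helper, not part of the Arize SDK, and it assumes the validator accepts the signed types it maps to. It works on dtype names only, so it does not need pandas to run.

```python
# Hypothetical helper (not part of the Arize SDK): choose a signed dtype
# wide enough to hold every value of each unsigned integer dtype.
_WIDER_SIGNED = {"uint8": "int16", "uint16": "int32", "uint32": "int64"}


def signed_casts(dtypes):
    """Map {column: dtype name} to {column: signed target dtype name}.

    Columns that are not unsigned integers are left out of the result.
    """
    casts = {}
    for column, dtype in dtypes.items():
        key = dtype.lower()
        if key == "uint64":
            # No wider signed integer exists, so let the caller decide
            # how to handle values above the int64 maximum.
            raise ValueError(f"column {column!r}: uint64 has no lossless signed target")
        if key in _WIDER_SIGNED:
            target = _WIDER_SIGNED[key]
            # pandas nullable dtypes are capitalised ("UInt8"); keep the
            # target nullable too ("Int16") so missing values survive.
            casts[column] = target.capitalize() if dtype[0] == "U" else target
    return casts
```

With a pandas DataFrame `df`, this could be applied as `df.astype(signed_casts({c: str(t) for c, t in df.dtypes.items()}))`. Each unsigned type is widened to the next signed size so that no value overflows.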

buhrmann avatar Jan 05 '24 11:01 buhrmann

Hi @buhrmann, good catch: we do not support unsigned integers. We can definitely work on that and include it in a future release soon. As for support for Arrow files via our Python SDK, this is not on our roadmap. Our intention with the Python SDK is to support record-at-a-time ingestion and batch ingestion via pandas.

If you are interested in ingesting your Arrow files directly (as well as Avro, Parquet, etc.), I recommend visiting our docs about our file importer tool. You can ingest files directly via drag and drop, integrations with cloud storage, and table integrations. See the section "Sending Data Methods".

fjcasti1 avatar Jan 11 '24 05:01 fjcasti1