Pierre Ruyssen

Results: 21 comments by Pierre Ruyssen

In with name "doc_id": Dtype int32 do not match

aman2930 is looking into this.

tfds-nightly and tensorflow-datasets 4.8.2 have been released with the fixes for the issue mentioned in 1.

We are definitely going to need to differentiate between int8, int16, uint8... and xsd has short, long, unsignedLong, etc. So in that regard xsd seems useful indeed. Looking at numpy...
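To make the comparison in the comment above concrete, here is a minimal sketch of how the XSD integer types line up with NumPy-style dtype names. The names `XSD_TO_NUMPY` and `integer_range` are hypothetical and are not part of Croissant or any library; only the bit widths of the XSD types come from the XSD standard.

```python
# Hypothetical mapping from XSD integer types to NumPy dtype names.
# Bit widths follow the XSD definitions (byte=8, short=16, int=32, long=64).
XSD_TO_NUMPY = {
    "xsd:byte": "int8",
    "xsd:short": "int16",
    "xsd:int": "int32",
    "xsd:long": "int64",
    "xsd:unsignedByte": "uint8",
    "xsd:unsignedShort": "uint16",
    "xsd:unsignedInt": "uint32",
    "xsd:unsignedLong": "uint64",
}


def integer_range(dtype_name):
    """Return (min, max) for an integer dtype name such as 'int16' or 'uint8'."""
    signed = not dtype_name.startswith("u")
    # 'int16' -> '16', 'uint8' -> '8'
    bits = int(dtype_name.split("int")[1])
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


for xsd_name, dtype_name in XSD_TO_NUMPY.items():
    low, high = integer_range(dtype_name)
    print(f"{xsd_name:>18} -> {dtype_name:<6} [{low}, {high}]")
```

Each fixed-width XSD integer type has a direct counterpart here, which is consistent with the point that XSD can express the int8/int16/uint8 distinction.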

Segmentation mask: https://github.com/mlcommons/croissant/blob/main/docs/croissant-spec.md#segmentationmask The idea is to be able to specify the segmentation mask either as a sequence of coordinates (polygon) or as image overlays.
Bounding box: https://github.com/mlcommons/croissant/blob/main/docs/croissant-spec.md#boundingbox The idea...

If we do implement this as a transform, what datatype would you use? A repeated Number? One would still need to look at the transform to understand the kind of data...

After a few offline conversations, the consensus seems to be to do the following:
- deprecate `repeated` (boolean, [spec](https://github.com/mlcommons/croissant/blob/main/docs/croissant-spec.md#field), [example](https://github.com/mlcommons/croissant/blob/068fcccf4c2c54f047c59e1b25cc78845ea64c60/datasets/1.0/huggingface-tgqa/metadata.json#L218)) in favor of `isArray` (boolean).
- introduce `arrayShape` (list of...

Hi, thank you for reporting! This is definitely a bug. Workaround: add the following argument to your `tfds.load` call:
```py
tfds.load(..., download_and_prepare_kwargs={'file_format': tfds.core.FileFormat.ARRAY_RECORD})
```
We'll look into how to update...