
How to load a key with values of different types in tfrecord?

Open zw615 opened this issue 2 years ago • 1 comment

What I need help with / What I was wondering

I have some third-party generated tfrecord files. I just found that one specific key has different value types across these tfrecord files, as shown below.

key: "similarity"
value {
  float_list {
    value: 0.3015786111354828
  }
}

key: "similarity"
value {
  bytes_list {
    value: ""
  }
}

When I try to decode this key-value pair from the tfrecord files, I run into a problem: I cannot find a suitable type for the key similarity. When I use tf.string or tfds.features.Text() in tfds.features.FeaturesDict for decoding, I get the error:

Data types don't match. Data type: float but expected type: string

When I use tf.float64 in tfds.features.FeaturesDict for decoding, I get the error:

Data types don't match. Data type: string but expected type: float

Is there anything in tfds.features or tf.train.Example that allows me to decode both float and string values?

Or is there something like tfds.decode.SkipDecoding() that allows me to read the key similarity and decide how to decode it afterwards? I am aware that tfds.builder().as_dataset() has that option, but I cannot find an equivalent for tf.data.TFRecordDataset. I have also tried simply removing the entry corresponding to the key similarity, but then the data read from the tfrecord dataset just drops the similarity entry.
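
One possible way to defer that decision is sketched below, assuming it is acceptable to parse records eagerly in Python rather than inside a tf.data graph. Each serialized record is parsed into a tf.train.Example, and the "kind" oneof of the feature is checked to see whether float_list or bytes_list was written. The helper names similarity_from_feature and iter_similarities, and the choice of NaN as the fallback value, are assumptions made for illustration and are not part of TensorFlow or TFDS.

```python
import math
from typing import Iterator


def similarity_from_feature(feature, default: float = math.nan) -> float:
    """Return the first float stored in a feature, or `default` otherwise.

    `feature` is expected to behave like a tf.train.Feature message, whose
    oneof field "kind" is one of float_list, bytes_list or int64_list.
    """
    kind = feature.WhichOneof("kind")  # None if no list was set at all
    if kind == "float_list" and len(feature.float_list.value) > 0:
        return float(feature.float_list.value[0])
    # bytes_list (e.g. the empty string), int64_list, or a missing feature
    return default


def iter_similarities(path: str, key: str = "similarity") -> Iterator[float]:
    """Yield one similarity value per record in the TFRecord file at `path`."""
    # Imported lazily so the helper above can be used without TensorFlow.
    import tensorflow as tf

    for raw in tf.data.TFRecordDataset(path):
        # Parse the raw bytes without declaring a dtype for any key.
        example = tf.train.Example.FromString(raw.numpy())
        yield similarity_from_feature(example.features.feature[key])
```

Because this inspects the protobuf directly, no single dtype has to be declared for similarity up front. The trade-off is that the parsing runs in Python, so it may be slower than tf.io.parse_single_example on large datasets.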

Thanks a lot!

zw615, Feb 14 '23 21:02

For that, you can simply convert the string value to a float using Python's built-in type conversion (for example, float()).
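
A minimal sketch of such a conversion, assuming the value has already been read into Python as bytes, str, or a number. The helper name to_float is hypothetical. Note that the bytes value in the records above is an empty string, which has no numeric meaning, so this sketch maps it (and any other unparsable text) to None instead of raising.

```python
from typing import Optional, Union


def to_float(value: Union[bytes, str, float, int]) -> Optional[float]:
    """Convert a value to float, returning None when it is empty or invalid."""
    if isinstance(value, bytes):
        # bytes_list entries are raw bytes; decode them before parsing.
        value = value.decode("utf-8")
    if isinstance(value, str):
        value = value.strip()
        if not value:
            return None  # e.g. the empty string stored in bytes_list
    try:
        return float(value)
    except ValueError:
        return None  # text that does not look like a number


print(to_float(0.3015786111354828))
print(to_float(b""))
```

Returning None keeps "no similarity recorded" distinguishable from a real score of 0.0; a different sentinel could be used if the downstream code needs a purely numeric column.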

VishuKalier2003, Apr 04 '23 12:04