
Add abstractions for parsing TFRecord Files using `tf.Example` and `tf.io` ops

Open dhruvrajan opened this issue 5 years ago • 4 comments

System information

  • TensorFlow version (you are using): Latest master of TensorFlow Java
  • Are you willing to contribute it (Yes/No): No (working on other things at the moment)

Describe the feature and the current behavior/state.

Currently in Java, we have access to the core tf.io ops such as tf.parseExample, tf.parseSingleExample, tf.decodeRaw, etc., but only as raw ops. To serialize TFRecord datasets, or to read datasets from the tensorflow_datasets buckets, for example, we need a way to use these ops easily.

In Python, the relevant abstractions built on top of tf.io are defined in parsing_config.py. Specifically, it would be very helpful to have abstractions such as:

  • The various feature types: FixedLenFeature, SparseFeature, FixedLenSequenceFeature, etc.
  • The _ParseOpParams class, which wraps the parameters to tf.parseExample
  • A standard flow for defining the features in a TFRecord file
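To make the request more concrete, here is a rough sketch of what such feature-definition classes could look like in plain Java. Everything in it is hypothetical: `ParsingConfigSketch`, `DType`, `FeatureSpec`, and the builder methods are names invented for illustration and are not part of TensorFlow Java. `VarLenFeature` is borrowed from the Python API as one of the "etc." feature types, and the dense/sparse key split only loosely mirrors how _ParseOpParams groups arguments for the parse ops.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: none of these types exist in TensorFlow Java today. */
public final class ParsingConfigSketch {

    /** Value types a tf.Example feature can carry. */
    public enum DType { FLOAT, INT64, STRING }

    /** Common supertype for feature configurations. */
    public interface Feature {
        DType dtype();
    }

    /** Dense feature with a fixed per-example shape (cf. Python's FixedLenFeature). */
    public static final class FixedLenFeature implements Feature {
        private final DType dtype;
        private final long[] shape;

        public FixedLenFeature(DType dtype, long... shape) {
            this.dtype = dtype;
            this.shape = shape.clone();
        }

        @Override public DType dtype() { return dtype; }
        public long[] shape() { return shape.clone(); }
    }

    /** Variable-length feature, parsed as sparse (cf. Python's VarLenFeature). */
    public static final class VarLenFeature implements Feature {
        private final DType dtype;

        public VarLenFeature(DType dtype) { this.dtype = dtype; }

        @Override public DType dtype() { return dtype; }
    }

    /** Ordered name-to-feature mapping that can be split into dense and sparse keys. */
    public static final class FeatureSpec {
        private final Map<String, Feature> features = new LinkedHashMap<>();

        public FeatureSpec fixedLen(String name, DType dtype, long... shape) {
            features.put(name, new FixedLenFeature(dtype, shape));
            return this;
        }

        public FeatureSpec varLen(String name, DType dtype) {
            features.put(name, new VarLenFeature(dtype));
            return this;
        }

        public Map<String, Feature> asMap() {
            return Collections.unmodifiableMap(features);
        }

        public List<String> denseKeys() { return keysOf(FixedLenFeature.class); }
        public List<String> sparseKeys() { return keysOf(VarLenFeature.class); }

        // Collects, in insertion order, the names whose feature is of the given type.
        private List<String> keysOf(Class<? extends Feature> type) {
            List<String> keys = new ArrayList<>();
            for (Map.Entry<String, Feature> e : features.entrySet()) {
                if (type.isInstance(e.getValue())) {
                    keys.add(e.getKey());
                }
            }
            return keys;
        }
    }

    public static void main(String[] args) {
        FeatureSpec spec = new FeatureSpec()
            .fixedLen("image", DType.STRING)   // scalar feature: empty shape
            .fixedLen("bbox", DType.FLOAT, 4)  // four floats per example
            .varLen("labels", DType.INT64);
        System.out.println("dense:  " + spec.denseKeys());
        System.out.println("sparse: " + spec.sparseKeys());
    }
}
```

A real implementation would then translate such a spec into the arguments of tf.parseExample; that step is left out here because it depends on the exact op signatures in TensorFlow Java.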

See these examples, which relate to using the parse-example ops and reading TFRecord files.

Will this change the current API? How?

This will add APIs for serializing examples to, and parsing examples from, TFRecord files.
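For context on the file side of such an API, below is a minimal sketch of the TFRecord record framing in plain Java (Java 9+ for `java.util.zip.CRC32C`). It assumes the layout as I understand it from TensorFlow's record writer: a little-endian 64-bit length, a masked CRC32C of the length bytes, the payload, and a masked CRC32C of the payload. The class and method names (`TfRecordFramingSketch`, `writeRecord`, `readRecord`) are made up for this example, and the payloads are arbitrary bytes where a real file would hold serialized tf.Example protos.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32C;

/** Hypothetical sketch of TFRecord framing; not an existing TensorFlow Java class. */
public final class TfRecordFramingSketch {

    // Constant added after rotating the CRC (assumed from TensorFlow's masked CRC scheme).
    private static final int MASK_DELTA = 0xa282ead8;

    /** CRC32C of the data, rotated right by 15 bits, plus the mask constant. */
    static int maskedCrc(byte[] data) {
        CRC32C crc = new CRC32C();
        crc.update(data, 0, data.length);
        return Integer.rotateRight((int) crc.getValue(), 15) + MASK_DELTA;
    }

    private static ByteBuffer le(int size) {
        return ByteBuffer.allocate(size).order(ByteOrder.LITTLE_ENDIAN);
    }

    /** Writes one record: length, CRC of length, payload, CRC of payload. */
    public static void writeRecord(OutputStream out, byte[] payload) throws IOException {
        byte[] length = le(8).putLong(payload.length).array();
        out.write(length);
        out.write(le(4).putInt(maskedCrc(length)).array());
        out.write(payload);
        out.write(le(4).putInt(maskedCrc(payload)).array());
    }

    /** Reads one record, or returns null at a clean end of stream. */
    public static byte[] readRecord(InputStream in) throws IOException {
        byte[] length = new byte[8];
        int n = in.readNBytes(length, 0, 8);
        if (n == 0) {
            return null;
        }
        if (n < 8) {
            throw new EOFException("truncated length header");
        }
        checkCrc(length, readFully(in, 4), "length");
        long size = ByteBuffer.wrap(length).order(ByteOrder.LITTLE_ENDIAN).getLong();
        if (size < 0 || size > Integer.MAX_VALUE) {
            throw new IOException("unsupported record size: " + size);
        }
        byte[] payload = readFully(in, (int) size);
        checkCrc(payload, readFully(in, 4), "payload");
        return payload;
    }

    private static byte[] readFully(InputStream in, int size) throws IOException {
        byte[] buf = new byte[size];
        if (in.readNBytes(buf, 0, size) < size) {
            throw new EOFException("truncated record");
        }
        return buf;
    }

    private static void checkCrc(byte[] data, byte[] stored, String what) throws IOException {
        int expected = ByteBuffer.wrap(stored).order(ByteOrder.LITTLE_ENDIAN).getInt();
        if (expected != maskedCrc(data)) {
            throw new IOException("corrupt " + what + " checksum");
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeRecord(out, "first".getBytes(StandardCharsets.UTF_8));
        writeRecord(out, "second".getBytes(StandardCharsets.UTF_8));

        InputStream in = new ByteArrayInputStream(out.toByteArray());
        byte[] record;
        while ((record = readRecord(in)) != null) {
            System.out.println(new String(record, StandardCharsets.UTF_8));
        }
    }
}
```

The sketch only round-trips its own output; compatibility with files produced by TensorFlow itself would still need to be verified against real data.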

Who will benefit with this feature?

Anyone using datasets stored as TFRecord files from TensorFlow Java (for example, to load datasets from the tensorflow_datasets GCP bucket).

Any Other info.

Feel free to get in touch with me anytime to discuss! Happy to help.

dhruvrajan, May 18 '20 03:05