Add abstractions for parsing TFRecord Files using `tf.Example` and `tf.io` ops
System information
- TensorFlow version (you are using): Latest master of TensorFlow Java
- Are you willing to contribute it (Yes/No): No (working on other things at the moment)
Describe the feature and the current behavior/state.
Currently in Java, we have access to the core tf.io ops such as `tf.parseExample`, `tf.parseSingleExample`, `tf.decodeRaw`, etc. To serialize TFRecord datasets and to read datasets from the tensorflow_datasets buckets, for example, these ops need to be easy to use.
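For reference, a TFRecord file is a sequence of records, each framed as a little-endian uint64 length, a masked CRC32C of that length, the payload bytes, and a masked CRC32C of the payload (CRC rotated right by 15 bits, plus `0xa282ead8`). Below is a minimal pure-Java sketch of that framing, assuming Java 9+ for `java.util.zip.CRC32C`. The class `TFRecordSketch` and its methods are hypothetical; it uses no TensorFlow Java API and does not decode the `tf.Example` protos inside each record, which is the job of the parse ops above.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32C;

/** Hypothetical helper: TFRecord framing only; payloads are opaque bytes (e.g. serialized tf.Example). */
final class TFRecordSketch {
  // Constant added after rotating the CRC, per the TFRecord "masked CRC" scheme.
  private static final int MASK_DELTA = 0xa282ead8;

  /** CRC32C of the given bytes, rotated right by 15 bits and offset by MASK_DELTA. */
  static int maskedCrc32c(byte[] data, int off, int len) {
    CRC32C crc = new CRC32C();
    crc.update(data, off, len);
    int c = (int) crc.getValue();
    return ((c >>> 15) | (c << 17)) + MASK_DELTA;
  }

  private static byte[] intLE(int v) {
    return ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(v).array();
  }

  /** Appends one framed record to the stream. */
  static void writeRecord(OutputStream out, byte[] payload) throws IOException {
    byte[] len = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN)
        .putLong(payload.length).array();
    out.write(len);                                              // 8-byte length
    out.write(intLE(maskedCrc32c(len, 0, 8)));                   // masked CRC of the length
    out.write(payload);                                          // record bytes
    out.write(intLE(maskedCrc32c(payload, 0, payload.length)));  // masked CRC of the data
  }

  /** Reads every record until a clean end of stream, verifying both checksums. */
  static List<byte[]> readRecords(InputStream in) throws IOException {
    DataInputStream din = new DataInputStream(in);
    List<byte[]> records = new ArrayList<>();
    byte[] len = new byte[8];
    while (true) {
      int got = din.readNBytes(len, 0, 8);
      if (got == 0) break;                                       // no more records
      if (got < 8) throw new EOFException("truncated length header");
      // readInt() is big-endian; reverseBytes yields the little-endian value on disk.
      if (Integer.reverseBytes(din.readInt()) != maskedCrc32c(len, 0, 8)) {
        throw new IOException("length checksum mismatch");
      }
      long n = ByteBuffer.wrap(len).order(ByteOrder.LITTLE_ENDIAN).getLong();
      if (n < 0 || n > Integer.MAX_VALUE) throw new IOException("record too large: " + n);
      byte[] data = new byte[(int) n];
      din.readFully(data);
      if (Integer.reverseBytes(din.readInt()) != maskedCrc32c(data, 0, data.length)) {
        throw new IOException("data checksum mismatch");
      }
      records.add(data);
    }
    return records;
  }
}
```

In practice each payload would be a serialized `tf.Example`, which would then be handed to the `tf.parseExample` family of ops.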
In Python, the relevant abstractions built on top of tf.io are defined in parsing_config.py. Specifically, it would be very helpful to have abstractions such as:
- Various feature types: `FixedLenFeature`, `SparseFeature`, `FixedLenSequenceFeature`, etc.
- The `_ParseOpParams` class, which wraps the parameters to `tf.parseExample`.
- Standardizing a flow for defining features in a TFRecord file.
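To make the request concrete, here is a rough sketch of what such abstractions might look like in Java, assuming Java 16+ records. Every type here (`FeatureSpec`, `FeatureType`, `FixedLenFeature`, `VarLenFeature`, `ParseOpParams`) is hypothetical and not part of TensorFlow Java; only two feature kinds are modeled. `ParseOpParams.fromFeatures` imitates the idea behind Python's `_ParseOpParams`: sorting a feature map into the parallel dense and sparse key/type/shape/default lists that a parse op consumes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Element types a tf.Example feature can hold (bytes, int64, float). */
enum FeatureType { STRING, INT64, FLOAT }

/** Hypothetical marker interface for all feature configurations. */
interface FeatureSpec {}

/** Hypothetical dense feature with a fixed shape, modeled on Python's FixedLenFeature. */
record FixedLenFeature(long[] shape, FeatureType type, Object defaultValue) implements FeatureSpec {}

/** Hypothetical variable-length feature parsed into a sparse tensor, modeled on VarLenFeature. */
record VarLenFeature(FeatureType type) implements FeatureSpec {}

/** Hypothetical analogue of _ParseOpParams: groups specs into the parallel lists a parse op takes. */
final class ParseOpParams {
  final List<String> denseKeys = new ArrayList<>();
  final List<FeatureType> denseTypes = new ArrayList<>();
  final List<long[]> denseShapes = new ArrayList<>();
  // null means the feature is required (as with Python's default_value=None).
  final List<Object> denseDefaults = new ArrayList<>();
  final List<String> sparseKeys = new ArrayList<>();
  final List<FeatureType> sparseTypes = new ArrayList<>();

  static ParseOpParams fromFeatures(Map<String, FeatureSpec> features) {
    ParseOpParams p = new ParseOpParams();
    // Iterate in sorted key order so the generated lists are deterministic.
    for (Map.Entry<String, FeatureSpec> e : new TreeMap<>(features).entrySet()) {
      FeatureSpec spec = e.getValue();
      if (spec instanceof FixedLenFeature) {
        FixedLenFeature f = (FixedLenFeature) spec;
        p.denseKeys.add(e.getKey());
        p.denseTypes.add(f.type());
        p.denseShapes.add(f.shape());
        p.denseDefaults.add(f.defaultValue());
      } else if (spec instanceof VarLenFeature) {
        VarLenFeature v = (VarLenFeature) spec;
        p.sparseKeys.add(e.getKey());
        p.sparseTypes.add(v.type());
      } else {
        throw new IllegalArgumentException("unsupported feature spec for key " + e.getKey());
      }
    }
    return p;
  }
}
```

The resulting lists would then be converted into operands for `tf.parseExample`; that step is left out because it depends on the op's exact TensorFlow Java signature.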
See these examples, which relate to using the parse-example ops and reading TFRecord files.
Will this change the current api? How?
This will add APIs for serializing examples to, and parsing examples from, TFRecord files.
Who will benefit from this feature?
Anyone using datasets stored as TFRecord files from TensorFlow Java (for example, to load datasets from the tensorflow_datasets GCP bucket).
Any Other info.
Feel free to get in touch with me anytime to discuss! Happy to help.