Multi-Config Dataset Loading / Wildcard Config Identifiers
Some datasets, e.g. mnist_corrupted, provide several configurations that in many use cases are all used together, without distinction. As far as I know, tfds currently requires loading each config independently and then concatenating the results, e.g.
c1_ds = tfds.load("mnist_corrupted/shot_noise")
c2_ds = tfds.load("mnist_corrupted/impulse_noise")
c3_ds = tfds.load("mnist_corrupted/glass_blur")
# ... many, many, many rows :-)
cX_ds = tfds.load("mnist_corrupted/...")
dataset_i_want = some_concat_function((c1_ds, c2_ds, c3_ds, ..., cX_ds))
For datasets with a large number of configs, this snippet can become quite long. It also makes it hard to apply some of the nice features of tfds.load to the combined dataset (e.g. shuffle_files or split).
Describe the solution you'd like
Some way to use wildcards over tfds configs, e.g.:
# Contains the datasets for all configs, nicely shuffled
dataset_i_want = tfds.load("mnist_corrupted/*", shuffle_files=True)
or
# Contains the shot_noise and impulse_noise config datasets
dataset_i_want = tfds.load("mnist_corrupted/*_noise", shuffle_files=True)
Describe alternatives you've considered
Option 1: Manual concatenations, as shown in the example above.
Option 2: Looping over all BuilderConfigs in the DatasetBuilder class (e.g. MNISTCorrupted.BUILDER_CONFIGS) and manually implementing wildcard pattern matching on the config names. However, I do not think the BUILDER_CONFIGS field is documented, and it is probably not guaranteed to exist on all DatasetBuilders. Also, shuffling and splitting are still hard.
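For reference, Option 2 could be sketched roughly as follows. This is only a workaround sketch, not an existing tfds feature: the helper names match_config_names and load_matching_configs are made up for illustration, and it assumes that tfds.builder_cls and the BUILDER_CONFIGS class attribute are available in the installed tfds version. Pattern matching is done with fnmatch from the standard library.

```python
import fnmatch


def match_config_names(config_names, pattern):
    """Return the config names that match a shell-style wildcard pattern."""
    return [name for name in config_names if fnmatch.fnmatchcase(name, pattern)]


def load_matching_configs(dataset_name, pattern, split="train", **load_kwargs):
    """Load every config matching `pattern` and concatenate the results.

    Hypothetical helper. Assumes `tfds.builder_cls` and `BUILDER_CONFIGS`
    exist in the installed tfds version.
    """
    # Imported lazily so the pattern matching above works without tfds.
    import tensorflow_datasets as tfds

    builder_cls = tfds.builder_cls(dataset_name)
    all_names = [config.name for config in builder_cls.BUILDER_CONFIGS]
    matched = match_config_names(all_names, pattern)
    if not matched:
        raise ValueError(f"No config of {dataset_name!r} matches {pattern!r}")

    # One tf.data.Dataset per matching config, all for the same split.
    datasets = [
        tfds.load(f"{dataset_name}/{name}", split=split, **load_kwargs)
        for name in matched
    ]

    # Chain the per-config datasets one after another.
    combined = datasets[0]
    for ds in datasets[1:]:
        combined = combined.concatenate(ds)
    return combined


# Intended usage (downloads data, so not executed here):
# ds = load_matching_configs("mnist_corrupted", "*_noise", shuffle_files=True)
```

Note that shuffle_files here only shuffles within each config, and concatenate keeps the configs in sequence, so an additional shuffle on the combined dataset would still be needed to mix them. That is part of why built-in wildcard support would be nicer.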
I also want to know how to do this. I want to be able to use only specific types of corruption, combine only those and then use that combined dataset as normal. I also want to be able to combine it with my own version of corruption if possible.
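One possible way to mix selected corruptions with a custom one, as a sketch only: the helpers with_custom_corruption and concatenate_all and the invert_pixels function are hypothetical names, not part of tfds. It assumes examples are dicts with an "image" key, and that the plain mnist dataset has the same feature structure (dtype and shape) as the mnist_corrupted configs, which concatenate requires.

```python
from functools import reduce


def with_custom_corruption(dataset, corrupt_fn, image_key="image"):
    """Apply a user-supplied corruption to the image field of every example."""

    def _apply(example):
        # Copy the example dict and replace only the image entry.
        return {**example, image_key: corrupt_fn(example[image_key])}

    return dataset.map(_apply)


def concatenate_all(datasets):
    """Chain datasets one after another using their `concatenate` method."""
    datasets = list(datasets)
    if not datasets:
        raise ValueError("Need at least one dataset to concatenate")
    return reduce(lambda acc, ds: acc.concatenate(ds), datasets)


def invert_pixels(image):
    """Example custom corruption: invert a uint8 image (keeps dtype and shape)."""
    return 255 - image


def example_usage():
    """Illustrative only: downloads data when called."""
    import tensorflow_datasets as tfds

    selected = ["shot_noise", "impulse_noise"]
    parts = [tfds.load(f"mnist_corrupted/{name}", split="train") for name in selected]

    # Own corruption applied on top of the clean MNIST images.
    clean = tfds.load("mnist", split="train")
    parts.append(with_custom_corruption(clean, invert_pixels))

    return concatenate_all(parts)
```

The combined result is a regular tf.data.Dataset, so it can be shuffled, batched, and so on as usual. The custom corruption has to preserve the image dtype and shape, otherwise the concatenation will fail.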