Field name silently cropped at 16 chars
Hi,
Consider the following example:
from ffcv.writer import DatasetWriter
from ffcv.fields import NDArrayField, FloatField, StringField, IntField
class Dummy:
def __len__(self):
return 10
def __getitem__(self, id):
return (42,)
ds = Dummy()
writer = DatasetWriter("/tmp/test.beton", {
'this_is_a_very_long_field': IntField(),
}, num_workers=2)
writer.from_indexed_dataset(ds)
loader = Loader('/tmp/test.beton', batch_size=2, num_workers=2, order=OrderOption.SEQUENTIAL)
print(list(loader.pipelines.keys())[0]) # prints 'this_is_a_very_l'
It's unclear to me if the 16 chars limit is intended (I haven't found any mention of that in the doc, but I may have missed it). If it's not, well it's a bug, and if it is, perhaps it would be appropriate to add an assert in the writer?
As it stands it's very confusing because no warning is issued, and I spent quite some time debugging an issue where my loading pipeline was not applied. I had something like pipelines = {"this_is_a_very_long_field": my_pipeline} and nothing was happening. I think this is the result of two fixable issues:
- As mentioned above, since the field name was actually shortened to "this_is_a_very_l" the pipeline didn't work. Throwing an error when trying to create a field name too long (or supporting it) fixes this issue.
- No warning was issued for creating a pipeline for a non-existent field. Adding a check to verify if each field used exists and issuing an error/warning otherwise would fix this issue.
Best
Hello,
This is indeed the case. FFCV file formats has a limit on the length of the field. We could definitely warn users when they use a name that is too long. Are you interested in submitting a pull request ?