Metadata for each column
Feature request
It would be useful to be able to attach metadata to each column, as a string or any other type.
Motivation
I will motivate this with an example. Let's say we are experimenting with embeddings produced by some image encoder network, and we want to iterate through a couple of preprocessing pipelines to see which one works better in our downstream task. As a workaround, what I do right now is compute a hash of the preprocessing that the images went through and make it part of the new column's name. It would be nice to be able to attach some kind of metadata to each column in these scenarios.
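A minimal sketch of that workaround, assuming the preprocessing can be described as a JSON-serializable dict. The helper names `preprocessing_tag` and `embedding_column_name` and the config keys are hypothetical, chosen only for illustration:

```python
import hashlib
import json


def preprocessing_tag(config: dict) -> str:
    """Return a short, stable hash of a preprocessing configuration."""
    # Sort keys so that the same settings always serialize to the same string.
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:8]


def embedding_column_name(config: dict, prefix: str = "embedding") -> str:
    """Build a column name that encodes which preprocessing produced it."""
    return f"{prefix}_{preprocessing_tag(config)}"


# Two hypothetical preprocessing variants that we want to compare.
config_a = {"resize": 224, "normalize": True}
config_b = {"resize": 256, "normalize": True}

name_a = embedding_column_name(config_a)
name_b = embedding_column_name(config_b)
```

The drawback is that the column name only carries an opaque hash, so the mapping from hash back to preprocessing settings has to be stored somewhere else.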
Your contribution
Maybe we could map another relational-like database as the metadata?
Hi! Indeed it would be useful to support this. PyArrow natively supports schema-level and column-level metadata, so implementing this should be straightforward. The API I have in mind would work as follows:
col_feature = Value("string", metadata="Some column-level metadata")
features = Features({"col": col_feature}, metadata="Some schema-level metadata")
WDYT?
Sorry for the late reply. Yes, I think this is the most straightforward approach given what we already have.
@mariosasko Let me know how I can help.
Hi, is this feature to be implemented in the near future? It would be really nice if that would be the case!
Hi, I also need this feature, so that I can tell my customers whether any of the features is encrypted with a certain key.