tensorfx

split metadata into multiple files

Open brandondutra opened this issue 8 years ago • 2 comments

Having one file with all the vocabs can be a problem for large examples. I think this caused a performance problem with a Criteo sample.

It would be nice to have a vocab file for each column. Then, if a "string to int" transform is needed for only a few categorical columns, the vocab for every other column does not need to be loaded.
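A rough sketch of the per-column layout, in plain Python. The `vocab_<column>.txt` naming, the helper names, and the sample columns are all hypothetical and are not taken from tensorfx; the point is only that a transform touching one column reads one small file.

```python
import os
import tempfile


def write_column_vocabs(vocabs, out_dir):
    """Write one vocab file per column, one term per line (hypothetical layout)."""
    paths = {}
    for column, terms in vocabs.items():
        path = os.path.join(out_dir, "vocab_%s.txt" % column)
        with open(path, "w") as f:
            for term in terms:
                f.write(term + "\n")
        paths[column] = path
    return paths


def load_column_vocab(out_dir, column):
    """Read the vocab of a single column; other columns' files are never opened."""
    path = os.path.join(out_dir, "vocab_%s.txt" % column)
    with open(path) as f:
        return f.read().splitlines()


# Made-up vocabularies for two categorical columns.
out_dir = tempfile.mkdtemp()
all_vocabs = {
    "country": ["us", "uk", "in"],
    "browser": ["chrome", "firefox"],
}
paths = write_column_vocabs(all_vocabs, out_dir)

# Only "country" needs a string-to-int transform, so only its file is read.
needed = {col: load_column_vocab(out_dir, col) for col in ["country"]}
```

With a single combined file, the whole thing would have to be parsed even if only `country` were needed.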

brandondutra avatar Mar 12 '17 20:03 brandondutra

Yes, agree.

I haven't fully grokked how vocab files work end-to-end with respect to setting up a hashtable from a file so that it works at both training and prediction time, and how vocabs should be saved within a saved model. Perhaps this can be researched a bit, unless you already know...

nikhilk avatar Mar 12 '17 21:03 nikhilk

The structured data package reads the vocab file and embeds it in the graph with index_table_from_tensor (but I think index_to_string_table_from_file would work fine). Because the vocab is embedded in the graph, the vocab file does not need to be saved with the exported graph.
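As a plain-Python analogy (not TensorFlow code), the embedding approach amounts to reading the file once, keeping the terms in memory, and then no longer depending on the file. The helper `build_index_table` and the sample terms are hypothetical; returning -1 for unknown strings is an assumption modeled on the default behaviour of index_table_from_tensor.

```python
import os
import tempfile


def build_index_table(terms, default_value=-1):
    """Return a lookup function mapping each term to its position in `terms`.

    Unknown strings map to `default_value` (assumed here to be -1).
    """
    index = {term: i for i, term in enumerate(terms)}

    def lookup(values):
        return [index.get(v, default_value) for v in values]

    return lookup


# Write a small, made-up vocab file, one term per line.
vocab_dir = tempfile.mkdtemp()
vocab_path = os.path.join(vocab_dir, "vocab_country.txt")
with open(vocab_path, "w") as f:
    f.write("us\nuk\nin\n")

# Read the file once and embed its contents in the lookup.
with open(vocab_path) as f:
    lookup = build_index_table(f.read().splitlines())

# The file is no longer needed, mirroring a graph that carries its own vocab.
os.remove(vocab_path)

ids = lookup(["uk", "zz", "us"])
```

A file-backed table, by contrast, would need the vocab file to travel with the exported model.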

brandondutra avatar Mar 12 '17 23:03 brandondutra