cap2vid
How are words in the captions represented in the BiLSTM model?
They are mapped to unique integers when the captions are saved to the h5py file. However, before being fed into the bidirectional LSTM, are they converted into embeddings (fixed-size vectors)? Or is that what the y_encode function in this comment does?
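For context, the usual pattern (not necessarily what cap2vid does) is that the integer word ids pass through an embedding layer, which maps each id to a dense fixed-size vector, and only then through the BiLSTM. A minimal PyTorch sketch with made-up sizes:

```python
import torch
import torch.nn as nn

# Hypothetical sizes -- not taken from the cap2vid code.
vocab_size = 1000   # number of unique integer word ids
embed_dim = 128
hidden_dim = 64

# Captions arrive as integer ids (as stored in the h5py file);
# the embedding layer turns each id into a dense vector
# before the bidirectional LSTM sees it.
embedding = nn.Embedding(vocab_size, embed_dim)
bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)

caption_ids = torch.randint(0, vocab_size, (2, 10))  # batch of 2 captions, 10 ids each
embedded = embedding(caption_ids)                    # (2, 10, 128)
outputs, _ = bilstm(embedded)                        # (2, 10, 2 * 64) -- forward + backward states
```

If the repo's BiLSTM input layer expects raw integer ids, some function (possibly y_encode) must be doing this lookup internally; otherwise the embedding step happens before the data reaches the model.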