SoundNet-tensorflow
Extracting features in pool5
I read in the paper that the best layer for feature extraction is 'pool5'. However, the features in that layer have shape h x w x 256. Any idea how that 3D array should be processed before it is fed to an SVM, as described in the paper?
Using our pre-trained model, you can extract discriminative features for natural sound recognition. In our experiments, pool5 seems to work the best with a linear SVM.
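For reference, here is a minimal sketch of two common ways to turn an h x w x 256 activation into a single vector that a linear SVM can accept. Which one the paper actually used is not stated in this thread, so treat both as assumptions. The helper name `pool5_to_vector` and the random stand-in array are hypothetical and are not part of SoundNet-tensorflow.

```python
import numpy as np


def pool5_to_vector(features, mode="mean"):
    """Collapse an (h, w, 256) activation into a 1-D feature vector.

    Hypothetical helper, not part of SoundNet-tensorflow.
    mode="flatten": keep every value; length is h * w * 256.
    mode="mean":    average over h and w; length is always 256.
    mode="max":     maximum over h and w; length is always 256.
    """
    features = np.asarray(features, dtype=np.float32)
    if features.ndim != 3:
        raise ValueError("expected an array of shape (h, w, channels)")
    if mode == "flatten":
        return features.reshape(-1)
    if mode == "mean":
        return features.mean(axis=(0, 1))
    if mode == "max":
        return features.max(axis=(0, 1))
    raise ValueError("unknown mode: %r" % (mode,))


# Random stand-in for a real pool5 output (assumed shape: h=4, w=1, 256 channels).
rng = np.random.default_rng(0)
fake_pool5 = rng.standard_normal((4, 1, 256)).astype(np.float32)

vec_mean = pool5_to_vector(fake_pool5, mode="mean")
vec_flat = pool5_to_vector(fake_pool5, mode="flatten")
print(vec_mean.shape, vec_flat.shape)
```

Flattening only gives vectors of equal length when every clip has the same duration, since h grows with the input length. Averaging or taking the maximum over h and w always yields 256 values per clip. Either way, stacking one vector per clip gives a 2D matrix of shape (n_clips, n_features) that a linear SVM such as scikit-learn's `LinearSVC` can be fit on.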