Trung Ngo
The [Deep Speech 2](http://arxiv.org/abs/1512.02595) paper used _sequence-wise_ normalization for recurrent computation, which was shown to substantially improve final generalization error while greatly accelerating training, very deep networks of RNNs on...
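For readers unfamiliar with the term, here is a minimal pure-Python sketch of sequence-wise normalization as I understand it from the paper: the mean and variance of each feature are accumulated over every time step of every sequence in the minibatch, rather than separately per time step. The function name `sequence_wise_normalize`, the nested-list input layout, and the `eps` default are assumptions made for illustration, and the learnable gain and bias are left out.

```python
import math


def sequence_wise_normalize(batch, eps=1e-5):
    """Normalize each feature over the batch AND time axes together.

    batch: list of sequences; each sequence is a list of time steps;
    each time step is a list of floats (one per feature).
    Sequences may have different lengths.
    """
    num_features = len(batch[0][0])

    # Pool all time steps of all sequences: statistics are shared
    # across time instead of being computed per time step.
    steps = [step for seq in batch for step in seq]
    n = len(steps)

    means = [sum(s[f] for s in steps) / n for f in range(num_features)]
    variances = [
        sum((s[f] - means[f]) ** 2 for s in steps) / n
        for f in range(num_features)
    ]

    # Apply the same per-feature statistics to every time step.
    return [
        [
            [
                (step[f] - means[f]) / math.sqrt(variances[f] + eps)
                for f in range(num_features)
            ]
            for step in seq
        ]
        for seq in batch
    ]


# Two sequences of different lengths, two features per time step.
example = [[[1.0, 10.0], [2.0, 20.0]], [[3.0, 30.0]]]
normalized = sequence_wise_normalize(example)
```

The output keeps the shape of the input, so it can be dropped in before the recurrent nonlinearity; a real layer would also need running statistics for inference.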
The idea comes from keras: https://github.com/fchollet/keras/tree/master/keras/backend Some functions have the same functionality, but the two backends can return results in different formats. However, it is not a big issue if we...
Is it necessary to force all input shapes to be the same in ElementwiseMergeLayer? This code runs and produces output normally; however, as soon as I add...
The `Composition` layer never calls the `cull` method to enforce the size limit for `diskcache`. There is also no mechanism to clean up the cache after a long development which often...