housebaby
@danpovey thanks for your quick reply. Can you give more details on what limits kaldi to supporting multiple GPUs? We made some modifications to cu-device.cc and wrapped a decoder with...
@btiplitz thanks a lot. Can you give more details on which part of the NVIDIA code needs to change? Is it the code of libraries like libcuda.* and libcublas*?
> We're currently looking at sharing the allocator, but that's mostly for a single-GPU configuration.
> For multi-GPU, given the fact that no communication between devices is needed, why...
@hugovbraun Actually, I have no idea whether the cu-device.* files will still be used in kaldi10.
> @housebaby are you looking to use one GPU per stream? Or one CPU thread per stream? If that's the case, I would strongly suggest to take a look at...
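Since the advice above is to use one GPU per stream of work, a common pattern (not from this thread; a sketch under my own assumptions) is to run one decoding process per GPU and pin each process to its device via the `CUDA_VISIBLE_DEVICES` environment variable, which Kaldi's CUDA binaries respect. The decoder command shown in the comment is hypothetical; only the process-per-GPU pinning is the point:

```python
import os
from multiprocessing import Pool

def worker(args):
    gpu_id, utt_chunk = args
    # Pin this child process to a single GPU before any CUDA context exists.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    # A real pipeline would launch the decoder on this chunk here, e.g.
    # subprocess.run(["online2-wav-nnet3-latgen-faster", ...])  # hypothetical args
    return gpu_id, os.environ["CUDA_VISIBLE_DEVICES"], len(utt_chunk)

def decode_on_gpus(chunks):
    # One worker process per GPU; chunk i is decoded on GPU i.
    with Pool(len(chunks)) as pool:
        return pool.map(worker, list(enumerate(chunks)))

if __name__ == "__main__":
    print(decode_on_gpus([["utt1", "utt2"], ["utt3"]]))
```

Because the environment variable is set in each child process before CUDA initializes, every decoder sees exactly one device as device 0, so no changes to cu-device.cc are needed for this style of multi-GPU use.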
> @housebaby have you run nvidia-smi to check the load during testing? The 2nd GPU would only help if you max out the GPU. And Dan makes a point on multiple...
> Several of the parameters affect accuracy, like the lattice, so combining those in a performance test seems like a mistake. The code stuffs data into a queue and...
> You're right, the current neural net context switch mechanism of the online pipeline has been designed for CNN-based networks.
>
> Regarding relying on the inner state of a...