Roma Dubtsov comments

Results: 12 comments of Roma Dubtsov

Broken(?) links to not-fully-qualified types

Thanks! I'll try to work around it then.

[October 2015] Intel are CPU magicians. But there's no one weird trick....

Hi, I'm one of the developers who worked on this package. I've looked at the [run.sh](https://github.com/soumith/convnet-benchmarks/blob/cpu/intel_optimized_technical_preview_for_multinode_caffe_1.0/run.sh) and the only suggestion I have is to enable OpenMP thread affinity by setting...

[October 2015] Intel are CPU magicians. But there's no one weird trick....

@andravin, > Are other processors (eg i7) affected by AVX2 frequencies, if so where can we find documentation of the AVX2 frequencies for those processors? Probably the CPU support...

[October 2015] Intel are CPU magicians. But there's no one weird trick....

@ozabluda, Here's some data from a 2xE5-2697v3 machine (sorry, I did not have a desktop machine with a proper OS handy). My colleague timed IntelCaffe on 14 and 28 cores...

[October 2015] Intel are CPU magicians. But there's no one weird trick....

@andravin > > Are other processors (eg i7) affected by AVX2 frequencies, if so where can we find documentation of the AVX2 frequencies for those processors? > > Probably the...

gemm_s8s8s32: latest version is 1.4x slower than v0.21 with Intel MKL on AVX2

Reproduced for 274be82 but not for tip of master. Is there a particular reason you cannot use the latest revision?

gemm_s8s8s32: latest version is 1.4x slower than v0.21 with Intel MKL on AVX2

We certainly want to resolve this. We are just still bikeshedding the solution :) In DNNL we have scratchpads as well, so we are not sure if we need a...

cuBLAS single precision issue

Hello @ww5862. Thanks for the report. A few questions: 1. Can you please post output of test run after setting the environment variable `CUBLASLT_LOG_MASK=64` (e.g. `export CUBLASLT_LOG_MASK=64` in bash)? ([documentation](https://docs.nvidia.com/cuda/cublas/#cublaslt-logging))....

problem using stalonetray with mwm

Hi @gcomes. Thanks for the report. I will certainly look into this, but unfortunately I am not able to dedicate any serious time to this project. The only thing I...

cuBLASLt FP8 batched gemm with bias

This has been fixed starting from cuBLAS 12.6 Update 2 (https://developer.nvidia.com/cuda-12-6-2-download-archive). Closing.