FBGEMM issues

Deepcopy to avoid errors from inplace operators

5

Differential Revision: D38716708

jiecaoyu

fb-exported

cla signed

Add Mx2, Mx4, 2xN, and 4xN avx512 transpose

6

Add AVX-512 transposes specialized for Mx2, Mx4, 2xN, and 4xN shapes to improve transpose performance for those shapes. * When the shape is Mx2 or...

CaoE

cla signed

Clang 14 Build Failure

18

Building FBGEMM with clang 14 fails. The errors look like this: `In file included from /home/juanpaez/FBGEMM/src/EmbeddingSpMDM.cc:11: In file included from /home/juanpaez/FBGEMM/third_party/asmjit/src/asmjit/asmjit.h:27: In file included from /home/juanpaez/FBGEMM/third_party/asmjit/src/asmjit/./core.h:2009: In file included from...

juanpaez22

Change the hardcoded 128 items per warp in embeddingBag to a variable and optimize for ROCm

9

Make @weihanmines's PR https://github.com/ROCmSoftwarePlatform/FBGEMM/pull/13 upstreamable. @sryap, would you please review the PR and consider converting it to a draft? Thank you.

liligwu

cla signed

module: rocm

Adding warmup option to device benchmark

2

Summary: In the current code, we generate a different input batch for each iteration. For training benchmarks, we use a global batch size of >= 64K. So, it becomes expensive (in...

mustafaozdal

fb-exported

cla signed

FBGEMM_gpu support for CUDA 11.5

10

We are trying to compile FBGEMM_gpu from source and are running into some errors during the build process that we suspect are related to an unsupported CUDA version. Please let...

chantat

error: comparison of integer expressions of different signedness

2

In QuantUtilsTest.cc, on line 641, I am getting the error "error: comparison of integer expressions of different signedness: ‘std::vector::size_type’ {aka ‘long unsigned int’} and ‘int’ [-Werror=sign-compare] [build] 641 |...

dkbattle
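
For context on the `-Werror=sign-compare` report above: the warning fires when a container's unsigned `size()` is compared directly with a signed `int`. The following is a minimal sketch of two common ways to avoid it, not the actual code or fix from QuantUtilsTest.cc. The helper names `has_expected_length` and `sum_first` are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from FBGEMM): returns true when `data` holds
// exactly `expected_len` elements, without mixing signed and unsigned
// operands in the comparison.
bool has_expected_length(const std::vector<float>& data, int expected_len) {
  // A negative length can never match; rejecting it first also makes the
  // cast below safe.
  if (expected_len < 0) {
    return false;
  }
  // Cast the signed value to the container's size type so both operands
  // of == are unsigned.
  return data.size() ==
      static_cast<std::vector<float>::size_type>(expected_len);
}

// Hypothetical helper (not from FBGEMM): sums the first `n` elements using
// an unsigned index, so the loop condition compares like with like.
float sum_first(const std::vector<float>& data, std::size_t n) {
  assert(n <= data.size());
  float total = 0.0f;
  for (std::size_t i = 0; i < n; ++i) {
    total += data[i];
  }
  return total;
}
```

Which variant fits depends on where the `int` originates: an explicit cast suits a value that has to stay signed elsewhere, while switching the variable to `std::size_t` is simpler when it is only ever used as a count or index.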

Conanfile.py??

1

Does anyone have a conanfile.py created for this dependency?

dkbattle

Add prototype for optimized dense2jagged kernel for 1 jagged dim

2

Sharing an optimized prototype kernel for 2D dense_to_jagged operations that uses vectorized operations. CC: @mjanderson09

chirayuG-nvidia

cla signed

Enable uvm using p2p (UVM on a pair GPU)

4

Summary: Add APIs for using UVM where the preferred location is a GPU device instead of the CPU. Differential Revision: D36657705

jianyuh

fb-exported

cla signed
