Jianyu Huang

Results 51 issues of Jianyu Huang

**Is your feature request related to a problem? Please describe.** More details about Nvidia's latest H100 GPU were recently released at https://developer.nvidia.com/blog/nvidia-hopper-architecture-in-depth/ . Tensor Core will support FP8 E4M3 and...

feature request
inactive-30d
inactive-90d

Use clang-format (https://clang.llvm.org/docs/ClangFormat.html) to reformat the BLISlab code.

Summary: Add APIs for using UVM with the GPU, rather than the CPU, as the preferred location. Differential Revision: D36657705

fb-exported
cla signed

Summary: This will be better shared between Trec and HPC. - It's open source so TorchRec can call it from FBGEMM. - Add Codec-based quantized comm support with FP32, FP16,...

fb-exported
cla signed

Summary: Reuse the quantize utility functions and deduplicate the code. Differential Revision: D37745225

fb-exported
cla signed

Summary: This will be better shared between Trec and HPC. It's open source so TorchRec can call it from FBGEMM. Differential Revision: D37745301

fb-exported
cla signed

Summary: From D35292923 Differential Revision: D36121284

fb-exported
cla signed

Summary: Debug https://fb.workplace.com/groups/210783077585773/permalink/407320664598679/ - Use the fbgemm namespace for the update - Load CPU ops with `torch.ops.load_library("//deeplearning/fbgemm/fbgemm_gpu:permute_pooled_embedding_ops_cpu")` Differential Revision: D36390480

fb-exported
cla signed

For FBGEMM release v0.1.0

cla signed