YanbingJiang
### Description This PR adds NNC post-op fusion support in ideep for further NNC development. It includes: - element-wise post-op fusion - conv/matmul/linear + binary post...
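The idea behind element-wise post-op fusion can be sketched in a few lines: instead of running the conv and the element-wise op as two separate passes over memory, the post-op is applied while each output element is still hot. The names and the naive conv below are illustrative, not the ideep/NNC implementation.

```python
def conv1d(x, w):
    """Naive 1-D valid convolution (cross-correlation) over a float list."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) for i in range(len(x) - k + 1)]

def relu(y):
    """Element-wise post-op applied as a separate pass."""
    return [max(v, 0.0) for v in y]

def conv1d_relu_fused(x, w):
    """Same math, but the element-wise post-op is fused into the conv loop."""
    k = len(w)
    return [max(sum(x[i + j] * w[j] for j in range(k)), 0.0)
            for i in range(len(x) - k + 1)]

x = [1.0, -2.0, 3.0, -4.0, 5.0]
w = [1.0, -1.0]
assert conv1d_relu_fused(x, w) == relu(conv1d(x, w))
```

The fused version produces bit-identical results here; the benefit in a real backend is avoiding the extra memory round-trip for the intermediate tensor.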
Hi, we found that the primitive-creation time of conv1d (nwc input) is much higher than that of conv2d (block format), especially for the first creation. Though, it has...
Adds an inference test in benchmark/kernel and profiling in benchmark/inference.
Currently, this PR is a draft that contains many print logs.
This PR fixes amp_bf16 training with staged_train_test on CPU. `forward_contexts` needs to be set correctly with `torch.cpu.amp.autocast(dtype=torch.bfloat16)`; otherwise, in staged_train_test, the model cannot run in bf16 successfully.
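The failure mode can be sketched with stdlib context managers: each stage's forward must run *inside* the autocast context collected in `forward_contexts`; if that list is missing the autocast entry, the forward silently runs in the default dtype. The names below (`autocast`, `staged_forward`) are toy stand-ins, not the TorchBench or PyTorch API.

```python
import contextlib

current_dtype = "float32"

@contextlib.contextmanager
def autocast(dtype):
    """Toy stand-in for torch.cpu.amp.autocast(dtype=...)."""
    global current_dtype
    prev, current_dtype = current_dtype, dtype
    try:
        yield
    finally:
        current_dtype = prev

def forward():
    return current_dtype  # the dtype the model actually runs in

def staged_forward(forward_contexts):
    """Enter every collected context before running the forward pass."""
    with contextlib.ExitStack() as stack:
        for ctx in forward_contexts:
            stack.enter_context(ctx())
        return forward()

assert staged_forward([]) == "float32"  # autocast never entered: stays fp32
assert staged_forward([lambda: autocast("bfloat16")]) == "bfloat16"
```

This mirrors the bug: with `forward_contexts` not populated, the staged run never enters bf16 even though the benchmark was configured for amp_bf16.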
## Motivation `TorchBench` is a collection of open-source benchmarks used to evaluate PyTorch performance. It provides a standardized API for benchmark drivers, both for evaluation (eager/jit) and training. Plenty of...
This PR updates the build to C++17 for PyTorch 2.1.0.
Hi Maintainers @yanboliang @Chillee , I encountered a codegen error when using `--compile_prefill` with int8 WOQ. Although it can still run, it could be confusing to users. Could you please fix...
This PR optimizes int8 WOQ in both gpt-fast and mixtral-moe. At the current stage, we use `torch.ops.aten._weight_int8pack_mm` as a workaround. This workaround will be removed when https://github.com/pytorch/pytorch/pull/120985...
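The math behind int8 weight-only quantization (WOQ) can be sketched as follows: weights are stored as int8 with one float scale per output channel, while activations stay in floating point, so the matmul result is rescaled per channel. This mirrors the idea behind `torch.ops.aten._weight_int8pack_mm`, not its actual packed-kernel implementation.

```python
def quantize_per_channel(w):
    """w: list of output-channel rows -> (int8-range rows, per-row scales)."""
    w_q, scales = [], []
    for row in w:
        s = (max(abs(v) for v in row) / 127.0) or 1.0  # avoid zero scale
        scales.append(s)
        w_q.append([round(v / s) for v in row])        # values in [-127, 127]
    return w_q, scales

def woq_matmul(x, w_q, scales):
    """y[i][o] = (sum_k x[i][k] * w_q[o][k]) * scales[o]."""
    return [[sum(xi[k] * row[k] for k in range(len(xi))) * s
             for row, s in zip(w_q, scales)]
            for xi in x]

w = [[0.5, -1.0], [2.0, 0.25]]
x = [[1.0, 2.0]]
w_q, scales = quantize_per_channel(w)
y = woq_matmul(x, w_q, scales)
ref = [[sum(xi[k] * row[k] for k in range(2)) for row in w] for xi in x]
assert all(abs(a - b) < 0.05 for a, b in zip(y[0], ref[0]))
```

Storing weights as int8 halves (vs fp16) or quarters (vs fp32) the memory traffic per weight, which is where the speedup in memory-bound decoding comes from.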
gpt-fast uses `torch.load` with `mmap=True` to load model checkpoints, which may speed up model load time. However, mmap ends up unused for bf16, because in https://github.com/pytorch-labs/gpt-fast/blob/main/generate.py#L247,...
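Why `mmap=True` helps can be sketched with the stdlib `mmap` module: a mapped file is attached to the address space without copying it up front, and pages are faulted in lazily on first access. This is only an illustration of the mechanism, not `torch.load` internals.

```python
import mmap
import os
import tempfile

# Write a small "checkpoint" file: a 4 KiB header page plus a payload.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"\x00" * 4096 + b"tensor-bytes")

# Map it read-only: no upfront read of the whole file is required.
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    payload = mm[4096:4096 + 12]  # only the touched pages are paged in
    mm.close()

os.remove(path)
assert payload == b"tensor-bytes"
```

The benefit disappears if the loaded tensors are immediately copied or converted (e.g. a post-load dtype cast), since that forces every page to be read anyway.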