Yilong Guo

Results 7 issues of Yilong Guo

## Description

Slice `scaled_dot_product_attention` into smaller chunks so that the SDPA of each chunk does not request any allocation larger than the given limit. This was initially designed to work around...
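The chunk-size arithmetic behind this kind of slicing can be sketched as follows. This is only an illustration, not the PR's actual code: it assumes that the largest allocation is the attention score matrix of shape `(batch, heads, q_chunk, kv_len)` and that slicing happens along the query dimension. The function name `plan_query_chunks` and all of its parameters are hypothetical.

```python
def plan_query_chunks(batch, heads, q_len, kv_len, bytes_per_elem, limit_bytes):
    """Return (start, end) query ranges whose score matrix fits in limit_bytes.

    Assumption: the dominant allocation is the score matrix
    (batch, heads, q_chunk, kv_len); other buffers are ignored.
    """
    if q_len <= 0:
        return []
    # Bytes needed by a single query row of the score matrix,
    # summed over every batch entry and attention head.
    bytes_per_query_row = batch * heads * kv_len * bytes_per_elem
    rows_per_chunk = limit_bytes // bytes_per_query_row
    if rows_per_chunk < 1:
        raise ValueError("limit is too small for even one query row")
    rows_per_chunk = min(rows_per_chunk, q_len)
    # Consecutive, non-overlapping ranges that together cover [0, q_len).
    return [
        (start, min(start + rows_per_chunk, q_len))
        for start in range(0, q_len, rows_per_chunk)
    ]


# Example: fp16 scores, 8 heads, 4096 x 4096 attention, 16 MiB limit.
chunks = plan_query_chunks(1, 8, 4096, 4096, 2, 16 * 1024 * 1024)
```

Each returned range could then be passed to a separate `scaled_dot_product_attention` call on the matching slice of the query tensor, with the outputs concatenated along the query dimension.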

- Adds two new options, `--device` and `--enhance-device`, to specify the devices for HunyuanDiT and DialogGen respectively. Users with multiple GPU devices can offload the two models to different devices.
- Supports...
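As a rough illustration of how two such options might be wired up, here is a minimal `argparse` sketch. Only the flag names `--device` and `--enhance-device` come from the description above. The defaults, the help text, the fallback of the second option to the first, and the helper names `build_parser` and `resolve_devices` are assumptions made for this example.

```python
import argparse


def build_parser():
    # Hypothetical parser; only the two flag names come from the PR text.
    parser = argparse.ArgumentParser(description="device placement sketch")
    parser.add_argument(
        "--device",
        default="cuda:0",  # assumed default, not taken from the PR
        help="device for the HunyuanDiT model",
    )
    parser.add_argument(
        "--enhance-device",
        default=None,  # assumed: fall back to --device when omitted
        help="device for the DialogGen model",
    )
    return parser


def resolve_devices(argv):
    """Return (dit_device, enhance_device) for the given argument list."""
    args = build_parser().parse_args(argv)
    enhance_device = args.enhance_device or args.device
    return args.device, enhance_device


# Example: place the two models on different GPUs.
devices = resolve_devices(["--device", "cuda:0", "--enhance-device", "cuda:1"])
```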

The aspect has been supported since [OpenCL CPU 2024.2](https://github.com/intel/llvm/releases/download/2024-WW25/oclcpuexp-2024.18.6.0.02_rel.tar.gz)

### Describe the bug

```python
import torch
import intel_extension_for_pytorch as ipex

input = torch.randn(1,3,512,512,device='xpu')
torch.nn.functional.interpolate(input, size=(512,512), mode='bicubic', antialias=True)
torch.nn.functional.interpolate(input, size=(512,512), mode='bilinear', antialias=True)
```

```
File "D:\ComfyUI-Arc\python\lib\site-packages\torch\nn\functional.py", line 4027, in interpolate...
```

ARC
Crash
Escalate

### Describe the bug

When submitting consecutive host tasks to an in-order queue without explicit `wait()`, the execution time of each host task explodes as the number of submissions increases....

performance
Stale

This commit partially addresses a performance issue observed when submitting consecutive host tasks to an in-order queue without explicit `wait()`. The execution time of each host task was found to...

The atomic tests were incorrectly casting cl_uint values directly to cl_half types using simple C-style casts, which doesn't properly handle half-precision floating-point conversion. This caused incorrect bit patterns when using...
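The difference between reinterpreting an integer's bits and converting its value to half precision can be checked with Python's `struct` module, whose `e` format is IEEE 754 binary16. This is only a sketch of the general pitfall, not the test suite's code. It assumes `cl_half` is a 16-bit integer storage type, so a C-style cast copies the integer value into the bit pattern rather than converting it. The helper names are hypothetical.

```python
import struct


def half_bits(value: float) -> int:
    """Bit pattern of `value` after conversion to IEEE 754 half precision."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


def bits_to_half(bits: int) -> float:
    """Value obtained when a 16-bit pattern is interpreted as a half."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


raw = 1  # an integer test value

# A plain cast keeps the integer as the bit pattern 0x0001, which,
# read as a half, is the smallest subnormal (2**-24), not 1.0.
reinterpreted = bits_to_half(raw)

# A real conversion yields the bit pattern of 1.0 in half precision.
converted_bits = half_bits(float(raw))

print(f"cast bits 0x{raw:04X} read as half: {reinterpreted!r}")
print(f"converted bits for 1.0: 0x{converted_bits:04X}")
```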

focused review