Yilong Guo issues

Results: 7 issues by Yilong Guo

[IPEX] Slice SDPA into smaller chunks

## Description

Slice `scaled_dot_product_attention` into smaller chunks so that the SDPA of each chunk does not request any allocation larger than the given limit. This was initially designed to work around...
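The slicing idea can be sketched without any framework code by planning the chunk boundaries first. The sketch below is a minimal illustration, not the PR's implementation: it assumes the dominant allocation is the attention score buffer of shape (batch × heads, query length, key length), and the names `plan_slices` and `plan_sdpa_slices` are hypothetical. Slicing along the batch-head axis or the query axis leaves the result unchanged, because softmax is applied independently to each query row.

```python
from typing import List, Tuple

Slice = Tuple[int, int]  # half-open range [start, end)


def plan_slices(total: int, max_per_chunk: int) -> List[Slice]:
    """Split range(total) into consecutive chunks of at most max_per_chunk items."""
    step = max(1, max_per_chunk)  # never go below one item per chunk
    return [(start, min(start + step, total)) for start in range(0, total, step)]


def plan_sdpa_slices(
    batch_heads: int,
    q_len: int,
    kv_len: int,
    bytes_per_elem: int,
    limit_bytes: int,
) -> Tuple[List[Slice], List[Slice]]:
    """Return (batch-head slices, query slices) for a chunked SDPA.

    Assumption: the largest allocation of a chunk is its score buffer,
    rows * q_rows * kv_len * bytes_per_elem. If even a single query row
    exceeds the limit, the plan degrades to one row per chunk.
    """
    row_bytes = kv_len * bytes_per_elem   # scores for one query row
    per_bh_bytes = q_len * row_bytes      # scores for one batch-head entry
    if per_bh_bytes <= limit_bytes:
        # Whole batch-head entries fit: slice only along the batch-head axis.
        return plan_slices(batch_heads, limit_bytes // per_bh_bytes), [(0, q_len)]
    # One entry is already too large: take entries one at a time
    # and additionally slice along the query axis.
    return plan_slices(batch_heads, 1), plan_slices(q_len, limit_bytes // row_bytes)
```

Each pair of slices from the plan would then index the query (and, for batch-head slices, the key and value) tensors, with the per-chunk outputs written back into the matching region of the result.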

Multiple GPUs support and XPU device support

- Adds two new options, `--device` and `--enhance-device`, to specify the devices for HunyuanDiT and DialogGen respectively. Users with multiple GPUs can offload the two models to different devices.
- Supports...
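A minimal sketch of how two such options might be parsed, using only `argparse`. Only the flag names `--device` and `--enhance-device` come from the description above; the default values, the fallback of the second option to the first, and the helper names `build_parser` and `resolve_devices` are assumptions made for illustration.

```python
import argparse
from typing import List, Tuple


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Per-model device selection (illustrative)")
    # The default "cuda" is an assumption, not taken from the PR.
    parser.add_argument("--device", default="cuda",
                        help="device for the main model, e.g. cuda:0 or xpu")
    # Assumption: when omitted, the enhancement model shares the main device.
    parser.add_argument("--enhance-device", default=None,
                        help="device for the prompt-enhancement model")
    return parser


def resolve_devices(argv: List[str]) -> Tuple[str, str]:
    """Return (main device, enhancement device) for the given command line."""
    args = build_parser().parse_args(argv)
    enhance_device = args.enhance_device or args.device
    return args.device, enhance_device
```

For example, `resolve_devices(["--device", "cuda:0", "--enhance-device", "cuda:1"])` places the two models on different GPUs, while an empty argument list keeps both on the same default device.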

[SYCL][CPU] Add ext_oneapi_ballot_group aspect to spir64_x86_64 target

The aspect has been supported since [OpenCL CPU 2024.2](https://github.com/intel/llvm/releases/download/2024-WW25/oclcpuexp-2024.18.6.0.02_rel.tar.gz)

NotImplementedError: Could not run 'aten::_upsample_bicubic2d_aa.out' with arguments from the 'XPU' backend.

### Describe the bug

```python
import torch
import intel_extension_for_pytorch as ipex

input = torch.randn(1,3,512,512,device='xpu')
torch.nn.functional.interpolate(input, size=(512,512), mode='bicubic', antialias=True)
torch.nn.functional.interpolate(input, size=(512,512), mode='bilinear', antialias=True)
```

```
File "D:\ComfyUI-Arc\python\lib\site-packages\torch\nn\functional.py", line 4027, in interpolate...
```

Labels: ARC, Crash, Escalate

[SYCL][Host Task] Bad performance of consecutively submitted host tasks onto an in-order queue

### Describe the bug

When consecutive host tasks are submitted to an in-order queue without an explicit `wait()`, the execution time of each host task grows sharply as the number of submissions increases....

Labels: performance, Stale

[WIP][SYCL][HostTask] Optimize blocked users tracking

This commit partially addresses a performance issue observed when submitting consecutive host tasks to an in-order queue without explicit `wait()`. The execution time of each host task was found to...

c11_atomics: Fix cl_uint --> cl_half conversion on host

The atomic tests were incorrectly casting cl_uint values directly to cl_half types using simple C-style casts, which doesn't properly handle half-precision floating-point conversion. This caused incorrect bit patterns when using...
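The difference between a plain cast and a real half-precision conversion can be shown with a short sketch. It assumes, as in the OpenCL headers, that `cl_half` is a 16-bit integer storage type, so a C-style cast from `cl_uint` only keeps the low 16 bits of the integer. The helper names below are hypothetical, and Python's `struct` format `e` (IEEE 754 binary16) stands in for a proper conversion routine; this is not the code used in the fix.

```python
import struct


def naive_cast_to_half_bits(value: int) -> int:
    """Mimic a C-style cast of a 32-bit unsigned value to a 16-bit storage type."""
    return value & 0xFFFF


def half_bits_from_uint(value: int) -> int:
    """Encode the numeric value as IEEE 754 binary16 and return its bit pattern.

    struct raises OverflowError for values too large for half precision.
    """
    return struct.unpack("<H", struct.pack("<e", float(value)))[0]


def half_bits_to_float(bits: int) -> float:
    """Decode a binary16 bit pattern back into a Python float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


value = 3
print(hex(naive_cast_to_half_bits(value)))                   # bits kept by the plain cast
print(hex(half_bits_from_uint(value)))                       # bits of the half value 3.0
print(half_bits_to_float(naive_cast_to_half_bits(value)))    # what the cast result decodes to
```

Read back as a half, the pattern produced by the plain cast is a tiny subnormal number rather than 3.0, which is the kind of mismatch a value-preserving conversion avoids.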

Labels: focused review