James Osborn
Adding a `mixin` works. Otherwise, since there is only one previous definition of `nest`, the call gets bound to a closed symbol. ```Nim type X[T] = object inner: T template...
The error message is correct, but it can be confusing if one isn't aware of the rule. Adding a suggestion to try `mixin` if the symbol is closed could be...
The goal was to make sure to print out the last error string if we end up [here](https://github.com/lattice/quda/blob/4ee9b87781efc9b9f52d166738c5d085dfb94d9c/lib/tune.cpp#L934). If `verbosity >= QUDA_DEBUG_VERBOSE` then we would have already printed it out...
It is essentially fully functional, though there may be some issues depending on which version of oneAPI and which hardware you run with. It requires Intel SYCL since it uses some...
I've only tested it with dpcpp/icpx.
Thanks for reporting that. This is fixed now. I have successfully tested it on Intel, but had issues on NVIDIA.
I get a bunch of errors like: ptxas error : Entry function '_ZTSZZN4quda6launchINS_9Kernel3DSINS_14dslash_functorENS_18dslash_functor_argINS_19domainWall4DFusedM5ENS_9packShmemELi2ELb0ELb1ELNS_10KernelTypeE5ENS_22DomainWall4DFusedM5ArgIsLi3ELi4EL21QudaReconstructType_s8ELNS_11Dslash5TypeE8EEEEELb0EEESB_EENSt9enable_ifIXntclsr6deviceE14use_kernel_argIT0_EEE11qudaError_tE4typeERKNS_12qudaStream_tERN4sycl3_V18nd_rangeILi3EEERKSE_ENKUlRNSM_7handlerEE_clEST_EUlNSM_7nd_itemILi3EEEE__with_offset' uses too much shared data (0x18000 bytes, 0xc000 max)
I wasn't setting the compute capability before; I'm trying again with `sm_80`. I'm not sure what else I can change yet.
Yes, it seems it will only use static shared memory (see https://github.com/intel/llvm/pull/3329). I'll see what I can get to compile now, and look into setting a limit for it.
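For reference, the sizes in the ptxas message above can be decoded with a little arithmetic. This is only a sketch: the two hex values are copied from the error text, while the helper names `to_kib` and `fits_static_limit` are made up for illustration and are not part of QUDA or SYCL. The 0xc000 limit matches the 48 KiB that NVIDIA allows per block for statically allocated shared memory.

```cpp
#include <cstddef>

// Values copied from the ptxas error message.
constexpr std::size_t requested_bytes = 0x18000;  // shared data the kernel asks for
constexpr std::size_t max_static_bytes = 0xc000;  // limit reported by ptxas

// Hypothetical helper: convert a byte count to whole KiB.
constexpr std::size_t to_kib(std::size_t bytes) { return bytes / 1024; }

// Hypothetical helper: true if a static allocation stays within the limit.
constexpr bool fits_static_limit(std::size_t bytes, std::size_t limit) {
  return bytes <= limit;
}

// The kernel requests 96 KiB, twice the 48 KiB static limit.
static_assert(to_kib(requested_bytes) == 96, "request is 96 KiB");
static_assert(to_kib(max_static_bytes) == 48, "limit is 48 KiB");
static_assert(!fits_static_limit(requested_bytes, max_static_bytes),
              "request exceeds the static limit");
```

So any limit set on the SYCL side would presumably need to keep per-kernel local memory at or below 48 KiB on this backend, as long as only static shared memory is used.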
Yes, it generally requires the latest version of oneAPI (or intel-llvm). I'm currently testing with 2023.0.0. The issues you are seeing are due to differences in the older version of...