
Minor improvements in Orochi 2.00 for Zen4 Raphael APU support and other suggestions

Open oscarbg opened this issue 1 year ago • 0 comments

Hi, first of all, congrats on getting the Orochi 2.0 release out! Some comments:

  1. I tested the new Orochi 2.0 release on a Zen4 Raphael iGPU and got some errors:

1a) The first RadixSort test fails on Zen4 because number_of_blocks is rounded down to 0, as the APU has only 2 compute units. Fix (ParallelPrimitives/RadixSort.cpp):

258c258,259
< 		number_of_blocks = ( number_of_blocks / base ) * base;
---
> 		number_of_blocks = ( number_of_blocks / base ) * base;
> 		if( number_of_blocks == 0 ) number_of_blocks = 4;
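The reason the original line can produce 0 is plain integer arithmetic: rounding down to a multiple of base truncates to 0 whenever the value is smaller than base. Below is a minimal standalone sketch of that behaviour and of the clamp from the patch. The function name roundDownBlocks and the sample numbers are hypothetical and not taken from RadixSort.cpp; only the fallback value 4 comes from the patch above.

```cpp
// Hypothetical helper (not part of Orochi): round a block count down to a
// multiple of `base`, but never return 0.
int roundDownBlocks( int numberOfBlocks, int base, int fallback = 4 )
{
	// Integer division truncates, so any numberOfBlocks < base yields 0 here.
	const int rounded = ( numberOfBlocks / base ) * base;

	// Same guard as in the patch: fall back to a small non-zero count.
	return rounded == 0 ? fallback : rounded;
}
```

For example, with made-up inputs, roundDownBlocks( 2, 8 ) hits the fallback and returns 4, while roundDownBlocks( 20, 8 ) returns 16 as before.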

1b) The pciBusID check in VulkanComputeSimple is incorrect for APUs, as Vulkan reports 0 and ROCm 0x6b, or vice versa. I fixed it in main.cpp with:

< 			if (physicalDevicePCIBusInfoProperties.pciDomain == props.pciDomainID &&
< 				physicalDevicePCIBusInfoProperties.pciBus == props.pciBusID &&
< 				physicalDevicePCIBusInfoProperties.pciDevice == props.pciDeviceID) {
< 				physicalDeviceIndex = i;
< 				break;
---
> 			if( physicalDevicePCIBusInfoProperties.pciDomain == props.pciDomainID &&
> 				// physicalDevicePCIBusInfoProperties.pciBus == props.pciBusID &&
> 				physicalDevicePCIBusInfoProperties.pciDevice == props.pciDeviceID )
> 			{
> 				if( physicalDevicePCIBusInfoProperties.pciBus > 16 ) // integrated!!
> 				{
> 					if( props.pciBusID == 0 )
> 					{
> 						physicalDeviceIndex = i;
> 						break;
> 					}
> 				}
> 				else
> 				{
> 					if( physicalDevicePCIBusInfoProperties.pciBus == props.pciBusID )
> 					{
> 						physicalDeviceIndex = i;
> 						break;
> 					}
> 				}
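The matching rule in the patch can also be read as a small predicate, which makes the heuristic easier to test in isolation. This is only a sketch: PciLocation and isSameDevice are hypothetical names standing in for the Vulkan PCI bus info properties and the device props used in main.cpp, and the "bus > 16 means integrated" threshold is copied from the patch, not from any specification.

```cpp
#include <cstdint>

// Hypothetical stand-in for the PCI fields compared in the patch.
struct PciLocation
{
	std::uint32_t domain;
	std::uint32_t bus;
	std::uint32_t device;
};

// Returns true when the Vulkan-side and HIP-side locations are considered
// the same device under the heuristic from the patch.
bool isSameDevice( const PciLocation& vk, const PciLocation& hip )
{
	// Domain and device must always agree.
	if( vk.domain != hip.domain || vk.device != hip.device ) return false;

	// Heuristic from the patch: a Vulkan bus number above 16 is treated as
	// an integrated GPU, for which the other API is expected to report bus 0.
	if( vk.bus > 16 ) return hip.bus == 0;

	// Discrete GPUs: bus numbers must match exactly.
	return vk.bus == hip.bus;
}
```

Because it follows the patch exactly, this sketch would reject an integrated device for which both APIs report the same non-zero bus number, so a more robust check may still be needed.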
  2. Can you make the samples run correctly by default after a premake5 build? I mean running the executables directly from the Orochi-2.00/dist/bin/Debug or Orochi-2.00/dist/bin/Release folders. They seem to expect kernels and sample data such as textures in the "../" folder, so currently you have two options:

2a) copy or move the Debug or Release folder into the Orochi-2.00/ folder (out of dist/bin) so the samples find the required data, or

2b) recreate or copy the needed files from the Orochi-2.00 folder into Orochi-2.00/dist/bin, for example copying:

Orochi-2.00/Test/Texture/texture_test_kernel.hpp to: Orochi-2.00/dist/bin/Test/Texture/texture_test_kernel.hpp

  3. Add Linux support to VulkanComputeSimple and RadixSort (Orochi-2.00/Test/VulkanComputeSimple and Orochi-2.00/Test/RadixSort). In fact the support is already there; it only needs minor fixes, which I added and which are very simple. For the Vulkan premake5.lua:

--    buildoptions { "/wd4244" }
      buildoptions { "--std=c++14" }
--      links{ "Pop" }
--      links{ "kernel32", "user32", "gdi32", "winspool", "comdlg32", "advapi32", "shell32", "ole32", "oleaut32", "uuid", "odbc32", "odbccp32", "version" }

  4. Finally, it would be nice if the WMMA sample got NVIDIA support; it seems to be AMD-only at the moment. I started by adding "#ifdef __CUDA_ARCH__" code to the kernel:
#ifdef __CUDA_ARCH__
#include <mma.h>
//https://developer.nvidia.com/blog/programming-tensor-cores-cuda-9/
using namespace nvcuda;
...
	wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::col_major> a_frag2;
	wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag2;
	wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc_frag2;
	wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag2;
..
#ifdef __CUDA_ARCH__
	nvcuda::wmma::mma_sync( c_frag2, a_frag2, b_frag2, c_frag2 );
#else
	c_frag = __builtin_amdgcn_wmma_f16_16x16x16_f16_w32( a_frag, b_frag, c_frag, false );
#endif

#endif

oscarbg, Apr 05 '24 04:04