
cuGetProcAddress not implemented

Open chaunceyjiang opened this issue 1 year ago • 5 comments

cuGetProcAddress not implement

I encountered the error above. I also tried to implement the function myself, but without success.

If you have some spare time, could you help implement this function? Thank you very much.

server

```cpp
// cuda.cc

void Cuda::DispatchCuGetProcAddress(CudaRequest* request, RenderResponse* response) {
  // cuGetProcAddress(symbol, pfn, cudaVersion, flags, symbolStatus)
  CL_ASSERT(request->param_count == 5);
  auto symbol = (const char*)CUDA_REQUEST_EXTEND(request);
  auto cuda_version = (int)request->params[2];
  auto flags = (cuuint64_t)request->params[3];
  CL_LOG("symbol: %s", symbol);
  CL_LOG("cudaVersion: %d", cuda_version);
  CL_LOG("flags: %llu", (unsigned long long)flags);

  // Reply layout: [void* fn][CUdriverProcAddressQueryResult status]
  response->header.size = sizeof(void*) + sizeof(CUdriverProcAddressQueryResult);
  response->data.resize(response->header.size);

  void** fn = (void**)response->data.data();
  auto ret = (CUdriverProcAddressQueryResult*)((char*)response->data.data() + sizeof(void*));

  // Write the resolved pointer into the reply buffer (the undefined `bin` was a bug).
  response->header.result = cuGetProcAddress(symbol, fn, cuda_version, flags, ret);
  CL_LOG("-----");
  CL_LOG("fn: %p", *fn);  // NOTE: this is an address in the server process
  CL_LOG("status: %d", (int)*ret);
}
```
```cpp
void Cuda::Dispatch(WorkerItem* item) {
  auto request = (CudaRequest*)item->request.data.data();
  CL_LOG("call api=%s param_count=%d", GetCudaFunctionName(request->api_index), request->param_count);

  CL_ASSERT(request->version == version_);
  Render::Dispatch(item);

  switch (request->api_index) {
    case CUGETPROCADDRESS:
      DispatchCuGetProcAddress(request, &item->response);
      break;
    // ...
// ...
```

client

```cpp
// render.cpp

CUresult Render::PrepareRequest(RenderRequest* request) {
  auto cuda = (CudaRequest*)request->datas[0].data();

  switch (cuda->api_index) {
    case CUGETPROCADDRESS: {
      auto symbol = (char*)cuda->params[0];
      if (symbol) {
        std::string_view name_sv(symbol, strlen(symbol) + 1);
        request->header.size += (uint32_t)name_sv.size();
        request->datas.emplace_back(std::move(name_sv));
      }
      break;
    }
```

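```cpp
CUresult Render::HandleResponse(RenderRequest* request, RenderResponse* response) {
  auto cuda = (CudaRequest*)request->datas[0].data();
  auto result = (CUresult)response->header.result;

  if (result != CUDA_SUCCESS) {
    CL_ERROR("handle error for api=%s result=%d", GetCudaFunctionName(cuda->api_index), result);
    goto end;
  }

  switch (cuda->api_index) {
    case CUGETPROCADDRESS: {  // braces scope the locals below to this case
      // Reply layout: [void* fn][CUdriverProcAddressQueryResult status].
      // The status is an enum (typically 4 bytes), so copy it from its own
      // offset instead of reading a second uint64_t past the end of the buffer.
      const char* data = (const char*)response->data.data();
      void* fn = nullptr;
      CUdriverProcAddressQueryResult status;
      memcpy(&fn, data, sizeof(fn));
      memcpy(&status, data + sizeof(fn), sizeof(status));
      *(void**)cuda->params[1] = fn;  // NOTE: fn is an address in the server process
      if (cuda->params[4]) {
        *(CUdriverProcAddressQueryResult*)cuda->params[4] = status;
      }
      break;
    }
```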
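The reply in this attempt is a `void*` followed by a `CUdriverProcAddressQueryResult`, which is typically 4 bytes, so on a 64-bit build the buffer is 12 bytes and reading the status slot as a full `uint64_t` runs past its end. Below is a minimal sketch of packing and unpacking the two fields at explicit byte offsets with `std::memcpy`, assuming exactly this layout. `ProcAddressReply`, `PackReply` and `UnpackReply` are hypothetical names, and the enum is a stand-in for the one in `<cuda.h>`. Even a correctly decoded `fn` is still a server-side address, which is the problem raised in the next comment.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Stand-in for the enum declared in <cuda.h>; include <cuda.h> in real code.
enum CUdriverProcAddressQueryResult {
  CU_GET_PROC_ADDRESS_SUCCESS = 0,
  CU_GET_PROC_ADDRESS_SYMBOL_NOT_FOUND = 1,
  CU_GET_PROC_ADDRESS_VERSION_NOT_SUFFICIENT = 2,
};

// Hypothetical decoded form of the reply: [void* fn][status].
struct ProcAddressReply {
  void* fn;
  CUdriverProcAddressQueryResult status;
};

constexpr std::size_t kFnOffset = 0;
constexpr std::size_t kStatusOffset = sizeof(void*);
constexpr std::size_t kReplySize =
    sizeof(void*) + sizeof(CUdriverProcAddressQueryResult);

// Serializes each field at its own offset; no padding is introduced.
std::vector<std::uint8_t> PackReply(const ProcAddressReply& reply) {
  std::vector<std::uint8_t> buf(kReplySize);
  std::memcpy(buf.data() + kFnOffset, &reply.fn, sizeof(reply.fn));
  std::memcpy(buf.data() + kStatusOffset, &reply.status, sizeof(reply.status));
  return buf;
}

// Returns false instead of reading out of bounds when the buffer is too short.
bool UnpackReply(const std::vector<std::uint8_t>& buf, ProcAddressReply* out) {
  if (out == nullptr || buf.size() < kReplySize) return false;
  std::memcpy(&out->fn, buf.data() + kFnOffset, sizeof(out->fn));
  std::memcpy(&out->status, buf.data() + kStatusOffset, sizeof(out->status));
  return true;
}
```

Copying with `std::memcpy` also avoids casting the byte buffer to `uint64_t*`, which depends on alignment and aliasing rules.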

chaunceyjiang, Mar 05 '24 03:03

My question is this: the server returns a function pointer to the client (written through the `void** pfn` out-parameter), but that pointer is an address in the server process. It belongs to the server and cannot be called by the client.

chaunceyjiang avatar Mar 05 '24 07:03 chaunceyjiang

https://developer.nvidia.com/blog/exploring-the-new-features-of-cuda-11-3/

CUDA 11.3 also introduces a new driver and runtime API to query memory addresses for driver API functions. Previously, there was no direct way to obtain function pointers to the CUDA driver symbols. To do so, you had to call into dlopen, dlsym, or GetProcAddress. This feature implements a new driver API, cuGetProcAddress, and the corresponding new runtime API cudaGetDriverEntryPoint.

chaunceyjiang avatar Mar 05 '24 10:03 chaunceyjiang

Sorry for the late reply. If you want to implement cuGetProcAddress, do it on the local (client) side; there is no need to pass it to the server. For example, when your hook for cuGetProcAddress is called, inspect its parameters and return the address of the corresponding function that you have already implemented in clink.
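A minimal sketch of that suggestion, assuming the hook library keeps its own table from driver symbol names to local hook functions. All `hook_*` names and `LocalHookTable` are hypothetical, and the CUDA types are stand-ins declared inline so the sketch compiles without `<cuda.h>`; a real hook would include `<cuda.h>` and export the function under the driver's own name (CUDA 12 headers map `cuGetProcAddress` to `cuGetProcAddress_v2`). Because nothing is sent to the server, `*pfn` always receives an address that is valid in the client process.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Stand-ins for definitions in <cuda.h>, declared only so this compiles alone.
using CUresult = int;
using cuuint64_t = std::uint64_t;
constexpr CUresult CUDA_SUCCESS = 0;
constexpr CUresult CUDA_ERROR_INVALID_VALUE = 1;
constexpr CUresult CUDA_ERROR_NOT_FOUND = 500;
enum CUdriverProcAddressQueryResult {
  CU_GET_PROC_ADDRESS_SUCCESS = 0,
  CU_GET_PROC_ADDRESS_SYMBOL_NOT_FOUND = 1,
  CU_GET_PROC_ADDRESS_VERSION_NOT_SUFFICIENT = 2,
};

// Hypothetical client-side hooks. In a remoting library these would serialize
// the call and forward it to the server; here they are placeholders.
CUresult hook_cuInit(unsigned int flags) {
  (void)flags;
  return CUDA_SUCCESS;
}

CUresult hook_cuDeviceGetCount(int* count) {
  if (count == nullptr) return CUDA_ERROR_INVALID_VALUE;
  *count = 0;  // placeholder: a real hook would ask the server
  return CUDA_SUCCESS;
}

// Hypothetical table from driver symbol names to the local hook functions.
const std::unordered_map<std::string, void*>& LocalHookTable() {
  static const std::unordered_map<std::string, void*> table = {
      {"cuInit", reinterpret_cast<void*>(&hook_cuInit)},
      {"cuDeviceGetCount", reinterpret_cast<void*>(&hook_cuDeviceGetCount)},
  };
  return table;
}

// Resolves `symbol` entirely on the client side.
CUresult hook_cuGetProcAddress(const char* symbol, void** pfn, int cudaVersion,
                               cuuint64_t flags,
                               CUdriverProcAddressQueryResult* symbolStatus) {
  (void)cudaVersion;  // ignored in this sketch
  (void)flags;        // ignored in this sketch
  if (symbol == nullptr || pfn == nullptr) return CUDA_ERROR_INVALID_VALUE;

  const auto& table = LocalHookTable();
  auto it = table.find(symbol);
  if (it == table.end()) {
    *pfn = nullptr;
    if (symbolStatus) *symbolStatus = CU_GET_PROC_ADDRESS_SYMBOL_NOT_FOUND;
    return CUDA_ERROR_NOT_FOUND;
  }
  *pfn = it->second;  // a function in *this* process, safe to call
  if (symbolStatus) *symbolStatus = CU_GET_PROC_ADDRESS_SUCCESS;
  return CUDA_SUCCESS;
}
```

A fuller version would also inspect `cudaVersion` and `flags`, since the driver can return different variants of the same symbol (for example, per-thread default-stream versions) depending on them.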

nooodles2023 avatar May 29 '24 10:05 nooodles2023

I have completed a new project related to remote CUDA. It is much faster than Clink. I patched the kernel function parameters for cuLaunchKernel and implemented a new protocol to transfer CUDA requests and responses. It is set to launch soon. Please stay tuned for updates and announcements.

nooodles2023 avatar May 29 '24 10:05 nooodles2023

> I have completed a new project related to remote CUDA. It is much faster than Clink. I patched the kernel function parameters for cuLaunchKernel and implemented a new protocol to transfer CUDA requests and responses.

Amazing!! I will keep an eye on it. Also, are you going to make it open source? If possible, I would like to contribute as well.

chaunceyjiang avatar May 29 '24 10:05 chaunceyjiang