cuda-quantum C++ segfault when passing callable kernel to another kernel from library

Take the following files

// lib.h 
#include "cudaq.h"

void kernel(cudaq::qvector<>& q);

// lib.cpp 
#include "lib.h"

__qpu__ void kernel(cudaq::qvector<>& q) {
    x(q[0]);
}

and

// user.cpp
#include "lib.h"

__qpu__ void userKernel(const std::function<void(cudaq::qvector<> &)> &init) {
  cudaq::qvector q(2);
  init(q);
}

int main() { userKernel(kernel); }

Compile and link with the following

nvq++ --enable-mlir -fPIC -c lib.cpp -o lib.o
nvq++ --enable-mlir -fPIC lib.o user.cpp 
# Then run
CUDAQ_LOG_LEVEL=info ./a.out

This results in a segmentation fault.

Can anyone else reproduce this? I would be very thankful for anyone's help on this one. This kind of pattern will be a primary feature of future downstream libraries.

Another variation would be this

// lib.h 
#include "cudaq.h"

std::function<void(cudaq::qvector<>&)> get_kernel();

// lib.cpp
#include "lib.h"

__qpu__ void kernel(cudaq::qvector<> &q) { x(q[0]); }

std::function<void(cudaq::qvector<> &)> get_kernel() { return kernel; }

// user.cpp
#include "lib.h"

__qpu__ void userKernel(const std::function<void(cudaq::qvector<> &)> &init) {
  cudaq::qvector q(2);
  init(q);
}

int main() { userKernel(get_kernel()); }

Aug 19 '24 23:08 amccaskey

I am able to reproduce a segmentation fault with this first example.

root@ea401e2-lcedt:/workspaces/cuda-quantum/examples/cpp# CUDAQ_LOG_LEVEL=info ./a.out 
[2024-08-20 00:13:39.899] [info] [PluginUtils.h:24] Requesting N5cudaq16quantum_platformE plugin via symbol name getQuantumPlatform.
[2024-08-20 00:13:39.899] [info] [PluginUtils.h:36] Successfully loaded the plugin.
[2024-08-20 00:13:39.899] [info] [PluginUtils.h:24] Requesting N5nvqir16CircuitSimulatorE plugin via symbol name getCircuitSimulator.
[2024-08-20 00:13:39.899] [info] [PluginUtils.h:36] Successfully loaded the plugin.
[2024-08-20 00:13:39.942] [info] [NVQIR.cpp:82] Creating the custatevec-fp32 backend.
[2024-08-20 00:13:39.942] [info] [CircuitSimulator.h:901] Allocating 2 new qubits.
[2024-08-20 00:13:39.942] [info] [CuStateVecCircuitSimulator.cpp:170] GPU 0 Allocating new qubit array of size 2.
Segmentation fault (core dumped)

Aug 20 '24 00:08 sacpis

This could be a bridge issue in handling the const std::function<void(cudaq::qvector<> &)> &init argument.

Looking at the generated code:

define void @_Z10userKernelRKSt8functionIFvRN5cudaq7qvectorILm2EEEEE({ i8*, i8* } %0) local_unnamed_addr

as compared to the LLVM one:

define linkonce_odr dso_preemptable void @_Z10userKernelRKSt8functionIFvRN5cudaq7qvectorILm2EEEEE(ptr noundef nonnull align 8 dereferenceable(32) %init) #5 personality ptr @__gxx_personality_v0 !dbg !3373

For some reason the argument is lowered as a pair of pointers rather than as a reference to a std::function object? This incorrect assumption about the argument will crash the argsCreator later.

Compiling the app in library mode, with lib.o still compiled in MLIR mode, works; hence the MLIR-mode handling of this argument is likely the problem.

@schweitzpgi Do we support std::function arguments yet?

Aug 20 '24 04:08 1tnguyen

@schweitzpgi I see this in ConvertCCToLLVM.cpp

void cudaq::opt::populateCCTypeConversions(LLVMTypeConverter *converter) {
  converter->addConversion([](cc::CallableType type) {
    return lambdaAsPairOfPointers(type.getContext());
  });
  ...
}

Looks like this is set up for just lambdas?

Aug 21 '24 12:08 amccaskey

This is also interesting

define { i8*, i64 } @function_kernel_to_sample._Z16kernel_to_sampleRKSt8functionIFvRN5cudaq7qvectorILm2EEEEE.thunk(i8* nocapture readnone %0, i1 %1) {
  %3 = tail call %Array* @__quantum__rt__qubit_allocate_array(i64 2)
  unreachable
}
define i64 @function_kernel_to_sample._Z16kernel_to_sampleRKSt8functionIFvRN5cudaq7qvectorILm2EEEEE.argsCreator(i8** nocapture readnone %0, i8** nocapture writeonly %1) #2 {
...

Just a guess, but could this be why we see a segfault in the argsCreator function? The thunk gets called by altLaunchKernel, we hit this unreachable line, and execution falls through into the next function in memory, which is the argsCreator?

Aug 21 '24 12:08 amccaskey

Here's a test repo for all this

https://github.com/amccaskey/test_cudaq_cpp_py_integration

mkdir build && cd build 
cmake .. -G Ninja -DCUDAQ_DIR=/path/to/cudaq/lib/cmake/cudaq -DCMAKE_BUILD_TYPE=Debug 
ninja 
PYTHONPATH=/path/to/cudaq:$PWD gdb --args python3-dbg test.py

Aug 21 '24 12:08 amccaskey

Thanks for the heads-up. I'll add this to my list to look at.

Aug 21 '24 17:08 schweitzpgi

This may be interesting.

% nvq++ --enable-mlir -fkernel-exec-kind=2 -fPIC -g -c lib.cpp -o lib.o
% nvq++ --enable-mlir -fkernel-exec-kind=2 -g -fPIC  lib.o user.cpp
% ./a.out
terminate called after throwing an instance of 'std::runtime_error'
  what():  Wrong kernel launch point: Attempt to launch kernel in streamlined for JIT mode on local simulated QPU. This is not supported.
Aborted
%

Aug 23 '24 21:08 schweitzpgi