Out of scope kernel
Required prerequisites
- [X] Make sure you've read the documentation. Your issue may be addressed there.
- [X] Search the issue tracker to verify that this hasn't already been reported. +1 or comment there if it has.
- [X] If possible, make a PR with a failing test to give us a starting point to work on!
Describe the bug
```python
import cudaq

def ghz(n):
    kernel = cudaq.make_kernel()
    qubits = kernel.qalloc(n)
    kernel.h(qubits[0])
    for i in range(n):
        if i + 1 == n:
            break
        else:
            kernel.cx(qubits[i], qubits[i + 1])
    kernel.mz(qubits)
    return kernel

cudaq.set_target("nvidia")
cudaq.sample(ghz(4)).dump()
```
The code above executes fine; however, changing the target yields an error:

```python
cudaq.set_target("nvidia-mqpu")
cudaq.sample_async(ghz(4), qpu_id=0).get().dump()
```

```
Segmentation fault (core dumped)
```
The fix is the following:

```python
def ghz(kernel, n):
    qubits = kernel.qalloc(n)
    kernel.h(qubits[0])
    for i in range(n):
        if i + 1 == n:
            break
        else:
            kernel.cx(qubits[i], qubits[i + 1])
    kernel.mz(qubits)
    return kernel

kernel = cudaq.make_kernel()
cudaq.sample_async(ghz(kernel, 10), qpu_id=0).get().dump()
```
The kernel must be created outside of the function and passed in as an argument. This may be due to the `kernel` object going out of scope.
Not sure if there is something we need to fix here, but it is worth noting in case we can think of a workaround.
Steps to reproduce the bug
NA
Expected behavior
NA
Is this a regression? If it is, put the last known working version (or commit) here.
Not a regression
Environment
- CUDA Quantum version:
- Python version:
- C++ compiler:
- Operating system:
Suggestions
No response
This needs confirmation on whether the issue still persists after the update to the new Python support.
Using the new Python syntax, I rewrote the kernel as follows and it works (verified with version 0.7.0):

```python
import cudaq

@cudaq.kernel
def ghz(n: int):
    qubits = cudaq.qvector(n)
    h(qubits[0])
    for i in range(n):
        if i + 1 == n:
            break
        else:
            cx(qubits[i], qubits[i + 1])
    mz(qubits)

cudaq.set_target("nvidia")
cudaq.sample(ghz, 4).dump()

cudaq.set_target("nvidia-mqpu")
cudaq.sample_async(ghz, 4, qpu_id=0).get().dump()
```

Sample output:

```
{ 0000:524 1111:476 }
{ 0000:505 1111:495 }
```
@khalatepradnya would you mind trying the original breaking code with the new kernel builder implementation too? The main issue before was that the kernel_builder was a C++ data type passed to Python as a unique_ptr, and when the instance went out of scope it was deleted, thus causing issues for parent scope python code.
Yes, I tried that code as-is with version 0.7.0. It fails with a segmentation fault.
For instance, the following snippet:
```python
import cudaq

def ghz(n):
    kernel = cudaq.make_kernel()
    qubits = kernel.qalloc(n)
    kernel.h(qubits[0])
    for i in range(n):
        if i + 1 == n:
            break
        else:
            kernel.cx(qubits[i], qubits[i + 1])
    kernel.mz(qubits)
    return kernel

cudaq.set_target("nvidia")
cudaq.sample(ghz(4)).dump()

cudaq.set_target("qpp-cpu")
cudaq.sample(ghz(4)).dump()

cudaq.set_target("nvidia-mqpu")
cudaq.sample_async(ghz(4), qpu_id=0).get().dump()
```
gives:

```
{ 0000:498 1111:502 }
{ 0000:510 1111:490 }
Segmentation fault
```
This is definitely a scoping issue. The following works fine:

```python
cudaq.set_target("nvidia-mqpu")
k = ghz(4)
cudaq.sample_async(k, qpu_id=0).get().dump()
```

since the kernel is stored in `k` and hence has a live user during `sample_async` execution.
```python
cudaq.sample_async(ghz(4), qpu_id=0).get().dump()
```

works if I manually increase the reference count on the input `py::object`:
```diff
diff --git a/python/runtime/cudaq/algorithms/py_sample_async.cpp b/python/runtime/cudaq/algorithms/py_sample_async.cpp
index bcee2020f..fc77ef209 100644
--- a/python/runtime/cudaq/algorithms/py_sample_async.cpp
+++ b/python/runtime/cudaq/algorithms/py_sample_async.cpp
@@ -48,8 +48,9 @@ for more information on this programming pattern.)#")
   mod.def(
       "sample_async",
-      [&](py::object &kernel, py::args args, std::size_t shots,
+      [&](py::object kernel, py::args args, std::size_t shots,
           std::size_t qpu_id) {
+        kernel.inc_ref();
         auto &platform = cudaq::get_platform();
         if (py::hasattr(kernel, "compile"))
           kernel.attr("compile")();
@@ -78,8 +79,8 @@ for more information on this programming pattern.)#")
         // Should only have C++ going on here, safe to release the GIL
         py::gil_scoped_release release;
         return cudaq::details::runSamplingAsync(
-            [argData, kernelName, kernelMod, shots,
-             hasQubitMeasurementFeedback]() mutable {
+            [argData, kernelName, kernelMod, shots, hasQubitMeasurementFeedback,
+             &kernel]() mutable {
               static std::size_t localShots = 0;
               pyAltLaunchKernel(kernelName, kernelMod, *argData, {});
               // delete the raw arg data pointer.
@@ -87,6 +88,8 @@ for more information on this programming pattern.)#")
               if (localShots == shots - 1)
                 delete argData;
               localShots++;
+
+              kernel.dec_ref();
             }
           },
           platform, kernelName, shots, qpu_id);
```
Wouldn't this apply to `observe_async` as well?
I would think so.