Out of scope kernel
Required prerequisites
- [X] Make sure you've read the documentation. Your issue may be addressed there.
- [X] Search the issue tracker to verify that this hasn't already been reported. +1 or comment there if it has.
- [X] If possible, make a PR with a failing test to give us a starting point to work on!
Describe the bug
```python
import cudaq

def ghz(n):
    kernel = cudaq.make_kernel()
    qubits = kernel.qalloc(n)
    kernel.h(qubits[0])
    for i in range(n):
        if i + 1 == n:
            break
        else:
            kernel.cx(qubits[i], qubits[i + 1])
    kernel.mz(qubits)
    return kernel

cudaq.set_target("nvidia")
cudaq.sample(ghz(4)).dump()
```
The code above executes fine; however, changing the target yields an error:

```python
cudaq.set_target("nvidia-mqpu")
cudaq.sample_async(ghz(4), qpu_id=0).get().dump()
```

```
Segmentation fault (core dumped)
```
The fix is the following:

```python
def ghz(kernel, n):
    qubits = kernel.qalloc(n)
    kernel.h(qubits[0])
    for i in range(n):
        if i + 1 == n:
            break
        else:
            kernel.cx(qubits[i], qubits[i + 1])
    kernel.mz(qubits)
    return kernel

kernel = cudaq.make_kernel()
cudaq.sample_async(ghz(kernel, 10), qpu_id=0).get().dump()
```
The kernel must be created outside of the function and passed in as an argument. This may be due to the `kernel` object going out of scope.
Not sure if there is something we need to fix here, but it is worth noting in case we can think of a workaround.
Steps to reproduce the bug
NA
Expected behavior
NA
Is this a regression? If it is, put the last known working version (or commit) here.
Not a regression
Environment
- CUDA Quantum version:
- Python version:
- C++ compiler:
- Operating system:
Suggestions
No response
This needs confirmation on whether the issue still persists after the update to the new Python support.
Using the new Python syntax, I rewrote the kernel as follows and it works (verified with version 0.7.0):

```python
import cudaq

@cudaq.kernel
def ghz(n: int):
    qubits = cudaq.qvector(n)
    h(qubits[0])
    for i in range(n):
        if i + 1 == n:
            break
        else:
            cx(qubits[i], qubits[i + 1])
    mz(qubits)

cudaq.set_target("nvidia")
cudaq.sample(ghz, 4).dump()

cudaq.set_target("nvidia-mqpu")
cudaq.sample_async(ghz, 4, qpu_id=0).get().dump()
```

Sample output:

```
{ 0000:524 1111:476 }
{ 0000:505 1111:495 }
```
@khalatepradnya would you mind trying the original breaking code with the new kernel builder implementation too? The main issue before was that the kernel_builder was a C++ data type passed to Python as a unique_ptr, and when the instance went out of scope it was deleted, thus causing issues for parent scope python code.
Yes, I tried that code as-is with version 0.7.0. It fails with a segmentation fault.
For instance, the following snippet:
```python
import cudaq

def ghz(n):
    kernel = cudaq.make_kernel()
    qubits = kernel.qalloc(n)
    kernel.h(qubits[0])
    for i in range(n):
        if i + 1 == n:
            break
        else:
            kernel.cx(qubits[i], qubits[i + 1])
    kernel.mz(qubits)
    return kernel

cudaq.set_target("nvidia")
cudaq.sample(ghz(4)).dump()

cudaq.set_target("qpp-cpu")
cudaq.sample(ghz(4)).dump()

cudaq.set_target("nvidia-mqpu")
cudaq.sample_async(ghz(4), qpu_id=0).get().dump()
```
gives:

```
{ 0000:498 1111:502 }
{ 0000:510 1111:490 }
Segmentation fault
```
This is definitely a scoping issue. The following works fine:

```python
cudaq.set_target("nvidia-mqpu")
k = ghz(4)
cudaq.sample_async(k, qpu_id=0).get().dump()
```

since the kernel is stored in `k` and hence has a live user during `sample_async` execution.
```python
cudaq.sample_async(ghz(4), qpu_id=0).get().dump()
```

works if I manually increase the reference count on the input `py::object`:
```diff
diff --git a/python/runtime/cudaq/algorithms/py_sample_async.cpp b/python/runtime/cudaq/algorithms/py_sample_async.cpp
index bcee2020f..fc77ef209 100644
--- a/python/runtime/cudaq/algorithms/py_sample_async.cpp
+++ b/python/runtime/cudaq/algorithms/py_sample_async.cpp
@@ -48,8 +48,9 @@ for more information on this programming pattern.)#")
   mod.def(
       "sample_async",
-      [&](py::object &kernel, py::args args, std::size_t shots,
+      [&](py::object kernel, py::args args, std::size_t shots,
           std::size_t qpu_id) {
+        kernel.inc_ref();
         auto &platform = cudaq::get_platform();
         if (py::hasattr(kernel, "compile"))
           kernel.attr("compile")();
@@ -78,8 +79,8 @@ for more information on this programming pattern.)#")
         // Should only have C++ going on here, safe to release the GIL
         py::gil_scoped_release release;
         return cudaq::details::runSamplingAsync(
-            [argData, kernelName, kernelMod, shots,
-             hasQubitMeasurementFeedback]() mutable {
+            [argData, kernelName, kernelMod, shots, hasQubitMeasurementFeedback,
+             &kernel]() mutable {
               static std::size_t localShots = 0;
               pyAltLaunchKernel(kernelName, kernelMod, *argData, {});
               // delete the raw arg data pointer.
@@ -87,6 +88,8 @@ for more information on this programming pattern.)#")
               if (localShots == shots - 1)
                 delete argData;
               localShots++;
+
+              kernel.dec_ref();
             }
           },
           platform, kernelName, shots, qpu_id);
```
Wouldn't this apply to `observe_async` as well?
I would think so.