[RFC] [Language] Quantum allocation with state initialization
TODO:
- [x] Library mode implementation (@amccaskey)
- [x] Python support for kernel_builder (@amccaskey)
- [ ] Python support for ast_bridge kernels (@annagrin @amccaskey)
- [x] C++ Bridge Support for qvector / qubit initialization (@schweitzpgi)
- [x] C++ kernel_builder support for qalloc (@amccaskey)
- [x] Validate C++ kernel_builder approach (check Alex on his work, @schweitzpgi)
- [x] Error checking on number of elements in MLIR Verifier (@schweitzpgi )
- [x] Simulation subclass work (implementing CircuitSimulator::addQubitsToState(with data)), need kron-prod on GPU (@anthony-santana)
- [x] Qubit initializer list
- [ ] Check vector<complex> kernel input works end-to-end (@anthony-santana)
- [ ] House-keeping: tests, python tests errors (@anthony-santana, @annagrin)
- [ ] Examples (@anthony-santana)
- [ ] Documentation page (@anthony-santana @schweitzpgi @amccaskey)
- [ ] cudaq::state input (@1tnguyen and @schweitzpgi, requires #1467)
- [ ] Density Matrix and TensorNet backends updates will require #1467
- [ ] Implement original from_state decomposition in MLIR (@boschmitt)
- [ ] Simulation scalar type work - compile time error messages for incompatible object files
Example with #1467:
__qpu__ void kernel(cudaq::state &initState) {
  cudaq::qvector q = initState;
}

@cudaq.kernel
def kernel(initState: cudaq.state):
    q = cudaq.qvector(initState)
I propose we update the language to support quantum allocation with user-provided initial state specification. This should supersede functions like from_state(...) on the kernel_builder.
C++:
New constructors
qubit::qubit(const vector<complex<double>>&);
qubit::qubit(const initializer_list<complex<double>>&);
qvector::qvector(const vector<complex<double>>&);
qvector::qvector(const initializer_list<complex<double>>&);
New builder method
QuakeValue qalloc(vector<complex<double>> &)
Python
The Python builder would be similar, as in the following:
v = [0., 1., 1., 0.]
qubits = kernel.qalloc(v)
@cudaq.kernel
def test(vec : list[complex]):
    q = cudaq.qvector(vec)
    ...
C++ Usage
The following snippet demonstrates what this might look like:
__qpu__ auto test0() {
// Init from state vector
cudaq::qubit q = {0., 1.};
return mz(q);
}
__qpu__ auto test1() {
// Init from predefined state vectors
cudaq::qubit q = cudaq::ket::one;
return mz(q);
}
__qpu__ void test2() {
// Init from state vector
cudaq::qubit q = {M_SQRT1_2, M_SQRT1_2};
}
__qpu__ void test3() {
// Init from state vector
cudaq::qvector q = {M_SQRT1_2, 0., 0., M_SQRT1_2};
}
__qpu__ void test4(const std::vector<cudaq::complex> &state) {
// State vector from host
cudaq::qvector q = state;
}
void useBuilder() {
std::vector<cudaq::complex> state{M_SQRT1_2, 0., 0., M_SQRT1_2};
{
// (deferred) qubit allocation from concrete state vector
auto kernel = cudaq::make_kernel();
auto qubitsInitialized = kernel.qalloc(state);
}
{
// kernel parameterized on input state data
auto [kernel, inState] = cudaq::make_kernel<std::vector<cudaq::complex>>();
auto qubitsInitialized = kernel.qalloc(inState);
cudaq::sample(kernel, state).dump();
}
}
Python Usage
Vectors of complex or floating-point numbers
Notes
- Implicit conversion from a list of float to a list of complex is allowed on argument passing.
- Automatic conversion of initializer elements will happen if the precision of the numbers in the qvector initializer does not match the current simulation precision.
- Emit a warning on conversion due to performance concerns; recommend using cudaq.amplitudes or cudaq.complex.
# Passing complex vectors as params
c = [.70710678 + 0j, 0., 0., 0.70710678]
@cudaq.kernel
def kernel(vec: list[complex]):
    q = cudaq.qvector(vec)

# Capturing complex vectors
c = [.70710678 + 0j, 0., 0., 0.70710678]
@cudaq.kernel
def kernel():
    q = cudaq.qvector(c)

# Capturing complex vectors and converting to
# numpy array inside the kernel
c = [.70710678 + 0j, 0., 0., 0.70710678]
@cudaq.kernel
def kernel():
    q = cudaq.qvector(np.array(c))

# Creating complex arrays inside kernels
@cudaq.kernel
def kernel():
    q = cudaq.qvector([1.0 + 0j, 0., 0., 1.])
Numpy arrays
# From np array created inside a kernel with a complex dtype
c = [.70710678 + 0j, 0., 0., 0.70710678]
@cudaq.kernel
def kernel(vec: list[complex]):
    q = cudaq.qvector(np.array(vec, dtype=complex))

c = [.70710678 + 0j, 0., 0., 0.70710678]
@cudaq.kernel
def kernel(vec: list[complex]):
    q = cudaq.qvector(np.array(vec, dtype=np.complex64))

# Using precision-agnostic API
c = [.70710678 + 0j, 0., 0., 0.70710678]
@cudaq.kernel
def kernel(vec: list[complex]):
    q = cudaq.qvector(np.array(vec, dtype=cudaq.complex()))

c = cudaq.amplitudes([.70710678, 0., 0., 0.70710678])
@cudaq.kernel
def kernel(vec: list[complex]):
    q = cudaq.qvector(vec)
# Passing np arrays as params
c = np.array(c, dtype=cudaq.complex())
@cudaq.kernel
def kernel(vec: np.array):
    q = cudaq.qvector(vec)

c = np.array(c, dtype=cudaq.complex())
@cudaq.kernel
def kernel(vec: np.ndarray):
    q = cudaq.qvector(vec)

c = np.array(c, dtype=cudaq.complex())
@cudaq.kernel
def kernel(vec: np.ndarray[any, complex]):
    q = cudaq.qvector(vec)
For library mode / simulation, we pass the state data along to NVQIR. For physical backends, we can replace the runtime state data with the result of a circuit synthesis pass (as in the current from_state(...) implementation).
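For instance, a minimal sketch of what a synthesized replacement for the runtime amplitudes {M_SQRT1_2, 0., 0., M_SQRT1_2} might look like (this hand-written Bell-pair preparation is illustrative only, not the actual output of the synthesis pass):
// Illustrative equivalent of synthesizing the state {M_SQRT1_2, 0., 0., M_SQRT1_2}
// into gates: prepare (|00> + |11>)/sqrt(2) explicitly.
__qpu__ void synthesizedEquivalent() {
  cudaq::qvector q(2);
  h(q[0]);
  x<cudaq::ctrl>(q[0], q[1]);
}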
Thanks @amccaskey for proposing this.
I have a clarification question regarding the semantics of a cudaq::qvector's state. For example, what will be the return value of the following kernel?
__qpu__ bool foo() {
// Init from state vector
cudaq::qvector q = {0., 1., 0., 0.};
return cudaq::mz(q[0]);
}
I see two possibilities:
- It returns true. Rationale: Index 1 in the initializer list corresponds to state |1> (or, in binary, |0b01>; the state is interpreted as a number). If q[0] corresponds to the least significant (qu)bit and the state is interpreted as a number, then the state of q[0] is |0b1> and thus mz(q[0]) returns true.
- It returns false. Rationale: Index 1 in the initializer list corresponds to state |0>|1> (or |0,1>; here I could have used the short syntax |01>, but I want to make the point that this state should not be interpreted as a number, but as a bitstring, i.e., a vector of bits). Hence q[0] is |0> and mz(q[0]) returns false.
Thinking a bit forward, it seems to me that the second option is more appropriate. Eventually, we can define a quantum integer type, say cudaq::qint, in which the state must be interpreted as a number:
__qpu__ bool foo() {
// Init from state vector
cudaq::qint q = {0., 1., 0., 0.};
return cudaq::mz(q[0]);
}
In this case, the kernel must return true.
To make the API future-proof, we could also consider adding an optional bit-ordering vector argument (similar to custatevec API).
cudaq::qvector q({0., 1., 0., 0.}, {0, 1}); => q[0] should be |1>
cudaq::qvector q({0., 1., 0., 0.}, {1, 0}); => q[1] should be |1>
The default, when none is provided, could be one of those two endianness conventions, e.g., LSB.
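A possible overload for this could look like the following (a sketch only; the parameter names and the exact type of the ordering argument are assumptions, not part of the proposal):
// Hypothetical constructor sketch: amplitudes plus an optional bit-ordering vector,
// similar in spirit to the custatevec API.
qvector(const std::vector<cudaq::complex> &amplitudes,
        const std::vector<std::size_t> &bitOrder);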
@boschmitt I prefer we go with bullet 2.
@boschmitt for your qint example, I was hoping to support cudaq::qint q = 4; instead of the initializer list. Do you foresee any gotchas there?
One thing to add, it will likely be good to update the cudaq::state definition to be backend specific, and allow it as input to a CUDA Quantum kernel. If it is backend specific, we can have the sub-type hold a GPU device pointer and avoid copying the large data vector from device to host.
__qpu__ void test4(const cudaq::state &state) {
// Input state could wrap GPU device pointer
cudaq::qvector q = state;
... build off initial state ...
}
void useTest4() {
auto initStateGen = [](...) __qpu__ { ... };
auto initState = cudaq::get_state(initStateGen, ...);
cudaq::sample(test4, initState).dump();
}
I was hoping to support cudaq::qint q = 4; instead of the initializer list. Do you foresee any gotchas there?
Would this be interpreted as the bitstring 1,0,0? You would need to know how many leading zeros are needed, so maybe an additional constructor parameter that is nQubits.
If the goal is to construct states restricted to the computational basis, I would think rather than qint we could add qvector(const std::vector<bool>&);. Here the vector is of length nQubits, rather than 2**nQubits, and the construction is just specified by the bitstring.
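A sketch of how that suggested constructor could be used (hypothetical; this overload is not part of the constructor list proposed above):
// Hypothetical: prepare three qubits in the computational-basis state |1,0,1>.
// The vector holds nQubits bits rather than 2**nQubits amplitudes.
__qpu__ void basisInit() {
  cudaq::qvector q(std::vector<bool>{true, false, true});
}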
@boschmitt for your qint example, I was hoping to support cudaq::qint q = 4; instead of the initializer list. Do you foresee any gotchas there?
We can certainly support it, but we would still have to define what it means with respect to a state vector. There will be more questions to answer in order to support this idea. For example:
- How many qubits does cudaq::qint q = 4 create? A fixed number, say 8, or the minimum necessary to represent 4?
- Would we allow users to easily access the qubits, e.g., using q[0]? If we do, what would q[0] return?
- Would the user be able to create a cudaq::qint in which the state is a superposition of different integers? If we allow it, how do the indices in the initializer list relate to the integers represented by the state?
Let me try to rephrase my questions: if we have a set of qubits that we try to initialize using a state vector, then we need clarity on:
- How does the index of the state vector relate to the state? E.g., given a 3-qubit state vector, does index 1 correspond to the state represented as the bitstring |001>?
- Depending on the type, e.g., cudaq::qvector or cudaq::qint, does the interpretation of |001> change? For example, if the type is cudaq::qvector we interpret the state as |0, 0, 1> and the state of q[0] is |0>; if the type is cudaq::qint we interpret the state as |0b001> and q[0] is |1> (see the sketch below for the bitstring reading).
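For concreteness, a small sketch of how the bitstring reading would play out for a 3-qubit state vector (illustrative only; these are exactly the semantics under discussion):
// Under the bitstring reading, index 1 of the 8-element vector is |q0=0, q1=0, q2=1>.
__qpu__ void bitstringReading() {
  cudaq::qvector q = {0., 1., 0., 0., 0., 0., 0., 0.};
  auto b0 = mz(q[0]); // false under this interpretation
  auto b2 = mz(q[2]); // true under this interpretation
}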
I guess qint may be a bit beyond this RFC, but to answer your first question: for qint we might want a template parameter for the size of the qubit register, qint<N>, and then typedefs for common ones.
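A rough sketch of that idea (hypothetical; the class template and alias names are illustrative only):
// Hypothetical: fixed-width quantum integer register whose state is read as a number.
template <std::size_t N>
class qint;

using qint8 = qint<8>;   // e.g., cudaq::qint8 q = 4; would allocate 8 qubits in |0b00000100>
using qint32 = qint<32>;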
I guess qint may be a bit beyond this RFC
I agree.
The main point on which I asked for clarification is how the semantics of the state vector relate to the type, cudaq::qvector, and to accessing individual qubits. I provided two takes on it, and it seems there is a preference for the second. The cudaq::qint digression is just a thought experiment to see how our decision will stand the test of time and possible CUDA Quantum evolutions.
qubit initializer list
See PR #1461
@schweitzpgi I think we can probably start thinking about MLIR support and QIR lowering for a quake.state type to support
__qpu__ void kernel(cudaq::state inState) {
cudaq::qvector q = inState;
...
}
in anticipation of #1467. I think we can just treat this like Clang would, lower it to a pointer, and update the InitializeStateOp lowering to invoke a new NVQIR function.
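A minimal sketch of what such a lowering target could look like, assuming a hypothetical NVQIR entry point (the name and signature below are assumptions, not an existing API):
// Hypothetical NVQIR runtime function the InitializeStateOp lowering could call:
// pass the number of qubits and a pointer to the (backend-specific) state data.
extern "C" void __nvqir__initialize_qubits_from_state(std::uint64_t numQubits,
                                                      void *statePtr);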
Some notes about the ownership semantics for cudaq::state after implementation experimentation in https://github.com/NVIDIA/cuda-quantum/pull/1542 for discussion:
- Currently, CircuitSimulator and SimulationState implementations often assume ownership of the underlying memory resources (exchanged at getSimulationState).
- The user-facing cudaq::state class (the result of get_state) would ideally hold the SimulationState in a shared-ownership manner (to make passing the state around more convenient, plus Python bindings, etc.).
- Passing this cudaq::state back to a quantum kernel for qvector allocations can be implemented in a couple of ways:
(1) Adopting reference semantics, e.g.,
__qpu__ void kernel(cudaq::state inState) {
cudaq::qvector q{inState};
...
}
// The above would be equivalent to this.
__qpu__ void kernel1(cudaq::state& inState) {
cudaq::qvector q{inState};
...
}
auto myState = cudaq::get_state(some_kernel);
kernel(myState); // or kernel1(myState);
// ==> myState is updated as a result of kernel execution
- Simulators need to cope with both ownership scenarios for the underlying data (e.g., qpp::ket, device memory, etc.). For example, a subsequent getSimulationState may need to perform a copy if the current state is not owned by the simulator.
- A referenced state that was involved in a sub-state allocation needs consideration:
__qpu__ void kernel(cudaq::state inState) {
cudaq::qvector my_vec(N); // default init
...
// Adding some qubits in a state
cudaq::qvector q{inState};
...
}
auto myState = cudaq::get_state(some_kernel);
kernel(myState);
// should myState be updated to reflect the whole simulator's state (+N qubits)
// or stay the same (i.e., the user needs to call a `get_state` explicitly to get the new state)?
(2) Adopting move semantics
- The user needs to explicitly move the state to pass it to the simulator (giving it back to the simulator).
__qpu__ void kernel(cudaq::state&& inState) {
cudaq::qvector q{std::move(inState)};
...
}
auto myState = cudaq::get_state(some_kernel);
kernel(std::move(myState));
// User has passed the ownership of the state to the kernel.
- We could (theoretically) also distinguish/support different qvector allocation signatures:
qvector(cudaq::state &&initState)      ==> move
qvector(const cudaq::state &initState) ==> copy state (inside the simulator)
qvector(cudaq::state initState)        ==> copy state (by the state)
- The user needs to call get_state afterward to get the new state (the state is moved out of the simulator back to the user).
- The obvious downside is the boilerplate associated with std::move to make sure the state is passed around most efficiently.
Building on Thien's comment above on move semantics: the qvector class itself does not claim any ownership of the state object; the qvector ctor passes the state object on to the execution manager.
For performance reasons, let the user decide what happens to the state object used in the qvector ctor.
Move semantics
This seems straightforward. The cudaq::state object is moved into the execution manager, not copied.
cudaq::state state = ...;
qvector q(std::move(state));
// the variable `state` is dead/invalid at this point
In this case, the code can be optimized a bit since state is dead and no reference counting or destruction needs to take place.
Reference copy semantics
The cudaq::state class can be a "reference wrapper". In that case, a "copy" is shallow and only copies the pointer. The data itself is fully shared and gets reclaimed when the last reference goes out of scope. This adds some overhead and possibly leaks state information in a less intuitive way. See the following example.
cudaq::state state = ...;
qvector q1(state); // calls qvector(cudaq::state);
// the variable `state` still has a reference to the state information
...
// the `state` information, while it can clearly be referenced may have _changed_ in the code above
qvector q2(state); // Surprise? q2 does not have the same initial state as q1!
In the updated code, we'd add the full set of qvector constructor signatures from a state, e.g.,
qvector(const cudaq::state &initState);
qvector(cudaq::state &initState);
qvector(cudaq::state &&initState);
qvector(cudaq::state initState);
In particular, the const & signature would propagate to the simulator and ask it to create its copy of the state.
In the above example, adding const would make the referenced state constant (if that was the intent).
const cudaq::state state = ...;
qvector q1(state); // calls qvector(const cudaq::state&);
// the simulator will make a copy of the state to do simulation
...
// the `state` information would still be the same
qvector q2(state);
Would we like to support the following Python cases?
# Passing np arrays as params
c = np.array(c, dtype=cudaq.complex())
@cudaq.kernel
def kernel(vec: np.array):
    q = cudaq.qvector(vec)

c = np.array(c, dtype=cudaq.complex())
@cudaq.kernel
def kernel(vec: np.ndarray):
    q = cudaq.qvector(vec)

c = np.array(c, dtype=cudaq.complex())
@cudaq.kernel
def kernel(vec: np.ndarray[any, complex]):
    q = cudaq.qvector(vec)
Closing this - any remaining work is tracked separately.