[RFC] [Language] Quantum allocation with state initialization
TODO:
- [x] Library mode implementation (@amccaskey)
- [x] Python support for kernel_builder (@amccaskey)
- [ ] Python support for ast_bridge kernels (@annagrin @amccaskey)
- [x] C++ Bridge Support for qvector / qubit initialization (@schweitzpgi)
- [x] C++ kernel_builder support for qalloc (@amccaskey)
- [x] Validate C++ kernel_builder approach (check Alex on his work, @schweitzpgi)
- [x] Error checking on number of elements in MLIR Verifier (@schweitzpgi )
- [x] Simulation subclass work (implementing CircuitSimulator::addQubitsToState(with data)), need kron-prod on GPU (@anthony-santana)
- [x] Qubit initializer list
- [ ] Check vector<complex> kernel input works end-to-end (@anthony-santana)
- [ ] House-keeping: tests, python tests errors (@anthony-santana, @annagrin)
- [ ] Examples (@anthony-santana)
- [ ] Documentation page (@anthony-santana @schweitzpgi @amccaskey)
- [ ] cudaq::state input (@1tnguyen and @schweitzpgi, requires #1467)
- [ ] Density Matrix and TensorNet backends updates will require #1467
- [ ] Implement original from_state decomposition in MLIR (@boschmitt)
- [ ] Simulation scalar type work - compile time error messages for incompatible object files
Example with #1467:
__qpu__ void kernel(cudaq::state &initState) {
  cudaq::qvector q = initState;
}

@cudaq.kernel
def kernel(initState: cudaq.state):
    q = cudaq.qvector(initState)
I propose we update the language to support quantum allocation with user-provided initial state specification. This should supersede functions like from_state(...) on the kernel_builder.
C++:
New constructors
qubit::qubit(const vector<complex<double>>&);
qubit::qubit(const initializer_list<complex<double>>&);
qvector::qvector(const vector<complex<double>>&);
qvector::qvector(const initializer_list<complex<double>>&);
New builder method
QuakeValue qalloc(vector<complex<double>> &)
Python
The Python builder would be similar, as in the following:
v = [0., 1., 1., 0.]
qubits = kernel.qalloc(v)
@cudaq.kernel
def test(vec : list[complex]):
    q = cudaq.qvector(vec)
    ...
C++ Usage
The following snippet demonstrates what this might look like:
__qpu__ auto test0() {
// Init from state vector
cudaq::qubit q = {0., 1.};
return mz(q);
}
__qpu__ auto test1() {
// Init from predefined state vectors
cudaq::qubit q = cudaq::ket::one;
return mz(q);
}
__qpu__ void test2() {
// Init from state vector
cudaq::qubit q = {M_SQRT1_2, M_SQRT1_2};
}
__qpu__ void test3() {
// Init from state vector
cudaq::qvector q = {M_SQRT1_2, 0., 0., M_SQRT1_2};
}
__qpu__ void test4(const std::vector<cudaq::complex> &state) {
// State vector from host
cudaq::qvector q = state;
}
void useBuilder() {
std::vector<cudaq::complex> state{M_SQRT1_2, 0., 0., M_SQRT1_2};
{
// (deferred) qubit allocation from concrete state vector
auto kernel = cudaq::make_kernel();
auto qubitsInitialized = kernel.qalloc(state);
}
{
// kernel parameterized on input state data
auto [kernel, inState] = cudaq::make_kernel<std::vector<cudaq::complex>>();
auto qubitsInitialized = kernel.qalloc(inState);
cudaq::sample(kernel, state).dump();
}
}
Python Usage
Vectors of complex or floating-point numbers
Notes
- Implicit conversion from a list of float to a list of complex is allowed on argument passing.
- Automatic conversion of initializer elements will happen if the precision of the numbers in the qvector initializer does not match the current simulation precision.
- Emit a warning on conversion due to performance concerns; recommend using cudaq.amplitudes or cudaq.complex.
# Passing complex vectors as params
c = [.70710678 + 0j, 0., 0., 0.70710678]
@cudaq.kernel
def kernel(vec: list[complex]):
    q = cudaq.qvector(vec)

# Capturing complex vectors
c = [.70710678 + 0j, 0., 0., 0.70710678]
@cudaq.kernel
def kernel():
    q = cudaq.qvector(c)

# Capturing complex vectors and converting to
# numpy array inside the kernel
c = [.70710678 + 0j, 0., 0., 0.70710678]
@cudaq.kernel
def kernel():
    q = cudaq.qvector(np.array(c))

# Creating complex arrays inside kernels
@cudaq.kernel
def kernel():
    q = cudaq.qvector([1.0 + 0j, 0., 0., 1.])
Numpy arrays
# From np array created inside a kernel with a complex dtype
c = [.70710678 + 0j, 0., 0., 0.70710678]
@cudaq.kernel
def kernel(vec: list[complex]):
    q = cudaq.qvector(np.array(vec, dtype=complex))

c = [.70710678 + 0j, 0., 0., 0.70710678]
@cudaq.kernel
def kernel(vec: list[complex]):
    q = cudaq.qvector(np.array(vec, dtype=np.complex64))

# Using precision-agnostic API
c = [.70710678 + 0j, 0., 0., 0.70710678]
@cudaq.kernel
def kernel(vec: list[complex]):
    q = cudaq.qvector(np.array(vec, dtype=cudaq.complex()))

c = cudaq.amplitudes([.70710678, 0., 0., 0.70710678])
@cudaq.kernel
def kernel(vec: list[complex]):
    q = cudaq.qvector(vec)
# Passing np arrays as params
c = np.array(c, dtype=cudaq.complex())
@cudaq.kernel
def kernel(vec: np.array):
    q = cudaq.qvector(vec)

c = np.array(c, dtype=cudaq.complex())
@cudaq.kernel
def kernel(vec: np.ndarray):
    q = cudaq.qvector(vec)

c = np.array(c, dtype=cudaq.complex())
@cudaq.kernel
def kernel(vec: np.ndarray[any, complex]):
    q = cudaq.qvector(vec)
For library mode / simulation, we pass the state data along to NVQIR. For physical backends, we can replace the runtime state data with the result of a circuit synthesis pass (as in the current from_state(...) implementation).
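For instance, a minimal sketch of what a synthesized replacement for the runtime amplitudes {M_SQRT1_2, 0., 0., M_SQRT1_2} might look like (this hand-written Bell-pair preparation is illustrative only, not the actual output of the synthesis pass):
// Illustrative equivalent of synthesizing the state {M_SQRT1_2, 0., 0., M_SQRT1_2}
// into gates: prepare (|00> + |11>)/sqrt(2) explicitly.
__qpu__ void synthesizedEquivalent() {
  cudaq::qvector q(2);
  h(q[0]);
  x<cudaq::ctrl>(q[0], q[1]);
}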
Thanks @amccaskey for proposing this.
I have a clarification question regarding the semantics of a cudaq::qvector's state. For example, what will be the return value of the following kernel?
__qpu__ bool foo() {
// Init from state vector
cudaq::qvector q = {0., 1., 0., 0.};
return cudaq::mz(q[0]);
}
I see two possibilities:
- It returns true. Rationale: Index 1 in the initializer list corresponds to state |1> (or, in binary, |0b01>; the state is interpreted as a number). If q[0] corresponds to the least significant (qu)bit and the state is interpreted as a number, then the state of q[0] is |0b1> and thus mz(q[0]) returns true.
- It returns false. Rationale: Index 1 in the initializer list corresponds to state |0>|1> (or |0,1>; here I could have used the short syntax |01>, but I want to make the point that this state should not be interpreted as a number, but as a bitstring, i.e., a vector of bits). Hence q[0] is |0> and mz(q[0]) returns false.
Thinking a bit forward, it seems to me that the second option is more appropriate. Eventually, we can define a quantum integer type, say cudaq::qint, in which the state must be interpreted as a number:
__qpu__ bool foo() {
// Init from state vector
cudaq::qint q = {0., 1., 0., 0.};
return cudaq::mz(q[0]);
}
In this case, the kernel must return true.
To make the API future-proof, we could also consider adding an optional bit-ordering vector argument (similar to custatevec API).
cudaq::qvector q({0., 1., 0., 0.}, {0, 1}); => q[0] should be |1>
cudaq::qvector q({0., 1., 0., 0.}, {1, 0}); => q[1] should be |1>
The default, when none is provided, could be one of those two endianness conventions, e.g., LSB.
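A possible overload for this could look like the following (a sketch only; the parameter names and the exact type of the ordering argument are assumptions, not part of the proposal):
// Hypothetical constructor sketch: amplitudes plus an optional bit-ordering vector,
// similar in spirit to the custatevec API.
qvector(const std::vector<cudaq::complex> &amplitudes,
        const std::vector<std::size_t> &bitOrder);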
@boschmitt I prefer we go with bullet 2.
@boschmitt for your qint example, I was hoping to support cudaq::qint q = 4; instead of the initializer list. Do you foresee any gotchas there?
One thing to add, it will likely be good to update the cudaq::state definition to be backend specific, and allow it as input to a CUDA Quantum kernel. If it is backend specific, we can have the sub-type hold a GPU device pointer and avoid copying the large data vector from device to host.
__qpu__ void test4(const cudaq::state &state) {
// Input state could wrap GPU device pointer
cudaq::qvector q = state;
... build off initial state ...
}
void useTest4() {
auto initStateGen = [](...) __qpu__ { ... };
auto initState = cudaq::get_state(initStateGen, ...);
cudaq::sample(test4, initState).dump();
}
I was hoping to support cudaq::qint q = 4; instead of the initializer list. Do you foresee any gotchas there?
Would this be interpreted as the bitstring 1,0,0? You would need to know how many leading zeros are needed, so maybe an additional constructor parameter that is nQubits.
If the goal is to construct states restricted to the computational basis, I would think rather than qint we could add qvector(const std::vector<bool>&);. Here the vector is of length nQubits, rather than 2**nQubits, and the construction is just specified by the bitstring.
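A sketch of how that suggested constructor could be used (hypothetical; this overload is not part of the constructor list proposed above):
// Hypothetical: prepare three qubits in the computational-basis state |1,0,1>.
// The vector holds nQubits bits rather than 2**nQubits amplitudes.
__qpu__ void basisInit() {
  cudaq::qvector q(std::vector<bool>{true, false, true});
}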
@boschmitt for your qint example, I was hoping to support cudaq::qint q = 4; instead of the initializer list. Do you foresee any gotchas there?
We can certainly support it, but we would still have to define what it means with respect to a state vector. There will be more questions to answer in order to support this idea. For example:
- How many qubits does cudaq::qint q = 4 create? A fixed number, say 8, or the minimum necessary to represent 4?
- Would we allow users to easily access the qubits, e.g., using q[0]? If we do, what would q[0] return?
- Would the user be able to create a cudaq::qint in which the state is a superposition of different integers? If we allow it, how do the indices in the initializer list relate to the integers represented by the state?
Let me try to rephrase my questions: if we have a set of qubits that we try to initialize using a state vector, then we need clarity on:
- How does the index of the state vector relate to the state? E.g., given a 3-qubit state vector, does index 1 correspond to the state represented as the bitstring |001>?
- Depending on the type, e.g., cudaq::qvector or cudaq::qint, does the interpretation of |001> change? For example, if the type is cudaq::qvector we interpret the state as |0, 0, 1> and the state of q[0] is |0>; if the type is cudaq::qint we interpret the state as |0b001> and q[0] is |1> (see the sketch below for the bitstring reading).
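For concreteness, a small sketch of how the bitstring reading would play out for a 3-qubit state vector (illustrative only; these are exactly the semantics under discussion):
// Under the bitstring reading, index 1 of the 8-element vector is |q0=0, q1=0, q2=1>.
__qpu__ void bitstringReading() {
  cudaq::qvector q = {0., 1., 0., 0., 0., 0., 0., 0.};
  auto b0 = mz(q[0]); // false under this interpretation
  auto b2 = mz(q[2]); // true under this interpretation
}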
I guess qint may be a bit beyond this RFC, but to answer your first question: for qint we might want a template parameter for the size of the qubit register, qint<N>, and then typedefs for common ones.
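A rough sketch of that idea (hypothetical; the class template and alias names are illustrative only):
// Hypothetical: fixed-width quantum integer register whose state is read as a number.
template <std::size_t N>
class qint;

using qint8 = qint<8>;   // e.g., cudaq::qint8 q = 4; would allocate 8 qubits in |0b00000100>
using qint32 = qint<32>;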
I guess qint may be a bit beyond this RFC
I agree.
The main point on which I asked for clarification is how the semantics of the state vector relate to the type, cudaq::qvector, and to accessing individual qubits. I provided two takes on it, and it seems there is a preference for the second. The cudaq::qint digression is just a thought experiment to see how our decision will stand the test of time and possible CUDA Quantum evolutions.
qubit initializer list
See PR #1461
@schweitzpgi I think we can probably start thinking about MLIR support and QIR lowering for a quake.state type to support
__qpu__ void kernel(cudaq::state inState) {
cudaq::qvector q = inState;
...
}
in anticipation of #1467. I think we can just treat this like Clang would, lower it to a pointer, and update the InitializeStateOp lowering to invoke a new NVQIR function.
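A minimal sketch of what such a lowering target could look like, assuming a hypothetical NVQIR entry point (the name and signature below are assumptions, not an existing API):
// Hypothetical NVQIR runtime function the InitializeStateOp lowering could call:
// pass the number of qubits and a pointer to the (backend-specific) state data.
extern "C" void __nvqir__initialize_qubits_from_state(std::uint64_t numQubits,
                                                      void *statePtr);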
Some notes about the ownership semantics for cudaq::state after implementation experimentation in https://github.com/NVIDIA/cuda-quantum/pull/1542 for discussion:
- Currently, CircuitSimulator and SimulationState implementations often assume ownership of the underlying memory resources (exchanged at getSimulationState).
- The user-facing cudaq::state class (the result of get_state) would ideally hold the SimulationState in a shared-ownership manner (to make passing the state around more convenient, plus Python bindings, etc.).
- Passing this cudaq::state back to a quantum kernel for qvector allocations can be implemented in a couple of ways:
(1) Adopting reference semantics, e.g.,
__qpu__ void kernel(cudaq::state inState) {
cudaq::qvector q{inState};
...
}
// The above would be equivalent to this.
__qpu__ void kernel1(cudaq::state& inState) {
cudaq::qvector q{inState};
...
}
auto myState = cudaq::get_state(some_kernel);
kernel(myState); // or kernel1(myState);
// ==> myState is updated as a result of kernel execution
- Simulators need to cope with both ownership scenarios for the underlying data (e.g., qpp::ket, device memory, etc.). For example, a subsequent getSimulationState may need to perform a copy if the current state is not owned by the simulator.
- A referenced state that was involved in a sub-state allocation needs consideration:
__qpu__ void kernel(cudaq::state inState) {
cudaq::qvector my_vec(N); // default init
...
// Adding some qubits in a state
cudaq::qvector q{inState};
...
}
auto myState = cudaq::get_state(some_kernel);
kernel(myState);
// should myState be updated to reflect the whole simulator's state (+N qubits)
// or stay the same (i.e., the user needs to call a `get_state` explicitly to get the new state)?
(2) Adopting move semantics
- The user needs to explicitly move the state to pass it to the simulator (giving it back to the simulator).
__qpu__ void kernel(cudaq::state&& inState) {
cudaq::qvector q{std::move(inState)};
...
}
auto myState = cudaq::get_state(some_kernel);
kernel(std::move(myState));
// User has passed the ownership of the state to the kernel.
- We could (theoretically) also distinguish/support different qvector allocation signatures:
qvector(cudaq::state &&initState)      ==> move
qvector(const cudaq::state &initState) ==> copy state (inside the simulator)
qvector(cudaq::state initState)        ==> copy state (by the state)
- The user needs to call get_state afterward to get the new state (the state is moved out of the simulator back to the user).
- The obvious downside is the boilerplate associated with std::move to make sure the state is passed around most efficiently.
Building on Thien's comment above on move semantics: the qvector class itself does not claim any ownership of the state object; the qvector ctor passes the state object on to the execution manager.
For performance reasons, let the user decide what happens to the state object used in the qvector ctor.
Move semantics
This seems straightforward. The cudaq::state object is moved into the execution manager, not copied.
cudaq::state state = ...;
qvector q(std::move(state));
// the variable `state` is dead/invalid at this point
In this case, the code can be optimized a bit since state is dead and no reference counting or destruction needs to take place.
Reference copy semantics
The cudaq::state class can be a "reference wrapper". In that case, a "copy" is shallow and only copies the pointer. The data itself is fully shared and gets reclaimed when the last reference goes out of scope. This adds some overhead and possibly leaks state information in a less intuitive way. See the following example.
cudaq::state state = ...;
qvector q1(state); // calls qvector(cudaq::state);
// the variable `state` still has a reference to the state information
...
// the `state` information, while it can clearly be referenced may have _changed_ in the code above
qvector q2(state); // Surprise? q2 does not have the same initial state as q1!
In the updated code, we'd add the full set of qvector constructor signatures from a state, e.g.,
qvector(const cudaq::state &initState);
qvector(cudaq::state &initState);
qvector(cudaq::state &&initState);
qvector(cudaq::state initState);
In particular, the const & signature would propagate to the simulator and ask it to create its copy of the state.
In the above example, adding const would make the referenced state constant (if that was the intent).
const cudaq::state state = ...;
qvector q1(state); // calls qvector(const cudaq::state&);
// the simulator will make a copy of the state to do simulation
...
// the `state` information would still be the same
qvector q2(state);
Would we like to support the following Python cases?
# Passing np arrays as params
c = np.array(c, dtype=cudaq.complex())
@cudaq.kernel
def kernel(vec: np.array):
    q = cudaq.qvector(vec)

c = np.array(c, dtype=cudaq.complex())
@cudaq.kernel
def kernel(vec: np.ndarray):
    q = cudaq.qvector(vec)

c = np.array(c, dtype=cudaq.complex())
@cudaq.kernel
def kernel(vec: np.ndarray[any, complex]):
    q = cudaq.qvector(vec)
Closing this - any remaining work is tracked separately.