QuEST can be used in your C or C++ code, simply by including
#include <QuEST.h>
Independent of which platform you'll run your simulation on (multicore CPUs, a GPU, or over a network), your QuEST code will look the same, compile with the same makefile, and use the same API.
Here's a simulation of a very simple circuit, which applies a Hadamard and a controlled-NOT to two qubits in the plus state, then measures the final qubit.
#include <QuEST.h>

int main(int narg, char *varg[]) {

    // load QuEST
    QuESTEnv env = createQuESTEnv();

    // create 2 qubits in the plus state
    Qureg qubits = createQureg(2, env);
    initPlusState(qubits);

    // apply circuit
    hadamard(qubits, 0);
    controlledNot(qubits, 0, 1);
    measure(qubits, 1);

    // unload QuEST
    destroyQureg(qubits, env);
    destroyQuESTEnv(env);
    return 0;
}
Of course, this code doesn't output anything!
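To actually see something, we could capture the outcome of that final measurement and print it before unloading QuEST; a minimal tweak, reusing only calls already shown (and additionally including <stdio.h> for printf):

int outcome = measure(qubits, 1);
printf("Qubit 1 collapsed to %d\n", outcome);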
Let's walk through a more sophisticated circuit.
We first construct a QuEST environment, which abstracts away any preparation of multithreading, distribution or GPU-acceleration strategies.
QuESTEnv env = createQuESTEnv();
We then create a quantum register, in this case of 3 qubits.
Qureg qubits = createQureg(3, env);
and set it to be in the zero state.
initZeroState(qubits);
We can create multiple Qureg instances, and QuEST will sort out allocating memory for the state-vectors, even over networks! Note we can replace createQureg with createDensityQureg to instead use a more powerful density-matrix representation, which can store mixed states!
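For instance, a density-matrix register is created, operated upon and freed with the very same calls as a state-vector register; a minimal sketch:

Qureg rho = createDensityQureg(3, env);
initZeroState(rho);       // the mixed-state analogue of the zero state
hadamard(rho, 0);         // gates accept density matrices just like state-vectors
destroyQureg(rho, env);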
We're now ready to apply some gates to our qubits, which in this case have indices 0, 1 and 2. When applying a gate, we pass along which quantum register to operate upon.
hadamard(qubits, 0);
controlledNot(qubits, 0, 1);
rotateY(qubits, 2, .1);
Some gates allow us to specify a general number of control qubits
multiControlledPhaseGate(qubits, (int []){0, 1, 2}, 3);
We can specify general single-qubit unitary operations as 2x2 matrices
// sqrt(X) with a pi/4 global phase
ComplexMatrix2 u;
u.r0c0 = (Complex) {.real=.5, .imag= .5};
u.r0c1 = (Complex) {.real=.5, .imag=-.5};
u.r1c0 = (Complex) {.real=.5, .imag=-.5};
u.r1c1 = (Complex) {.real=.5, .imag= .5};
unitary(qubits, 0, u);
or more compactly, foregoing the global phase factor,
Complex a, b;
a.real = .5; a.imag = .5;
b.real = .5; b.imag = -.5;
compactUnitary(qubits, 1, a, b);
or even more compactly, as a rotation around an arbitrary axis on the Bloch sphere
Vector v;
v.x = 1; v.y = 0; v.z = 0;
rotateAroundAxis(qubits, 2, 3.14/2, v);
We can apply general unitaries conditioned on a control qubit
controlledCompactUnitary(qubits, 0, 1, a, b);
even with multiple control qubits!
multiControlledUnitary(qubits, (int []){0, 1}, 2, 2, u);
What has this done to the probability of the basis state |111> = |7>?
qreal prob = getProbAmp(qubits, 7);
printf("Probability amplitude of |111>: %lf\n", prob);
Here, qreal is a floating point number (e.g. double). The state-vector is stored as qreals so that we can change its precision without any recoding, by changing PRECISION in the makefile.
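Because qreal's width varies with PRECISION, printing it with a fixed format specifier like %lf is only strictly correct at double precision. A small, portable sketch (the cast is merely illustrative, and truncates at quad precision):

qreal p = getProbAmp(qubits, 7);
printf("Probability: %g\n", (double) p);   // cast keeps %g valid at any PRECISION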
How probable is measuring our final qubit (2) in outcome 1?
prob = calcProbOfOutcome(qubits, 2, 1);
printf("Probability of qubit 2 being in state 1: %f\n", prob);
Let's measure the first qubit, randomly collapsing it to 0 or 1
int outcome = measure(qubits, 0);
printf("Qubit 0 was measured in state %d\n", outcome);
and now measure our final qubit, while also learning of the probability of its outcome.
outcome = measureWithStats(qubits, 2, &prob);
printf("Qubit 2 collapsed to %d with probability %f\n", outcome, prob);
At the conclusion of our circuit, we should free up the memory used by our state-vector.
destroyQureg(qubits, env);
destroyQuESTEnv(env);
The effect of the code above is to simulate the circuit below,
and after compiling (see the section below), gives pseudo-random output like
Probability amplitude of |111>: 0.498751
Probability of qubit 2 being in state 1: 0.749178
Qubit 0 was measured in state 1
Qubit 2 collapsed to 1 with probability 0.998752

Probability amplitude of |111>: 0.498751
Probability of qubit 2 being in state 1: 0.749178
Qubit 0 was measured in state 0
Qubit 2 collapsed to 1 with probability 0.499604
QuEST uses the Mersenne Twister algorithm to generate the random numbers used for randomly collapsing the state-vector. The user can seed this RNG using seedQuEST(arrayOfSeeds, arrayLength), otherwise QuEST will by default (through seedQuESTDefault()) create a seed from the current time, the process id, and the hostname.
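For reproducible runs, one might seed explicitly; a minimal sketch matching the two-argument signature quoted above (the seed values themselves are arbitrary):

unsigned long int seeds[] = {1234, 5678};
seedQuEST(seeds, 2);   // every run will now collapse identically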
To compile, copy the makefile into the same folder as your circuit code. Adjust the User Settings section to configure compilation. You'll need to set
# name of the executable to create
EXE = myExecutable
# space-separated names (no file type) of all user source files (.c or .cpp) in the root directory
SOURCES = myCode1 myCode2
# path to QuEST library from root directory
QUEST_DIR = path/to/QuEST
Next, indicate which compiler you wish to use. For example, to use the default compiler on OSX:
# compiler to use, which should support both C and C++, to be wrapped by GPU/MPI compilers
COMPILER = clang
# type of above compiler, one of {GNU, INTEL, CLANG}, used for setting compiler flags
COMPILER_TYPE = CLANG
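On a typical Linux system with GCC, these would instead be, for example:

COMPILER = gcc
COMPILER_TYPE = GNU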
To compile your code for multithreaded, distributed and/or GPU-accelerated execution, simply set the appropriate variables
# hardwares to target: 1 means use, 0 means don't use
MULTITHREADED = 0
DISTRIBUTED = 0
GPUACCELERATED = 0
Note that using multithreading requires an OpenMP-compatible compiler (e.g. GCC 4.9), using distribution requires that an MPI compiler (mpicc) is installed on your system, and GPU acceleration requires a CUDA compiler (nvcc). We've made a comprehensive list of compatible compilers, which you can view here. This does not change your COMPILER setting - the makefile will choose the appropriate MPI and CUDA wrappers automatically.
Note also that GPU users must additionally specify the Compute Capability of their GPU, which can be looked up at the NVIDIA website.
GPU_COMPUTE_CAPABILITY = 30
An incorrect Compute Capability will lead to drastically incorrect computations. You can check you've set the right Compute Capability by running the unit tests, via
cd tests
./runTests.sh
You can additionally customise the precision with which the state-vector is stored.
# whether to use single, double or quad floating point precision in the state-vector {1,2,4}
PRECISION = 2
Using greater precision means more precise computation, but at the expense of additional memory requirements and runtime. Checking that results are unchanged when altering the precision can be a great test that your calculations are sufficiently precise.
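Since PRECISION is an ordinary makefile variable, one convenient way to perform such a check is to override it from the command line (standard make behaviour, assuming no wrapper script interferes), e.g.

make clean
make PRECISION=1
./myExecutable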
You're now ready to compile your code by entering
make
at the terminal, in the directory of your code. For the above example, this performs
gcc -O2 -std=c99 -mavx -Wall -fopenmp -c path/to/QuEST/CPU/QuEST.c
gcc -O2 -std=c99 -mavx -Wall -fopenmp -c path/to/QuEST/mt19937ar.c
gcc -O2 -std=c99 -mavx -Wall -fopenmp -c myCode1.c
gcc -O2 -std=c99 -mavx -Wall -fopenmp -c myCode2.c
gcc -O2 -std=c99 -mavx -Wall -fopenmp -c path/to/QuEST/CPU/QuEST_env_local.c
gcc -O2 -std=c99 -mavx -Wall -fopenmp -o myExecutable QuEST.o mt19937ar.o myCode1.o myCode2.o QuEST_env_local.o -lm
You can then call your code
./myExecutable
If you enabled multithreading when compiling, you can control how many threads your code uses by setting OMP_NUM_THREADS
, ideally to the number of available cores on your machine
export OMP_NUM_THREADS=8
./myExecutable
QuEST will automatically allocate work between the given number of threads to speed up your simulation.
If you compiled in distributed mode, your code can be run over a network (here, over 8 machines) using
mpirun -np 8 ./myExecutable
This will, if enabled, also utilise multithreading on each node, with as many threads as set in OMP_NUM_THREADS.
If you compiled for a GPU connected to your system, simply run
./myExecutable
as normal!
There are no special requirements for running QuEST through job submission systems. Just call ./myExecutable
as you would any other binary.
For example, the above code can be split over 4 MPI nodes (each with 8 cores) by setting DISTRIBUTED = 1 (and MULTITHREADED = 1) in the makefile, and writing a SLURM submission script like
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=1
module purge
module load mvapich2
make clean
make
export OMP_NUM_THREADS=8
mpirun ./myExecutable
or a PBS submission script like
#PBS -l select=4:ncpus=8
make clean
make
export OMP_NUM_THREADS=8
aprun -n 4 -d 8 -cc numa_node ./myExecutable
Running QuEST on a GPU partition is similarly easy in SLURM
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:1
#SBATCH --partition=gpu ## name may vary
module purge
module load cuda ## name may vary
make clean
make
./myExecutable
On each platform, there is no change to our source code or our QuEST interface. We simply recompile, and QuEST will utilise the available hardware (a GPU, shared-memory or distributed CPUs) to speed up our code.
Note that parallelising with MPI (DISTRIBUTED = 1) means all code in your source file is executed on every node. To execute some code (e.g. printing) only on one node, do

if (env.rank == 0)
    printf("Only one node executes this print!\n");
Such conditions are valid and always satisfied in code run on a single node.
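Relatedly, when rank-guarded code must finish before all nodes proceed, QuEST's syncQuESTEnv can be used as a barrier; a sketch, assuming syncQuESTEnv synchronises nodes in distributed mode:

if (env.rank == 0)
    printf("Root node reporting in\n");
syncQuESTEnv(env);   // all nodes wait here before continuing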