Part V | Page 1 of 13

Variational Quantum Algorithms

Hybrid quantum-classical methods for the NISQ era

5.1 The Variational Principle

The variational principle states that for any trial state $|\psi(\vec{\theta})\rangle$, the expectation value of the Hamiltonian is an upper bound to the ground state energy:

$$E(\vec{\theta}) = \langle\psi(\vec{\theta})|\hat{H}|\psi(\vec{\theta})\rangle \geq E_0$$

This principle is the foundation of variational quantum algorithms: use a quantum computer to prepare parameterized states and evaluate $E(\vec{\theta})$, and use a classical optimizer to minimize $E(\vec{\theta})$ over the parameter space. This hybrid quantum-classical loop is naturally suited to NISQ (Noisy Intermediate-Scale Quantum) devices because:

  • - Circuits can be shallow (reducing decoherence effects)
  • - The classical optimizer can partially compensate for hardware noise
  • - No quantum error correction is required
  • - The approach is heuristic -- no formal guarantee of polynomial speedup, but promising empirical results

5.2 Parameterized Quantum Circuits (Ansatze)

A parameterized quantum circuit (PQC) implements a unitary $U(\vec{\theta})$ that depends on continuous parameters $\vec{\theta} = (\theta_1, \theta_2, \ldots, \theta_p)$:

$$|\psi(\vec{\theta})\rangle = U(\vec{\theta})|0\rangle^{\otimes n} = \prod_{l=1}^{L} U_l(\theta_l) W_l |0\rangle^{\otimes n}$$

where $U_l(\theta_l) = e^{-i\theta_l G_l}$ are parameterized gates (rotations) and $W_l$ are fixed entangling layers. Common ansatz families:

  • Hardware-efficient ansatz:Alternating layers of single-qubit rotations and entangling gates native to the hardware. Pro: low circuit depth. Con: may not capture the right physics; susceptible to barren plateaus.
  • UCCSD (Unitary Coupled Cluster):Chemistry-inspired ansatz: $U(\vec{\theta}) = e^{T(\vec{\theta}) - T^\dagger(\vec{\theta})}$where $T = T_1 + T_2$ includes single and double excitations. Physically motivated but requires deep circuits after Trotterization.
  • QAOA ansatz:Problem-specific for combinatorial optimization:$U(\vec{\gamma}, \vec{\beta}) = \prod_{l=1}^{p} e^{-i\beta_l H_M} e^{-i\gamma_l H_C}$
  • ADAPT-VQE:Iteratively grows the ansatz by selecting the operator from a pool that maximizes the energy gradient. Achieves chemical accuracy with shallower circuits than fixed ansatze.

5.3 Variational Quantum Eigensolver (VQE)

VQE (Peruzzo et al., 2014) finds the ground state energy of a Hamiltonian by minimizing the variational energy. The Hamiltonian is decomposed into a sum of Pauli strings:

$$\hat{H} = \sum_i \alpha_i P_i, \quad P_i \in \{I, X, Y, Z\}^{\otimes n}$$

Each term $\langle\psi(\vec{\theta})|P_i|\psi(\vec{\theta})\rangle$ is estimated independently on the quantum computer by measuring in the appropriate Pauli basis. The full energy is reconstructed classically:

$$E(\vec{\theta}) = \sum_i \alpha_i \langle P_i \rangle_{\vec{\theta}}$$

The VQE Loop

  1. Choose initial parameters $\vec{\theta}_0$
  2. Quantum step: Prepare $|\psi(\vec{\theta})\rangle$ on the quantum processor
  3. Measurement: Estimate $\langle P_i\rangle$ for each Pauli term via repeated shots
  4. Classical step: Compute $E(\vec{\theta}) = \sum_i \alpha_i \langle P_i\rangle$
  5. Optimize: Update $\vec{\theta}$ using a classical optimizer (COBYLA, L-BFGS-B, SPSA, etc.)
  6. Repeat from step 2 until convergence

Measurement Cost

To estimate $E(\vec{\theta})$ to precision $\epsilon$, the number of measurements (shots) scales as:

$$N_{\text{shots}} = O\left(\frac{(\sum_i |\alpha_i|)^2}{\epsilon^2}\right)$$

Grouping commuting Pauli terms reduces the number of distinct measurement circuits needed. Techniques like classical shadows and randomized measurements further reduce overhead.

5.4 Qiskit: VQE for H2 Molecule

Let us implement a simplified VQE to find the ground state energy of the $\text{H}_2$ molecule Hamiltonian in the STO-3G basis (2-qubit reduction):

Python
script.py55 lines

Click Run to execute the Python code

Code will be executed with Python 3 on the server

5.5 The Parameter Shift Rule

To optimize VQE, we need gradients of $E(\vec{\theta})$. For gates of the form $U(\theta) = e^{-i\theta G/2}$ where $G^2 = I$ (e.g., Pauli rotations), the exact gradient can be computed on quantum hardware using the parameter shift rule:

$$\frac{\partial E}{\partial \theta_i} = \frac{E(\theta_i + \pi/2) - E(\theta_i - \pi/2)}{2}$$

This requires only two circuit evaluations per parameter (at shifted parameter values), giving exact gradients without finite-difference approximation errors.

Proof Sketch

For a single parameter $\theta$ with generator $G$ satisfying $G^2 = I$:

$$e^{-i\theta G/2} = \cos(\theta/2) I - i\sin(\theta/2) G$$

Since $E(\theta)$ depends on $\theta$ only through $\cos(\theta/2)$ and $\sin(\theta/2)$, it has the form $E(\theta) = A + B\cos\theta + C\sin\theta$. Evaluating at $\theta \pm \pi/2$:

$$\frac{E(\theta + \pi/2) - E(\theta - \pi/2)}{2} = C\cos\theta - B\sin\theta = \frac{dE}{d\theta}$$

Gradient-Based Optimizers

  • Adam: Adaptive learning rate, widely used in ML; works well with noisy gradients
  • SPSA: Simultaneous perturbation; requires only 2 function evaluations regardless of parameter count
  • Natural gradient (QNG): Uses the quantum Fisher information matrix to account for parameter space geometry
  • COBYLA: Gradient-free; robust to noise but slower convergence

5.6 Quantum Approximate Optimization Algorithm (QAOA)

QAOA (Farhi et al., 2014) is a variational algorithm for combinatorial optimization problems. Given a cost function encoded as a Hamiltonian $H_C$ (diagonal in the computational basis), QAOA prepares the state:

$$|\vec{\gamma}, \vec{\beta}\rangle = \prod_{l=1}^{p} e^{-i\beta_l H_M} e^{-i\gamma_l H_C} |+\rangle^{\otimes n}$$

where $H_M = \sum_i X_i$ is the mixer Hamiltonian and $p$ is the circuit depth parameter. As $p \to \infty$, QAOA converges to the optimal solution (by the quantum adiabatic theorem).

QAOA for MaxCut

The canonical QAOA application is MaxCut on a graph $G = (V, E)$. The cost Hamiltonian is:

$$H_C = \frac{1}{2}\sum_{(i,j) \in E} (I - Z_i Z_j)$$

The cost operator $e^{-i\gamma H_C}$ decomposes into commuting $ZZ$ rotations on each edge:

$$e^{-i\gamma H_C} = \prod_{(i,j) \in E} e^{-i\gamma(I - Z_iZ_j)/2}$$

Performance Guarantees

  • - $p = 1$: QAOA achieves approximation ratio $\geq 0.6924$ for MaxCut on 3-regular graphs
  • - Classical best known: Goemans-Williamson gives $\approx 0.878$
  • - As $p \to \infty$: QAOA converges to the exact optimum
  • - Whether finite-depth QAOA can beat classical algorithms remains open

5.7 The Barren Plateau Problem

A barren plateau is a region of the parameter landscape where the gradient vanishes exponentially with system size, making optimization intractable. McClean et al. (2018) proved that for random parameterized circuits:

$$\text{Var}\left[\frac{\partial E}{\partial \theta_i}\right] \leq F(n) \quad \text{where } F(n) \in O(2^{-n})$$

This means that to detect a non-zero gradient with constant probability, one needs $O(2^n)$ measurement shots -- exponential overhead that destroys any quantum advantage.

Causes of Barren Plateaus

  • Expressibility: Circuits forming approximate 2-designs (exploring too much of the unitary group) have exponentially vanishing gradients.
  • Entanglement: Highly entangling circuits cause local subsystems to approach the maximally mixed state, flattening the energy landscape.
  • Global cost functions: Cost functions involving measurements on many qubits simultaneously are more prone to barren plateaus than local ones.
  • Noise-induced: Hardware noise can also cause barren plateaus, even for shallow circuits (Wang et al., 2021).

Mitigation Strategies

  • - Problem-inspired ansatze: Use physical structure (e.g., UCCSD) rather than random circuits
  • - Layer-wise training: Train one layer at a time, keeping others fixed
  • - Local cost functions: Replace global observables with sums of local terms
  • - Initialization strategies: Start with identity-like circuits or classically pre-optimized parameters
  • - Overparameterization: Sufficient redundancy in parameters can create favorable gradient landscapes

5.8 Quantum Error Mitigation

For NISQ devices without full error correction, error mitigation techniques reduce the impact of noise on expectation value estimates at the cost of increased measurement overhead:

  • Zero-noise extrapolation (ZNE):Run circuits at multiple noise levels (by stretching pulse durations) and extrapolate to the zero-noise limit. Simple and widely applicable.
  • Probabilistic error cancellation (PEC):Represent the ideal operation as a quasi-probability distribution over noisy operations. Exact but requires full noise characterization; overhead scales as $O(e^{2\gamma})$ where $\gamma$ is the total circuit noise.
  • Clifford data regression (CDR):Learn the noise model by running near-Clifford circuits (classically simulable) and apply the learned correction to target circuit results.
  • Measurement error mitigation:Calibrate the readout error matrix and apply its inverse to correct measurement distributions.

IBM's 2023 demonstration of utility-scale quantum computation (127 qubits, 60 layers of CNOT gates) used ZNE and PEC to obtain accurate Ising model observables, showing that error mitigation can extend the reach of NISQ devices to classically challenging computations.

5.9 Quantum Kernel Methods

An alternative to variational circuits: use the quantum computer to compute a kernel function between data points, then perform classical machine learning in the resulting feature space:

$$\kappa(x_i, x_j) = |\langle\phi(x_i)|\phi(x_j)\rangle|^2 = |\langle 0|U^\dagger(x_i) U(x_j)|0\rangle|^2$$

The quantum feature map $U(x): x \mapsto |\phi(x)\rangle$ embeds classical data into an exponentially large Hilbert space. If the kernel is classically hard to compute and relevant to the learning problem, this provides a quantum advantage for classification and regression tasks.