21 October 2021
###
Robust Quantum Neural Networks

Quantum Machine Learning (QML) models of the variational type often consist of a data embedding, followed by a parametrized quantum circuit often called Quantum Neural Network (QNN). So far, QNNs have played a significant role in quantum machine learning applications. However, when it comes to physical implementation, especially in current NISQ (Noisy Intermediate Scale Devices) devices, QNNs face efficiency limitations like the high error rate of NISQ systems, and degraded accuracy compared to noise-free simulation. The same QNN on different hardware platforms has distinctly different performance/accuracy, mainly due to different gate error rates, which severely limits the performance.

So far, various types of noise mitigation techniques have been proposed to reduce the impact of noise when running quantum circuits on NISQ devices. However, these are often either generic techniques that do not leverage the advantages of QNNs, and are therefore only applicable to the inference stage, or they are specific to other algorithms or applications only, such as quantum chemistry. This paper proposes a QNN-specific noise aware framework called RoQNN (Robust QNN) that optimizes QNN robustness in both training and inference stages, boosts the intrinsic robustness of QNN parameters, and improves accuracy on real quantum machines.

The architecture of a QNN considered in this work consists of multiple blocks each having three components i) an encoder which encodes the classical values to quantum states with rotation gates, ii) trainable quantum layers containing parameterized gates to perform certain ML tasks, and iii) measurement of each qubit for obtaining a classical value. The measurement outcomes of one block are passed to the next block, and the architecture works in a three-stage pipeline. First, the measurement outcome distribution of each qubit is normalized across input samples during both training and inference to compensate for information loss. Second, noise is injected to the QNN training process by performing error gate insertion. Injecting noises into neural network training helps in obtaining a smoothed loss landscape for better generalization. Hence, by emulating the real noisy environment when deploying NNs, noise-injection-based training significantly boosts the noise-robustness. This is done by iteratively sampling error gates, inserting them to QNN, and updating weights during training. Finally, post-measurement quantization is carried out to quantize the measurement outcomes to discrete values to effectively correct for quantum noise-induced errors. The quantization corrects errors by value clamping, thus avoiding cascaded error accumulation. Moreover, by sparsifying the parameter space, quantization reduces the NN complexity as a regularization mechanism that mitigates potential overfitting issues.

The experimentation was carried out on 8 prototypical ML classification tasks, including MNIST, Fashion and CIFAR.

In terms of the hardware implementation, IBMQ quantum computers were used, and compilation was done via Qiskit (IBM) APIs. All experiments were run using the maximum offered 8192 shots. The experimental results show that RoQNN can improve accuracy by 20-40% for typical classification tasks and demonstrates high accuracy with pure quantum parameters on real quantum hardware. Furthermore, the work compares the noise-free measurement result distribution of 4 qubits with their noisy counterparts MNIST-4 on two devices. It is shown that the post-measurement normalization reduces the mismatch between two distributions with improved SNR on each qubit and each individual measurement outcome.

The overall results demonstrate that RoQNN improves the accuracy of considered baseline designs, irrespective of the QNN model size and design space. Both noise injection and quantization individually improve the overall accuracy of the model by 9%, however, combining these two techniques delivers an accuracy gain of 17%, hence better performance. Another crucial advantage of the proposed model is scalability since the training cost, post-measurement normalization and quantization are linearly scalable with the number of qubits. However, the real challenge lies in optimizing the level of noise injected, because large noise can affect the stability and hence the accuracy of the training process. On the other hand, low noise level doesn’t contribute to improving the model's robustness. Such an architecture can be employed by other QML models, following an analysis of the noise level that is inserted in the quantum system. Further exploration of such QML models should be investigated.

So far, various types of noise mitigation techniques have been proposed to reduce the impact of noise when running quantum circuits on NISQ devices. However, these are often either generic techniques that do not leverage the advantages of QNNs, and are therefore only applicable to the inference stage, or they are specific to other algorithms or applications only, such as quantum chemistry. This paper proposes a QNN-specific noise aware framework called RoQNN (Robust QNN) that optimizes QNN robustness in both training and inference stages, boosts the intrinsic robustness of QNN parameters, and improves accuracy on real quantum machines.

The architecture of a QNN considered in this work consists of multiple blocks each having three components i) an encoder which encodes the classical values to quantum states with rotation gates, ii) trainable quantum layers containing parameterized gates to perform certain ML tasks, and iii) measurement of each qubit for obtaining a classical value. The measurement outcomes of one block are passed to the next block, and the architecture works in a three-stage pipeline. First, the measurement outcome distribution of each qubit is normalized across input samples during both training and inference to compensate for information loss. Second, noise is injected to the QNN training process by performing error gate insertion. Injecting noises into neural network training helps in obtaining a smoothed loss landscape for better generalization. Hence, by emulating the real noisy environment when deploying NNs, noise-injection-based training significantly boosts the noise-robustness. This is done by iteratively sampling error gates, inserting them to QNN, and updating weights during training. Finally, post-measurement quantization is carried out to quantize the measurement outcomes to discrete values to effectively correct for quantum noise-induced errors. The quantization corrects errors by value clamping, thus avoiding cascaded error accumulation. Moreover, by sparsifying the parameter space, quantization reduces the NN complexity as a regularization mechanism that mitigates potential overfitting issues.

The experimentation was carried out on 8 prototypical ML classification tasks, including MNIST, Fashion and CIFAR.

In terms of the hardware implementation, IBMQ quantum computers were used, and compilation was done via Qiskit (IBM) APIs. All experiments were run using the maximum offered 8192 shots. The experimental results show that RoQNN can improve accuracy by 20-40% for typical classification tasks and demonstrates high accuracy with pure quantum parameters on real quantum hardware. Furthermore, the work compares the noise-free measurement result distribution of 4 qubits with their noisy counterparts MNIST-4 on two devices. It is shown that the post-measurement normalization reduces the mismatch between two distributions with improved SNR on each qubit and each individual measurement outcome.

The overall results demonstrate that RoQNN improves the accuracy of considered baseline designs, irrespective of the QNN model size and design space. Both noise injection and quantization individually improve the overall accuracy of the model by 9%, however, combining these two techniques delivers an accuracy gain of 17%, hence better performance. Another crucial advantage of the proposed model is scalability since the training cost, post-measurement normalization and quantization are linearly scalable with the number of qubits. However, the real challenge lies in optimizing the level of noise injected, because large noise can affect the stability and hence the accuracy of the training process. On the other hand, low noise level doesn’t contribute to improving the model's robustness. Such an architecture can be employed by other QML models, following an analysis of the noise level that is inserted in the quantum system. Further exploration of such QML models should be investigated.

Tagged under

11 October 2021
###
Learnability of Quantum Computer Distributions

Apart from supervised learning, an important research focus in Machine Learning (ML) is unsupervised learning, including generative modeling paradigms. There, a type of oracle access to an unknown target distribution is given, usually (but not always) in the form of an unlabeled dataset, and the learning algorithm returns output, an approximate generator for that target distribution. Typical examples of such generative modeling are generative adversarial networks (GANs), variational auto-encoders and normalizing flows. At the same time, the development of quantum equivalent models and algorithms for generative modelling, such as quantum circuit Born machines (QCBMs), quantum GANs and quantum Hamiltonian-based models, have also been proposed and implemented within the field of Quantum Machine Learning (QML).

In this work, the authors study the learnability of the output distributions of local quantum circuits within the Probably Approximately Correct (PAC) framework for probabilistic modelling. The problem of density modelling has also been studied, in addition to providing results for generative modelling. The objective is to generate new samples from a target distribution i.e. an efficient generator, and to output with high probability a sufficiently accurate algorithm for evaluating the probabilities of events with respect to the target distribution i.e. an efficient evaluator. Both of these probabilistic modelling problems are studied with respect to two different models of access to the unknown target distribution. The first model is called the sample model where the learner has access to a sample oracle which provides samples from the unknown target distribution. The second model is a statistical query (SQ) model, which is a strict restriction of the sample model. In this case, the learners only have access to the approximate averaged statistical properties of the unknown target distribution instead of samples from the target distribution.

The work provides a query complexity lower bound for SQ-learning the output distributions of local Clifford circuits, in particular how it scales with respect to the circuit depth d. With respect to linear depth, the lower bound is exponential, while a super-polynomial lower bound is achieved at sub-linear circuit depths. In case of sufficiently highly unstructured problems like the SQ model, it is difficult to estimate the original target distribution, given the statistical averages of the sampled data, even for classical algorithms. This is also in accordance with the results in the case of output distributions of local Clifford circuits, as the distributions cannot be sample efficiently PAC learned in the SQ model. However, for the sample model, there might be an underlying data structure to the model that can be easily learnt, which can be exploited, allowing such model to be sample and computationally efficient. Due to the fact that the class of distributions generated by local Clifford circuits saturates at some circuit depth, it is shown that output distributions of local Clifford circuits of any depth are computationally efficiently PAC learnable in the sample model. Furthermore, a tradeoff is established between circuit depth and query complexity, i.e. if the circuit depth is less than the number of qubits, there exists some difficulty in implementing some global Clifford unitaries.

The work also shows that learning to sample from the output distribution of a quantum circuit given SQ oracle access to the distribution can be hard, even when such sampling can be done classically in an efficient manner when given a circuit description. However, there exist computational problems which are hard in the sample model yet easy in the SQ model, implying that hardness in SQ model cannot be correlated with that of the sample model. Also, the concept class of Born distributions corresponding to local Clifford circuits is both sample- and time- efficiently PAC learnable in the sample model by a classical learning algorithm despite being hard in the SQ model.

Overall, this work provides insights on PAC learnability of the Born distributions associated with local quantum circuits. The main results establish the hardness of learning the output distributions of super-logarithmic depth Clifford circuits, however a potential next step is to explore whether the output distributions of logarithmic depth Clifford circuits are efficiently SQ-learnable. This can be done by examining the tightness of the query complexity lower bounds. In the same direction, learnability of free-fermion distributions and match-gate circuits can be explored by exploiting the proposed algorithms. Moreover, another interesting direction to explore is the average case PAC learnability of local quantum circuit output distributions i.e. whether a randomly drawn distribution from the concept class will be efficiently PAC learnable, if all distributions in the concept class are efficiently learnable. Finally, improving the robustness of learnability results with respect to noise can enhance the efficiency of the learning algorithms.

In this work, the authors study the learnability of the output distributions of local quantum circuits within the Probably Approximately Correct (PAC) framework for probabilistic modelling. The problem of density modelling has also been studied, in addition to providing results for generative modelling. The objective is to generate new samples from a target distribution i.e. an efficient generator, and to output with high probability a sufficiently accurate algorithm for evaluating the probabilities of events with respect to the target distribution i.e. an efficient evaluator. Both of these probabilistic modelling problems are studied with respect to two different models of access to the unknown target distribution. The first model is called the sample model where the learner has access to a sample oracle which provides samples from the unknown target distribution. The second model is a statistical query (SQ) model, which is a strict restriction of the sample model. In this case, the learners only have access to the approximate averaged statistical properties of the unknown target distribution instead of samples from the target distribution.

The work provides a query complexity lower bound for SQ-learning the output distributions of local Clifford circuits, in particular how it scales with respect to the circuit depth d. With respect to linear depth, the lower bound is exponential, while a super-polynomial lower bound is achieved at sub-linear circuit depths. In case of sufficiently highly unstructured problems like the SQ model, it is difficult to estimate the original target distribution, given the statistical averages of the sampled data, even for classical algorithms. This is also in accordance with the results in the case of output distributions of local Clifford circuits, as the distributions cannot be sample efficiently PAC learned in the SQ model. However, for the sample model, there might be an underlying data structure to the model that can be easily learnt, which can be exploited, allowing such model to be sample and computationally efficient. Due to the fact that the class of distributions generated by local Clifford circuits saturates at some circuit depth, it is shown that output distributions of local Clifford circuits of any depth are computationally efficiently PAC learnable in the sample model. Furthermore, a tradeoff is established between circuit depth and query complexity, i.e. if the circuit depth is less than the number of qubits, there exists some difficulty in implementing some global Clifford unitaries.

The work also shows that learning to sample from the output distribution of a quantum circuit given SQ oracle access to the distribution can be hard, even when such sampling can be done classically in an efficient manner when given a circuit description. However, there exist computational problems which are hard in the sample model yet easy in the SQ model, implying that hardness in SQ model cannot be correlated with that of the sample model. Also, the concept class of Born distributions corresponding to local Clifford circuits is both sample- and time- efficiently PAC learnable in the sample model by a classical learning algorithm despite being hard in the SQ model.

Overall, this work provides insights on PAC learnability of the Born distributions associated with local quantum circuits. The main results establish the hardness of learning the output distributions of super-logarithmic depth Clifford circuits, however a potential next step is to explore whether the output distributions of logarithmic depth Clifford circuits are efficiently SQ-learnable. This can be done by examining the tightness of the query complexity lower bounds. In the same direction, learnability of free-fermion distributions and match-gate circuits can be explored by exploiting the proposed algorithms. Moreover, another interesting direction to explore is the average case PAC learnability of local quantum circuit output distributions i.e. whether a randomly drawn distribution from the concept class will be efficiently PAC learnable, if all distributions in the concept class are efficiently learnable. Finally, improving the robustness of learnability results with respect to noise can enhance the efficiency of the learning algorithms.

Tagged under

06 October 2021
###
Local Minima in Quantum Neural Networks

A Quantum Neural Network (QNN) is a parameterized quantum circuit that is tuned by parameterized unitary operations, such as gates. QNNs can be seen as the quantum analogues of classical Neural Networks (NNs), and are typically used to provide speedups to hard computational Machine Learning (ML) problems. QNNs have also provided promising improvements to quantum chemistry and material science problems. Similarly to the classical case, the efficiency of QNN applications depends on a combination of the expressivity and the effectiveness of the training procedure, which optimizes a loss function in terms of the read-outs and the QNN parameters for specific applications. QNN training is often a non-deterministic polynomial time problem.

So far, the investigation of training QNNs has been a trial-and-error approach by empirically comparing the performance of standard classical optimizers on training QNNs’ loss functions. These studies are restricted to small system sizes due to the limited access to quantum machines of reasonable sizes and the exponential cost in simulating them classically. Most of the available results involve special cases of QNNs such as the quantum approximate optimization algorithm (QAOA) or extremely over-parameterized cases. However, one prominent result is that random initialization of parameters will lead to vanishing gradients for much smaller size QNNs than classical Neural Networks. The challenge in training QNNs mainly arises from the highly non-convex nature of the corresponding loss functions.

In this paper, the authors conducted a quantitative investigation on the landscape of loss functions for QNNs as a way to study their training issue. The work focuses on understanding the properties of local minima of loss functions such as i) the number of local minima depending on the architecture of QNNs, and ii) whether these local minima are benign or spurious ones, meaning that they are either close to the global minima or saddle points that can be escaped, or they are truly bad local minima that will hinder the training procedure. This work also provides theoretical proofs in order to find spurious local minima for under-parametrized QNNs. The authors identify a general condition of under-parameterized QNNs, in the sense that a dataset can be constructed such that the number of spurious local minima scales exponentially with the number of parameters. This conceptual paradox could be explained by interference affecting the qubits. Interference replaces the role of non-linear activation (in classical NNs) in creating bad local minima for QNNs. Therefore, training a dataset with simple variants of gradient-based methods is hard. Based on the proposed construction, training is almost optimal in terms of the dependence of the number of local minima on the number of parameters, since an almost matching upper bound is calculated. This upper bound also demonstrates a clear separation between QNNs and classical NNs. The authors find an upper bound 2p for the number of local minima for a QNN circuit with p parameters.

Alongside theoretical results, the authors investigate the practical performance of the common optimizers on the construction with p-parameters involving p-qubits. Unlike theoretical results, the experimental data suggests that the existence of spurious local minima can affect the optimization process, which increases the difficulty in training. The experiments being implemented are small enough that were run on an Intel Core i7-7700HQ Processor (2.80GHz) with 16G memory and the training is simulated with PyTorch.

The considered QNN instances are trained with three popular optimizers: Adam, RMSProp, and L-BFGS. The first two methods are variants of vanilla gradient descent with adaptive learning rate while the last method involves implementation of the approximate Newton method that utilizes second-order gradient information. The results show that for all instances and optimizers, under random initialization, the optimizations converge to local minima with non-negligible suboptimality (i.e., different from the global one by a non-negligible amount) with high probability. Also, as the number of bad local minima grows exponentially with the number of parameters in the construction, the success probability should also decay exponentially.

The overall results demonstrate the practical difficulty in training under-parameterized QNNs with gradient-based methods highlighting that gradient-based black-box methods may not potentially solve QNNs efficiently. A potential scope of work in the future could be designing a QNN architecture with a benign landscape when the data distribution is known. Based on the known fact that the landscape for optimizing a variational quantum ansatz can be benign when sufficiently parameterized, it could be beneficial to explore the change in the landscape with the increasing number of parameters, provided that the system size is fixed. Another direction of interest would be to consider alternative loss functions, such as those constructed from sums of QNN outputs with same architectures but different settings (a common setting in solving differential equations with QNNs for example). Furthermore, exploring algorithms beyond gradient-based methods that can solve the optimization problem efficiently and provably with minimum losses may lead to great improvements.

So far, the investigation of training QNNs has been a trial-and-error approach by empirically comparing the performance of standard classical optimizers on training QNNs’ loss functions. These studies are restricted to small system sizes due to the limited access to quantum machines of reasonable sizes and the exponential cost in simulating them classically. Most of the available results involve special cases of QNNs such as the quantum approximate optimization algorithm (QAOA) or extremely over-parameterized cases. However, one prominent result is that random initialization of parameters will lead to vanishing gradients for much smaller size QNNs than classical Neural Networks. The challenge in training QNNs mainly arises from the highly non-convex nature of the corresponding loss functions.

In this paper, the authors conducted a quantitative investigation on the landscape of loss functions for QNNs as a way to study their training issue. The work focuses on understanding the properties of local minima of loss functions such as i) the number of local minima depending on the architecture of QNNs, and ii) whether these local minima are benign or spurious ones, meaning that they are either close to the global minima or saddle points that can be escaped, or they are truly bad local minima that will hinder the training procedure. This work also provides theoretical proofs in order to find spurious local minima for under-parametrized QNNs. The authors identify a general condition of under-parameterized QNNs, in the sense that a dataset can be constructed such that the number of spurious local minima scales exponentially with the number of parameters. This conceptual paradox could be explained by interference affecting the qubits. Interference replaces the role of non-linear activation (in classical NNs) in creating bad local minima for QNNs. Therefore, training a dataset with simple variants of gradient-based methods is hard. Based on the proposed construction, training is almost optimal in terms of the dependence of the number of local minima on the number of parameters, since an almost matching upper bound is calculated. This upper bound also demonstrates a clear separation between QNNs and classical NNs. The authors find an upper bound 2p for the number of local minima for a QNN circuit with p parameters.

Alongside theoretical results, the authors investigate the practical performance of the common optimizers on the construction with p-parameters involving p-qubits. Unlike theoretical results, the experimental data suggests that the existence of spurious local minima can affect the optimization process, which increases the difficulty in training. The experiments being implemented are small enough that were run on an Intel Core i7-7700HQ Processor (2.80GHz) with 16G memory and the training is simulated with PyTorch.

The considered QNN instances are trained with three popular optimizers: Adam, RMSProp, and L-BFGS. The first two methods are variants of vanilla gradient descent with adaptive learning rate while the last method involves implementation of the approximate Newton method that utilizes second-order gradient information. The results show that for all instances and optimizers, under random initialization, the optimizations converge to local minima with non-negligible suboptimality (i.e., different from the global one by a non-negligible amount) with high probability. Also, as the number of bad local minima grows exponentially with the number of parameters in the construction, the success probability should also decay exponentially.

The overall results demonstrate the practical difficulty in training under-parameterized QNNs with gradient-based methods highlighting that gradient-based black-box methods may not potentially solve QNNs efficiently. A potential scope of work in the future could be designing a QNN architecture with a benign landscape when the data distribution is known. Based on the known fact that the landscape for optimizing a variational quantum ansatz can be benign when sufficiently parameterized, it could be beneficial to explore the change in the landscape with the increasing number of parameters, provided that the system size is fixed. Another direction of interest would be to consider alternative loss functions, such as those constructed from sums of QNN outputs with same architectures but different settings (a common setting in solving differential equations with QNNs for example). Furthermore, exploring algorithms beyond gradient-based methods that can solve the optimization problem efficiently and provably with minimum losses may lead to great improvements.

Tagged under

25 September 2021
###
Overparametrizing quantum neural networks

Quantum Machine Learning (QML) models often consist of a data embedding followed by a parametrized quantum circuit often called Quantum Neural Network (QNN). So far, QNNs have played a significant role in quantum machine learning applications in function approximation, handling big data, modelling social networks, associative memory devices, and automated control systems. However, the training of such NNs requires optimizing the QNN’s parameters, which is a non-deterministic polynomial time problem.

The authors in this work employ the technique of over-parameterization, which is the regime where one trains a NN with a larger capacity than required to represent the distribution of the training data. Therefore, the QNN has more parameters to explore all relevant directions in the Hilbert space. Over-parametrizing a NN can improve its performance by improving the trainability and reducing generalization errors, leading to faster convergence. In this work, the authors provide a theoretical framework for the over-parametrization of QNNs and describe its correspondence to a computational phase transition, where the trainability of QNNs is vastly improved due to a landscape with less local minima. The work proposes three theorems to analyze the over-parameterization phenomenon followed by numerical verifications.

The first theorem derives an upper bound as a sufficient condition for over-parametrization. This indicates that for a general type of periodic-structured QNNs, an over-parameterized regime can be reached by increasing the number of parameters past some threshold critical value. This value is related to the dimension of the Dynamical Lie Algebra (DLA) associated with the generators of the QNN. The second theorem provides an operational meaning to the over-parameterization definition in terms of the model’s capacity i.e. increasing the number of parameters can never increase the model capacity beyond the upper bound derived in the first theorem. Lastly, the third theorem shows that the maximum number of relevant directions around the global minima of the optimization problem is always smaller than the upper bound, hence imposing a maximal rank on the Hessian when evaluated at the solution.

For numerical demonstration, three different optimization tasks were considered: i) finding the groundstate energy with a Variational Quantum Eigensolver (VQE), ii) decomposing a unitary to quantum gates through unitary compilation, and iii) compressing information with quantum autoencoding. For VQE, the problem sizes of n = 4, 6, 8, 10 qubits were considered for ansatzes with different depths L with both open and closed boundary conditions. After averaging over 50 random parameter initializations, it is observed that the loss function decreases exponentially with each optimization step when the number of layers is large enough. Similarly for unitary compilation, the numerical results suggest that as the depth of the circuit increases, the convergence towards the global optimum improves dramatically until reaching a saturation point. Lastly, the results for quantum autoencoding were obtained for a system of n = 4 qubits and for the same hardware efficient ansatz used for unitary compilation. Here, it was again observed that an over-parameterized QNN is able to accurately reach the global optima in a few iterations. The value of the obtained rank in contrast to the dimension of the DLA suggests that over-parameterization can be achieved with a number of parameters that is much smaller than the upper-bound.

It is noteworthy to mention a crucial difference between the results obtained for the Hamiltonian variational ansatz (theoretical) and for the hardware efficient ansatz (numerical). In the case of Hamiltonian variational ansatz, the dimension of the DLA scaled polynomially with n, whereas for the hardware efficient ansatz the dimension of the DLA grows exponentially with the system size. This implies that the system becomes over-parameterized at a polynomial number of parameters for the first case, while in the latter case one requires an exponentially large number of parameters. Since most ansatzes used for QNNs are ultimately hardware efficient ansatzes which exhibit barren plateaus, this may require an exponential number of parameters to be over-parameterized, which requires more effort in designing scalable and trainable ansatzes. Finally, a take-away message of this work is that QNN architectures with polynomially large DLAs will have more favorable loss landscapes due to the lack of barren plateaus.

One direction of future research would be to consider more complicated loss functions consisting of functions of different QNN contributions, such as those used in Quantum Circuit Learning for supervised learning or Differentiable Quantum Circuits for solving differential equations. It remains an open question whether such loss function would exhibit similar traits as those single-Hamiltonian minimizations considered here.

The authors in this work employ the technique of over-parameterization, which is the regime where one trains a NN with a larger capacity than required to represent the distribution of the training data. Therefore, the QNN has more parameters to explore all relevant directions in the Hilbert space. Over-parametrizing a NN can improve its performance by improving the trainability and reducing generalization errors, leading to faster convergence. In this work, the authors provide a theoretical framework for the over-parametrization of QNNs and describe its correspondence to a computational phase transition, where the trainability of QNNs is vastly improved due to a landscape with less local minima. The work proposes three theorems to analyze the over-parameterization phenomenon followed by numerical verifications.

The first theorem derives an upper bound as a sufficient condition for over-parametrization. This indicates that for a general type of periodic-structured QNNs, an over-parameterized regime can be reached by increasing the number of parameters past some threshold critical value. This value is related to the dimension of the Dynamical Lie Algebra (DLA) associated with the generators of the QNN. The second theorem provides an operational meaning to the over-parameterization definition in terms of the model’s capacity i.e. increasing the number of parameters can never increase the model capacity beyond the upper bound derived in the first theorem. Lastly, the third theorem shows that the maximum number of relevant directions around the global minima of the optimization problem is always smaller than the upper bound, hence imposing a maximal rank on the Hessian when evaluated at the solution.

For numerical demonstration, three different optimization tasks were considered: i) finding the groundstate energy with a Variational Quantum Eigensolver (VQE), ii) decomposing a unitary to quantum gates through unitary compilation, and iii) compressing information with quantum autoencoding. For VQE, the problem sizes of n = 4, 6, 8, 10 qubits were considered for ansatzes with different depths L with both open and closed boundary conditions. After averaging over 50 random parameter initializations, it is observed that the loss function decreases exponentially with each optimization step when the number of layers is large enough. Similarly for unitary compilation, the numerical results suggest that as the depth of the circuit increases, the convergence towards the global optimum improves dramatically until reaching a saturation point. Lastly, the results for quantum autoencoding were obtained for a system of n = 4 qubits and for the same hardware efficient ansatz used for unitary compilation. Here, it was again observed that an over-parameterized QNN is able to accurately reach the global optima in a few iterations. The value of the obtained rank in contrast to the dimension of the DLA suggests that over-parameterization can be achieved with a number of parameters that is much smaller than the upper-bound.

It is noteworthy to mention a crucial difference between the results obtained for the Hamiltonian variational ansatz (theoretical) and for the hardware efficient ansatz (numerical). In the case of Hamiltonian variational ansatz, the dimension of the DLA scaled polynomially with n, whereas for the hardware efficient ansatz the dimension of the DLA grows exponentially with the system size. This implies that the system becomes over-parameterized at a polynomial number of parameters for the first case, while in the latter case one requires an exponentially large number of parameters. Since most ansatzes used for QNNs are ultimately hardware efficient ansatzes which exhibit barren plateaus, this may require an exponential number of parameters to be over-parameterized, which requires more effort in designing scalable and trainable ansatzes. Finally, a take-away message of this work is that QNN architectures with polynomially large DLAs will have more favorable loss landscapes due to the lack of barren plateaus.

One direction of future research would be to consider more complicated loss functions consisting of functions of different QNN contributions, such as those used in Quantum Circuit Learning for supervised learning or Differentiable Quantum Circuits for solving differential equations. It remains an open question whether such loss function would exhibit similar traits as those single-Hamiltonian minimizations considered here.

Tagged under

21 September 2021
###
Measurement-Based Quantum Computation

There exist many computational models that achieve quantum computation. The most commonly used model is the circuit quantum computational model, which involves an initialization process of qubits, a sequence of quantum gates and a readout (measurement) of qubit state in the computational basis (logical 0/1 state). Another common computational model is the adiabatic quantum computational model, which uses time dependent, smoothly or adiabatically varied Hamiltonians instead of gates. However, both rely on the unitary property of either quantum gates or Hamiltonian evolution. In contrast to these models, measurement-based quantum computation (MBQC) utilizes measurement to achieve emulation of unitary circuits. Since measurement destroys coherence in quantum states, unitary operation is achieved via entanglement. This paper is a review of past and current research concerning MBQC along with the theoretical and experimental progress that has been made.

The origins of MBQC can be traced into the so-called one-way quantum computer, which was based on single-qubit measurements and relied on the mapping of a quantum circuit to a measurement pattern in the cluster state. For any quantum circuit in the standard circuit model, a measurement pattern can be obtained on all the qubits of the cluster state, which indicates the theoretical equivalency of the different computational models. Computation is carried out by the execution of the measurement pattern with possible adaptation which consumes entanglement as a resource. Hence, the amount of entanglement decreases with single-qubit measurement, making the computation “one-way”. MBQC continued to evolve mainly through variants arising from the principles of the one-way quantum computer, such as the teleportation-based, state-transfer-based, and correlation-space approaches.

The teleportation-based approach involves the joint measurement of 2 or more qubits at the same time, which in turns creates entanglement. Subsequently, this generated entanglement will be used to perform gate operations. Depending on the measurement outcomes, a correcting operation completes the teleportation and recovers the unknown state. By using this approach, it is possible to perform universal quantum computation using only measurements and quantum memory without the need of a prior entangled resource state.

On the other hand, the state-transfer scheme uses only single-qubit and two-qubit observables with only two possible outcomes (0 and 1). The basic setup consists of an observable which represents a two-outcome measurement that projects onto the +1 and -1 subspace of the observable eigenstates. In contrast to teleportation, this approach uses only two qubits to transfer a one qubit state. Also, arbitrary one-qubit gates can be implemented by rotating the observables in the state transfer.

Lastly, the correlation-space approach started by interpreting the one-way computer in terms of a tensor network of valence bonds or as projected entangled-pairs states. In that scenario, computation takes place at the virtual qubits and uses teleportation. This approach led to the development of the correlation-space MBQC, which exploits the tensor-network structure of the states, such as the one-dimensional matrix-product states as well as the two-dimensional projected-entangled-pair states.

A potential next step is to explore other resource states beyond cluster states. So far, universal clusters have been shown on regular lattices such as the triangular, hexagonal, and kagome lattices. The universality also holds for cases where the lattice is not perfectly regular. In case of the faulty square-lattice cluster state, the universality depends on the connectivity of the lattice. It has been shown that universal MBQC can also be achieved with ground states of short-ranged interacting Hamiltonians as potential resource states and Affleck-Kennedy-Lieb-Tasaki (AKLT) states. Similarly, symmetry-protected topological states, quantum computational phases of matter and thermal states are explored for MBQC. Another important parameter for MBQC is the amount of entanglement in the universal resource states which, if it's too high, can affect the speedup of the computation. An interesting application of MBQC in the one-way computer is blind computation where a server takes the instruction of measurement axes and reports the outcomes to a client that intends to run some quantum computation without disclosing the executed quantum circuit. That could be useful in future secure cloud-based quantum computation for example.

Efforts of incorporating quantum error correction that will lead to fault-tolerant quantum computation with MBQC, have also been investigated. Such fault-tolerant schemes have been achieved by exploiting a three-dimensional cluster state where each two-dimensional slice is used to simulate the surface code. In order to complete the universality, additional gates can be inserted by a process known as magic-state distillation. The error threshold in this estimation is as high as 0.75%, which is much higher (higher tolerance of errors) than other estimates.

Recent experimental progress in the development of cluster states is utilizing cold atoms trapped in an optical lattice. In addition to bosonic cold atoms, the use of cluster-state generation with trapped fermionic atoms using interplay of the spin-orbit coupling and superexchange interaction also increases the coherence time. Furthermore, entanglement generation between two neutral atoms via the Rydberg blockade have been demonstrated experimentally, which can be used to directly create a cluster state of an array of atoms. Small-size cluster and graph states have also been realized experimentally using solid-state and quantum-dot emitters, also in other physical systems, such as in trapped ions and superconducting qubits. However, generation of resource states beyond cluster states is harder. So far, 1-D tensor-network states used in the correlation-space approach have been realized.

Measurement based quantum computation is a useful alternative to other more popular models of quantum computation. It has already been proven that it is equivalent to the other models and that it is feasible to implement for quantum communication problems. Various interesting applications have been examined and error correction protocols have been successfully applied. Also, proof-of-principle experimental implementations such as photonic, continuous-variable, trapped atoms and ions, and superconducting systems have been devised to realize the MBQC, however, each platforms has its own challenges. Hence, more work is needed to benchmark these platforms and overcome physical and technical challenges.

The origins of MBQC can be traced into the so-called one-way quantum computer, which was based on single-qubit measurements and relied on the mapping of a quantum circuit to a measurement pattern in the cluster state. For any quantum circuit in the standard circuit model, a measurement pattern can be obtained on all the qubits of the cluster state, which indicates the theoretical equivalency of the different computational models. Computation is carried out by the execution of the measurement pattern with possible adaptation which consumes entanglement as a resource. Hence, the amount of entanglement decreases with single-qubit measurement, making the computation “one-way”. MBQC continued to evolve mainly through variants arising from the principles of the one-way quantum computer, such as the teleportation-based, state-transfer-based, and correlation-space approaches.

The teleportation-based approach involves the joint measurement of 2 or more qubits at the same time, which in turns creates entanglement. Subsequently, this generated entanglement will be used to perform gate operations. Depending on the measurement outcomes, a correcting operation completes the teleportation and recovers the unknown state. By using this approach, it is possible to perform universal quantum computation using only measurements and quantum memory without the need of a prior entangled resource state.

On the other hand, the state-transfer scheme uses only single-qubit and two-qubit observables with only two possible outcomes (0 and 1). The basic setup consists of an observable which represents a two-outcome measurement that projects onto the +1 and -1 subspace of the observable eigenstates. In contrast to teleportation, this approach uses only two qubits to transfer a one qubit state. Also, arbitrary one-qubit gates can be implemented by rotating the observables in the state transfer.

Lastly, the correlation-space approach started by interpreting the one-way computer in terms of a tensor network of valence bonds or as projected entangled-pairs states. In that scenario, computation takes place at the virtual qubits and uses teleportation. This approach led to the development of the correlation-space MBQC, which exploits the tensor-network structure of the states, such as the one-dimensional matrix-product states as well as the two-dimensional projected-entangled-pair states.

A potential next step is to explore other resource states beyond cluster states. So far, universal clusters have been shown on regular lattices such as the triangular, hexagonal, and kagome lattices. The universality also holds for cases where the lattice is not perfectly regular. In case of the faulty square-lattice cluster state, the universality depends on the connectivity of the lattice. It has been shown that universal MBQC can also be achieved with ground states of short-ranged interacting Hamiltonians as potential resource states and Affleck-Kennedy-Lieb-Tasaki (AKLT) states. Similarly, symmetry-protected topological states, quantum computational phases of matter and thermal states are explored for MBQC. Another important parameter for MBQC is the amount of entanglement in the universal resource states which, if it's too high, can affect the speedup of the computation. An interesting application of MBQC in the one-way computer is blind computation where a server takes the instruction of measurement axes and reports the outcomes to a client that intends to run some quantum computation without disclosing the executed quantum circuit. That could be useful in future secure cloud-based quantum computation for example.

Efforts of incorporating quantum error correction that will lead to fault-tolerant quantum computation with MBQC, have also been investigated. Such fault-tolerant schemes have been achieved by exploiting a three-dimensional cluster state where each two-dimensional slice is used to simulate the surface code. In order to complete the universality, additional gates can be inserted by a process known as magic-state distillation. The error threshold in this estimation is as high as 0.75%, which is much higher (higher tolerance of errors) than other estimates.

Recent experimental progress in the development of cluster states is utilizing cold atoms trapped in an optical lattice. In addition to bosonic cold atoms, the use of cluster-state generation with trapped fermionic atoms using interplay of the spin-orbit coupling and superexchange interaction also increases the coherence time. Furthermore, entanglement generation between two neutral atoms via the Rydberg blockade have been demonstrated experimentally, which can be used to directly create a cluster state of an array of atoms. Small-size cluster and graph states have also been realized experimentally using solid-state and quantum-dot emitters, also in other physical systems, such as in trapped ions and superconducting qubits. However, generation of resource states beyond cluster states is harder. So far, 1-D tensor-network states used in the correlation-space approach have been realized.

Measurement based quantum computation is a useful alternative to other more popular models of quantum computation. It has already been proven that it is equivalent to the other models and that it is feasible to implement for quantum communication problems. Various interesting applications have been examined and error correction protocols have been successfully applied. Also, proof-of-principle experimental implementations such as photonic, continuous-variable, trapped atoms and ions, and superconducting systems have been devised to realize the MBQC, however, each platforms has its own challenges. Hence, more work is needed to benchmark these platforms and overcome physical and technical challenges.

Tagged under

11 September 2021
###
Quantum Classifiers

Recently, there has been extensive research into quantum machine learning models and quantum algorithms that rival their classical counterparts. Some notable examples are image recognition advancements, automated driven cars, identification of skin cancers, and prediction of protein structures. However, one of the most well-known branches of machine learning is classification, which is widely applied in commercial and academic applications ranging from face recognition and recommendation systems to earthquake detection, disease diagnosis and further handling complex classification tasks.

Previous studies on quantum classifiers have extended popular classical classification algorithms to the quantum domain such as the quantum support vector machine, quantum nearest neighbor algorithms, and quantum decision tree classifiers, exhibiting the potential of providing quadratic or even exponential speedups.

This review paper introduces some important algorithms and models for quantum classifiers including i) quantum support vector machines, ii) quantum decision tree, iii) quantum nearest neighbors algorithms, iv) quantum annealing based classifiers, and v) variational quantum classifiers. The experimental advancements of the considered classifiers are also discussed along with theoretical progress.

So far, quantum kernel methods used in quantum support vector machines are proven to be the most successful to handle classification tasks. In that scenario, the quantum computer can be used to map the classical data to a Hilbert state space and estimate the inner products between those states to obtain the kernel matrix, which is further handled by classical computers.

Another strong candidate for classification is the quantum decision tree model, that uses von Neumann entropy for choosing the attributes to split the nodes. Recently proposed binary quantum decision tree classifier, named Q-tree, is based on a probabilistic approach where a quantum computer can be used to accomplish the probabilistic traversal of the decision tree via measurements, while tree inductions and predictions of query data can be integrated into this framework as well.

For the case of the quantum nearest neighbors algorithm, the authors discuss two algorithms. First is the quantum version of the nearest centroid algorithm which shows an exponential speedup for both the size of the training set and the dimension of the vectors. However, this is inefficient in few situations like when the centroid of the subset lies in the space of other subsets. The issue can be overcome by the second proposed algorithm, which includes a quantum random access memory, amplitude estimation, and the Dürr Høyer minimization algorithm. This algorithm can be implemented for the quantum nearest neighbors algorithm with a quadratic speedup over its classical counterparts.

Another type of classifier is the quantum annealing classifier. As the name suggests, quantum annealing is used for classification tasks reaching high levels of generalization and physical interpretability for various applications such as anomaly detection, Higgs boson classifications, classifying and ranking binding affinities, etc. In quantum annealing, two Hamiltonians are considered: a simple and easy to prepare Hamiltonian with a known ground state, and a problem Hamiltonian whose ground state represents the solution of the problem. By slowly evolving the simple Hamiltonian towards the problem Hamiltonian, one can find the solution of the problem. Quantum annealing uses quantum tunneling to explore the problem space and to avoid local minima.

The last type of quantum classifier that is discussed are variational quantum classifiers (VQCs), which are a subclass of quantum neural networks. As such, the parameters in a classification task will be optimized during the training process to minimize a predefined loss function. Many structures and protocols for VQCs have already been proposed, mainly addressing issues like gradient descent evaluation, neural network structure (convolutional, recurrent, etc.), avoiding barren plateaus, and vulnerability to adversarial attacks.

In terms of experimental advancements and proof-of-concept implementations to physical hardware, the authors discuss the following implementations. The distance-based quantum classifier was demonstrated in the superconducting platform of IBM Quantum Experience with a classic-quantum hybrid approach. The implementations for quantum nearest neighbors algorithms can be carried out using single photons acting as qubits generated through spontaneous parametric down-conversion. A nearest centroid classifier has been experimentally demonstrated on a eleven-qubit trapped ion quantum computer with a quantum device that is now commercially available via IonQ’s cloud service. Variational quantum classifiers have been experimentally implemented on the superconducting platform to classify artificially generated data. In this experiment, a variety of circuit depths was investigated, and it was shown that when error mitigation techniques were employed alongside a depth four circuit, 100% success rate was achieved.

Nevertheless, when discussing quantum advantage from the perspective of computational complexity, it has been proven that the classical optimization problems for variational quantum algorithms are NP-hard, even for classically tractable systems which are composed of only logarithmically many qubits or free fermions. Moreover, quantum advantage in machine learning can also be viewed in terms of the number of times that a quantum process E has been accessed. It has been proven that for any input distribution, a classical machine learning model can achieve accurate predictions on average by accessing E with times comparable to the optimal quantum machine learning model. By comparison, an exponential quantum advantage is possible for achieving an accurate prediction on all inputs. However, despite proven potential advantage, there are several challenges for quantum classifiers that limit their practical applications. First, implementing quantum random access memory (QRAM) with a desirable scale at the current stage. Second, due to inevitable noise in the quantum devices, it is crucial to increase the fidelity of the quantum operations and develop better error correction techniques. Third, further exploration of all aforementioned structures and protocols is required to identify potential advantage against their classical counterparts for future applications.

Based on these limitations, there is no commercial quantum classifier available at the moment. However, with the advancements of hardware platforms, quantum algorithms, and quantum error correction techniques, quantum classifiers are expected to provide significant advantage in the near future.

Previous studies on quantum classifiers have extended popular classical classification algorithms to the quantum domain such as the quantum support vector machine, quantum nearest neighbor algorithms, and quantum decision tree classifiers, exhibiting the potential of providing quadratic or even exponential speedups.

This review paper introduces some important algorithms and models for quantum classifiers including i) quantum support vector machines, ii) quantum decision tree, iii) quantum nearest neighbors algorithms, iv) quantum annealing based classifiers, and v) variational quantum classifiers. The experimental advancements of the considered classifiers are also discussed along with theoretical progress.

So far, quantum kernel methods used in quantum support vector machines are proven to be the most successful to handle classification tasks. In that scenario, the quantum computer can be used to map the classical data to a Hilbert state space and estimate the inner products between those states to obtain the kernel matrix, which is further handled by classical computers.

Another strong candidate for classification is the quantum decision tree model, that uses von Neumann entropy for choosing the attributes to split the nodes. Recently proposed binary quantum decision tree classifier, named Q-tree, is based on a probabilistic approach where a quantum computer can be used to accomplish the probabilistic traversal of the decision tree via measurements, while tree inductions and predictions of query data can be integrated into this framework as well.

For the case of the quantum nearest neighbors algorithm, the authors discuss two algorithms. First is the quantum version of the nearest centroid algorithm which shows an exponential speedup for both the size of the training set and the dimension of the vectors. However, this is inefficient in few situations like when the centroid of the subset lies in the space of other subsets. The issue can be overcome by the second proposed algorithm, which includes a quantum random access memory, amplitude estimation, and the Dürr Høyer minimization algorithm. This algorithm can be implemented for the quantum nearest neighbors algorithm with a quadratic speedup over its classical counterparts.

Another type of classifier is the quantum annealing classifier. As the name suggests, quantum annealing is used for classification tasks reaching high levels of generalization and physical interpretability for various applications such as anomaly detection, Higgs boson classifications, classifying and ranking binding affinities, etc. In quantum annealing, two Hamiltonians are considered: a simple and easy to prepare Hamiltonian with a known ground state, and a problem Hamiltonian whose ground state represents the solution of the problem. By slowly evolving the simple Hamiltonian towards the problem Hamiltonian, one can find the solution of the problem. Quantum annealing uses quantum tunneling to explore the problem space and to avoid local minima.

The last type of quantum classifier that is discussed are variational quantum classifiers (VQCs), which are a subclass of quantum neural networks. As such, the parameters in a classification task will be optimized during the training process to minimize a predefined loss function. Many structures and protocols for VQCs have already been proposed, mainly addressing issues like gradient descent evaluation, neural network structure (convolutional, recurrent, etc.), avoiding barren plateaus, and vulnerability to adversarial attacks.

In terms of experimental advancements and proof-of-concept implementations to physical hardware, the authors discuss the following implementations. The distance-based quantum classifier was demonstrated in the superconducting platform of IBM Quantum Experience with a classic-quantum hybrid approach. The implementations for quantum nearest neighbors algorithms can be carried out using single photons acting as qubits generated through spontaneous parametric down-conversion. A nearest centroid classifier has been experimentally demonstrated on a eleven-qubit trapped ion quantum computer with a quantum device that is now commercially available via IonQ’s cloud service. Variational quantum classifiers have been experimentally implemented on the superconducting platform to classify artificially generated data. In this experiment, a variety of circuit depths was investigated, and it was shown that when error mitigation techniques were employed alongside a depth four circuit, 100% success rate was achieved.

Nevertheless, when discussing quantum advantage from the perspective of computational complexity, it has been proven that the classical optimization problems for variational quantum algorithms are NP-hard, even for classically tractable systems which are composed of only logarithmically many qubits or free fermions. Moreover, quantum advantage in machine learning can also be viewed in terms of the number of times that a quantum process E has been accessed. It has been proven that for any input distribution, a classical machine learning model can achieve accurate predictions on average by accessing E with times comparable to the optimal quantum machine learning model. By comparison, an exponential quantum advantage is possible for achieving an accurate prediction on all inputs. However, despite proven potential advantage, there are several challenges for quantum classifiers that limit their practical applications. First, implementing quantum random access memory (QRAM) with a desirable scale at the current stage. Second, due to inevitable noise in the quantum devices, it is crucial to increase the fidelity of the quantum operations and develop better error correction techniques. Third, further exploration of all aforementioned structures and protocols is required to identify potential advantage against their classical counterparts for future applications.

Based on these limitations, there is no commercial quantum classifier available at the moment. However, with the advancements of hardware platforms, quantum algorithms, and quantum error correction techniques, quantum classifiers are expected to provide significant advantage in the near future.

Tagged under

04 September 2021
###
Training Noisy Variational Algorithms

Variational Quantum Algorithms (VQAs) are one of the most prominent methods used during the Noisy Intermediate Scale Quantum (NISQ) era as they adapt to the constraints of NISQ devices. VQAs are used in a wide range of applications from dynamical quantum simulation to machine learning. NISQ devices do not have the resources to exploit the benefits arising from quantum error correction (QEC) techniques, therefore such algorithms suffer from the effects of noise, which decreases their performance. As a substitute of full QEC, many algorithms employ error mitigation techniques to counter the effects of noise. An example of a VQA is the Variational Quantum Eigensolver (VQE), a setting which does offer some strategies to mitigate errors, however additional strategies are required to complement the existing advantages over noise.

It has been proven that noise can significantly decrease the trainability of linear (or super-linear) depth VQAs. One effect of noise is the flattening of the cost landscape as a function of the variational parameters, thus requiring an exponential precision in system size to resolve its features. This phenomenon is known as Noise-Induced Barren Plateaus (NIBPs), because the algorithm can no longer resolve a finite cost gradient (known as Barren Plateau). It can be advantageous to pursue algorithms with sublinear circuit depth and utilizing strategies that limit the hardware noise.

In this work, the authors investigate the effects of error mitigation on the trainability of noisy cost function landscapes in two regimes: i) asymptotic regime (in terms of scaling with system size) and ii) non-asymptotic regime for various error mitigation strategies including Zero Noise Extrapolation (ZNE), Virtual Distillation (VD), Probabilistic Error Cancellation (PEC), and Clifford Data Regression (CDR). The error mitigation protocols are subjected to a class of local depolarizing noise, known to cause NIBP. The first case of an asymptotic regime was considered in terms of scaling with system size. The theoretical results show that, if VQA suffers from exponential cost concentration, the error mitigation strategies cannot remove this exponential scaling, implying that at least an exponential number of resources is required to extract accurate information from the cost landscape in order to find a cost-minimizing optimization direction.

In the second case, it is shown that VD decreases the resolvability of the noisy cost landscape, and impedes trainability. ZNE also shows similar results under more restrictive assumptions on the cost landscape. It is also observed that any improvement in the resolvability after applying PEC under local depolarizing noise exponentially degrades with increasing number of qubits. Finally, for Clifford Data Regression, there is no change to the resolvability of any pair of cost values if the same ansatz is used.

The work also numerically investigates the effects of error mitigation on VQA trainability for the case when the effects of cost concentration is minimal. This is done by simulating the Quantum Approximate Optimization Algorithm (QAOA) for 5-qubit MaxCut problems on a realistic noise model of an IBM quantum computer, obtained by gate set tomography of IBM’s Ourense quantum device. The results compare the quality of the solutions of noisy (unmitigated) and CDR-mitigated optimization and demonstrate that CDR-mitigated optimization outperforms noisy optimization for all considered implementations.

Unlike all the considered error mitigation strategies, it is shown that CDR reverses the concentration of cost values more than it increases the statistical uncertainty as it has a neutral impact on resolvability under a global depolarizing noise model. Also, it can remedy the effects of more complex noise models. This indicates that CDR could resolve trainability issues arising due to corruptions of the cost function outside of cost concentration, whilst having a neutral effect on cost concentration itself, and thus improving overall trainability. A potential direction for future work will be investigating other mechanisms which can allow mitigation strategies to improve the trainability of noisy cost landscapes. As is known from the theory of error correction, once sufficient resources exist, then NIBPs can indeed be avoided.

It has been proven that noise can significantly decrease the trainability of linear (or super-linear) depth VQAs. One effect of noise is the flattening of the cost landscape as a function of the variational parameters, thus requiring an exponential precision in system size to resolve its features. This phenomenon is known as Noise-Induced Barren Plateaus (NIBPs), because the algorithm can no longer resolve a finite cost gradient (known as Barren Plateau). It can be advantageous to pursue algorithms with sublinear circuit depth and utilizing strategies that limit the hardware noise.

In this work, the authors investigate the effects of error mitigation on the trainability of noisy cost function landscapes in two regimes: i) asymptotic regime (in terms of scaling with system size) and ii) non-asymptotic regime for various error mitigation strategies including Zero Noise Extrapolation (ZNE), Virtual Distillation (VD), Probabilistic Error Cancellation (PEC), and Clifford Data Regression (CDR). The error mitigation protocols are subjected to a class of local depolarizing noise, known to cause NIBP. The first case of an asymptotic regime was considered in terms of scaling with system size. The theoretical results show that, if VQA suffers from exponential cost concentration, the error mitigation strategies cannot remove this exponential scaling, implying that at least an exponential number of resources is required to extract accurate information from the cost landscape in order to find a cost-minimizing optimization direction.

In the second case, it is shown that VD decreases the resolvability of the noisy cost landscape, and impedes trainability. ZNE also shows similar results under more restrictive assumptions on the cost landscape. It is also observed that any improvement in the resolvability after applying PEC under local depolarizing noise exponentially degrades with increasing number of qubits. Finally, for Clifford Data Regression, there is no change to the resolvability of any pair of cost values if the same ansatz is used.

The work also numerically investigates the effects of error mitigation on VQA trainability for the case when the effects of cost concentration is minimal. This is done by simulating the Quantum Approximate Optimization Algorithm (QAOA) for 5-qubit MaxCut problems on a realistic noise model of an IBM quantum computer, obtained by gate set tomography of IBM’s Ourense quantum device. The results compare the quality of the solutions of noisy (unmitigated) and CDR-mitigated optimization and demonstrate that CDR-mitigated optimization outperforms noisy optimization for all considered implementations.

Unlike all the considered error mitigation strategies, it is shown that CDR reverses the concentration of cost values more than it increases the statistical uncertainty as it has a neutral impact on resolvability under a global depolarizing noise model. Also, it can remedy the effects of more complex noise models. This indicates that CDR could resolve trainability issues arising due to corruptions of the cost function outside of cost concentration, whilst having a neutral effect on cost concentration itself, and thus improving overall trainability. A potential direction for future work will be investigating other mechanisms which can allow mitigation strategies to improve the trainability of noisy cost landscapes. As is known from the theory of error correction, once sufficient resources exist, then NIBPs can indeed be avoided.

Tagged under

30 August 2021
###
Quantum Alphatron

Quantum machine learning has recently been one of the prominent areas in the field of quantum computing where quantum algorithms can be generalized to solve important problems. For specific problems like search and integer factoring, quantum algorithms can be generalized to important subroutines replacing linear algebra and machine learning. More accurately, the quantum search can be generalized to amplitude amplification allowing speedups for sampling and estimation tasks. Quantum factoring contains the phase estimation subroutine, which can be used for decomposing a matrix into eigenvalues and eigenvectors. Also, quantum kernel methods and quantum gradient computation are being actively researched for various machine learning applications.

Many quantum algorithms use heuristic methods similar to classical machine leaning. In the classical regime, it is difficult to obtain provable guarantees regarding the quality of learning and the run time. However, in quantum regimes, some algorithms retain the benefits of provable classical algorithms for learning and at the same time examine provable quantum speedups for machine learning. One such quantum algorithm that is inspired by its classical analogue is presented in this paper. It is called Alphatron and the classical algorithm is enhanced by quantum sub-routines. It is a gradient-descent-like algorithm for isotonic regression with the inclusion of kernel functions. This algorithm can be used to learn a kernelized, non-linear concept class of functions with a bounded noise term, hence can be exploited to learn a two-layer neural network, where one layer of activation functions feeds into a single activation function. In this work, the authors provide quantum speedups for the Alphatron and its application to non-linear classes of functions and two-layer neural networks.

The work considers pre-computation of the kernel matrix used in the algorithm and provides three sets of algorithms: Alphatron with pre-computation (Pre), approximate pre-computation (Approx Pre), and quantum estimation pre-computation (Q-Pre). Each algorithm offers improvement on the run-time complexity by pre-computation and quantum estimation, eventually improving the original Alphatron. The algorithms were tested against 'n' dimensional data, which is comparatively larger than the other parameters, thus showing relevance to many practical applications. With the aid of pre-computation, the complexity term which is quadratic in the total size of the dataset (m) and linear in "n" dimensions of data, loses a factor of T (number of iterations), therefore improving the performance. Furthermore, utilizing quantum amplitude estimation, a gain of quadratic speedup can be obtained along the dimension.

The work further exploits a setting where the samples are provided via quantum query access in order to harness quantum subroutines to estimate the entries of the kernel matrices used in the algorithm. This access can be provided by quantum random access memory (QRAM) via the GroverRudolph procedure. Such a device stores the data in (classical) memory cells but allows for superposition queries to the data. The cost of the resources is proportional to the length of the vector to set up, but then can provide a run-time of the superposition state that is logarithmic in the length of the vector.

The quantum subroutines used in this work are adaptations of the amplitude estimation algorithm. The results show that the learning guarantees can be preserved despite the erroneous estimation of the kernel matrices. Also, there are estimations inside the main loop of the algorithm which can be replaced with quantum subroutines while keeping the main loop intact. Moreover, in the case where data dimension 'n' is much larger than the other parameters, quantum pre-computation requires asymptotically more time than Alphatron with Kernel, since the same Alphatron with Kernel algorithm is used after the preparation of the kernel matrices. Therefore, optimizing Alphatron with Kernel is not cost-effective when the cost of preparing the data of size n is taken into account. Finally, further exploration of a potential quantum speedup is required for Alphatron with Kernel, especially for the case of pre-computation occurring prior to the algorithm.

Many quantum algorithms use heuristic methods similar to classical machine leaning. In the classical regime, it is difficult to obtain provable guarantees regarding the quality of learning and the run time. However, in quantum regimes, some algorithms retain the benefits of provable classical algorithms for learning and at the same time examine provable quantum speedups for machine learning. One such quantum algorithm that is inspired by its classical analogue is presented in this paper. It is called Alphatron and the classical algorithm is enhanced by quantum sub-routines. It is a gradient-descent-like algorithm for isotonic regression with the inclusion of kernel functions. This algorithm can be used to learn a kernelized, non-linear concept class of functions with a bounded noise term, hence can be exploited to learn a two-layer neural network, where one layer of activation functions feeds into a single activation function. In this work, the authors provide quantum speedups for the Alphatron and its application to non-linear classes of functions and two-layer neural networks.

The work considers pre-computation of the kernel matrix used in the algorithm and provides three sets of algorithms: Alphatron with pre-computation (Pre), approximate pre-computation (Approx Pre), and quantum estimation pre-computation (Q-Pre). Each algorithm offers improvement on the run-time complexity by pre-computation and quantum estimation, eventually improving the original Alphatron. The algorithms were tested against 'n' dimensional data, which is comparatively larger than the other parameters, thus showing relevance to many practical applications. With the aid of pre-computation, the complexity term which is quadratic in the total size of the dataset (m) and linear in "n" dimensions of data, loses a factor of T (number of iterations), therefore improving the performance. Furthermore, utilizing quantum amplitude estimation, a gain of quadratic speedup can be obtained along the dimension.

The work further exploits a setting where the samples are provided via quantum query access in order to harness quantum subroutines to estimate the entries of the kernel matrices used in the algorithm. This access can be provided by quantum random access memory (QRAM) via the GroverRudolph procedure. Such a device stores the data in (classical) memory cells but allows for superposition queries to the data. The cost of the resources is proportional to the length of the vector to set up, but then can provide a run-time of the superposition state that is logarithmic in the length of the vector.

The quantum subroutines used in this work are adaptations of the amplitude estimation algorithm. The results show that the learning guarantees can be preserved despite the erroneous estimation of the kernel matrices. Also, there are estimations inside the main loop of the algorithm which can be replaced with quantum subroutines while keeping the main loop intact. Moreover, in the case where data dimension 'n' is much larger than the other parameters, quantum pre-computation requires asymptotically more time than Alphatron with Kernel, since the same Alphatron with Kernel algorithm is used after the preparation of the kernel matrices. Therefore, optimizing Alphatron with Kernel is not cost-effective when the cost of preparing the data of size n is taken into account. Finally, further exploration of a potential quantum speedup is required for Alphatron with Kernel, especially for the case of pre-computation occurring prior to the algorithm.

Tagged under

26 August 2021
###
Honeycomb Memory

When designing fault-tolerant quantum memories, geometrical codes have shown significant success. A frequently used geometrical code is the surface code, mainly because it has proven higher circuit-level thresholds among other quantum codes, requiring only 4-body measurements. However, adding more locality, therefore decreasing the number of n-body measurements, can simplify quantum error-correction circuits, either through making the physical layout more sparse or making the syndrome circuit more compact. This can be achieved by decomposing the parity constraints into non-commuting measurements of unprotected degrees of freedom. The crucial limitation of adding more locality is that it often compromises the quality of error-correction, resulting in less information collected about the errors.

One of the potential candidates for building quantum codes is Kitaev's honeycomb model, which requires only 2-body measurements. In the Kitaev’s honeycomb model, the commuting operators serve as low-weight parity constraints, hence the resulting subsystem code from these constraints encodes no protected logical qubits. Based on that model, Hastings and Haah defined a static subsystem code, which they termed honeycomb code, that manifests ‘dynamic’ logical qubits instead of a static subsystem code. These qubits are encoded into pairs of macroscopic anticommuting observables changing in time, which when combined with fault-tolerant initialization and measurement, can lead to the design of a robust error-corrected quantum memory.

In this work, the authors quantify the robustness of logical qubits preserved by the honeycomb code using a correlated minimum-weight perfect-matching decoder. Using Monte Carlo sampling, the work estimates both thresholds and qubit counts required for finite-sized calculations at scale using the honeycomb memory, along with comparisons with the surface code. The authors consider three noisy gate sets: Noisy Gateset, Measurement Ancillae, and Honeycomb Cycle Length, each with an associated circuit-level error model controlled by a single error parameter p. For each of these error models, three figures of merit are considered, namely: the threshold, the lambda factor, and the teraquop qubit count, each one of them depending on the code, decoder, and error model. The notion of the teraquop qubit count is introduced as a good quantification of a code’s overall finite-size performance.

The honeycomb code was simulated using two pieces of software: i) Stim and ii) in-house minimum-weight perfect-matching decoder. For each code and error model, Stim generates a circuit file describing the stabilizer circuit and analyzes each error mechanism in the circuit, determining which detectors the error violates and which logical observables the error flips. A “graph-like” error model is obtained for the decoder which converts it into a weighted matching graph. The decoder optionally notes how errors with more than two symptoms have been decomposed into multiple graph-like (edge-like) errors, and derives reweighting rules between the edges corresponding to the graph-like pieces. These reweighting rules are used for correlated decoding and estimating the logical observable measurement outcome based on the detection event data provided.

When entanglement is generated using two-qubit unitary gates, the numerical results demonstrate a threshold of 0.2% − 0.3% for the honeycomb code compared to a threshold of 0.5% − 0.7% for the surface code in a controlled-not circuit model. It is evident that in this scenario, the 2-body measurements of the honeycomb model are not advantageous in the error correction scheme. The honeycomb memory requires between a 5-10 times qubit overhead to reach the teraquop regime compared to the surface code at 10^-3 error rates.

However, entanglement is instead generated using native two-body measurements (which is advantageous for quantum hardware due to less crosstalk), the honeycomb code achieves a threshold of 1.5% < p < 2.0%, where p is the collective error rate of the two-body measurement gate, including both readout errors and depolarizing noise. The comparative physical qubit savings of a teraquop honeycomb memory at a 10^−3 error rate is several orders of magnitude, because of the proximity to the surface code’s threshold.

The two-body locality of the honeycomb code allows to jointly measure the data qubits directly. This reduces the number of noisy operations in a syndrome cycle compared to decomposing the four-body measurements of the surface code, and eliminates the need for ancilla qubits. As a result, with two-body measurements at a physical error rate of 10^−3, a teraquop honeycomb memory requires only 600 qubits. An important direction to explore is whether the honeycomb code can be embedded naturally in a planar geometry. The main issue would be the proper introduction of boundaries, in a way that does not compromise certain properties of the code, similarly to the surface code. Finally, explicit constructions of the application of logical gates can be another challenging target problem.

One of the potential candidates for building quantum codes is Kitaev's honeycomb model, which requires only 2-body measurements. In the Kitaev’s honeycomb model, the commuting operators serve as low-weight parity constraints, hence the resulting subsystem code from these constraints encodes no protected logical qubits. Based on that model, Hastings and Haah defined a static subsystem code, which they termed honeycomb code, that manifests ‘dynamic’ logical qubits instead of a static subsystem code. These qubits are encoded into pairs of macroscopic anticommuting observables changing in time, which when combined with fault-tolerant initialization and measurement, can lead to the design of a robust error-corrected quantum memory.

In this work, the authors quantify the robustness of logical qubits preserved by the honeycomb code using a correlated minimum-weight perfect-matching decoder. Using Monte Carlo sampling, the work estimates both thresholds and qubit counts required for finite-sized calculations at scale using the honeycomb memory, along with comparisons with the surface code. The authors consider three noisy gate sets: Noisy Gateset, Measurement Ancillae, and Honeycomb Cycle Length, each with an associated circuit-level error model controlled by a single error parameter p. For each of these error models, three figures of merit are considered, namely: the threshold, the lambda factor, and the teraquop qubit count, each one of them depending on the code, decoder, and error model. The notion of the teraquop qubit count is introduced as a good quantification of a code’s overall finite-size performance.

The honeycomb code was simulated using two pieces of software: i) Stim and ii) in-house minimum-weight perfect-matching decoder. For each code and error model, Stim generates a circuit file describing the stabilizer circuit and analyzes each error mechanism in the circuit, determining which detectors the error violates and which logical observables the error flips. A “graph-like” error model is obtained for the decoder which converts it into a weighted matching graph. The decoder optionally notes how errors with more than two symptoms have been decomposed into multiple graph-like (edge-like) errors, and derives reweighting rules between the edges corresponding to the graph-like pieces. These reweighting rules are used for correlated decoding and estimating the logical observable measurement outcome based on the detection event data provided.

When entanglement is generated using two-qubit unitary gates, the numerical results demonstrate a threshold of 0.2% − 0.3% for the honeycomb code compared to a threshold of 0.5% − 0.7% for the surface code in a controlled-not circuit model. It is evident that in this scenario, the 2-body measurements of the honeycomb model are not advantageous in the error correction scheme. The honeycomb memory requires between a 5-10 times qubit overhead to reach the teraquop regime compared to the surface code at 10^-3 error rates.

However, entanglement is instead generated using native two-body measurements (which is advantageous for quantum hardware due to less crosstalk), the honeycomb code achieves a threshold of 1.5% < p < 2.0%, where p is the collective error rate of the two-body measurement gate, including both readout errors and depolarizing noise. The comparative physical qubit savings of a teraquop honeycomb memory at a 10^−3 error rate is several orders of magnitude, because of the proximity to the surface code’s threshold.

The two-body locality of the honeycomb code allows to jointly measure the data qubits directly. This reduces the number of noisy operations in a syndrome cycle compared to decomposing the four-body measurements of the surface code, and eliminates the need for ancilla qubits. As a result, with two-body measurements at a physical error rate of 10^−3, a teraquop honeycomb memory requires only 600 qubits. An important direction to explore is whether the honeycomb code can be embedded naturally in a planar geometry. The main issue would be the proper introduction of boundaries, in a way that does not compromise certain properties of the code, similarly to the surface code. Finally, explicit constructions of the application of logical gates can be another challenging target problem.

Tagged under

26 August 2021
###
Error mitigation using mid-circuit measurements

The capabilities of present-day NISQ devices are constrained by high noise levels and limited error mitigation of qubits. Hence, optimizing NISQ algorithms to use a limited amount of resources is an essential requirement to reduce the effect of errors. One strategy is to use a hybrid classical-quantum approach that employ shallow quantum circuits to solve the “hard” part of the problem. These shallow circuits are more resilient to noise by construction and require limited resources. Variational Quantum Algorithms (VQA) such as the Variational Quantum Eigensolver(VQE) and Quantum Approximate Optimization Algorithm(QAOA) have shown great promise in exploiting this hybrid approach, where a cost function is evaluated in the quantum circuit whose parameters are optimized by a classical non-linear optimizer. The application areas cover an extensive range including chemistry, machine learning, circuit compilation, and classical optimization among others.

Although VQA algorithms are shown to be resilient against coherent errors, the qubits utilized in the quantum circuit are still affected by decoherence. Also, due to high qubit requirements, quantum error correction methods cannot be utilized with VQAs to overcome the effect of decoherence. Another significant source of error that limits the current capability of quantum devices is the readout error caused by the imperfect measurement devices. To combat the noise, one can have the circuit evolution taking place only on a subspace of the full Hilbert space, which will lead to valid and invalid measurement outcomes. The invalid measurement outcomes are attributed to the noise and are discarded. Another helpful approach in combating noise is encoding the data in a format that indicates errors in a straightforward way. One popular approach is one-hot encoding, which results in one-hot quantum states. Such type of binary encodings are used to obtain qubit-saving formulations for the Travelling Salesman Problem, graph coloring problem, quadratic Knapsack problem, and Max k-Cut problem.

In this paper, the authors propose schemes for error mitigation in variational quantum circuits through mid-circuit post-selection, which is performed by injecting the error mitigation sub-circuit consisting of gates and measurements, that detects errors and discards the erroneous data. The work presents post-selection schemes for various encodings which were used to obtain different valid subspaces of quantum states to be used with VQA while solving particular combinatorial optimization problems and problems from quantum chemistry. The encoding methods are i) k-hot, ii) one-hot iii) domain wall encoding iv) Binary and gray encoding and v) one-hot and binary mixed. The measurement was done via two approaches: post-selection through filtering and post-selection through compression. The advantage of the second approach is that it does not require ancilla qubits.

The work implements one-hot to binary post-selection through a compression scheme to solve the Travelling Salesman Problem (TSP) using the Quantum Alternating Operator Ansatz (QAOA) algorithm. In the case of the one-hot method, encoding works by compressing the valid subspace to a smaller subspace of quantum states and differentiates from the known methods. The experiment results show that for amplitude damping, depolarizing, and bit-flip noise, the mid-circuit post-selection has a positive impact on the outcome compared to just using the final post-selection as a criterion. The proposed error mitigation schemes are qubit efficient i.e. they require only mid-circuit measurements and reset instead of a classical “if” operation. The presented methods are currently applicable to existing NISQ algorithms, as well as outside the scope of VQA and with different objective Hamiltonians. Furthermore, mid-circuit measurements have been increasingly made available by providers of quantum hardware such as IBM and Honeywell.

The ancilla-free post-selection through compression scheme can be applied to any problem where the feasible states are one-hot, including the problems defined over permutations such as Vehicle Routing Problem, variations of TSP, Railway Dispatching Problem, Graph Isomorphism Problem, and Flight Gate Assignment Problem. There are multiple factors that should be considered when designing such schemes, including the complexity of the post-selection, the form of the feasible subspace S, the strength, and form of the noise affecting the computation. It is advantageous to design methods that would choose the optimal number (and perhaps the position) of mid-circuit post-selections to be applied automatically. Utilizing such optimized error mitigation schemes can lead to high level of cancellation of noise in NISQ devices.

Although VQA algorithms are shown to be resilient against coherent errors, the qubits utilized in the quantum circuit are still affected by decoherence. Also, due to high qubit requirements, quantum error correction methods cannot be utilized with VQAs to overcome the effect of decoherence. Another significant source of error that limits the current capability of quantum devices is the readout error caused by the imperfect measurement devices. To combat the noise, one can have the circuit evolution taking place only on a subspace of the full Hilbert space, which will lead to valid and invalid measurement outcomes. The invalid measurement outcomes are attributed to the noise and are discarded. Another helpful approach in combating noise is encoding the data in a format that indicates errors in a straightforward way. One popular approach is one-hot encoding, which results in one-hot quantum states. Such type of binary encodings are used to obtain qubit-saving formulations for the Travelling Salesman Problem, graph coloring problem, quadratic Knapsack problem, and Max k-Cut problem.

In this paper, the authors propose schemes for error mitigation in variational quantum circuits through mid-circuit post-selection, which is performed by injecting the error mitigation sub-circuit consisting of gates and measurements, that detects errors and discards the erroneous data. The work presents post-selection schemes for various encodings which were used to obtain different valid subspaces of quantum states to be used with VQA while solving particular combinatorial optimization problems and problems from quantum chemistry. The encoding methods are i) k-hot, ii) one-hot iii) domain wall encoding iv) Binary and gray encoding and v) one-hot and binary mixed. The measurement was done via two approaches: post-selection through filtering and post-selection through compression. The advantage of the second approach is that it does not require ancilla qubits.

The work implements one-hot to binary post-selection through a compression scheme to solve the Travelling Salesman Problem (TSP) using the Quantum Alternating Operator Ansatz (QAOA) algorithm. In the case of the one-hot method, encoding works by compressing the valid subspace to a smaller subspace of quantum states and differentiates from the known methods. The experiment results show that for amplitude damping, depolarizing, and bit-flip noise, the mid-circuit post-selection has a positive impact on the outcome compared to just using the final post-selection as a criterion. The proposed error mitigation schemes are qubit efficient i.e. they require only mid-circuit measurements and reset instead of a classical “if” operation. The presented methods are currently applicable to existing NISQ algorithms, as well as outside the scope of VQA and with different objective Hamiltonians. Furthermore, mid-circuit measurements have been increasingly made available by providers of quantum hardware such as IBM and Honeywell.

The ancilla-free post-selection through compression scheme can be applied to any problem where the feasible states are one-hot, including the problems defined over permutations such as Vehicle Routing Problem, variations of TSP, Railway Dispatching Problem, Graph Isomorphism Problem, and Flight Gate Assignment Problem. There are multiple factors that should be considered when designing such schemes, including the complexity of the post-selection, the form of the feasible subspace S, the strength, and form of the noise affecting the computation. It is advantageous to design methods that would choose the optimal number (and perhaps the position) of mid-circuit post-selections to be applied automatically. Utilizing such optimized error mitigation schemes can lead to high level of cancellation of noise in NISQ devices.

Tagged under

03 August 2021
###
Error Bounds for Variational Quantum Time Evolution

The classical world is fundamentally governed by quantum mechanics. When attempting to simulate models of nature, one has to incorporate its quantum-mechanical nature. In many cases, simulating the quantum-mechanical nature is synonymous to quantum time evolution (QTE), which is the process of evolving a quantum state over time with respect to a Hamiltonian H. Current applications of QTE ranges from simulating Ising models, approximating quantum Gibbs states, and solving combinatorial optimization problems.

The implementation of QTE on an actual physical system such as a gate-based quantum computer requires the respective evolution to be translated into quantum gates. This translation can be approximated with approaches like Trotterization. However, this approach may lead to deep quantum circuits depending on the evolution time and expected accuracy, thus is not well-suited for near-term quantum computers. An alternative approach is Variational quantum time evolution (VarQTE) which approximates the target state with a state whose time dependence is projected onto the parameters of a variational ansatz and can be used to simulate quantum time dynamics with parameterized quantum circuits. Thus, it can utilize near-term devices for solving practically relevant tasks. However, available parameterized unitaries possess limited expressivity due to the usage of short circuits which results in an approximation error with VarQTE.

The authors in this work derive global phase-agnostic error bounds for real and imaginary time evolution based on McLachlan’s variational principle. The error bounds for variational quantum real and imaginary time evolution are referred to as VarQRTE (Variational Quantum Real Time Evolution) and VarQITE (Variational Quantum Imaginary Time Evolution), respectively.

The authors proposed theoretical derivations of the error bounds for the Bures metric between a state prepared with VarQTE and the target state. The preparation of the state was taken in infinitesimally small time steps in order to prevent large errors in the numerical simulations. To demonstrate the efficiency of the derived error bounds, the authors present 3 examples; an illustrative Hamiltonian (Hill), an Ising model and two qubit hydrogen molecule approximations. Furthermore, different ODE formulations and solvers (Euler & RK54) were used to implement VarQRTE and VarQITE, with the final goal being the comparison between the error bounds.

For VarQRTE, the results show very tight error bounds for (Hill) when the Euler method was implemented with 100 time steps, as well as an adaptive step size RK54 ODE solver. The results show that RK54 achieves better error bounds as well as smaller fluctuations in the system energy while using significantly less time steps. In the case of the Ising model, the argmin ODE, which is analytically equivalent to solving the standard ODE with a least square solver, leads to smaller errors than the standard ODE. Moreover, the experiment which uses RK54 and the argmin ODE, leads to the smallest state error as well as error bound in the case of hydrogen Hamiltonian. For VarQITE, the application of RK54 reduces the error in the integration in comparison to standard ODE when implemented on (Hill). However, the argmin ODE performs better than the standard ODE by a small margin in case of the Ising model. For VarQITE applied to Hydrogen, the argmin ODE was used with RK54 and all gradient errors were reduced to 0.

The overall results suggest that the error bounds are good approximations to the actual error in case the gradient errors are small, otherwise they can grow fast, exceeding the meaningful range. Also, the error bounds are strongly dependent on the numerical integration method used. For instance, the numerical integration error introduced by forward Euler can easily exceed the error bounds beyond the useful range, however an adaptive step size scheme such as RK54 can significantly increase the numerical stability and accuracy. Furthermore, ODE formulation based on the minimization of the local gradient error also contributes to reducing the simulation errors. An open question for future investigation is to derive error bounds that are applicable for larger errors and quantification of the influence of quantum hardware noise on these error bounds. Finally, a critical investigation of the behavior of VarQTE and the respective error bounds is crucial at designated points, such as phase transitions in order to unlock the full potential of the QTE simulation technique.

The implementation of QTE on an actual physical system such as a gate-based quantum computer requires the respective evolution to be translated into quantum gates. This translation can be approximated with approaches like Trotterization. However, this approach may lead to deep quantum circuits depending on the evolution time and expected accuracy, thus is not well-suited for near-term quantum computers. An alternative approach is Variational quantum time evolution (VarQTE) which approximates the target state with a state whose time dependence is projected onto the parameters of a variational ansatz and can be used to simulate quantum time dynamics with parameterized quantum circuits. Thus, it can utilize near-term devices for solving practically relevant tasks. However, available parameterized unitaries possess limited expressivity due to the usage of short circuits which results in an approximation error with VarQTE.

The authors in this work derive global phase-agnostic error bounds for real and imaginary time evolution based on McLachlan’s variational principle. The error bounds for variational quantum real and imaginary time evolution are referred to as VarQRTE (Variational Quantum Real Time Evolution) and VarQITE (Variational Quantum Imaginary Time Evolution), respectively.

The authors proposed theoretical derivations of the error bounds for the Bures metric between a state prepared with VarQTE and the target state. The preparation of the state was taken in infinitesimally small time steps in order to prevent large errors in the numerical simulations. To demonstrate the efficiency of the derived error bounds, the authors present 3 examples; an illustrative Hamiltonian (Hill), an Ising model and two qubit hydrogen molecule approximations. Furthermore, different ODE formulations and solvers (Euler & RK54) were used to implement VarQRTE and VarQITE, with the final goal being the comparison between the error bounds.

For VarQRTE, the results show very tight error bounds for (Hill) when the Euler method was implemented with 100 time steps, as well as an adaptive step size RK54 ODE solver. The results show that RK54 achieves better error bounds as well as smaller fluctuations in the system energy while using significantly less time steps. In the case of the Ising model, the argmin ODE, which is analytically equivalent to solving the standard ODE with a least square solver, leads to smaller errors than the standard ODE. Moreover, the experiment which uses RK54 and the argmin ODE, leads to the smallest state error as well as error bound in the case of hydrogen Hamiltonian. For VarQITE, the application of RK54 reduces the error in the integration in comparison to standard ODE when implemented on (Hill). However, the argmin ODE performs better than the standard ODE by a small margin in case of the Ising model. For VarQITE applied to Hydrogen, the argmin ODE was used with RK54 and all gradient errors were reduced to 0.

The overall results suggest that the error bounds are good approximations to the actual error in case the gradient errors are small, otherwise they can grow fast, exceeding the meaningful range. Also, the error bounds are strongly dependent on the numerical integration method used. For instance, the numerical integration error introduced by forward Euler can easily exceed the error bounds beyond the useful range, however an adaptive step size scheme such as RK54 can significantly increase the numerical stability and accuracy. Furthermore, ODE formulation based on the minimization of the local gradient error also contributes to reducing the simulation errors. An open question for future investigation is to derive error bounds that are applicable for larger errors and quantification of the influence of quantum hardware noise on these error bounds. Finally, a critical investigation of the behavior of VarQTE and the respective error bounds is crucial at designated points, such as phase transitions in order to unlock the full potential of the QTE simulation technique.

29 July 2021
###
Unifying Quantum Error Mitigation

The main objective of error mitigation techniques in quantum computing algorithms is to make quantum computation more efficient by reducing the noise level due to noisy quantum hardware to a low level. The main advantage of error mitigation techniques compared to error correction techniques lies in the lower resource requirements.

There already exist a variety of error mitigation techniques. A popular example is ZNE, which relies on collecting data at different noise levels and fitting an extrapolation to the noiseless case. However, this is limited to small size problems and is ineffective in high noise regimes. On the other hand, classically simulable circuits based on the Clifford data regression (CDR) method can be used to inform researchers about noise in a quantum device. CDR constructs a training set using data obtained from classically simulable near-Clifford circuits and performs a regressive fit on an ansatz mapping noisy values to exact expectation values. The fitted ansatz is used to estimate the value of the noiseless result. The combination of both methods is proven to be outperforming either individual method. Another recent method is Virtual Distillation (VD), which mitigates noise by simultaneously preparing multiple copies of a noisy state, entangling them, and making measurements to virtually prepare a state which is purer than the original noisy states, achieving exponential error suppression.

In this work, the authors propose to unify zero noise extrapolation (ZNE), variable noise Clifford data regression (vnCDR), and virtual distillation (VD) into one mitigation framework called UNIfied Technique for Error mitigation with Data (UNITED). The objective is to combine the advantages of each method in order to improve the mitigation of noise. The methods were applied to the mitigation of errors in local observables, which are measured from states prepared with random circuits. Each of these methods has different strengths and resource requirements, but their performance with 10^5 - 10^10 total shots was considered enough to mitigate the expectation value of an observable of interest.

The work combines the ideas of near-Clifford circuit data and the multi-copy data from VD to formulate Clifford guided VD (CGVD), which uses Clifford data to guide a Richardson-like extrapolation to the many copy limit. Furthermore, the work generalizes this method to perform Clifford-guided double Richardson-like extrapolation to the zero-noise limit. It is also shown that performance improves if the CDR training data are supplemented with data from the training circuits that run under increased noise. Similarly, supplementing the vnCDR training set with results obtained by VD will improve its performance.

A realistic simulation of a noisy trapped-ion device was used to enable all-to-all connectivity and to benchmark UNITED and the other methods for problems with various numbers of qubits, circuit depths, and total numbers of shots. The results show that different techniques are optimal for different shot budgets. For instance, ZNE is comparatively more efficient for small shot budgets (10^5), while Clifford-based approaches perform better for larger shot budgets (10^6 - 10^8). UNITED outperforms all other methods for shot budgets on the scale of 10^10.

These results offer new hope for future work in the direction of further unification with other error mitigation techniques such as noise-resilient quantum compiling, verified phase estimation, and specific approaches leveraging symmetries and/or post-selection. Another potential improvement lies in the timing of measurements of new expectation values (i.e. more near-Clifford circuits, more noise levels, more copies of the state, etc.) versus expending more shots per expectation value in order to get the best mitigation possible, given fixed resources. Finally, investigation of the effects of noisy controlled derangement on VD and UNITED performance is crucial in determining the maximum potential of these methods.

There already exist a variety of error mitigation techniques. A popular example is ZNE, which relies on collecting data at different noise levels and fitting an extrapolation to the noiseless case. However, this is limited to small size problems and is ineffective in high noise regimes. On the other hand, classically simulable circuits based on the Clifford data regression (CDR) method can be used to inform researchers about noise in a quantum device. CDR constructs a training set using data obtained from classically simulable near-Clifford circuits and performs a regressive fit on an ansatz mapping noisy values to exact expectation values. The fitted ansatz is used to estimate the value of the noiseless result. The combination of both methods is proven to be outperforming either individual method. Another recent method is Virtual Distillation (VD), which mitigates noise by simultaneously preparing multiple copies of a noisy state, entangling them, and making measurements to virtually prepare a state which is purer than the original noisy states, achieving exponential error suppression.

In this work, the authors propose to unify zero noise extrapolation (ZNE), variable noise Clifford data regression (vnCDR), and virtual distillation (VD) into one mitigation framework called UNIfied Technique for Error mitigation with Data (UNITED). The objective is to combine the advantages of each method in order to improve the mitigation of noise. The methods were applied to the mitigation of errors in local observables, which are measured from states prepared with random circuits. Each of these methods has different strengths and resource requirements, but their performance with 10^5 - 10^10 total shots was considered enough to mitigate the expectation value of an observable of interest.

The work combines the ideas of near-Clifford circuit data and the multi-copy data from VD to formulate Clifford guided VD (CGVD), which uses Clifford data to guide a Richardson-like extrapolation to the many copy limit. Furthermore, the work generalizes this method to perform Clifford-guided double Richardson-like extrapolation to the zero-noise limit. It is also shown that performance improves if the CDR training data are supplemented with data from the training circuits that run under increased noise. Similarly, supplementing the vnCDR training set with results obtained by VD will improve its performance.

A realistic simulation of a noisy trapped-ion device was used to enable all-to-all connectivity and to benchmark UNITED and the other methods for problems with various numbers of qubits, circuit depths, and total numbers of shots. The results show that different techniques are optimal for different shot budgets. For instance, ZNE is comparatively more efficient for small shot budgets (10^5), while Clifford-based approaches perform better for larger shot budgets (10^6 - 10^8). UNITED outperforms all other methods for shot budgets on the scale of 10^10.

These results offer new hope for future work in the direction of further unification with other error mitigation techniques such as noise-resilient quantum compiling, verified phase estimation, and specific approaches leveraging symmetries and/or post-selection. Another potential improvement lies in the timing of measurements of new expectation values (i.e. more near-Clifford circuits, more noise levels, more copies of the state, etc.) versus expending more shots per expectation value in order to get the best mitigation possible, given fixed resources. Finally, investigation of the effects of noisy controlled derangement on VD and UNITED performance is crucial in determining the maximum potential of these methods.

Tagged under

- Avoiding Barren Plateaus 22 January 2022 in Blog
- Decompositional QGNN 15 January 2022 in Blog
- Quantum Capsule Networks 08 January 2022 in Blog
- Quantum Digital-Analog Fermionic Simulation 31 December 2021 in Blog
- Quantum Semidefinite Programming 19 December 2021 in Blog
- Shallow Circuits in Variational QML 14 December 2021 in Blog

Copyright © Qu & Co BV