21 October 2021
## Robust Quantum Neural Networks

Variational Quantum Machine Learning (QML) models typically consist of a data embedding followed by a parametrized quantum circuit, often called a Quantum Neural Network (QNN). QNNs have come to play a significant role in quantum machine learning applications. When it comes to physical implementation on current Noisy Intermediate-Scale Quantum (NISQ) devices, however, QNNs face serious limitations: high gate error rates and degraded accuracy compared to noise-free simulation. The same QNN can perform very differently on different hardware platforms, mainly because of their different gate error rates, which severely limits practical performance.

Various noise mitigation techniques have been proposed to reduce the impact of noise when running quantum circuits on NISQ devices. However, these are often either generic techniques that do not exploit the specific structure of QNNs and are therefore applicable only to the inference stage, or techniques specific to other algorithms and applications, such as quantum chemistry. This paper proposes a QNN-specific noise-aware framework called RoQNN (Robust QNN) that optimizes QNN robustness in both the training and inference stages, boosts the intrinsic robustness of QNN parameters, and improves accuracy on real quantum machines.

The QNN architecture considered in this work consists of multiple blocks, each having three components: i) an encoder that embeds classical values into quantum states with rotation gates, ii) trainable quantum layers containing parameterized gates that perform the ML task, and iii) measurement of each qubit to obtain a classical value. The measurement outcomes of one block are passed to the next, and the framework works as a three-stage pipeline. First, the measurement-outcome distribution of each qubit is normalized across input samples, during both training and inference, to compensate for information loss. Second, noise is injected into the QNN training process by inserting error gates: error gates are iteratively sampled, inserted into the QNN, and the weights are updated. Injecting noise into neural network training smooths the loss landscape and improves generalization, so by emulating the real noisy deployment environment, noise-injection-based training significantly boosts noise robustness. Finally, post-measurement quantization maps the measurement outcomes to discrete values, which corrects quantum noise-induced errors by value clamping and thereby avoids cascaded error accumulation. Moreover, by sparsifying the parameter space, quantization reduces model complexity and acts as a regularization mechanism that mitigates potential overfitting.
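
The three post-measurement stages can be sketched classically. The following is a minimal illustrative surrogate, not the authors' implementation: batch normalization of measurement outcomes, additive Gaussian noise standing in for the sampled error gates, and clamping of outcomes to a small set of discrete levels. The function names, the Gaussian noise model, and the toy data are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize_measurements(batch):
    # Stage 1 surrogate: standardize each qubit's measurement outcomes
    # across the batch of input samples (zero mean, unit variance).
    mean = batch.mean(axis=0, keepdims=True)
    std = batch.std(axis=0, keepdims=True) + 1e-8
    return (batch - mean) / std

def inject_noise(batch, sigma=0.1):
    # Stage 2 surrogate: additive Gaussian perturbation standing in for
    # the error gates sampled and inserted into the circuit during training.
    return batch + rng.normal(0.0, sigma, size=batch.shape)

def quantize(batch, levels=4):
    # Stage 3 surrogate: clamp outcomes to `levels` discrete values in
    # [-1, 1], correcting small noise-induced deviations.
    edges = np.linspace(-1.0, 1.0, levels)
    idx = np.abs(batch[..., None] - edges).argmin(axis=-1)
    return edges[idx]

# Toy "measurement outcomes": a batch of 8 samples on 4 qubits.
raw = rng.uniform(-1.0, 1.0, size=(8, 4))
out = quantize(np.tanh(normalize_measurements(inject_noise(raw))))
print(out.shape)
```

Quantizing to a fixed grid is what makes small perturbations harmless: any noise smaller than half the grid spacing is snapped back to the correct level instead of propagating to the next block.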

The experiments were carried out on eight prototypical ML classification tasks, including MNIST, Fashion-MNIST and CIFAR.

For the hardware implementation, IBMQ quantum computers were used, with compilation done via IBM's Qiskit APIs. All experiments were run with the maximum of 8192 shots. The experimental results show that RoQNN improves accuracy by 20-40% on typical classification tasks and achieves high accuracy with purely quantum parameters on real quantum hardware. The work also compares the noise-free measurement-outcome distributions of 4 qubits with their noisy counterparts for MNIST-4 on two devices, showing that post-measurement normalization reduces the mismatch between the two distributions and improves the SNR of each qubit and each individual measurement outcome.

The overall results demonstrate that RoQNN improves the accuracy of the considered baseline designs, irrespective of QNN model size and design space. Noise injection and quantization each improve overall accuracy by 9% individually; combining the two techniques delivers an accuracy gain of 17%. Another crucial advantage of the proposed framework is scalability, since the training cost, post-measurement normalization and quantization all scale linearly with the number of qubits. The real challenge, however, lies in tuning the level of injected noise: too much noise destabilizes training and hurts accuracy, while too little does not improve robustness. Other QML models could adopt this architecture, following an analysis of the appropriate noise level to insert into the quantum system, and such QML models merit further exploration.


Published in
Blog


29 October 2021
## Quantum Portfolio Optimization

Quantum computing is currently being explored for a wide variety of problems, with optimization being one of the primary applications. A specific optimization problem in finance where quantum computing is believed to have a potential advantage over classical computing is portfolio optimization, owing to its complex and time-consuming nature in classical approaches. Portfolio optimization can be formulated as a quadratic program, which can in turn be written in a form feasible for a quantum computer, with the cost function enforcing risk minimization for a targeted return.
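
As a concrete illustration of the quadratic-programming formulation, a minimal mean-variance problem with an equality budget constraint reduces, via its first-order (KKT) conditions, to a small linear system of exactly the kind HHL targets. The returns, covariance, and risk-aversion values below are illustrative, not from the paper:

```python
import numpy as np

# Toy two-asset mean-variance problem: minimize (q/2) x^T S x - mu^T x
# subject to sum(x) = 1. The KKT conditions give the linear system
# [[q*S, 1], [1^T, 0]] [x; lam] = [mu; 1].
mu = np.array([0.08, 0.12])          # expected returns (illustrative)
S = np.array([[0.10, 0.02],
              [0.02, 0.20]])         # covariance matrix (illustrative)
q = 1.0                              # risk-aversion parameter

n = len(mu)
M = np.zeros((n + 1, n + 1))
M[:n, :n] = q * S                    # quadratic (risk) block
M[:n, n] = 1.0                       # budget-constraint column
M[n, :n] = 1.0                       # budget-constraint row
b = np.concatenate([mu, [1.0]])

z = np.linalg.solve(M, b)            # the step a quantum solver would replace
x = z[:n]                            # optimal portfolio weights
print(x, x.sum())
```

The `np.linalg.solve` call is the classical stand-in for the quantum linear-system solver: HHL-type methods aim to prepare a state proportional to the solution vector `z` of this same system.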

Previous works have utilized the Harrow-Hassidim-Lloyd (HHL) algorithm for mean-variance portfolio optimization, which can be formulated as a linearly constrained quadratic-programming problem. However, several components of HHL are poorly suited to implementation on Noisy Intermediate-Scale Quantum (NISQ) systems, which reflect the current state of quantum computers. For instance, eigenvalue inversion is an expensive subcircuit to run on such near-term devices, which accumulate prohibitively many errors over the course of circuit execution.

The authors in this work introduce a classical/quantum hybrid HHL algorithm called NISQ-HHL, which is obtained by leveraging newly available hardware features and optimizing existing components of HHL. The objective is to execute and validate NISQ-HHL on real quantum hardware supporting mid-circuit measurement, qubit reset and reuse, and Quantum Conditional Logic (QCL). An enhanced formulation of the classical/quantum hybrid HHL algorithm is presented that integrates all the aforementioned features into the separate Quantum Phase Estimation (QPE) routine used for eigenvalue estimation. The end-to-end flow of NISQ-HHL consists of four steps:

i) The QCL-QPE procedure is used to construct a distribution over the estimates of the relevant eigenvalues with n-bit precision.

ii) Classical post-processing is performed on the resulting histogram to obtain the estimates of the relevant eigenvalues.

iii) The obtained n-bit estimates are used to determine rotation angles for the eigenvalue inversion circuit. The estimates are then mapped to a smaller number of bits (r) such that each rotation is conditioned on its corresponding r-bit estimate.

iv) A standard HHL procedure is executed using the circuit constructed in the previous step for the eigenvalue inversion.

To estimate the performance of NISQ-HHL on quantum hardware, the Hamiltonian is simulated using classical rather than quantum algorithms, and a circuit transpiler decomposes the circuit into basis gates relevant for, e.g., superconducting-based NISQ hardware.
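
Step iii can be sketched in a few lines. The helper below is hypothetical, not from the paper: it maps a list of resolved eigenvalue estimates to Ry rotation angles θ = 2 arcsin(C/λ), each labelled by a compact r-bit index rather than the full n-bit register pattern, so only one controlled rotation per relevant eigenvalue is needed.

```python
import numpy as np

def inversion_angles(eigen_estimates, r_bits, C=None):
    # Hypothetical helper: map resolved eigenvalue estimates to the
    # rotation angles of the HHL eigenvalue-inversion circuit, labelling
    # each rotation with a compact r-bit index instead of the full n-bit
    # register pattern.
    lams = sorted(eigen_estimates)
    if C is None:
        C = min(lams)                    # keep |C / lam| <= 1 for every estimate
    angles = {}
    for i, lam in enumerate(lams):
        label = format(i, f"0{r_bits}b")          # r-bit control label
        angles[label] = 2.0 * np.arcsin(C / lam)  # Ry angle writing C/lam
    return angles

# Two relevant eigenvalues, relabelled with a single control bit (r = 1).
print(inversion_angles([0.25, 0.75], r_bits=1))
```

With two relevant eigenvalues only r = 1 control bit is needed, versus the n = 4 bits of the original estimates; this is the source of the reduced rotation count compared to uniformly controlled rotations.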

The first step of NISQ-HHL is to obtain estimates of the relevant eigenvalues using QCL-QPE. This variant of QPE uses mid-circuit measurement, qubit reset and reuse, and Quantum Conditional Logic (QCL). Implementing QCL-QPE has two advantages: i) it reduces the number of ancillas to just one for arbitrary bit precision, and hence the number of qubits required, which matters in the NISQ era when few logical qubits are available; ii) it replaces two-qubit gates with one-qubit gates controlled by classical bits, reducing the need for qubit connectivity, SWAP gates, or qubit transport. An algorithm is also proposed to optimize the scaling parameter γ for the Hamiltonian simulation required by QCL-QPE, which leads to higher accuracy in resolving the eigenvalues. Compared to the uniformly controlled rotation approach, the NISQ-HHL implementation uses fewer rotations, and the circuit is therefore shallower.
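
The single-ancilla structure that QCL-QPE exploits can be illustrated with a classical simulation of iterative phase estimation, in which one ancilla is measured and reset each round and the feedback rotation is conditioned on the earlier, classically recorded outcomes. This is a textbook-style sketch under those assumptions, not the authors' code:

```python
import numpy as np

def iterative_qpe(phase, m_bits):
    # Classical simulation of single-ancilla iterative phase estimation:
    # the ancilla is measured and reset every round, and the feedback
    # rotation is conditioned on the classically recorded earlier bits.
    bits = [0] * (m_bits + 1)              # bits[j] = j-th binary digit
    for j in range(m_bits, 0, -1):         # least-significant digit first
        # Phase kicked onto the ancilla by controlled-U^(2^(j-1)).
        theta = 2.0 * np.pi * ((2 ** (j - 1)) * phase % 1.0)
        # Classically conditioned feedback removes already-measured digits.
        feedback = 2.0 * np.pi * sum(
            bits[j + k] / 2 ** (k + 1) for k in range(1, m_bits - j + 1))
        p1 = np.sin((theta - feedback) / 2.0) ** 2   # P(outcome 1) after H
        bits[j] = int(p1 > 0.5)            # deterministic for dyadic phases
    return sum(bits[j] / 2 ** j for j in range(1, m_bits + 1))

print(iterative_qpe(0.8125, 4))  # 0.8125 = 0.1101 in binary
```

The single classically-controlled feedback rotation per round is exactly what replaces the two-qubit controlled gates of textbook QPE, which is why qubit connectivity requirements drop.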

For an experimental demonstration, the authors consider a portfolio-optimization problem with two S&P 500 assets, implemented on the trapped-ion Honeywell System Model H1 after transpiling and optimizing the circuits from Qiskit to H1's native gates using Cambridge Quantum's pytket package. Experimental results show that QCL-QPE achieves high fidelity between the experimental and simulated measurement distributions at three-bit precision, and at four-bit precision QCL-QPE achieved higher fidelity than standard QPE. The NISQ-HHL eigenvalue-inversion circuit is also shown to be more efficient than the uniformly controlled rotation method, mainly due to the reduced number of controlled rotations and the smaller circuit depth. Moreover, the work demonstrates that NISQ-HHL can scale to larger portfolios than those supported on current quantum hardware.

The total complexity of NISQ-HHL is shown to be lower than that of previously proposed methods, such as quantum arithmetic and uniformly controlled rotation gates, at low qubit counts, although the authors were unable to show an asymptotic advantage. Nevertheless, the presented methodology brings the HHL implementation closer to current quantum hardware.



06 October 2021
## Local Minima in Quantum Neural Networks

A Quantum Neural Network (QNN) is a quantum circuit tuned through parameterized unitary operations, such as gates. QNNs can be seen as quantum analogues of classical Neural Networks (NNs) and are typically employed in the hope of speedups on hard computational Machine Learning (ML) problems; they have also shown promise for quantum chemistry and materials science. As in the classical case, the usefulness of QNNs depends on a combination of their expressivity and the effectiveness of the training procedure, which optimizes a loss function defined in terms of the measurement read-outs and the QNN parameters for a specific application. In general, QNN training is an NP-hard optimization problem.

So far, the investigation of QNN training has largely been a trial-and-error effort, empirically comparing the performance of standard classical optimizers on QNN loss functions. These studies are restricted to small system sizes by the limited access to quantum machines of reasonable size and the exponential cost of simulating them classically. Most available results involve special cases of QNNs, such as the Quantum Approximate Optimization Algorithm (QAOA), or extremely over-parameterized regimes. One prominent result, however, is that random parameter initialization leads to vanishing gradients at much smaller system sizes for QNNs than for classical Neural Networks. The challenge in training QNNs mainly arises from the highly non-convex nature of the corresponding loss functions.

In this paper, the authors conduct a quantitative investigation of the loss landscapes of QNNs as a way to study their trainability. The work focuses on the properties of local minima: i) how the number of local minima depends on the QNN architecture, and ii) whether these local minima are benign (close to the global minimum, or saddle points that can be escaped) or spurious (truly bad local minima that hinder training). The work provides theoretical proofs of the existence of spurious local minima for under-parametrized QNNs: the authors identify a general condition on under-parameterized QNNs under which a dataset can be constructed such that the number of spurious local minima scales exponentially with the number of parameters. This could be explained by interference between the qubits, which takes over the role that non-linear activations play in classical NNs in creating bad local minima. Training on such a dataset with simple variants of gradient-based methods is therefore hard. The construction is almost optimal in terms of the dependence of the number of local minima on the number of parameters, since an almost matching upper bound is derived: the authors find an upper bound of 2^p on the number of local minima for a QNN circuit with p parameters. This upper bound also demonstrates a clear separation between QNNs and classical NNs.

Alongside the theoretical results, the authors investigate the practical performance of common optimizers on the construction with p parameters acting on p qubits. In line with the theory, the experimental data suggests that spurious local minima do affect the optimization process and increase the difficulty of training. The experiments are small enough to run on an Intel Core i7-7700HQ processor (2.80 GHz) with 16 GB of memory, with training simulated in PyTorch.

The QNN instances are trained with three popular optimizers: Adam, RMSProp, and L-BFGS. The first two are variants of vanilla gradient descent with adaptive learning rates, while the last is an approximate Newton method that uses second-order gradient information. The results show that for all instances and optimizers, under random initialization, optimization converges with high probability to local minima with non-negligible suboptimality (i.e., differing from the global minimum by a non-negligible amount). Moreover, since the number of bad local minima grows exponentially with the number of parameters in the construction, the success probability should decay exponentially as well.
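
The exponential decay of the success probability can be reproduced on a toy separable surrogate of the paper's construction (not the actual QNN loss): each coordinate of the loss has one global and one spurious minimum, so the p-dimensional loss has 2^p local minima, and gradient descent from a random start finds the global one only with probability decaying exponentially in p. The loss, learning rate, and sampling ranges below are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy separable surrogate: each coordinate has one global and one spurious
# minimum, so the p-dimensional loss has 2^p local minima in total.
def grad(t):
    return 4 * t**3 - 2 * t + 0.2      # derivative of t^4 - t^2 + 0.2 t

def descend(t, lr=0.05, steps=500):
    for _ in range(steps):
        t = t - lr * grad(t)
    return t

global_min = descend(np.array([-1.0]))[0]   # global minimizer, roughly -0.75

rates = {}
for p in (1, 2, 4, 8):
    hits = 0
    for _ in range(200):
        t = descend(rng.uniform(-1.5, 1.5, size=p))
        hits += np.allclose(t, global_min, atol=1e-2)
    rates[p] = hits / 200                   # success shrinks roughly as q^p
print(rates)
```

Each coordinate independently lands in the global basin with some probability q < 1, so all p coordinates do so simultaneously with probability about q^p, mirroring the exponential failure rate observed for Adam, RMSProp, and L-BFGS in the paper's experiments.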

The overall results demonstrate the practical difficulty of training under-parameterized QNNs with gradient-based methods, highlighting that gradient-based black-box methods may not train QNNs efficiently. A potential direction for future work is designing QNN architectures with benign landscapes when the data distribution is known. Given the known fact that the landscape of a variational quantum ansatz can be benign when it is sufficiently parameterized, it would also be interesting to explore how the landscape changes as the number of parameters increases at fixed system size. Another direction of interest is alternative loss functions, such as those constructed from sums of outputs of QNNs with the same architecture but different settings (a common setting when solving differential equations with QNNs, for example). Furthermore, exploring algorithms beyond gradient-based methods that can provably and efficiently solve the optimization problem could lead to significant improvements.



30 August 2021
## Quantum Alphatron

Quantum machine learning has recently become one of the prominent areas of quantum computing, where known quantum algorithms can be generalized to attack important problems. Quantum algorithms for specific problems such as search and integer factoring can be generalized into important subroutines for linear algebra and machine learning. More precisely, quantum search generalizes to amplitude amplification, enabling speedups for sampling and estimation tasks, while quantum factoring contains the phase estimation subroutine, which can be used to decompose a matrix into eigenvalues and eigenvectors. Quantum kernel methods and quantum gradient computation are also being actively researched for various machine learning applications.

Many quantum algorithms use heuristic methods similar to those of classical machine learning, and for heuristics it is difficult to obtain provable guarantees on the quality of learning and the run time. Some quantum algorithms, however, retain the provable learning guarantees of their classical counterparts while at the same time admitting provable quantum speedups. One such algorithm is presented in this paper: the classical Alphatron algorithm enhanced with quantum subroutines. Alphatron is a gradient-descent-like algorithm for isotonic regression with the inclusion of kernel functions. It can learn a kernelized, non-linear concept class of functions with a bounded noise term, and can hence be used to learn a two-layer neural network in which one layer of activation functions feeds into a single activation function. In this work, the authors provide quantum speedups for Alphatron and its application to non-linear classes of functions and two-layer neural networks.

The work considers pre-computation of the kernel matrix used in the algorithm and presents three variants: Alphatron with pre-computation (Pre), with approximate pre-computation (Approx Pre), and with quantum-estimation pre-computation (Q-Pre). Each variant improves the run-time complexity through pre-computation and quantum estimation, successively improving on the original Alphatron. The algorithms are analyzed in the regime where the data dimension n is large compared to the other parameters, which is relevant for many practical applications. With pre-computation, the complexity term that is quadratic in the dataset size m and linear in the dimension n loses a factor of T (the number of iterations), improving performance. Furthermore, quantum amplitude estimation yields a quadratic speedup in the dimension.
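
A minimal classical sketch of Alphatron with a precomputed kernel makes the saved factor of T concrete: the m x m kernel matrix is built once, outside the training loop, so each of the T iterations costs O(m^2) instead of O(m^2 n). The RBF kernel, hyperparameters, and synthetic data below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(2)

def rbf_kernel(X):
    # Pre-computation step: build the m x m kernel matrix once, outside
    # the training loop, so each of the T iterations costs O(m^2), not
    # O(m^2 n) -- the factor the pre-computation variants remove.
    sq = np.sum(X**2, axis=1)
    return np.exp(-(sq[:, None] + sq[None, :] - 2 * X @ X.T))

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def alphatron(K, y, lam=4.0, T=500):
    # Gradient-descent-like isotonic-regression update on dual weights:
    # each iteration nudges alpha toward matching predictions to labels.
    alpha = np.zeros(len(y))
    for _ in range(T):
        alpha += (lam / len(y)) * (y - sigmoid(K @ alpha))
    return alpha

# Synthetic task: labels given by a smooth non-linear function of the input.
X = rng.normal(size=(100, 5))
y = sigmoid(np.sin(X[:, 0]))
K = rbf_kernel(X)
alpha = alphatron(K, y)
err = np.mean((sigmoid(K @ alpha) - y) ** 2)
print(round(err, 6))
```

The quantum variants replace the kernel-entry computations in `rbf_kernel` with amplitude-estimation subroutines; the loop over dual weights stays classical and unchanged.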

The work further exploits a setting where the samples are provided via quantum query access, harnessing quantum subroutines to estimate the entries of the kernel matrices used in the algorithm. This access can be provided by quantum random access memory (QRAM) via the Grover-Rudolph procedure. Such a device stores the data in (classical) memory cells but allows superposition queries to the data. The set-up cost is proportional to the length of the vector, but subsequent preparation of the superposition state runs in time logarithmic in that length.

The quantum subroutines used in this work are adaptations of the amplitude estimation algorithm. The results show that the learning guarantees are preserved despite the erroneous estimation of the kernel matrices. There are also estimations inside the main loop of the algorithm that could be replaced with quantum subroutines while keeping the main loop intact. Moreover, when the data dimension n is much larger than the other parameters, quantum pre-computation requires asymptotically more time than Alphatron with Kernel itself, since the same Alphatron with Kernel algorithm is run after the kernel matrices are prepared; optimizing Alphatron with Kernel is therefore not cost-effective once the cost of preparing data of size n is taken into account. Finally, a potential quantum speedup for Alphatron with Kernel requires further exploration, especially for the case where pre-computation occurs prior to the algorithm.

Many quantum algorithms use heuristic methods, much like classical machine learning. In the classical regime it is difficult to obtain provable guarantees on both the quality of learning and the run time. In the quantum regime, however, some algorithms retain the provable learning guarantees of their classical counterparts while also admitting provable quantum speedups. One such quantum algorithm, inspired by its classical analogue, is presented in this paper: the Alphatron, a classical algorithm enhanced with quantum subroutines. It is a gradient-descent-like algorithm for isotonic regression with the inclusion of kernel functions. The algorithm can learn a kernelized, non-linear concept class of functions under a bounded noise term, and can hence be used to learn a two-layer neural network in which one layer of activation functions feeds into a single activation function. In this work, the authors provide quantum speedups for the Alphatron and for its application to non-linear function classes and two-layer neural networks.

The work considers pre-computation of the kernel matrix used in the algorithm and provides three algorithms: Alphatron with pre-computation (Pre), with approximate pre-computation (Approx Pre), and with quantum-estimation pre-computation (Q-Pre). Each improves the run-time complexity of the original Alphatron through pre-computation and quantum estimation. The algorithms are analyzed in the regime where the data dimension n is large compared with the other parameters, which is relevant to many practical applications. With pre-computation, the complexity term that is quadratic in the dataset size m and linear in the dimension n loses a factor of T (the number of iterations), improving performance. Furthermore, quantum amplitude estimation yields a quadratic speedup in the dimension.
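The effect of pre-computation can be sketched in a few lines. The following is a simplified classical illustration, not the paper's implementation (the linear kernel, sigmoid activation, learning rate, and iteration count are all assumptions chosen for the toy example): once the m×m kernel matrix is built at cost O(m²n), each of the T Alphatron-style iterations touches only the matrix, costing O(m²), so the O(m²n) term is paid once instead of T times.

```python
import numpy as np

def alphatron_precomputed(X, y, u, T=100, lam=1.0):
    """Alphatron-style main loop with the kernel matrix pre-computed.
    Building K costs O(m^2 n); each of the T iterations then costs
    only O(m^2), instead of O(m^2 n) when kernels are recomputed."""
    m = len(X)
    K = X @ X.T                        # pre-computation (linear kernel here)
    alpha = np.zeros(m)
    for _ in range(T):
        h = u(K @ alpha)               # hypothesis on all training samples
        alpha += (lam / m) * (y - h)   # additive, gradient-descent-like update
    return alpha, K

# toy usage: learn a sigmoid of a linear function
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
y = sigmoid(X @ rng.normal(size=5))
alpha, K = alphatron_precomputed(X, y, sigmoid)
train_mse = np.mean((sigmoid(K @ alpha) - y) ** 2)
```

The quantum variants replace the entry-wise construction of K with amplitude-estimation subroutines; the loop itself stays classical.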

The work further exploits a setting where the samples are provided via quantum query access, which makes it possible to harness quantum subroutines to estimate the entries of the kernel matrices used in the algorithm. Such access can be provided by quantum random access memory (QRAM) via the Grover-Rudolph procedure. A QRAM stores the data in (classical) memory cells but allows superposition queries to the data. Setting it up costs resources proportional to the length of the vector, but it can then provide the superposition state in a run time logarithmic in that length.
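To see where the Grover-Rudolph procedure gets its rotation angles, the recursion can be emulated classically. This is a sketch assuming a nonnegative probability vector of length 2^n; the real circuit applies each angle as a controlled rotation, one recursion level per qubit:

```python
import numpy as np

def gr_state(p):
    """Classical emulation of the Grover-Rudolph recursion: at each node,
    a rotation splits the probability mass between the left and right half
    of the register, so the final amplitudes equal sqrt(p / sum(p))."""
    if len(p) == 1:
        return np.array([1.0])
    half = len(p) // 2
    pl = p[:half].sum() / p.sum()        # probability mass of the left branch
    theta = 2 * np.arccos(np.sqrt(pl))   # the controlled-rotation angle
    left = np.cos(theta / 2) * (gr_state(p[:half]) if pl > 0 else np.zeros(half))
    right = np.sin(theta / 2) * (gr_state(p[half:]) if pl < 1 else np.zeros(half))
    return np.concatenate([left, right])

p = np.array([0.1, 0.2, 0.3, 0.4])
amps = gr_state(p)                       # amplitudes of the prepared state
```

Computing all the angles touches every entry of the vector once, which is the linear setup cost; once they are stored, queries to the superposition state are cheap.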

The quantum subroutines used in this work are adaptations of the amplitude estimation algorithm. The results show that the learning guarantees are preserved despite the erroneous estimation of the kernel matrices. Estimations inside the main loop of the algorithm can also be replaced with quantum subroutines while keeping the loop itself intact. Moreover, when the data dimension n is much larger than the other parameters, quantum pre-computation requires asymptotically more time than Alphatron with Kernel itself, since the same Alphatron with Kernel algorithm is run after the kernel matrices have been prepared. Optimizing Alphatron with Kernel is therefore not cost-effective once the cost of preparing data of size n is taken into account. Finally, further exploration of a potential quantum speedup is required for Alphatron with Kernel, especially for the case where pre-computation occurs prior to the algorithm.
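The speedup from amplitude estimation can be illustrated with a toy comparison. This sketch is not the paper's subroutine: it contrasts classical sampling, whose statistical error shrinks as O(1/√M) in the number M of oracle calls, with an idealized amplitude estimation that returns the nearest point on a sin²(πk/M) grid, giving error O(1/M):

```python
import numpy as np

rng = np.random.default_rng(7)
a = 0.3                                   # true probability to be estimated
theta = np.arcsin(np.sqrt(a))             # a = sin^2(theta)

for M in (10, 100, 1000):
    # classical Monte Carlo: M samples, statistical error O(1/sqrt(M))
    classical = rng.binomial(M, a) / M
    # idealized amplitude estimation with M oracle calls: the estimate is
    # the closest grid point sin^2(pi k / M), so the error is O(1/M)
    k = round(theta * M / np.pi)
    quantum = np.sin(np.pi * k / M) ** 2
    print(M, abs(classical - a), abs(quantum - a))
```

This quadratic relationship between error and query count is the mechanism behind the quadratic speedup along the dimension quoted above.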


31 January 2021
##
Active Quantum Research Areas: Quantum Machine Learning

In this edition of Active Quantum Research Areas (AQRAs), we highlight recent research in the quantum community on quantum machine learning; in particular, we focus here on training, expressibility and potential for quantum advantage.

Quantum Machine Learning (QML) techniques have been rigorously researched in the past few years, showing great promise in a number of applications and using various quantum algorithmic techniques and potential speedups. However, the advantage of quantum over classical machine learning is not always evident and is still being heavily investigated. In this blog post, we briefly summarize and contextualize several recent arXiv contributions in the areas of quantum machine learning, quantum neural networks, quantum generative adversarial networks, and the training of their variational quantum circuits.

Research such as that presented in [1] suggests that a potentially exponential quantum advantage is possible for certain tasks when the goal is to achieve accurate prediction on all inputs. While training techniques differ from those of classical machine learning, much of the focus is on developing new models and algorithms that exploit quantum mechanical phenomena. For instance, hybrid quantum-classical QML protocols can offer significantly enhanced potential. Similarly, features like the expressivity of quantum neural networks can be exploited to gain a quantum advantage. More precisely, the relationship between the expressibility and trainability of highly expressive ansatze can provide useful insight into QML.

It was shown that the potential advantage of quantum ML over classical ML is limited in the specific case of predicting outcomes of quantum experiments with a specified average-case prediction error. The recent findings by Huang et al. [1] establish that quantum ML can have an exponential advantage over classical ML for certain problems where the objective is to achieve a specified worst-case prediction error. The work shows that for any input distribution, a classical ML model can provide accurate predictions on average by accessing a quantum process a number of times comparable to the optimal quantum ML model. These results and analyses complement recent studies of the computational complexity of classical ML models trained with data from quantum experiments. Together, these rigorous results indicate that classical ML models trained on quantum measurement data can be surprisingly powerful. Such a hybrid protocol can potentially address challenging quantum problems in physics, chemistry, and materials science on near-term experimental platforms.

Another direction being investigated is the effort to establish a quantum neural network (QNN) analogue of the universal approximation theorem, which provides the foundation for the expressivity of classical neural networks. Recent work by Wu et al. [2] discusses the expressivity of QNNs for general learning targets (for example, observables of input wave functions), focusing on fully connected architectures. The findings suggest that an accurate approximation is possible only when the input wave functions in the dataset do not exhaust the entire Hilbert space that the quantum circuit acts on; more precisely, the Hilbert space dimension of the former has to be less than half of the Hilbert space dimension of the latter. If the dataset cannot satisfy this requirement, the expressivity can be restored by adding one ancillary qubit whose wave function is always fixed at the input. Hybrid quantum-classical QML protocols with mixed target states can be exploited on already available experimental platforms, with promising potential applications in quantum computing and noise sensing.

Another aspect of quantum ML that is being explored is quantum generative adversarial learning (QGAN) which is a promising strategy to use quantum devices for (quantum) estimation or generative machine learning tasks. The recent work by Braccia et al. [3], discusses the convergence behaviors of a QGAN training process and proposes new algorithms for QGAN training for mixed states which are expected to work better than previously used techniques. It is observed that the states obtained from currently available NISQ devices may display a “chaotic” behaviour during training, without achieving convergence. Results from this work show that algorithms based on the adaptation of a gradient-based technique allow provable convergence with bilinear score functions while the algorithms based on convex optimization techniques are shown to be especially suited for non-parametric states that are iteratively updated via a quantum circuit. Also, endowing the scheme with the ability to process entangled copies of the target state results in enhanced performance.

In Ref. [4] it is shown how generative models can be enhanced via quantum correlations, possibly paving the way to a quantum advantage in a generative ML setting. The authors show how quantum correlations offer a powerful resource, using a numerical example on a practical problem, and even prove a separation in expressive power between a particular classical generative model and its quantum counterpart. The separation can be explained in terms of quantum nonlocality and quantum contextuality. Furthermore, there may be classical ML improvements drawing from quantum information foundations (so-called 'quantum-inspired' research, for example using matrix product states or tensor networks). As an outlook, the authors show how their specific proof and example can be extended to more general model paradigms. It would be great to see practical implementations of a generative modeling advantage on hardware in the coming years.

In variational quantum algorithms, highly expressive ansatze can access close approximations of desired solutions to a range of variational problems and provide a flexible paradigm for programming near-term quantum computers. However, the circuit ansatz must have sufficiently large gradients to allow for proper training; much has been researched about the 'barren plateau' phenomenon in this context, and it is seen as a major challenge in variational QML and QGANs. Holmes et al. [5] derive a fundamental relationship between two essential ansatz properties: expressibility and trainability. The resulting bounds indicate that highly expressive ansatze exhibit flatter cost landscapes and are therefore harder to train. On one hand, making a quantum neural network more expressive lets it represent a richer class of functions; on the other hand, larger expressivity leads to smaller gradients, which cause the barren plateau problem and convergence issues. Strategies such as structured ansatze, physics-informed circuits, correlating parameters, and restricting rotation angles have been proposed to avoid or mitigate barren plateaus. Further exploring these and other strategies will be an important direction for future research.
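The barren-plateau effect can be probed numerically. The sketch below is a toy statevector simulation, not taken from [5]; the RY-plus-CZ hardware-efficient ansatz, the ⟨Z⟩ cost on qubit 0, and the layer count are all assumptions. It estimates the variance of a parameter-shift gradient over random parameter settings for increasing qubit counts; in experiments of this kind the variance typically decays exponentially with the number of qubits.

```python
import numpy as np

rng = np.random.default_rng(42)

def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def apply_1q(psi, gate, q, n):
    # apply a single-qubit gate to qubit q of an n-qubit statevector
    psi = np.tensordot(gate, psi.reshape([2] * n), axes=([1], [q]))
    return np.moveaxis(psi, 0, q).reshape(-1)

def apply_cz(psi, q1, q2, n):
    psi = psi.reshape([2] * n).copy()
    idx = [slice(None)] * n
    idx[q1], idx[q2] = 1, 1
    psi[tuple(idx)] *= -1.0
    return psi.reshape(-1)

def cost(thetas, n, layers):
    # hardware-efficient ansatz: RY layers + a CZ ladder; cost = <Z> on qubit 0
    psi = np.zeros(2 ** n); psi[0] = 1.0
    k = 0
    for _ in range(layers):
        for q in range(n):
            psi = apply_1q(psi, ry(thetas[k]), q, n); k += 1
        for q in range(n - 1):
            psi = apply_cz(psi, q, q + 1, n)
    probs = np.abs(psi.reshape(2, -1)) ** 2
    return probs[0].sum() - probs[1].sum()

def grad_variance(n, layers, samples=100):
    # variance of a parameter-shift gradient at random parameter settings
    grads = []
    for _ in range(samples):
        th = rng.uniform(0, 2 * np.pi, n * layers)
        plus, minus = th.copy(), th.copy()
        plus[0] += np.pi / 2; minus[0] -= np.pi / 2
        grads.append(0.5 * (cost(plus, n, layers) - cost(minus, n, layers)))
    return np.var(grads)

for n in (2, 4, 6):
    print(n, grad_variance(n, layers=n + 2))
```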

Also recently, the connection between quantum machine learning models and kernel methods was explored more rigorously than before. Oftentimes, a variational quantum circuit in QML is compared to a neural network (and aptly named a 'QNN'). Such circuits typically consist of an encoding circuit, a variational ansatz, and a measurement. Conceptually, however, these quantum models really consist of only two parts: the data encoding/embedding and the measurement. Training a quantum model is then simply the problem of finding the measurement that minimises a data-dependent cost function. The author of [6] argues that such quantum models are much more closely related to kernel methods than to NNs. Besides being conceptually interesting, this view also yields several benefits: for example, it refocuses the search for a potential quantum advantage over classical methods on the way data is encoded into quantum states. Also, kernel-based training is guaranteed to find the global optimum, which is typically hard for variational circuit training. The main drawback of kernel-based methods is the same reason NN methods superseded kernel methods in big-data applications: while NN training is theoretically hard, in practice O(N) training time can yield good results, whereas kernel training requires a run time of at least O(N^2) in a classical feedback loop, even though the training is convex ('efficient'). However, the model could also be trained using quantum algorithms, with a known quadratic speedup to O(N), although such protocols require large fault-tolerant quantum hardware.
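To make the kernel picture concrete, here is a minimal sketch. It is an illustration only; the single-qubit angle-encoding feature map and the ridge regulariser are assumptions, not the construction in [6]. Finding the optimal measurement reduces to convex kernel ridge regression, but requires evaluating all O(N²) Gram-matrix entries:

```python
import numpy as np

def embed(x):
    # angle-encode a scalar into a single-qubit state |phi(x)>
    return np.array([np.cos(x / 2), np.sin(x / 2)])

def quantum_kernel(x1, x2):
    # fidelity kernel k(x1, x2) = |<phi(x1)|phi(x2)>|^2
    return np.abs(embed(x1) @ embed(x2)) ** 2

rng = np.random.default_rng(1)
X = rng.uniform(0, np.pi, 30)
y = np.sin(X)

# the O(N^2) step: every pair of data points needs a kernel evaluation
K = np.array([[quantum_kernel(a, b) for b in X] for a in X])

# kernel ridge regression: convex, with a closed-form global optimum
alpha = np.linalg.solve(K + 1e-3 * np.eye(len(X)), y)
mse = np.mean((K @ alpha - y) ** 2)
```

Here the target sin(x) happens to lie in the span of this feature map, so the fit is essentially exact; the point is that training is a convex solve over the Gram matrix, whose construction scales quadratically in the dataset size.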

Quantum machine learning is a promising candidate for quantum advantage, and recent results indicate the possibility of an advantage for certain tasks. However, this is still ongoing research with various aspects that need to be investigated. A number of algorithms have been proposed claiming enhanced performance in numerical experiments, but the training and learning problems in particular remain essential directions for future work. Similarly, further generalizations of learning targets, regression, and classification tasks are required for implementation on experimental platforms.

References:

[1] H.-Y. Huang, R. Kueng, J. Preskill, “Information-theoretic bounds on quantum advantage in machine learning”, arXiv:2101.02464 (Jan. 2021)

[2] Y. Wu, J. Yao, P. Zhang, H. Zhai, “Expressivity of Quantum Neural Networks”, arXiv:2101.04273 (Jan. 2021)

[3] P. Braccia, F. Caruso, L. Banchi, “How to enhance quantum generative adversarial learning of noisy information”, arXiv:2012.05996 (Dec. 2020)

[4] X. Gao, E. R. Anschuetz, S.-T. Wang, J. I. Cirac, M. D. Lukin, “Enhancing Generative Models via Quantum Correlations”, arXiv:2101.08354 (Jan. 2021)

[5] Z. Holmes, K. Sharma, M. Cerezo, P. J. Coles, “Connecting ansatz expressibility to gradient magnitudes and barren plateaus”, arXiv:2101.02138 (Jan. 2021)

[6] M. Schuld, “Supervised quantum machine learning models are kernel methods”, arXiv:2101.11020 (Jan. 2021)




Copyright © Qu & Co BV