08 January 2022
##
Quantum Capsule Networks

Artificial Intelligence incorporates two important complementary approaches namely connectionism and symbolism. Connectionism aims to model intelligence as an emergent phenomenon by connecting a large number of neurons while symbolic AI is based on logic, deduction, and higher-level symbolic representations. Recent attempts focus on uniting these two approaches leading to the development of Capsule Networks (CapsNets).

The building block of capsule networks is a so-called ‘capsule’ which is a group of neurons represented by a vector to encode different features of an observable entity such as shape, color, texture, deformation, etc. The information is then transferred through capsule layers hierarchically i.e. the capsule in the higher-level layer is predicated upon its geometric relationship with the lower-level one. This routing mechanism used in CapsNets can preserve the geometric relationships amongst entities and hence have more intrinsic explainability than regular (classical) Neural Networks (NNs). Inspired by the advantages of classical CapsNets, the authors in this paper propose a quantum capsule network (QCapsNet) to explore a potential quantum advantage over the classical counterpart. In QCapsNets, interacting qubits are encapsulated into a capsule as the building block of the architecture.

QCapsNets consist of three crucial components; i) the preprocessing layers, ii) the capsule layers, and iii) the quantum dynamic routing process. Firstly, the model’s input is fed into the preprocessing layers to extract some preliminary features. These features are then encapsulated into several quantum states and then sent to the capsule layers. Inside each capsule, there are a group of interacting qubits building up a sub-quantum neural network (sub-QNN). To enable the feed-forward process between the two adjacent capsule layers, a quantum dynamic routing algorithm is proposed. This routing algorithm operates with a certain routing probability which is dynamically updated with the geometric relationship between capsule layers.

For benchmarking the performance of QCapsNets against classification accuracy, numerical experiments were performed on three proposed models of QCapsNets, each with different sub-QNNs i.e. i) parameterized quantum circuit (PQC), ii) deep quantum feed-forward neural network (DQFNN), and iii) post-selection deep quantum feedforward neural network (post-DQFNN). The QCapsNets is then applied to the classification of handwritten digit images in the MNIST dataset. The results demonstrate that the inaccuracy of all three QCapsNets is reduced to less than 2 × 10^(−2) as the number of parameters increases. Also, the proposed quantum dynamic routing algorithm is able to evaluate the distance measure of quantum states in parallel, and thus is suggested to achieve an exponential speedup over its classical counterpart, given the same number of parameters. No rigorous proof has been established with this regard though.

Furthermore, the work tests the efficiency of QCapsNets in making critical decisions. To achieve this, the QCapsNet is attached to a classical encoder (CNN) and a classical decoder (feed-forward network). The whole network is used to reconstruct the input image from the MNIST dataset. The results demonstrate that the potential to encode information in each capsule seems to grow close to exponentially, just as the Hilbert space grows exponentially with respect to the number of qubits, implying a potential for quantum advantage compared to classical CapsNets.

This work suggests QCapsNets as an efficient approach demonstrating state-of-the-art accuracy among quantum classifiers. It is claimed that by means of geometric relationships, QCapsNets can capture the essential features of the input data, and then encapsulate them into the output capsule of which one specific subspace could correspond to a human-understandable feature of the input data.

QCapsNets can act as guide towards explainable quantum artificial intelligence, with some potential directions to explore being consideration of other distance measures to quantify the geometric relationships between capsule states, and exploration of the robustness of QCapsNets against adversarial perturbations as compared to traditional quantum classifiers.

The building block of capsule networks is a so-called ‘capsule’ which is a group of neurons represented by a vector to encode different features of an observable entity such as shape, color, texture, deformation, etc. The information is then transferred through capsule layers hierarchically i.e. the capsule in the higher-level layer is predicated upon its geometric relationship with the lower-level one. This routing mechanism used in CapsNets can preserve the geometric relationships amongst entities and hence have more intrinsic explainability than regular (classical) Neural Networks (NNs). Inspired by the advantages of classical CapsNets, the authors in this paper propose a quantum capsule network (QCapsNet) to explore a potential quantum advantage over the classical counterpart. In QCapsNets, interacting qubits are encapsulated into a capsule as the building block of the architecture.

QCapsNets consist of three crucial components; i) the preprocessing layers, ii) the capsule layers, and iii) the quantum dynamic routing process. Firstly, the model’s input is fed into the preprocessing layers to extract some preliminary features. These features are then encapsulated into several quantum states and then sent to the capsule layers. Inside each capsule, there are a group of interacting qubits building up a sub-quantum neural network (sub-QNN). To enable the feed-forward process between the two adjacent capsule layers, a quantum dynamic routing algorithm is proposed. This routing algorithm operates with a certain routing probability which is dynamically updated with the geometric relationship between capsule layers.

For benchmarking the performance of QCapsNets against classification accuracy, numerical experiments were performed on three proposed models of QCapsNets, each with different sub-QNNs i.e. i) parameterized quantum circuit (PQC), ii) deep quantum feed-forward neural network (DQFNN), and iii) post-selection deep quantum feedforward neural network (post-DQFNN). The QCapsNets is then applied to the classification of handwritten digit images in the MNIST dataset. The results demonstrate that the inaccuracy of all three QCapsNets is reduced to less than 2 × 10^(−2) as the number of parameters increases. Also, the proposed quantum dynamic routing algorithm is able to evaluate the distance measure of quantum states in parallel, and thus is suggested to achieve an exponential speedup over its classical counterpart, given the same number of parameters. No rigorous proof has been established with this regard though.

Furthermore, the work tests the efficiency of QCapsNets in making critical decisions. To achieve this, the QCapsNet is attached to a classical encoder (CNN) and a classical decoder (feed-forward network). The whole network is used to reconstruct the input image from the MNIST dataset. The results demonstrate that the potential to encode information in each capsule seems to grow close to exponentially, just as the Hilbert space grows exponentially with respect to the number of qubits, implying a potential for quantum advantage compared to classical CapsNets.

This work suggests QCapsNets as an efficient approach demonstrating state-of-the-art accuracy among quantum classifiers. It is claimed that by means of geometric relationships, QCapsNets can capture the essential features of the input data, and then encapsulate them into the output capsule of which one specific subspace could correspond to a human-understandable feature of the input data.

QCapsNets can act as guide towards explainable quantum artificial intelligence, with some potential directions to explore being consideration of other distance measures to quantify the geometric relationships between capsule states, and exploration of the robustness of QCapsNets against adversarial perturbations as compared to traditional quantum classifiers.

Published in
Blog

Tagged under

15 January 2022
##
Decompositional QGNN

Quantum Machine Learning is a field that aims to solve Machine Learning problems with quantum algorithms and quantum computing. There exists a large variety of quantum algorithms used in solving such problems, with the most common being Quantum Neural Networks (QNNs). Varieties of QNNs include Quantum Convolutional Neural Network (QCNN), Quantum Generative Adversarial Network (QGAN), and Quantum Graph Neural Network (QGNN). As the names suggest, these algorithms are inspired from their classical counterparts. In this paper, the proposed algorithm is a hybrid quantum-classical algorithm for graph-structured data, which is called Decompositional Quantum Graph Neural Network (DQGNN), and is based on the classical GNN, which is one of the most popular learning tools, especially for dealing with non-Euclidean data. The main differentiator between the DQGNN and previous QGNN implementations is the decompositional aspect of this algorithm, that allows the computation of larger-sized graph-structured data with a fixed-sized quantum circuit with limited number of qubits, regardless of the size of the data. Algorithms based on GNNs have found applications in social network analysis, drug design, combinatorial optimization, and multi-agent learning.

GNNs function on the basis of the neighborhood aggregation strategy, which involves updating the representation of a graph node by recursively aggregating the representations of its neighbors. So far, a variety of GNNs has been realized such as the Relational Graph Convolutional Networks (R-GCN), Graph Isomorphism Networks (GIN), edGNN, Random Walk Graph Neural Networks (RW-GNN), and Factorizable Graph Convolutional Networks (Factor GCN). However, these approaches embed nodes represented as points in a Euclidean space leading to significant structural distortion and information loss during computation. On the other hand, DQGNN is a hybrid quantum-classical algorithm for graph-structured data, that focuses on effective and scalable learning.

DQGNN utilizes tensor and unitary matrices to implement a quantum GNN by designing a quantum circuit and replacing the Euclidean weight matrices of the GNN with unitary matrices, i.e. quantum gates. Utilizing tensor products offers two advantages: i) enlarges the node representation space exponentially such that nodes with different features can be mapped to different representations, and ii) requires significantly fewer model parameters than related deep learning models. DQGNN consists of the following processing steps; 1) A classical computer is used to decompose graphs into subgraphs. 2) The circuit of the mapping method is trained with node features. After training, the node features of the subgraphs are mapped into quantum states by applying the circuit. 3) The quantum circuit is implemented on a quantum device to compute the representations of the different subgraphs sequentially. 4) Finally, a classical computer computes the entropies of the individual representations and combines them. All these steps are iterated for obtaining graph representations for classification.

Besides the proposed decompositional processing strategy, a trainable method is also proposed for mapping data from a Euclidean space to a Hilbert space, which can maintain distance relations and reduce information loss during the mapping.

DQGNN is evaluated experimentally on six graph classification benchmarks. The compared methods include graph kernels, deep learning methods, and quantum machine learning methods. DQGNN is also compared with recent baselines for graph classification: 1) Kernel-based methods: WL subtree kernel, and Subgraph Matching Kernel (CSM), 2) Representative GCNs, including Deep Graph CNN’s (DGCNN) and Relational Graph Convolutional Networks (R-GCN), 3) Representative GNNs, i.e., Graph Isomorphism Network (GIN), edGNN, and RW-GNN. A derivative-free optimization method, UOBYQA is utilized for training DQGNN. A quantum simulator is used to implement quantum machine learning models on a classical computer utilizing codes by Qiskit and Tensorflow Quantum.

The results demonstrate that DQGNN achieves the best results on 5 out of 6 benchmarks with accuracy close to GNNs, while requiring fewer parameters in comparison. As compared to the GNN model with the least parameters referenced as DGCNN, the authors find that DQGNN achieves similar performance with only 1.68% parameters of the classical counterpart. The performance is also compared with alternative quantum machine learning methods like QSVM (Quantum-enhanced Support Vector Machine) and QCNN (Quantum Convolutional Neural Networks) using the graphlet count vector as the inputs (Gra+QSVM, Gra+QCNN). Results show that Gra+QSVM, Gra+QCNN and Gra+QCNN with the proposed trainable mapping method (Gra+QCNN w/M) demonstrate no improvement on DQGNN. However, as compared to Gra+QCNN, Gra+QCNN w/M achieve higher accuracy, therefore highlighting the effectiveness of the proposed mapping method. For example, in the case of the PROTEINS dataset, the accuracy achieved by DQGNN is 9% higher than that of DQGNN without the mapping. This improvement can be resulted from less information loss in the former case.

DQGNN is a method that can be applied to any application that includes graph-structured data, and offers many advantages compared to other contemporary algorithms. The main advantage lies in the ability to divide the graph into sub-graphs and process each sub-graph one at a time with DQGNN, while using the same quantum circuit. Such methods can help with potential scalability issues that other algorithms might exhibit. Furthermore, it is speculated that the proposed mapping method can enhance the accuracy of alternative quantum machine learning methods, because of less information loss.

DQGNN provides a good insight into transitioning to larger-sized graph-structured data during the NISQ era (limited qubits); requiring fewer parameters but having similar performance to other contemporary quantum algorithms.

GNNs function on the basis of the neighborhood aggregation strategy, which involves updating the representation of a graph node by recursively aggregating the representations of its neighbors. So far, a variety of GNNs has been realized such as the Relational Graph Convolutional Networks (R-GCN), Graph Isomorphism Networks (GIN), edGNN, Random Walk Graph Neural Networks (RW-GNN), and Factorizable Graph Convolutional Networks (Factor GCN). However, these approaches embed nodes represented as points in a Euclidean space leading to significant structural distortion and information loss during computation. On the other hand, DQGNN is a hybrid quantum-classical algorithm for graph-structured data, that focuses on effective and scalable learning.

DQGNN utilizes tensor and unitary matrices to implement a quantum GNN by designing a quantum circuit and replacing the Euclidean weight matrices of the GNN with unitary matrices, i.e. quantum gates. Utilizing tensor products offers two advantages: i) enlarges the node representation space exponentially such that nodes with different features can be mapped to different representations, and ii) requires significantly fewer model parameters than related deep learning models. DQGNN consists of the following processing steps; 1) A classical computer is used to decompose graphs into subgraphs. 2) The circuit of the mapping method is trained with node features. After training, the node features of the subgraphs are mapped into quantum states by applying the circuit. 3) The quantum circuit is implemented on a quantum device to compute the representations of the different subgraphs sequentially. 4) Finally, a classical computer computes the entropies of the individual representations and combines them. All these steps are iterated for obtaining graph representations for classification.

Besides the proposed decompositional processing strategy, a trainable method is also proposed for mapping data from a Euclidean space to a Hilbert space, which can maintain distance relations and reduce information loss during the mapping.

DQGNN is evaluated experimentally on six graph classification benchmarks. The compared methods include graph kernels, deep learning methods, and quantum machine learning methods. DQGNN is also compared with recent baselines for graph classification: 1) Kernel-based methods: WL subtree kernel, and Subgraph Matching Kernel (CSM), 2) Representative GCNs, including Deep Graph CNN’s (DGCNN) and Relational Graph Convolutional Networks (R-GCN), 3) Representative GNNs, i.e., Graph Isomorphism Network (GIN), edGNN, and RW-GNN. A derivative-free optimization method, UOBYQA is utilized for training DQGNN. A quantum simulator is used to implement quantum machine learning models on a classical computer utilizing codes by Qiskit and Tensorflow Quantum.

The results demonstrate that DQGNN achieves the best results on 5 out of 6 benchmarks with accuracy close to GNNs, while requiring fewer parameters in comparison. As compared to the GNN model with the least parameters referenced as DGCNN, the authors find that DQGNN achieves similar performance with only 1.68% parameters of the classical counterpart. The performance is also compared with alternative quantum machine learning methods like QSVM (Quantum-enhanced Support Vector Machine) and QCNN (Quantum Convolutional Neural Networks) using the graphlet count vector as the inputs (Gra+QSVM, Gra+QCNN). Results show that Gra+QSVM, Gra+QCNN and Gra+QCNN with the proposed trainable mapping method (Gra+QCNN w/M) demonstrate no improvement on DQGNN. However, as compared to Gra+QCNN, Gra+QCNN w/M achieve higher accuracy, therefore highlighting the effectiveness of the proposed mapping method. For example, in the case of the PROTEINS dataset, the accuracy achieved by DQGNN is 9% higher than that of DQGNN without the mapping. This improvement can be resulted from less information loss in the former case.

DQGNN is a method that can be applied to any application that includes graph-structured data, and offers many advantages compared to other contemporary algorithms. The main advantage lies in the ability to divide the graph into sub-graphs and process each sub-graph one at a time with DQGNN, while using the same quantum circuit. Such methods can help with potential scalability issues that other algorithms might exhibit. Furthermore, it is speculated that the proposed mapping method can enhance the accuracy of alternative quantum machine learning methods, because of less information loss.

DQGNN provides a good insight into transitioning to larger-sized graph-structured data during the NISQ era (limited qubits); requiring fewer parameters but having similar performance to other contemporary quantum algorithms.

Published in
Blog

Tagged under

22 January 2022
##
Avoiding Barren Plateaus

During the current NISQ era of quantum computation the most common algorithms used to solve problems are based on a hybrid variational approach, where the quantum hardware is used to implement a variational wave function and measure observables, whereas the classical computer is used to store and update the variational parameters. Such algorithms are known as Variational Quantum Algorithms (VQAs) and are the most promising approach in reaching quantum advantage in the near future. However, despite proven advantages, variational algorithms are limited by the existence of barren plateaus (BPs) , which are regions of the optimization landscape where the gradients become vanishingly small. This makes the optimization exponentially hard as the gradients of the cost function are on average zero and deviations vanish exponentially in system size, which is furthermore exacerbated by finite estimation statistics in case of Hamiltonian averaging procedures.

So far, numerous approaches have been proposed to minimize this limitation. For instance, one can avoid BP at the initialization stage of variational algorithms, by performing pre-processing that finds a favorable initial configuration that can lead the system towards the desired optimization solution without the gradient becoming small. Also, studying the relation between BPs and entanglement can potentially lead to controlling entanglement to mitigate BPs. However, these approaches are impractical to implement on current quantum devices due to the difficulty in measuring entanglement. In an attempt to diagnose and avoid BPs, the authors in this paper propose the notion of weak barren plateaus (WBPs), which emerge when the entanglement of a local subsystem exceeds a certain threshold identified by the entanglement of a fully scrambled state. Utilizing WBPs, the authors propose a general algorithm to avoid barren plateaus that can be implemented on available NISQ devices, both at initialization and during the optimization.

The proposed algorithm estimates the expectation value of the cost function, its gradients, and the second Rényi entropy of small subsystems using classical shadow estimation during the optimization process. For the optimization procedure, gradient descent is used. The classical shadows protocol enables the tracking of the second Rényi entropy, which allows for an efficient diagnosis of the WBP both at the initialization step and during the optimization process of variational parameters. Upon encountering a WBP, as a certain subregion having a sufficiently large Rényi entropy, the algorithm restarts the optimization process with a decreased value of the update step. This process is controlled by a certain learning rate. The overall results show that the absence of WBPs is a sufficient condition for avoiding BPs.

The work further includes numerical simulations utilizing variational quantum eigensolvers (VQE) to test the proposed algorithm. For this, a WBP-free initial state is prepared using small qubit rotation angles. The algorithm is then implemented with different learning rates to compare the optimization performance. The results show that lower learning rates avoid WBPs during the optimization cycle, however if the learning rate is further decreased after an optimum value, then the algorithm converges slower and therefore to a larger (worse) energy due to the less than necessary training iterations. Hence, the learning rate should be sufficiently low enough to avoid BPs, but at the same time high enough such that it does not degrade the performance.

The results further indicate that entanglement also plays a crucial role in circumventing BPs and is important for achieving a good optimization performance. Moreover, observing the optimization performance according to the learning rate used, indicates that the system might get stuck in local minima characterized by large entanglement. The proposed way to avoid such areas is to define a parameter ‘alpha’ that suggests the presence of a WBP. It was found in the experimentation that setting the parameter ‘alpha’ to less than 1 can be beneficial for the optimization. This can help in avoiding low-quality local minima that are characterized by higher entanglement. However, this requires the entanglement structure of the target state to be previously known. One potential direction to further explore is whether an algorithm can be designed where the learning rate can be dynamically adjusted at every step of the optimization process and not only when a WBP is encountered. This may allow for efficiently maneuvering complicated optimization landscapes by staying clear of highly entangled local minima. Avoiding barren plateaus is a critical step in increasing the performance of any VQA, therefore it will be of interest to test the applicability of the techniques introduced here for quantum machine learning, quantum optimization, or variational time evolution.

So far, numerous approaches have been proposed to minimize this limitation. For instance, one can avoid BP at the initialization stage of variational algorithms, by performing pre-processing that finds a favorable initial configuration that can lead the system towards the desired optimization solution without the gradient becoming small. Also, studying the relation between BPs and entanglement can potentially lead to controlling entanglement to mitigate BPs. However, these approaches are impractical to implement on current quantum devices due to the difficulty in measuring entanglement. In an attempt to diagnose and avoid BPs, the authors in this paper propose the notion of weak barren plateaus (WBPs), which emerge when the entanglement of a local subsystem exceeds a certain threshold identified by the entanglement of a fully scrambled state. Utilizing WBPs, the authors propose a general algorithm to avoid barren plateaus that can be implemented on available NISQ devices, both at initialization and during the optimization.

The proposed algorithm estimates the expectation value of the cost function, its gradients, and the second Rényi entropy of small subsystems using classical shadow estimation during the optimization process. For the optimization procedure, gradient descent is used. The classical shadows protocol enables the tracking of the second Rényi entropy, which allows for an efficient diagnosis of the WBP both at the initialization step and during the optimization process of variational parameters. Upon encountering a WBP, as a certain subregion having a sufficiently large Rényi entropy, the algorithm restarts the optimization process with a decreased value of the update step. This process is controlled by a certain learning rate. The overall results show that the absence of WBPs is a sufficient condition for avoiding BPs.

The work further includes numerical simulations utilizing variational quantum eigensolvers (VQE) to test the proposed algorithm. For this, a WBP-free initial state is prepared using small qubit rotation angles. The algorithm is then implemented with different learning rates to compare the optimization performance. The results show that lower learning rates avoid WBPs during the optimization cycle, however if the learning rate is further decreased after an optimum value, then the algorithm converges slower and therefore to a larger (worse) energy due to the less than necessary training iterations. Hence, the learning rate should be sufficiently low enough to avoid BPs, but at the same time high enough such that it does not degrade the performance.

The results further indicate that entanglement also plays a crucial role in circumventing BPs and is important for achieving a good optimization performance. Moreover, observing the optimization performance according to the learning rate used, indicates that the system might get stuck in local minima characterized by large entanglement. The proposed way to avoid such areas is to define a parameter ‘alpha’ that suggests the presence of a WBP. It was found in the experimentation that setting the parameter ‘alpha’ to less than 1 can be beneficial for the optimization. This can help in avoiding low-quality local minima that are characterized by higher entanglement. However, this requires the entanglement structure of the target state to be previously known. One potential direction to further explore is whether an algorithm can be designed where the learning rate can be dynamically adjusted at every step of the optimization process and not only when a WBP is encountered. This may allow for efficiently maneuvering complicated optimization landscapes by staying clear of highly entangled local minima. Avoiding barren plateaus is a critical step in increasing the performance of any VQA, therefore it will be of interest to test the applicability of the techniques introduced here for quantum machine learning, quantum optimization, or variational time evolution.

Published in
Blog

Tagged under

13 December 2021
##
Improving Quantum K-Means

Many quantum machine learning algorithms draw inspiration out of their classical analogues and attempt at solving the same problem with the help of quantum computing. Q-means is one such algorithm which is the quantum analogue of the classical K-means algorithm. K-Means is an unsupervised learning algorithm in machine learning, which groups the unlabeled dataset into different clusters, thus discovering the categories of groups in the unlabeled dataset. On the other hand, Q-means is implemented on the quantum states to obtain characteristic cluster quantum states, i.e. the quantum states that are in a superposition of all the quantum states in a particular cluster. This is done by evaluating a cost function based on the cluster quantum states through measuring the overlap or fidelity between the characteristic cluster quantum states, ensuring high separation between clusters and low cluster scatter about the centroids.

Early implementations of Q-means claim an exponential speedup over its classical counterpart, which sounds very promising since this is a NP-Hard problem for d-dimensional observations; however, this speedup is only possible for linearly separable clusters. Q-means performs cluster assignment based on Euclidean distances, which does not work for non-linearly separable clusters in the original space, which can be a significant limitation. Therefore, there is a need for feature mapping / kernel methods to efficiently separate the data in a high dimensional space, which is closely related to the Support Vector Machine (SVM) method of operation.

This paper proposes a hybrid quantum-classical algorithm that learns a suitable quantum feature map which separates non linearly separable unlabeled data using a variational quantum feature map and Q-means as a subroutine for unsupervised learning. The objective of the paper is the proposal of the new hybrid algorithm that includes Q-means. The objective of the algorithm is to maximally separate the clusters in the quantum feature Hilbert space.

The quantum circuit involves encoding unlabeled classical data using quantum gates that perform unitary operations on the initial quantum state based on quantum parameters and input data. To perform quantum feature encoding, 1) a variational circuit is used to learn the best possible quantum feature map that obtains meaningful clustering results. Since the quantum feature Hilbert space dimensions are exponential with respect to qubits used in the initial state, high dimensional feature spaces can be efficiently implemented. 2) Kernel estimation and cluster assignment is then performed by implementing the Q-means sub-routine. 3) Finally, measurement is done to compute the overlap between the characteristic cluster quantum states, that also acts as a measure of separation between the clusters. 4) The output of the complete quantum circuit is used to compute the value of the cost function that is based on the Hilbert-Schmidt distance between the density matrices of the characteristic cluster quantum states.

The simulations were implemented to compute overlap cost function using Pennylane’s QAOA embedding framework. Sk-learn is used to pre-process all the datasets before quantum embedding. Cluster datasets with labels were used as inputs for the quantum embedding circuit to train the quantum parameters. For the considered 3 datasets; blobs, concentric circles and moons, the algorithm took the least iterations to converge for the blobs as they are very well separated in the original space, whereas in the case of non-linearly separable datasets, the algorithm took a larger number of iterations to converge.

The overall method adapts a repetitive approach of performing unsupervised learning followed by adaptively training the feature map, which aims at learning the best representation of the data in the quantum feature Hilbert space. It is shown that one can explore more feature spaces that are better suited for the embedded dataset by explicitly implementing the kernel estimation. However, since there are no quantum simulators available to test Q-means to date, the work could not establish quantum advantage experimentally.

Early implementations of Q-means claim an exponential speedup over its classical counterpart, which sounds very promising since this is a NP-Hard problem for d-dimensional observations; however, this speedup is only possible for linearly separable clusters. Q-means performs cluster assignment based on Euclidean distances, which does not work for non-linearly separable clusters in the original space, which can be a significant limitation. Therefore, there is a need for feature mapping / kernel methods to efficiently separate the data in a high dimensional space, which is closely related to the Support Vector Machine (SVM) method of operation.

This paper proposes a hybrid quantum-classical algorithm that learns a suitable quantum feature map which separates non linearly separable unlabeled data using a variational quantum feature map and Q-means as a subroutine for unsupervised learning. The objective of the paper is the proposal of the new hybrid algorithm that includes Q-means. The objective of the algorithm is to maximally separate the clusters in the quantum feature Hilbert space.

The quantum circuit involves encoding unlabeled classical data using quantum gates that perform unitary operations on the initial quantum state based on quantum parameters and input data. To perform quantum feature encoding, 1) a variational circuit is used to learn the best possible quantum feature map that obtains meaningful clustering results. Since the quantum feature Hilbert space dimensions are exponential with respect to qubits used in the initial state, high dimensional feature spaces can be efficiently implemented. 2) Kernel estimation and cluster assignment is then performed by implementing the Q-means sub-routine. 3) Finally, measurement is done to compute the overlap between the characteristic cluster quantum states, that also acts as a measure of separation between the clusters. 4) The output of the complete quantum circuit is used to compute the value of the cost function that is based on the Hilbert-Schmidt distance between the density matrices of the characteristic cluster quantum states.

The simulations were implemented to compute overlap cost function using Pennylane’s QAOA embedding framework. Sk-learn is used to pre-process all the datasets before quantum embedding. Cluster datasets with labels were used as inputs for the quantum embedding circuit to train the quantum parameters. For the considered 3 datasets; blobs, concentric circles and moons, the algorithm took the least iterations to converge for the blobs as they are very well separated in the original space, whereas in the case of non-linearly separable datasets, the algorithm took a larger number of iterations to converge.

The overall method adapts a repetitive approach of performing unsupervised learning followed by adaptively training the feature map, which aims at learning the best representation of the data in the quantum feature Hilbert space. It is shown that one can explore more feature spaces that are better suited for the embedded dataset by explicitly implementing the kernel estimation. However, since there are no quantum simulators available to test Q-means to date, the work could not establish quantum advantage experimentally.

Published in
Blog

Tagged under

14 December 2021
##
Shallow Circuits in Variational QML

Variational Quantum Machine Learning (VQML) models are based on variational principles where computation is done on a quantum computer with classical model parameters. These algorithms have shown efficient performance even in the presence of noise in hardware implementations, making them compatible with current Noisy Intermediate-Scale Quantum (NISQ) devices.

At the moment, there exist many useful approaches for implementing VQML algorithms, especially for solving machine learning tasks, however no clear ‘optimal’ approach to designing the circuit architectures has been identified. Some approaches use reinforcement learning or simulated annealing to design variational quantum circuits, however these techniques are sample-inefficient. Furthermore, such techniques are not applicable when access to real devices is scarce and can be expensive for a higher number of qubits. In addition, deep quantum circuits in principle have a higher capacity over shallower circuits given a fixed number of qubits, however there is no clarity on how to take advantage of such benefit. The authors in this work attempt to establish a set of core design principles for VQML, focusing specifically to classification tasks. The objective also includes investigating the effects of key architecture properties on machine learning model performance, noise apparent in the system (inducing a range of errors into the model), and the number of qubits used.

The work explores an ansatz based on fixed blocks from which circuits are sampled and evaluated. The following workflow was established to explore a large range of candidate circuits given a dataset and (classical) optimizer, 1) creating a number of candidate circuits by randomly sampling the available design space with trainable parameters such as number of qubits, the entanglement pattern, the number of blocks and the configuration of blocks, 2) compiling these circuits into an executable Qiskit circuit, 3) deriving a number of noise models to be applied to circuits, based on the two decoherence times and gate duration, 4) compiling the noise models into Qiskit executable noise model, and 5) training all permutations of circuit designs and noise models and evaluating their performance.

The design guidelines were established for VQML classifiers based on experiments with a large number (n = 6500) of circuits applied to five common datasets. All VQML models (circuits) were trained using the COBYLA optimizer for a maximum of 200 epochs. The overall results suggest four design guidelines: i) Circuits should be as shallow as possible since they are more robust to two-qubit-gate errors and generally have higher accuracy potential in both noisy and noise-free simulations ii) Circuits should be as wide as possible, iii) The number of features should be kept small, which can be done through dimensionality reduction techniques like PCA and, iv) The noise level should be small. It is also observed that measurement error is only a significant factor if “large”, i.e. > 10%.

Since the results demonstrate that overall low depth circuits exhibit a larger variance, it is crucial to explore improvement in results when the number of qubits is increased further. Also, the design space and constraints on the proposed model introduce limitations in leveraging the maximum number of qubits. It is an important future scope to explore tradeoffs in relaxing these constraints. For instance, by using features multiple times, wider circuits can be implemented, however, this could increase circuit depth and potentially variance. Further study of various circuit designs can also involve the influence of the barren plateau problem. Moreover, a larger variety of noise models and sources of noise, especially inspired from noise in hardware, should be further studied to develop more noise resilient circuits. Finally, as these results represent only 5 datasets, a potential next step would be to analyze classification problems with more classes or features via exploring larger numbers of datasets and models.

At the moment, there exist many useful approaches for implementing VQML algorithms, especially for solving machine learning tasks, however no clear ‘optimal’ approach to designing the circuit architectures has been identified. Some approaches use reinforcement learning or simulated annealing to design variational quantum circuits, however these techniques are sample-inefficient. Furthermore, such techniques are not applicable when access to real devices is scarce and can be expensive for a higher number of qubits. In addition, deep quantum circuits in principle have a higher capacity over shallower circuits given a fixed number of qubits, however there is no clarity on how to take advantage of such benefit. The authors in this work attempt to establish a set of core design principles for VQML, focusing specifically to classification tasks. The objective also includes investigating the effects of key architecture properties on machine learning model performance, noise apparent in the system (inducing a range of errors into the model), and the number of qubits used.

The work explores an ansatz based on fixed blocks from which circuits are sampled and evaluated. The following workflow was established to explore a large range of candidate circuits given a dataset and (classical) optimizer, 1) creating a number of candidate circuits by randomly sampling the available design space with trainable parameters such as number of qubits, the entanglement pattern, the number of blocks and the configuration of blocks, 2) compiling these circuits into an executable Qiskit circuit, 3) deriving a number of noise models to be applied to circuits, based on the two decoherence times and gate duration, 4) compiling the noise models into Qiskit executable noise model, and 5) training all permutations of circuit designs and noise models and evaluating their performance.

The design guidelines were established for VQML classifiers based on experiments with a large number (n = 6500) of circuits applied to five common datasets. All VQML models (circuits) were trained using the COBYLA optimizer for a maximum of 200 epochs. The overall results suggest four design guidelines: i) Circuits should be as shallow as possible since they are more robust to two-qubit-gate errors and generally have higher accuracy potential in both noisy and noise-free simulations ii) Circuits should be as wide as possible, iii) The number of features should be kept small, which can be done through dimensionality reduction techniques like PCA and, iv) The noise level should be small. It is also observed that measurement error is only a significant factor if “large”, i.e. > 10%.

Since the results demonstrate that overall low depth circuits exhibit a larger variance, it is crucial to explore improvement in results when the number of qubits is increased further. Also, the design space and constraints on the proposed model introduce limitations in leveraging the maximum number of qubits. It is an important future scope to explore tradeoffs in relaxing these constraints. For instance, by using features multiple times, wider circuits can be implemented, however, this could increase circuit depth and potentially variance. Further study of various circuit designs can also involve the influence of the barren plateau problem. Moreover, a larger variety of noise models and sources of noise, especially inspired from noise in hardware, should be further studied to develop more noise resilient circuits. Finally, as these results represent only 5 datasets, a potential next step would be to analyze classification problems with more classes or features via exploring larger numbers of datasets and models.

Published in
Blog

Tagged under

11 November 2021
##
Quantum Kernel Bandwidth

Quantum Kernels have shown great promise in many quantum supervised machine learning methods where an appropriately defined kernel may provide speedups over classical ML methods. In quantum kernel methods, data points are mapped to quantum states using a quantum feature map, and the value of the kernel between two data points is given by some similarity measure (such as fidelity) of the corresponding quantum states. The advantage compared to classical kernels lies in processing the data using an exponentially sized Hilbert space and performing classically hard computations.

One major limitation to these methods is the exponentially decreasing fidelity of two random quantum states with the number of qubits, which leads to the exponential vanishing of kernel values that makes learning impossible, as such small differences can not be distinguished nor used for training. One can overcome this by controlling the inductive bias of the quantum kernel methods, by projecting the quantum state into a lower-dimensional subspace which can be done via hyperparameter tuning. One such hyperparameter is the kernel’s ‘bandwidth’. In this work, the authors identify quantum kernel bandwidth as a centrally important hyperparameter for quantum kernel methods.

The work considers the problem of supervised learning, specifically the task of classification. Given a training dataset, the goal is to learn a map from data points to labels that is in agreement with the true map with high probability on an unseen test set. The datapoint is encoded in a quantum state by a parameterized unitary. This unitary is referred to as a Hamiltonian evolution quantum feature map. A kernel matrix is then obtained by computing the quantum kernel for all pairs of data points. This value can be computed on a quantum computer by measuring the value of appropriate observable on the state. This kernel matrix is then used inside a Support Vector Machine (SVM) or other kernel methods.

The results demonstrate that varying the quantum kernel bandwidth, typically via the scaling factor and Trotter steps in the feature map, controls the expressiveness of the model. For the quantum feature maps considered, the bandwidth can be controlled by rescaling the data points. Numerical simulations were done with multiple quantum feature maps and datasets using up to 26 qubits. The results show that larger scaling factor leads to a narrow kernel for which the Support Vector Classifier can fit any labels, leading to overfitting. On the other hand, choosing too small values of scaling factor leads to a wide kernel making it insufficiently expressive, hence leading to underfitting. Optimizing the bandwidth can improve the performance of quantum kernel methods with qubit count. It was also observed that hardware limitations such as finite precision of controls and the variance introduced by sampling, do not limit the overall performance significantly.

The overall work discusses the potential of controlling the inductive bias of quantum kernels via projecting them into a lower-dimensional subspace using hyperparameter operations. Combining this projection with bandwidth optimization, leads to more precise modulation of the inductive bias of the model. Following the results of this work, it would be interesting to explore more elaborate feature maps with more tuned hyperparameters. Optimizing hyperparameters of quantum kernels may enable the tuning of inductive bias of the model in a classically feasible way and at the same time can bridge the gap between expensive fully trainable quantum embeddings and fixed feature maps.

One major limitation to these methods is the exponentially decreasing fidelity of two random quantum states with the number of qubits, which leads to the exponential vanishing of kernel values that makes learning impossible, as such small differences can not be distinguished nor used for training. One can overcome this by controlling the inductive bias of the quantum kernel methods, by projecting the quantum state into a lower-dimensional subspace which can be done via hyperparameter tuning. One such hyperparameter is the kernel’s ‘bandwidth’. In this work, the authors identify quantum kernel bandwidth as a centrally important hyperparameter for quantum kernel methods.

The work considers the problem of supervised learning, specifically the task of classification. Given a training dataset, the goal is to learn a map from data points to labels that is in agreement with the true map with high probability on an unseen test set. The datapoint is encoded in a quantum state by a parameterized unitary. This unitary is referred to as a Hamiltonian evolution quantum feature map. A kernel matrix is then obtained by computing the quantum kernel for all pairs of data points. This value can be computed on a quantum computer by measuring the value of appropriate observable on the state. This kernel matrix is then used inside a Support Vector Machine (SVM) or other kernel methods.

The results demonstrate that varying the quantum kernel bandwidth, typically via the scaling factor and Trotter steps in the feature map, controls the expressiveness of the model. For the quantum feature maps considered, the bandwidth can be controlled by rescaling the data points. Numerical simulations were done with multiple quantum feature maps and datasets using up to 26 qubits. The results show that larger scaling factor leads to a narrow kernel for which the Support Vector Classifier can fit any labels, leading to overfitting. On the other hand, choosing too small values of scaling factor leads to a wide kernel making it insufficiently expressive, hence leading to underfitting. Optimizing the bandwidth can improve the performance of quantum kernel methods with qubit count. It was also observed that hardware limitations such as finite precision of controls and the variance introduced by sampling, do not limit the overall performance significantly.

The overall work discusses the potential of controlling the inductive bias of quantum kernels via projecting them into a lower-dimensional subspace using hyperparameter operations. Combining this projection with bandwidth optimization, leads to more precise modulation of the inductive bias of the model. Following the results of this work, it would be interesting to explore more elaborate feature maps with more tuned hyperparameters. Optimizing hyperparameters of quantum kernels may enable the tuning of inductive bias of the model in a classically feasible way and at the same time can bridge the gap between expensive fully trainable quantum embeddings and fixed feature maps.

Published in
Blog

Tagged under

11 September 2021
##
Quantum Classifiers

Recently, there has been extensive research into quantum machine learning models and quantum algorithms that rival their classical counterparts. Some notable examples are image recognition advancements, automated driven cars, identification of skin cancers, and prediction of protein structures. However, one of the most well-known branches of machine learning is classification, which is widely applied in commercial and academic applications ranging from face recognition and recommendation systems to earthquake detection, disease diagnosis and further handling complex classification tasks.

Previous studies on quantum classifiers have extended popular classical classification algorithms to the quantum domain such as the quantum support vector machine, quantum nearest neighbor algorithms, and quantum decision tree classifiers, exhibiting the potential of providing quadratic or even exponential speedups.

This review paper introduces some important algorithms and models for quantum classifiers including i) quantum support vector machines, ii) quantum decision tree, iii) quantum nearest neighbors algorithms, iv) quantum annealing based classifiers, and v) variational quantum classifiers. The experimental advancements of the considered classifiers are also discussed along with theoretical progress.

So far, quantum kernel methods used in quantum support vector machines are proven to be the most successful to handle classification tasks. In that scenario, the quantum computer can be used to map the classical data to a Hilbert state space and estimate the inner products between those states to obtain the kernel matrix, which is further handled by classical computers.

Another strong candidate for classification is the quantum decision tree model, that uses von Neumann entropy for choosing the attributes to split the nodes. Recently proposed binary quantum decision tree classifier, named Q-tree, is based on a probabilistic approach where a quantum computer can be used to accomplish the probabilistic traversal of the decision tree via measurements, while tree inductions and predictions of query data can be integrated into this framework as well.

For the case of the quantum nearest neighbors algorithm, the authors discuss two algorithms. First is the quantum version of the nearest centroid algorithm which shows an exponential speedup for both the size of the training set and the dimension of the vectors. However, this is inefficient in few situations like when the centroid of the subset lies in the space of other subsets. The issue can be overcome by the second proposed algorithm, which includes a quantum random access memory, amplitude estimation, and the Dürr Høyer minimization algorithm. This algorithm can be implemented for the quantum nearest neighbors algorithm with a quadratic speedup over its classical counterparts.

Another type of classifier is the quantum annealing classifier. As the name suggests, quantum annealing is used for classification tasks reaching high levels of generalization and physical interpretability for various applications such as anomaly detection, Higgs boson classifications, classifying and ranking binding affinities, etc. In quantum annealing, two Hamiltonians are considered: a simple and easy to prepare Hamiltonian with a known ground state, and a problem Hamiltonian whose ground state represents the solution of the problem. By slowly evolving the simple Hamiltonian towards the problem Hamiltonian, one can find the solution of the problem. Quantum annealing uses quantum tunneling to explore the problem space and to avoid local minima.

The last type of quantum classifier that is discussed are variational quantum classifiers (VQCs), which are a subclass of quantum neural networks. As such, the parameters in a classification task will be optimized during the training process to minimize a predefined loss function. Many structures and protocols for VQCs have already been proposed, mainly addressing issues like gradient descent evaluation, neural network structure (convolutional, recurrent, etc.), avoiding barren plateaus, and vulnerability to adversarial attacks.

In terms of experimental advancements and proof-of-concept implementations to physical hardware, the authors discuss the following implementations. The distance-based quantum classifier was demonstrated in the superconducting platform of IBM Quantum Experience with a classic-quantum hybrid approach. The implementations for quantum nearest neighbors algorithms can be carried out using single photons acting as qubits generated through spontaneous parametric down-conversion. A nearest centroid classifier has been experimentally demonstrated on a eleven-qubit trapped ion quantum computer with a quantum device that is now commercially available via IonQ’s cloud service. Variational quantum classifiers have been experimentally implemented on the superconducting platform to classify artificially generated data. In this experiment, a variety of circuit depths was investigated, and it was shown that when error mitigation techniques were employed alongside a depth four circuit, 100% success rate was achieved.

Nevertheless, when discussing quantum advantage from the perspective of computational complexity, it has been proven that the classical optimization problems for variational quantum algorithms are NP-hard, even for classically tractable systems which are composed of only logarithmically many qubits or free fermions. Moreover, quantum advantage in machine learning can also be viewed in terms of the number of times that a quantum process E has been accessed. It has been proven that for any input distribution, a classical machine learning model can achieve accurate predictions on average by accessing E with times comparable to the optimal quantum machine learning model. By comparison, an exponential quantum advantage is possible for achieving an accurate prediction on all inputs. However, despite proven potential advantage, there are several challenges for quantum classifiers that limit their practical applications. First, implementing quantum random access memory (QRAM) with a desirable scale at the current stage. Second, due to inevitable noise in the quantum devices, it is crucial to increase the fidelity of the quantum operations and develop better error correction techniques. Third, further exploration of all aforementioned structures and protocols is required to identify potential advantage against their classical counterparts for future applications.

Based on these limitations, there is no commercial quantum classifier available at the moment. However, with the advancements of hardware platforms, quantum algorithms, and quantum error correction techniques, quantum classifiers are expected to provide significant advantage in the near future.

Previous studies on quantum classifiers have extended popular classical classification algorithms to the quantum domain such as the quantum support vector machine, quantum nearest neighbor algorithms, and quantum decision tree classifiers, exhibiting the potential of providing quadratic or even exponential speedups.

This review paper introduces some important algorithms and models for quantum classifiers including i) quantum support vector machines, ii) quantum decision tree, iii) quantum nearest neighbors algorithms, iv) quantum annealing based classifiers, and v) variational quantum classifiers. The experimental advancements of the considered classifiers are also discussed along with theoretical progress.

So far, quantum kernel methods used in quantum support vector machines are proven to be the most successful to handle classification tasks. In that scenario, the quantum computer can be used to map the classical data to a Hilbert state space and estimate the inner products between those states to obtain the kernel matrix, which is further handled by classical computers.

Another strong candidate for classification is the quantum decision tree model, that uses von Neumann entropy for choosing the attributes to split the nodes. Recently proposed binary quantum decision tree classifier, named Q-tree, is based on a probabilistic approach where a quantum computer can be used to accomplish the probabilistic traversal of the decision tree via measurements, while tree inductions and predictions of query data can be integrated into this framework as well.

For the case of the quantum nearest neighbors algorithm, the authors discuss two algorithms. First is the quantum version of the nearest centroid algorithm which shows an exponential speedup for both the size of the training set and the dimension of the vectors. However, this is inefficient in few situations like when the centroid of the subset lies in the space of other subsets. The issue can be overcome by the second proposed algorithm, which includes a quantum random access memory, amplitude estimation, and the Dürr Høyer minimization algorithm. This algorithm can be implemented for the quantum nearest neighbors algorithm with a quadratic speedup over its classical counterparts.

Another type of classifier is the quantum annealing classifier. As the name suggests, quantum annealing is used for classification tasks reaching high levels of generalization and physical interpretability for various applications such as anomaly detection, Higgs boson classifications, classifying and ranking binding affinities, etc. In quantum annealing, two Hamiltonians are considered: a simple and easy to prepare Hamiltonian with a known ground state, and a problem Hamiltonian whose ground state represents the solution of the problem. By slowly evolving the simple Hamiltonian towards the problem Hamiltonian, one can find the solution of the problem. Quantum annealing uses quantum tunneling to explore the problem space and to avoid local minima.

The last type of quantum classifier that is discussed are variational quantum classifiers (VQCs), which are a subclass of quantum neural networks. As such, the parameters in a classification task will be optimized during the training process to minimize a predefined loss function. Many structures and protocols for VQCs have already been proposed, mainly addressing issues like gradient descent evaluation, neural network structure (convolutional, recurrent, etc.), avoiding barren plateaus, and vulnerability to adversarial attacks.

In terms of experimental advancements and proof-of-concept implementations to physical hardware, the authors discuss the following implementations. The distance-based quantum classifier was demonstrated in the superconducting platform of IBM Quantum Experience with a classic-quantum hybrid approach. The implementations for quantum nearest neighbors algorithms can be carried out using single photons acting as qubits generated through spontaneous parametric down-conversion. A nearest centroid classifier has been experimentally demonstrated on a eleven-qubit trapped ion quantum computer with a quantum device that is now commercially available via IonQ’s cloud service. Variational quantum classifiers have been experimentally implemented on the superconducting platform to classify artificially generated data. In this experiment, a variety of circuit depths was investigated, and it was shown that when error mitigation techniques were employed alongside a depth four circuit, 100% success rate was achieved.

Nevertheless, when discussing quantum advantage from the perspective of computational complexity, it has been proven that the classical optimization problems for variational quantum algorithms are NP-hard, even for classically tractable systems which are composed of only logarithmically many qubits or free fermions. Moreover, quantum advantage in machine learning can also be viewed in terms of the number of times that a quantum process E has been accessed. It has been proven that for any input distribution, a classical machine learning model can achieve accurate predictions on average by accessing E with times comparable to the optimal quantum machine learning model. By comparison, an exponential quantum advantage is possible for achieving an accurate prediction on all inputs. However, despite proven potential advantage, there are several challenges for quantum classifiers that limit their practical applications. First, implementing quantum random access memory (QRAM) with a desirable scale at the current stage. Second, due to inevitable noise in the quantum devices, it is crucial to increase the fidelity of the quantum operations and develop better error correction techniques. Third, further exploration of all aforementioned structures and protocols is required to identify potential advantage against their classical counterparts for future applications.

Based on these limitations, there is no commercial quantum classifier available at the moment. However, with the advancements of hardware platforms, quantum algorithms, and quantum error correction techniques, quantum classifiers are expected to provide significant advantage in the near future.

Published in
Blog

Tagged under

25 September 2021
##
Overparametrizing quantum neural networks

Quantum Machine Learning (QML) models often consist of a data embedding followed by a parametrized quantum circuit often called Quantum Neural Network (QNN). So far, QNNs have played a significant role in quantum machine learning applications in function approximation, handling big data, modelling social networks, associative memory devices, and automated control systems. However, the training of such NNs requires optimizing the QNN’s parameters, which is a non-deterministic polynomial time problem.

The authors in this work employ the technique of over-parameterization, which is the regime where one trains a NN with a larger capacity than required to represent the distribution of the training data. Therefore, the QNN has more parameters to explore all relevant directions in the Hilbert space. Over-parametrizing a NN can improve its performance by improving the trainability and reducing generalization errors, leading to faster convergence. In this work, the authors provide a theoretical framework for the over-parametrization of QNNs and describe its correspondence to a computational phase transition, where the trainability of QNNs is vastly improved due to a landscape with less local minima. The work proposes three theorems to analyze the over-parameterization phenomenon followed by numerical verifications.

The first theorem derives an upper bound as a sufficient condition for over-parametrization. This indicates that for a general type of periodic-structured QNNs, an over-parameterized regime can be reached by increasing the number of parameters past some threshold critical value. This value is related to the dimension of the Dynamical Lie Algebra (DLA) associated with the generators of the QNN. The second theorem provides an operational meaning to the over-parameterization definition in terms of the model’s capacity i.e. increasing the number of parameters can never increase the model capacity beyond the upper bound derived in the first theorem. Lastly, the third theorem shows that the maximum number of relevant directions around the global minima of the optimization problem is always smaller than the upper bound, hence imposing a maximal rank on the Hessian when evaluated at the solution.

For numerical demonstration, three different optimization tasks were considered: i) finding the groundstate energy with a Variational Quantum Eigensolver (VQE), ii) decomposing a unitary to quantum gates through unitary compilation, and iii) compressing information with quantum autoencoding. For VQE, the problem sizes of n = 4, 6, 8, 10 qubits were considered for ansatzes with different depths L with both open and closed boundary conditions. After averaging over 50 random parameter initializations, it is observed that the loss function decreases exponentially with each optimization step when the number of layers is large enough. Similarly for unitary compilation, the numerical results suggest that as the depth of the circuit increases, the convergence towards the global optimum improves dramatically until reaching a saturation point. Lastly, the results for quantum autoencoding were obtained for a system of n = 4 qubits and for the same hardware efficient ansatz used for unitary compilation. Here, it was again observed that an over-parameterized QNN is able to accurately reach the global optima in a few iterations. The value of the obtained rank in contrast to the dimension of the DLA suggests that over-parameterization can be achieved with a number of parameters that is much smaller than the upper-bound.

It is noteworthy to mention a crucial difference between the results obtained for the Hamiltonian variational ansatz (theoretical) and for the hardware efficient ansatz (numerical). In the case of Hamiltonian variational ansatz, the dimension of the DLA scaled polynomially with n, whereas for the hardware efficient ansatz the dimension of the DLA grows exponentially with the system size. This implies that the system becomes over-parameterized at a polynomial number of parameters for the first case, while in the latter case one requires an exponentially large number of parameters. Since most ansatzes used for QNNs are ultimately hardware efficient ansatzes which exhibit barren plateaus, this may require an exponential number of parameters to be over-parameterized, which requires more effort in designing scalable and trainable ansatzes. Finally, a take-away message of this work is that QNN architectures with polynomially large DLAs will have more favorable loss landscapes due to the lack of barren plateaus.

One direction of future research would be to consider more complicated loss functions consisting of functions of different QNN contributions, such as those used in Quantum Circuit Learning for supervised learning or Differentiable Quantum Circuits for solving differential equations. It remains an open question whether such loss function would exhibit similar traits as those single-Hamiltonian minimizations considered here.

The authors in this work employ the technique of over-parameterization, which is the regime where one trains a NN with a larger capacity than required to represent the distribution of the training data. Therefore, the QNN has more parameters to explore all relevant directions in the Hilbert space. Over-parametrizing a NN can improve its performance by improving the trainability and reducing generalization errors, leading to faster convergence. In this work, the authors provide a theoretical framework for the over-parametrization of QNNs and describe its correspondence to a computational phase transition, where the trainability of QNNs is vastly improved due to a landscape with less local minima. The work proposes three theorems to analyze the over-parameterization phenomenon followed by numerical verifications.

The first theorem derives an upper bound as a sufficient condition for over-parametrization. This indicates that for a general type of periodic-structured QNNs, an over-parameterized regime can be reached by increasing the number of parameters past some threshold critical value. This value is related to the dimension of the Dynamical Lie Algebra (DLA) associated with the generators of the QNN. The second theorem provides an operational meaning to the over-parameterization definition in terms of the model’s capacity i.e. increasing the number of parameters can never increase the model capacity beyond the upper bound derived in the first theorem. Lastly, the third theorem shows that the maximum number of relevant directions around the global minima of the optimization problem is always smaller than the upper bound, hence imposing a maximal rank on the Hessian when evaluated at the solution.

For numerical demonstration, three different optimization tasks were considered: i) finding the groundstate energy with a Variational Quantum Eigensolver (VQE), ii) decomposing a unitary to quantum gates through unitary compilation, and iii) compressing information with quantum autoencoding. For VQE, the problem sizes of n = 4, 6, 8, 10 qubits were considered for ansatzes with different depths L with both open and closed boundary conditions. After averaging over 50 random parameter initializations, it is observed that the loss function decreases exponentially with each optimization step when the number of layers is large enough. Similarly for unitary compilation, the numerical results suggest that as the depth of the circuit increases, the convergence towards the global optimum improves dramatically until reaching a saturation point. Lastly, the results for quantum autoencoding were obtained for a system of n = 4 qubits and for the same hardware efficient ansatz used for unitary compilation. Here, it was again observed that an over-parameterized QNN is able to accurately reach the global optima in a few iterations. The value of the obtained rank in contrast to the dimension of the DLA suggests that over-parameterization can be achieved with a number of parameters that is much smaller than the upper-bound.

It is noteworthy to mention a crucial difference between the results obtained for the Hamiltonian variational ansatz (theoretical) and for the hardware efficient ansatz (numerical). In the case of Hamiltonian variational ansatz, the dimension of the DLA scaled polynomially with n, whereas for the hardware efficient ansatz the dimension of the DLA grows exponentially with the system size. This implies that the system becomes over-parameterized at a polynomial number of parameters for the first case, while in the latter case one requires an exponentially large number of parameters. Since most ansatzes used for QNNs are ultimately hardware efficient ansatzes which exhibit barren plateaus, this may require an exponential number of parameters to be over-parameterized, which requires more effort in designing scalable and trainable ansatzes. Finally, a take-away message of this work is that QNN architectures with polynomially large DLAs will have more favorable loss landscapes due to the lack of barren plateaus.

One direction of future research would be to consider more complicated loss functions consisting of functions of different QNN contributions, such as those used in Quantum Circuit Learning for supervised learning or Differentiable Quantum Circuits for solving differential equations. It remains an open question whether such loss function would exhibit similar traits as those single-Hamiltonian minimizations considered here.

Published in
Blog

Tagged under

22 July 2021
##
Quantum Training of a Classical DNN

Quantum Machine Learning (QML) techniques have been rigorously researched in the past few years, showing great promise in a number of applications, using various quantum algorithmic techniques and potential speedups as compared to its classical counterparts. Many of these techniques exploit the high dimensionality of the Hilbert space through kernel methods while most exponential speedups require fault-tolerant quantum computers with access to a quantum memory (QRAM).

While previous works have successfully established quantum analogues to common classical approaches like classification, clustering, regression, and other tasks in data analysis, there has been a lack of a clear demonstration of a quantum speedup for tasks on classical datasets for existing quantum neural networks proposals. While algorithms for quantum machine learning are largely based on methods from linear algebra, neural networks rely on nonlinearity to act as a universal approximator. Given the success of deep neural networks in classical machine learning, the use of quantum computers to efficiently train classical deep neural networks remains an interesting open question.

In this work, the authors show how one may exploit the benefit of deep neural networks via the Neural Tangent Kernel (NTK) formalism, i.e. increasing the number of hidden layers L. As the neural network is deepened, the NTK matrix becomes increasingly well-conditioned, improving the speed at which gradient descent trains the neural network. The authors propose a general framework for fully connected neural network architectures via a quantum algorithm to train a wide and deep neural network under an approximation of the NTK, estimating the trained neural network output with vanishing error as the training set size increases. Furthermore, the linear form of the NTK offers a general framework for neural network architectures similar to those used in state-of-the-art applications of deep learning. The work provides two different approximations: a sparsified NTK and a diagonal NTK approximation. The diagonal NTK is defined by setting all off-diagonal elements of the NTK to zero, while the sparsified NTK is defined by only permitting off-diagonal elements to be nonzero in any row or column. For both-approximations, the matrix element bounds of the NTK guarantee the convergence of the approximation and enable efficient gradient descent which highlights the correspondence between conditions for trainable classical neural networks and an efficient quantum algorithm.

The sparsified NTK approximation has a strictly tighter upper bound given by the Gershgorin circle theorem on the error compared to the diagonal NTK approximation, suggesting the superior performance of the former. To support this theoretical result, numerical demonstration has been done using the MNIST image dataset, more accurately considering the MNIST binary image classification task between pairs of digits. Two infinite-width neural network architectures are used: i) a feedforward fully-connected neural network and ii) an architecture based on the convolutional Myrtle network. In each case, the handwritten image dataset is projected onto the surface of a unit sphere, during the initial encoding into QRAM. The results demonstrate that even relatively common data distributions satisfy the necessary conditions for efficient quantum state preparation and quantum state measurement, fully realizing an exponential quantum speedup over gradient descent.

This work demonstrates a quantum algorithm to train classical neural networks in logarithmic time O(log n) and provides numerical evidence of such efficiency on the standard MNIST image dataset. As the neural tangent kernel offers a versatile framework across architectures such as convolutional neural networks, graph neural networks, and transformers, the approximation introduced by a sparsified or diagonal kernel in this work has the potential to extend to any chaotic kernel. Also, the increase of the depth of a chaotic kernel might give rise to the kernel structure key to well-conditioning and successful approximation in logarithmic time. This can potentially open new possibilities for improved machine learning methods to quantum computing beyond the scope of classical neural networks.

While previous works have successfully established quantum analogues to common classical approaches like classification, clustering, regression, and other tasks in data analysis, there has been a lack of a clear demonstration of a quantum speedup for tasks on classical datasets for existing quantum neural networks proposals. While algorithms for quantum machine learning are largely based on methods from linear algebra, neural networks rely on nonlinearity to act as a universal approximator. Given the success of deep neural networks in classical machine learning, the use of quantum computers to efficiently train classical deep neural networks remains an interesting open question.

In this work, the authors show how one may exploit the benefit of deep neural networks via the Neural Tangent Kernel (NTK) formalism, i.e. increasing the number of hidden layers L. As the neural network is deepened, the NTK matrix becomes increasingly well-conditioned, improving the speed at which gradient descent trains the neural network. The authors propose a general framework for fully connected neural network architectures via a quantum algorithm to train a wide and deep neural network under an approximation of the NTK, estimating the trained neural network output with vanishing error as the training set size increases. Furthermore, the linear form of the NTK offers a general framework for neural network architectures similar to those used in state-of-the-art applications of deep learning. The work provides two different approximations: a sparsified NTK and a diagonal NTK approximation. The diagonal NTK is defined by setting all off-diagonal elements of the NTK to zero, while the sparsified NTK is defined by only permitting off-diagonal elements to be nonzero in any row or column. For both-approximations, the matrix element bounds of the NTK guarantee the convergence of the approximation and enable efficient gradient descent which highlights the correspondence between conditions for trainable classical neural networks and an efficient quantum algorithm.

The sparsified NTK approximation has a strictly tighter upper bound given by the Gershgorin circle theorem on the error compared to the diagonal NTK approximation, suggesting the superior performance of the former. To support this theoretical result, numerical demonstration has been done using the MNIST image dataset, more accurately considering the MNIST binary image classification task between pairs of digits. Two infinite-width neural network architectures are used: i) a feedforward fully-connected neural network and ii) an architecture based on the convolutional Myrtle network. In each case, the handwritten image dataset is projected onto the surface of a unit sphere, during the initial encoding into QRAM. The results demonstrate that even relatively common data distributions satisfy the necessary conditions for efficient quantum state preparation and quantum state measurement, fully realizing an exponential quantum speedup over gradient descent.

This work demonstrates a quantum algorithm to train classical neural networks in logarithmic time O(log n) and provides numerical evidence of such efficiency on the standard MNIST image dataset. As the neural tangent kernel offers a versatile framework across architectures such as convolutional neural networks, graph neural networks, and transformers, the approximation introduced by a sparsified or diagonal kernel in this work has the potential to extend to any chaotic kernel. Also, the increase of the depth of a chaotic kernel might give rise to the kernel structure key to well-conditioning and successful approximation in logarithmic time. This can potentially open new possibilities for improved machine learning methods to quantum computing beyond the scope of classical neural networks.

Published in
Blog

Tagged under

09 July 2021
##
Quantum Evolution Kernel

Classical machine learning research is blossoming in recent years. Some of the most interesting approaches focus on studying graphs and developing algorithms to exploit the information contained in their structures. In many fields relevant to the modern world, such as chemistry, bioinformatics, social network analysis, computer vision, and natural language, one can find inherent graph structures which possess valuable information about the underlying problem structure. One way to find the similarities between various graph structures is done via graph kernels. The graph kernel approach is based on finding a way to associate any graph with a feature vector encapsulating its relevant characteristics (the feature map) and then computing the similarity between those vectors, in the form of a scalar product in the feature space.

However, the design of a graph kernel is always dependent on the problem and the specific features that need to be encapsulated. Thus, its biggest limitation is the algorithmic complexity when applied to a variety of problems. Typical examples are the graphlets subsampling and the random walk kernels. In this work, the authors propose a versatile and easily scalable graph kernel based on the time-evolution of a quantum system, where the information about a graph is encoded in the parameters of a tunable Hamiltonian acting on an array of qubits. An alternating sequence of free Hamiltonian evolution produces measurement samples that retain key features of the data.

The proposed Quantum Evolution (QE) Kernel approach is based on associating each graph with a probability distribution, obtained by the measurement of an observable on a quantum system whose dynamics are driven by the topology of the associated graph. The QE kernel between two graphs is given as a distance between their respective probability distributions. The protocol was tested on a graph classification task on several datasets and results were compared with other graph kernels. Each graph kernel is associated with a Support Vector Machine, and the accuracy is obtained via a score, namely the proportion of graphs in the test set that is attributed to the correct class. A grid search is performed in the defined range and the value with the best score is selected and averaged over 10 repetitions of the following cross-validation procedure: the dataset is first randomly split into 10 subsets of equal size, and the accuracy of the kernel is then measured on each of these 10 subsets, after having been trained on the 9 other subsets. The accuracy resulting from this split is then the average accuracy over the 10 possible choices of the test subset. The performance of the kernel is also shown to be stable against detection error and noise.

The results show that for all datasets, at least one quantum kernel is better than the best classical kernel. The depth of the time evolution is quite crucial as better accuracy is obtained by increasing the number of layers for the Ising evolution, suggesting richer features can be captured by adding more layers. However, as shown in the IMDB-MULTI and Fingerprints experiments, the larger layer Ising scheme performed worse, because of the incomplete convergence of the optimization. Bigger datasets require more function evaluations, which require a larger computational budget.

The physical implementation of the approach is presented for the case of a programmable array of qubits made of neutral atoms trapped in optical tweezers. By promoting the atoms to Rydberg states, they behave as huge electric dipoles and experience dipole-dipole interactions that map into spin Hamiltonians. The spins undergo various types of interactions depending on the Rydberg states involved in the process, leading to different Hamiltonians. This platform is limited to smaller graphs for now (~100 nodes), however, the authors point out that potentially a significant speed-up could also be gained by implementing variations of near-term quantum computing algorithms running on classical compute hardware like GPUs, in a so-called quantum-inspred manner, possibly applicable to the current approach as well. Moreover, designing the right observable and choosing the appropriate Hamiltonian are crucial parameters, in order to improve the accuracy of the design. Finally, potential improvement of such experiments lies in the use of multi-kernel learning, which combines different kernels built from the same sequence of measurements on the final states, and optimization of noise cancellation techniques.

However, the design of a graph kernel is always dependent on the problem and the specific features that need to be encapsulated. Thus, its biggest limitation is the algorithmic complexity when applied to a variety of problems. Typical examples are the graphlets subsampling and the random walk kernels. In this work, the authors propose a versatile and easily scalable graph kernel based on the time-evolution of a quantum system, where the information about a graph is encoded in the parameters of a tunable Hamiltonian acting on an array of qubits. An alternating sequence of free Hamiltonian evolution produces measurement samples that retain key features of the data.

The proposed Quantum Evolution (QE) Kernel approach is based on associating each graph with a probability distribution, obtained by the measurement of an observable on a quantum system whose dynamics are driven by the topology of the associated graph. The QE kernel between two graphs is given as a distance between their respective probability distributions. The protocol was tested on a graph classification task on several datasets and results were compared with other graph kernels. Each graph kernel is associated with a Support Vector Machine, and the accuracy is obtained via a score, namely the proportion of graphs in the test set that is attributed to the correct class. A grid search is performed in the defined range and the value with the best score is selected and averaged over 10 repetitions of the following cross-validation procedure: the dataset is first randomly split into 10 subsets of equal size, and the accuracy of the kernel is then measured on each of these 10 subsets, after having been trained on the 9 other subsets. The accuracy resulting from this split is then the average accuracy over the 10 possible choices of the test subset. The performance of the kernel is also shown to be stable against detection error and noise.

The results show that for all datasets, at least one quantum kernel is better than the best classical kernel. The depth of the time evolution is quite crucial as better accuracy is obtained by increasing the number of layers for the Ising evolution, suggesting richer features can be captured by adding more layers. However, as shown in the IMDB-MULTI and Fingerprints experiments, the larger layer Ising scheme performed worse, because of the incomplete convergence of the optimization. Bigger datasets require more function evaluations, which require a larger computational budget.

The physical implementation of the approach is presented for the case of a programmable array of qubits made of neutral atoms trapped in optical tweezers. By promoting the atoms to Rydberg states, they behave as huge electric dipoles and experience dipole-dipole interactions that map into spin Hamiltonians. The spins undergo various types of interactions depending on the Rydberg states involved in the process, leading to different Hamiltonians. This platform is limited to smaller graphs for now (~100 nodes), however, the authors point out that potentially a significant speed-up could also be gained by implementing variations of near-term quantum computing algorithms running on classical compute hardware like GPUs, in a so-called quantum-inspred manner, possibly applicable to the current approach as well. Moreover, designing the right observable and choosing the appropriate Hamiltonian are crucial parameters, in order to improve the accuracy of the design. Finally, potential improvement of such experiments lies in the use of multi-kernel learning, which combines different kernels built from the same sequence of measurements on the final states, and optimization of noise cancellation techniques.

Published in
Blog

Tagged under

- Avoiding Barren Plateaus 22 January 2022 in Blog
- Decompositional QGNN 15 January 2022 in Blog
- Quantum Capsule Networks 08 January 2022 in Blog
- Quantum Digital-Analog Fermionic Simulation 31 December 2021 in Blog
- Quantum Semidefinite Programming 19 December 2021 in Blog
- Shallow Circuits in Variational QML 14 December 2021 in Blog

Copyright © Qu & Co BV