Classical machine learning research is blossoming in recent years. Some of the most interesting approaches focus on studying graphs and developing algorithms to exploit the information contained in their structures. In many fields relevant to the modern world, such as chemistry, bioinformatics, social network analysis, computer vision, and natural language, one can find inherent graph structures which possess valuable information about the underlying problem structure. One way to find the similarities between various graph structures is done via graph kernels. The graph kernel approach is based on finding a way to associate any graph with a feature vector encapsulating its relevant characteristics (the feature map) and then computing the similarity between those vectors, in the form of a scalar product in the feature space.
However, the design of a graph kernel is always dependent on the problem and the specific features that need to be encapsulated. Thus, its biggest limitation is the algorithmic complexity when applied to a variety of problems. Typical examples are the graphlets subsampling and the random walk kernels. In this work, the authors propose a versatile and easily scalable graph kernel based on the time-evolution of a quantum system, where the information about a graph is encoded in the parameters of a tunable Hamiltonian acting on an array of qubits. An alternating sequence of free Hamiltonian evolution produces measurement samples that retain key features of the data.
The proposed Quantum Evolution (QE) Kernel approach is based on associating each graph with a probability distribution, obtained by the measurement of an observable on a quantum system whose dynamics are driven by the topology of the associated graph. The QE kernel between two graphs is given as a distance between their respective probability distributions. The protocol was tested on a graph classification task on several datasets and results were compared with other graph kernels. Each graph kernel is associated with a Support Vector Machine, and the accuracy is obtained via a score, namely the proportion of graphs in the test set that is attributed to the correct class. A grid search is performed in the defined range and the value with the best score is selected and averaged over 10 repetitions of the following cross-validation procedure: the dataset is first randomly split into 10 subsets of equal size, and the accuracy of the kernel is then measured on each of these 10 subsets, after having been trained on the 9 other subsets. The accuracy resulting from this split is then the average accuracy over the 10 possible choices of the test subset. The performance of the kernel is also shown to be stable against detection error and noise.
The results show that for all datasets, at least one quantum kernel is better than the best classical kernel. The depth of the time evolution is quite crucial as better accuracy is obtained by increasing the number of layers for the Ising evolution, suggesting richer features can be captured by adding more layers. However, as shown in the IMDB-MULTI and Fingerprints experiments, the larger layer Ising scheme performed worse, because of the incomplete convergence of the optimization. Bigger datasets require more function evaluations, which require a larger computational budget.
The physical implementation of the approach is presented for the case of a programmable array of qubits made of neutral atoms trapped in optical tweezers. By promoting the atoms to Rydberg states, they behave as huge electric dipoles and experience dipole-dipole interactions that map into spin Hamiltonians. The spins undergo various types of interactions depending on the Rydberg states involved in the process, leading to different Hamiltonians. This platform is limited to smaller graphs for now (~100 nodes), however, the authors point out that potentially a significant speed-up could also be gained by implementing variations of near-term quantum computing algorithms running on classical compute hardware like GPUs, in a so-called quantum-inspred manner, possibly applicable to the current approach as well. Moreover, designing the right observable and choosing the appropriate Hamiltonian are crucial parameters, in order to improve the accuracy of the design. Finally, potential improvement of such experiments lies in the use of multi-kernel learning, which combines different kernels built from the same sequence of measurements on the final states, and optimization of noise cancellation techniques.