I've been doing some exploratory work on quantum kernel methods (I'm a newcomer to the field), and I struggle to reconcile two facts.
- As I increase the number of qubits, the Hilbert space becomes so large that encoded samples tend to become nearly orthogonal to one another, washing out the similarity structure the kernel is supposed to capture. In that regime the Gram matrix of the kernel tends towards the identity matrix, from which no useful predictions can be made (see the sketch after this list). Hence, for a quantum kernel method to work in practice, I end up using a small qubit register (as small as 3 qubits).
- A necessary (but not sufficient) condition for quantum advantage is that the circuit be hard to simulate or compute classically. However, a 3-qubit circuit is trivial to simulate on a classical computer. If anything, classical simulation looks even more attractive once you consider that high-dimensional data might require a deep circuit to encode each data point, which causes problems with hardware error.
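To make the first point concrete, here is a minimal numpy sketch (my own toy check, not taken from any paper): it draws pairs of random product states as a stand-in for a generic data-encoding feature map and averages the fidelity-kernel value |&lt;phi(x)|phi(x')&gt;|^2. The off-diagonal Gram entries shrink roughly like 2^-n as the qubit count n grows, which is the concentration I'm describing.

```python
# Average fidelity-kernel value between random product states vs. qubit count.
# Random product states stand in for a generic encoding; the off-diagonal
# Gram entries concentrate towards 0 exponentially in the number of qubits.
import numpy as np

rng = np.random.default_rng(0)

def random_product_state(n_qubits):
    """Tensor product of Haar-random single-qubit states."""
    state = np.array([1.0 + 0j])
    for _ in range(n_qubits):
        v = rng.normal(size=2) + 1j * rng.normal(size=2)
        v /= np.linalg.norm(v)
        state = np.kron(state, v)
    return state

for n in [2, 3, 5, 10, 15]:
    vals = []
    for _ in range(200):
        a, b = random_product_state(n), random_product_state(n)
        vals.append(np.abs(np.vdot(a, b)) ** 2)  # fidelity kernel entry
    print(f"{n:2d} qubits: mean off-diagonal kernel value ~ {np.mean(vals):.2e}")
```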
Am I on the right track, or is either of these facts not entirely true?
A brief overview of my thinking follows.
I appreciate that quantum kernel methods are essentially linear models on the encoded data, and that the real substance is in the encoding / feature map. It may well be the case that single-qubit rotations and entangling gates encode the data in a very useful way. However, as long as this is done on just 3 qubits, I can 'borrow' the idea of that feature map from the quantum circuit and simply simulate it on a classical computer (as in the sketch below).
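Here is what I mean by 'borrowing' the feature map. This is my own toy encoding (RY angle rotations followed by a CNOT chain), not taken from any particular paper, simulated with plain statevectors; the kernel entry k(x, x') = |&lt;phi(x)|phi(x')&gt;|^2 comes out of a single inner product.

```python
# Classically simulating a 3-qubit angle-encoding feature map and its fidelity kernel.
import numpy as np
from functools import reduce

def ry(theta):
    """Single-qubit RY rotation."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def cnot(n, control, target):
    """Full n-qubit CNOT permutation matrix (fine for tiny n)."""
    dim = 2 ** n
    U = np.zeros((dim, dim))
    for basis in range(dim):
        bits = [(basis >> (n - 1 - q)) & 1 for q in range(n)]
        if bits[control]:
            bits[target] ^= 1
        out = sum(b << (n - 1 - q) for q, b in enumerate(bits))
        U[out, basis] = 1.0
    return U

def feature_map(x):
    """|phi(x)> for a 3-feature data point x: RY(x_i) on qubit i, then a CNOT chain."""
    n = 3
    state = np.zeros(2 ** n)
    state[0] = 1.0
    rotations = reduce(np.kron, [ry(xi) for xi in x])
    state = rotations @ state
    state = cnot(n, 0, 1) @ state  # entangling layer
    state = cnot(n, 1, 2) @ state
    return state

def kernel(x, xp):
    return np.abs(np.vdot(feature_map(x), feature_map(xp))) ** 2

print(kernel([0.1, 0.5, 1.2], [0.2, 0.4, 1.0]))
```

The Gram matrix built from this kernel can be handed directly to a classical SVM, which is exactly why a 3-qubit feature map gives no obvious advantage.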
Is it the case that there might be an exceptionally good encoding circuit able to capture similarity in a 30-qubit Hilbert space, so that a quantum computer is needed to avoid manipulating matrices of dimension 2^30 by 2^30? Is this the scenario in which quantum kernel methods offer an advantage?
I've seen papers showing 20+ qubit circuits with nice-looking Gram matrices (with the desirable block-diagonal structure); however, upon further inspection, the scale of the colormap is [0, 0.2]. Then I wonder: does this pose a problem in terms of the number of shots required to resolve values in that rather small interval (compared to what would be required if the similarity were visible on the interval [0, 1])?
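My back-of-the-envelope reasoning for the shot count (my own estimate, not taken from the papers in question): if a kernel entry k is estimated as the fraction of all-zeros outcomes over N shots, the estimator is a binomial proportion with standard error sqrt(k * (1 - k) / N), so demanding a relative error eps gives roughly N ~ (1 - k) / (k * eps^2), which grows as the entries shrink.

```python
# Rough shot-count estimate for resolving a kernel entry k to a given relative error.
def shots_needed(k, rel_error=0.1):
    """Shots for the standard error of the estimate of k to equal rel_error * k."""
    return (1 - k) / (k * rel_error ** 2)

for k in [0.5, 0.2, 0.05, 0.01]:
    print(f"k = {k:4.2f}: ~{shots_needed(k):,.0f} shots for 10% relative error")
```

If that estimate is right, entries living in [0, 0.2] need several times more shots per Gram-matrix element than entries spread over [0, 1], which is exactly the concern I'm trying to confirm.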