Single-shot quantum machine learning
Quantum machine learning aims to improve learning methods by using quantum computers. To realize its potential, many obstacles need to be overcome. A particularly pressing problem arises in the prediction phase, because the output of a quantum learning model is inherently random. This creates a typically substantial overhead, as the results of multiple executions of the quantum learning model must be aggregated to obtain a practical prediction. In this work, we analyze when quantum learning models can circumvent this problem and produce predictions in a near-deterministic manner—paving the way for single-shot quantum machine learning. We provide a rigorous definition of single-shotness in quantum classifiers and show that the degree to which a quantum learning model approaches determinism is constrained by the distinguishability of the embedded quantum states used in the model. Opening the black box of the embedding, we show that if the embedding is implemented by a quantum circuit, then a certain depth is required for single-shotness to be possible. We finally show that quantum learning models cannot achieve single-shotness in a general manner and, at the same time, be trainable.
I. Introduction
Machine learning is a rapidly advancing field whose progress frequently overturns assumptions about what classical computers can and cannot learn. This ongoing success story has spurred interest in quantum machine learning, the intersection of machine learning with quantum computing, a field that has itself seen great technological advances recently. Investigating whether quantum computers can be used to build learning models that surpass their classical counterparts has become one of the main research directions in quantum machine learning.
While the intrinsic quantum properties of these models have at least the theoretical potential to break the boundaries of classical possibilities [1-4], they also come with inherent drawbacks. Previous research in this area has mainly focused on the training phase, where problems such as barren plateaus [5] complicate the optimization of quantum learning models. However, problems also arise in the inference phase, i.e., when the quantum learning model needs to produce a prediction. A bona fide quantum learning model requires some operation on a quantum system. However, to extract a classical label, a measurement is needed. The nature of quantum mechanics dictates that the result of such a measurement is inherently probabilistic. For the model to solve a learning problem, for example, classify images correctly, the output needs to be deterministic. In practice, a quantum learning model needs to be run many times to circumvent the probabilistic outcome of the measurement and extract a prediction, for example, in the form of an expectation value of an appropriately chosen observable. Even when running the model on a fault-tolerant quantum computer, this problem, an instance of the so-called measurement problem, persists.
The indeterminism of quantum mechanics implies that quantum learning models typically come with an overhead that does not exist in classical learning models. This significantly limits their appeal, especially because it not only increases the time needed to produce a prediction but also the cost. Considering the full quantum machine learning pipeline, the ability to reduce the measurement overhead for a given quantum model would lower costs across the stages. It could also be useful in non-repetitive settings, for example, when a quantum state is produced and needs to be classified once, such as detecting objects from scattered states of light.
In this work, we investigate whether this specific limitation can be circumvented, particularly under what conditions a prediction can be extracted from a single run of a quantum learning model (see Fig. 1). We establish a rigorous definition for such single-shot models and use tools from quantum hypothesis testing to understand their ultimate limitations. If the quantum classifier is built using a quantum circuit, we ascertain that single-shotness is only possible if the circuit has a certain depth. We deal with both the noiseless and the noisy cases. We complement our results by identifying that learning models that exhibit single-shotness are typically overexpressive and, hence, hard to train.
Considerable effort has been invested in studying the potential and limitations of quantum learning models. Quantum learning models, in particular quantum classifiers, can be analyzed through the lens of multiple hypothesis testing, which has been leveraged in Refs. [6-12]. Ref. [11] points out a trade-off between expressiveness and generalization performance. Reducing the measurement overhead of quantum learning models by training to reduce the variance of observables has been attempted [13]. The continuity properties of quantum circuits have also been studied in Ref. [14].
The paper is structured as follows: We first introduce quantum classifiers (Section II) and quantum multiple hypothesis testing (Section III). We then introduce the concept of single-shot quantum learning models in Section IV. In the subsequent Section V, we use the described concepts and tools to present bounds that establish that shallow models cannot have the single-shot property. Finally, in Section VI, we show that quantum learning models that achieve the general single-shot property are hard to learn. We conclude the paper in Section VII.
II. Quantum classifiers
When using expectation values of observables to determine the labels, this also happens implicitly, because one typically needs to run the quantum experiment many times to faithfully approximate the expectation value.
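This shot-averaging overhead can be made concrete with a small simulation (an illustrative sketch, not the paper's model; the state and observable are our own choices). Estimating an expectation value to precision ε requires on the order of 1/ε² repetitions:

```python
import numpy as np

# Illustrative sketch (not from the paper): estimating <Z> on a single qubit
# by repeated projective measurements. The standard error of the shot average
# scales as 1/sqrt(N), so precision eps costs O(1/eps^2) repetitions.

rng = np.random.default_rng(0)

theta = 0.7                      # state |psi> = cos(t/2)|0> + sin(t/2)|1>
p0 = np.cos(theta / 2) ** 2      # probability of outcome +1 (i.e. |0>)
exact = 2 * p0 - 1               # exact expectation value <Z> = cos(theta)

def estimate_Z(n_shots):
    """Average of n_shots single-shot +/-1 measurement outcomes."""
    outcomes = rng.choice([1, -1], size=n_shots, p=[p0, 1 - p0])
    return outcomes.mean()

for n in (100, 10_000, 1_000_000):
    err = abs(estimate_Z(n) - exact)
    print(f"N = {n:>9}: |error| = {err:.4f}")
```

The error of the shot average shrinks only as the square root of the number of runs, which is exactly the overhead a single-shot model would avoid.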
III. Quantum multiple hypothesis testing
In Bayesian and minimax settings, the optimal measurement can be computed via semidefinite programming (SDP) [16]. The usefulness of hypothesis testing stems from the fact that we have good lower bounds on the multiple hypothesis testing error. In the case of binary hypothesis testing, i.e., with only two hypotheses, we even have a closed-form solution. The Holevo-Helstrom theorem [17] links the minimum error of a binary hypothesis test to the trace distance of the quantum states.
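As a hedged numerical illustration of the Holevo-Helstrom theorem (the helper functions are our own; the trace norm is computed from singular values):

```python
import numpy as np

# Numerical check of the Holevo-Helstrom bound: for two hypotheses rho, sigma
# with priors p and 1-p, the minimum error probability of any single
# measurement is  p_err = (1 - || p*rho - (1-p)*sigma ||_1) / 2,
# where ||.||_1 is the trace norm (sum of singular values).

def trace_norm(A):
    return np.linalg.svd(A, compute_uv=False).sum()

def helstrom_error(rho, sigma, p=0.5):
    """Minimum Bayesian error for discriminating rho vs sigma."""
    return 0.5 * (1.0 - trace_norm(p * rho - (1 - p) * sigma))

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])

rho = np.outer(ket0, ket0)          # |0><0|
sigma_orth = np.outer(ket1, ket1)   # |1><1|, orthogonal to rho
sigma_same = rho.copy()             # identical state

print(helstrom_error(rho, sigma_orth))  # orthogonal states: error 0
print(helstrom_error(rho, sigma_same))  # identical states: error 1/2
```

Orthogonal states can be discriminated perfectly in a single shot, while identical states force random guessing; intermediate overlaps interpolate between these extremes.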
Proof in Appendix A. The corresponding result in the minimax setting is straightforward, because when considering a hypothesis test with fewer alternatives, the worst-case error only improves, hence
Combining the above results with the Holevo-Helstrom theorem, we obtain a lower bound on the error of the multiple hypothesis testing problem, expressed in terms of the pairwise distances of the quantum states. We note that these bounds are not necessarily tight. But they allow us to later link the limitations of single-shot quantum learning models to an interpretable quantity based on the trace distance of quantum states.
Multiple hypothesis testing simplification for quantum classifiers
If we compare the above with Eq. (2), we see that a quantum classifier is equivalent to a quantum multiple hypothesis testing problem on the average states of each class. We can thus conclude that the error of the quantum classifier is at least the optimal achievable error of this hypothesis testing problem.
To achieve reasonable accuracy (which can also be measured by other loss functions), the embedded states of different classes must be well distinguishable; this fact has been observed in previous works [6, 8, 11].
IV. Single-shot quantum machine learning
The fact that we have to run the quantum classifier multiple times, as explained in Section II, leads to the so-called measurement problem, which implies an overhead compared to classical methods due to the probabilistic nature of quantum mechanics. The goal of this work is to formalize when this problem can be circumvented by making the classifier so close to deterministic in its assigned label that a single run suffices, i.e., by building what we call a "single-shot" classifier.
A. Single-shot classifiers
where the probability is taken over the classifier's intrinsic randomness.
B. Geometric picture
However, even continuously parameterized classifiers can be epistemically single-shot on a subset of the data domain: if the points we observe are very likely to come from that subset, we can still achieve reasonably good Bayesian single-shot performance. This idea is formalized by the following lemma, which is a simple application of the union bound.
C. Relation to sample complexity
We can connect the concept of single-shotness for probabilistic classifiers to the notion of sample complexity that we often encounter when dealing with the measurement problem in quantum machine learning models (see also Ref. [12]). The key observation is that running the same "weak" classifier m times and taking a majority vote yields a single model that should exhibit single-shotness, which is equivalent to a low failure probability. Mathematically, we have the following lemma on boosting the success probability of a weak classifier, i.e., one that is single-shot but has a high failure probability.
We could have used a tighter estimate from the Chernoff-Hoeffding theorem, which concerns sums of i.i.d. Bernoulli random variables, but such improvements are not necessary for our argument. If we choose the threshold to be half the gap between the largest and the second-largest of the label probabilities, we can guarantee the success of the majority vote. Formally, we can define the gap as:
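The boosting step can be illustrated with an exact computation for the binary-label case (a sketch in our own notation; majority_failure is a hypothetical helper and m is assumed odd):

```python
from math import comb

# Illustration of the boosting lemma for two labels: a weak single-shot
# classifier that outputs the correct label with probability p > 1/2 is run
# m times and the majority label is returned. The failure probability of the
# vote decays exponentially in m (Chernoff-Hoeffding gives the rate).

def majority_failure(p, m):
    """Exact probability that a majority vote over m i.i.d. runs, each
    correct with probability p, returns the wrong label (m odd)."""
    return sum(comb(m, k) * p**k * (1 - p)**(m - k) for k in range((m + 1) // 2))

p = 0.6  # weak classifier: correct only 60% of the time on a single shot
for m in (1, 11, 101):
    print(f"m = {m:>3}: failure probability = {majority_failure(p, m):.2e}")
```

This is precisely the trade-off in the text: the aggregated model gains single-shotness at the cost of m executions of the weak classifier.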
V. Ultimate limits of single-shot quantum machine learning
Having defined the single-shotness property, it is natural to ask whether it can be achieved for a given embedding, and conversely, what resources an embedding must provide to allow single-shot classification. This section is devoted to exploring the ultimate limits of single-shot quantum classifiers by reducing the single-shot classification task to a multiple hypothesis testing task. After establishing this reduction, we open the black box of the embedding and establish a lower bound on the gate complexity required to achieve the single-shot property.
A. Reduction to hypothesis testing
In Section III, we have already seen that the error probability of a quantum classifier can be lower bounded by a reduction to multiple hypothesis testing. We can use a similar argument to bound the optimal value of the single-shot error probability. The subtle difference is that we no longer care about classifying data well according to the true labels, because they do not appear in the definition of single-shotness. Instead, we want the classifier to agree with the label assigned by the classifier itself in Eq. (1). After this change of perspective, the argument can be carried out similarly:
We note that, in principle, we could further improve the lower bound on the epistemic single-shot error probability by including multiple states per label, which would lead to a reduction to a composite multiple hypothesis testing problem. These are less studied, and fewer analytical tools are available. For numerical studies, however, combining this reduction with semidefinite programming is the appropriate way to obtain tighter bounds.
We now use the tools introduced in Section III to link the ultimate limits of single-shot quantum classification to more intuitive quantities. We apply the binary case reduction from Section III and the Holevo-Helstrom theorem from Eq. (4) to the Bayesian error probability lower bound in Section V A. This gives the following lower bound on the error probability, expressed in terms of the trace distance of the average states of the different classes:
From this, we can conclude that agnostic single-shotness imposes a high requirement on the distance between the embedded states of the different classes that the classifier predicts, because we can invert the above inequality to get:
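For the binary, equal-prior case, this inversion can be made explicit (a sketch in our own notation; we write T for the trace distance, \bar\rho_0 and \bar\rho_1 for the class-averaged embedded states, and p_err for the single-shot error probability):

```latex
% Holevo-Helstrom lower bound for two classes with equal priors:
p_{\mathrm{err}} \;\geq\; \tfrac{1}{2}\bigl(1 - T(\bar{\rho}_0, \bar{\rho}_1)\bigr)
% Inverting for the trace distance required by a target error probability:
\quad\Longrightarrow\quad
T(\bar{\rho}_0, \bar{\rho}_1) \;\geq\; 1 - 2\, p_{\mathrm{err}}
```

A small target error probability thus forces the class-averaged states to be nearly orthogonal in trace distance.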
From this bound, it follows that any embedding that varies continuously with the input cannot achieve agnostic single-shotness over the whole input domain. This again emphasizes that the definition of agnostic single-shotness needs to restrict the data domain, as we did in the agnostic-to-Bayesian lifting in Section IV A.
After establishing the connection between single-shot error probability and the trace distance of embedded states in both the agnostic and Bayesian settings, we open the black box of the embedding and consider that it is implemented by a variational quantum circuit. In this case, imposing a certain single-shot error probability requires the circuit to have a certain depth and/or width, as shown below.
B. Noiseless quantum circuits
Section V B shows us that parameterized circuits embed data points that are close to each other into quantum states that are also close. Intuitively, we can infer that for two data points that are close but belong to different labels, the quantum classifier will face difficulties. We formalize this in the following theorem:
Theorem 3 (Lower bound on the single-shotness error probability, Bayesian). Let the quantum classifier be tasked with classifying classical data for which we know a probability distribution over the label groups. The single-shotness error probability is bounded by the worst average distance between the data classes,
It is important to emphasize here that even if the conditions of the above theorem allow for some single-shotness error probability, it does not necessarily mean that the circuit actually achieves this single-shotness error probability.
C. Noisy circuits
Realistic circuits are far from perfect. In the NISQ (Noisy Intermediate-Scale Quantum) era, they are subject to pervasive noise that limits their performance. To study the effect of noise on the single-shotness of quantum classifiers, we consider an error model in which local depolarizing noise with a fixed survival probability acts on each qubit after each step of the classifier computation; with the complementary probability, the qubit is replaced by the maximally mixed state. To track this mathematically, we denote the quantum state produced by the noisy embedding circuit after a given number of computation steps. In our embedding circuit model, the circuit consists of L layers, where each layer comprises a fixed unitary transformation and variational quantum gates, each of which we assume to require one step of computation, giving the total number of steps of our parameterized circuit.
Quantum circuits affected by local depolarizing noise will eventually produce a quantum state very close to the maximally mixed state. This result is established by computing the decay of the relative entropy between the circuit state after a given number of computation steps and the maximally mixed state, and it leads to the following estimate [19]:
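A minimal numerical sketch of this decay, assuming a single qubit and the simple step-wise depolarizing model described above (the survival probability value is our own choice):

```python
import numpy as np

# Sketch of the noise-induced decay in a minimal model: a single-qubit
# depolarizing channel with survival probability q contracts the trace
# distance to the maximally mixed state by a factor q per step, so after
# t steps the state is exponentially close to I/2.

I2 = np.eye(2) / 2

def depolarize(rho, q):
    """One step of depolarizing noise with survival probability q."""
    return q * rho + (1 - q) * I2

def trace_distance(a, b):
    return 0.5 * np.abs(np.linalg.eigvalsh(a - b)).sum()

rho = np.array([[1.0, 0.0], [0.0, 0.0]])   # start in |0><0|
q = 0.9
for t in (0, 10, 20, 30):
    state = rho
    for _ in range(t):
        state = depolarize(state, q)
    print(f"t = {t:>2}: distance to I/2 = {trace_distance(state, I2):.6f} "
          f"(q^t / 2 = {q**t / 2:.6f})")
```

Since all embedded states converge to the same maximally mixed state, their pairwise distinguishability, and with it any hope of single-shotness, decays exponentially in the circuit depth.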
However, this estimate does not take into account the continuity of the circuit, and thus for close parameters, it may actually be too optimistic. Because noisy channels cannot increase the trace distance between states, we can immediately see that if noisy circuits are considered, the results from Section V B still hold, although they would also be too optimistic because they do not consider noise in the system. A simple way to combine these two estimates is to simply take the minimum of both, which gives:
Unlike the noise-only estimate, the above bound does not decay to zero with the number of steps, meaning it can only improve on the estimate from Eq. (66) for a limited parameter range. However, it always improves on the simple continuity estimate given by Eq. (57). Fig. 3 compares the three different bounds. We can now reconnect to the setting of single-shot classification in a similar manner as in Section V B. The derivation can be carried out analogously; only the constant in the bound is replaced by:
Combining all three bounds we derived for the noisy case provides us with an upper bound on the achievable distance between the average states of each multi-class embedding for the quantum classifier's predictions, and thus allows us to analyze the possibility of a noisy quantum classifier performing single-shot classification.
VI. General accurate single-shot models are hard to learn
After establishing the theoretical foundation of single-shot quantum machine learning, we have learned that single-shotness is a property of the trained model that is not automatically given. It depends on the specific way a quantum learning model embeds classical data into quantum states, and it only manifests when quantum states with the same predicted label are clustered together; in other words, when they are not too close to quantum states with different predicted labels.
From this intuition, we can see that it should in theory be possible to build an embedding that allows single-shot classification for all possible labelings of a dataset. In machine learning language, this means the model is expressive enough to shatter the data. This is achieved by embedding all sufficiently separated inputs into mutually orthogonal directions in Hilbert space. We expect, for example, that such an embedding could be built by alternating many encoding gates with deep random unitaries. However, it is clear that such an embedding would lead to poor generalization performance for any POVM inferred from the training data. Previously unseen data points would be embedded into the orthogonal complement of the embedded training data in Hilbert space, where no good classification can be inferred, and the best strategy would be random guessing.
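A toy version of this thought experiment (entirely our own construction) makes the generalization failure concrete: training inputs are embedded into orthonormal basis states, the class effects fit the training data perfectly, and an unseen input receives zero probability under every class effect:

```python
import numpy as np

# Toy model of the overexpressive embedding: each of n training inputs is
# mapped to its own orthonormal basis state of an (n+1)-dimensional Hilbert
# space. A measurement fit to the training data is supported only on the
# seen directions; an unseen input lands in the orthogonal complement, where
# every class effect yields probability zero, and the classifier can only guess.

n = 4
dim = n + 1
train_states = [np.eye(dim)[i] for i in range(n)]   # one-hot embeddings
train_labels = [0, 1, 0, 1]

# POVM effects "learned" from the data: projector onto each class's states.
effects = {}
for c in (0, 1):
    effects[c] = sum(np.outer(s, s)
                     for s, y in zip(train_states, train_labels) if y == c)

unseen = np.eye(dim)[n]                             # orthogonal to all seen states
probs = {c: unseen @ effects[c] @ unseen for c in (0, 1)}
print(probs)   # both class probabilities vanish: no information, random guess
```

Training points are classified perfectly and even single-shot, yet the model carries no information whatsoever about new inputs.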
This "thought experiment" highlights an inherent problem of (quantum) learning models, that there is a subtle trade-off between expressivity, trainability, and generalization performance, which is a subject of ongoing research in quantum machine learning [11, 20-22]. In this section, we show that this trade-off also affects the single-shot property of learning models.
This does not limit the generality of our argument, because data points that are very close together effectively represent the same input. Images are an example: images that can be considered reasonably different must also have reasonably different pixel values. Intuitively, we expect a generally accurate single-shot model to map all points that are separated by at least some minimal distance to (approximately) orthogonal directions in Hilbert space.
To study this specific class of models, we rely on results from Ref. [6]. There, the authors studied the essentially similar problem of learning a POVM effect from examples of the form:
The number of samples required to learn a POVM effect in a D-dimensional Hilbert space to a given accuracy grows with the dimension D. The argument we present here does not depend on the precise notions of approximation and precision, only on this dimension dependence. We note that an approximation of the POVM effect in the operator norm immediately guarantees, via the matrix Hölder inequality, that the precision of the realized binary classifier is within a corresponding additive error of the target.
What remains is to analyze the dimension of the Hilbert space we consider. For this, we return to our assumption that inputs separated by at least a given distance in some norm should be mapped to different corners of Hilbert space. This means that D is roughly the number of balls needed to cover the input space X in the chosen norm. If the space has some characteristic length L and dimension d, we expect to need:
dimensions, which is inefficient in terms of input size, as expected from our thought experiment. This is also consistent with the conclusion of Ref. [11], which shows that overly expressive models have poor generalization performance.
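The dimension count above can be sketched numerically (our own back-of-the-envelope helper; the function name and parameter values are illustrative, assuming a covering-number estimate of roughly (L/eps)^d for a d-dimensional cube of side L covered at resolution eps):

```python
from math import ceil

# Back-of-the-envelope version of the dimension count: to map all inputs
# separated by eps inside a d-dimensional cube of side L to orthogonal
# directions, the Hilbert-space dimension D must be at least the covering
# number of the cube, roughly (L/eps)^d, i.e. exponential in d.

def required_dimension(L, eps, d):
    """Crude covering-number estimate for a cube of side L at resolution eps."""
    return ceil(L / eps) ** d

L, eps = 1.0, 0.1
for d in (1, 2, 4, 8):
    print(f"d = {d}: D >= {required_dimension(L, eps, d)}")
```

Already at modest input dimensions, the required Hilbert-space dimension (and hence the qubit count) explodes, matching the conclusion that generally single-shot models are overexpressive.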
VII. Conclusion
If quantum machine learning is to become useful, many obstacles must be overcome. One of them is the measurement problem, which directly stems from the probabilistic nature of quantum mechanics. The predictions of quantum learning models are inherently random, and therefore, the results of multiple executions of the quantum learning model must be aggregated to obtain a practical prediction.
In this work, we proposed the concept of single-shot quantum machine learning. We established a rigorous definition of a quantum classifier for which the measurement problem can be circumvented by providing labels almost deterministically. Using the close connection between quantum classifiers and quantum multiple hypothesis testing, we established that the single-shotness of a quantum classifier is fundamentally constrained by the distinguishability of the embedded quantum states. Considering that embeddings are typically implemented by quantum circuits, we also showed that achieving single-shot classification requires a certain depth of these circuits.
Finally, we also established that a learning model cannot achieve single-shotness in a general manner, i.e., for all possible input labelings. In that case, it would become overly expressive, and such a model would have no meaningful way to generalize.
This work can only be seen as the beginning of the journey to overcome the measurement problem, and it invites future research in many directions. First, we only considered classification tasks in this work. A generalization to regression is certainly possible and could establish similar fundamental bounds using quantum metrology instead of multiple hypothesis testing [23].
Beyond extending the definition and fundamental treatment of single-shot quantum machine learning, a pressing question is how to build single-shot models. How can we enforce this property during training to obtain both accuracy and single-shotness? One possible source of inspiration is to consider training schemes that are directly linked to the distinguishability of the embedded quantum states [8]. Can we also find explicit constructions of learning models or embedding circuits that enforce a certain degree of single-shotness?