Number of logical cycles to implement $R_Z(\theta)$ gates in lattice surgery: why isn't it the number of layers $\times$ the $T$-count in $R_Z$?

Question

Let's assume I want to implement a rotation $R_Z(\theta)$ for a lattice-surgery based computation, for $\theta$ being an arbitrary angle (i.e. the gate is not necessarily Clifford nor a $T$-gate).

It can typically be done by

Decomposing $R_Z(\theta)$ on Clifford+T gateset (possible up to some approximation $\epsilon$). I call $N_T$ the number of $T$-gates in the decomposition.
The Clifford are "commuted" toward the end of the algorithm (we do the measurement based lattice surgery framework), which changes each of the $T$-gate in $\exp(-i P \pi/8)$ for some single-qubit Pauli $P$ (not anymore necessarily $Z$).

Hence, the duration of one $R_Z(\theta)$ should from my understanding be equal to $N_T$ logical timesteps. If we have many $R_Z(\theta)$ gates, the total duration of these gates should be $N_T \times D_R$ where $D_R$ is the depth associated to these gates (the number of layers containing at least one gate $R_Z(\theta)$).

However, while reading this paper (which explains the assumptions behind the Azure resource estimator), assuming no $T$-gates nor Toffoli nor measurements are in the end-user algorithm, they would find a total number of logical timestep, based on Eq (D3) on page 30:

$$C_{\min}=M_R+N_T \times D_R$$

What I call $N_T$ correspond to their $A \log_2(M_R/\epsilon)+B$ in their notation.

Basically, they add on my counting $M_R$ which is the total number of $R_Z(\theta)$ gates. Why should it be added? Isn't the term $N_T \times D_R$ sufficient?

[Edit]: Below is a concrete example showing why for me $C_{\min}=N_T \times D_R$ should be the formula (without $M_R$):

Left: two logical qubits are shown, on each of them I implement a $R_Z$ (of an angle that is not Clifford, and neither the one of a $T$-gate (i.e. not $\pi/4$).

Middle: this rotation is decomposed on Clifford+T gateset (the gates with label "C" represent single qubit Clifford).

Right: The Clifford are commuted through the $T$-gates. It results a circuit with only "generalized" $T$-gates (i.e. the rotation angle are $\pi/4$ but not necessarily around the $Z$ axis for the $T_i$).

In this example the circuit depth would be $N_T \times D_R$: I don't get why we should add $M_R$ on top.

The ancilla doing state injection are not represented in this image, but they would not change the number of logical timesteps (as doing one of the $T$-gate is essentially measuring a multi-Pauli observable which is done in a single timestep).

Mariia Mykhailova · Answer 1 · 2023-12-05T19:01:00.820

This number corresponds to the sentence "...the number of logical time steps required is simply the number of measurements needed to entangle the algorithm and synthesis ancillas." - that step is separate from the synthesis itself. (And yes, synthesis = Clifford + T decomposition.)

To answer the updated question: in your concrete example, you're doing synthesis on the algorithmic data qubits directly, rather than on additional synthesis qubits. This is not the assumption under which the paper is written (and Azure Quantum Resource Estimator acts).

The paper assumes 2D nearest-neighbor connectivity between qubits, and a certain level of being able to apply operations in parallel. For example, T factories run in parallel (on completely separate sets of qubits) to prepare the T states that are then consumed during synthesis. But for most operations it's unclear whether they can be applied in parallel, and analyzing this would require the tool to do a complete layout of the logical qubits and all kinds of auxiliary qubits (which is computationally expensive, so it doesn't do that).

Instead, Azure Quantum Resource Estimator assumes that T gates are produced in parallel, and synthesis can be done in parallel using additional synthesis qubits (these can be allocated next to the T factories so they can consume T states in parallel), but after that getting the rotations from the synthesis qubits to the algorithm qubits is done sequentially, since the paths required for this are likely to overlap and interfere with each other.

Under these assumptions, doing synthesis directly on algorithm qubits as you show would require consuming each T gate sequentially, so the depth would actually be 6.

P.S. You have to remember that the resource estimates produced here are, well, estimates, not exact numbers supported by compiling the algorithm to a point where it's ready to run on a specific runtime on a specific device. These estimates are done to give fast order-of-magnitude evaluation of the required resources, so there are assumptions made to simplify the computation and make it faster. For example, the last sentence in that section of the paper says explicitly that several types of qubits involved in the synthesis and transport of the T gates is not counted towards the qubit number estimates. For another example, the tool doesn't do actual Clifford+T synthesis for each rotation gate to ascertain the number of T gates used in it, but uses a fixed number for each rotation (unless it's an equivalent of a single T or a Clifford gate).

Number of logical cycles to implement $R_Z(\theta)$ gates in lattice surgery: why isn't it the number of layers $\times$ the $T$-count in $R_Z$?

1 Answers1