
I am currently trying to compute logical error rates for the surface code, for different distances and noise strengths, using Stim's detector error models and PyMatching.

tl;dr: What is the best strategy for parallelising this process over different cores?

To do so, I am trying to make the best use of the cores at my disposal by parallelising the different computations across multiple cores. My current workflow uses joblib for this task:

from joblib import Parallel, delayed, parallel_backend

distances = [...]        # A set of distances
noise_strengths = [...]  # A set of noise strengths
num_shots = ...          # Some number of shots to sample
num_cycles = ...         # Number of cycles over which to perform syndrome extraction

threads = ...  # Maximum number of threads allowed
jobs = ...     # Maximum number of jobs allowed

with parallel_backend("loky", inner_max_num_threads=threads):
    joblib_results = Parallel(n_jobs=jobs)(
        delayed(compute_logical_error_rate)(distance, noise, num_shots, num_cycles)
        for distance in distances
        for noise in noise_strengths
    )

where the logical error rate is computed using Stim and PyMatching as follows:

import numpy as np
from pymatching import Matching

def compute_logical_error_rate(distance, noise, num_shots, num_cycles):
    sc_circuit = surface_code(distance, noise, num_cycles)  # Define circuit
    sc_model = sc_circuit.detector_error_model(decompose_errors=True)  # Detector error model

    sampler = sc_circuit.compile_detector_sampler()  # Define sampler
    syndrome, measured_obs = sampler.sample(shots=num_shots, separate_observables=True)  # Extract samples

    matching = Matching.from_detector_error_model(sc_model)  # Matching graph
    expected_obs = matching.decode_batch(syndrome)  # Perform decoding

    num_errors = np.sum(np.any(expected_obs != measured_obs, axis=1))  # Compute errors
    logical_error_rate = num_errors / num_shots  # Compute logical error rate

    return logical_error_rate

This workflow allows me to assign a different simulation to each core, i.e. each core computes the logical error rate for a given distance and noise strength. However, upon increasing the number of samples I take, defined by num_shots, I start running into memory problems. I have identified the cause to be the syndrome variable, a numpy array whose size scales linearly with the number of samples: syndrome.shape = (num_shots, ...). The way I am parallelising naturally results in this variable being stored multiple times (once per job/core), thus leading to a memory overload.

I am interested in understanding how I can best leverage multiple cores to make this whole set of simulations more efficient. Would it make more sense to tackle one job at a time (fixed distance and noise strength) and parallelise the sampling across different cores? If so, how can that be done? There seems to be a comment about this approach in the original Stim paper, but I do not see how to replicate that behaviour with my code.

FDGod

1 Answer


(Sounds like you want sinter. In particular, you should check out sinter.collect. It's built for exactly this task, and it supports additional features such as saving progress to a file so it can resume work later if your machine loses power.)
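For illustration, here is a minimal sinter sketch, assuming the question's surface_code builder and parameter lists; the worker count, shot/error caps, and filename are illustrative, and save_resume_filepath is what provides the resume-after-interruption behaviour:

import sinter

# Sketch only: surface_code(...), distances, noise_strengths, num_cycles
# are assumed from the question; the numeric limits are illustrative.
tasks = [
    sinter.Task(
        circuit=surface_code(distance, noise, num_cycles),
        json_metadata={'d': distance, 'p': noise},
    )
    for distance in distances
    for noise in noise_strengths
]
stats = sinter.collect(
    num_workers=4,
    tasks=tasks,
    decoders=['pymatching'],
    max_shots=1_000_000,
    max_errors=1_000,
    save_resume_filepath='stats.csv',  # progress persists across restarts
)
for stat in stats:
    print(stat.json_metadata, stat.errors / stat.shots)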

First, you should divide up the work into batches. Stim is optimized to work best when you keep asking for batches of shots of the same size, where that size is a multiple of 256.

Second, you should ask for bit packed data. This makes the data 8x denser, so there's a lot less copying.

Third, you should return how many samples were taken and how many errors were seen, instead of their ratio. This allows you to run the same circuit on multiple threads and easily combine the results afterwards by adding the returned numbers of shots and errors.

def count_logical_errors(distance, noise, num_shots, num_cycles):
    sc_circuit = surface_code(distance, noise, num_cycles)  # Define circuit
    sc_model = sc_circuit.detector_error_model(decompose_errors=True)  # Detector error model
    sampler = sc_circuit.compile_detector_sampler()  # Define sampler
    matching = Matching.from_detector_error_model(sc_model)  # Matching graph

    shots_left = num_shots
    num_errors = 0
    while shots_left > 0:
        batch_size = min(1024, shots_left)  # 1024 is a multiple of 256
        shots_left -= batch_size
        syndrome, measured_obs = sampler.sample(
            shots=batch_size,
            separate_observables=True,
            bit_packed=True,
        )  # Extract samples

        # Tell PyMatching the shots (and desired predictions) are bit packed,
        # so the comparison below compares packed bytes with packed bytes.
        expected_obs = matching.decode_batch(
            syndrome,
            bit_packed_shots=True,
            bit_packed_predictions=True,
        )  # Perform decoding

        num_errors += np.sum(np.any(expected_obs != measured_obs, axis=1))  # Compute errors

    return num_errors, num_shots
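
Since the function now returns counts, a single job (fixed distance and noise strength) can itself be parallelised: run the same circuit on several workers and sum the results, which addresses the question's final point. A sketch using the question's joblib setup, with an illustrative even split of the shots:

from joblib import Parallel, delayed

# Split one large simulation across several jobs, then combine by summing
# the returned counts. Remainder handling is omitted for brevity.
shots_per_job = num_shots // jobs
results = Parallel(n_jobs=jobs)(
    delayed(count_logical_errors)(distance, noise, shots_per_job, num_cycles)
    for _ in range(jobs)
)
total_errors = sum(errors for errors, _ in results)
total_shots = sum(shots for _, shots in results)
logical_error_rate = total_errors / total_shots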

Incidentally, your method will be much more flexible if you pass in the circuit instead of the parameters used to make the circuit.
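
For instance, a sketch of that refactor (count_logical_errors_for_circuit is a hypothetical name; the body reuses the batching loop from above):

import stim

def count_logical_errors_for_circuit(circuit: stim.Circuit, num_shots: int):
    # Caller supplies the circuit, so this works for any circuit family.
    sampler = circuit.compile_detector_sampler()
    matching = Matching.from_detector_error_model(
        circuit.detector_error_model(decompose_errors=True))
    num_errors = 0
    shots_left = num_shots
    while shots_left > 0:
        batch_size = min(1024, shots_left)
        shots_left -= batch_size
        syndrome, measured_obs = sampler.sample(
            shots=batch_size, separate_observables=True, bit_packed=True)
        expected_obs = matching.decode_batch(
            syndrome, bit_packed_shots=True, bit_packed_predictions=True)
        num_errors += int(np.sum(np.any(expected_obs != measured_obs, axis=1)))
    return num_errors, num_shots

# The original signature becomes a thin wrapper:
def count_logical_errors(distance, noise, num_shots, num_cycles):
    return count_logical_errors_for_circuit(
        surface_code(distance, noise, num_cycles), num_shots)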

Craig Gidney