
I'm trying to reproduce a paper on prime factorization. The paper converts the problem into QUBO form, which can then be mapped to an Ising minimizer. I've basically done everything, and I've obtained the final QUBO expression from page 3. My only problem is that my implementation gets slow very quickly for larger numbers. The bottleneck of my code is the reduced_f function. I also suspect that all of these operations are possible without symbolic calculations, using only matrix multiplication, but I don't know how to implement it that way.

from sympy import IndexedBase, expand, Indexed, Mul
import numpy as np
from tqdm import tqdm

x = IndexedBase('x')

def high_degree_f(N, l1=2, l2=3):
    # p and q are binary numbers of length l1 and l2
    p = 1
    q = 1

    for i in range(1, l1):
        p += x[i] * 2**i

    for idx, l in enumerate(range(l1, l1 + l2 - 1)):
        q += x[l] * 2**(idx + 1)

    f = (N - p * q)**2
    f = expand(f)
    r = len(f.free_symbols) + 1

    # replace x[i]**k by x[i] as x[i] is binary
    for i in range(1, r):
        for k in range(2, 4):
            f = f.subs(x[i]**k, x[i])
    return (f, p, q)
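For reference, the encoding this function builds (the constant 1 fixes the least significant bit, since both factors of an odd N are odd):

$$p = 1 + \sum_{i=1}^{l_1 - 1} 2^i x_i, \qquad q = 1 + \sum_{k=1}^{l_2 - 1} 2^k x_{l_1 + k - 1}, \qquad f = (N - pq)^2.$$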

def out_degrees_more_than_2(f):
    # Iterate over the terms in the expanded polynomial
    for term in f.as_ordered_terms():
        # Check if the term is a product
        if isinstance(term, Mul):
            # Count the number of Indexed instances in the term
            variable_count = sum(isinstance(factor, Indexed) for factor in term.args)
            # Check if there are more than two variables in the product
            if variable_count > 2:
                # Yield the term
                yield (variable_count, term)

def max_degree(f):
    max_degree = 0
    for (variable_count, term) in out_degrees_more_than_2(f):
        max_degree = max(max_degree, variable_count)
    return max_degree

def reduced_f(N, l1=2, l2=3):

    (f, p, q) = high_degree_f(N, l1=l1, l2=l2)
    number_of_variables = len(f.free_symbols) - 1

    while max_degree(f) > 2:
        for (variable_count, term) in tqdm(out_degrees_more_than_2(f)):
            if variable_count == 3:

                # extract the numerical coefficient of the term:
                coefficient, v = term.as_coeff_Mul()
                variables = v.args

                new_term = coefficient * (variables[2] * x[number_of_variables + 1]
                           + 2 * (variables[0] * variables[1]
                                  - 2 * variables[0] * x[number_of_variables + 1]
                                  - 2 * variables[1] * x[number_of_variables + 1]
                                  + 3 * x[number_of_variables + 1]))
                number_of_variables += 1

                # substitute the term with the new term
                f = f.subs(term, new_term)
            elif variable_count == 4:
                pass
            else:
                raise ValueError("Unexpected number of variables in term")
        f = expand(f)

    return (f, p, q)
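The substitution in new_term is the standard quadratization of a cubic term: with a fresh auxiliary binary variable $w = x_{\text{number\_of\_variables}+1}$,

$$x_a x_b x_c \;\longrightarrow\; x_c\,w + 2\left(x_a x_b - 2 x_a w - 2 x_b w + 3 w\right),$$

where the parenthesized penalty is nonnegative for binary values and vanishes exactly when $w = x_a x_b$, so at the minimum the replacement agrees with the original cubic term.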

N = 15
l1 = 2
l2 = 3

(f, p, q) = reduced_f(N, l1 = l1, l2 = l2)

print("N:\t\t", N) print("p:\t\t", p) print("q:\t\t", q) print("Reduced f:\t", f)

N:               15
p:               2*x[1] + 1
q:               2*x[2] + 4*x[3] + 1
Reduced f:       200*x[1]*x[2] - 48*x[1]*x[3] - 512*x[1]*x[4] - 52*x[1] + 16*x[2]*x[3] - 512*x[2]*x[4] - 52*x[2] + 128*x[3]*x[4] - 96*x[3] + 768*x[4] + 196  

Expression from the paper:

$$\begin{aligned} f'(x) ={}& 200 x_1 x_2 - 48 x_1 x_3 - 512 x_1 x_4 + 16 x_2 x_3 - 512 x_2 x_4 + 128 x_3 x_4 \\ & - 52 x_1 - 52 x_2 - 96 x_3 + 768 x_4 + 196, \end{aligned}$$

But for N = 10403, l1 = 7, l2 = 7, it takes minutes to get the final expression. Can anyone suggest any improvement to my implementation?


2 Answers


First of all, I read the paper and it seems that the values of l1 and l2 are wrong for N = 10403.

According to the paper: $$l_1 = \lfloor \log_2(p) \rfloor, \qquad l_2 = \lfloor \log_2(q) \rfloor$$

The prime factors of 10403 are 103 and 101, so p = 103 implies l1 = 6 and q = 101 implies l2 = 6.
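A quick check in plain Python:

import math

print(math.floor(math.log2(103)))  # 6
print(math.floor(math.log2(101)))  # 6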

So, running the code:

N = 10403
l1 = 6
l2 = 6

import time

start_time = time.time()
(f, p, q) = reduced_f(N, l1 = l1, l2 = l2)
print("Time taken:", time.time() - start_time)

print("N:\t\t", N) print("p:\t\t", p) print("q:\t\t", q) print("Reduced f:\t", f)

The output is:

200it [00:27,  7.34it/s]
268it [01:56,  2.29it/s]
126it [01:36,  1.31it/s]
Time taken: 245.14731359481812
N:       10403
p:       2*x[1] + 4*x[2] + 8*x[3] + 16*x[4] + 32*x[5] + 1
q:       32*x[10] + 2*x[6] + 4*x[7] + 8*x[8] + 16*x[9] + 1
Reduced f:   -36864*x[100]*x[4] - 36864*x[100]*x[6] + 9216*x[100]*x[8] + 55296*x[100] - 73728*x[101]*....

And for the setting with l1 = 7 and l2 = 7, the output is:

405it [01:33,  4.35it/s]
550it [07:40,  1.19it/s]
340it [09:01,  1.59s/it]
Time taken: 1105.8728091716766
N:       10403
p:       2*x[1] + 4*x[2] + 8*x[3] + 16*x[4] + 32*x[5] + 64*x[6] + 1
q:       16*x[10] + 32*x[11] + 64*x[12] + 2*x[7] + 4*x[8] + 8*x[9] + 1
Reduced f:   540672*x[100]*x[12] - 2162688*x[100]*x[1] - 2162688*x[100]*x[5] + 3244032*x[100] + 1081344*x[101]*x[12] - 4325376*x[101]*x[1].....

Although it doesn't raise any error, it evaluates a much lengthier expression, which is wrong and, as a result, consumes a lot more time.

To answer your question about optimizing the runtime: with an expression this large, sympy will inevitably be slow. You could try pytensor or symengine instead; I'd go for pytensor as it uses JIT compilation.
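As a rough illustration of the symengine route, here's a sketch of the symbolic part only (not a drop-in replacement: symengine has no IndexedBase, so plain Symbols x1, x2, ... stand in, and the term-walking in out_degrees_more_than_2 would need adapting):

import symengine as se

def high_degree_f_se(N, l1=2, l2=3):
    # plain symbols x1 .. x_{l1+l2-2} instead of sympy's IndexedBase
    xs = [se.Symbol(f"x{i}") for i in range(1, l1 + l2 - 1)]
    p = 1 + sum(xs[i - 1] * 2**i for i in range(1, l1))
    q = 1 + sum(xs[l1 - 1 + k] * 2**(k + 1) for k in range(l2 - 1))
    f = se.expand((N - p * q)**2)
    # x**k -> x for binary variables
    for s in xs:
        for k in range(2, 4):
            f = f.subs(s**k, s)
    return f, p, q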


To make it fast, I would guess a good option is to ditch sympy and simply use a dictionary to build the QUBO: something like dic[(x_1, x_2, x_3)] = alpha_(x_1, x_2, x_3), mapping the (sorted) tuple of variables to the corresponding coefficient in the QUBO, and then worry about reducing all tuples to length 2.
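A minimal sketch of that idea (function names here are illustrative; the dict holds the full higher-degree polynomial, which would then be reduced to tuples of length 2 as described above):

from collections import defaultdict
from itertools import product

def poly_mul(a, b):
    # multiply two {index-tuple: coefficient} polynomials over binary
    # variables; duplicate indices collapse because x**2 == x
    out = defaultdict(int)
    for (ta, ca), (tb, cb) in product(a.items(), b.items()):
        out[tuple(sorted(set(ta + tb)))] += ca * cb
    return out

def f_as_dict(N, l1=2, l2=3):
    # same bit encoding as high_degree_f; the empty tuple () keys the constant
    p = {(): 1, **{(i,): 2**i for i in range(1, l1)}}
    q = {(): 1, **{(l1 + k,): 2**(k + 1) for k in range(l2 - 1)}}
    pq = poly_mul(p, q)
    # f = (N - p*q)**2 = (p*q)**2 - 2*N*(p*q) + N**2
    f = poly_mul(pq, pq)
    for t, c in pq.items():
        f[t] -= 2 * N * c
    f[()] += N * N
    return {t: c for t, c in f.items() if c != 0}

For N = 15, l1 = 2, l2 = 3 this reproduces the constant term 196 and the linear coefficient -52 for x_1 of the unreduced polynomial.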

From a pragmatic point of view, I would define all the 'product variables' from $p\times q$ first and substitute them into $(N-pq)^2$. This way, $N - pq$ is linear and no further reductions are needed. How these variables are chosen may change the final QUBO, though, which might not be what you are after.
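Sketched as equations (the penalty shown is the standard binary-product constraint, my assumption for how the substitution would be enforced): write the bits of $p$ as $x_i$ and the bits of $q$ as $y_j$, and introduce $w_{ij} := x_i y_j$. Then

$$pq = 1 + \sum_i 2^i x_i + \sum_j 2^j y_j + \sum_{i,j} 2^{i+j} w_{ij},$$

so $N - pq$ is linear in the $x$, $y$, $w$ and $(N - pq)^2$ is at most quadratic. Each constraint $w_{ij} = x_i y_j$ can then be enforced by adding $\lambda\,(x_i y_j - 2 x_i w_{ij} - 2 y_j w_{ij} + 3 w_{ij})$ with a sufficiently large $\lambda > 0$.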
