As mentioned before, the RSA algorithm rests on a simple fact: multiplying two large numbers is easy, but factoring the result back into the original primes is incredibly difficult. Let's see just how difficult it is. I prepared a tool for that: https://github.com/o-lenczyk/rsa/blob/master/crack_rsa.py and ran the measurements on my PC (11th Gen Intel Core i7-11370H).
In RSA, the public key consists of a large number n (the modulus) and an exponent, e. The private key, d, can only be calculated by someone who knows the prime factors of n: p and q. By finding those values, we can calculate d and decrypt the message.
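To make this concrete, here is a toy sketch with deliberately tiny, insecure primes (my own illustration, not the linked tool) showing that recovering p and q is all an attacker needs to rebuild d:

```python
# Toy RSA with deliberately tiny primes -- for illustration only;
# real keys use moduli of at least 2048 bits.
p, q = 61, 53                 # the secret primes
n = p * q                     # public modulus (3233)
e = 17                        # public exponent
phi = (p - 1) * (q - 1)       # Euler's totient -- requires knowing p and q
d = pow(e, -1, phi)           # private exponent: modular inverse of e (Python 3.8+)

message = 42
cipher = pow(message, e, n)           # encrypt with the public key (n, e)
assert pow(cipher, d, n) == message   # decrypt with the private key d

# An attacker who factors n recovers p and q, and with them d:
p2 = next(f for f in range(2, n) if n % f == 0)
q2 = n // p2
d2 = pow(e, -1, (p2 - 1) * (q2 - 1))
assert pow(cipher, d2, n) == message  # message recovered without the private key
```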
Let me present you couple of algorithms, from most simple implementations, to highly optimized, and compare the results.
Trial division (simple implementation)
| Bit Size | Time |
|---|---|
| 8bit | 0.0000 seconds |
| 16bit | 0.0022 seconds |
| 32bit | 2.57 minutes |
| 64bit | ~21,000 years |
| 128bit | ~3.8×10²³ years (or about 28 trillion times the age of the universe) |
Trial division is an exponential-time algorithm. It uses the most straightforward method, but its runtime grows explosively with the size of the input, making it impractical for large numbers. The strategy is essentially a brute-force search.
Trial division tries to divide the number n by every odd integer in turn. This is clearly impractical: note the huge runtime gap between the 32-bit primes used here and the 64-bit ones.
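A minimal sketch of the idea (my own illustration; the actual script linked above may differ):

```python
def trial_division(n: int) -> int:
    """Return the smallest prime factor of n by brute-force search."""
    if n % 2 == 0:
        return 2
    f = 3
    while f * f <= n:   # a factor <= sqrt(n) must exist if n is composite
        if n % f == 0:
            return f
        f += 2          # skip even numbers
    return n            # no factor found: n itself is prime

# Example: a small RSA-style semiprime
n = 101 * 103
p = trial_division(n)
print(p, n // p)  # 101 103
```

The loop runs up to √n, so doubling the bit size of n squares the work, which is exactly the explosive growth visible in the table.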
gmpy2 – trial division
| Bit Size | Time |
|---|---|
| 8bit | 0.0000 seconds |
| 16bit | 0.0016 seconds |
| 32bit | 5.50 minutes |
| 64bit | ~45,000 years |
| 128bit | ~8.3×10²³ years (or about 60 trillion times the age of the universe) |
gmpy2 is a Python library that wraps a low-level C library (GMP). For this particular use case, however, repeatedly calling gmpy2.next_prime() did not help much: the overhead of the calls outweighed the cost of trying non-prime divisors. It turns out that simply checking odd numbers is faster here.
Pollard’s Rho (custom implementation)
| Bit Size | Time |
|---|---|
| 8bit | 0.0000 seconds |
| 16bit | 0.0005 seconds |
| 32bit | 0.0457 seconds |
| 64bit | 103 minutes |
| 128bit | ~400,000 years |
Pollard’s rho is another exponential-time algorithm, but a significant improvement over trial division. Instead of a linear search, it takes a “random walk” through the numbers and uses a cycle-finding trick to discover a factor. Because of the randomness it is a stochastic algorithm, and there is an element of luck involved in finding the solution.
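A minimal sketch of the rho walk with Floyd's cycle detection (my own illustration, not the code from the repo):

```python
import math
import random

def pollard_rho(n: int) -> int:
    """Return a nontrivial factor of an odd composite n.

    Note: this loops forever if n is prime -- callers must pass a composite.
    """
    while True:
        x = random.randrange(2, n)   # random starting point
        y = x
        c = random.randrange(1, n)   # random constant for the walk x -> x^2 + c
        d = 1
        while d == 1:
            x = (x * x + c) % n      # "tortoise": one step of the walk
            y = (y * y + c) % n      # "hare": two steps of the walk
            y = (y * y + c) % n
            d = math.gcd(abs(x - y), n)
        if d != n:                   # d == n means this walk failed; retry
            return d

# Example: a small semiprime cracked almost instantly
d = pollard_rho(101 * 103)
print(d, (101 * 103) // d)
```

When the walk enters a cycle modulo a hidden factor p, the tortoise and hare collide mod p before they collide mod n, and the gcd exposes p; this is where the element of luck comes in.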
SymPy factorint (trial division only)
| Bit Size | Time |
|---|---|
| 8bit | 0.0001 seconds |
| 16bit | 0.0039 seconds |
| 32bit | 2.47 minutes |
| 34bit | 7.06 minutes |
| 36bit | 34.05 minutes |
| 38bit | ~2.5 hours |
| 64bit | ~17,000 years |
| 128bit | ~3.2×10²³ years (about 23 trillion times the current age of the universe) |
SymPy is a Python library for computer algebra, and it includes tools for number theory. I used its factorint function for cracking RSA. By default it automatically selects the best algorithm, but I wanted to check how each implementation behaves on its own. It is slightly faster than the most basic trial-division implementation.
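For reference, isolating a single algorithm looks like this (assuming a recent SymPy in which factorint exposes the use_* switches):

```python
from sympy import factorint

n = 10007 * 10009  # a small semiprime; both factors are prime

# Restrict factorint to trial division only, mirroring the table above;
# flipping the use_* flags isolates one algorithm at a time.
factors = factorint(n, use_trial=True, use_rho=False,
                    use_pm1=False, use_ecm=False)
print(factors)  # {10007: 1, 10009: 1}
```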
SymPy factorint (Pollard’s Rho only)
| Bit Size | Time |
|---|---|
| 8bit | 0.0001 seconds |
| 16bit | 0.0031 seconds |
| 32bit | 0.0730 seconds |
| 34bit | 0.5716 seconds |
| 36bit | 1.1293 seconds |
| 38bit | 1.4727 seconds |
| 40bit | 1.2474 seconds |
| 64bit | 105.75 minutes |
| 128bit | ~1.5 to 2.5 million years, on average. |
SymPy's Pollard's rho results are almost identical to the custom implementation's.
SymPy factorint (Pollard’s p-1 only)
| Bit Size | Time |
|---|---|
| 8bit | 0.0001 seconds |
| 16bit | 0.0011 seconds |
| 32bit | 0.0045 seconds |
| 34bit | 0.2573 seconds |
| 36bit | 0.4208 seconds |
| 38bit | 0.2225 seconds |
| 40bit | 0.4577 seconds |
| 64bit | will not work |
| 128bit | will not work |
Pollard’s p-1 is a special-purpose algorithm that is not guaranteed to work. If the number has a special property, it is broken into primes almost instantly; otherwise the algorithm does not work at all. I will not go into much detail, but it relies on Fermat’s little theorem and the concept of “smoothness”. The key takeaway: p-1 and q-1 must not be “smooth” if we want a secure RSA implementation.
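The mechanism can be sketched in a few lines (the bound and the factorial-exponent trick below are the standard textbook form, not SymPy's internals): if p − 1 divides B!, then 2^(B!) ≡ 1 (mod p) by Fermat's little theorem, so gcd(2^(B!) − 1, n) reveals p.

```python
import math

def pollard_pm1(n: int, bound: int = 10000):
    """Return a factor of n if some prime p | n has a smooth p - 1, else None."""
    a = 2
    for j in range(2, bound):
        a = pow(a, j, n)          # cumulative: a = 2**(j!) mod n
        g = math.gcd(a - 1, n)
        if 1 < g < n:
            return g              # p - 1 divides j!, so p divides a - 1
    return None                   # no factor with a smooth p - 1 was found

# 1009 - 1 = 2**4 * 3**2 * 7 is very smooth, so 1009 falls out quickly;
# 10007 - 1 = 2 * 5003 (5003 prime) is not, so that factor stays hidden.
print(pollard_pm1(1009 * 10007))  # 1009
```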
SymPy factorint (ECM only)
| Bit Size | Time |
|---|---|
| 8bit | 0.0001 seconds |
| 16bit | did not work |
| 32bit | did not work |
| 34bit | did not work |
| 36bit | did not work |
| 38bit | did not work |
| 40bit | did not work |
| 64bit | 7.1949 seconds |
| 128bit | 534.05 minutes |
| 256bit | ~100 years |
ECM (Elliptic Curve Method) is an enhancement of Pollard’s p-1 method, acting as a generalization that adds “another dimension” to the attack.
While the p-1 method is stuck with the single, fixed number p-1, ECM’s breakthrough is its ability to create “alternative universes.” Each new elliptic curve it tries generates a completely different mathematical group, which has its own unique size (or “order”). The algorithm then bets that one of these new orders will be “smooth”.
It is still a stochastic algorithm, and sometimes fails due to “bad luck”.
SymPy factorint (auto algorithm selection)
| Bit Size | Time |
|---|---|
| 8bit | 0.0001 seconds |
| 16bit | 0.0095 seconds |
| 32bit | 0.0108 seconds |
| 34bit | 0.1922 seconds |
| 36bit | 0.6174 seconds |
| 38bit | 0.1154 seconds |
| 40bit | 0.2449 seconds |
| 64bit | 6.8013 seconds |
| 128bit | ??? |
Finally, I let SymPy choose the best algorithm itself. By mixing the different methods, it achieves clearly the best results so far.
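The default call looks like this; the timing wrapper mirrors how the tables were measured, and the specific semiprime is my own stand-in, not one of the benchmark numbers:

```python
import time
from sympy import factorint

# 999983 and 1000003 are both prime, so n is a ~40-bit semiprime
n = 999983 * 1000003

start = time.perf_counter()
factors = factorint(n)  # no flags: SymPy picks the algorithm cascade itself
elapsed = time.perf_counter() - start

print(factors, f"in {elapsed:.4f} seconds")
```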
SageMath (factorization using Sage)
| Bit Size | Time |
|---|---|
| 8bit | 0.6318 seconds |
| 16bit | 0.0001 seconds |
| 32bit | 0.0003 seconds |
| 34bit | 0.0007 seconds |
| 38bit | 0.0016 seconds |
| 40bit | 0.0010 seconds |
| 64bit | 0.0206 seconds |
| 128bit | 2.60 minutes |
| 256bit | ~250 years |
SageMath is a computer algebra system with dozens of specialized libraries, easy to use from Python. It includes number theory tools such as factorization, backed by the highly optimized PARI/GP C library. The results are two orders of magnitude better than plain Python.
cypari2 (Pari/GP)
| Bit Size | Time |
|---|---|
| 8bit | 0.0281 seconds |
| 16bit | 0.0063 seconds |
| 32bit | 0.0068 seconds |
| 34bit | 0.0077 seconds |
| 36bit | 0.0072 seconds |
| 38bit | 0.0076 seconds |
| 40bit | 0.0078 seconds |
| 64bit | 0.0270 seconds |
| 128bit | 2.59 minutes |
| 256bit | ~250 years |
cypari2 serves a similar purpose, but instead of giving access to a broad collection of tools, it provides a Python interface to PARI/GP alone. The results are almost identical.
python-flint
| Bit Size | Time |
|---|---|
| 8bit | 0.0002 seconds |
| 16bit | 0.0001 seconds |
| 32bit | 0.0008 seconds |
| 34bit | 0.0168 seconds |
| 36bit | 0.0149 seconds |
| 38bit | 0.0085 seconds |
| 40bit | 0.0104 seconds |
| 64bit | 0.0479 seconds |
| 128bit | 2.80 minutes |
| 256bit | ~250 years |
python-flint is a Python library that gives access to FLINT (Fast Library for Number Theory), a modern, highly optimized C library for advanced number-theoretic computation and a direct competitor to PARI/GP. Both libraries use an algorithm cascade similar to this:
- Trial division
- Pollard’s rho
- Pollard’s p-1
- ECM
- Quadratic Sieve (QS)
- Number Field Sieve (NFS)
Both QS and NFS are sub-exponential sieve methods: far more complex algorithms that work by collecting thousands of mathematical relations and then running linear algebra on massive matrices to find the factors. They are used only by the most sophisticated tools.
QS – the core concept is that if
x² ≡ y² (mod N)
then, most likely,
gcd(x − y, N) will be a nontrivial factor of N.
The next step is finding such x and y, where again the “smoothness” concept is used. Finally, when enough relations have been collected, the algorithm looks for a subset of them that, multiplied together, forms a perfect square.
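The textbook example with N = 1649 shows the mechanism: 41² ≡ 32 and 43² ≡ 200 (mod N), and 32·200 = 6400 = 80² is a perfect square, so combining the two relations yields a congruence of squares:

```python
import math

N = 1649                     # = 17 * 97, the classic worked example
x = (41 * 43) % N            # combine the two relations: x = 41 * 43 mod N
y = 80                       # sqrt(32 * 200), the combined right-hand side

assert pow(x, 2, N) == pow(y, 2, N)   # x^2 = y^2 (mod N)

factor = math.gcd(x - y, N)  # a nontrivial factor, with high probability
print(factor, N // factor)   # 17 97
```

The hard part QS automates is the sieving step: finding enough smooth relations that some subset multiplies to a perfect square.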
NFS – the most powerful factorization algorithm known today, unbeatable for numbers over about 110 digits, and enormously complex. It is an abstraction built on top of QS: NFS searches for a polynomial equation that has a special relationship with the number N we are factoring, scores each candidate, and spends a lot of time finding the best polynomial before solving it.
YAFU (standalone tool)
| Bit Size | Time |
|---|---|
| 8bit | 0.0330 seconds |
| 16bit | 0.0218 seconds |
| 32bit | 0.0265 seconds |
| 34bit | 0.0327 seconds |
| 36bit | 0.0454 seconds |
| 38bit | 0.0740 seconds |
| 40bit | 0.1599 seconds |
| 64bit | 0.2154 seconds |
| 128bit | 56.2382 seconds |
| 256bit | ~80-100 years |
YAFU (Yet Another Factoring Utility) is a free, high-performance standalone command-line tool, one of the fastest and most powerful factorization tools available. Built for this single purpose, it was 3x faster than the SageMath multi-tool. It uses its own algorithmic cascade together with the Self-Initializing Quadratic Sieve (SIQS), a more advanced version of QS. SIQS tests a few routes first before digging deeper; “self-initializing” means it generates the next polynomial quickly instead of building it from scratch, so the most time-consuming work is spent in the productive range.
CADO-NFS (standalone tool)
| Bit Size | Time |
|---|---|
| 8bit | too small |
| 16bit | too small |
| 32bit | too small |
| 64bit | too small |
| 128bit | 4.50 minutes |
| 130bit | 5.56 minutes |
| 132bit | 8min |
| 256bit | ~50 years |
CADO-NFS is a research-grade tool built for high-performance, distributed computing and world-record factorizations. It was slower than YAFU because it is designed for much larger numbers and far more computing power; the overhead was simply too high at the 128-bit size.
FactorDB (online database)
| Bit Size | Time |
|---|---|
| 8bit | 0.2378 seconds |
| 16bit | 0.1419 seconds |
| 32bit | too long |
| 64bit | too long |
| 128bit | too long |
FactorDB is an online database that stores known factorizations. Instead of computing, we simply ask for the result. It works only for really small numbers.
Wolfram Alpha (online API)
| Bit Size | Time |
|---|---|
| 8bit | 2.0049 seconds |
| 16bit | 1.7616 seconds |
| 32bit | 1.6952 seconds |
| 34bit | 1.4012 seconds |
| 36bit | 1.8980 seconds |
| 38bit | 1.8980 seconds |
| 40bit | 1.7296 seconds |
| 64bit | 1.7817 seconds |
| 128bit | too long |
An online engine powered by Mathematica. It can handle much larger numbers than FactorDB, but still small ones relative to a real RSA key. For the 64-bit case it was faster than the fastest Python implementation.
Summary
- The right choice of algorithm can make an astronomical difference
- C libraries are hundreds of times faster than pure-Python implementations
- Complexity jumps enormously with each doubling of the key size; that is why modern RSA keys are at least 2048 bits long
- Some algorithms rely on pure luck: they can solve the problem quickly, but are unreliable
- The most powerful tools use a cascade of algorithms to solve the problem efficiently
- Alongside a small key size, “smoothness” is a critical key vulnerability