As mentioned before, the RSA algorithm rests on a simple fact: multiplying two large numbers is easy, but factoring the result back into the original primes is incredibly difficult. Let's see just how difficult it is. I prepared a tool for that: https://github.com/o-lenczyk/rsa/blob/master/crack_rsa.py and ran the measurements on my PC (11th Gen Intel Core i7-11370H).
In RSA, the public key consists of a large number n (the modulus) and an exponent, e. The private key, d, can only be calculated by someone who knows the prime factors of n: p and q. By finding those values, we can calculate d and decrypt the message.
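To make this concrete, here is a toy sketch with deliberately tiny, insecure primes (my own illustration, not the linked tool) showing that recovering p and q is all an attacker needs to rebuild d:

```python
# Toy RSA with deliberately tiny primes -- for illustration only;
# real keys use moduli of at least 2048 bits.
p, q = 61, 53                 # the secret primes
n = p * q                     # public modulus (3233)
e = 17                        # public exponent
phi = (p - 1) * (q - 1)       # Euler's totient -- requires knowing p and q
d = pow(e, -1, phi)           # private exponent: modular inverse of e (Python 3.8+)

message = 42
cipher = pow(message, e, n)           # encrypt with the public key (n, e)
assert pow(cipher, d, n) == message   # decrypt with the private key d

# An attacker who factors n recovers p and q, and with them d:
p2 = next(f for f in range(2, n) if n % f == 0)
q2 = n // p2
d2 = pow(e, -1, (p2 - 1) * (q2 - 1))
assert pow(cipher, d2, n) == message  # message recovered without the private key
```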
Let me present you couple of algorithms, from most simple implementations, to highly optimized, and compare the results.
Trial division (simple implementation)
| Bit Size | Time |
|---|---|
| 8bit | 0.0000 seconds |
| 16bit | 0.0022 seconds |
| 32bit | 2.57 minutes |
| 64bit | ~21,000 years |
| 128bit | ~3.8×10²³ years (or about 28 trillion times the age of the universe) |
Trial division is an exponential-time algorithm. It uses the most straightforward method, but its runtime grows explosively with the size of the input, making it impractical for large numbers. The strategy is essentially a brute-force search.
Trial division tries to divide the number n by every odd integer in turn. This is clearly impractical: note the huge runtime gap between the 32-bit primes used here and the 64-bit ones.
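A minimal sketch of the idea (my own illustration; the actual script linked above may differ):

```python
def trial_division(n: int) -> int:
    """Return the smallest prime factor of n by brute-force search."""
    if n % 2 == 0:
        return 2
    f = 3
    while f * f <= n:   # a factor <= sqrt(n) must exist if n is composite
        if n % f == 0:
            return f
        f += 2          # skip even numbers
    return n            # no factor found: n itself is prime

# Example: a small RSA-style semiprime
n = 101 * 103
p = trial_division(n)
print(p, n // p)  # 101 103
```

The loop runs up to √n, so doubling the bit size of n squares the work, which is exactly the explosive growth visible in the table.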
gmpy2 – trial division
| Bit Size | Time |
|---|---|
| 8bit | 0.0000 seconds |
| 16bit | 0.0016 seconds |
| 32bit | 5.50 minutes |
| 64bit | ~45,000 years |
| 128bit | ~8.3×10²³ years (or about 60 trillion times the age of the universe) |
gmpy2 is a Python library that wraps a low-level C library (GMP). For this particular use case, however, repeatedly calling gmpy2.next_prime() did not help much: the overhead of the calls outweighed the cost of trying non-prime divisors. It turns out that simply checking odd numbers is faster here.
Pollard’s Rho (custom implementation)
| Bit Size | Time |
|---|---|
| 8bit | 0.0000 seconds |
| 16bit | 0.0005 seconds |
| 32bit | 0.0457 seconds |
| 64bit | 103 minutes |
| 128bit | ~400,000 years |
Pollard’s rho is another exponential-time algorithm, but a significant improvement over trial division. Instead of a linear search, it takes a “random walk” through the numbers and uses a cycle-finding trick to discover a factor. Because of the randomness it is a stochastic algorithm, and there is an element of luck involved in finding the solution.
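A minimal sketch of the rho walk with Floyd's cycle detection (my own illustration, not the code from the repo):

```python
import math
import random

def pollard_rho(n: int) -> int:
    """Return a nontrivial factor of an odd composite n.

    Note: this loops forever if n is prime -- callers must pass a composite.
    """
    while True:
        x = random.randrange(2, n)   # random starting point
        y = x
        c = random.randrange(1, n)   # random constant for the walk x -> x^2 + c
        d = 1
        while d == 1:
            x = (x * x + c) % n      # "tortoise": one step of the walk
            y = (y * y + c) % n      # "hare": two steps of the walk
            y = (y * y + c) % n
            d = math.gcd(abs(x - y), n)
        if d != n:                   # d == n means this walk failed; retry
            return d

# Example: a small semiprime cracked almost instantly
d = pollard_rho(101 * 103)
print(d, (101 * 103) // d)
```

When the walk enters a cycle modulo a hidden factor p, the tortoise and hare collide mod p before they collide mod n, and the gcd exposes p; this is where the element of luck comes in.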
SymPy factorint (trial division only)
| Bit Size | Time |
|---|---|
| 8bit | 0.0001 seconds |
| 16bit | 0.0039 seconds |
| 32bit | 2.47 minutes |
| 34bit | 7.06 minutes |
| 36bit | 34.05 minutes |
| 38bit | ~2.5 hours |
| 64bit | ~17,000 years |
| 128bit | ~3.2×10²³ years (about 23 trillion times the current age of the universe) |
SymPy is a Python library for computer algebra, and it includes tools for number theory. I used its factorint function for cracking RSA. By default it automatically selects the best algorithm, but I wanted to check how each implementation behaves on its own. It is slightly faster than the most basic trial-division implementation.
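For reference, isolating a single algorithm looks like this (assuming a recent SymPy in which factorint exposes the use_* switches):

```python
from sympy import factorint

n = 10007 * 10009  # a small semiprime; both factors are prime

# Restrict factorint to trial division only, mirroring the table above;
# flipping the use_* flags isolates one algorithm at a time.
factors = factorint(n, use_trial=True, use_rho=False,
                    use_pm1=False, use_ecm=False)
print(factors)  # {10007: 1, 10009: 1}
```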
SymPy factorint (Pollard’s Rho only)
| Bit Size | Time |
|---|---|
| 8bit | 0.0001 seconds |
| 16bit | 0.0031 seconds |
| 32bit | 0.0730 seconds |
| 34bit | 0.5716 seconds |
| 36bit | 1.1293 seconds |
| 38bit | 1.4727 seconds |
| 40bit | 1.2474 seconds |
| 64bit | 105.75 minutes |
| 128bit | ~1.5 to 2.5 million years, on average. |
SymPy's Pollard's rho results are almost identical to the custom implementation's.
SymPy factorint (Pollard’s p-1 only)
| Bit Size | Time |
|---|---|
| 8bit | 0.0001 seconds |
| 16bit | 0.0011 seconds |
| 32bit | 0.0045 seconds |
| 34bit | 0.2573 seconds |
| 36bit | 0.4208 seconds |
| 38bit | 0.2225 seconds |
| 40bit | 0.4577 seconds |
| 64bit | will not work |
| 128bit | will not work |
Pollard’s p-1 is a special-purpose algorithm that is not guaranteed to work. If the number has a special property, it is broken into primes almost instantly; otherwise the algorithm does not work at all. I will not go into much detail, but it relies on Fermat’s little theorem and the concept of “smoothness”. The key takeaway: p-1 and q-1 must not be “smooth” if we want a secure RSA implementation.
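The mechanism can be sketched in a few lines (the bound and the factorial-exponent trick below are the standard textbook form, not SymPy's internals): if p − 1 divides B!, then 2^(B!) ≡ 1 (mod p) by Fermat's little theorem, so gcd(2^(B!) − 1, n) reveals p.

```python
import math

def pollard_pm1(n: int, bound: int = 10000):
    """Return a factor of n if some prime p | n has a smooth p - 1, else None."""
    a = 2
    for j in range(2, bound):
        a = pow(a, j, n)          # cumulative: a = 2**(j!) mod n
        g = math.gcd(a - 1, n)
        if 1 < g < n:
            return g              # p - 1 divides j!, so p divides a - 1
    return None                   # no factor with a smooth p - 1 was found

# 1009 - 1 = 2**4 * 3**2 * 7 is very smooth, so 1009 falls out quickly;
# 10007 - 1 = 2 * 5003 (5003 prime) is not, so that factor stays hidden.
print(pollard_pm1(1009 * 10007))  # 1009
```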
SymPy factorint (ECM only)
| Bit Size | Time |
|---|---|
| 8bit | 0.0001 seconds |
| 16bit | did not work |
| 32bit | did not work |
| 34bit | did not work |
| 36bit | did not work |
| 38bit | did not work |
| 40bit | did not work |
| 64bit | 7.1949 seconds |
| 128bit | 534.05 minutes |
| 256bit | ~100 years |
ECM (Elliptic Curve Method) is an enhancement of Pollard’s p-1 method, acting as a generalization that adds “another dimension” to the attack.
While the p-1 method is stuck with the single, fixed number p-1, ECM’s breakthrough is its ability to create “alternative universes.” Each new elliptic curve it tries generates a completely different mathematical group, which has its own unique size (or “order”). The algorithm then bets that one of these new orders will be “smooth”.
It is still a stochastic algorithm, and sometimes fails due to “bad luck”.
SymPy factorint (auto algorithm selection)
| Bit Size | Time |
|---|---|
| 8bit | 0.0001 seconds |
| 16bit | 0.0095 seconds |
| 32bit | 0.0108 seconds |
| 34bit | 0.1922 seconds |
| 36bit | 0.6174 seconds |
| 38bit | 0.1154 seconds |
| 40bit | 0.2449 seconds |
| 64bit | 6.8013 seconds |
| 128bit | ??? |
Finally, I let SymPy choose the best algorithm itself. By mixing the different methods, it achieves clearly the best results so far.
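The default call looks like this; the timing wrapper mirrors how the tables were measured, and the specific semiprime is my own stand-in, not one of the benchmark numbers:

```python
import time
from sympy import factorint

# 999983 and 1000003 are both prime, so n is a ~40-bit semiprime
n = 999983 * 1000003

start = time.perf_counter()
factors = factorint(n)  # no flags: SymPy picks the algorithm cascade itself
elapsed = time.perf_counter() - start

print(factors, f"in {elapsed:.4f} seconds")
```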
SageMath (factorization using Sage)
| Bit Size | Time |
|---|---|
| 8bit | 0.6318 seconds |
| 16bit | 0.0001 seconds |
| 32bit | 0.0003 seconds |
| 34bit | 0.0007 seconds |
| 38bit | 0.0016 seconds |
| 40bit | 0.0010 seconds |
| 64bit | 0.0206 seconds |
| 128bit | 2.60 minutes |
| 256bit | ~250 years |
SageMath is a computer algebra system with dozens of specialized libraries, easy to use from Python. It includes number theory tools such as factorization, backed by the highly optimized PARI/GP C library. The results are two orders of magnitude better than plain Python.
cypari2 (Pari/GP)
| Bit Size | Time |
|---|---|
| 8bit | 0.0281 seconds |
| 16bit | 0.0063 seconds |
| 32bit | 0.0068 seconds |
| 34bit | 0.0077 seconds |
| 36bit | 0.0072 seconds |
| 38bit | 0.0076 seconds |
| 40bit | 0.0078 seconds |
| 64bit | 0.0270 seconds |
| 128bit | 2.59 minutes |
| 256bit | ~250 years |
cypari2 serves a similar purpose, but instead of giving access to a broad collection of tools, it provides a Python interface to PARI/GP alone. The results are almost identical.
python-flint
| Bit Size | Time |
|---|---|
| 8bit | 0.0002 seconds |
| 16bit | 0.0001 seconds |
| 32bit | 0.0008 seconds |
| 34bit | 0.0168 seconds |
| 36bit | 0.0149 seconds |
| 38bit | 0.0085 seconds |
| 40bit | 0.0104 seconds |
| 64bit | 0.0479 seconds |
| 128bit | 2.80 minutes |
| 256bit | ~250 years |
python-flint is a Python library that gives access to FLINT (Fast Library for Number Theory), a modern, highly optimized C library for advanced number-theoretic computation and a direct competitor to PARI/GP. Both libraries use an algorithm cascade similar to this:
- Trial division
- Pollard’s rho
- Pollard’s p-1
- ECM
- Quadratic Sieve (QS)
- Number Field Sieve (NFS)
Both QS and NFS are sub-exponential sieve methods: far more complex algorithms that work by collecting thousands of mathematical relations and then running linear algebra on massive matrices to find the factors. They are used only by the most sophisticated tools.
QS – the core concept is that if
x² ≡ y² (mod N)
then, most likely,
gcd(x − y, N) will be a nontrivial factor of N.
The next step is finding such x and y, where again the “smoothness” concept is used. Finally, when enough relations have been collected, the algorithm looks for a subset of them that, multiplied together, forms a perfect square.
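The textbook example with N = 1649 shows the mechanism: 41² ≡ 32 and 43² ≡ 200 (mod N), and 32·200 = 6400 = 80² is a perfect square, so combining the two relations yields a congruence of squares:

```python
import math

N = 1649                     # = 17 * 97, the classic worked example
x = (41 * 43) % N            # combine the two relations: x = 41 * 43 mod N
y = 80                       # sqrt(32 * 200), the combined right-hand side

assert pow(x, 2, N) == pow(y, 2, N)   # x^2 = y^2 (mod N)

factor = math.gcd(x - y, N)  # a nontrivial factor, with high probability
print(factor, N // factor)   # 17 97
```

The hard part QS automates is the sieving step: finding enough smooth relations that some subset multiplies to a perfect square.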
NFS – the most powerful factorization algorithm known today, unbeatable for numbers over about 110 digits, and enormously complex. It is an abstraction built on top of QS: NFS searches for a polynomial equation that has a special relationship with the number N we are factoring, scores each candidate, and spends a lot of time finding the best polynomial before solving it.
YAFU (standalone tool)
| Bit Size | Time |
|---|---|
| 8bit | 0.0330 seconds |
| 16bit | 0.0218 seconds |
| 32bit | 0.0265 seconds |
| 34bit | 0.0327 seconds |
| 36bit | 0.0454 seconds |
| 38bit | 0.0740 seconds |
| 40bit | 0.1599 seconds |
| 64bit | 0.2154 seconds |
| 128bit | 56.2382 seconds |
| 256bit | ~80-100 years |
YAFU (Yet Another Factoring Utility) is a free, high-performance standalone command-line tool, one of the fastest and most powerful factorization tools available. Built for this single purpose, it was 3x faster than the SageMath multi-tool. It uses its own algorithmic cascade together with the Self-Initializing Quadratic Sieve (SIQS), a more advanced version of QS. SIQS tests a few routes first before digging deeper; “self-initializing” means it generates the next polynomial quickly instead of building it from scratch, so the most time-consuming work is spent in the productive range.
CADO-NFS (standalone tool)
| Bit Size | Time |
|---|---|
| 8bit | too small |
| 16bit | too small |
| 32bit | too small |
| 64bit | too small |
| 128bit | 4.50 minutes |
| 130bit | 5.56 minutes |
| 132bit | 8min |
| 256bit | ~50 years |
CADO-NFS is a research-grade tool built for high-performance, distributed computing and world-record factorizations. It was slower than YAFU because it is designed for much larger numbers and far more computing power; the overhead was simply too high at the 128-bit size.
FactorDB (online database)
| Bit Size | Time |
|---|---|
| 8bit | 0.2378 seconds |
| 16bit | 0.1419 seconds |
| 32bit | too long |
| 64bit | too long |
| 128bit | too long |
FactorDB is an online database that stores known factorizations. Instead of computing, we simply ask for the result. It works only for really small numbers.
Wolfram Alpha (online API)
| Bit Size | Time |
|---|---|
| 8bit | 2.0049 seconds |
| 16bit | 1.7616 seconds |
| 32bit | 1.6952 seconds |
| 34bit | 1.4012 seconds |
| 36bit | 1.8980 seconds |
| 38bit | 1.8980 seconds |
| 40bit | 1.7296 seconds |
| 64bit | 1.7817 seconds |
| 128bit | too long |
An online engine powered by Mathematica. It can handle much larger numbers than FactorDB, but still small ones relative to a real RSA key. For the 64-bit case it was faster than the fastest Python implementation.
Summary
- The right choice of algorithm can make an astronomical difference
- C libraries are hundreds of times faster than pure-Python implementations
- Complexity jumps enormously with each doubling of the key size; that is why modern RSA keys are at least 2048 bits long
- Some algorithms rely on pure luck: they can solve the problem quickly, but are unreliable
- The most powerful tools use a cascade of algorithms to solve the problem efficiently
- Alongside a small key size, “smoothness” is a critical key vulnerability