Primality record with ECPP – 109297 digits
ECPP
Heuristically, the ECPP algorithm (more precisely, its FastECPP variant) is the fastest known algorithm for certifying generic primes (as opposed to merely providing a one-sided test, which can only certify that a number is composite, or establish that it is probably prime without being certain). As an additional advantage, it provides a certificate that can be checked in polynomial time with a lower exponent than that of the certificate creation. My CM software has been used for most of the current record computations (it corresponds to all the user codes starting with the letter "E"; the following number encodes the set of participants in the computation). My previous record for a repunit with 86453 digits, that is, the number (10^86453 - 1)/9 consisting of 86453 successive digits 1, has been described in some detail in the publication FastECPP over MPI at the International Congress on Mathematical Software 2024, and it has been publicised in the Dutch newspaper NRC. This little note gives an update for the most recent record, the repunit with 109297 digits, an endeavour undertaken jointly with Paul Underwood using my CM software.
The two latest ECPP records
Let me first summarise a few numbers about the previous proof of the 86453 digit repunit. The first phase, which determines the parameters of a sequence of auxiliary elliptic curves, has been run on the PlaFRIM cluster, repeatedly with checkpointing in runs of three days each, on a varying number of cores between 759 and 2639, depending on how busy the cluster was. In total, it has taken 383 CPU years and less than 4 months in real time, determining successively the parameters of 2979 elliptic curves. The CM software communicates via MPI over TCP, which enables it to run on a heterogeneous cluster of machines; it has mainly used the standard nodes of the PlaFRIM cluster, in particular the zonda and diablo machines with 32-core AMD Zen2 Rome EPYC 7452 @ 2.35 GHz processors and other diablo machines with 64-core AMD Zen2 Rome EPYC 7702 @ 2 GHz and 64-core AMD Zen3 Milan EPYC 7763 @ 2.45 GHz processors. The second phase, which computes the elliptic curves using the complex multiplication method, has been run on brise, a single machine with 96 Intel Xeon E7-8890 v4 cores @ 2.2 GHz, which are older and slower; but the machine has 1 TB of RAM, which is important during this part of the algorithm. Being slower, it is also less popular, so that I could use it for more than three days in a row. (Thanks to the administrators for being flexible and helpful!) The second phase has taken about 25 CPU years and also about 4 months of real time. I have verified the certificate with PARI/GP (it is a good idea to use different software for the verification), again on brise with its 96 cores, which took 190 CPU days, or almost two days of real time.
For the new record of a 109297 digit number, Paul Underwood has carried out the first phase on a single machine with 64 AMD Ryzen Threadripper 3990X Zen2 cores @ 2.9 GHz, which took 87 CPU years and 21 months of real time, for a certificate with 6847 steps. The second phase has been carried out by me on the same 96-core machine as for the previous record, which took 133 CPU years and also 21 months of real time. We ran the two phases mostly in parallel, so that the whole effort took about two years.
Comparing the first phases
As a first observation, it is remarkable that the first phase on a larger number actually took less CPU time! This is due to the fact that this phase is not embarrassingly parallel, unlike many computations in number theory; and as noticed in the accompanying publication, the 86453 digit record was probably over-parallelised. To measure things, we need to consider the size of the problem, which here is given by the number L of digits of the number to be proved prime. The ratio of sizes between the records is 109297/86453, or approximately 1.26. Since both phases of the algorithm have a complexity of Õ(L^4), one would expect the new record to take about 1.26^4 or 2.6 times as long as the old one. But in reality, the CPU time was lower by a factor of about 4.4 (383/87); so between the actual performance and the prediction by the asymptotic complexity there is a factor of about 11.
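To make the comparison concrete, here is a minimal Python sketch that redoes the arithmetic of the previous paragraph, using the digit counts and CPU times quoted above:

    # Predicted vs. observed change in first-phase CPU time, assuming
    # the heuristic complexity of L^4 up to logarithmic factors.
    L_old, L_new = 86453, 109297
    cpu_old, cpu_new = 383, 87        # CPU years of the two first phases

    predicted = (L_new / L_old) ** 4  # about 2.6: larger input, more work
    observed = cpu_old / cpu_new      # about 4.4: but it got *faster*
    print(predicted * observed)       # about 11, the gap to be explained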
Part of this can be explained by different single-core performance. Paul's machine has a higher clock rate, and he has modified the CM code to link with gwnum, a library for integer arithmetic using a floating-point FFT, which may lead to errors depending on the choice of parameters (but these errors can be detected by an even faster check modulo a one-word prime, so they have no impact on the correctness of the final computation; and in any case, we end up with a certificate that is checked independently). So the computations are faster than with GMP, in experiments by a factor of about 1.6 for the size of numbers under consideration. Unfortunately, gwnum is not free software, so it is not taken into account in the development of CM.
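To illustrate the idea of such a consistency check (this is not the CM or gwnum interface, just a hypothetical sketch of the general principle): a result obtained with fast but potentially error-prone arithmetic can be validated modulo a one-word prime at negligible cost, and recomputed exactly only in the rare case of a mismatch.

    P = 2**61 - 1  # a one-word prime used for the consistency check

    def checked_mul(a, b, fast_mul):
        # Multiply with a possibly unreliable fast routine; an error is
        # detected (with overwhelming probability) by comparing modulo P.
        c = fast_mul(a, b)
        if c % P != (a % P) * (b % P) % P:
            c = a * b  # fall back to exact, slower arithmetic
        return c

    # A simulated faulty multiplication is caught and corrected:
    print(checked_mul(10**50 + 7, 10**50 + 9, lambda a, b: a * b - 1))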
Concerning parallelism, during the first phase there is a test whether a number consists of a smooth part multiplied by a prime, where smooth means that the number factors completely into primes less than some bound B. Each core covers a range of about 2^29, so that B is about 2^29 multiplied by the number of cores. Thus with more cores, on average a larger smooth part is factored out. The next step continues with the new, smaller prime number, so that in fact the size of the smooth part corresponds to the number of digits gained in one step of the algorithm. Unfortunately this is not proportional to B, but only to log B. Dividing the size of the numbers by the length of the certificate, we see that we gained on average 96 bits per step in the old record and only 53 bits per step in the new record. If we assume that the first record was carried out with about 1500 cores (where in reality the number varied over the course of the computation) and the second one with 64 cores, then the ratio between the gains per step should be about log(2^29 · 1500) / log(2^29 · 64), or 1.13, which on the one hand is a small gain for employing over 20 times as many cores; and on the other hand does not match the observed factor of 96/53, or 1.8. However, there is another impact of using more cores: it increases the number of suitable curves found at any given step, of which only one can be kept. When a round of computations returns several candidates, the CM software chooses the one with the greatest gain in bits. With only 64 cores, there will often be no choice; with 1500, one can often choose between several. So to make a long explanation short: using more cores leads to shorter certificates, which are found in less wallclock time, but at the expense of an increase in total CPU time.
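For concreteness, here is a naive Python version of such a test (a simplified sketch only; the actual implementation in CM is distributed over the cores and far more optimised), together with the expected gain ratio computed from the core counts:

    from math import log2
    from sympy import isprime, primerange

    def smooth_part_times_prime(n, B):
        # Divide out all prime factors below B; if the remaining
        # cofactor is prime, the step succeeds and the algorithm
        # continues with that smaller prime.
        for q in primerange(2, B):
            while n % q == 0:
                n //= q
        return n if isprime(n) else None

    # Expected ratio of bits gained per step, with B about 2^29 per core:
    print(log2(2**29 * 1500) / log2(2**29 * 64))  # about 1.13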
Comparing the second phases
In the second phase, which for both records has been carried out on the same machine with the same multiprecision library, making the comparison easier, the CPU time has indeed increased compared to the previous record, but by a factor of 133/25, or 5.3, instead of the theoretically predicted (109297/86453)^4, or 2.6, which may be surprising at first. But we can do a more precise estimate, which takes the certificate length into account: the power 4 comes from a power 3 related to the arithmetic (root finding of the class polynomial modulo the prime to be certified) plus one power from the length of the certificate. Since the certificate for the new record has disproportionately more steps due to less parallelisation, as seen above, we should use the better estimate (109297/86453)^3 · (6847/2979), or 4.6, which essentially explains the running time. So with less parallelisation in the first phase, there is also a price to pay in the second phase.
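The refined estimate is again easy to check; the following sketch just redoes the arithmetic with the numbers from above:

    # Second phase: naive L^4 prediction vs. a split into L^3 of
    # arithmetic per certificate step times the number of steps.
    L_old, L_new = 86453, 109297
    steps_old, steps_new = 2979, 6847

    naive = (L_new / L_old) ** 4                              # about 2.6
    refined = (L_new / L_old) ** 3 * (steps_new / steps_old)  # about 4.6
    print(naive, refined, 133 / 25)                           # observed: 5.3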
Conclusion
The certificate files for the 109297 digit repunit are available from the CM ECPP page.
As is often the case with complex algorithms that depend on the careful choice of several parameters, it is not easy to predict their running times on inputs of varying size. The above musings try to explain our running time observations, but cannot claim to be the absolute truth. One thing is certain: for now, proving the next repunit prime candidate is out of reach. OEIS A004023 lists it as the repunit with 270343 digits; using the asymptotic complexity to obtain an estimated running time, we find that it would take (270343/109297)^4, or about 37 times as long as the current record. Put differently, instead of 21 months, it should take some 66 years, as the little extrapolation below shows! So if you want to prove a new record prime, make sure to choose it from a different source.
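For completeness, the extrapolation spelled out:

    # Extrapolation to the next repunit candidate from OEIS A004023,
    # using the same heuristic L^4 scaling as above.
    factor = (270343 / 109297) ** 4   # about 37
    print(factor * 21 / 12, "years")  # about 66 years instead of 21 months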