Recent posts

Goblins for number theory, part 4

2025-03-17
After having introduced the basic concepts of Goblins, in particular promises; after having looked at parallelisation over the network; and after an excursion into persistence, it is now time to get to the main topic. We would like more flexibility, both in the behaviour of the clients and in the nature of the tasks they handle and the control flow in the server. Clients should be able to come and go, perhaps completing only a single task and never being seen again (we will not handle the case of faults, however, that is, clients accepting a task and disappearing before completing it). Tasks could be heterogeneous, that is, take more or less time, or, equivalently, the clients could run on heterogeneous machines, and it would be nice to hand a client a new task as soon as it finishes the previous one. And we would like the server to be able to work in rounds: in essence, distribute the unrelated tasks corresponding to one loop, then gather the results and start with the next loop.
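The scheduling behaviour described above can be sketched outside of Goblins with a plain Python task queue; `run_round` and `client` are illustrative names, not part of the framework, and the per-task work is the squaring from our running example:

```python
import math
import queue
import threading

def run_round(tasks, n_clients):
    # One round: every client grabs a new task as soon as it
    # finishes the previous one, until the queue is empty.
    todo = queue.Queue()
    for t in tasks:
        todo.put(t)
    results = []
    lock = threading.Lock()

    def client():
        while True:
            try:
                x = todo.get_nowait()
            except queue.Empty:
                return
            r = x * x  # the per-task work: squaring
            with lock:
                results.append(r)

    workers = [threading.Thread(target=client) for _ in range(n_clients)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results

# Gather all results of one round before starting the next computation.
squares = run_round([3.0, 4.0, 12.0], n_clients=2)
length = math.sqrt(sum(squares))
print(length)  # → 13.0
```

Heterogeneous task durations are absorbed automatically: a fast client simply pulls more tasks from the queue than a slow one.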

Goblins for number theory, part 3

2025-03-07
In previous posts we have seen how to solve our toy problem of computing the Euclidean length of a vector in a distributed fashion using Goblins, with a client script that runs in several copies, carries out most of the work and reports back to a server script, which collects the partial results into a solution to the problem. The clients could in principle live on distant machines and communicate over the Tor network. For testing in a local setting, however, letting them run on the same machine as the server and communicate over TCP turns out to be more efficient. So far, our architecture is rather inflexible: we assume that the server knows the number of participating clients beforehand, and that all tasks take more or less the same time, so that distributing them evenly to the clients is an optimal scheduling strategy. The logical next step is to overcome these limitations. My initial solution for a more general framework, however, turned out to be very inefficient. Jessica Tallon and David Thompson of the Spritely Institute (many thanks to them!) kindly had a look at it and came up with a much better solution; our discussions also helped me understand Goblins better and inspired ideas for improving the current client and server scripts. So before going for more generality in the next post, let us do a pirouette with the current framework and explore some interesting side tracks that did not make it into the previous post.

Goblins for number theory, part 2

2025-02-25
After seeing how to use the programming concepts of Goblins for a toy problem whose structure resembles algorithms encountered in number theory, let us turn our attention to parallelising, or rather distributing, the code. We keep the running example of computing the length of a vector, giving out the squaring tasks to the clients and leaving the task of adding up the squares and taking the final square root to the server.
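The division of labour can be summarised, independently of Goblins, in a few lines of Python; `client_task` and `server_collect` are made-up names for illustration, and in the real setup the squarings would run on separate clients:

```python
import math

def client_task(x):
    # Each client computes the square of one coordinate.
    return x * x

def server_collect(squares):
    # The server adds up the partial results and takes the square root.
    return math.sqrt(sum(squares))

vector = [3.0, 4.0, 12.0]
squares = [client_task(x) for x in vector]  # distributed in the real setup
print(server_collect(squares))  # → 13.0
```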

Goblins for number theory, part 1

2024-09-04
Most of my code in algorithmic number theory is written in C and runs in a parallelised version using MPI on a cluster. The C language is mandatory for efficiency reasons; MPI is mostly a convenience. Indeed, number-theoretic code is often embarrassingly parallel. For instance, my ECPP implementation for primality proving essentially consists of a number of for loops running one after the other. A server process distributes evaluations of the function inside the loop to the available clients, which take a few seconds or even minutes to report back their respective results. These are then handled by the server before it enters the next loop. In a cluster environment with a shared file system, this could even be realised by starting a number of clients over SSH, starting a server, and then using numbered files to exchange function arguments and results, or touching files to send “signals” between the server and the clients. Computation time is the bottleneck and communication is minimal, so even doing this over NFS is perfectly feasible (and I have written and deployed such code in the past). MPI then provides a convenience layer that makes the process look more professional and integrates more smoothly with the batch submission approach of computation clusters.
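The numbered-file exchange can be sketched in Python standing in for the actual shell scripts; the directory and the file names `arg.N`, `res.N` and `done` are hypothetical, and a temporary directory stands in for the NFS mount:

```python
import math
import pathlib
import tempfile

# Hypothetical shared directory standing in for an NFS mount.
shared = pathlib.Path(tempfile.mkdtemp())

# Server: write one numbered argument file per task.
tasks = [3.0, 4.0, 12.0]
for i, x in enumerate(tasks):
    (shared / f"arg.{i}").write_text(str(x))

# Client: process every argument file and write the result next to it.
for arg in sorted(shared.glob("arg.*")):
    i = arg.suffix.lstrip(".")
    x = float(arg.read_text())
    (shared / f"res.{i}").write_text(str(x * x))
    arg.unlink()

# "Signal" completion by touching a file.
(shared / "done").touch()

# Server: once the signal appears, gather the results.
if (shared / "done").exists():
    squares = [float(p.read_text()) for p in sorted(shared.glob("res.*"))]
    print(math.sqrt(sum(squares)))  # → 13.0
```

In the real setting the server would poll for the signal file while the clients run as separate SSH processes; here everything happens sequentially in one process for the sake of the sketch.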