Recent posts
Goblins for number theory, part 4
2025-03-17
After having
introduced the basic concepts
of Goblins, and in particular promises;
after having looked at
parallelisation over the network;
and after an excursion to
persistence,
it is now time to get to the main topic.
We would like more flexibility, both in the behaviour of the clients
and in the nature of the tasks they are handling, as well as in the
control flow of the server.
Clients should be able to come and go, perhaps completing only one task,
never to be seen again (we will not, however, handle the case of faults,
that is, of clients accepting a task and disappearing before completing it).
Tasks could be heterogeneous, that is, take more or less time, or,
equivalently, the clients could run on heterogeneous machines, and it would
be nice to give out a new task to a client as soon as it finishes the
previous one.
And we would like the server to be able to work in rounds: in essence,
distribute the unrelated tasks corresponding to one loop, then gather the
results and start with the next loop.
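As a first taste, here is a minimal sketch of such a task queue in Goblins;
the ^dispatcher actor and its protocol are inventions of mine for
illustration, not code from the post:

    (use-modules (goblins)
                 (goblins actor-lib methods))

    ;; An actor holding the tasks of one round and handing them out,
    ;; one by one, to whichever client asks for work next.
    (define (^dispatcher bcom tasks)
      (methods
       ((next-task)
        (if (null? tasks)
            'round-finished        ; nothing left in this round
            ;; become a dispatcher with one task less,
            ;; returning the task to the asking client
            (bcom (^dispatcher bcom (cdr tasks))
                  (car tasks))))))

A client would then simply loop, sending (<- dispatcher 'next-task) and
working on whatever comes back until the round is finished; faster clients
automatically ask more often.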
⟶
Goblins for number theory, part 3
2025-03-07
In previous posts we have seen how to solve our
toy problem of computing the
Euclidean length of a vector in a
distributed fashion using Goblins,
with a client script that runs in several copies, carries out most of the
work and reports back to a server script, which collects the partial
results into a solution to the problem.
The clients could in principle live on distant machines and communicate
over the Tor network. For testing in a local setting, however, letting
them run on the same machine as the server and communicate over TCP
turns out to be more efficient.
So far, our architecture is rather inflexible: We assume that the server
knows the number of participating clients beforehand, and that all tasks
take more or less the same time so that distributing them evenly to the
clients is an optimal scheduling strategy.
The logical next step is to overcome these limitations.
My initial solution for a more general framework, however, turned out to
be very inefficient. Jessica Tallon and David Thompson of the
Spritely Institute (many thanks
to them!) kindly had a look at it and came up with a much better solution;
but our discussions also helped me understand Goblins better and inspired
ideas on how to improve the current client and server scripts.
So before going for more generality in the next post, let us do a pirouette
with the current framework and also explore some interesting side tracks
that did not make it into the previous post.
⟶
Goblins for number theory, part 2
2025-02-25
After seeing how to use the
programming concepts
of Goblins
for a toy problem whose structure resembles that of algorithms encountered
in number theory, let us turn our attention to parallelising, or rather
distributing, the code. We keep the running example of computing the length
of a vector, giving out the tasks of squaring to the clients and leaving
the task of adding up the squares and taking the final square root to the
server.
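Condensed into a single process, and with actor names of my own choosing
rather than the scripts from the post, the division of labour looks roughly
as follows:

    (use-modules (goblins))

    (define vat (spawn-vat))

    ;; Client side: an actor that squares whatever it is sent.
    (define (^squarer bcom)
      (lambda (x) (* x x)))

    (with-vat vat
      (let* ((squarer (spawn ^squarer))
             (v '(3 4 12))
             ;; <- sends asynchronous messages and returns promises
             (promises (map (lambda (x) (<- squarer x)) v)))
        ;; Server side: resolve the promises one by one, keeping a
        ;; running sum, then take the final square root.
        (let loop ((promises promises) (sum 0))
          (if (null? promises)
              (format #t "length: ~a\n" (sqrt sum)) ; 13 for (3 4 12)
              (on (car promises)
                  (lambda (square)
                    (loop (cdr promises) (+ sum square))))))))

    (sleep 1) ; crude way of giving the vat time to finish its turns

In the posts themselves, the squarers live in separate client scripts and
the messages travel over the network, but the promise handling on the
server side keeps the same shape.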
⟶
Goblins for number theory, part 1
2024-09-04
Most of my code in algorithmic number theory is written in C and runs in a
parallelised version using MPI on a cluster.
The C language is mandatory for efficiency reasons;
MPI is mostly a convenience. Indeed, number-theoretic code is often
embarrassingly parallel. For instance, my
ECPP
implementation for primality proving essentially consists of a number
of for
loops running one after the other. A server process
distributes evaluations of the function inside the loop to the available
clients, which take a few seconds or even minutes to report back their
respective results. These are then handled by the server before entering
the next loop.
In a cluster environment with a shared file system, this could even be
realised by starting a number of clients over SSH, starting a server, and
then using numbered files to exchange function arguments and results,
or calling touch
on files to send “signals” between the server and
the clients.
Computation time is the bottleneck, communication is minimal, so even
doing this over NFS is perfectly feasible (and I have written and deployed
such code in the past).
MPI then provides a convenience layer that makes the process look more
professional, and also integrates more smoothly with the batch submission
approach of computation clusters.
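For the curious, here is a toy Guile rendition of the
numbered-files-and-touch scheme, with hypothetical file names and squaring
standing in for the real function; the two halves would of course run as
separate processes sharing a file system:

    (use-modules (ice-9 textual-ports))

    ;; "touch": create an empty file that serves as a signal.
    (define (touch file)
      (call-with-output-file file (const #t)))

    ;; Server side: write the argument for client 0, then signal it.
    (call-with-output-file "arg-0"
      (lambda (port) (display 42 port)))
    (touch "go-0")

    ;; Client side: poll for the signal, compute, report back.
    (let wait ()
      (unless (file-exists? "go-0")
        (usleep 100000) ; check every 0.1 s
        (wait)))
    (let ((arg (string->number
                (call-with-input-file "arg-0" get-string-all))))
      (call-with-output-file "result-0"
        (lambda (port) (display (* arg arg) port)))
      (touch "done-0"))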
⟶