Recent posts

Tiny Build Farm for Guix, part 1

2025-08-27
One of the oft-cited reasons people give for not switching to Guix is that their favourite software is too outdated, and a look at Repology shows that they are not wrong. Now the number of active committers in the Guix project is amazingly small, and even counting all contributors I am impressed by what these few people actually achieve. Nevertheless I wondered how I could improve the situation at least a little for the packages I am interested in, that is, those in the scope of the science team; and the first step is to take stock of what actually builds and what does not. So I decided to set up my own little build farm, limited to the packages of the science team, using the same technology that powers the bordeaux build farm. I call it the Tiny Build Farm for Guix, or TBFG for short, and this post is the first in (hopefully) a series of blog posts on the topic; at the time of starting this series, the TBFG does not actually exist yet, so wish me luck.   ⟶

Wireguard VPN with Guix

2025-08-07
Recently I changed my ISP, and the new one uses Carrier-grade NAT, or CGNAT, by default. While this sounds fancy and professional, it is in fact even worse than conventional NAT: Not only do all my devices share the same IPv4 address, but I also share that one address with several other customers! Apparently I am assigned only a few out of the 65535 ports, and this assignment may change from day to day, which means that I cannot connect from the outside to any of my home devices. However, I do have a separate IPv4 address of my own for a virtual machine at Aquilenet, and it should be possible to use it as a trampoline to reach my home through a virtual private network. We already employ WireGuard for one of the Guix build farms, so it felt like a natural choice. Guix provides the wireguard-service-type, which is documented with all its options in the manual; but without an explanation of the general concepts behind the service it is a bit difficult to set up. The Guix Cookbook has an entry on WireGuard, but it is concerned with kernel modules and connecting to an existing WireGuard VPN, while my goal was to set one up in the first place. This turned out to be surprisingly easy.   ⟶
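For context, a minimal home-side configuration using wireguard-service-type might look roughly like the following sketch; the VPN addresses, the peer name, the endpoint hostname and the key are placeholders for illustration, not the actual setup described in the post:

```scheme
;; Sketch of a WireGuard client configuration in a Guix system
;; declaration, assuming the virtual machine with the public IPv4
;; address acts as the hub. All addresses, keys and hostnames
;; below are placeholders.
(service wireguard-service-type
         (wireguard-configuration
          (addresses '("10.0.0.2/32"))   ; this machine inside the VPN
          (private-key "/etc/wireguard/private.key")
          (peers
           (list (wireguard-peer
                  (name "hub")
                  (endpoint "vps.example.org:51820") ; placeholder endpoint
                  (public-key "HUB-PUBLIC-KEY")      ; placeholder key
                  (allowed-ips '("10.0.0.0/24")))))))
```

The hub side would carry a symmetric declaration listing each home device as a peer with its /32 VPN address in allowed-ips.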

Goblins for number theory, part 4

2025-03-17
After having introduced the basic concepts of Goblins, and in particular promises; after having looked at parallelisation over the network; and after an excursion to persistence, it is now time to get to the main topic. We would like more flexibility, both in the behaviour of the clients and in the nature of the tasks they handle, as well as in the control flow of the server. Clients should be able to come and go, perhaps completing only one task and never being seen again (we will not handle the case of faults, however, that is, of clients accepting a task and disappearing before completing it). Tasks could be heterogeneous, that is, take more or less time; or, equivalently, the clients could run on heterogeneous machines; and it would be nice to give out a new task to a client as soon as it finishes the previous one. And we would like the server to be able to work in rounds; in essence, to distribute unrelated tasks corresponding to a loop, then gather the results and start with the next loop.   ⟶

Goblins for number theory, part 3

2025-03-07
In previous posts we have seen how to solve our toy problem of computing the Euclidean length of a vector in a distributed fashion using Goblins, with a client script that runs in several copies, carries out most of the work and reports back to a server script, which collects the partial results into a solution to the problem. The clients could in principle live on distant machines and communicate over the Tor network. For testing in a local setting, however, letting them run on the same machine as the server and communicate over TCP turns out to be more efficient. So far, our architecture is rather inflexible: We assume that the server knows the number of participating clients beforehand, and that all tasks take more or less the same time, so that distributing them evenly to the clients is an optimal scheduling strategy. The logical next step is to overcome these limitations. My initial solution for a more general framework, however, turned out to be very inefficient. Jessica Tallon and David Thompson of the Spritely Institute (many thanks to them!) kindly had a look at it and came up with a much better solution; but our discussions also helped me understand Goblins better and inspired ideas on how to improve the current client and server scripts. So before going for more generality in the next post, let us do a pirouette with the current framework and explore some interesting side tracks that did not make it into the previous post.   ⟶

Goblins for number theory, part 2

2025-02-25
After seeing how to use the programming concepts of Goblins for a toy problem whose structure resembles algorithms encountered in number theory, let us turn our attention to parallelising, or rather distributing, the code. We keep the running example of computing the length of a vector, giving out the tasks of squaring to the clients and leaving the task of adding up the squares and taking the final square root to the server.   ⟶

Goblins for number theory, part 1

2024-09-04
Most of my code in algorithmic number theory is written in C and runs in a parallelised version using MPI on a cluster. The C language is mandatory for efficiency reasons; MPI is mostly a convenience. Indeed, number-theoretic code is often embarrassingly parallel. For instance, my ECPP implementation for primality proving essentially consists of a number of for loops running one after the other. A server process distributes evaluations of the function inside the loop to the available clients, which take a few seconds or even minutes to report back their respective results. These are then handled by the server before entering the next loop. In a cluster environment with a shared file system, this could even be realised by starting a number of clients over SSH, starting a server, and then using numbered files to exchange function arguments and results, or calling touch on files to send “signals” between the server and the clients. Computation time is the bottleneck and communication is minimal, so even doing this over NFS is perfectly feasible (and I have written and deployed such code in the past). MPI then provides a convenience layer that makes the process look more professional and also integrates more smoothly with the batch submission approach of computation clusters.   ⟶