<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom"><title>enge@inria</title><id>https://enge.math.u-bordeaux.fr/feed.xml</id><subtitle>Recent Posts</subtitle><updated>2026-04-16T16:17:46Z</updated><link href="https://enge.math.u-bordeaux.fr/feed.xml" rel="self" /><link href="https://enge.math.u-bordeaux.fr" /><entry><title>Autonomie et souveraineté numerique</title><id>https://enge.math.u-bordeaux.fr/blog/souverainete.html</id><author><name>Andreas Enge and Michaël Ferrec</name><email>andreas.enge@inria.fr</email></author><updated>2026-04-14T00:00:00Z</updated><link href="https://enge.math.u-bordeaux.fr/blog/souverainete.html" rel="alternate" /><content type="html">&lt;div&gt;

&lt;p&gt;
The following text was written in collaboration with
&lt;a href=&quot;https://fr.linkedin.com/in/michaelferrec&quot;&gt;Michaël Ferrec&lt;/a&gt;
of &lt;a href=&quot;https://www.inspeere.com/&quot;&gt;Inspeere&lt;/a&gt;.
It was inspired by discussions within the working group on digital
sovereignty of &lt;a href=&quot;https://pole-enter.org/&quot;&gt;ENTER&lt;/a&gt;,
the competitiveness cluster
&lt;i&gt;Excellence Numérique au service des Transitions Environnementales et
Responsables&lt;/i&gt; in
&lt;a href=&quot;https://www.nouvelle-aquitaine.fr/&quot;&gt;Nouvelle Aquitaine&lt;/a&gt;.
It attempts, fittingly, to propose a definition of the very term
underlying the group's work.
&lt;/p&gt;


&lt;h2&gt;Definition and terminology&lt;/h2&gt;

&lt;p&gt;
Strictly speaking, the notion of sovereignty applies to a state or a
territory. For an organisation (a company, a local authority, a public
administration) or an individual, it is more appropriate to speak of
&lt;i&gt;autonomy&lt;/i&gt;: the ability to make choices without outside
interference and to revise them over time. Resilience to external shocks
is part of it.
In both cases we are dealing with an axis, not a binary state. This axis
runs from total dependence to ideal sovereignty, which is theoretical and
unattainable. What determines an actor's position on this axis is its
ability to identify, measure and manage its risks of dependence.
Autonomy is not an end in itself: it must be assessed against one's
objectives and available means.
&lt;/p&gt;
&lt;p&gt;
&lt;i&gt;Founding principle&lt;/i&gt;:
One cannot delegate one's independence. Entrusting one's sovereignty to a
third party means renouncing it by definition, whatever label is attached
to the service.
&lt;/p&gt;


&lt;h2&gt;The stakes at different scales&lt;/h2&gt;

&lt;h3&gt;At the level of the organisation&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
Operational continuity:
a supplier that is bought out or subject to a foreign jurisdiction can
interrupt operations overnight.
&lt;/li&gt;&lt;li&gt;
Legal risk:
hosting with a provider subject to the Cloud Act or FISA exposes data to
uncontrolled access.
&lt;/li&gt;&lt;li&gt;
Economic dependence:
vendor lock-in reduces the ability to negotiate and to migrate, and
causes a structural increase in costs.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;At the level of the territory&lt;/h3&gt;

&lt;p&gt;
At this scale, the stakes are geostrategic. Technological dependence is
no longer merely an operational risk: it is a national vulnerability in
times of crisis.
&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
Decision-making autonomy:
in a geopolitical or military crisis, a nation that depends on foreign
actors for its critical infrastructure can no longer decide freely.
&lt;/li&gt;&lt;li&gt;
Power relations:
technological dependence has become a lever of diplomatic pressure
(sanctions, export restrictions, unilateral decisions).
&lt;/li&gt;&lt;li&gt;
Industrial development:
massively outsourcing to non-European actors amounts to financing foreign
industries. Public procurement directed at local industry is the main
corrective lever.
&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;The dimensions of analysis&lt;/h2&gt;

&lt;p&gt;
Seven dimensions make it possible to measure a position on the axis.
They are cumulative and interdependent.
&lt;/p&gt;

&lt;h3&gt;Software&lt;/h3&gt;

&lt;p&gt;
Software autonomy is closely tied to the
&lt;a href=&quot;https://www.gnu.org/philosophy/free-sw.fr.html#four-freedoms&quot;&gt;freedoms
of free software&lt;/a&gt;:
the freedom to run, to adapt and to redistribute.
&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
I am autonomous if I use free software developed in-house.
&lt;/li&gt;&lt;li&gt;
I am less autonomous if I use free software developed by others, or
proprietary software developed in-house.
&lt;/li&gt;&lt;li&gt;
I am even less autonomous if I use proprietary software developed by a
third party.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Data&lt;/h3&gt;

&lt;p&gt;
Autonomy is measured by the use of open, interoperable formats. The
nature of the software available to process the data also comes into
play.
&lt;/p&gt;

&lt;h3&gt;Hardware&lt;/h3&gt;

&lt;p&gt;
Since most digital hardware is designed and manufactured abroad, this
question comes close to that of sovereignty and raises the problem of the
resilience of supply chains.
&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
I am autonomous if I use hardware of my own design, manufactured in
Europe.
&lt;/li&gt;&lt;li&gt;
I am less autonomous if I use standard hardware available from a large
number of competing suppliers.
&lt;/li&gt;&lt;li&gt;
I am even less autonomous if I need dedicated hardware, produced in small
series or designed as a non-modifiable black box.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Infrastructure and hosting&lt;/h3&gt;

&lt;p&gt;
This is the place where software and data are brought together, for
internal or external use.
&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
I am autonomous if I host my software and data on my own hardware
(&lt;i&gt;on premise&lt;/i&gt;).
&lt;/li&gt;&lt;li&gt;
I am less autonomous if I host with someone else (a third-party
&lt;i&gt;datacentre&lt;/i&gt;).
&lt;/li&gt;&lt;li&gt;
I am even less autonomous if my data is hosted and processed by
third-party software online (the &lt;i&gt;cloud&lt;/i&gt;).
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Legal&lt;/h3&gt;

&lt;p&gt;
The legal framework can limit autonomy or guarantee it. One is more
autonomous under a liberal rule of law and in one's own jurisdiction.
&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
I am autonomous if I operate my digital activities in Europe.
&lt;/li&gt;&lt;li&gt;
I am less autonomous if I operate under the rule of law outside Europe.
&lt;/li&gt;&lt;li&gt;
I am even less autonomous if I operate under an opaque or unstable legal
regime.
&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;Knowledge&lt;/h3&gt;

&lt;p&gt;
Autonomy increases if one chooses well-documented solutions for which a
large number of trained people exist.
&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
I am autonomous if I use solutions that are widespread, taught and well
documented.
&lt;/li&gt;&lt;li&gt;
I am less autonomous if I use isolated niche solutions for which there is
a monopoly on the knowledge.
&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;Economic&lt;/h3&gt;

&lt;p&gt;
The finiteness of resources limits autonomy. The absence of competition
can lead to dependence on a single supplier and make choices difficult to
reverse.
&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
I am autonomous if I choose solutions compatible with my means, on a
competitive market.
&lt;/li&gt;&lt;li&gt;
I am less autonomous if I depend on a single supplier holding a monopoly
on the solution.
&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;The limits of delegation&lt;/h2&gt;

&lt;p&gt;
An organisation cannot be a specialist in all of its infrastructure. It
is legitimate to entrust operations to a third party. The question is at
what point this delegation becomes a loss of control.
The fundamental distinction: one can delegate operations, but not
control. The question is not &amp;quot;do I run it myself?&amp;quot;
but &amp;quot;can I take back control?&amp;quot;
&lt;/p&gt;&lt;p&gt;
Four criteria help assess whether a delegation is healthy or whether it
amounts to a loss of control:
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;i&gt;Reversibility&lt;/i&gt;:
can I leave and recover my data in usable formats?
&lt;/li&gt;&lt;li&gt;
&lt;i&gt;Criticality&lt;/i&gt;:
is this component central to my activity, or peripheral?
&lt;/li&gt;&lt;li&gt;
&lt;i&gt;Transparency&lt;/i&gt;:
do I know what happens to my data? Is an audit possible?
&lt;/li&gt;&lt;li&gt;
&lt;i&gt;Competition&lt;/i&gt;:
do credible alternatives exist, or am I facing a de facto monopoly?
&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;
Perfect sovereignty does not exist, neither for an organisation nor for a
territory. The goal is not absolute independence but the conscious
mastery of one's dependences: knowing one's position on the axis,
understanding the associated risks, and progressing methodically along
the most critical dimensions.
The absence of a deliberate policy on the matter is itself a choice,
whose cost only becomes fully visible in times of crisis, when power
relations come into play and dependence turns into vulnerability.
&lt;/p&gt;

&lt;/div&gt;</content></entry><entry><title>Primality record with ECPP – 109297 digits</title><id>https://enge.math.u-bordeaux.fr/blog/ecpp-109297.html</id><author><name>Andreas Enge</name><email>andreas.enge@inria.fr</email></author><updated>2026-02-11T00:00:00Z</updated><link href="https://enge.math.u-bordeaux.fr/blog/ecpp-109297.html" rel="alternate" /><content type="html">&lt;div&gt;

&lt;h2&gt;ECPP&lt;/h2&gt;

&lt;p&gt;
Heuristically, the
&lt;a href=&quot;https://en.wikipedia.org/wiki/Elliptic_curve_primality&quot;&gt;ECPP
algorithm&lt;/a&gt; (more precisely, its FastECPP variant) is the fastest
algorithm for certifying generic primes (as opposed to merely providing
a one-sided test, which can only certify that a number is composite, or
establish that it is probably prime without being certain).
As an additional advantage, it provides a certificate that can be checked
in polynomial time with a lower exponent than the certificate creation.
My &lt;a href=&quot;https://www.multiprecision.org/cm/ecpp.html&quot;&gt;CM software&lt;/a&gt;
has been used for most of the current
&lt;a href=&quot;https://t5k.org/top20/page.php?id=27&quot;&gt;record computations&lt;/a&gt;
(these correspond to all the user codes starting with the letter
&amp;quot;E&amp;quot;; the following number encodes the set of participants in
the computation).
My previous record for a &lt;i&gt;repunit&lt;/i&gt; with 86453 digits, that is,
the number (10&lt;sup&gt;86453&lt;/sup&gt; - 1)/9 consisting of 86453 successive
digits 1, has been described in some detail in the publication
&lt;a href=&quot;https://hal.science/hal-04522492/&quot;&gt;FastECPP over MPI&lt;/a&gt;
at the
&lt;a href=&quot;https://doi.org/10.1007/978-3-031-64529-7_4&quot;&gt;International
Congress on Mathematical Software 2024&lt;/a&gt;, and it has been publicised
in the Dutch newspaper
&lt;a href=&quot;https://www.nrc.nl/nieuws/2023/06/10/8645386453-a4166620&quot;&gt;NRC&lt;/a&gt;.
This little note gives an update for the most recent record, the repunit
with 109297 digits, an endeavour undertaken jointly with
&lt;a href=&quot;http://worldofprimes.co.uk/introdcution&quot;&gt;Paul Underwood&lt;/a&gt;
using my CM software.
&lt;/p&gt;

&lt;h2&gt;The two latest ECPP records&lt;/h2&gt;

&lt;p&gt;
Let me first summarise a few numbers about the previous proof of the
86453 digit repunit. The first phase, which determines the parameters
of a sequence of auxiliary elliptic curves, has been run on the
&lt;a href=&quot;https://plafrim-users.gitlabpages.inria.fr/doc/&quot;&gt;PlaFRIM&lt;/a&gt;
cluster, repeatedly with checkpointing for three days in a row,
on a varying number of cores, from 759 to 2639, depending on how busy the
cluster was. In total, it has taken 383 CPU years and less than 4 months
in real time, determining successively the parameters of 2979 elliptic
curves.
The CM software communicates via MPI over TCP, which enables it to run
on a heterogeneous cluster of machines; it has mainly used the
&lt;a href=&quot;https://plafrim-users.gitlabpages.inria.fr/doc/#standard_nodes&quot;&gt;standard
nodes&lt;/a&gt; of the PlaFRIM cluster, in particular the
zonda and diablo machines with 32-core AMD Zen2 Rome EPYC 7452 @ 2.35 GHz
processors and other
diablo machines with 64-core AMD Zen2 Rome EPYC 7702 @ 2 GHz and
64-core AMD Zen3 Milan EPYC 7763 @ 2.45 GHz processors.
The second phase, which computes the elliptic curves using the complex
multiplication method, has been run on
&lt;a href=&quot;https://plafrim-users.gitlabpages.inria.fr/doc/#big_memory_nodes&quot;&gt;brise&lt;/a&gt;,
a single machine with 96 Intel Xeon E7-8890 v4 cores @ 2.2 GHz,
which are older and slower; but the machine has 1 TB of RAM, which is
important during this part of the algorithm. Being slower, it is also
less popular, so that I could use it for more than three days in a row.
(Thanks to the administrators for being flexible and helpful!)
The second phase has taken about 25 CPU years and also about 4 months
of real time.
I have verified the certificate with
&lt;a href=&quot;https://pari.math.u-bordeaux.fr/&quot;&gt;PARI/GP&lt;/a&gt;
(it is a good idea to use &lt;i&gt;different&lt;/i&gt; software for the verification)
again on brise with its 96 cores, which took 190 CPU days,
or almost two days of real time.
&lt;/p&gt;
&lt;p&gt;
For the new record of a 109297 digit number, Paul Underwood has carried
out the first phase on a single machine with 64 cores
AMD Ryzen Threadripper 3990X Zen2 @ 2.9GHz,
which took 87 CPU years and 21 months of real time, for a certificate
with 6847 steps. The second phase has been carried out by me on the same
machine with 96 cores as the previous record, which took 133 CPU years
and also 21 months of real time.
We ran the two phases mostly in parallel, so that the whole effort took
about two years.
&lt;/p&gt;

&lt;h2&gt;Comparing the first phases&lt;/h2&gt;

&lt;p&gt;
As a first observation, it is remarkable that the first phase on a larger
number actually took less CPU time! This is because this phase
is &lt;i&gt;not&lt;/i&gt; embarrassingly parallel, unlike many computations in number
theory; and as noted in the accompanying publication, the 86453 digit
record was probably over-parallelised.
To measure things, we need to consider the size of the problem, which here
is given by the number L of digits of the number to be proved prime.
The ratio between the sizes of the two
records is 109297/86453, or approximately 1.26. Since both phases of the
algorithm have a complexity of O~(L&lt;sup&gt;4&lt;/sup&gt;), one would expect the
new record to take about 1.26&lt;sup&gt;4&lt;/sup&gt; or 2.6 times as long as the
old one. But in reality, the CPU time was &lt;i&gt;lower&lt;/i&gt; by a factor of
about 4.4; so between the actual performance and the prediction by the
asymptotic complexity there is a factor of about 11.
&lt;/p&gt;
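&lt;p&gt;
These numbers can be replayed in a few lines; the following sketch (in
Python, with all figures taken from the text above) just redoes the
arithmetic:
&lt;/p&gt;

```python
# Figures from the post; only the arithmetic is new here.
old_digits, new_digits = 86453, 109297
old_cpu_years, new_cpu_years = 383, 87

ratio = new_digits / old_digits            # size ratio of the two records
expected = ratio ** 4                      # what O~(L^4) scaling predicts
observed = old_cpu_years / new_cpu_years   # CPU time actually went down

print(round(ratio, 2))               # → 1.26
print(round(expected, 1))            # → 2.6
print(round(observed, 1))            # → 4.4
print(round(expected * observed))    # → 11, the factor to be explained
```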
&lt;p&gt;
Part of this can be explained by different single-core performance.
Paul's machine has a higher clock rate, and he has modified the CM code
to link with gwnum, a library for integer arithmetic using a
floating-point FFT, which may lead to errors depending on the choice of
parameters (but these errors can be found by doing an even faster test
modulo a one-word prime, so they have no impact on the correctness of
the final computation; and in any case, we end up with a certificate
that is checked independently). So the computations are faster than with
&lt;a href=&quot;https://gmplib.org/&quot;&gt;GMP&lt;/a&gt;, in experiments by a factor of
about 1.6 for the size of numbers under consideration. Unfortunately,
gwnum is not free software, so it is not taken into account in the
development of CM.
&lt;/p&gt;
&lt;p&gt;
Concerning parallelism, during the first phase there is a test whether
a number consists of a smooth part multiplied by a prime, where
&lt;i&gt;smooth&lt;/i&gt; means that the number completely factors into primes less
than some bound B. Each core covers a range of about 2&lt;sup&gt;29&lt;/sup&gt;,
so that B is about 2&lt;sup&gt;29&lt;/sup&gt; multiplied by the number of cores.
Thus with more cores, on average a larger smooth part is factored out.
The next step continues with the new, smaller prime number, so that in
fact the size of the smooth part corresponds to the number of digits
gained in one step of the algorithm.
Unfortunately this is not proportional to B, but
only to log B. Dividing the size of the numbers by the length of the
certificate, we see that we gained on average 96 bits per step in the
old record and only 53 bits per step in the new record.
If we assume that the first record was carried out with about 1500 cores
(where in reality the number varied over the course of the computation)
and the second one with 64 cores, then the ratio between the gains per
step should be about
log (2&lt;sup&gt;29&lt;/sup&gt; · 1500) / log (2&lt;sup&gt;29&lt;/sup&gt; · 64)
or 1.13, which on the one hand is a small gain for employing over 20 times
as many cores; and on the other hand does not match the observed factor
of 96/53 or 1.8.
However, using more cores has another effect: it increases the
number of suitable curves found at any given step, of which only one
can be kept. When a round of computations returns several candidates,
the CM software chooses the one with the greatest gain in bits.
With only 64 cores, there is often no choice; with 1500,
one can often choose among several.
So to make a long explanation short:
Using more cores leads to shorter certificates, which are found in
less wallclock time, but at the expense of an increase in total CPU time.
&lt;/p&gt;
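&lt;p&gt;
The estimate can be checked numerically; here is a small Python sketch,
assuming the core counts quoted above (about 1500 for the old record,
64 for the new one):
&lt;/p&gt;

```python
from math import log2

per_core_range = 2 ** 29           # range covered by a single core
old_cores, new_cores = 1500, 64    # core counts assumed in the text

# The number of bits gained per step is proportional to log B,
# where B is the per-core range times the number of cores.
predicted = log2(per_core_range * old_cores) / log2(per_core_range * new_cores)
observed = 96 / 53                 # average bits per step, old vs. new record

print(round(predicted, 2))         # → 1.13
print(round(observed, 2))          # → 1.81
```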

&lt;h2&gt;Comparing the second phases&lt;/h2&gt;

&lt;p&gt;
In the second phase, which for both records has been carried out on the
same machine with the same multiprecision library so that the comparison
becomes easier, the CPU time has indeed increased compared to the previous
record, but by a factor of 133/25 or 5.3 instead of the theoretically
predicted (109297/86453)&lt;sup&gt;4&lt;/sup&gt; or 2.6, which may be surprising
at first.
But we can do a more precise estimate, which takes the certificate length
into account: The power 4 comes from a power 3 related to the arithmetic
(root finding of the class polynomial modulo the prime to be certified)
plus one power from the length of the certificate. Since the certificate
for the new record has disproportionately more steps due to less
parallelisation as seen above, we should use the better estimate
(109297/86453)&lt;sup&gt;3&lt;/sup&gt; · (6847/2979) or 4.6,
which essentially explains the running time.
So with less parallelisation in the first phase, there is also a price
to pay in the second phase.
&lt;/p&gt;
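&lt;p&gt;
Again, the arithmetic can be replayed in Python (all figures from the
text):
&lt;/p&gt;

```python
old_digits, new_digits = 86453, 109297
old_steps, new_steps = 2979, 6847

naive = (new_digits / old_digits) ** 4    # pure L^4 scaling
# one power of L replaced by the actual certificate lengths
refined = (new_digits / old_digits) ** 3 * (new_steps / old_steps)
observed = 133 / 25                       # ratio of CPU years, new over old

print(round(naive, 1))      # → 2.6
print(round(refined, 1))    # → 4.6
print(round(observed, 1))   # → 5.3
```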

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;
The certificate files for the 109297 digit repunit are available
from the
&lt;a href=&quot;https://www.multiprecision.org/cm/ecpp.html&quot;&gt;CM ECPP&lt;/a&gt;
page.
&lt;/p&gt;
&lt;p&gt;
As is often the case with complex algorithms that depend on the careful
choice of several parameters, it is not easy to predict their running
times on inputs of varying size. The above musings try to explain our
running time observations, but cannot claim to be the absolute truth.
One thing is certain: For now, proving the next prime candidate for a
repunit is out of reach.
&lt;a href=&quot;https://oeis.org/A004023&quot;&gt;OEIS A004023&lt;/a&gt;
lists it as the repunit with 270343 digits; using the asymptotic
complexity to obtain an estimated running time, we find that it will
take
(270343/109297)&lt;sup&gt;4&lt;/sup&gt; or 37 times as long as the current record.
Put differently, instead of 21 months, it would take about 66 years!
So if you want to prove a new record prime, make sure to choose it from
a different source.
&lt;/p&gt;
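<p>
For the curious, the extrapolation works out as follows (a rough sketch
based on asymptotic scaling only, so real timings would certainly
deviate):
</p>

```python
current_digits, next_digits = 109297, 270343

factor = (next_digits / current_digits) ** 4   # O~(L^4) scaling
months = 21 * factor                           # current record: 21 months

print(round(factor))        # → 37
print(round(months / 12))   # → 66 (years)
```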

&lt;/div&gt;</content></entry><entry><title>Tiny Build Farm for Guix, part 2</title><id>https://enge.math.u-bordeaux.fr/blog/tbfg-2.html</id><author><name>Andreas Enge</name><email>andreas.enge@inria.fr</email></author><updated>2025-10-24T00:00:00Z</updated><link href="https://enge.math.u-bordeaux.fr/blog/tbfg-2.html" rel="alternate" /><content type="html">&lt;div&gt;

&lt;h1&gt;Building science packages&lt;/h1&gt;

&lt;p&gt;
In our effort to create a Tiny Build Farm for Guix, which is supposed
to report on the status of the packages assigned to the science team,
so far we have seen how to
&lt;a href=&quot;tbfg-1.html&quot;&gt;set up&lt;/a&gt;
the required infrastructure.
On a dedicated machine with Guix as its operating system, we have added
several Shepherd services:
the Guix Build Coordinator together with a build agent;
and the web server part of the BFFE, which enables us to follow the
activity of the builders.
For performance reasons, we have refrained from installing an instance of
the Guix Data Service and opted instead to talk to the instance operated
by the Guix project at &lt;code&gt;https://data.guix.gnu.org/&lt;/code&gt;, which
continually evaluates the Guix master branch and creates derivations for
all packages in the distribution.
The next step is to explore how to programmatically talk to the remote
data server from a Guile script, how to extract derivations we are
interested in, and how to submit them for building to our instance of the
build coordinator.
&lt;/p&gt;


&lt;h2&gt;Getting information from the data service&lt;/h2&gt;

&lt;p&gt;
We need to install the two packages
&lt;code&gt;guix-data-service&lt;/code&gt; and (for later use)
&lt;code&gt;guix-build-coordinator&lt;/code&gt; on the TBFG machine, which contain
Guile libraries with the necessary functionality.
&lt;/p&gt;
&lt;p&gt;
⚠ If installed into a user profile, both packages pull in the
&lt;code&gt;guix&lt;/code&gt; package as a propagated input, which prevents the user
from updating it through &lt;code&gt;guix pull&lt;/code&gt;.
It is thus recommended to run
&lt;/p&gt;
&lt;pre&gt;
guix shell guile-next guix-data-service guix-build-coordinator
&lt;/pre&gt;
&lt;p&gt;
instead. At the time of writing, the &lt;code&gt;guile&lt;/code&gt; package in Guix
is at version 3.0.9, while the data service library requires
&lt;code&gt;guile-next&lt;/code&gt;, which is at version 3.0.10.
&lt;/p&gt;
&lt;p&gt;
Let us open a Guile REPL and execute the following code
(to ease copy-pasting, I omit the prompt of the REPL;
lines starting with a $ sign and a number correspond to results).
&lt;/p&gt;
&lt;pre&gt;
$ guile
(use-modules (guix-data-service client))
(define my-data-service &amp;quot;https://data.guix.gnu.org/&amp;quot;)

(define json
  (guix-data-service-request my-data-service
                             &amp;quot;repository/1/branch/master.json&amp;quot;))
json
$1 = ((&amp;quot;revisions&amp;quot; . #(((&amp;quot;data_available&amp;quot; . #f) (&amp;quot;commit-hash&amp;quot; . &amp;quot;cb47639a8081e8e2d651ad1612bbd1e482766469&amp;quot;) …
&lt;/pre&gt;
&lt;p&gt;
The call to &lt;code&gt;guix-data-service-request&lt;/code&gt;
is equivalent to opening the URL
&lt;a href=&quot;https://data.guix.gnu.org/repository/1/branch/master.json&quot;&gt;&lt;code&gt;https://data.guix.gnu.org/repository/1/branch/master.json&lt;/code&gt;&lt;/a&gt;,
which executes the same query as the URL
&lt;a href=&quot;https://data.guix.gnu.org/repository/1/branch/master&quot;&gt;&lt;code&gt;https://data.guix.gnu.org/repository/1/branch/master&lt;/code&gt;&lt;/a&gt;
without the &lt;code&gt;.json&lt;/code&gt; at the end, but it returns the result in
&lt;a href=&quot;https://en.wikipedia.org/wiki/JSON#Data_types&quot;&gt;JSON&lt;/a&gt; format.
Moreover, the function call transforms the JSON into a Guile data structure
through the
&lt;a href=&quot;https://github.com/aconchillo/guile-json&quot;&gt;guile-json&lt;/a&gt;
library; in particular, JSON arrays become Guile
&lt;a href=&quot;https://www.gnu.org/software/guile/manual/guile.html#Vectors&quot;&gt;vectors&lt;/a&gt;
and JSON objects become Guile
&lt;a href=&quot;https://www.gnu.org/software/guile/manual/guile.html#Association-Lists&quot;&gt;association
lists&lt;/a&gt;, or &lt;i&gt;alists&lt;/i&gt; for short (these are lists of key-value pairs,
so brace yourself for lots of parentheses in a row).
Thus parsing the result and extracting the information we are interested in
amounts to unwrapping these successive layers; in true Scheme/Lisp style
we will also usually transform the vectors into lists using the
&lt;code&gt;vector-&amp;gt;list&lt;/code&gt; function.
The JSON we asked for is an object with a single field
&lt;code&gt;revisions&lt;/code&gt;, which contains an array of revisions, that is,
git commits on the master branch;
every revision is an object with the three fields
&lt;code&gt;date&lt;/code&gt;, &lt;code&gt;commit-hash&lt;/code&gt; (these are strings)
and &lt;code&gt;data_available&lt;/code&gt;, a boolean indicating whether the data
service has computed the derivations for this commit or not
(which corresponds to the green or grey badges on the website).
This structure can be derived by looking at and playing with the variables
in the REPL, or probably more conveniently by opening the corresponding URL
in a web browser, which should show the JSON in a special mode.
We can now write a small function (or maybe two even smaller functions)
that query the data service and return a list of revisions for which the
data service has computed the derivations:
&lt;/p&gt;
&lt;pre&gt;
(define (data-available? revision)
  ;; Given a REVISION, check whether it has been treated by the
  ;; data service.
  (assoc-ref revision &amp;quot;data_available&amp;quot;))

(define (get-revisions data-service)
  ;; Query DATA-SERVICE for the list of revisions it has successfully
  ;; treated in the master branch.
  (filter data-available?
    (vector-&amp;gt;list
      (assoc-ref
        (guix-data-service-request data-service
          &amp;quot;repository/1/branch/master.json&amp;quot;)
        &amp;quot;revisions&amp;quot;))))

(define revisions (get-revisions my-data-service))
revisions
$2 = (((&amp;quot;data_available&amp;quot; . #t) (&amp;quot;commit-hash&amp;quot; . &amp;quot; …
&lt;/pre&gt;
&lt;p&gt;
In the following, we will work with revisions in this form, although mainly
the commit hashes are of interest. We could print them as follows:
&lt;/p&gt;
&lt;pre&gt;
(define commits
  (map (lambda (revision)
         (assoc-ref revision &amp;quot;commit-hash&amp;quot;))
       revisions))
commits
$3 = (&amp;quot;b966f4007c8492ad89eedf32dd91b3352dba594e&amp;quot; &amp;quot;8a1f56cf8710fc142a2f8ef2e52be82e8aa9f53e&amp;quot; …
(length commits)
$4 = 46
(define commit (car commits))
commit
$5 = &amp;quot;b966f4007c8492ad89eedf32dd91b3352dba594e&amp;quot;
&lt;/pre&gt;
&lt;p&gt;
By default the data service returns 100 revisions (including those for which
no data is available), which will be amply enough for our purposes.
&lt;/p&gt;
&lt;p&gt;
The next step is to obtain the derivations for a given revision, say the
newest one with data available. Again this is most easily
reverse-engineered from the web interface of the data service:
Click on the latest revision with a green badge, then on
&lt;i&gt;View package derivations&lt;/i&gt;; this shows how the URL is to be formed.
Since we need all derivations, we also have to tick the &lt;i&gt;All results&lt;/i&gt;
checkbox; on the other hand, we may limit to one architecture, say
&lt;code&gt;x86_64-linux&lt;/code&gt; as &lt;i&gt;System&lt;/i&gt;, and not consider
cross-compilation by choosing &lt;code&gt;(no target)&lt;/code&gt; for &lt;i&gt;Target&lt;/i&gt;.
These choices add GET parameters to the query, which can be passed
as an alist for the optional third parameter of
&lt;code&gt;guix-data-service-request&lt;/code&gt;. Again adding &lt;code&gt;.json&lt;/code&gt;
to the URL (in front of the &lt;code&gt;?&lt;/code&gt;) shows the structure of the
resulting JSON.
It is then easy to end up with the following function; notice the use
of the
&lt;a href=&quot;https://www.gnu.org/software/guile/manual/guile.html#index-quasiquote&quot;&gt;quasiquote&lt;/a&gt;
&lt;code&gt;`&lt;/code&gt; and the
&lt;a href=&quot;https://www.gnu.org/software/guile/manual/guile.html#index-quasiquote&quot;&gt;unquote&lt;/a&gt;
&lt;code&gt;,&lt;/code&gt;:
&lt;/p&gt;
&lt;pre&gt;
(define (get-derivations data-service commit system)
  ;; Query DATA-SERVICE for the list of derivations for the given COMMIT
  ;; and SYSTEM.
  (map
    (lambda (p)
      (assoc-ref p &amp;quot;derivation&amp;quot;))
    (vector-&amp;gt;list
      (assoc-ref
        (guix-data-service-request data-service
          (string-append &amp;quot;revision/&amp;quot; commit &amp;quot;/package-derivations.json&amp;quot;)
          `((system . ,system) (target . &amp;quot;none&amp;quot;) (all_results . &amp;quot;on&amp;quot;)))
        &amp;quot;derivations&amp;quot;))))

(define derivations
  (get-derivations my-data-service commit &amp;quot;x86_64-linux&amp;quot;))
(length derivations)
$6 = 29531
(car derivations)
$7 = &amp;quot;/gnu/store/000lxmn2d17bv2v6znvf6z5vi7ndy8q4-r-janeaustenr-1.0.0.drv&amp;quot;
&lt;/pre&gt;
&lt;p&gt;
So the derivations are simply strings pointing to files in the store
(of the data service, so far they are not yet available on the TBFG
machine).
&lt;/p&gt;


&lt;h2&gt;Filtering out team packages&lt;/h2&gt;

&lt;p&gt;
29000 derivations are more than our poor tiny machine can handle; the next
step is to filter out those that correspond to packages in the science team.
The team is responsible for certain package modules (or equivalently, for
&lt;code&gt;.scm&lt;/code&gt; files in the &lt;code&gt;gnu/packages/&lt;/code&gt; directory);
which ones they are can be seen in the file &lt;code&gt;CODEOWNERS&lt;/code&gt;
checked into the Guix git repository, itself derived from
&lt;code&gt;etc/teams.scm&lt;/code&gt;.
As it does not change very often, for simplicity we may determine the list
of modules by hand, which may require us to resolve regular expressions
(here: &lt;code&gt;fortran(-.+|)&lt;/code&gt;) into lists of actually present modules;
here we end up with the following:
&lt;/p&gt;
&lt;pre&gt;
(define my-locations
  '(&amp;quot;algebra&amp;quot; &amp;quot;astronomy&amp;quot; &amp;quot;chemistry&amp;quot; &amp;quot;fortran-check&amp;quot; &amp;quot;fortran-xyz&amp;quot;
  &amp;quot;geo&amp;quot; &amp;quot;graph&amp;quot; &amp;quot;lean&amp;quot; &amp;quot;maths&amp;quot; &amp;quot;medical&amp;quot; &amp;quot;sagemath&amp;quot; &amp;quot;statistics&amp;quot;))
&lt;/pre&gt;
&lt;p&gt;
When starting the project, I had hoped to extract the interesting packages
directly from the (strings representing) derivations, given a fixed list
of package names.
But it is a truth universally acknowledged that a programmer never has the
singularly good fortune of such simplicity, whatever their feelings or views
when first entering the neighbourhood of a problem.
Here two reasons speak against it: First of all, the packages of a team may
change over time as packages are added, removed or moved to a different
module. More immediately, though, only the &lt;i&gt;combination&lt;/i&gt; of package
name and version can be easily recovered from the derivation by removing a
fixed prefix, the hash and a fixed suffix, using the following function:
&lt;/p&gt;
&lt;pre&gt;
(define (derivation-&amp;gt;name+version derivation)
  ;; Given a DERIVATION (by a string of the form &amp;quot;/gnu/store/...&amp;quot;),
  ;; return the part of it that encodes the name and the version
  ;; of the underlying package.
  (string-drop (basename derivation &amp;quot;.drv&amp;quot;) 33))
&lt;/pre&gt;
&lt;p&gt;
Thus
&lt;code&gt;/gnu/store/000lxmn2d17bv2v6znvf6z5vi7ndy8q4-r-janeaustenr-1.0.0.drv&lt;/code&gt;
becomes
&lt;code&gt;r-janeaustenr-1.0.0&lt;/code&gt;, which is the concatenation of the package
name (which is mostly fixed over different revisions) and the package
version (which usually increases over time) with a hyphen in-between.
More often than not it is possible to guess the two components: Here they
are &lt;code&gt;r-janeaustenr&lt;/code&gt; and &lt;code&gt;1.0.0&lt;/code&gt;.
Package names often contain hyphens (as here, where a hyphen separates
the language part, &lt;code&gt;r&lt;/code&gt;, from the upstream name,
&lt;code&gt;janeaustenr&lt;/code&gt;; see the Guix
&lt;a href=&quot;https://guix.gnu.org/manual/devel/en/html_node/Package-Naming.html&quot;&gt;naming
conventions&lt;/a&gt;); this could be handled by splitting at the last hyphen,
but versions may also contain hyphens. Both can contain alphabetic and
numeric components. Thus it would be quite possible that the above
derivation is for the flourishingly named version
&lt;code&gt;janeaustenr-1.0.0&lt;/code&gt; of the &lt;code&gt;r&lt;/code&gt; package.
&lt;/p&gt;
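&lt;p&gt;
The same string surgery, together with the naive split at the last hyphen,
can be sketched in the shell (this is only an illustration; as just
explained, the split is not reliable in general):
&lt;/p&gt;

```shell
# Drop the ".drv" suffix, then the 32-character hash plus its hyphen,
# mirroring the Scheme function above (bash substring syntax).
base=$(basename /gnu/store/000lxmn2d17bv2v6znvf6z5vi7ndy8q4-r-janeaustenr-1.0.0.drv .drv)
nv=${base:33}
echo "$nv"                               # r-janeaustenr-1.0.0
# Naive split at the last hyphen; correct here, but wrong whenever the
# version itself contains a hyphen.
echo "name=${nv%-*} version=${nv##*-}"   # name=r-janeaustenr version=1.0.0
```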
&lt;p&gt;
So we need more code to extract the desired information. Luckily the data
service knows about the packages in a revision, with their names and their
versions in different fields; and also about their locations, that is,
the files in which they are defined.
&lt;/p&gt;
&lt;pre&gt;
(define (get-packages data-service commit)
  ;; Query DATA-SERVICE for the list of packages for the given COMMIT.
  (vector-&amp;gt;list
    (assoc-ref
      (guix-data-service-request data-service
        (string-append &amp;quot;revision/&amp;quot; commit &amp;quot;/packages.json&amp;quot;)
        `((field . &amp;quot;version&amp;quot;) (field . &amp;quot;location&amp;quot;) (all_results . &amp;quot;on&amp;quot;)))
      &amp;quot;packages&amp;quot;)))

(define packages (get-packages my-data-service commit))
(car packages)
$8 = ((&amp;quot;location&amp;quot; (&amp;quot;column&amp;quot; . 2) (&amp;quot;line&amp;quot; . 8273) (&amp;quot;file&amp;quot; . &amp;quot;gnu/packages/games.scm&amp;quot;)) (&amp;quot;version&amp;quot; . &amp;quot;0.27.1&amp;quot;) (&amp;quot;name&amp;quot; . &amp;quot;0ad&amp;quot;))
&lt;/pre&gt;
&lt;p&gt;
It is now enough to compare the file name with our list of locations to
extract the packages we are interested in.
&lt;/p&gt;
&lt;pre&gt;
(define (location-package? package locations)
  ;; Check whether the PACKAGE comes from the list of LOCATIONS.
  (let* ((file (assoc-ref (assoc-ref package &amp;quot;location&amp;quot;) &amp;quot;file&amp;quot;))
         (module (basename file &amp;quot;.scm&amp;quot;)))
        (member module locations)))

(use-modules (srfi srfi-26))
(define (packages-name-version data-service commit locations)
  ;; Query DATA-SERVICE for a list of packages for the given COMMIT
  ;; that come from the list of LOCATIONS. Return a list of two-element
  ;; lists with the names and versions of these packages.
  (map
    (lambda (package)
      (list (assoc-ref package &amp;quot;name&amp;quot;) (assoc-ref package &amp;quot;version&amp;quot;)))
    (filter
      (cut location-package? &amp;lt;&amp;gt; locations)
      (get-packages data-service commit))))

(define team-name-versions
  (packages-name-version my-data-service commit my-locations))
(car team-name-versions)
$9 = (&amp;quot;4ti2&amp;quot; &amp;quot;1.6.12&amp;quot;)
&lt;/pre&gt;
&lt;p&gt;
Finally we &lt;i&gt;just&lt;/i&gt; need to compare the extracted team package names
and their versions with the derivations. Unfortunately this can be
quite costly; the following code presents a somewhat
optimised solution with memory usage linear in the result, but a quadratic
number of comparisons (thanks to Liliana Prikler for suggesting the
use of &lt;code&gt;filter-map&lt;/code&gt; to me):
&lt;/p&gt;
&lt;pre&gt;
(use-modules (srfi srfi-1))
(define (special-cartesian-product X Y)
  ;; Let X and Y be lists of two element lists of the form (x z) and (y z),
  ;; respectively. Return a list of all the (x y) such that there is an
  ;; element z with (x z) in X and (y z) in Y.
  (fold cons '()
        (filter-map (lambda (xz)
                      (let ((yz (find (lambda (yz)
                                        (equal? (cadr xz) (cadr yz)))
                                      Y)))
                        (if yz
                            (list (car xz) (car yz))
                            #f)))
                    X)))

(define (team-derivations data-service commit system locations)
  ;; Query DATA-SERVICE for the list of derivations for the given COMMIT
  ;; and SYSTEM, filtered by the LOCATIONS of the packages.
  ;; To preserve the computed information, return a list of two-element
  ;; lists, each containing a derivation and the corresponding name.
  (let* ((derivations (get-derivations data-service commit system))
         (X (map
              (lambda (d)
                (list d (derivation-&amp;gt;name+version d)))
            derivations))
         (name-versions
           (packages-name-version data-service commit locations))
         (Y (map
              (lambda (nv)
                (list (car nv) (string-append (car nv) &amp;quot;-&amp;quot; (cadr nv))))
            name-versions)))
    (special-cartesian-product X Y)))

(define (sort-derivation-names derivation-names)
  ;; Just for the fun of it, sort DERIVATION-NAMES, a list of two element
  ;; lists containing derivations and their names, by names.
  (sort derivation-names
        (lambda (x y)
          (string&amp;lt;? (cadr x) (cadr y)))))

(define good-derivation-names
  (sort-derivation-names
    (team-derivations my-data-service commit &amp;quot;x86_64-linux&amp;quot; my-locations)))
(define derivation-name
        (find (lambda (dn)
                (equal? (cadr dn) &amp;quot;lrslib&amp;quot;))
              good-derivation-names))
derivation-name
$10 = (&amp;quot;/gnu/store/3pxq1g2java4f8nwfq7n98qjvhkr1b34-lrslib-7.2.drv&amp;quot; &amp;quot;lrslib&amp;quot;)
&lt;/pre&gt;
&lt;p&gt;
Strictly speaking, the function
&lt;code&gt;team-derivations&lt;/code&gt; is not correct: if there were
&lt;i&gt;simultaneously&lt;/i&gt; a derivation for the package
&lt;code&gt;r-janeaustenr&lt;/code&gt; at version &lt;code&gt;1.0.0&lt;/code&gt;
&lt;i&gt;and&lt;/i&gt; a derivation for the package
&lt;code&gt;r&lt;/code&gt; at version &lt;code&gt;janeaustenr-1.0.0&lt;/code&gt;,
then either both or none of them would match, while it is possible that
only one of the two packages is covered by the science team. This situation
has not been encountered yet; at worst, we would capture one derivation too many.

For testing purposes during the
development of the TBFG, we additionally check whether the name equals
&lt;code&gt;lrslib&lt;/code&gt;; in this way only one derivation is returned (while
at the time of writing there are more than 700 packages covered by the
science team).
Moreover the package in question is a self-contained C program (without
any inputs), which compiles rather quickly.
&lt;/p&gt;


&lt;h2&gt;Submitting builds&lt;/h2&gt;

&lt;p&gt;
Now that we have a list of derivations, we would like to submit them from
our Guile script to the build coordinator. This is not very different from
the approach seen
&lt;a href=&quot;tbfg-1.html&quot;&gt;last time&lt;/a&gt;
for submitting from the command line.
Again it is recommended to open a browser window on the
&lt;code&gt;/activity&lt;/code&gt; page of the BFFE to see the build coordinator and
the agent in action.
&lt;/p&gt;
&lt;pre&gt;
(use-modules (guix-build-coordinator client-communication))

(define my-build-coordinator &amp;quot;http://localhost:8746&amp;quot;)
(define ignore-if-build-for-derivation-exists? #f)
(define ignore-if-build-for-outputs-exists? #f)
(define ensure-all-related-derivation-outputs-have-builds? #f)
(define priority 0)

(define (submit-build build-coordinator data-service derivation tags)
  ;; Given a DERIVATION (as a string), submit it to BUILD-COORDINATOR
  ;; together with TAGS;
  ;; DATA-SERVICE is passed through and used by the build coordinator to
  ;; obtain the derivation file and further references contained in
  ;; DERIVATION.
  (send-submit-build-request
    build-coordinator derivation (list data-service) 0 priority
    ignore-if-build-for-derivation-exists?
    ignore-if-build-for-outputs-exists?
    ensure-all-related-derivation-outputs-have-builds?
    tags))

(submit-build my-build-coordinator my-data-service (car derivation-name) '())
$11 = ((&amp;quot;build-submitted&amp;quot; . &amp;quot;8f8f1cad-fe9c-462c-bc59-3d1f87abf942&amp;quot;))
$12 = #&amp;lt;&amp;lt;response&amp;gt; …
&lt;/pre&gt;
&lt;p&gt;
The global variables, which we pass on to the &lt;code&gt;submit-build&lt;/code&gt;
function, determine the behaviour of the build coordinator.
If &lt;code&gt;ignore-if-build-for-derivation-exists?&lt;/code&gt; is true,
then the build will not be carried out a second time if it was already tried
(successfully or not) by the build coordinator before.
In production, it will thus be preferable to set it to &lt;code&gt;#t&lt;/code&gt;;
while still experimenting, we are likely to submit the same derivation
several times. Setting the value to &lt;code&gt;#f&lt;/code&gt; would also make sense
to check that rebuilding the same package works.
The variable &lt;code&gt;ignore-if-build-for-outputs-exists?&lt;/code&gt; goes a bit
further; if set to &lt;code&gt;#t&lt;/code&gt;, then the build will not be carried out
if a different derivation with the same output was already tried (a very
technical distinction; I would recommend leaving it at &lt;code&gt;#f&lt;/code&gt;).
If &lt;code&gt;ensure-all-related-derivation-outputs-have-builds?&lt;/code&gt; is
&lt;code&gt;#t&lt;/code&gt;,
then the build coordinator will recursively submit builds for all the
derivations required as inputs to a given derivation. While this sounds
reasonable at first, it can go very far, since the coordinator does not
look at the store, but at the builds it has handled itself and recorded
in its database. This means that the first build submission, when the
database is still empty, will entail a complete bootstrap of the Guix
distribution. So I would recommend leaving this one at &lt;code&gt;#f&lt;/code&gt; as well.
The build then works as follows: The coordinator sends the derivation
to an agent. The agent tries to download all required inputs from a
substitute server and if successful, will build only the derivation it is
asked to build. Otherwise, it reports back to the coordinator that it has
encountered a set-up failure, together with a list of missing inputs.
This triggers a hook in the coordinator, and the default hook is to add
the missing inputs to the list of outstanding builds, as well as the
failed build itself to try it again once the inputs are available.
In this way, even if
&lt;code&gt;ensure-all-related-derivation-outputs-have-builds?&lt;/code&gt; is
&lt;code&gt;#f&lt;/code&gt;, all really missing inputs will be built recursively,
until the build succeeds or a real failure in one of its inputs is
encountered.
&lt;/p&gt;
&lt;p&gt;
The submission immediately returns
&lt;a href=&quot;https://www.gnu.org/software/guile/manual/guile.html#Multiple-Values&quot;&gt;two
values&lt;/a&gt;,
without waiting for the package build to finish. The first return value
can be used to link the submitted derivation to the shown UUID of the
build, which is a key in the build coordinator database. The second
return value is the HTTP response, which we will ignore from now on.
&lt;/p&gt;
&lt;p&gt;
Tags can be added in a parenthesis-rich format; the parameter is a list
of tags, where each tag is a two-element list (not a pair!) whose
elements are both pairs: the first pairs the keyword &lt;code&gt;key&lt;/code&gt;
with a value, the second pairs the keyword &lt;code&gt;value&lt;/code&gt; with a
value (the values are used to construct the URL and can be strings or
numbers). So the following would work:
&lt;/p&gt;
&lt;pre&gt;
(define tags `(((key . &amp;quot;commit&amp;quot;)(value . ,commit))
               ((key . &amp;quot;name&amp;quot;)(value . ,(cadr derivation-name)))
               ((key . &amp;quot;build&amp;quot;)(value . 2))))
(submit-build my-build-coordinator my-data-service (car derivation-name) tags)
$13 = ((&amp;quot;build-submitted&amp;quot; . &amp;quot;82a56cac-1e93-4b4a-926f-d8762f919219&amp;quot;))
$14 = #&amp;lt;&amp;lt;response&amp;gt; …
&lt;/pre&gt;
&lt;p&gt;
The tags are shown in the activity window and are also recorded in the
build coordinator database; as shown here, they can encode arbitrary
additional information about a build, such as the commit it comes from, the
package name or the submission count for a given derivation.
&lt;/p&gt;


&lt;h2&gt;Code&lt;/h2&gt;

&lt;p&gt;
For ease of use, the code developed in this post is made available, under
GPLv3 or later, in a dedicated
&lt;a href=&quot;https://codeberg.org/enge/tbfg&quot;&gt;git repository&lt;/a&gt;
on
&lt;a href=&quot;https://codeberg.org/&quot;&gt;Codeberg&lt;/a&gt;.
More precisely, it is collected in the file
&lt;a href=&quot;https://codeberg.org/enge/tbfg/src/commit/51eb5c6d45c66d15b7c14340ec3af0732b5b66fd/tbfg.scm&quot;&gt;tbfg.scm&lt;/a&gt;
at commit 51eb5c6d45c66d15b7c14340ec3af0732b5b66fd.
&lt;/p&gt;


&lt;h2&gt;Outlook&lt;/h2&gt;

&lt;p&gt;
We have queried the data service and used the resulting information on
packages and derivations to submit build jobs to the build coordinator.
But so far we have no programmatic access to the build results; we have
only seen the builds flicker by on the BFFE website.
It would be nice to record success or failure, and more generally to keep
track of the builds; this will be our next step.
Since we do not want to operate a substitute server, but merely to follow
the state of the packages under the responsibility of the science team, we
are, unlike the official build farms, not necessarily interested in
obtaining the build results. These are sent from the build agents to the build coordinator;
on the bordeaux build farm the
&lt;a href=&quot;https://codeberg.org/guix/nar-herder&quot;&gt;nar herder&lt;/a&gt;
shovels them to a separate substitute server.
For us everything is on the same machine, which will thus contain
successfully built packages in its store (at least until the next
&lt;code&gt;guix gc&lt;/code&gt; run). If desired, these could be made available using
&lt;a href=&quot;https://guix.gnu.org/manual/devel/en/html_node/Invoking-guix-publish.html&quot;&gt;&lt;code&gt;guix
publish&lt;/code&gt;&lt;/a&gt;.
&lt;/p&gt;

&lt;/div&gt;</content></entry><entry><title>Tiny Build Farm for Guix, part 1</title><id>https://enge.math.u-bordeaux.fr/blog/tbfg-1.html</id><author><name>Andreas Enge</name><email>andreas.enge@inria.fr</email></author><updated>2025-08-27T00:00:00Z</updated><link href="https://enge.math.u-bordeaux.fr/blog/tbfg-1.html" rel="alternate" /><content type="html">&lt;div&gt;

&lt;h1&gt;Setting up scores of services&lt;/h1&gt;

&lt;p&gt;
One of the oft-cited reasons people give for not switching to
&lt;a href=&quot;https://guix.gnu.org/&quot;&gt;Guix&lt;/a&gt; is that their favourite software
is too outdated, and a look at
&lt;a href=&quot;https://repology.org/&quot;&gt;Repology&lt;/a&gt; shows that they are not wrong.
Now the number of active committers in the Guix project is amazingly small,
and even counting all contributors I am impressed by what these few people
actually achieve. Nevertheless I wondered how I could improve the situation
at least a little bit for packages I am interested in, that is, for the
science team; and the first step is to get an account of what actually
builds and what does not.
So I decided to set up my own little build farm, limited to the packages
in the scope of the science team, using the same technology that powers the
&lt;a href=&quot;https://bordeaux.guix.gnu.org/&quot;&gt;bordeaux&lt;/a&gt; build farm.
I call it the &lt;i&gt;Tiny Build Farm for Guix&lt;/i&gt;, or &lt;i&gt;TBFG&lt;/i&gt; for short,
and this post is the first one in (hopefully) a series of blog posts
about the topic; at the time of starting this series, the TBFG does not
actually exist yet, so wish me luck.
&lt;/p&gt;


&lt;h2&gt;Motivation&lt;/h2&gt;

&lt;p&gt;
Before trying to solve a technical problem, let me digress a little bit:
Is there actually a problem? And if yes, why?
As has become my conviction over the years, the really difficult and major
problems in a project such as Guix are actually social and not technical.
They are rooted in the structure of Guix as a loosely coupled group of
volunteers who work on a common goal, mostly in their spare time; but when
I speak about a common goal, every volunteer has in fact their own goals,
and arriving at a coherent whole is partially due to the internal
structuring of the social project, and partially an emergent property
of a complex system.
Concretely, it happens often that contributors propose a package for a
software project they like; if it concerns free software and follows our
&lt;a href=&quot;https://guix.gnu.org/manual/devel/en/html_node/Packaging-Guidelines.html&quot;&gt;packaging
guidelines&lt;/a&gt;, it usually ends up being committed to the software
distribution. The original contributor may leave, the package may bitrot
and stop being buildable due to changes made in other parts of the
distribution. Sometimes people submit a bug report, sometimes committers
without interest in the actual software provide a fix, but maybe nobody
uses the software anymore, and it happens that over several years nobody
notices it is broken. This is problematic since even broken packages use
resources on the build farms. And when introducing, say, an update to a
library package, it becomes difficult to say whether the failure of a
dependency is a new phenomenon due to the change or whether it was already
present. So there are good reasons to strive for a distribution that is
100% buildable at all times.
And while I alone certainly cannot reach this for the currently more than
&lt;a href=&quot;https://repology.org/repository/gnuguix&quot;&gt;28000 packages&lt;/a&gt; in
Guix, doing it only for the science team, or maybe only the
&lt;code&gt;algebra&lt;/code&gt; and &lt;code&gt;maths&lt;/code&gt; modules, in which I am
particularly interested, appears to be a reachable goal.
&lt;/p&gt;
&lt;p&gt;
But is a tiny build farm really needed? The honest answer is “no”, since
the information is already out there in the big build farms.
For historical reasons, Guix has two of them.
One is
&lt;a href=&quot;https://ci.guix.gnu.org/&quot;&gt;CI&lt;/a&gt;, also called &lt;i&gt;berlin&lt;/i&gt; for
the location of most of its build machines; it relies on
&lt;a href=&quot;https://codeberg.org/guix/cuirass&quot;&gt;Cuirass&lt;/a&gt;,
a continuous integration system written purposefully for Guix.
It shows the
&lt;a href=&quot;https://ci.guix.gnu.org/jobset/master&quot;&gt;state of the master
branch&lt;/a&gt; and provides a dashboard from which the desired information
could certainly be extracted automatically.
On the other hand there is the
&lt;a href=&quot;https://bordeaux.guix.gnu.org/&quot;&gt;bordeaux&lt;/a&gt; build farm,
named after the location of its head node; it runs a suite of continuous
integration tools written purposefully for Guix by
&lt;a href=&quot;https://www.cbaines.net/&quot;&gt;Christopher Baines&lt;/a&gt;.
One of its parts is the
&lt;a href=&quot;https://data.guix.gnu.org/&quot;&gt;Guix Data Service&lt;/a&gt;,
and its REST API with a JSON frontend makes it again possible to extract
the information I am interested in – a system that knows about &lt;i&gt;all&lt;/i&gt;
packages in Guix by definition knows about the packages in the realm
of the science team.
&lt;/p&gt;
&lt;p&gt;
So setting up my TBFG is mainly an educational project – I would like to
learn how our technology works. But the TBFG can also be used to obtain
information about a collection of packages that is not part of Guix proper,
or to look at the impact of local changes.
Many people and projects have successfully set up Cuirass to manage their
local package collection; this is, for instance, the case for the
&lt;a href=&quot;https://codeberg.org/guix-science/guix-science&quot;&gt;Guix Science&lt;/a&gt;
and
&lt;a href=&quot;https://hpc.guix.info/&quot;&gt;Guix HPC&lt;/a&gt; projects through the
&lt;a href=&quot;https://guix.bordeaux.inria.fr/&quot;&gt;build server&lt;/a&gt;
at INRIA Bordeaux.
The software behind the bordeaux build farm is more complex and consists
of several interconnected
&lt;a href=&quot;https://www.gnu.org/software/shepherd/&quot;&gt;Shepherd&lt;/a&gt; services,
so that it may in fact be less suited for a personal project.
However, not least because I host part of that build farm at home,
I am more interested in understanding this software stack, which is also
less documented. So I am going to build the TBFG on top of this technology.
Before diving in, I take the opportunity to thank Christopher Baines for
his precious help during a Guix/Nix hackers' meeting; without him, I would
not have been able to launch myself into this endeavour.
&lt;/p&gt;


&lt;h2&gt;Build coordinator and agent&lt;/h2&gt;

&lt;p&gt;
As a first step, we need to install the Guix Build Coordinator and one or
more build agents. For a really tiny TBFG, I will keep everything on only
one machine set aside for the purpose; it is called &lt;i&gt;bedok&lt;/i&gt; and is
one of the &lt;a href=&quot;https://foundation.guix.info/assets/index.html&quot;&gt;Lenovo
Thinkpad X1 Gen9&lt;/a&gt; machines with a four-core
11th Gen Intel Core i7-1165G7 processor running at 2.80GHz graciously
donated by &lt;a href=&quot;https://www.tweag.io/&quot;&gt;Tweag&lt;/a&gt;.
For the build coordinator, this is trivial; simply add the two lines
&lt;/p&gt;
&lt;pre&gt;
(service guix-build-coordinator-service-type
  (guix-build-coordinator-configuration))
&lt;/pre&gt;
&lt;p&gt;
to the services configuration of the Guix system declaration and reconfigure
the machine. For a start, the default configuration options explained in
more detail in the
&lt;a href=&quot;https://guix.gnu.org/manual/devel/en/html_node/Guix-Services.html#index-guix_002dbuild_002dcoordinator_002dservice_002dtype&quot;&gt;manual&lt;/a&gt;
are appropriate (things will become more complicated if anything is to be
done with the build results, which would require setting the
&lt;code&gt;hooks&lt;/code&gt; field of the
&lt;a href=&quot;https://guix.gnu.org/manual/devel/en/html_node/Guix-Services.html#index-guix_002dbuild_002dcoordinator_002dservice_002dtype&quot;&gt;configuration
record&lt;/a&gt;).
See also the documentation in the
&lt;a href=&quot;https://codeberg.org/guix/build-coordinator&quot;&gt;git repository&lt;/a&gt;
of the project.
&lt;/p&gt;
&lt;p&gt;
Next we need the &lt;code&gt;guix-build-coordinator&lt;/code&gt; package, which provides
a command line interface to the coordinator. It could be installed into an
arbitrary profile since the security model of the build coordinator is very
basic: It is assumed that the coordinator runs on a server of its own,
and anybody with access to the machine has control over the service.
&lt;/p&gt;
&lt;p&gt;
⚠ However, installing the package into a user profile currently has a big
drawback: it propagates the guix package, and the corresponding guix takes
precedence over the one in
&lt;code&gt;$HOME/.config/guix/current/bin&lt;/code&gt;.
The latter is updated by &lt;code&gt;guix pull&lt;/code&gt;, but the former is not;
so without special precautions, this prevents the user from updating Guix.
Instead one can start a Guix shell as follows:
&lt;/p&gt;
&lt;pre&gt;
guix shell guix-build-coordinator
&lt;/pre&gt;
&lt;p&gt;
Running
&lt;/p&gt;
&lt;pre&gt;
$ guix-build-coordinator agent list
&lt;/pre&gt;
&lt;p&gt;
shows nothing, so the next step is to set up a build agent, which requires
some preparation on the machine where the build coordinator is running.
Executing
&lt;/p&gt;
&lt;pre&gt;
$ guix-build-coordinator agent new
e092df28-3418-4f94-b7f0-a214b03291ee
&lt;/pre&gt;
&lt;p&gt;
creates and prints a new random version 4
&lt;a href=&quot;https://en.wikipedia.org/wiki/Universally_unique_identifier&quot;&gt;UUID&lt;/a&gt;
and stores it in the agents table in its internal database.
The next step is to set up authentication; for a small number of build
agents, a
&lt;a href=&quot;https://guix.gnu.org/manual/devel/en/html_node/Guix-Services.html#index-guix_002dbuild_002dcoordinator_002dagent_002dpassword_002dfile_002dauth&quot;&gt;password
file&lt;/a&gt; is a suitable approach, otherwise an
&lt;a href=&quot;https://guix.gnu.org/manual/devel/en/html_node/Guix-Services.html#index-guix_002dbuild_002dcoordinator_002dagent_002ddynamic_002dauth&quot;&gt;authentication
token&lt;/a&gt;,
which may be shared among several agents, can be used.
So we create a password for the agent by running
&lt;/p&gt;
&lt;pre&gt;
$ guix-build-coordinator agent e092df28-3418-4f94-b7f0-a214b03291ee password new
new password: IUwGrsEklfu0kVf_QquCOdzfa6-P52qPlcwBd5YB
&lt;/pre&gt;
&lt;p&gt;
which again prints the password and saves it into the coordinator
database.
&lt;/p&gt;
&lt;p&gt;
To simplify things for the human brain, we can give the agent a name; since
there is not yet a command line argument for this, we take it as a pretext
to look more closely at the SQLite database structure. So install the
&lt;code&gt;sqlite&lt;/code&gt; package into the &lt;code&gt;root&lt;/code&gt; profile, launch
&lt;code&gt;sqlite3&lt;/code&gt; on the coordinator database, and run the following
commands:
&lt;/p&gt;
&lt;pre&gt;
# sqlite3 /var/lib/guix-build-coordinator/guix_build_coordinator.db
sqlite&amp;gt; .tables
agent_passwords
agent_tags
agents
…
builds
…
tags
…
sqlite&amp;gt; select * from agent_passwords;
1|e092df28-3418-4f94-b7f0-a214b03291ee|IUwGrsEklfu0kVf_QquCOdzfa6-P52qPlcwBd5YB|2025-08-13 00:00:00
sqlite&amp;gt; .schema agents
CREATE TABLE agents (
       id TEXT PRIMARY KEY,
       description TEXT
, name TEXT, active NOT NULL DEFAULT 1);
&lt;/pre&gt;
&lt;p&gt;
There are quite a few tables, but for now we are mainly interested in
those related to the agents. We can recover the password in case we forgot
to write it down (alternatively we could run
&lt;code&gt;guix-build-coordinator agent e092df28-3418-4f94-b7f0-a214b03291ee password&lt;/code&gt;),
and we see that agents can have a name and a description, and be active
or not.
So still in SQLite run
&lt;/p&gt;
&lt;pre&gt;
sqlite&amp;gt; update agents set name='bedok', description='TBFG agent' where id='e092df28-3418-4f94-b7f0-a214b03291ee';
sqlite&amp;gt; select * from agents;
e092df28-3418-4f94-b7f0-a214b03291ee|TBFG agent|bedok|1
&lt;/pre&gt;
&lt;p&gt;
Alternatively, to check that everything has gone well, run (not necessarily
as root anymore):
&lt;/p&gt;
&lt;pre&gt;
$ guix-build-coordinator agent list
e092df28-3418-4f94-b7f0-a214b03291ee: bedok
  description:
  TBFG agent  active?: true
  0 allocated builds:
  requested systems:
  tags:
&lt;/pre&gt;
&lt;p&gt;
Now it is finally time to really set up the build agent! On the machine
where it is supposed to run (in our case, this is &lt;i&gt;bedok&lt;/i&gt; again, but
it could be an arbitrary machine somewhere on the Internet, since all
communication would take place over https), we need to handle the password.
As usual in Guix, secrets are not saved in configuration that is publicly
visible in the store, but rather as separate state; so as &lt;code&gt;root&lt;/code&gt;
create a file &lt;code&gt;/etc/guix-build-coordinator/agent-bedok-passwd&lt;/code&gt;
containing the password
&lt;code&gt;IUwGrsEklfu0kVf_QquCOdzfa6-P52qPlcwBd5YB&lt;/code&gt;
created above by the coordinator.
Then add the following snippet to the server part of the operating system
configuration:
&lt;/p&gt;
&lt;pre&gt;
(service guix-build-coordinator-agent-service-type
  (guix-build-coordinator-agent-configuration
    (authentication
      (guix-build-coordinator-agent-password-file-auth
        (uuid &amp;quot;e092df28-3418-4f94-b7f0-a214b03291ee&amp;quot;)
        (password-file
          &amp;quot;/etc/guix-build-coordinator/agent-bedok-passwd&amp;quot;)))
    (derivation-substitute-urls
      '(&amp;quot;https://data.guix.gnu.org&amp;quot;))
    (non-derivation-substitute-urls
      '(&amp;quot;https://bordeaux.guix.gnu.org&amp;quot;))
    (systems '(&amp;quot;x86_64-linux&amp;quot; &amp;quot;i686-linux&amp;quot;))
    (max-parallel-builds 4)
    (max-parallel-uploads 2)
    (max-1min-load-average 6)))
&lt;/pre&gt;
&lt;p&gt;
and reconfigure the machine.
Concerning the different parameters, see the
&lt;a href=&quot;https://guix.gnu.org/manual/devel/en/html_node/Guix-Services.html#index-guix_002dbuild_002dcoordinator_002dagent_002dservice_002dtype&quot;&gt;documentation&lt;/a&gt;.
For authentication, we need to provide our &lt;code&gt;uuid&lt;/code&gt; and the
location of the &lt;code&gt;password-file&lt;/code&gt;.
Since the agent runs on the same machine as the coordinator, I kept the
&lt;code&gt;coordinator&lt;/code&gt; field at its default
&lt;code&gt;&amp;quot;http://localhost:8745&amp;quot;&lt;/code&gt;; otherwise &lt;code&gt;localhost&lt;/code&gt; needs
to be replaced by the host name of the coordinator, and the protocol should
be set to &lt;code&gt;https&lt;/code&gt;.
The &lt;code&gt;derivation-substitute-urls&lt;/code&gt; field has no default; we will
discuss it in the next section.
The &lt;code&gt;non-derivation-substitute-urls&lt;/code&gt; field also needs to be set
to avoid compiling each and every package input locally; here I chose to
only use the bordeaux build farm, but one could add
&lt;code&gt;&amp;quot;https://ci.guix.gnu.org&amp;quot;&lt;/code&gt; to also fetch packages from berlin.
If the &lt;code&gt;systems&lt;/code&gt; field is not set, then only packages for the
system on which the agent is running (most likely &lt;code&gt;x86_64-linux&lt;/code&gt;)
are handled; here it is useful to add &lt;code&gt;i686-linux&lt;/code&gt;.
Or on an ARM machine, the combo
&lt;code&gt;'(&amp;quot;aarch64-linux&amp;quot; &amp;quot;armhf-linux&amp;quot;)&lt;/code&gt; makes sense.
The numerical parameters can be left at their defaults; here I am trying
to limit the load on my four core processor.
&lt;/p&gt;
&lt;p&gt;
If all goes well, we should see lines in the agent logfile
&lt;code&gt;/var/log/guix-build-coordinator/agent.log&lt;/code&gt;
looking like
&lt;/p&gt;
&lt;pre&gt;
2025-08-13 00:00:00 (INFO ): starting agent e092df28-3418-4f94-b7f0-a214b03291ee
2025-08-13 00:00:00 (INFO ): connecting to coordinator http://localhost:8745
2025-08-13 00:00:00 (INFO ): running 0 threads, currently allocated 0 builds
2025-08-13 00:00:00 (INFO ): starting 0 new builds
&lt;/pre&gt;
&lt;p&gt;
and running
&lt;/p&gt;
&lt;pre&gt;
$ guix-build-coordinator agent list
&lt;/pre&gt;
&lt;p&gt;
on the coordinator machine again should now print two entries
beneath &lt;code&gt;requested systems&lt;/code&gt;.
&lt;/p&gt;
&lt;p&gt;
As a last step, return to the
&lt;code&gt;/etc/guix-build-coordinator/agent-bedok-passwd&lt;/code&gt; file,
which is probably world-readable. Starting the build agent has created
a user &lt;code&gt;guix-build-coordinator-agent&lt;/code&gt;; I would recommend
making the file owned by that user and removing permissions for all
other users.
&lt;/p&gt;
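&lt;p&gt;
Concretely, this amounts to a &lt;code&gt;chown&lt;/code&gt; and a
&lt;code&gt;chmod&lt;/code&gt;; the following sketch demonstrates the permission
bits on a throw-away file, since the real commands need to be run as
&lt;code&gt;root&lt;/code&gt; on the password file itself:
&lt;/p&gt;

```shell
# On the real system, as root:
#   chown guix-build-coordinator-agent /etc/guix-build-coordinator/agent-bedok-passwd
#   chmod 600 /etc/guix-build-coordinator/agent-bedok-passwd
# Demonstration on a temporary file:
f=$(mktemp)
chmod 600 "$f"        # read and write for the owner, nothing for others
stat -c '%a' "$f"     # prints 600
rm "$f"
```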


&lt;h2&gt;Data service&lt;/h2&gt;

&lt;p&gt;
This is the elephant in the room, almost literally. When starting my project
of the TBFG, I had intended to set up the full stack of software needed
to run the bordeaux build farm, and the
&lt;a href=&quot;https://codeberg.org/guix/data-service/&quot;&gt;Guix Data Service&lt;/a&gt;
is a very important part of it. But the code base is massive (more than
30000 lines of Scheme code at the time of writing), and the required
resources are massive as well: The service spends its time polling the git
repository of Guix, compiling the sources and computing all the derivations
for all the packages, which are then stored in a database. Rinse and repeat
for the next commit. (In reality, the data service does even more, in
particular it also queries build servers and stores information about
builds; but the above functionality is everything we need for the TBFG.)
As can be seen, not even the
&lt;a href=&quot;https://data.guix.gnu.org/repository/1/branch/master&quot;&gt;official
data service&lt;/a&gt;, running continuously on a powerful server, manages to
do that for all commits: Some of them are marked green, while others, the
grey ones, are skipped, in particular at times of high commit activity.
&lt;/p&gt;
&lt;p&gt;
So I have decided to rely on the central Guix data service by setting
the corresponding &lt;code&gt;derivation-substitute-urls&lt;/code&gt; field of the
build coordinator to &lt;code&gt;'(&amp;quot;https://data.guix.gnu.org&amp;quot;)&lt;/code&gt;.
All information obtained by clicking through the data service website
can also be obtained as JSON through its REST API, which we will use in our
scripts to determine the derivations of science team packages to be built.
&lt;/p&gt;
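&lt;p&gt;
As a hedged sketch of what such a script might do: the host below is
real, but the endpoint path is an illustrative assumption to be checked
against the data service itself, and the commit hash is made up.
&lt;/p&gt;

```shell
# Hedged sketch: assemble a JSON request against the Guix Data Service.
# The /revision/<commit>/packages path and the Accept header are
# assumptions; the commit hash is a placeholder.
base=https://data.guix.gnu.org
commit=0123456789abcdef0123456789abcdef01234567   # hypothetical commit
url="$base/revision/$commit/packages"
echo "$url"
# curl -s -H 'Accept: application/json' "$url" | jq .   # needs network
```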
&lt;p&gt;
For this to work, we also need the Guix daemon to accept the signing key
of the data service; so the &lt;code&gt;services&lt;/code&gt; field of the operating
system declaration should look like this:
&lt;/p&gt;
&lt;pre&gt;
(services
  (append
    (modify-services %base-services
      (guix-service-type config =&amp;gt;
        (guix-configuration
          (substitute-urls '(&amp;quot;https://bordeaux.guix.gnu.org&amp;quot;))
          (authorized-keys
            (list
              (local-file &amp;quot;keys/guix/bordeaux.guix.gnu.org-export.pub&amp;quot;)
              (local-file &amp;quot;keys/guix/data.guix.gnu.org.pub&amp;quot;)))
          (max-silent-time (* 24 3600))
          (timeout (* 48 3600)))))
    (list
      (service guix-build-coordinator-service-type
        (guix-build-coordinator-configuration))
      …)))
&lt;/pre&gt;
&lt;p&gt;
where the key files are copy-pasted into the local
&lt;code&gt;keys/guix&lt;/code&gt; subdirectory from the
&lt;a href=&quot;https://codeberg.org/guix/maintenance/src/commit/master/hydra/keys/guix&quot;&gt;corresponding
place&lt;/a&gt;
in the
&lt;a href=&quot;https://codeberg.org/guix/maintenance&quot;&gt;guix/maintenance&lt;/a&gt;
git repository.
&lt;/p&gt;


&lt;h2&gt;BFFE – Build Farm Front End&lt;/h2&gt;

&lt;p&gt;
We could start submitting build jobs now, but to visualise what is
happening, we need another service, &lt;code&gt;bffe&lt;/code&gt;. The
&lt;i&gt;build farm frontend&lt;/i&gt; actually serves two purposes:
On the one hand, on the bordeaux build farm it submits build jobs for the
master branch to ensure continuous substitute availability (while a
different service, &lt;code&gt;qa-frontpage&lt;/code&gt;, submits build jobs for
testing branches and pull requests to the same build coordinator instance).
On the other hand, it provides a web server that connects to the
build coordinator and shows information about its status.
We will only need the second functionality. For this, add the following
snippet to the service configuration of the TBFG machine:
&lt;/p&gt;
&lt;pre&gt;
(service bffe-service-type
  (bffe-configuration
    (arguments
      #~(list
        #:web-server-args
          '(#:event-source &amp;quot;http://localhost:8746&amp;quot;
            #:controller-args (#:title &amp;quot;Science team build farm&amp;quot;))))))
&lt;/pre&gt;
&lt;p&gt;
We use the &lt;code&gt;#:web-server-args&lt;/code&gt; argument and provide as
&lt;code&gt;event-source&lt;/code&gt; the local build coordinator instance, which
communicates with clients on port 8746 (while communication with the
agents runs on port 8745 as seen above).
For more details on the optional &lt;code&gt;#:build&lt;/code&gt; argument, see
the
&lt;a href=&quot;https://guix.gnu.org/manual/devel/en/html_node/Guix-Services.html#index-bffe_002dservice_002dtype&quot;&gt;documentation&lt;/a&gt;
in the Guix manual or the
&lt;a href=&quot;https://codeberg.org/guix/bffe&quot;&gt;source code&lt;/a&gt;.
&lt;/p&gt;
&lt;p&gt;
Reconfigure the system and open the BFFE website, either at
&lt;a href=&quot;http://localhost:8767&quot;&gt;&lt;code&gt;http://localhost:8767&lt;/code&gt;&lt;/a&gt;
locally on the TBFG machine, or at
&lt;a href=&quot;http://192.168.1.80:8767&quot;&gt;&lt;code&gt;http://192.168.1.80:8767&lt;/code&gt;&lt;/a&gt;
from another machine in your local network, where you have to adapt the
IP address to your situation.
The result is a rather empty web page, but it should at least show the title
we have chosen.
For our purposes, we are interested in the page obtained by appending
&lt;a href=&quot;http://192.168.1.80:8767/activity&quot;&gt;&lt;code&gt;/activity&lt;/code&gt;&lt;/a&gt;
to the URL.
This shows a box &lt;i&gt;Recent activity&lt;/i&gt;, which is rightly empty;
and a list of agents per architecture with their current occupation,
which should also be empty, except possibly for the percentage giving the
CPU load in case the agent has other business as well.
Clicking on the name of an agent leads to yet another page with
more details (among them the description we provided earlier).
&lt;/p&gt;


&lt;h2&gt;Submitting a build&lt;/h2&gt;

&lt;p&gt;
After all these preparations, we can finally submit our first build job!
For this, keep the &lt;code&gt;/activity&lt;/code&gt; page open, and run
&lt;/p&gt;
&lt;pre&gt;
$ DRV=$(guix build hello --derivations)
$ guix-build-coordinator build --derivation-substitute-urls=https://data.guix.gnu.org $DRV
build submitted as 7bdd2249-1214-431a-aa61-de4d907f1b32
&lt;/pre&gt;
&lt;p&gt;
Providing a URL from which to fetch derivations is necessary because the
build coordinator receives only the store path of the derivation, not
the actual file; the same then holds recursively for the inputs referenced
by the submitted derivation. This could be a global parameter of the
&lt;code&gt;guix-build-coordinator-configuration&lt;/code&gt;, but currently it needs
to be specified by hand and can thus vary over time or from build to
build.
&lt;/p&gt;
&lt;p&gt;
If all goes well, a UUID for the build is printed in the terminal, and
three lines appear on the BFFE web page in the &lt;i&gt;Recent activity&lt;/i&gt; box:
&lt;i&gt;Build submitted&lt;/i&gt;, &lt;i&gt;Build started&lt;/i&gt; and &lt;i&gt;Build succeeded&lt;/i&gt;.
The &lt;code&gt;/var/log/guix-build-coordinator/coordinator.log&lt;/code&gt; and
&lt;code&gt;/var/log/guix-build-coordinator/agent.log&lt;/code&gt; files should also
contain matching information.
And you can go to the
&lt;code&gt;http://192.168.1.80:8767/build/7bdd2249-1214-431a-aa61-de4d907f1b32&lt;/code&gt;
URL to see more detailed information on the build, with potentially a link
to the build log.
&lt;/p&gt;
&lt;p&gt;
This link unfortunately does not work out of the box, and some more
configuration is required to make the build logs available.
&lt;/p&gt;


&lt;h2&gt;Nginx for the build logs&lt;/h2&gt;

&lt;p&gt;
One possibility for accessing the build logs is by directly looking them
up in the place where they are stored by the build coordinator, the
directory &lt;code&gt;/var/lib/guix-build-coordinator/build-logs/&lt;/code&gt;.
Each build corresponds to a subdirectory named after its UUID and containing
the file &lt;code&gt;log.gz&lt;/code&gt;.
&lt;/p&gt;
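&lt;p&gt;
Reading such a log then boils down to decompressing it. The following
sketch simulates the directory layout under a temporary root, since the
real /var/lib/guix-build-coordinator/build-logs/ only exists on the
coordinator machine:
&lt;/p&gt;

```shell
# Simulated layout: one subdirectory per build UUID, containing log.gz.
root=$(mktemp -d)   # stand-in for /var/lib/guix-build-coordinator/build-logs
uuid=7bdd2249-1214-431a-aa61-de4d907f1b32   # build UUID from above
mkdir -p "$root/$uuid"
printf 'starting build...\n' | gzip > "$root/$uuid/log.gz"
zcat "$root/$uuid/log.gz"   # prints: starting build...
rm -r "$root"
```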
&lt;p&gt;
Alternatively, we can imitate the
&lt;a href=&quot;https://codeberg.org/guix/maintenance/src/commit/bc7f188a027313437f0afb977bfe802d307d8dd3/hydra/bayfront.scm#L902-L918&quot;&gt;behaviour&lt;/a&gt;
of the bordeaux build farm and set up a separate
&lt;a href=&quot;https://guix.gnu.org/manual/devel/en/html_node/Web-Services.html#index-nginx_002dservice_002dtype&quot;&gt;nginx&lt;/a&gt;
web server using the following service snippet:
&lt;/p&gt;
&lt;pre&gt;
(service nginx-service-type
  (nginx-configuration
    (server-blocks
      (list
        (nginx-server-configuration
          (listen '(&amp;quot;80&amp;quot; &amp;quot;[::]:80&amp;quot;))
          (locations
            (list
              (nginx-location-configuration
                (uri &amp;quot;~ \&amp;quot;\\/build\\/([a-z0-9-]{36})/log$\&amp;quot;&amp;quot;)
                (body '(&amp;quot;alias /var/lib/guix-build-coordinator/build-logs/$1/log;&amp;quot;
                        &amp;quot;add_header Content-Type 'text/plain; charset=UTF-8';&amp;quot;
                        &amp;quot;gzip_static always;&amp;quot;
                        &amp;quot;gunzip on;&amp;quot;)))
              (nginx-location-configuration
                (uri &amp;quot;/&amp;quot;)
                (body '(&amp;quot;proxy_pass http://localhost:8767;&amp;quot;
                        &amp;quot;proxy_http_version 1.1;&amp;quot;
                        &amp;quot;proxy_set_header Connection \&amp;quot;\&amp;quot;;&amp;quot;)))
              (nginx-location-configuration
                (uri &amp;quot;/events&amp;quot;)
                (body '(&amp;quot;proxy_pass http://localhost:8767;&amp;quot;
                        &amp;quot;proxy_http_version 1.1;&amp;quot;
                        &amp;quot;proxy_buffering off;&amp;quot;
                        &amp;quot;proxy_set_header Connection \&amp;quot;\&amp;quot;;&amp;quot;))))))))))
&lt;/pre&gt;
&lt;p&gt;
The first
&lt;a href=&quot;https://guix.gnu.org/manual/devel/en/html_node/Web-Services.html#index-nginx_002dlocation_002dconfiguration&quot;&gt;&lt;code&gt;nginx-location-configuration&lt;/code&gt;&lt;/a&gt;
serves the build logs, while the other two are reverse proxies towards
the pages provided by BFFE at port &lt;code&gt;8767&lt;/code&gt;.
If the activity page is now accessed through the nginx web server at the
standard port, via the URL
&lt;a href=&quot;http://192.168.1.80/activity&quot;&gt;&lt;code&gt;http://192.168.1.80/activity&lt;/code&gt;&lt;/a&gt;,
it presents links to builds and their log files, which can be clicked on,
and the uncompressed log files are shown directly in the browser.
&lt;/p&gt;
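&lt;p&gt;
The regular expression in the log location can be sanity checked outside
nginx: &lt;code&gt;grep -E&lt;/code&gt; understands the same extended syntax (the
pattern below is simplified to drop the backslash-escaped slashes, which
plain grep does not need):
&lt;/p&gt;

```shell
# The character class [a-z0-9-]{36} matches a standard 36-character UUID,
# so a build-log URL path is accepted by the pattern:
echo '/build/7bdd2249-1214-431a-aa61-de4d907f1b32/log' \
  | grep -E '/build/[a-z0-9-]{36}/log$'
```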


&lt;h2&gt;Outlook&lt;/h2&gt;

&lt;p&gt;
This was a lot of work for setting up the necessary services!
But hopefully it was also a good occasion to understand how the different
components interact.
In the
&lt;a href=&quot;tbfg-2.html&quot;&gt;next installment&lt;/a&gt;
we will start writing our own scripts to communicate with these services;
in particular we will work with the data service to retrieve information
about the packages we are interested in.
&lt;/p&gt;

&lt;/div&gt;</content></entry><entry><title>Wireguard VPN with Guix</title><id>https://enge.math.u-bordeaux.fr/blog/wireguard.html</id><author><name>Andreas Enge</name><email>andreas.enge@inria.fr</email></author><updated>2025-08-07T00:00:00Z</updated><link href="https://enge.math.u-bordeaux.fr/blog/wireguard.html" rel="alternate" /><content type="html">&lt;div&gt;

&lt;h2&gt;Needing a VPN&lt;/h2&gt;

&lt;p&gt;
Recently I changed my ISP, and the new one uses
&lt;a href=&quot;https://en.wikipedia.org/wiki/Carrier-grade_NAT&quot;&gt;Carrier-grade NAT&lt;/a&gt;,
or CGNAT, by default. While this sounds fancy and professional, it is in
fact even worse than conventional NAT: Not only do all my devices share the
same IPv4, but I share one IPv4 with several other customers!
Apparently I am only assigned a few out of the 65535 ports, and this
assignment may change from day to day, which implies that I cannot connect
from the outside to any of my home devices.
However, I do have a separate IPv4 of my own for a
&lt;a href=&quot;https://www.aquilenet.fr/services/h%C3%A9bergement-serveur/&quot;&gt;virtual
machine&lt;/a&gt;
at &lt;a href=&quot;https://www.aquilenet.fr/&quot;&gt;Aquilenet&lt;/a&gt;, and it should be
possible to use this as a trampoline to access my home through a virtual
private network.
We are already employing
&lt;a href=&quot;https://en.wikipedia.org/wiki/WireGuard&quot;&gt;WireGuard&lt;/a&gt;
for one of the
&lt;a href=&quot;https://guix.gnu.org/&quot;&gt;Guix&lt;/a&gt; build farms, so it felt like
a natural choice.
Guix provides the &lt;code&gt;wireguard-service-type&lt;/code&gt;, which is
&lt;a href=&quot;https://guix.gnu.org/manual/devel/en/html_node/VPN-Services.html&quot;&gt;documented&lt;/a&gt;
with all its options in the manual; but without an explanation of the
general concepts behind the service it is a bit difficult to set up.
The &lt;a href=&quot;https://guix.gnu.org/cookbook/en/html_node/&quot;&gt;Guix Cookbook&lt;/a&gt;
has an
&lt;a href=&quot;https://guix.gnu.org/cookbook/en/guix-cookbook.html#Connecting-to-Wireguard-VPN&quot;&gt;entry&lt;/a&gt;
on WireGuard, but it is concerned with kernel modules and connecting to an
existing WireGuard VPN, while my goal was to set one up in the first place.
This turned out to be surprisingly easy.
&lt;/p&gt;

&lt;p&gt;
The &lt;code&gt;wireguard-tools&lt;/code&gt; package comes with an executable
&lt;code&gt;wg&lt;/code&gt;, and running &lt;code&gt;wg --help&lt;/code&gt; is enough to guess
how WireGuard works; essentially we need the following two subcommands:
&lt;/p&gt;
&lt;pre&gt;
genkey: Generates a new private key and writes it to stdout
pubkey: Reads a private key from stdin and writes a public key to stdout
&lt;/pre&gt;
&lt;p&gt;
Unlike other VPNs, WireGuard appears to be more symmetric in the
sense that it does not distinguish between servers and clients; to talk to
each other, two participants just need to create a pair of public and
private keys each, and then to be made aware of each other's public key.
In our asymmetric situation in which only one of them has a public IPv4,
we will nevertheless distinguish the &lt;i&gt;server&lt;/i&gt;, which is reachable
from everywhere thanks to its IP, and the &lt;i&gt;clients&lt;/i&gt; hidden behind
the CGNAT.
&lt;/p&gt;


&lt;h2&gt;Creating key pairs&lt;/h2&gt;

&lt;p&gt;
In a first step, we create a &lt;i&gt;private&lt;/i&gt; key for the server.
In Guix, secrets are (so far) not handled through
the world-readable store, but as state directly on the machine, and
&lt;code&gt;wireguard-service-type&lt;/code&gt; expects by default the private key in
the file &lt;code&gt;/etc/wireguard/private.key&lt;/code&gt;. So we connect as root
to the server machine and execute
&lt;/p&gt;
&lt;pre&gt;
mkdir /etc/wireguard
umask 077
wg genkey &amp;gt; /etc/wireguard/private.key
&lt;/pre&gt;
&lt;p&gt;
The call to &lt;code&gt;umask&lt;/code&gt; is needed (at least with my shell settings)
to placate the WireGuard warning that the private key file is
world-readable, which would indeed defeat its purpose. The file contains a short
&lt;a href=&quot;https://en.wikipedia.org/wiki/Base64&quot;&gt;base64&lt;/a&gt;
encoded number such as
&lt;code&gt;GEhlpFGslXfo9We9jhrXham4LztmqSmpdE4ivML4qXc=&lt;/code&gt;.
Given the size (or rather lack thereof) of this number, it looks like
WireGuard uses elliptic curve cryptography with a fixed elliptic curve
and a fixed basepoint of 256 bits, so with a security level of 128 bits
(indeed, WireGuard key exchange is based on Curve25519).
&lt;/p&gt;
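&lt;p&gt;
The length of the string is consistent with that guess: base64 turns
every 3 bytes into 4 characters, so 32 random bytes (256 bits) always
encode to 44 characters, padding included, as a quick check confirms:
&lt;/p&gt;

```shell
# 32 bytes -> ceil(32/3)*4 = 44 base64 characters (including '=' padding),
# exactly the length of the keys shown above.
head -c 32 /dev/urandom | base64 | tr -d '\n' | wc -c   # prints 44
```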
&lt;p&gt;
In a second step, we need to create the corresponding public key using
the command
&lt;/p&gt;
&lt;pre&gt;
wg pubkey &amp;lt; /etc/wireguard/private.key
&lt;/pre&gt;
&lt;p&gt;
This outputs a base64 encoded number of similar size; in our example,
&lt;code&gt;AFL8UecS3GFX3hK8e6yWOK4s5RVrTpvTq2A0pdGuylQ=&lt;/code&gt;.
This public key is in fact not needed by the server, but only by the clients
wishing to connect to it (following a basic principle of asymmetric
cryptography), so we need to stow it away in a file. But since the process
of deriving the public key from a private key is deterministic, we may
actually forget the public key and recreate it when needed. And notice that
the public key is so short that it could even be exchanged on a postcard
or over the phone.
&lt;/p&gt;
&lt;p&gt;
This process of creating a key pair needs to be repeated on each client,
or more generally, each participant in the VPN. Let us assume we have one
client with private key
&lt;code&gt;0G4uhLLeY5NYmg/FobRB0p75wMrGwmmzhuoAdfX243I=&lt;/code&gt; in
&lt;code&gt;/etc/wireguard/private.key&lt;/code&gt; and corresponding public key
&lt;code&gt;BgMzEZPUGAtbSqVPRdzgdLVhAPMLaOzHe7uNFAMVLCk=&lt;/code&gt;.
&lt;/p&gt;


&lt;h2&gt;Setting up the server&lt;/h2&gt;

&lt;p&gt;
Each participant in the WireGuard network uses a private IPv4,
usually from the &lt;code&gt;10.0.0.0/8&lt;/code&gt; range.
We give &lt;code&gt;10.0.0.1&lt;/code&gt; to the server and &lt;code&gt;10.0.0.2&lt;/code&gt;
to the client, and can now follow the
&lt;a href=&quot;https://guix.gnu.org/manual/devel/en/html_node/VPN-Services.html&quot;&gt;documentation&lt;/a&gt;
(you need to scroll down a bit) of &lt;code&gt;wireguard-service-type&lt;/code&gt;
to write the corresponding block in the Guix operating system configuration
of the server:
&lt;/p&gt;
&lt;pre&gt;
(service wireguard-service-type
  (wireguard-configuration
    (addresses '(&amp;quot;10.0.0.1/32&amp;quot;))
    (peers
      (list
        (wireguard-peer
          (name &amp;quot;client&amp;quot;)
          (public-key &amp;quot;BgMzEZPUGAtbSqVPRdzgdLVhAPMLaOzHe7uNFAMVLCk=&amp;quot;)
          (allowed-ips '(&amp;quot;10.0.0.2/32&amp;quot;)))))))
&lt;/pre&gt;
&lt;p&gt;
The &lt;code&gt;addresses&lt;/code&gt; field is in fact the default;
there is also an optional &lt;code&gt;port&lt;/code&gt; field with default 51820.
The &lt;code&gt;peers&lt;/code&gt; field is a list of, well, peers in the VPN which are
allowed to connect to the server (so it is in theory possible to create
strange connection graphs); in this case, we register only one client peer
with an arbitrary name, its public key created above, and its assigned
private IP address.
That is all! Now we can &lt;code&gt;guix system reconfigure&lt;/code&gt;, and
the server is ready.
&lt;/p&gt;


&lt;h2&gt;Setting up the client&lt;/h2&gt;

&lt;p&gt;
As stated above, in principle there does not seem to be a distinction
between clients and server in WireGuard, so the operating system
declaration on the client is similar to that on the server. But I think
that, nevertheless, the network topology needs to be bootstrapped.
And in our case, the inherent distinction between the server machine which
is, say, publicly reachable on the IPv4 &lt;code&gt;198.51.100.0&lt;/code&gt; under the
name &lt;code&gt;vpn.example.org&lt;/code&gt;, and the client hidden by CGNAT needs to
be taken into account. So we need the client to punch a hole into the NAT
and to reach out to the server, which leads to the following service block:
&lt;/p&gt;
&lt;pre&gt;
(service wireguard-service-type
  (wireguard-configuration
    (addresses '(&amp;quot;10.0.0.2/32&amp;quot;))
    (peers
      (list
        (wireguard-peer
          (name &amp;quot;server&amp;quot;)
          (public-key &amp;quot;AFL8UecS3GFX3hK8e6yWOK4s5RVrTpvTq2A0pdGuylQ=&amp;quot;)
          (allowed-ips '(&amp;quot;10.0.0.1/32&amp;quot;))
          (endpoint &amp;quot;198.51.100.0:51820&amp;quot;)
          (keep-alive 60))))))
&lt;/pre&gt;
&lt;p&gt;
The first fields are symmetric to the corresponding fields on the server.
But the additional &lt;code&gt;endpoint&lt;/code&gt; and &lt;code&gt;keep-alive&lt;/code&gt; fields
tell the client to connect to the server on its public IPv4 address (and
the default port 51820) for the initial handshake establishing the session,
and to keep it alive by reconnecting every 60 seconds.
I have tried to use the host name &lt;code&gt;vpn.example.org&lt;/code&gt; instead of
the IPv4, but this ended up being resolved to an IPv6, which did not work.
So &lt;code&gt;guix system reconfigure&lt;/code&gt; the client, wait for at most one
minute, and the VPN is running!
&lt;/p&gt;


&lt;h2&gt;Looking behind the scenes&lt;/h2&gt;

&lt;p&gt;
The following is not necessary for setting up the VPN, but it may be
helpful for troubleshooting; and I was curious to see how the VPN
manifested itself.
Running &lt;code&gt;ifconfig&lt;/code&gt; as root on the client, say, shows a new
interface
&lt;/p&gt;
&lt;pre&gt;
wg0 Link encap:(hwtype unknown)
    inet addr:10.0.0.2  P-t-P:10.0.0.2  Mask:255.255.255.255
…
&lt;/pre&gt;
&lt;p&gt;
and running &lt;code&gt;wg&lt;/code&gt; (again as root) shows the information about
the VPN that we entered into the service description:
&lt;/p&gt;
&lt;pre&gt;
interface: wg0
  public key: BgMzEZPUGAtbSqVPRdzgdLVhAPMLaOzHe7uNFAMVLCk=
  private key: (hidden)
  listening port: 51820

peer: AFL8UecS3GFX3hK8e6yWOK4s5RVrTpvTq2A0pdGuylQ=
  endpoint: 198.51.100.0:51820
  allowed ips: 10.0.0.1/32
  latest handshake: 1 minute, 15 seconds ago
  transfer: 555.77 KiB received, 1.38 MiB sent
  persistent keepalive: every 1 minute
&lt;/pre&gt;


&lt;h2&gt;Finally, connecting from the outside!&lt;/h2&gt;

&lt;p&gt;
Let us not get carried away by the beauty of technology (and
cryptography), but get back to our initial concern: connecting from the
outside to the machine in the home network, which we know under the name
of &lt;i&gt;client&lt;/i&gt;.
This is now just a matter of two hops with &lt;code&gt;ssh&lt;/code&gt;:
First do an &lt;code&gt;ssh vpn.example.org&lt;/code&gt;, and once on the VPN server
machine, a second &lt;code&gt;ssh 10.0.0.2&lt;/code&gt;.
This can be automated by the following entry in &lt;code&gt;.ssh/config&lt;/code&gt;
on the machine from which we try to connect:
&lt;/p&gt;
&lt;pre&gt;
Host client
  Hostname 10.0.0.2
  ProxyJump vpn.example.org
&lt;/pre&gt;
&lt;p&gt;
so that from now on, &lt;code&gt;ssh client&lt;/code&gt; will send us into the
home network.
Voilà, problem solved!
&lt;/p&gt;
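&lt;p&gt;
The same two hops can also be made in a one-off command without touching
&lt;code&gt;.ssh/config&lt;/code&gt;, using OpenSSH's &lt;code&gt;-J&lt;/code&gt;
(&lt;code&gt;ProxyJump&lt;/code&gt;) option; and &lt;code&gt;ssh -G&lt;/code&gt; lets us check,
without connecting anywhere, how the option is interpreted:
&lt;/p&gt;

```shell
# One-off equivalent of the ProxyJump stanza (would actually connect):
#   ssh -J vpn.example.org 10.0.0.2
# ssh -G only prints the resolved client configuration, so it is safe
# to run even while the VPN is down:
ssh -G -J vpn.example.org 10.0.0.2 | grep -i '^proxyjump'
```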


&lt;h2&gt;Epilogue&lt;/h2&gt;

&lt;p&gt;
After installing my WireGuard VPN, I talked with a fellow geek from
Aquilenet, who has the same ISP, and who suggested an alternative
solution to me. I should call the hotline and pronounce the magic words
that I would like an “IP rollback” so that servers at home become
accessible. This helps to get past the barrier of the first support level.
The next level then initiates the “rollback”, which means going from
IPv6 (plus IPv4 with CGNAT) back to regular IPv4 (without IPv6). After a
few hours or days, the new, more or less static (not guaranteed to be so,
but in practice not changing) IPv4 address is established, and IPv6 is disabled
in the wireless router provided by the ISP. One can then reenable IPv6
in the router and live in the best of all worlds – with a static IPv4
address, IPv6 and a WireGuard VPN on top of it all.
&lt;/p&gt;

&lt;/div&gt;</content></entry><entry><title>Goblins for number theory, part 4</title><id>https://enge.math.u-bordeaux.fr/blog/goblins-4.html</id><author><name>Andreas Enge</name><email>andreas.enge@inria.fr</email></author><updated>2025-03-17T00:00:00Z</updated><link href="https://enge.math.u-bordeaux.fr/blog/goblins-4.html" rel="alternate" /><content type="html">&lt;div&gt;

&lt;h1&gt;Client and server Goblins&lt;/h1&gt;

&lt;p&gt;
After having
&lt;a href=&quot;goblins-1.html&quot;&gt;introduced&lt;/a&gt; the basic concepts
of Goblins, and in particular promises;
after having looked at 
&lt;a href=&quot;goblins-2.html&quot;&gt;parallelisation&lt;/a&gt; over the network;
and after an excursion to
&lt;a href=&quot;goblins-3.html&quot;&gt;persistence&lt;/a&gt;,
it is now time to get to the main topic.
We would like more flexibility: in the behaviour of the clients,
in the nature of the tasks they are handling, and in the control flow
of the server.
Clients should be able to come and go, maybe complete only one task never
to be seen again (we will not handle the case of faults, however, that is,
clients accepting a task and disappearing before completing it).
Tasks could be heterogeneous, that is, take more or less time, or,
equivalently, the clients could run on heterogeneous machines, and it would
be nice to give out a new task to a client as soon as it finishes the
previous one.
And we would like the server to be able to work in rounds; in essence,
distribute unrelated tasks corresponding to a loop, then gather the results
and start with the next loop.
&lt;/p&gt;
&lt;p&gt;
After David Thompson and Jessica Tallon had a look at my first solution,
which had performance problems I could not explain, they came up with a
much better idea, so I will present their solution without losing time
in explaining what went wrong. Suffice it to say that one should avoid
nesting &lt;code&gt;with-vat&lt;/code&gt; expressions. In my experience, doing so with
the same vat leads to a deadlock; doing so with different vats seems to
work, but causes a severe performance penalty.
&lt;/p&gt;


&lt;h2&gt;Queueing up clients, tasks and other promises&lt;/h2&gt;

&lt;p&gt;
Our current solution already keeps a list of clients at the server, to
which clients can register in the background. Instead of waiting until
a fixed number of clients have arrived, we should be more dynamic and
implement the server as follows. As long as there are tasks to be
submitted and the client list is not empty, the server removes a client
from the client list and submits a task to this client. If the client list
is empty while there are still unsubmitted tasks, the server waits until a
new client registers. So far, this scheme uses each client for exactly one
task, and works if more clients register than there are tasks.
The trick is to let the client do its computation, and at the end
register itself again with the server as being available for the next task.
My first solution used the existing &lt;code&gt;^registry&lt;/code&gt; actor, added an
&lt;code&gt;'unregister&lt;/code&gt; method used by the server to retrieve an available
client, and let the client call the &lt;code&gt;'register&lt;/code&gt; method after
finishing a task. The problem with this straightforward approach is that
one needs to have the server wait when no client is available, and this
risks stalling everything. In a sense, we are back to the problem discussed
in the &lt;a href=&quot;goblins-1.html&quot;&gt;first&lt;/a&gt; post:
Goblins works with promises, and waiting for their resolution is not
a Goblins concept; one should not try to master time and write code
to be executed at a specific moment, but rather define callbacks that
are run when promises resolve.
&lt;/p&gt;
&lt;p&gt;
Jessica and David pointed out to me that since version 0.14 of Goblins,
a suitable module is available in the actor library: the
&lt;a href=&quot;https://files.spritely.institute/docs/guile-goblins/0.15.0/Inbox.html&quot;&gt;inbox&lt;/a&gt;,
which is modelled after a post box that queues messages and delivers them
one by one on request. Actually it rather delivers parcels, since it is
a general first in, first out queue that can be filled with anything.
We will replace the current &lt;code&gt;clients&lt;/code&gt; list in a
&lt;code&gt;^cell&lt;/code&gt; actor by an &lt;code&gt;inbox&lt;/code&gt; that will contain
client actors.
The crucial difference from my home-brew solution is that the level of
an inbox can go below zero without blocking:
If there are no elements in the queue, it nevertheless returns a promise to
a future element, in our case a client actor that will register later.
Thanks to
&lt;a href=&quot;https://files.spritely.institute/docs/guile-goblins/0.15.0/Promise-pipelining.html&quot;&gt;promise
pipelining&lt;/a&gt;, we can pretend that this empty promise is actually an actor
and send messages to it using &lt;code&gt;&amp;lt;-&lt;/code&gt;. Once a new actor
registers, the promise fulfills itself, and the new actor will receive
the message sent previously and act on it.
&lt;/p&gt;
&lt;p&gt;
The essential modifications occur in the &lt;code&gt;^worker&lt;/code&gt; type actor
in the client script:
&lt;/p&gt;
&lt;pre&gt;
(define-actor (^worker bcom server) #:self self
  (methods
    ((square x)
     (let ((res (* x x)))
          (format #t &amp;quot;square ~a\n&amp;quot; x)
          (sleep 3)
          (&amp;lt;- server self)
          res))
    ((finish)
     (signal-condition! end))))
&lt;/pre&gt;
&lt;p&gt;
After computing the result (and sleeping a little bit for testing purposes,
since the tasks are so short that otherwise one client would end up
grabbing all of them before we have a chance to start a second one), the
client sends itself back to the server.
To this purpose, it needs to know the server, which can be passed to it
upon spawning; and it needs to have a notion of itself. This is why we
have replaced the &lt;code&gt;define&lt;/code&gt; by the more general
&lt;code&gt;define-actor&lt;/code&gt;, in which the optional
&lt;code&gt;#:self self&lt;/code&gt; defines a formal parameter &lt;code&gt;self&lt;/code&gt; to
later… speak to oneself! 
Notice that we have also modified the client so that as in the first blog
posts, it does not register with its name (which thus is not passed on the
command line either anymore): Using an &lt;code&gt;inbox&lt;/code&gt; instead of the
custom registry function implies that we would need to encapsulate the
client actor into a composite data structure together with its name (for
instance,
&lt;a href=&quot;https://www.gnu.org/software/guile/manual/html_node/SRFI_002d9-Records.html&quot;&gt;SRFI-9
records&lt;/a&gt;), which is more hassle than warranted for our experimental
code. To make up for it, we let the client itself print the computing tasks
it receives.
The complete client script, after some shuffling around so that things are
defined in the correct order, looks like this:
&lt;/p&gt;
&lt;pre&gt;
(use-modules (srfi srfi-1)
             (fibers conditions)
             (goblins)
             (goblins actor-lib methods)
             (goblins ocapn ids)
             (goblins ocapn captp)
             (goblins ocapn netlayer tcp-tls))

(define vat (spawn-vat))
(define net (spawn-vat))
(define end (make-condition))

(define-actor (^worker bcom server) #:self self
  (methods
    ((square x)
     (let ((res (* x x)))
          (format #t &amp;quot;square ~a\n&amp;quot; x)
          (sleep 3)
          (&amp;lt;- server self)
          res))
    ((finish)
     (signal-condition! end))))

(define capn
  (with-vat net (spawn-mycapn (spawn ^tcp-tls-netlayer &amp;quot;localhost&amp;quot;))))

(define server
  (with-vat vat
    (&amp;lt;- capn 'enliven (string-&amp;gt;ocapn-id (second (command-line))))))

(define client
  (with-vat vat (spawn ^worker server)))
(with-vat net ($ capn 'register client 'tcp-tls))

(with-vat vat
  (&amp;lt;- server client))

(wait end)
&lt;/pre&gt;

&lt;p&gt;
In the server, we essentially replace the custom &lt;code&gt;^registry&lt;/code&gt;
by an &lt;code&gt;inbox&lt;/code&gt;, which results in the following code:
&lt;/p&gt;
&lt;pre&gt;
(use-modules (srfi srfi-1)
             (srfi srfi-26)
             (goblins)
             (goblins actor-lib cell)
             (goblins actor-lib inbox)
             (goblins actor-lib joiners)
             (goblins ocapn ids)
             (goblins ocapn captp)
             (goblins ocapn netlayer tcp-tls)
             (goblins persistence-store syrup)
             (goblins vat))

(define persistence-vat (spawn-vat))
(define persistence-registry
  (with-vat persistence-vat
    (spawn ^persistence-registry)))

(define-values (net capn)
  (spawn-persistent-vat
    (make-persistence-env #:extends (list captp-env tcp-tls-netlayer-env))
    (lambda ()
      (spawn-mycapn (spawn ^tcp-tls-netlayer &amp;quot;localhost&amp;quot;)))
    (make-syrup-store &amp;quot;ocapn.syrup&amp;quot;)
    #:persistence-registry persistence-registry))

(define (print-id prefix id)
  (with-vat net
    (on id
      (lambda (sref)
        (format #t &amp;quot;~a ~a\n&amp;quot;
                   prefix (ocapn-id-&amp;gt;string sref))))))

(define-values (vat get-client put-client stop-clients)
  (spawn-persistent-vat
    (make-persistence-env
      #:extends inbox-env)
    (lambda ()
      (spawn-inbox))
    (make-syrup-store &amp;quot;registry.syrup&amp;quot;)
    #:persist-on #f
    #:persistence-registry persistence-registry))

(let ((id (with-vat net ($ capn 'register put-client 'tcp-tls))))
  (print-id &amp;quot;Server ID&amp;quot; id))

(define all-clients (with-vat vat (spawn ^cell '())))
(define v '(1 2 3 4 5))
(with-vat vat
  (let
    ((clients (map (lambda (x) (&amp;lt;- get-client)) v)))
    ($ all-clients (append clients ($ all-clients)))
    (on (all-of* (map (cut &amp;lt;- &amp;lt;&amp;gt; 'square &amp;lt;&amp;gt;) clients v))
      (lambda (res)
        (format #t &amp;quot;~a\n&amp;quot; (sqrt (fold + 0 res)))
        (on (all-of* ($ all-clients))
          (lambda (c)
            (map (cut &amp;lt;- &amp;lt;&amp;gt; 'finish) (delete-duplicates c))))))))

(sleep 3600)
&lt;/pre&gt;
&lt;p&gt;
The &lt;code&gt;spawn-inbox&lt;/code&gt; function does not return a single
actor, but three at once: one for adding elements to the queue,
one for retrieving an element (or a promise thereof), and one for shutting
the &lt;code&gt;inbox&lt;/code&gt; down (which we will not use).
The expression
&lt;/p&gt;
&lt;pre&gt;
(map (lambda (x) (&amp;lt;- get-client)) v)
&lt;/pre&gt;
&lt;p&gt;
creates a list of (promises to) client actors of the same length as the
vector. Then we send the tasks as before and use &lt;code&gt;all-of*&lt;/code&gt;
to wait for their results. There is a little subtlety for sending the
&lt;code&gt;'finish&lt;/code&gt; messages: Since the variable &lt;code&gt;clients&lt;/code&gt;
in general does not contain a list of client actors any more, but a list
of promises, we also need to use &lt;code&gt;(on (all-of* …))&lt;/code&gt; to
retrieve the actual list of actors. We go further by memorising all clients
ever encountered (with multiplicities, actually) in a separate cell
&lt;code&gt;all-clients&lt;/code&gt;. This is a bit convoluted at this point (since at
the end of the script, &lt;code&gt;($ all-clients)&lt;/code&gt; is the same as
&lt;code&gt;clients&lt;/code&gt;), but will make things easier later.
Without any extra code for sending the &lt;code&gt;'finish&lt;/code&gt; signal to the
clients, the main part of the server script could be condensed into only
a few lines:
&lt;/p&gt;
&lt;pre&gt;
(define v '(1 2 3 4 5))
(with-vat vat
  (on (all-of* (map (lambda (x) (&amp;lt;- (&amp;lt;- get-client) 'square x)) v))
    (lambda (res)
      (format #t &amp;quot;~a\n&amp;quot; (sqrt (fold + 0 res))))))
&lt;/pre&gt;
&lt;p&gt;
Notice that this solution is strictly more general than that of the
previous posts:
If only one client registers, it runs all the squaring tasks; if a second
one arrives, it obtains every other task; and so on.
And… that's it! We have parallelised a &lt;code&gt;for&lt;/code&gt; loop which may
contain tasks of differing (and a priori unknown) lengths, and it can
handle the situation where clients join at any time.
To handle clients that may leave after a task is completed, the
framework is essentially in place: Instead of having the client
register again unconditionally after each computing task,
re-registration could depend on a condition checked in the client.
For handling faults, that is, clients which disappear in the middle of
a task, one would need to add timeouts at the server level and requeue
tasks for which the result has not appeared after a reasonable waiting
time, which would depend on the application.
Then &lt;code&gt;all-of*&lt;/code&gt; would not be a suitable joiner, but one could use
&lt;a href=&quot;https://files.spritely.institute/docs/guile-goblins/0.15.0/Joiners.html#index-race&quot;&gt;&lt;code&gt;race&lt;/code&gt;&lt;/a&gt;,
which resolves as soon as one of several promises resolves.
Or one could use &lt;code&gt;all-of*&lt;/code&gt; on a list of promises, each
created by &lt;code&gt;race&lt;/code&gt; from a computation promise and a timeout
promise, which is precisely the example given in the documentation of
&lt;code&gt;race&lt;/code&gt;.
We will not pursue the topic of faults in this post, but it is clear
that Goblins mechanisms could be used to solve the problem.
&lt;/p&gt;
&lt;p&gt;
In any case, our current Goblins code is already more flexible than the
MPI solution, which assumes that all clients are known at the beginning
of the computation and do not change throughout, and which also breaks
in the presence of faults.
&lt;/p&gt;


&lt;h2&gt;Time for crochet: loops after loops!&lt;/h2&gt;

&lt;p&gt;
A common situation is that after running one loop, one needs to start a
second round that continues the computations with the intermediate results
that have just been obtained. In what follows, we will modify the server
script accordingly, while keeping the client script unmodified, which may
be seen as a sign that the architecture developed so far makes sense.
&lt;/p&gt;
&lt;p&gt;
For instance, the following sequential
code computes the L4-norm of a vector, that is, the fourth root of the sum
of the fourth powers of its entries:
&lt;/p&gt;
&lt;pre&gt;
(use-modules (srfi srfi-1))
(define (square x) (* x x))
(define v '(1 2 3 4 5))
(define w (map square v))
(define res (map square w))
(format #t &amp;quot;~a\n&amp;quot; (sqrt (sqrt (fold + 0 res))))
&lt;/pre&gt;
&lt;p&gt;
It consists of two loops, one for squaring each entry in &lt;code&gt;v&lt;/code&gt;
and putting the results into &lt;code&gt;w&lt;/code&gt;, and a second one for
squaring the entries in &lt;code&gt;w&lt;/code&gt; (which effectively computes the
fourth powers of the entries in &lt;code&gt;v&lt;/code&gt;).
&lt;/p&gt;
&lt;p&gt;
This can be goblinified quite naturally by nesting the task submission
and &lt;code&gt;(on (all-of* …))&lt;/code&gt; handling of the results.
Without &lt;code&gt;'finish&lt;/code&gt; signals, this results in the following code:
&lt;/p&gt;
&lt;pre&gt;
(define v '(1 2 3 4 5))
(with-vat vat
  (on (all-of* (map (lambda (x) (&amp;lt;- (&amp;lt;- get-client) 'square x)) v))
    (lambda (w)
      (on (all-of* (map (lambda (x) (&amp;lt;- (&amp;lt;- get-client) 'square x)) w))
        (lambda (res)
          (format #t &amp;quot;~a\n&amp;quot; (sqrt (sqrt (fold + 0 res)))))))))
&lt;/pre&gt;
&lt;p&gt;
Including &lt;code&gt;'finish&lt;/code&gt; handling, the following server script is not
the shortest solution, but its symmetries will be helpful in the next
section. To make the code more readable, we have moved some (repetitive)
code into the functions &lt;code&gt;submit-square-jobs&lt;/code&gt; and
&lt;code&gt;submit-finish-jobs&lt;/code&gt;.
&lt;/p&gt;
&lt;pre&gt;
(use-modules (srfi srfi-1)
             (srfi srfi-26)
             (goblins)
             (goblins actor-lib cell)
             (goblins actor-lib inbox)
             (goblins actor-lib joiners)
             (goblins ocapn ids)
             (goblins ocapn captp)
             (goblins ocapn netlayer tcp-tls)
             (goblins persistence-store syrup)
             (goblins vat))

(define persistence-vat (spawn-vat))
(define persistence-registry
  (with-vat persistence-vat
    (spawn ^persistence-registry)))

(define-values (net capn)
  (spawn-persistent-vat
    (make-persistence-env #:extends (list captp-env tcp-tls-netlayer-env))
    (lambda ()
      (spawn-mycapn (spawn ^tcp-tls-netlayer &amp;quot;localhost&amp;quot;)))
    (make-syrup-store &amp;quot;ocapn.syrup&amp;quot;)
    #:persistence-registry persistence-registry))

(define (print-id prefix id)
  (with-vat net
    (on id
      (lambda (sref)
        (format #t &amp;quot;~a ~a\n&amp;quot;
                   prefix (ocapn-id-&amp;gt;string sref))))))

(define-values (vat get-client put-client stop-clients)
  (spawn-persistent-vat
    (make-persistence-env
      #:extends inbox-env)
    (lambda ()
      (spawn-inbox))
    (make-syrup-store &amp;quot;registry.syrup&amp;quot;)
    #:persist-on #f
    #:persistence-registry persistence-registry))

(let ((id (with-vat net ($ capn 'register put-client 'tcp-tls))))
  (print-id &amp;quot;Server ID&amp;quot; id))

(define all-clients (with-vat vat (spawn ^cell '())))

(define (submit-square-jobs v)
  (let ((clients (map (lambda (x) (&amp;lt;- get-client)) v)))
    ($ all-clients (append clients ($ all-clients)))
    (map (cut &amp;lt;- &amp;lt;&amp;gt; 'square &amp;lt;&amp;gt;) clients v)))

(define (submit-finish-jobs clients)
  (map (cut &amp;lt;- &amp;lt;&amp;gt; 'finish) (delete-duplicates clients)))

(define v '(1 2 3 4 5))
(with-vat vat
  (on (all-of* (submit-square-jobs  v))
    (lambda (w)
      (on (all-of* (submit-square-jobs w))
        (lambda (res)
          (format #t &amp;quot;~a\n&amp;quot; (sqrt (sqrt (fold + 0 res))))
          (on (all-of* ($ all-clients))
            submit-finish-jobs))))))

(sleep 3600)
&lt;/pre&gt;


&lt;h2&gt;Untangling the threads: macros to the rescue&lt;/h2&gt;

&lt;p&gt;
The last block of the server code now clearly shows a recurring pattern:
&lt;/p&gt;
&lt;pre&gt;
(on (all-of* SUBMIT SOME JOBS)
  (lambda (VAR)
    DO SOMETHING WITH THE RESULT IN VAR
&lt;/pre&gt;
&lt;p&gt;
which is actually nested, since handling the results of the first round
requires running the same pattern for the second round of job submissions.
Now a pattern can be handled by a Guile
&lt;a href=&quot;https://www.gnu.org/software/guile/manual/html_node/Defining-Macros.html&quot;&gt;macro&lt;/a&gt;,
for instance as follows:
&lt;/p&gt;
&lt;pre&gt;
(define-syntax submit-reduce
  (syntax-rules ()
    ((submit-reduce submit v reduce ...)
     (on (all-of* submit)
       (lambda (v)
         (begin reduce ...))))))
&lt;/pre&gt;
&lt;p&gt;
The line following the &lt;code&gt;syntax-rules ()&lt;/code&gt; contains a pattern
to be matched; the remainder of the macro is the Guile code above, with
placeholders replaced by parts of the matched pattern.
The first argument of the macro is a single expression corresponding to
&lt;code&gt;SUBMIT SOME JOBS&lt;/code&gt;; if several expressions are needed, they can
be transformed into only one using
&lt;a href=&quot;https://www.gnu.org/software/guile/manual/html_node/Local-Bindings.html#index-let_002a&quot;&gt;&lt;code&gt;let*&lt;/code&gt;&lt;/a&gt;,
for instance.
The second argument is the (formal) variable name &lt;code&gt;VAR&lt;/code&gt;.
All remaining arguments (of which there may be zero) correspond to
&lt;code&gt;DO SOMETHING WITH THE RESULT IN VAR&lt;/code&gt;; these will in general
use the formal variable.
&lt;/p&gt;
&lt;p&gt;
Using this macro, the main block of the server script can be compressed
as follows:
&lt;/p&gt;
&lt;pre&gt;
(define v '(1 2 3 4 5))
(with-vat vat
  (submit-reduce (submit-square-jobs  v) w
    (submit-reduce (submit-square-jobs w) res
      (format #t &amp;quot;~a\n&amp;quot; (sqrt (sqrt (fold + 0 res))))
      (on (all-of* ($ all-clients))
        submit-finish-jobs))))

&lt;/pre&gt;
&lt;p&gt;
It is also possible to let the macro itself handle the nesting as follows:
&lt;/p&gt;
&lt;pre&gt;
(define-syntax submit-reduce
  (syntax-rules ()
    ((submit-reduce reduce)
     reduce)
    ((submit-reduce submit v reduce ...)
     (on (all-of* submit)
       (lambda (v)
         (submit-reduce reduce ...))))))
&lt;/pre&gt;
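&lt;p&gt;
To see the recursion at work, here is a sketch of how a call with five
arguments unfolds (the names &lt;code&gt;submit1&lt;/code&gt;,
&lt;code&gt;submit2&lt;/code&gt; and &lt;code&gt;reduce&lt;/code&gt; are placeholders,
so this illustrates the expansion rather than being code to run):
&lt;/p&gt;
&lt;pre&gt;
(submit-reduce submit1 w submit2 res reduce)
;; first expansion step: the second pattern matches
(on (all-of* submit1)
  (lambda (w)
    (submit-reduce submit2 res reduce)))
;; second expansion step, after which the first pattern ends the recursion
(on (all-of* submit1)
  (lambda (w)
    (on (all-of* submit2)
      (lambda (res)
        reduce))))
&lt;/pre&gt;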
&lt;p&gt;
If the macro is called with at least three arguments, then the second
pattern &lt;code&gt;(submit-reduce submit v reduce ...)&lt;/code&gt; is matched.
The first argument (a single expression) is considered to be the job
submission phase, the second argument the variable name for the results
of the first jobs; then the macro is called recursively, and more
job submission phases, alternated with variable names, are expected;
in the end, when only one argument remains, the first pattern
&lt;code&gt;(submit-reduce reduce)&lt;/code&gt; is matched, which corresponds to the
handling of the results of the final round of job submissions.
To work, the macro thus requires an odd number of arguments; otherwise it
raises an error (calling it with just one argument is possible, but makes
no sense).
With this macro, the main server block looks as follows:
&lt;/p&gt;
&lt;pre&gt;
(define v '(1 2 3 4 5))
(with-vat vat
  (submit-reduce
    (submit-square-jobs v) w
    (submit-square-jobs w) res
    (begin
      (format #t &amp;quot;~a\n&amp;quot; (sqrt (sqrt (fold + 0 res))))
      (on (all-of* ($ all-clients))
        submit-finish-jobs))))
&lt;/pre&gt;
&lt;p&gt;
Notice that we needed to wrap the final reduction into
&lt;code&gt;(begin …)&lt;/code&gt; since it consists of several expressions.
By hiding the nesting, this macro makes the workflow quite clear: submit
a series of tasks, submit a new series of tasks depending on the results
of the previous series, and so on, until the final result is handled
through a side effect (to break out of the promises).
When exactly three arguments are given, both macros are equivalent,
so that it is still possible to manually nest the macro invocations;
otherwise the second macro is more powerful than the first one (except
that the first one admits several Guile expressions for the reduction
phase).
&lt;/p&gt;
&lt;p&gt;
To illustrate the simplicity with which the pattern continues, here is the
complete server script for computing the L8-norm of a vector, that is, the
eighth root of the sum of the eighth powers of its entries:
&lt;/p&gt;
&lt;pre&gt;
(use-modules (srfi srfi-1)
             (srfi srfi-26)
             (goblins)
             (goblins actor-lib cell)
             (goblins actor-lib inbox)
             (goblins actor-lib joiners)
             (goblins ocapn ids)
             (goblins ocapn captp)
             (goblins ocapn netlayer tcp-tls)
             (goblins persistence-store syrup)
             (goblins vat))

(define persistence-vat (spawn-vat))
(define persistence-registry
  (with-vat persistence-vat
    (spawn ^persistence-registry)))

(define-values (net capn)
  (spawn-persistent-vat
    (make-persistence-env #:extends (list captp-env tcp-tls-netlayer-env))
    (lambda ()
      (spawn-mycapn (spawn ^tcp-tls-netlayer &amp;quot;localhost&amp;quot;)))
    (make-syrup-store &amp;quot;ocapn.syrup&amp;quot;)
    #:persistence-registry persistence-registry))

(define (print-id prefix id)
  (with-vat net
    (on id
      (lambda (sref)
        (format #t &amp;quot;~a ~a\n&amp;quot;
                   prefix (ocapn-id-&amp;gt;string sref))))))

(define-values (vat get-client put-client stop-clients)
  (spawn-persistent-vat
    (make-persistence-env
      #:extends inbox-env)
    (lambda ()
      (spawn-inbox))
    (make-syrup-store &amp;quot;registry.syrup&amp;quot;)
    #:persist-on #f
    #:persistence-registry persistence-registry))

(let ((id (with-vat net ($ capn 'register put-client 'tcp-tls))))
  (print-id &amp;quot;Server ID&amp;quot; id))

(define all-clients (with-vat vat (spawn ^cell '())))

(define (submit-square-jobs v)
  (let ((clients (map (lambda (x) (&amp;lt;- get-client)) v)))
    ($ all-clients (append clients ($ all-clients)))
    (map (cut &amp;lt;- &amp;lt;&amp;gt; 'square &amp;lt;&amp;gt;) clients v)))

(define (submit-finish-jobs clients)
  (map (cut &amp;lt;- &amp;lt;&amp;gt; 'finish) (delete-duplicates clients)))

(define-syntax submit-reduce
  (syntax-rules ()
    ((submit-reduce reduce)
     reduce)
    ((submit-reduce submit v reduce ...)
     (on (all-of* submit)
       (lambda (v)
         (submit-reduce reduce ...))))))

(define v '(1 2 3 4 5))
(with-vat vat
  (submit-reduce
    (submit-square-jobs v) w
    (submit-square-jobs w) t
    (submit-square-jobs t) res
    (begin
      (format #t &amp;quot;~a\n&amp;quot; (sqrt (sqrt (sqrt (fold + 0 res)))))
      (on (all-of* ($ all-clients))
        submit-finish-jobs))))

(sleep 3600)
&lt;/pre&gt;
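&lt;p&gt;
For comparison, extending the sequential L4-norm code from above, a purely
sequential version in plain Guile (runnable without Goblins) computes the
same value by squaring three times and taking three square roots:
&lt;/p&gt;
&lt;pre&gt;
(use-modules (srfi srfi-1))
(define (square x) (* x x))
(define v '(1 2 3 4 5))
;; squaring thrice yields the eighth powers of the entries
(define res (map square (map square (map square v))))
;; three nested square roots yield the eighth root
(format #t &amp;quot;~a\n&amp;quot; (sqrt (sqrt (sqrt (fold + 0 res)))))
&lt;/pre&gt;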


&lt;h2&gt;(Preliminary) conclusion&lt;/h2&gt;

&lt;p&gt;
At this point, the goal set out at the beginning of this series of blog
posts is met. We have developed a client and server structure in which the
clients register with the server and the server hands them computation
tasks that correspond to a sequence of loops. As already said above, the
result is even a bit more flexible than with MPI: The number of clients
need not be known and communicated to the server in advance, but clients
can come and go, as long as they do not vanish in the middle of a task.
And Goblins make it possible to do so over the Internet, either with
TCP/TLS or even through the Tor network.
&lt;/p&gt;

&lt;/div&gt;</content></entry><entry><title>Goblins for number theory, part 3</title><id>https://enge.math.u-bordeaux.fr/blog/goblins-3.html</id><author><name>Andreas Enge</name><email>andreas.enge@inria.fr</email></author><updated>2025-03-07T00:00:00Z</updated><link href="https://enge.math.u-bordeaux.fr/blog/goblins-3.html" rel="alternate" /><content type="html">&lt;div&gt;

&lt;h1&gt;Ending and persisting&lt;/h1&gt;

&lt;p&gt;
In previous posts we have seen how to solve our
&lt;a href=&quot;goblins-1.html&quot;&gt;toy problem&lt;/a&gt; of computing the
Euclidean length of a vector in a
&lt;a href=&quot;goblins-2.html&quot;&gt;distributed fashion&lt;/a&gt; using Goblins,
with a client script that runs in several copies, carries out most of the
work and reports back to a server script, which collects the partial
results into a solution to the problem.
The clients could in principle live on distant machines and communicate
over the Tor network. For testing in a local setting, however, letting
them run on the same machine as the server and communicate over TCP
turns out to be more efficient.
So far, our architecture is rather inflexible: We assume that the server
knows the number of participating clients beforehand, and that all tasks
take more or less the same time so that distributing them evenly to the
clients is an optimal scheduling strategy.
The logical next step is to overcome these limitations.
My initial solution for a more general framework, however, turned out to
be very inefficient. Jessica Tallon and David Thompson of the
&lt;a href=&quot;https://spritely.institute/&quot;&gt;Spritely Institute&lt;/a&gt; (many thanks
to them!) kindly had a look at it and came up with a much better solution;
but our discussions also helped me understand Goblins better and inspired
ideas on how to improve the current client and server scripts.
So before going for more generality in the next post, let us do a pirouette
with the current framework and also explore some interesting side tracks
that did not make it into the previous post.
&lt;/p&gt;


&lt;h2&gt;Spring cleaning&lt;/h2&gt;

&lt;p&gt;
Before doing anything substantial, let us clean up a few things in the
current code. The main actor in the server script is currently defined
through the type &lt;code&gt;^register&lt;/code&gt; as follows:
&lt;/p&gt;
&lt;pre&gt;
(define clients (with-vat vat (spawn ^cell '())))
(define (^register bcom)
  (lambda (id)
    ($ clients (cons (&amp;lt;- mycapn 'enliven id)
                     ($ clients)))
    (print-id &amp;quot;Registered&amp;quot; id)))
(define register (with-vat vat (spawn ^register)))
&lt;/pre&gt;
&lt;p&gt;
It captures the &lt;code&gt;clients&lt;/code&gt; variable in the closure defined by
&lt;code&gt;lambda&lt;/code&gt;, which works, but requires the variables to be defined
in this order. A more elegant solution is to pass &lt;code&gt;clients&lt;/code&gt;
as an argument. At the same time, we take the opportunity to rename the
verb &lt;code&gt;register&lt;/code&gt; to the noun &lt;code&gt;registry&lt;/code&gt;.
&lt;/p&gt;
&lt;pre&gt;
(define (^registry bcom clients)
  (lambda (id)
    ($ clients (cons (&amp;lt;- mycapn 'enliven id)
                     ($ clients)))
    (print-id &amp;quot;Registered&amp;quot; id)))
(define clients (with-vat vat (spawn ^cell '())))
(define registry (with-vat vat (spawn ^registry clients)))
&lt;/pre&gt;
&lt;p&gt;
Let us also get rid of some “overgoblinification”; indeed the actor of
type &lt;code&gt;^len&lt;/code&gt; in the server can be replaced by a simple function,
or (since the Goblins promises force us to work with side effects anyway)
by sequential code. We end up with the following server script
&lt;code&gt;server.scm&lt;/code&gt;:
&lt;/p&gt;
&lt;pre&gt;
(use-modules (srfi srfi-1)
             (goblins)
             (goblins actor-lib cell)
             (goblins actor-lib joiners)
             (goblins actor-lib methods)
             (goblins ocapn ids)
             (goblins ocapn captp)
             (goblins ocapn netlayer tcp-tls))

(define vat (spawn-vat))
(define net (spawn-vat))

(define (print-id prefix id)
  (with-vat net
    (on id
      (lambda (sref)
        (format #t &amp;quot;~a ~a\n&amp;quot;
                   prefix (ocapn-id-&amp;gt;string sref))))))

(define capn
   (with-vat net (spawn-mycapn (spawn ^tcp-tls-netlayer &amp;quot;localhost&amp;quot;))))

(define (^registry bcom clients)
  (lambda (id)
    ($ clients (cons (&amp;lt;- capn 'enliven id)
                     ($ clients)))
    (print-id &amp;quot;Registered&amp;quot; id)))

(define clients (with-vat vat (spawn ^cell '())))
(define registry (with-vat vat (spawn ^registry clients)))
(let ((id (with-vat net ($ capn 'register registry 'tcp-tls))))
  (print-id &amp;quot;Server ID&amp;quot; id))

(while (not (eq? (length (with-vat vat ($ clients))) 2))
       (sleep 1))

(define v '(1 2 3 4 5))
(with-vat vat
  (while (&amp;lt; (length ($ clients)) (length v))
     (let ((c ($ clients)))
       ($ clients (append c c)))))
(with-vat vat
  (on (all-of* (map &amp;lt;- ($ clients) v))
      (lambda (res)
        (format #t &amp;quot;~a\n&amp;quot; (sqrt (fold + 0 res))))))

(sleep 3600)
&lt;/pre&gt;
&lt;p&gt;
and the following client script &lt;code&gt;client.scm&lt;/code&gt;:
&lt;/p&gt;
&lt;pre&gt;
(use-modules (srfi srfi-1)
             (goblins)
             (goblins ocapn ids)
             (goblins ocapn captp)
             (goblins ocapn netlayer tcp-tls))

(define vat (spawn-vat))
(define net (spawn-vat))

(define (^square bcom)
  (lambda (x)
    (* x x)))
(define client
  (with-vat vat (spawn ^square)))

(define (print-id prefix id)
  (with-vat net
    (on id
      (lambda (sref)
        (format #t &amp;quot;~a ~a\n&amp;quot;
                   prefix (ocapn-id-&amp;gt;string sref))))))

(define capn
  (with-vat net (spawn-mycapn (spawn ^tcp-tls-netlayer &amp;quot;localhost&amp;quot;))))
(define id
  (with-vat net ($ capn 'register client 'tcp-tls)))
(print-id &amp;quot;Client ID&amp;quot; id)

(define server
  (with-vat vat
    (&amp;lt;- capn 'enliven (string-&amp;gt;ocapn-id (second (command-line))))))

(with-vat vat
  (on id
    (lambda (id)
      (&amp;lt;- server id))))

(sleep 3600)
&lt;/pre&gt;
&lt;p&gt;
Now run again
&lt;/p&gt;
&lt;pre&gt;
guile server.scm
&lt;/pre&gt;
&lt;p&gt;
in one terminal and two copies of the client script as
&lt;/p&gt;
&lt;pre&gt;
guile client.scm 'ocapn://…'
&lt;/pre&gt;
&lt;p&gt;
in two other terminals, where the ocapn URI has been replaced by the one
printed by the server, to compute the same result as before.
&lt;/p&gt;



&lt;h2&gt;Passing actors around&lt;/h2&gt;

&lt;p&gt;
After going through the
&lt;a href=&quot;https://files.spritely.institute/docs/guile-goblins/0.15.0/Example-Two-Goblins-programs-chatting-over-CapTP-via-Tor.html&quot;&gt;CapTP&lt;/a&gt;
tutorial, I was under the impression that the only way to create a handle
on an actor on a different machine was by obtaining its sturdyref ID
and “enlivening” this ID locally. Currently the server script prints its
ID, which the client script obtains as an argument when invoked from the
command line. This enables the client to enliven the server and to send
its ID to the server when registering by a &lt;code&gt;&amp;lt;-&lt;/code&gt; call; then
the server enlivens the client.
It turns out, however, that it is also possible to directly send actors
instead of their IDs through &lt;code&gt;&amp;lt;-&lt;/code&gt;. Printing and copy-pasting
IDs is still necessary for bootstrapping, but once a spanning tree is
generated in this manner between all participating scripts, it is possible
to obtain a complete communication graph by just sending actors along these
bootstrapped network edges.
&lt;/p&gt;
&lt;p&gt;
We would still like the client to somehow present itself to the server with
a name, so that the server can print who connects to it and thus make
debugging easier. If we drop the ocapn ID, then the client can use a pet
name, a string that we pass as an additional argument on the command line.
The server needs only minimal modifications:
&lt;/p&gt;
&lt;pre&gt;
(use-modules (srfi srfi-1)
             (goblins)
             (goblins actor-lib cell)
             (goblins actor-lib joiners)
             (goblins actor-lib methods)
             (goblins ocapn ids)
             (goblins ocapn captp)
             (goblins ocapn netlayer tcp-tls))

(define vat (spawn-vat))
(define net (spawn-vat))

(define (print-id prefix id)
  (with-vat net
    (on id
      (lambda (sref)
        (format #t &amp;quot;~a ~a\n&amp;quot;
                   prefix (ocapn-id-&amp;gt;string sref))))))

(define capn
   (with-vat net (spawn-mycapn (spawn ^tcp-tls-netlayer &amp;quot;localhost&amp;quot;))))

(define (^registry bcom clients)
  (lambda (client name)
    ($ clients (cons client ($ clients)))
    (format #t &amp;quot;Registered ~a\n&amp;quot; name)))

(define clients (with-vat vat (spawn ^cell '())))
(define registry (with-vat vat (spawn ^registry clients)))
(let ((id (with-vat net ($ capn 'register registry 'tcp-tls))))
  (print-id &amp;quot;Server ID&amp;quot; id))

(while (not (eq? (length (with-vat vat ($ clients))) 2))
       (sleep 1))

(define v '(1 2 3 4 5))
(with-vat vat
  (while (&amp;lt; (length ($ clients)) (length v))
     (let ((c ($ clients)))
       ($ clients (append c c)))))
(with-vat vat
  (on (all-of* (map &amp;lt;- ($ clients) v))
      (lambda (res)
        (format #t &amp;quot;~a\n&amp;quot; (sqrt (fold + 0 res))))))

(sleep 3600)
&lt;/pre&gt;
&lt;p&gt;
Notice the additional argument &lt;code&gt;name&lt;/code&gt; for the
&lt;code&gt;^registry&lt;/code&gt; actor, which is used for announcing arriving
clients instead of their ocapn ID.
(In this implementation we forget the name of a client immediately;
it would make sense to somehow keep it, either by remembering it directly
in &lt;code&gt;^square&lt;/code&gt; or by having the server memorise it in its client
list.)
Instead of enlivening an ID and adding the resulting actor to the
&lt;code&gt;clients&lt;/code&gt; list, the server adds the client actor directly.
The client modifications are also straightforward and simplify the script
considerably:
&lt;/p&gt;
&lt;pre&gt;
(use-modules (srfi srfi-1)
             (goblins)
             (goblins ocapn ids)
             (goblins ocapn captp)
             (goblins ocapn netlayer tcp-tls))

(define vat (spawn-vat))
(define net (spawn-vat))

(define (^square bcom)
  (lambda (x)
    (* x x)))
(define client
  (with-vat vat (spawn ^square)))

(define capn
  (with-vat net (spawn-mycapn (spawn ^tcp-tls-netlayer &amp;quot;localhost&amp;quot;))))
(with-vat net ($ capn 'register client 'tcp-tls))

(define name (second (command-line)))

(define server
  (with-vat vat
    (&amp;lt;- capn 'enliven (string-&amp;gt;ocapn-id (third (command-line))))))

(with-vat vat
  (&amp;lt;- server client name))

(sleep 3600)
&lt;/pre&gt;
&lt;p&gt;
Now start the server as usual, and two clients as
&lt;/p&gt;
&lt;pre&gt;
guile client.scm Alice 'ocapn://…'
guile client.scm Bob 'ocapn://…'
&lt;/pre&gt;
&lt;p&gt;
to see the familiar result.
&lt;/p&gt;


&lt;h2&gt;Being methodical&lt;/h2&gt;

&lt;p&gt;
As it will be useful later on, let us replace the workhorse in the client,
the &lt;code&gt;^square&lt;/code&gt; actor with only one possible action (squaring
a number that is sent to it) by an implementation with potentially more
actions. To do so, we use
&lt;a href=&quot;https://files.spritely.institute/docs/guile-goblins/0.15.0/Methods.html&quot;&gt;methods&lt;/a&gt;
from the Goblins actor library, which dispatch actions using an additional symbol.
So
&lt;/p&gt;
&lt;pre&gt;
(define (^square bcom)
  (lambda (x)
    (* x x)))
(define client
  (with-vat vat (spawn ^square)))
&lt;/pre&gt;
&lt;p&gt;
becomes
&lt;/p&gt;
&lt;pre&gt;
(use-modules (goblins actor-lib methods)
…
(define (^worker bcom)
  (methods
    ((square x)
     (* x x))))
(define client
  (with-vat vat (spawn ^worker)))
&lt;/pre&gt;
&lt;p&gt;
Inside the server, we now need to change calls of the form
&lt;/p&gt;
&lt;pre&gt;
(&amp;lt;- client x)
&lt;/pre&gt;
&lt;p&gt;
by adding an additional symbol to
&lt;/p&gt;
&lt;pre&gt;
(&amp;lt;- client 'square x)
&lt;/pre&gt;
&lt;p&gt;
This is made more complicated since they appear inside &lt;code&gt;map&lt;/code&gt;:
&lt;/p&gt;
&lt;pre&gt;
(map &amp;lt;- ($ clients) v)
&lt;/pre&gt;
&lt;p&gt;
The solution is to change the &lt;code&gt;&amp;lt;-&lt;/code&gt; function, which now takes
three arguments (a client, a symbol and a number) into a function with only
two arguments by fixing the middle argument to &lt;code&gt;'square&lt;/code&gt;.
This can be done using
&lt;a href=&quot;https://www.gnu.org/software/guile/manual/html_node/SRFI_002d26.html#index-cut&quot;&gt;SRFI-26
cut&lt;/a&gt;; it takes the function name and for each argument of the function
either a fixed value, or the placeholder &lt;code&gt;&amp;lt;&amp;gt;&lt;/code&gt; indicating
that this argument should be kept as such. In our case, this gives
&lt;/p&gt;
&lt;pre&gt;
(map (cut &amp;lt;- &amp;lt;&amp;gt; 'square &amp;lt;&amp;gt;) ($ clients) v))
&lt;/pre&gt;
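&lt;p&gt;
As a minimal illustration in plain Guile, without any Goblins involved,
&lt;code&gt;cut&lt;/code&gt; behaves as follows:
&lt;/p&gt;
&lt;pre&gt;
(use-modules (srfi srfi-26))
;; (cut expt &amp;lt;&amp;gt; 2) is shorthand for (lambda (x) (expt x 2))
(map (cut expt &amp;lt;&amp;gt; 2) '(1 2 3))   ; ⇒ (1 4 9)
&lt;/pre&gt;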
&lt;p&gt;
So altogether, here is our current server:
&lt;/p&gt;
&lt;pre&gt;
(use-modules (srfi srfi-1)
             (srfi srfi-26)
             (goblins)
             (goblins actor-lib cell)
             (goblins actor-lib joiners)
             (goblins actor-lib methods)
             (goblins ocapn ids)
             (goblins ocapn captp)
             (goblins ocapn netlayer tcp-tls))

(define vat (spawn-vat))
(define net (spawn-vat))

(define (print-id prefix id)
  (with-vat net
    (on id
      (lambda (sref)
        (format #t &amp;quot;~a ~a\n&amp;quot;
                   prefix (ocapn-id-&amp;gt;string sref))))))

(define capn
   (with-vat net (spawn-mycapn (spawn ^tcp-tls-netlayer &amp;quot;localhost&amp;quot;))))

(define (^registry bcom clients)
  (lambda (client name)
    ($ clients (cons client ($ clients)))
    (format #t &amp;quot;Registered ~a\n&amp;quot; name)))

(define clients (with-vat vat (spawn ^cell '())))
(define registry (with-vat vat (spawn ^registry clients)))
(let ((id (with-vat net ($ capn 'register registry 'tcp-tls))))
  (print-id &amp;quot;Server ID&amp;quot; id))

(while (not (eq? (length (with-vat vat ($ clients))) 2))
       (sleep 1))

(define v '(1 2 3 4 5))
(with-vat vat
  (while (&amp;lt; (length ($ clients)) (length v))
     (let ((c ($ clients)))
       ($ clients (append c c)))))
(with-vat vat
  (on (all-of* (map (cut &amp;lt;- &amp;lt;&amp;gt; 'square &amp;lt;&amp;gt;) ($ clients) v))
      (lambda (res)
        (format #t &amp;quot;~a\n&amp;quot; (sqrt (fold + 0 res))))))

(sleep 3600)
&lt;/pre&gt;
&lt;p&gt;
and here our current client:
&lt;/p&gt;
&lt;pre&gt;
(use-modules (srfi srfi-1)
             (goblins)
             (goblins actor-lib methods)
             (goblins ocapn ids)
             (goblins ocapn captp)
             (goblins ocapn netlayer tcp-tls))

(define vat (spawn-vat))
(define net (spawn-vat))

(define (^worker bcom)
  (methods
    ((square x)
     (* x x))))
(define client
  (with-vat vat (spawn ^worker)))

(define capn
  (with-vat net (spawn-mycapn (spawn ^tcp-tls-netlayer &amp;quot;localhost&amp;quot;))))
(with-vat net ($ capn 'register client 'tcp-tls))

(define name (second (command-line)))

(define server
  (with-vat vat
    (&amp;lt;- capn 'enliven (string-&amp;gt;ocapn-id (third (command-line))))))

(with-vat vat
  (&amp;lt;- server client name))

(sleep 3600)
&lt;/pre&gt;


&lt;h2&gt;Everything has an end, but Goblins&lt;/h2&gt;

&lt;p&gt;
It is mildly annoying that the scripts run forever (well, for one hour…)
and need to be stopped with &lt;code&gt;&amp;lt;ctrl-c&amp;gt;&lt;/code&gt;. But it is
somewhat difficult to decide when to stop: In both our scripts, the
control flow reaches the end of the programs, while Goblins are still
working in the background through promises.
It is possible to use
&lt;a href=&quot;https://github.com/wingo/fibers/wiki/Manual#25-conditions&quot;&gt;conditions&lt;/a&gt;
from &lt;a href=&quot;https://github.com/wingo/fibers/&quot;&gt;Guile Fibers&lt;/a&gt;, as
inspired by the
&lt;a href=&quot;https://files.spritely.institute/docs/guile-goblins/0.15.0/Example-Two-Goblins-programs-chatting-over-CapTP-via-Tor.html&quot;&gt;chat
example&lt;/a&gt; in the Goblins documentation. Since Fibers are a basic
ingredient of Goblins in Guile, they do not need to be installed
separately.
We can modify the client as follows:
&lt;/p&gt;
&lt;pre&gt;
(use-modules (fibers conditions)
…
(define end (make-condition))
…
(define (^worker bcom)
  (methods
    ((square x)
     (* x x))
    ((finish)
     (signal-condition! end))))
…
(wait end)
&lt;/pre&gt;
&lt;p&gt;
First we import the &lt;code&gt;(fibers conditions)&lt;/code&gt; module. Then we create
the “condition” &lt;code&gt;end&lt;/code&gt;. We use &lt;code&gt;signal-condition!&lt;/code&gt;
to signal, well, that the condition has been fulfilled. And we replace
&lt;code&gt;sleep&lt;/code&gt;ing by &lt;code&gt;wait&lt;/code&gt;ing for the condition.
The signalling is encapsulated in a new method &lt;code&gt;'finish&lt;/code&gt; of the
&lt;code&gt;^worker&lt;/code&gt; actor, which can be called from the server as
&lt;/p&gt;
&lt;pre&gt;
(map (cut &amp;lt;- &amp;lt;&amp;gt; 'finish) ($ clients))
&lt;/pre&gt;
&lt;p&gt;
after the result of the computations has been printed.
This results in the following client script:
&lt;/p&gt;
&lt;pre&gt;
(use-modules (srfi srfi-1)
             (fibers conditions)
             (goblins)
             (goblins actor-lib methods)
             (goblins ocapn ids)
             (goblins ocapn captp)
             (goblins ocapn netlayer tcp-tls))

(define vat (spawn-vat))
(define net (spawn-vat))
(define end (make-condition))

(define (^worker bcom)
  (methods
    ((square x)
     (* x x))
    ((finish)
     (signal-condition! end))))
(define client
  (with-vat vat (spawn ^worker)))

(define capn
  (with-vat net (spawn-mycapn (spawn ^tcp-tls-netlayer &amp;quot;localhost&amp;quot;))))
(with-vat net ($ capn 'register client 'tcp-tls))

(define name (second (command-line)))

(define server
  (with-vat vat
    (&amp;lt;- capn 'enliven (string-&amp;gt;ocapn-id (third (command-line))))))

(with-vat vat
  (&amp;lt;- server client name))

(wait end)
&lt;/pre&gt;
&lt;p&gt;
With the server script modified suitably as explained above, the clients
now end correctly, but the server crashes after printing the result of the
computations. A hasty decision we took earlier comes back to haunt us now:
Since there are more tasks than clients, we have filled the
&lt;code&gt;clients&lt;/code&gt; list with duplicates of the client actors so as to
send multiple &lt;code&gt;'square&lt;/code&gt; messages to the same actor; but now we
send multiple &lt;code&gt;'finish&lt;/code&gt; messages to clients that have stopped
running after the first such message, resulting in a scary error on the
server side that boils down to &lt;code&gt;&amp;amp;non-continuable&lt;/code&gt;.
To bring the scripts to their conclusion more gracefully, we take another
hasty decision and deduplicate the clients list when calling
&lt;code&gt;'finish&lt;/code&gt;:
&lt;/p&gt;
&lt;pre&gt;
(map (cut &amp;lt;- &amp;lt;&amp;gt; 'finish) (delete-duplicates ($ clients)))
&lt;/pre&gt;
&lt;p&gt;
As an excuse for our laziness in not looking for a more elegant solution,
we remark that this part will in any case be reworked later to obtain a more
flexible client queue.
&lt;/p&gt;
&lt;p&gt;
I have not found a similar approach to also have the server end gracefully.
If one places &lt;code&gt;signal-condition!&lt;/code&gt; in the code right after sending
the &lt;code&gt;'finish&lt;/code&gt; messages to the clients, then the clients do not end,
since it turns out that the server finishes so fast that the messages are
not actually sent. If one tries to wait for the promise coming out of the
&lt;code&gt;'finish&lt;/code&gt; calls, then this also fails, since the finished clients
cannot send back a function value any more.
So I keep the &lt;code&gt;sleep&lt;/code&gt; at the end and just make it a bit shorter.
The current &lt;code&gt;server.scm&lt;/code&gt; then looks like this:
&lt;/p&gt;
&lt;pre&gt;
(use-modules (srfi srfi-1)
             (srfi srfi-26)
             (goblins)
             (goblins actor-lib cell)
             (goblins actor-lib joiners)
             (goblins actor-lib methods)
             (goblins ocapn ids)
             (goblins ocapn captp)
             (goblins ocapn netlayer tcp-tls))

(define vat (spawn-vat))
(define net (spawn-vat))

(define (print-id prefix id)
  (with-vat net
    (on id
      (lambda (sref)
        (format #t &amp;quot;~a ~a\n&amp;quot;
                   prefix (ocapn-id-&amp;gt;string sref))))))

(define capn
   (with-vat net (spawn-mycapn (spawn ^tcp-tls-netlayer &amp;quot;localhost&amp;quot;))))

(define (^registry bcom clients)
  (lambda (client name)
    ($ clients (cons client ($ clients)))
    (format #t &amp;quot;Registered ~a\n&amp;quot; name)))

(define clients (with-vat vat (spawn ^cell '())))
(define registry (with-vat vat (spawn ^registry clients)))
(let ((id (with-vat net ($ capn 'register registry 'tcp-tls))))
  (print-id &amp;quot;Server ID&amp;quot; id))

(while (not (= (length (with-vat vat ($ clients))) 2))
       (sleep 1))

(define v '(1 2 3 4 5))
(with-vat vat
  (while (&amp;lt; (length ($ clients)) (length v))
     (let ((c ($ clients)))
       ($ clients (append c c)))))
(with-vat vat
  (on (all-of* (map (cut &amp;lt;- &amp;lt;&amp;gt; 'square &amp;lt;&amp;gt;) ($ clients) v))
      (lambda (res)
        (format #t &amp;quot;~a\n&amp;quot; (sqrt (fold + 0 res)))
        (map (cut &amp;lt;- &amp;lt;&amp;gt; 'finish) (delete-duplicates ($ clients))))))

(sleep 10)
&lt;/pre&gt;


&lt;h2&gt;Resist! er, persist!&lt;/h2&gt;

&lt;p&gt;
Another annoyance in the current code is that the ocapn ID of the server
changes every time it is started, so that there is a lot of copy-pasting
for starting the clients. This turns from a minor annoyance into a problem
when different clients are supposed to be started independently all over
the Internet, and the ocapn ID is the de facto credential to enable
connections. Then a restart of the server script for any reason, be it a
power outage or an update, requires communicating the new ID to all
participants. Judging from its name, it sounds as if
&lt;a href=&quot;https://files.spritely.institute/docs/guile-goblins/0.15.0/Persistence.html&quot;&gt;persistence&lt;/a&gt;
could come to the rescue. We only need to persist the server.
In a first step, we add a bit of boilerplate, taken from the
documentation of
&lt;a href=&quot;https://files.spritely.institute/docs/guile-goblins/0.15.0/Persistent-Vats.html&quot;&gt;persistent
vats&lt;/a&gt;; this seems to be required when several vats with cross-references
to each other are to be persisted, but cannot do any harm in general.
&lt;/p&gt;
&lt;pre&gt;
(use-modules (goblins vat)
…
(define persistence-vat (spawn-vat))
(define persistence-registry
  (with-vat persistence-vat
    (spawn ^persistence-registry)))
&lt;/pre&gt;
&lt;p&gt;
Then we follow the example on persistence in the documentation of the
&lt;a href=&quot;https://files.spritely.institute/docs/guile-goblins/0.15.0/TCP-_002b-TLS.html&quot;&gt;TCP
netlayer&lt;/a&gt; (after correcting a small error in the documentation for
version 0.15, which has been updated in the meantime) and replace
&lt;/p&gt;
&lt;pre&gt;
(define net (spawn-vat))
(define capn
   (with-vat net (spawn-mycapn (spawn ^tcp-tls-netlayer &amp;quot;localhost&amp;quot;))))
&lt;/pre&gt;
&lt;p&gt;
by
&lt;/p&gt;
&lt;pre&gt;
(use-modules (goblins persistence-store syrup)
…
(define-values (net capn)
  (spawn-persistent-vat
    (make-persistence-env #:extends (list captp-env tcp-tls-netlayer-env))
    (lambda ()
      (spawn-mycapn (spawn ^tcp-tls-netlayer &amp;quot;localhost&amp;quot;)))
    (make-syrup-store &amp;quot;ocapn.syrup&amp;quot;)
    #:persistence-registry persistence-registry))
&lt;/pre&gt;
&lt;p&gt;
The &lt;code&gt;spawn-persistent-vat&lt;/code&gt; function returns a number of values; the first
one is a new vat, the other ones are created by the &lt;code&gt;lambda&lt;/code&gt;
expression and correspond to actors in the vat which are to be persisted
(more precisely, they form the roots of the corresponding graph).
A persistence environment is passed as the first argument; it “knows” how
to store the different types of actors. In this case, we store to a file
named &lt;code&gt;ocapn.syrup&lt;/code&gt;, where syrup is the Goblins internal file
format.
&lt;/p&gt;
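&lt;p&gt;
The multiple-value plumbing itself is plain Guile: as a minimal sketch
without Goblins, &lt;code&gt;define-values&lt;/code&gt; binds one variable per
value returned by the expression.
&lt;/p&gt;
&lt;pre&gt;
;; floor/ returns two values, the quotient and the remainder;
;; define-values binds them to q and r in one go.
(define-values (q r) (floor/ 17 5))
;; Now q is 3 and r is 2.
&lt;/pre&gt;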
&lt;p&gt;
It is instructive to run the server and to inspect the ocapn ID it prints.
The general format seems to be
&lt;code&gt;ocapn://….tcp-tls/s/…?host=localhost&amp;amp;port=…&lt;/code&gt;
where the first ellipsis consists of 52 lower case letters and digits
(a 256 bit hash encoded in base 32?),
the second ellipsis consists of 43 lower and upper case letters, digits
and symbols (a 256 bit hash encoded in base 64?),
and the third ellipsis is a random port.
Previously, all three would change when invoking the script. Now the
sequence in the place of the first ellipsis as well as the port remain
fixed.
&lt;/p&gt;
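&lt;p&gt;
These guesses are at least consistent with the observed lengths: base 32
carries 5 bits per character and base 64 carries 6, so encoding 256 bits
takes
&lt;/p&gt;
&lt;pre&gt;
(ceiling (/ 256 5))   ; 52 characters in base 32
(ceiling (/ 256 6))   ; 43 characters in base 64
&lt;/pre&gt;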
&lt;p&gt;
So we need to persist more, in particular the actor that is registered
in the network layer. So we replace
&lt;/p&gt;
&lt;pre&gt;
(define (^registry bcom clients) …)
(define vat (spawn-vat))
(define clients (with-vat vat (spawn ^cell '())))
(define registry (with-vat vat (spawn ^registry clients)))
&lt;/pre&gt;
&lt;p&gt;
by
&lt;/p&gt;
&lt;pre&gt;
(define-actor (^registry bcom clients) …)
(define-values (vat clients registry)
  (spawn-persistent-vat
    (make-persistence-env
      (list (list '((registry) ^registry) ^registry))
      #:extends cell-env)
    (lambda ()
      (let ((clients (spawn ^cell '())))
        (values
          clients
          (spawn ^registry clients))))
    (make-syrup-store &amp;quot;registry.syrup&amp;quot;)
    #:persistence-registry persistence-registry))
&lt;/pre&gt;
&lt;p&gt;
Notice the use of &lt;code&gt;define-actor&lt;/code&gt; instead of &lt;code&gt;define&lt;/code&gt;,
which appears to be necessary to achieve persistence.
Besides the cell actor known to Goblins from the actor-lib, we also need
to declare our self-defined actor of type &lt;code&gt;^registry&lt;/code&gt; in the
persistence environment; this is done by the rather indigestible
boilerplate line creating nested lists. We use a second file,
&lt;code&gt;registry.syrup&lt;/code&gt;, to store this actor.
&lt;/p&gt;
&lt;p&gt;
However, this fails miserably, as the server crashes with an error message
containing keywords such as &lt;code&gt;vat-churn&lt;/code&gt; and
&lt;code&gt;vat-maybe-persist-changed-objs!&lt;/code&gt;.
What happens exactly seems to depend on timing. In this case there is a
176 byte file &lt;code&gt;registry.syrup&lt;/code&gt; containing a few strings
and binary data. I suppose it stores the empty client list and the
corresponding registry. After clients register, there is a “churn”
(which I understand as the vat taking a break after a turn is over),
and the persistence system tries to update the file. However, the client
list now contains an actor coming from the client script, that is, coming
over the network from potentially a different machine. Since this is not
under the control of the local script, it cannot be stored.
&lt;/p&gt;
&lt;p&gt;
There is apparently a very simple workaround. The
&lt;code&gt;spawn-persistent-vat&lt;/code&gt; function admits an optional parameter
&lt;code&gt;#:persist-on&lt;/code&gt;; if this is changed from the default
&lt;code&gt;'churn&lt;/code&gt; to something else, then the vat changes are not
stored at each churn. In effect, the vat is only stored once at the
beginning, and keeps an empty client list forever. This is actually
exactly what we need, an empty client list at each restart of the server.
So we end up with the following &lt;code&gt;server.scm&lt;/code&gt;:
&lt;/p&gt;
&lt;pre&gt;
(use-modules (srfi srfi-1)
             (srfi srfi-26)
             (goblins)
             (goblins actor-lib cell)
             (goblins actor-lib joiners)
             (goblins actor-lib methods)
             (goblins ocapn ids)
             (goblins ocapn captp)
             (goblins ocapn netlayer tcp-tls)
             (goblins persistence-store syrup)
             (goblins vat))

(define persistence-vat (spawn-vat))
(define persistence-registry
  (with-vat persistence-vat
    (spawn ^persistence-registry)))

(define-values (net capn)
  (spawn-persistent-vat
    (make-persistence-env #:extends (list captp-env tcp-tls-netlayer-env))
    (lambda ()
      (spawn-mycapn (spawn ^tcp-tls-netlayer &amp;quot;localhost&amp;quot;)))
    (make-syrup-store &amp;quot;ocapn.syrup&amp;quot;)
    #:persistence-registry persistence-registry))

(define (print-id prefix id)
  (with-vat net
    (on id
      (lambda (sref)
        (format #t &amp;quot;~a ~a\n&amp;quot;
                   prefix (ocapn-id-&amp;gt;string sref))))))

(define-actor (^registry bcom clients)
  (lambda (client name)
    ($ clients (cons client ($ clients)))
    (format #t &amp;quot;Registered ~a\n&amp;quot; name)))

(define-values (vat clients registry)
  (spawn-persistent-vat
    (make-persistence-env
      (list (list '((registry) ^registry) ^registry))
      #:extends cell-env)
    (lambda ()
      (let ((clients (spawn ^cell '())))
        (values
          clients
          (spawn ^registry clients))))
    (make-syrup-store &amp;quot;registry.syrup&amp;quot;)
    #:persist-on #f
    #:persistence-registry persistence-registry))

(let ((id (with-vat net ($ capn 'register registry 'tcp-tls))))
  (print-id &amp;quot;Server ID&amp;quot; id))

(while (not (= (length (with-vat vat ($ clients))) 2))
       (sleep 1))

(define v '(1 2 3 4 5))
(with-vat vat
  (while (&amp;lt; (length ($ clients)) (length v))
     (let ((c ($ clients)))
       ($ clients (append c c)))))
(with-vat vat
  (on (all-of* (map (cut &amp;lt;- &amp;lt;&amp;gt; 'square &amp;lt;&amp;gt;) ($ clients) v))
      (lambda (res)
        (format #t &amp;quot;~a\n&amp;quot; (sqrt (fold + 0 res)))
        (map (cut &amp;lt;- &amp;lt;&amp;gt; 'finish) (delete-duplicates ($ clients))))))

(sleep 10)
&lt;/pre&gt;
&lt;p&gt;
It may be prudent now to remove all &lt;code&gt;.syrup&lt;/code&gt; files from previous
failed attempts. Running a server and two client scripts computes the
desired result as before. But now one notices that upon restarting the
server script, it prints the exact same ocapn ID as before. So the clients
can also be restarted with the exact same commands, and no more copy-pasting
is needed.
&lt;/p&gt;

&lt;/div&gt;</content></entry><entry><title>Goblins for number theory, part 2</title><id>https://enge.math.u-bordeaux.fr/blog/goblins-2.html</id><author><name>Andreas Enge</name><email>andreas.enge@inria.fr</email></author><updated>2025-02-25T00:00:00Z</updated><link href="https://enge.math.u-bordeaux.fr/blog/goblins-2.html" rel="alternate" /><content type="html">&lt;div&gt;

&lt;h1&gt;Parallel Goblins&lt;/h1&gt;

&lt;p&gt;
After seeing how to use the
&lt;a href=&quot;goblins-1.html&quot;&gt;programming concepts&lt;/a&gt;
of &lt;a href=&quot;https://spritely.institute/goblins/&quot;&gt;Goblins&lt;/a&gt;
for a toy problem the structure of which resembles algorithms encountered
in number theory, let us turn our attention to parallelising, or rather
distributing the code. We keep the running example of computing the length
of a vector, by giving out the tasks of squaring to the clients, and leaving
the task of adding up the squares and taking the final square root to the
server.
&lt;/p&gt;


&lt;h2&gt;Networking&lt;/h2&gt;

&lt;p&gt;
Communication in Goblins is abstracted over what is called the
“&lt;a href=&quot;https://spritely.institute/files/docs/guile-goblins/0.13.0/OCapN.html&quot;&gt;Object
Capabilities Network&lt;/a&gt;”, or “OCapN”. This somewhat frightening term
simply means that a function in one script may call functions in another
script running elsewhere in the network.
&lt;/p&gt;
&lt;p&gt;
Goblins suggests using &lt;a href=&quot;https://www.torproject.org&quot;&gt;Tor&lt;/a&gt;
as the underlying network. Indeed after setting up a Tor daemon
as described in the
&lt;a href=&quot;https://spritely.institute/files/docs/guile-goblins/0.13.0/Launching-a-Tor-daemon-for-Goblins.html&quot;&gt;
Goblins documentation&lt;/a&gt; on my laptop, the provided
&lt;a href=&quot;https://spritely.institute/files/docs/guile-goblins/0.13.0/Example-Two-Goblins-programs-chatting-over-CapTP-via-Tor.html&quot;&gt;example&lt;/a&gt;
of a chat client Alice talking to a chat server Bob works directly out of
the box.
This should also make it relatively easy to run distributed projects over
the Internet, which would fit the idea of using Goblins for popular
science projects.
&lt;/p&gt;
&lt;p&gt;
On the other hand, institutional computing clusters tend to limit network
access, sometimes even blocking outgoing HTTP requests to servers outside
a whitelist. So it is unlikely that the Tor approach will work in this
setting. Also it appears that Tor needs to have access to the Internet
for bootstrapping: The chat script does not run purely locally after
turning off Internet access.
It may be possible to set up Tor in a specific way to cover such local
use cases, but so far my knowledge of Tor is limited to what is described
in the Goblins documentation.
The documentation points to the possibility of using
&lt;a href=&quot;https://spritely.institute/files/docs/guile-goblins/0.13.0/TCP-_002b-TLS.html&quot;&gt;TCP&lt;/a&gt;.
This requires that the participating nodes know each other's IP addresses
or hostnames, which sounds restrictive; but since I am currently using MPI
over TCP, where OpenMPI somehow manages to determine these addresses,
TCP should also be a feasible option with Goblins.
But for the time being let us assume that we are working with a machine
that has access to the Tor network after setting up the Tor daemon as
taught by the Goblins documentation; we will come back to the TCP setting
below.
&lt;/p&gt;


&lt;h2&gt;From chatting to computing&lt;/h2&gt;
&lt;p&gt;
When saying that OCapN enables a function to call functions running
somewhere else in the Tor network, one should more precisely use the term
“actor” instead of “function”; and as seen before, these
do not return values, but promises that resolve to the desired values. But
it is conceptually helpful to think of calls to outsourced functions.
So in our very simple model inspired by algorithmic number theory, we will
have a client script that runs in a number of identical copies, and a
server script that calls functions defined in the clients.
This is in fact much easier to program than with MPI, where the exchange
of function arguments and results requires explicit
&lt;code&gt;MPI_Send&lt;/code&gt; and matching &lt;code&gt;MPI_Recv&lt;/code&gt; statements in
the server and the client, and where furthermore complex data types need
to be serialised by hand since the communication functions work only
with basic, scalar types. Finally it is necessary to carefully and
explicitly craft the control flows of the different programs exchanging
data so that indeed the data sending statements exactly match the data
receiving statements; otherwise there will be a deadlock.
In the Goblins framework, this is all implicit.
As an end result a distributed code does not look very different from
the corresponding serial code.
&lt;/p&gt;
&lt;p&gt;
But we still need to make a few things explicit:
First of all, the different running scripts need to connect to the
network. And the functions to be called remotely need to obtain a
unique identifier and advertise it, and the caller needs to know this
identifier to make the call. Luckily in our setting, most of the
corresponding code can be considered as copy-pastable boilerplate.
&lt;/p&gt;
&lt;p&gt;
Indeed the chat example can be transposed to our running example of
vector lengths almost immediately.
&lt;/p&gt;
&lt;p&gt;
Let us start with the client, to be put into a file
&lt;code&gt;client.scm&lt;/code&gt;
(compared with the chat example, the client and server roles are
reversed):
&lt;/p&gt;
&lt;pre&gt;
(use-modules (srfi srfi-1)
             (goblins)
             (goblins ocapn ids)
             (goblins ocapn captp)
             (goblins ocapn netlayer onion))

(define vat (spawn-vat))
(define net (spawn-vat))

;; Define the client functionality.
(define (^square bcom)
  (lambda (x)
    (* x x)))
(define client
  (with-vat vat (spawn ^square)))

;; Helper function for printing IDs.
(define (print-id prefix id)
  (with-vat net
    (on id
      (lambda (sref)
        (format #t &amp;quot;~a ~a\n&amp;quot;
                   prefix (ocapn-id-&amp;gt;string sref))))))

;; Create a communicator.
(define mycapn
  (with-vat net (spawn-mycapn (spawn ^onion-netlayer))))
;; Create an ID for the client and print it.
(define id
  (with-vat net ($ mycapn 'register client 'onion)))
(print-id &amp;quot;Client ID&amp;quot; id)

;; Wait for requests.
(sleep 3600)
&lt;/pre&gt;
&lt;p&gt;
The chat example uses two vats to separate the networking part and the
actual functionality. This does not seem to be strictly necessary (the
examples also work when everything is put into the same vat); but if I
understand correctly, each vat corresponds to a separate, concurrent event
loop; so having several vats might help to prevent deadlocks and possibly
speed things up by separating communication from computation. I therefore
follow the example and declare two vats from the start, &lt;code&gt;net&lt;/code&gt; for
everything network related and &lt;code&gt;vat&lt;/code&gt; for everything else.
The client actor in the main vat is defined as before through the function
computing a square.
&lt;/p&gt;
&lt;p&gt;
A network connection &lt;code&gt;mycapn&lt;/code&gt; is defined, the client actor
is registered with it and its network ID &lt;code&gt;id&lt;/code&gt; is obtained
through some magic incantations. Before version 0.15.0 of Goblins,
&lt;code&gt;id&lt;/code&gt; used to be a value, but now it is a promise.
So before printing it as a string by applying the
&lt;code&gt;ocapn-id-&amp;gt;string&lt;/code&gt; function, one needs to wait for the
resolution of the promise; this is moved into the helper function
&lt;code&gt;print-id&lt;/code&gt;. This string value will be used to communicate
the ID manually to the server later on.
&lt;/p&gt;
&lt;p&gt;
Finally, we just wait for requests to compute squares (the chat example
has a more sophisticated approach to waiting using Guile fibers, but
&lt;code&gt;sleep&lt;/code&gt; is enough for illustration purposes).
&lt;/p&gt;
&lt;p&gt;
The corresponding server follows, to be put into a file
&lt;code&gt;server.scm&lt;/code&gt;:
&lt;/p&gt;
&lt;pre&gt;
(use-modules (srfi srfi-1)
             (goblins)
             (goblins actor-lib joiners)
             (goblins actor-lib let-on)
             (goblins ocapn ids)
             (goblins ocapn captp)
             (goblins ocapn netlayer onion))

(define vat (spawn-vat))
(define net (spawn-vat))

(define mycapn
   (with-vat net (spawn-mycapn (spawn ^onion-netlayer))))

;; Enliven the clients.
(define client1
  (with-vat vat
    (&amp;lt;- mycapn 'enliven (string-&amp;gt;ocapn-id (second (command-line))))))
(define client2
  (with-vat vat
    (&amp;lt;- mycapn 'enliven (string-&amp;gt;ocapn-id (third (command-line))))))

(define (^len bcom)
  (lambda (v)
    (on (all-of (&amp;lt;- client1 (first v))(&amp;lt;- client2 (second v)))
        (lambda (res)
          (sqrt (fold + 0 res)))
        #:promise? #t)))

(define len (with-vat vat (spawn ^len)))

(with-vat vat
  (let-on ((l ($ len '(3 4))))
    (format #t &amp;quot;~a\n&amp;quot; l)))

;; Wait for the result to be computed, otherwise nothing will be printed.
(sleep 3600)
&lt;/pre&gt;
&lt;p&gt;
The server also starts by connecting to the network, and then it obtains
live references to (“enlivens”, in Goblins parlance) the two clients by
magic incantations.
The IDs of the clients are supposed to be passed as string
arguments through the commandline, which are retrieved by
&lt;code&gt;(second (command-line))&lt;/code&gt; and
&lt;code&gt;(third (command-line))&lt;/code&gt;, respectively
(as with &lt;code&gt;argv&lt;/code&gt; in C, the first argument is the name of the
program or Guile script itself, and unlike in C, counting starts with 1,
not 0).
So we obtain local variables
&lt;code&gt;client1&lt;/code&gt; and &lt;code&gt;client2&lt;/code&gt;.
The remainder of the code is the same as in the serial example,
except that we again combine the norm and square root computations into
one function &lt;code&gt;len&lt;/code&gt;.
Finally we add a bit of waiting: This is necessary to wait for the
resolution of the promises, since &lt;code&gt;let-on&lt;/code&gt; apparently does
not do so; otherwise the server script would terminate before the result
of the computation is printed.
&lt;/p&gt;
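&lt;p&gt;
As a minimal illustration of the argument indexing (plain Guile, no
Goblins involved), consider a hypothetical script &lt;code&gt;args.scm&lt;/code&gt;
invoked as &lt;code&gt;guile args.scm foo bar&lt;/code&gt;:
&lt;/p&gt;
&lt;pre&gt;
(use-modules (srfi srfi-1))

;; (command-line) returns the list (&amp;quot;args.scm&amp;quot; &amp;quot;foo&amp;quot; &amp;quot;bar&amp;quot;).
(format #t &amp;quot;script: ~a\n&amp;quot; (first (command-line)))    ; prints args.scm
(format #t &amp;quot;argument: ~a\n&amp;quot; (second (command-line)))  ; prints foo
&lt;/pre&gt;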
&lt;p&gt;
To run the example, do not forget to start the Tor daemon with the command
&lt;/p&gt;
&lt;pre&gt;
tor -f $HOME/.config/goblins/tor-config.txt
&lt;/pre&gt;
&lt;p&gt;
Then open three terminals, and in two of them launch a client with the
command
&lt;/p&gt;
&lt;pre&gt;
guile client.scm
&lt;/pre&gt;
&lt;p&gt;
and copy the two URIs of the form &lt;code&gt;ocapn://…&lt;/code&gt;.
In the third terminal, start the server with the command
&lt;/p&gt;
&lt;pre&gt;
guile server.scm ocapn://… ocapn://…
&lt;/pre&gt;
&lt;p&gt;
where the &lt;code&gt;ocapn://…&lt;/code&gt; command line arguments are pasted
from the client output.
After a few seconds the server will print the result of the computation,
and all three programs can be stopped using the
&lt;code&gt;&amp;lt;ctrl&amp;gt;-&amp;lt;c&amp;gt;&lt;/code&gt; key combination.
&lt;/p&gt;
&lt;p&gt;
If nothing happens, chances are there is a problem with the Tor network;
the file &lt;code&gt;$HOME/.cache/goblins/tor/tor-log.txt&lt;/code&gt; may contain
hints. In particular, the network needs to be 100% bootstrapped.
&lt;/p&gt;


&lt;h2&gt;TCP instead of onions, after all&lt;/h2&gt;

&lt;p&gt;
Even if used only locally, the need to access the Internet makes the Tor
protocol relatively slow; connections can fail, and this makes debugging
somewhat painful – it is not easy to distinguish a deadlock in the program
code from a poorly working network. The Goblins documentation does not
provide a working example for using
&lt;a href=&quot;https://spritely.institute/files/docs/guile-goblins/0.13.0/TCP-_002b-TLS.html&quot;&gt;TCP&lt;/a&gt;,
but moving from Tor to TCP is relatively straightforward:
Replace all occurrences of the substring &lt;code&gt;onion&lt;/code&gt; in the
scripts above (also in the name &lt;code&gt;^onion-netlayer&lt;/code&gt;
and the symbol &lt;code&gt;'onion&lt;/code&gt;) by &lt;code&gt;tcp-tls&lt;/code&gt;, then add
the parameter &lt;code&gt;&amp;quot;localhost&amp;quot;&lt;/code&gt; to the invocation of
&lt;code&gt;(spawn ^tcp-tls-netlayer)&lt;/code&gt;.
To simplify copying and pasting, here is the resulting code for
&lt;code&gt;client.scm&lt;/code&gt;:
&lt;/p&gt;
&lt;pre&gt;
(use-modules (srfi srfi-1)
             (goblins)
             (goblins ocapn ids)
             (goblins ocapn captp)
             (goblins ocapn netlayer tcp-tls))

(define vat (spawn-vat))
(define net (spawn-vat))

;; Define the client functionality.
(define (^square bcom)
  (lambda (x)
    (* x x)))
(define client
  (with-vat vat (spawn ^square)))

;; Helper function for printing IDs.
(define (print-id prefix id)
  (with-vat net
    (on id
      (lambda (sref)
        (format #t &amp;quot;~a ~a\n&amp;quot;
                   prefix (ocapn-id-&amp;gt;string sref))))))

;; Create a communicator.
(define mycapn
  (with-vat net (spawn-mycapn (spawn ^tcp-tls-netlayer &amp;quot;localhost&amp;quot;))))
;; Create an ID for the client and print it.
(define id
  (with-vat net ($ mycapn 'register client 'tcp-tls)))
(print-id &amp;quot;Client ID&amp;quot; id)

;; Wait for requests.
(sleep 3600)
&lt;/pre&gt;
&lt;p&gt;
And &lt;code&gt;server.scm&lt;/code&gt; becomes the following code:
&lt;/p&gt;
&lt;pre&gt;
(use-modules (srfi srfi-1)
             (goblins)
             (goblins actor-lib joiners)
             (goblins actor-lib let-on)
             (goblins ocapn ids)
             (goblins ocapn captp)
             (goblins ocapn netlayer tcp-tls))

(define vat (spawn-vat))
(define net (spawn-vat))

(define mycapn
   (with-vat net (spawn-mycapn (spawn ^tcp-tls-netlayer &amp;quot;localhost&amp;quot;))))

;; Enliven the clients.
(define client1
  (with-vat vat
    (&amp;lt;- mycapn 'enliven (string-&amp;gt;ocapn-id (second (command-line))))))
(define client2
  (with-vat vat
    (&amp;lt;- mycapn 'enliven (string-&amp;gt;ocapn-id (third (command-line))))))

(define (^len bcom)
  (lambda (v)
    (on (all-of (&amp;lt;- client1 (first v))(&amp;lt;- client2 (second v)))
        (lambda (res)
          (sqrt (fold + 0 res)))
        #:promise? #t)))

(define len (with-vat vat (spawn ^len)))

(with-vat vat
  (let-on ((l ($ len '(3 4))))
    (format #t &amp;quot;~a\n&amp;quot; l)))

;; Wait for the result to be computed, otherwise nothing will be printed.
(sleep 3600)
&lt;/pre&gt;

&lt;p&gt;
When starting the client, notice that the ID changes from a URI of the
form &lt;code&gt;ocapn://….onion/…&lt;/code&gt; to one of the form
&lt;code&gt;ocapn://….tcp-tls/…?host=localhost&amp;amp;port=…&lt;/code&gt;,
where the port is chosen at random; the two clients and the server will
each get their own port. (If desired, a given port can be chosen by
adding a parameter such as &lt;code&gt;#:port 12345&lt;/code&gt; after
&lt;code&gt;&amp;quot;localhost&amp;quot;&lt;/code&gt; in the invocation of
&lt;code&gt;(spawn ^tcp-tls-netlayer)&lt;/code&gt;.)
Due to the special character &lt;code&gt;&amp;amp;&lt;/code&gt; in the URI, it is necessary
to enclose it in single quotes &lt;code&gt;'&lt;/code&gt; on the command line,
so one needs to start the server with the command
&lt;/p&gt;
&lt;pre&gt;
guile server.scm 'ocapn://…' 'ocapn://…'
&lt;/pre&gt;
&lt;p&gt;
That the parameter &lt;code&gt;'onion&lt;/code&gt; or &lt;code&gt;'tcp-tls&lt;/code&gt; is
required in function calls such as
&lt;code&gt;($ mycapn 'register client 'tcp-tls)&lt;/code&gt; is a surprising
design choice in Goblins:
When spawning &lt;code&gt;mycapn&lt;/code&gt;, a netlayer is passed as a
parameter, so in theory the actor should be able to remember the
kind of network setting it is attached to.
&lt;/p&gt;
&lt;p&gt;
Notice that with TCP, the result of the computation is printed immediately,
whereas it takes a few seconds with Tor. So to ease debugging, we will from
now on keep the TCP setting; going back to Tor is straightforward.
&lt;/p&gt;


&lt;h2&gt;Registering clients&lt;/h2&gt;

&lt;p&gt;
The approach in which the server needs to know all client IDs beforehand
becomes unwieldy in a context where we expect hundreds or even thousands
of computation cores. It would be preferable to use a two-stage process:
The server publishes its ID, and the clients use it to connect to the
server and to register their IDs. Then in a second step the server can
send computing tasks to the clients. We will gradually transform the
example code to end up with such a solution.
&lt;/p&gt;
&lt;p&gt;
First of all, let us replace the fixed number (in our case, 2) of client
variables by a more dynamic structure, a list of clients; for this, it is
enough to modify the server as follows:
&lt;/p&gt;
&lt;pre&gt;
(define clients
  (with-vat vat
    (map (lambda (uri)
           (&amp;lt;- mycapn 'enliven (string-&amp;gt;ocapn-id uri)))
         (list-tail (command-line) 1))))

(define (^len bcom)
  (lambda (v)
    (on (all-of* (map &amp;lt;- clients v))
        (lambda (res)
          (sqrt (fold + 0 res)))
        #:promise? #t)))
&lt;/pre&gt;
&lt;p&gt;
So here the client variable becomes a list instantiated using the
(in principle variable number of) IDs passed on the command line.
The variant &lt;code&gt;all-of*&lt;/code&gt; of the joiner is used to handle lists
of promises.
Notice that &lt;code&gt;&amp;lt;-&lt;/code&gt; can be used as any other function in
a &lt;code&gt;map&lt;/code&gt; statement:
&lt;code&gt;(map &amp;lt;- clients v)&lt;/code&gt; matches the two clients with the two
entries of the vector and returns a list of promises resolving to the
squares (for the time being we still assume that the length of the client
list matches the length of the vector).
&lt;/p&gt;
&lt;p&gt;
While we are at it, we may as well hold the list in a
&lt;a href=&quot;https://spritely.institute/files/docs/guile-goblins/0.13.0/Cell.html&quot;&gt;cell&lt;/a&gt;
actor, as a way of introducing state through the back door: the cell may hold
values that are exchanged throughout the program execution.
&lt;/p&gt;
&lt;pre&gt;
(use-modules (goblins actor-lib cell))
…
(define clients (with-vat vat (spawn ^cell '())))
(with-vat vat
  ($ clients (map (lambda (uri)
                    (&amp;lt;- mycapn 'enliven (string-&amp;gt;ocapn-id uri)))
                  (list-tail (command-line) 1))))

(define (^len bcom)
  (lambda (v)
    (on (all-of* (map &amp;lt;- ($ clients) v))
        (lambda (res)
          (sqrt (fold + 0 res)))
        #:promise? #t)))
&lt;/pre&gt;
&lt;p&gt;
So instead of creating a list, we spawn a cell containing an empty list;
then we put a different value into the cell by applying the &lt;code&gt;$&lt;/code&gt;
function to it with the desired new value as additional argument.
Later we extract the list by applying the &lt;code&gt;$&lt;/code&gt; function without
additional argument to the cell (since we are in the same vat, we may use
&lt;code&gt;$&lt;/code&gt; instead of &lt;code&gt;&amp;lt;-&lt;/code&gt; and need not worry about
promise resolution).
&lt;/p&gt;
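&lt;p&gt;
To illustrate the cell protocol (a sketch following the Goblins
documentation; the names are only examples), calling the cell without
an argument reads its content, while calling it with one argument
overwrites it:
&lt;/p&gt;
&lt;pre&gt;
(with-vat vat
  (define c (spawn ^cell 'initial))
  ($ c)           ; read: yields 'initial
  ($ c 'updated)  ; write: the cell now holds 'updated
  ($ c))          ; read: yields 'updated
&lt;/pre&gt;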
&lt;p&gt;
We are now prepared to implement the registration of clients in the server.
For this, we create a new type of actor, which takes a URI identifying a
client and adds it to the list of clients in the cell. To see
that something actually happens, we then print the added URI:
&lt;/p&gt;
&lt;pre&gt;
(define (^register bcom)
  (lambda (uri)
    ($ clients (cons (&amp;lt;- mycapn 'enliven (string-&amp;gt;ocapn-id uri))
                     ($ clients)))
    (format #t &amp;quot;Registered ~a\n&amp;quot; uri)))
&lt;/pre&gt;
&lt;p&gt;
We create an instance of this actor type, add it to the network and
print its ID (using the same &lt;code&gt;print-id&lt;/code&gt; function):
&lt;/p&gt;
&lt;pre&gt;
(define register (with-vat vat (spawn ^register)))
(let ((id (with-vat net ($ mycapn 'register register 'tcp-tls))))
  (print-id &amp;quot;Server ID&amp;quot; id))
&lt;/pre&gt;
&lt;p&gt;
Finally we can use this new register function instead of the ad-hoc
creation to add the clients from the command line to the list:
&lt;/p&gt;
&lt;pre&gt;
(with-vat vat
  (map (lambda (uri)
         ($ register uri))
       (list-tail (command-line) 1)))
&lt;/pre&gt;
&lt;p&gt;
Altogether we arrive at the following code, which can replace the
&lt;code&gt;server.scm&lt;/code&gt; script while keeping the current clients:
&lt;/p&gt;
&lt;pre&gt;
(use-modules (srfi srfi-1)
             (goblins)
             (goblins actor-lib cell)
             (goblins actor-lib joiners)
             (goblins actor-lib let-on)
             (goblins ocapn ids)
             (goblins ocapn captp)
             (goblins ocapn netlayer tcp-tls))

(define vat (spawn-vat))
(define net (spawn-vat))

(define (print-id prefix id)
  (with-vat net
    (on id
      (lambda (sref)
        (format #t &amp;quot;~a ~a\n&amp;quot;
                   prefix (ocapn-id-&amp;gt;string sref))))))

(define mycapn
   (with-vat net (spawn-mycapn (spawn ^tcp-tls-netlayer &amp;quot;localhost&amp;quot;))))

;; Register clients.
(define clients (with-vat vat (spawn ^cell '())))

(define (^register bcom)
  (lambda (uri)
    ($ clients (cons (&amp;lt;- mycapn 'enliven (string-&amp;gt;ocapn-id uri))
                     ($ clients)))
    (format #t &amp;quot;Registered ~a\n&amp;quot; uri)))

(define register (with-vat vat (spawn ^register)))
(let ((id (with-vat net ($ mycapn 'register register 'tcp-tls))))
  (print-id &amp;quot;Server ID&amp;quot; id))

(with-vat vat
  (map (lambda (uri)
         ($ register uri))
       (list-tail (command-line) 1)))

;; Use clients.
(define (^len bcom)
  (lambda (v)
    (on (all-of* (map &amp;lt;- ($ clients) v))
        (lambda (res)
          (sqrt (fold + 0 res)))
        #:promise? #t)))

(define len (with-vat vat (spawn ^len)))

(with-vat vat
  (let-on ((l ($ len '(3 4))))
    (format #t &amp;quot;~a\n&amp;quot; l)))

(sleep 3600)
&lt;/pre&gt;
&lt;p&gt;
Now it is time to swap the roles! We first start the server without
command line arguments (as it is written, it then just has an initial
empty client list):
&lt;/p&gt;
&lt;pre&gt;
guile server.scm
&lt;/pre&gt;
&lt;p&gt;
Two clients are now started using the URI printed by the server
as a command line argument:
&lt;/p&gt;
&lt;pre&gt;
guile client.scm 'ocapn://…'
guile client.scm 'ocapn://…'
&lt;/pre&gt;
&lt;p&gt;
For this to work, we need to add to the client script the necessary
(and straightforward) code to enliven the server and to remotely register
the client with the server.
We end up with the following script &lt;code&gt;client.scm&lt;/code&gt;:
&lt;/p&gt;
&lt;pre&gt;
(use-modules (srfi srfi-1)
             (goblins)
             (goblins ocapn ids)
             (goblins ocapn captp)
             (goblins ocapn netlayer tcp-tls))

(define vat (spawn-vat))
(define net (spawn-vat))

(define (^square bcom)
  (lambda (x)
    (* x x)))
(define client
  (with-vat vat (spawn ^square)))

(define (print-id prefix id)
  (with-vat net
    (on id
      (lambda (sref)
        (format #t &amp;quot;~a ~a\n&amp;quot;
                   prefix (ocapn-id-&amp;gt;string sref))))))

(define mycapn
  (with-vat net (spawn-mycapn (spawn ^tcp-tls-netlayer &amp;quot;localhost&amp;quot;))))
(define id
  (with-vat net ($ mycapn 'register client 'tcp-tls)))
(print-id &amp;quot;Client ID&amp;quot; id)

;; Enliven server.
(define server
  (with-vat vat
    (&amp;lt;- mycapn 'enliven (string-&amp;gt;ocapn-id (second (command-line))))))

;; Register with server.
(with-vat vat
  (on id
    (lambda (id)
      (&amp;lt;- server id))))

(sleep 3600)
&lt;/pre&gt;
&lt;p&gt;
Notice that we have slightly modified registration with the server:
Since the client communicates directly with the server, there is no need
to go through a string representation of the ID; the ID itself may be
passed directly as an argument to the function call.
This assumes that in the server, registration has been modified as follows:
&lt;/p&gt;
&lt;pre&gt;
(define (^register bcom)
  (lambda (id)
    ($ clients (cons (&amp;lt;- mycapn 'enliven id)
                     ($ clients)))
    (print-id &amp;quot;Registered&amp;quot; id)))
&lt;/pre&gt;
&lt;p&gt;
Running the server and two copies of the client, one should now see the
client IDs printed in their respective terminals, and messages in the
server terminal that these clients have been registered.
However, the desired length 5 is not printed. In fact, the 
&lt;code&gt;len&lt;/code&gt; actor is called at the end of the server script
before the clients have had a chance to register through the network
(actually even before the clients are started), so the
expression &lt;code&gt;($ clients)&lt;/code&gt; yields an empty list.
Now the &lt;code&gt;map&lt;/code&gt; function from
&lt;a href=&quot;https://www.gnu.org/software/guile/manual/html_node/SRFI_002d1-Fold-and-Map.html#index-map-1&quot;&gt;SRFI
1&lt;/a&gt; also truncates &lt;code&gt;v&lt;/code&gt; to the
empty list, &lt;code&gt;all-of*&lt;/code&gt; resolves to the empty list, and
&lt;code&gt;fold&lt;/code&gt; returns the starting value 0, which is actually printed
before the two client IDs.
&lt;/p&gt;
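&lt;p&gt;
This empty-list behaviour is easy to reproduce in a plain Guile REPL,
independently of Goblins:
&lt;/p&gt;
&lt;pre&gt;
(use-modules (srfi srfi-1))
(map + '() '(3 4))  ; =&amp;gt; (), map truncates to the shortest list
(fold + 0 '())      ; =&amp;gt; 0, the starting value
&lt;/pre&gt;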
&lt;p&gt;
This can be solved by having the server wait until the desired number of
clients has registered, by adding the following code:
&lt;/p&gt;
&lt;pre&gt;
(define v '(3 4))
(while (not (= (length (with-vat vat ($ clients))) (length v)))
       (sleep 1))
&lt;/pre&gt;
&lt;p&gt;
As a warning, the seemingly equivalent lines
&lt;/p&gt;
&lt;pre&gt;
(define v '(3 4))
(with-vat vat
  (while (not (= (length ($ clients)) (length v)))
         (sleep 1)))
&lt;/pre&gt;
&lt;p&gt;
result in a deadlock in which none of the clients get a chance to
register. It looks as if operations inside &lt;code&gt;with-vat&lt;/code&gt;
block the vat so that it does not handle incoming remote function
calls.
&lt;/p&gt;
&lt;p&gt;
After also removing the code that registers clients specified on the
command line, the server script currently looks like this:
&lt;/p&gt;
&lt;pre&gt;
(use-modules (srfi srfi-1)
             (goblins)
             (goblins actor-lib cell)
             (goblins actor-lib joiners)
             (goblins actor-lib let-on)
             (goblins ocapn ids)
             (goblins ocapn captp)
             (goblins ocapn netlayer tcp-tls))

(define vat (spawn-vat))
(define net (spawn-vat))

(define (print-id prefix id)
  (with-vat net
    (on id
      (lambda (sref)
        (format #t &amp;quot;~a ~a\n&amp;quot;
                   prefix (ocapn-id-&amp;gt;string sref))))))

(define mycapn
   (with-vat net (spawn-mycapn (spawn ^tcp-tls-netlayer &amp;quot;localhost&amp;quot;))))

;; Register clients.
(define clients (with-vat vat (spawn ^cell '())))

(define (^register bcom)
  (lambda (id)
    ($ clients (cons (&amp;lt;- mycapn 'enliven id)
                     ($ clients)))
    (print-id &amp;quot;Registered&amp;quot; id)))

(define register (with-vat vat (spawn ^register)))
(let ((id (with-vat net ($ mycapn 'register register 'tcp-tls))))
  (print-id &amp;quot;Server ID&amp;quot; id))

;; Use clients.
(define (^len bcom)
  (lambda (v)
    (on (all-of* (map &amp;lt;- ($ clients) v))
        (lambda (res)
          (sqrt (fold + 0 res)))
        #:promise? #t)))

(define len (with-vat vat (spawn ^len)))

;; Wait until enough clients have registered.
(define v '(3 4))
(while (not (= (length (with-vat vat ($ clients))) (length v)))
       (sleep 1))

(with-vat vat
  (let-on ((l ($ len '(3 4))))
    (format #t &amp;quot;~a\n&amp;quot; l)))

(sleep 3600)
&lt;/pre&gt;
&lt;p&gt;
Notice that the same code can be run for vectors with different numbers
of entries; it just requires that (at least) as many clients connect as
there are tasks to handle.
As a small caveat, the code is correct only because we did not implement an
unregister procedure for the clients, so their number increases
monotonically. Otherwise it could happen that between the arrival
of the second client and the call to the &lt;code&gt;len&lt;/code&gt; function,
one of the clients disappears again and the &lt;code&gt;clients&lt;/code&gt;
list contains only one entry, say. Then the
&lt;a href=&quot;https://www.gnu.org/software/guile/manual/html_node/SRFI_002d1-Fold-and-Map.html#index-map-1&quot;&gt;SRFI-1
map&lt;/a&gt;
function we are using, which accepts lists of different lengths by
truncating them all to the smallest occurring length, would only consider
the first entry of &lt;code&gt;v&lt;/code&gt;, and the incorrect length 3 would be
computed.
&lt;/p&gt;
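&lt;p&gt;
This failure mode can also be simulated in a plain Guile REPL, with a
hypothetical list holding a single remaining client (here replaced by
an ordinary function application instead of a message send):
&lt;/p&gt;
&lt;pre&gt;
(use-modules (srfi srfi-1))
(define (square x) (* x x))
(map (lambda (client x) (square x)) '(client1) '(3 4))  ; =&amp;gt; (9)
(sqrt (fold + 0 '(9)))                                  ; =&amp;gt; 3
&lt;/pre&gt;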
&lt;p&gt;
In a more realistic setting, there are more computing tasks than clients.
When these all take more or less the same time, they may be evenly split
among the available clients. For instance, the following server code
waits for two clients to connect and then computes the length of vectors
of arbitrary dimension:
&lt;/p&gt;
&lt;pre&gt;
(use-modules (srfi srfi-1)
             (goblins)
             (goblins actor-lib cell)
             (goblins actor-lib joiners)
             (goblins actor-lib let-on)
             (goblins ocapn ids)
             (goblins ocapn captp)
             (goblins ocapn netlayer tcp-tls))

(define vat (spawn-vat))
(define net (spawn-vat))

(define (print-id prefix id)
  (with-vat net
    (on id
      (lambda (sref)
        (format #t &amp;quot;~a ~a\n&amp;quot;
                   prefix (ocapn-id-&amp;gt;string sref))))))

(define mycapn
   (with-vat net (spawn-mycapn (spawn ^tcp-tls-netlayer &amp;quot;localhost&amp;quot;))))

;; Register clients.
(define clients (with-vat vat (spawn ^cell '())))

(define (^register bcom)
  (lambda (id)
    ($ clients (cons (&amp;lt;- mycapn 'enliven id)
                     ($ clients)))
    (print-id &amp;quot;Registered&amp;quot; id)))

(define register (with-vat vat (spawn ^register)))
(let ((id (with-vat net ($ mycapn 'register register 'tcp-tls))))
  (print-id &amp;quot;Server ID&amp;quot; id))

;; Use clients.
(define (^len bcom)
  (lambda (v)
    (on (all-of* (map &amp;lt;- ($ clients) v))
        (lambda (res)
          (sqrt (fold + 0 res)))
        #:promise? #t)))

(define len (with-vat vat (spawn ^len)))

(while (not (= (length (with-vat vat ($ clients))) 2))
       (sleep 1))

(define v '(1 2 3 4 5))
(with-vat vat
  (while (&amp;lt; (length ($ clients)) (length v))
     (let ((c ($ clients)))
       ($ clients (append c c)))))

(with-vat vat
  (let-on ((l ($ len v)))
    (format #t &amp;quot;~a\n&amp;quot; l)))

(sleep 3600)
&lt;/pre&gt;
&lt;p&gt;
The code somewhat crudely “doubles” the client list until there are
at least as many occurrences of clients (with multiplicities) as tasks;
then &lt;code&gt;map&lt;/code&gt; does the right thing.
&lt;/p&gt;
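&lt;p&gt;
The doubling loop could also be factored into a small pure helper, a
hypothetical refactoring not used in the script above:
&lt;/p&gt;
&lt;pre&gt;
(define (pad-clients cs n)
  ;; Repeat the client list until it contains at least n entries.
  (if (&amp;lt; (length cs) n)
      (pad-clients (append cs cs) n)
      cs))
(pad-clients '(c1 c2) 5)  ; =&amp;gt; (c1 c2 c1 c2 c1 c2 c1 c2)
&lt;/pre&gt;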
&lt;p&gt;
This simple situation occurs surprisingly often in number theory.
For instance in ECPP, one needs to compute many modular square roots for
the same modulus; trial factor many batches of numbers of the same size;
do many primality tests for numbers of the same size. However, the more
general case of tasks of varying duration also occurs (in ECPP,
for instance, when computing roots of class polynomials of vastly
differing degrees), and the relative task durations are not necessarily
easy to estimate.
In a more distributed setting, one can also imagine that even homogeneous
tasks are more or less quickly solved with more or less powerful
participating machines.
Scheduling tasks by hand is thus not realistic in general.
Instead, one would need a more dynamic approach, in which the server
maintains a list of tasks and a list of clients; whenever a client is
idle it should be sent a new task.
&lt;/p&gt;
&lt;p&gt;
Given the length of this second part, this is a question I plan to pursue
in another instalment.
&lt;/p&gt;

&lt;p&gt;
This blog post, originally published on 2024-09-04, was updated
on 2025-02-25 to cover changes between Goblins 0.13.0 and 0.15.0
and to incorporate minor improvements.
&lt;/p&gt;

&lt;/div&gt;</content></entry><entry><title>Goblins for number theory, part 1</title><id>https://enge.math.u-bordeaux.fr/blog/goblins-1.html</id><author><name>Andreas Enge</name><email>andreas.enge@inria.fr</email></author><updated>2024-09-04T00:00:00Z</updated><link href="https://enge.math.u-bordeaux.fr/blog/goblins-1.html" rel="alternate" /><content type="html">&lt;div&gt;

&lt;h1&gt;Starting with Goblins&lt;/h1&gt;

&lt;h2&gt;Motivation&lt;/h2&gt;

&lt;p&gt;
Most of my code in algorithmic number theory is written in C and runs in a
parallelised version using MPI on a cluster.
The C language is mandatory for efficiency reasons;
MPI is mostly a convenience. Indeed number theoretic code is often
embarrassingly parallel. For instance my
&lt;a href=&quot;https://www.multiprecision.org/cm/ecpp.html&quot;&gt;ECPP
implementation&lt;/a&gt; for primality proving essentially consists of a number
of &lt;code&gt;for&lt;/code&gt; loops running one after the other. A server process
distributes evaluations of the function inside the loop to the available
clients, which take a few seconds or even minutes to report back their
respective results. These are then handled by the server before entering
the next loop.
In a cluster environment with a shared file system, this could even be
realised by starting a number of clients over SSH, starting a server, and
then using numbered files to exchange function arguments and results,
or &lt;code&gt;touch&lt;/code&gt; on files to send “signals” between the server and
the clients.
Computation time is the bottleneck, communication is minimal, so even
doing this over NFS is perfectly feasible (and I have written and deployed
such code in the past).
MPI then provides a convenience layer that makes the process look more
professional, and also integrates more smoothly with the batch submission
approach of computation clusters.
&lt;/p&gt;
&lt;p&gt;
The very loosely coupled nature of number theoretic computations should
make it possible to distribute them beyond a cluster. Why not even do
a primality proof with several participants working together over the
Internet?
I have looked at &lt;a href=&quot;https://boinc.berkeley.edu/&quot;&gt;BOINC&lt;/a&gt;
previously; but the system seems to be intended for completely uncoupled
problems, essentially exploration of a large search space. The work is
cut up into a number of independent tasks that are sent out to the
participants; if they do not report back in a few days, the same task is
sent out again, and over several months all tasks are treated.
While number theoretic computations may also take a few months, they do
require at least some synchronisation, and the server needs to hear back
from the clients every few minutes so as not to be blocked.
(Each &lt;code&gt;for&lt;/code&gt; loop is embarrassingly parallel, but several loops
must be run sequentially.)
Also the “administrative” overhead of things to be done for BOINC outside
the program itself looks rather daunting: setting up a database, for
instance.
So I have been looking for a programming environment that is somewhere
between MPI and BOINC, making loosely coupled computations possible;
it should be able to run over the Internet and not only on a cluster
connected by SSH;
and it should result in code that is relatively easy to write, and
for which just as with MPI the parallel version does not look very
different from the sequential one.
(In my C code, I usually end up having everything in one file,
with the parallel and the sequential versions being handled by
alternating blocks selected by &lt;code&gt;#ifdef&lt;/code&gt;.)
&lt;/p&gt;
&lt;p&gt;
&lt;a href=&quot;https://spritely.institute/goblins/&quot;&gt;Goblins&lt;/a&gt; is a distributed
programming environment by the
&lt;a href=&quot;https://spritely.institute/&quot;&gt;Spritely Institute&lt;/a&gt;
that seems to fit the bill. It is meant for distributed programming, and
locally running code can seamlessly be run over
&lt;a href=&quot;https://spritely.institute/files/docs/guile-goblins/0.13.0/OCapN.html&quot;&gt;networks&lt;/a&gt;
using various mechanisms such as
&lt;a href=&quot;https://spritely.institute/files/docs/guile-goblins/0.13.0/Tor-Onion-Services.html&quot;&gt;Tor&lt;/a&gt;
or simply
&lt;a href=&quot;https://spritely.institute/files/docs/guile-goblins/0.13.0/TCP-_002b-TLS.html&quot;&gt;TCP
and TLS&lt;/a&gt;.
On the other hand, not only does it use puzzling vocabulary, but also
puzzling concepts, such as object “capabilities”, actor “model” and
“vats”. For someone coming from imperative programming and good old C,
with a penchant for assembly, looking at Goblins can feel like reading
Heidegger.
Fortunately Goblins comes with an excellent
&lt;a href=&quot;https://spritely.institute/files/docs/guile-goblins/0.13.0/index.html&quot;&gt;tutorial&lt;/a&gt;,
which clarifies the seemingly exotic concepts by a hands-on approach with
concrete examples. These put the emphasis, however, on the distribution and
communication layer; tangible results are essentially obtained as side
effects of printing values on screen. While I think it is an excellent
idea to teach programming without mathematics to lower the barrier for
people who do not like mathematics, I had the opposite problem: As a
mathematician wanting to do computations, it was not immediately clear
to me how to have the server send computational tasks to the clients
and recover the results.
Goblins is a library (or collection of “modules”) for
&lt;a href=&quot;https://www.gnu.org/software/guile/&quot;&gt;Guile&lt;/a&gt;
(or &lt;a href=&quot;https://racket-lang.org/&quot;&gt;Racket&lt;/a&gt;), two
&lt;a href=&quot;https://www.r6rs.org/&quot;&gt;Scheme&lt;/a&gt; dialects;
from what I understand, it is unlikely that a C implementation will be
available any time soon.
&lt;/p&gt;
&lt;p&gt;
So the way in which I see Goblins being useful to distributed number theory
computations is as follows:
&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
Use Guile and Goblins to express the high-level control flow of the program,
and additionally Goblins to express its parallelised and distributed
aspects.
&lt;/li&gt;
&lt;li&gt;
Use functions from a C library to do the heavy lifting,
for instance functions from
&lt;a href=&quot;https://www.multiprecision.org/cm/&quot;&gt;CM&lt;/a&gt;
to do the computationally intensive tasks related to primality proving,
which can be called from Guile using the
&lt;a href=&quot;https://www.gnu.org/software/guile/manual/html_node/Foreign-Function-Interface.html&quot;&gt;Foreign
Function Interface&lt;/a&gt; or FFI.
In a number theoretic context, function arguments and values are often
arbitrarily long integers coming from the
&lt;a href=&quot;https://gmplib.org/&quot;&gt;GMP&lt;/a&gt; library.
Given that Guile itself is written in C and relies on GMP for its
implementation of integers, one can be hopeful that this should not pose
too many problems.
&lt;/li&gt;
&lt;/ol&gt;
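&lt;p&gt;
As a minimal sketch of the second point (assuming Guile 3.0.6 or later
on a typical GNU system, and taking &lt;code&gt;sqrt&lt;/code&gt; from the C
library as a stand-in for a CM function), a C function can be bound
through the FFI as follows:
&lt;/p&gt;
&lt;pre&gt;
(use-modules (system foreign)
             (system foreign-library))
;; Bind the C function double sqrt (double) from libm.
(define c-sqrt
  (foreign-library-function &amp;quot;libm&amp;quot; &amp;quot;sqrt&amp;quot;
                            #:return-type double
                            #:arg-types (list double)))
(c-sqrt 25.0)  ; =&amp;gt; 5.0
&lt;/pre&gt;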


&lt;h2&gt;The prototypical example&lt;/h2&gt;

&lt;p&gt;
As a running example, I would like to treat the computation of the euclidean
length of a vector; starting simple and enhancing the example step by step
in the following tutorial, which I am making up while I am trying to solve
the problem for myself.
&lt;/p&gt;
&lt;p&gt;
Let us begin with a fixed vector of small, fixed size,
which would look like the following in C:
&lt;/p&gt;
&lt;pre&gt;
int square (int x) {
   return x*x;
}

int v [] = {3, 4};
double len;

len = 0.0;
for (int i = 0; i &amp;lt; sizeof (v) / sizeof (int); i++)
   len += square (v [i]);
len = sqrt (len);
&lt;/pre&gt;
&lt;p&gt;
Granted, this is not number theory, but it fits the situation described
in the motivational section above:
There is a &lt;code&gt;for&lt;/code&gt; loop going through the vector, calling
the &lt;code&gt;square&lt;/code&gt; function independently for each entry; it
stands for a function that is expensive to compute, should be
distributed to the clients and would be loaded from a C library. The
additions and the square root, the &lt;code&gt;+&lt;/code&gt; and &lt;code&gt;sqrt&lt;/code&gt; functions,
stand for cheap post-treatment done at the server level.
&lt;/p&gt;
&lt;p&gt;
The following Guile code captures this sequential computation
quite compactly, using the &lt;code&gt;map&lt;/code&gt; and &lt;code&gt;fold&lt;/code&gt;
idiom:
&lt;/p&gt;
&lt;pre&gt;
(use-modules (srfi srfi-1))
(define (square x) (* x x))
(define v '(3 4))
(sqrt (fold + 0 (map square v)))
&lt;/pre&gt;
&lt;p&gt;
Open a Guile REPL using the &lt;code&gt;guile&lt;/code&gt; command.
Then copy-paste this code into the REPL;
or save it as a file &lt;code&gt;euclid.scm&lt;/code&gt; and
type &lt;code&gt;(load &amp;quot;euclid.scm&amp;quot;)&lt;/code&gt; in the REPL;
enjoy the Pythagorean result!
&lt;/p&gt;
&lt;p&gt;
Before continuing, please go first through Chapters 1 to 4 of the Goblins
&lt;a href=&quot;https://spritely.institute/files/docs/guile-goblins/0.13.0/index.html&quot;&gt;documentation
and tutorial&lt;/a&gt;.
&lt;/p&gt;
&lt;p&gt;
The next step is to “locally distribute” the computation, that is, to
create two clients and a server for the different steps of the computation;
for the time being, these will all live in the same local Guile REPL.
What is called a “process” in MPI corresponds to a “vat” in Goblins;
so we create &lt;code&gt;vat0&lt;/code&gt; for the server and &lt;code&gt;vat1&lt;/code&gt;
and &lt;code&gt;vat2&lt;/code&gt; for the clients:
&lt;/p&gt;
&lt;pre&gt;
(use-modules (goblins))
(define vat0 (spawn-vat))
(define vat1 (spawn-vat))
(define vat2 (spawn-vat))
&lt;/pre&gt;
&lt;p&gt;
Functions in an MPI instrumented parallel program correspond to
“actors” living in a vat; we first define a type of actor computing
squares:
&lt;/p&gt;
&lt;pre&gt;
(define (^square bcom)
  (lambda (x)
    (* x x)))
&lt;/pre&gt;
&lt;p&gt;
To distinguish it from the &lt;code&gt;square&lt;/code&gt; function above, we prepend
a &lt;code&gt;^&lt;/code&gt; to its name; it takes a formal parameter called
&lt;code&gt;bcom&lt;/code&gt; that we need not worry about.
&lt;/p&gt;
&lt;p&gt;
Then we populate the client vats with a square actor each and keep
references to the different actors under different global names:
&lt;/p&gt;
&lt;pre&gt;
(define client1
  (with-vat vat1 (spawn ^square)))
(define client2
  (with-vat vat2 (spawn ^square)))
&lt;/pre&gt;
&lt;p&gt;
Now we can create a function in the server vat which computes the length
of a 2-dimensional vector by calls to the client actors using
&lt;code&gt;&amp;lt;-&lt;/code&gt;. For this to work, we will need to wait for the
clients to finish their computations (or, in Goblins parlance, for their
“promises” to be “fulfilled”); this is done using &lt;code&gt;on&lt;/code&gt; for
each call to a client actor.
The “Goblins standard library”, described in
&lt;a href=&quot;https://spritely.institute/files/docs/guile-goblins/0.13.0/actor_002dlib.html&quot;&gt;Chapter
6&lt;/a&gt; of the documentation, comes in handy here; in particular we can use a
&lt;a href=&quot;https://spritely.institute/files/docs/guile-goblins/0.13.0/Joiners.html&quot;&gt;joiner&lt;/a&gt;
to wait for several actors at the same time.
&lt;/p&gt;
&lt;pre&gt;
(use-modules (srfi srfi-1)
             (goblins actor-lib joiners))
(define (^len bcom)
  (lambda (v)
    (on (all-of (&amp;lt;- client1 (first v))(&amp;lt;- client2 (second v)))
        (lambda (res)
          (let ((l (sqrt (fold + 0 res))))
            (format #t &amp;quot;~a\n&amp;quot; l))))))
(define len (with-vat vat0 (spawn ^len)))
&lt;/pre&gt;
&lt;p&gt;
The &lt;code&gt;len&lt;/code&gt; actor can now be called as follows, which will print the
euclidean length of a vector on screen; from within the vat where the
actor resides, we may use &lt;code&gt;$&lt;/code&gt; instead of &lt;code&gt;&amp;lt;-&lt;/code&gt;,
which behaves like a normal function call:
&lt;/p&gt;
&lt;pre&gt;
(with-vat vat0 ($ len '(3 4)))
&lt;/pre&gt;
&lt;p&gt;
Putting this all together for convenient copy-pasting, here is the complete
code:
&lt;/p&gt;
&lt;pre&gt;
(use-modules (srfi srfi-1)
             (goblins)
             (goblins actor-lib joiners))

(define vat0 (spawn-vat))
(define vat1 (spawn-vat))
(define vat2 (spawn-vat))

(define (^square bcom)
  (lambda (x)
    (* x x)))

(define client1
  (with-vat vat1 (spawn ^square)))
(define client2
  (with-vat vat2 (spawn ^square)))

(define (^len bcom)
  (lambda (v)
    (on (all-of (&amp;lt;- client1 (first v))(&amp;lt;- client2 (second v)))
        (lambda (res)
          (let ((l (sqrt (fold + 0 res))))
            (format #t &amp;quot;~a\n&amp;quot; l))))))

(define len (with-vat vat0 (spawn ^len)))

(with-vat vat0 ($ len '(3 4)))
&lt;/pre&gt;


&lt;h2&gt;Promises, promises!&lt;/h2&gt;

&lt;p&gt;
The previous code prints the length of the vector on screen using the
&lt;code&gt;format&lt;/code&gt; function; one might wish to instead create a function
that returns the value, to assign it to a variable for future treatment,
for instance, or to enter the Guile equivalent of the next &lt;code&gt;for&lt;/code&gt;
loop. This turns out to be surprisingly difficult, or, to be more precise,
impossible. The reason is that
&lt;code&gt;on&lt;/code&gt; handles the promise by calling the function in its body
with the return value of the promise, but does not itself return the result
of this evaluation, as I would have expected.
(&lt;a href=&quot;https://www.gnu.org/software/guile/manual/html_node/Delayed-Evaluation.html#index-promises&quot;&gt;Promises&lt;/a&gt;
in Guile itself, created with &lt;code&gt;delay&lt;/code&gt;, behave in this expected
way when using &lt;code&gt;force&lt;/code&gt;.)
It is possible to obtain a return value for &lt;code&gt;on&lt;/code&gt;, but this will
again be a promise and not a “real” value — once a promise, always a
promise!
&lt;/p&gt;
&lt;p&gt;
So instead of passing around values, one quickly ends up passing around
promises; this takes some getting used to, and entails an additional layer
of wrapping everything into &lt;code&gt;on&lt;/code&gt; and a function instead of
just evaluating the body of the function. As far as I understand, to
obtain any tangible result, one eventually needs to print it on screen
or into a file. The following example illustrates how to use the
&lt;code&gt;#:promise? #t&lt;/code&gt; keyword parameter with &lt;code&gt;on&lt;/code&gt; to
ensure that it returns a promise, and how to lug this promise around to
continue computations with its encapsulated value, while never leaving the
realm of promises until eventually printing a result. It moves the
computation of the square root out of the &lt;code&gt;^len&lt;/code&gt; actor.
&lt;/p&gt;
&lt;pre&gt;
(use-modules (srfi srfi-1)
             (goblins)
             (goblins actor-lib joiners))

(define vat0 (spawn-vat))
(define vat1 (spawn-vat))
(define vat2 (spawn-vat))

(define (^square bcom)
  (lambda (x)
    (* x x)))

(define client1
  (with-vat vat1 (spawn ^square)))
(define client2
  (with-vat vat2 (spawn ^square)))

(define (^norm bcom)
  (lambda (v)
    (on (all-of (&amp;lt;- client1 (first v))(&amp;lt;- client2 (second v)))
        (lambda (res)
          (fold + 0 res))
        #:promise? #t)))

(define norm (with-vat vat0 (spawn ^norm)))

(with-vat vat0
  (define n ($ norm '(3 4)))
  (define l (on n (lambda (x) (sqrt x)) #:promise? #t))
  (on l (lambda (x) (format #t &amp;quot;~a\n&amp;quot; x))))
&lt;/pre&gt;
&lt;p&gt;
So here, &lt;code&gt;n&lt;/code&gt; and &lt;code&gt;l&lt;/code&gt; are promises, and fulfilment
occurs only in the last &lt;code&gt;on&lt;/code&gt; that prints a result.
&lt;/p&gt;
&lt;p&gt;
Now that we have understood how things work, it is useful to introduce the
&lt;code&gt;let*-on&lt;/code&gt;
&lt;a href=&quot;https://spritely.institute/files/docs/guile-goblins/0.13.0/Let_002dOn.html&quot;&gt;syntactic
sugar&lt;/a&gt;, which lets us end up with the following code:
&lt;/p&gt;
&lt;pre&gt;
(use-modules (srfi srfi-1)
             (goblins)
             (goblins actor-lib joiners)
             (goblins actor-lib let-on))

(define vat0 (spawn-vat))
(define vat1 (spawn-vat))
(define vat2 (spawn-vat))

(define (^square bcom)
  (lambda (x)
    (* x x)))

(define client1
  (with-vat vat1 (spawn ^square)))
(define client2
  (with-vat vat2 (spawn ^square)))

(define (^norm bcom)
  (lambda (v)
    (on (all-of (&amp;lt;- client1 (first v))(&amp;lt;- client2 (second v)))
        (lambda (res)
          (fold + 0 res))
        #:promise? #t)))

(define norm (with-vat vat0 (spawn ^norm)))

(with-vat vat0
  (let*-on ((n ($ norm '(3 4)))
            (l (sqrt n)))
    (format #t &amp;quot;~a\n&amp;quot; l)))
&lt;/pre&gt;
&lt;p&gt;
This looks exactly like normal &lt;code&gt;let*&lt;/code&gt; syntax in Guile!
So in the end, we arrive at a program which looks as if it handled normal
values, with all promises swept under the rug.
&lt;/p&gt;

&lt;h3&gt;Acknowledgements&lt;/h3&gt;
&lt;p&gt;
I thank Jessica Tallon and David Thompson for their kind help with
understanding the concept of promises covered in the previous section.
&lt;/p&gt;

&lt;p&gt;
This first part has dealt with basic programming concepts in Goblins.
In the end, all our code still runs in a single script, so we have
taken a twisted path to write essentially serial code, but in doing so,
we have laid the groundwork for
&lt;a href=&quot;goblins-2.html&quot;&gt;true parallelisation&lt;/a&gt;
with Goblins.
&lt;/p&gt;

&lt;/div&gt;</content></entry></feed>