Goblins for number theory, part 3
Ending and persisting
In previous posts we have seen how to solve our toy problem of computing the Euclidean length of a vector in a distributed fashion using Goblins, with a client script that runs in several copies, carries out most of the work and reports back to a server script, which collects the partial results into a solution to the problem. The clients could in principle live on distant machines and communicate over the Tor network. For testing in a local setting, however, letting them run on the same machine as the server and communicate over TCP turns out to be more efficient.
So far, our architecture is rather inflexible: we assume that the server knows the number of participating clients beforehand, and that all tasks take more or less the same time, so that distributing them evenly to the clients is an optimal scheduling strategy. The logical next step is to overcome these limitations.
My initial solution for a more general framework, however, turned out to be very inefficient. Jessica Tallon and David Thompson of the Spritely Institute (many thanks to them!) kindly had a look at it and came up with a much better solution; our discussions also helped me understand Goblins better and inspired ideas on how to improve the current client and server scripts. So before going for more generality in the next post, let us do a pirouette with the current framework and also explore some interesting side tracks that did not make it into the previous post.
Spring cleaning
Before doing anything substantial, let us clean up a few things in the current code. The main actor in the server script is currently defined through the type ^register as follows:
(define clients
  (with-vat vat (spawn ^cell '())))
(define (^register bcom)
  (lambda (id)
    ($ clients (cons (<- mycapn 'enliven id) ($ clients)))
    (print-id "Registered" id)))
(define register
  (with-vat vat (spawn ^register)))
It captures the clients variable in the closure defined by lambda, which works, but requires the variables to be defined in this order. A more elegant solution is to pass clients as an argument. At the same time, we take the opportunity to rename the verb register to the noun registry.
(define (^registry bcom clients)
  (lambda (id)
    ($ clients (cons (<- mycapn 'enliven id) ($ clients)))
    (print-id "Registered" id)))
(define clients
  (with-vat vat (spawn ^cell '())))
(define registry
  (with-vat vat (spawn ^registry clients)))
Let us also get rid of some “overgoblinification”; indeed the actor of type ^len in the server can be replaced by a simple function, or (since the Goblins promises force us to work with side effects anyway) by sequential code. We end up with the following server script server.scm:
(use-modules (srfi srfi-1)
             (goblins)
             (goblins actor-lib cell)
             (goblins actor-lib joiners)
             (goblins actor-lib methods)
             (goblins ocapn ids)
             (goblins ocapn captp)
             (goblins ocapn netlayer tcp-tls))
(define vat (spawn-vat))
(define net (spawn-vat))
(define (print-id prefix id)
  (with-vat net
    (on id
        (lambda (sref)
          (format #t "~a ~a\n" prefix (ocapn-id->string sref))))))
(define capn
  (with-vat net
    (spawn-mycapn (spawn ^tcp-tls-netlayer "localhost"))))
(define (^registry bcom clients)
  (lambda (id)
    ($ clients (cons (<- capn 'enliven id) ($ clients)))
    (print-id "Registered" id)))
(define clients
  (with-vat vat (spawn ^cell '())))
(define registry
  (with-vat vat (spawn ^registry clients)))
(let ((id (with-vat net ($ capn 'register registry 'tcp-tls))))
  (print-id "Server ID" id))
(while (not (eq? (length (with-vat vat ($ clients))) 2))
  (sleep 1))
(define v '(1 2 3 4 5))
(with-vat vat
  (while (< (length ($ clients)) (length v))
    (let ((c ($ clients)))
      ($ clients (append c c)))))
(with-vat vat
  (on (all-of* (map <- ($ clients) v))
      (lambda (res)
        (format #t "~a\n" (sqrt (fold + 0 res))))))
(sleep 3600)
and the following client script client.scm:
(use-modules (srfi srfi-1)
             (goblins)
             (goblins ocapn ids)
             (goblins ocapn captp)
             (goblins ocapn netlayer tcp-tls))
(define vat (spawn-vat))
(define net (spawn-vat))
(define (^square bcom)
  (lambda (x) (* x x)))
(define client
  (with-vat vat (spawn ^square)))
(define (print-id prefix id)
  (with-vat net
    (on id
        (lambda (sref)
          (format #t "~a ~a\n" prefix (ocapn-id->string sref))))))
(define capn
  (with-vat net
    (spawn-mycapn (spawn ^tcp-tls-netlayer "localhost"))))
(define id
  (with-vat net ($ capn 'register client 'tcp-tls)))
(print-id "Client ID" id)
(define server
  (with-vat vat
    (<- capn 'enliven (string->ocapn-id (second (command-line))))))
(with-vat vat
  (on id
      (lambda (id) (<- server id))))
(sleep 3600)
Now run again
guile server.scm
in one terminal and two copies of the client script as
guile client.scm 'ocapn://…'
in two other terminals, where the ocapn URI has been replaced by the one printed by the server, to compute the same result as before.
Passing actors around
After going through the CapTP tutorial, I was under the impression that the only way to create a handle on an actor on a different machine was by obtaining its sturdyref ID and “enlivening” this ID locally. Currently the server script prints its ID, which the client script obtains as an argument when invoked from the command line. This enables the client to enliven the server and to send its ID to the server when registering by a <- call; then the server enlivens the client.
It turns out, however, that it is also possible to directly send actors instead of their IDs through <-. Printing and copy-pasting IDs is still necessary for bootstrapping, but once a spanning tree is generated in this manner between all participating scripts, it is possible to obtain a complete communication graph by just sending actors along these bootstrapped network edges.
We would still like the client to somehow present itself to the server with a name, so that the server can print who connects to it and thus make debugging easier. If we drop the ocapn ID, then the client can use a pet name, a string that we pass as an additional argument on the command line. The server needs only minimal modifications:
(use-modules (srfi srfi-1)
             (goblins)
             (goblins actor-lib cell)
             (goblins actor-lib joiners)
             (goblins actor-lib methods)
             (goblins ocapn ids)
             (goblins ocapn captp)
             (goblins ocapn netlayer tcp-tls))
(define vat (spawn-vat))
(define net (spawn-vat))
(define (print-id prefix id)
  (with-vat net
    (on id
        (lambda (sref)
          (format #t "~a ~a\n" prefix (ocapn-id->string sref))))))
(define capn
  (with-vat net
    (spawn-mycapn (spawn ^tcp-tls-netlayer "localhost"))))
(define (^registry bcom clients)
  (lambda (client name)
    ($ clients (cons client ($ clients)))
    (format #t "Registered ~a\n" name)))
(define clients
  (with-vat vat (spawn ^cell '())))
(define registry
  (with-vat vat (spawn ^registry clients)))
(let ((id (with-vat net ($ capn 'register registry 'tcp-tls))))
  (print-id "Server ID" id))
(while (not (eq? (length (with-vat vat ($ clients))) 2))
  (sleep 1))
(define v '(1 2 3 4 5))
(with-vat vat
  (while (< (length ($ clients)) (length v))
    (let ((c ($ clients)))
      ($ clients (append c c)))))
(with-vat vat
  (on (all-of* (map <- ($ clients) v))
      (lambda (res)
        (format #t "~a\n" (sqrt (fold + 0 res))))))
(sleep 3600)
Notice the additional argument name for the ^registry actor, which is used for announcing arriving clients instead of their ocapn ID. (In this implementation we forget the name of a client immediately; it would make sense to somehow keep it, either by remembering it directly in ^square or by having the server memorise it in its client list.) Instead of enlivening an ID and adding the resulting actor to the clients list, the server adds the client actor directly.
The client modifications are also straightforward and simplify the script
considerably:
(use-modules (srfi srfi-1)
             (goblins)
             (goblins ocapn ids)
             (goblins ocapn captp)
             (goblins ocapn netlayer tcp-tls))
(define vat (spawn-vat))
(define net (spawn-vat))
(define (^square bcom)
  (lambda (x) (* x x)))
(define client
  (with-vat vat (spawn ^square)))
(define capn
  (with-vat net
    (spawn-mycapn (spawn ^tcp-tls-netlayer "localhost"))))
(with-vat net ($ capn 'register client 'tcp-tls))
(define name (second (command-line)))
(define server
  (with-vat vat
    (<- capn 'enliven (string->ocapn-id (third (command-line))))))
(with-vat vat (<- server client name))
(sleep 3600)
Now start the server as usual, and two clients as
guile client.scm Alice 'ocapn://…'
guile client.scm Bob 'ocapn://…'
to see the familiar result.
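The reason the name is read with second and the URI with third is that the list returned by (command-line) starts with the script name. A minimal sketch in plain Guile, where the list args is a hand-written stand-in for what (command-line) would return and "<server-uri>" is only a placeholder for the real ocapn URI:

```scheme
(use-modules (srfi srfi-1))

;; Stand-in for (command-line) when running:
;;   guile client.scm Alice '<server-uri>'
;; The first element is the script itself.
(define args '("client.scm" "Alice" "<server-uri>"))

(define name (second args))   ; pet name of the client
(define uri  (third args))    ; ID printed by the server

(display name) (newline)      ; prints Alice
(display uri) (newline)       ; prints <server-uri>
```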
Being methodical
As it will be useful later on, let us replace the workhorse in the client, the ^square actor with only one possible action (squaring a number that is sent to it), by an implementation with potentially more actions. To do so, we use methods from the Goblins actor-lib, which dispatches actions using an additional symbol. So
(define (^square bcom)
  (lambda (x) (* x x)))
(define client
  (with-vat vat (spawn ^square)))
becomes
(use-modules (goblins actor-lib methods) …)
…
(define (^worker bcom)
  (methods
   ((square x) (* x x))))
(define client
  (with-vat vat (spawn ^worker)))
Inside the server, we now need to change calls of the form
(<- client x)
by adding an additional symbol to
(<- client 'square x)
This is made more complicated since they appear inside map:
(map <- ($ clients) v)
The solution is to change the <- function, which now takes three arguments (a client, a symbol and a number), into a function with only two arguments by fixing the middle argument to 'square. This can be done using cut from SRFI-26; it takes the function name and for each argument of the function either a fixed value, or the placeholder <> indicating that this argument should be kept as such. In our case, this gives
(map (cut <- <> 'square <>) ($ clients) v)
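To see the effect of cut without any networking, here is a minimal sketch in plain Guile. The procedure fake-send is a hypothetical stand-in for <- that merely records its three arguments instead of sending a message; the symbols alice and bob play the role of client actors.

```scheme
(use-modules (srfi srfi-1) (srfi srfi-26))

;; Hypothetical stand-in for <-: returns the message as a list
;; instead of sending it to an actor.
(define (fake-send client method arg)
  (list client method arg))

;; Fix the middle argument; the two <> slots remain open.
(define send-square (cut fake-send <> 'square <>))

(define messages (map send-square '(alice bob) '(3 4)))
(display messages) (newline)
;; prints ((alice square 3) (bob square 4))
```

The two-argument procedure produced by cut can thus be handed to map exactly like the original two-argument <- call.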
So altogether, here is our current server:
(use-modules (srfi srfi-1)
             (srfi srfi-26)
             (goblins)
             (goblins actor-lib cell)
             (goblins actor-lib joiners)
             (goblins actor-lib methods)
             (goblins ocapn ids)
             (goblins ocapn captp)
             (goblins ocapn netlayer tcp-tls))
(define vat (spawn-vat))
(define net (spawn-vat))
(define (print-id prefix id)
  (with-vat net
    (on id
        (lambda (sref)
          (format #t "~a ~a\n" prefix (ocapn-id->string sref))))))
(define capn
  (with-vat net
    (spawn-mycapn (spawn ^tcp-tls-netlayer "localhost"))))
(define (^registry bcom clients)
  (lambda (client name)
    ($ clients (cons client ($ clients)))
    (format #t "Registered ~a\n" name)))
(define clients
  (with-vat vat (spawn ^cell '())))
(define registry
  (with-vat vat (spawn ^registry clients)))
(let ((id (with-vat net ($ capn 'register registry 'tcp-tls))))
  (print-id "Server ID" id))
(while (not (eq? (length (with-vat vat ($ clients))) 2))
  (sleep 1))
(define v '(1 2 3 4 5))
(with-vat vat
  (while (< (length ($ clients)) (length v))
    (let ((c ($ clients)))
      ($ clients (append c c)))))
(with-vat vat
  (on (all-of* (map (cut <- <> 'square <>) ($ clients) v))
      (lambda (res)
        (format #t "~a\n" (sqrt (fold + 0 res))))))
(sleep 3600)
and here is our current client:
(use-modules (srfi srfi-1)
             (goblins)
             (goblins actor-lib methods)
             (goblins ocapn ids)
             (goblins ocapn captp)
             (goblins ocapn netlayer tcp-tls))
(define vat (spawn-vat))
(define net (spawn-vat))
(define (^worker bcom)
  (methods
   ((square x) (* x x))))
(define client
  (with-vat vat (spawn ^worker)))
(define capn
  (with-vat net
    (spawn-mycapn (spawn ^tcp-tls-netlayer "localhost"))))
(with-vat net ($ capn 'register client 'tcp-tls))
(define name (second (command-line)))
(define server
  (with-vat vat
    (<- capn 'enliven (string->ocapn-id (third (command-line))))))
(with-vat vat (<- server client name))
(sleep 3600)
Everything has an end, but Goblins
It is mildly annoying that the scripts run forever (well, for one hour…) and need to be stopped with <ctrl-c>. But it is somewhat difficult to decide when to stop: in both our scripts, the control flow reaches the end of the programs, while Goblins are still working in the background through promises.
It is possible to use conditions from Guile Fibers, as inspired by the chat example in the Goblins documentation. Since Fibers are a basic ingredient of Goblins in Guile, they do not need to be installed separately.
We can modify the client as follows:
(use-modules (fibers conditions) …)
…
(define end (make-condition))
…
(define (^worker bcom)
  (methods
   ((square x) (* x x))
   ((finish) (signal-condition! end))))
…
(wait end)
First we import the (fibers conditions) module. Then we create the “condition” end. We use signal-condition! to signal, well, that the condition has been fulfilled. And we replace sleeping by waiting for the condition. The signalling is encapsulated in a new method 'finish of the ^worker actor, which can be called from the server as
(map (cut <- <> 'finish) ($ clients))
after the result of the computations has been printed. This results in the following client script:
(use-modules (srfi srfi-1)
             (fibers conditions)
             (goblins)
             (goblins actor-lib methods)
             (goblins ocapn ids)
             (goblins ocapn captp)
             (goblins ocapn netlayer tcp-tls))
(define vat (spawn-vat))
(define net (spawn-vat))
(define end (make-condition))
(define (^worker bcom)
  (methods
   ((square x) (* x x))
   ((finish) (signal-condition! end))))
(define client
  (with-vat vat (spawn ^worker)))
(define capn
  (with-vat net
    (spawn-mycapn (spawn ^tcp-tls-netlayer "localhost"))))
(with-vat net ($ capn 'register client 'tcp-tls))
(define name (second (command-line)))
(define server
  (with-vat vat
    (<- capn 'enliven (string->ocapn-id (third (command-line))))))
(with-vat vat (<- server client name))
(wait end)
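The condition mechanism can also be tried in isolation, without Goblins. The following is only a sketch under assumptions: it supposes that Guile Fibers is installed, and it uses a helper thread from (ice-9 threads) as a stand-in for the incoming 'finish message; neither the thread nor the one-second delay is part of the client script.

```scheme
(use-modules (fibers conditions)
             (ice-9 threads))

(define end (make-condition))

;; Stand-in for the 'finish message: another thread signals the
;; condition after one second.
(call-with-new-thread
 (lambda ()
   (sleep 1)
   (signal-condition! end)))

;; Blocks until the condition has been signalled, then falls through
;; to the end of the script.
(wait end)
(display "finished") (newline)
```

As in the client, the main control flow simply parks on (wait end) and the script ends as soon as somebody else signals the condition.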
With the server script modified suitably as explained above, the clients now end correctly, but the server crashes after printing the result of the computations. A hasty decision we took earlier comes back to haunt us now: since there are more tasks than clients, we have filled the clients list with duplicates of the client actors so as to send multiple 'square messages to the same actor; but now we send multiple 'finish messages to clients that have stopped running after the first such message, resulting in a scary error on the server side that boils down to &non-continuable.
To reach this correct conclusion more gracefully, we take another hasty
decision and deduplicate the clients list when calling finish:
(map (cut <- <> 'finish) (delete-duplicates ($ clients)))
As an excuse for our laziness in not looking for a more elegant solution, we remark that this part will be reworked later anyway to obtain a more flexible client queue.
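The interplay between padding the list and deduplicating it can be replayed in plain Guile. This is a minimal sketch in which the symbols alice and bob are stand-ins for the client actors and pad is a hypothetical helper that mirrors the doubling loop of the server:

```scheme
(use-modules (srfi srfi-1))

(define v '(1 2 3 4 5))

;; Double the list until it is at least as long as v,
;; as the while loop in the server does with the cell.
(define (pad clients)
  (if (< (length clients) (length v))
      (pad (append clients clients))
      clients))

(define padded (pad '(alice bob)))
(display (length padded)) (newline)
;; prints 8: two doublings, 2 -> 4 -> 8

;; The SRFI-1 map stops at the end of the shortest list,
;; so only five tasks are handed out.
(define tasks (map list padded v))
(display tasks) (newline)
;; prints ((alice 1) (bob 2) (alice 3) (bob 4) (alice 5))

;; Each client should receive 'finish only once.
(define unique (delete-duplicates padded))
(display unique) (newline)
;; prints (alice bob)
```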
I have not found a similar approach to also have the server end gracefully.
If one places signal-condition! in the code right after sending the 'finish messages to the clients, then the clients do not end, since it turns out that the server finishes so fast that the messages are not actually sent. If one tries to wait for the promise coming out of the 'finish calls, then this also fails, since the finished clients cannot send back a function value any more.
So I keep the sleep in the end and make it just a bit shorter. The current server.scm then looks like this:
(use-modules (srfi srfi-1)
             (srfi srfi-26)
             (goblins)
             (goblins actor-lib cell)
             (goblins actor-lib joiners)
             (goblins actor-lib methods)
             (goblins ocapn ids)
             (goblins ocapn captp)
             (goblins ocapn netlayer tcp-tls))
(define vat (spawn-vat))
(define net (spawn-vat))
(define (print-id prefix id)
  (with-vat net
    (on id
        (lambda (sref)
          (format #t "~a ~a\n" prefix (ocapn-id->string sref))))))
(define capn
  (with-vat net
    (spawn-mycapn (spawn ^tcp-tls-netlayer "localhost"))))
(define (^registry bcom clients)
  (lambda (client name)
    ($ clients (cons client ($ clients)))
    (format #t "Registered ~a\n" name)))
(define clients
  (with-vat vat (spawn ^cell '())))
(define registry
  (with-vat vat (spawn ^registry clients)))
(let ((id (with-vat net ($ capn 'register registry 'tcp-tls))))
  (print-id "Server ID" id))
(while (not (eq? (length (with-vat vat ($ clients))) 2))
  (sleep 1))
(define v '(1 2 3 4 5))
(with-vat vat
  (while (< (length ($ clients)) (length v))
    (let ((c ($ clients)))
      ($ clients (append c c)))))
(with-vat vat
  (on (all-of* (map (cut <- <> 'square <>) ($ clients) v))
      (lambda (res)
        (format #t "~a\n" (sqrt (fold + 0 res)))
        (map (cut <- <> 'finish) (delete-duplicates ($ clients))))))
(sleep 10)
Resist! Er, persist!
Another annoyance in the current code is that the ocapn ID of the server changes every time it is started, so that there is a lot of copy-pasting for starting the clients. This turns from a minor annoyance into a problem when different clients are supposed to be started independently all over the Internet, and the ocapn ID is the de facto credential to enable connections. Then a restart of the server script for any reason, be it a power outage or an update, requires communicating the new ID to all participants.
From the name of it, it sounds as if persistence could come to the rescue. We only need to persist the server. In a first step, we add a bit of boilerplate, taken from the documentation of persistent vats; this seems to be required when several vats with cross-references to each other are to be persisted, but cannot do any harm in general.
(use-modules (goblins vat) …)
…
(define persistence-vat (spawn-vat))
(define persistence-registry
  (with-vat persistence-vat (spawn ^persistence-registry)))
Then we follow the example on persistence in the documentation of the TCP netlayer (after correcting a small error in the documentation for version 0.15, which has been updated in the meantime) and replace
(define net (spawn-vat))
(define capn
  (with-vat net
    (spawn-mycapn (spawn ^tcp-tls-netlayer "localhost"))))
by
(use-modules (goblins persistence-store syrup) …)
…
(define-values (net capn)
  (spawn-persistent-vat
   (make-persistence-env
    #:extends (list captp-env tcp-tls-netlayer-env))
   (lambda ()
     (spawn-mycapn (spawn ^tcp-tls-netlayer "localhost")))
   (make-syrup-store "ocapn.syrup")
   #:persistence-registry persistence-registry))
The spawn-persistent-vat function returns a number of values; the first one is a new vat, the other ones are created by the lambda expression and correspond to actors in the vat which are to be persisted (more precisely, they form the roots of the corresponding graph). A persistence environment is passed as the first argument; it “knows” how to store the different types of actors. In this case, we store to a file named ocapn.syrup, where syrup is the Goblins internal file format.
It is instructive to run the server and to inspect the ocapn ID it prints. The general format seems to be
ocapn://….tcp-tls/s/…?host=localhost&port=…
where the first ellipsis consists of 52 lower case letters and digits (a 256 bit hash encoded in base 32?), the second ellipsis consists of 43 lower and upper case letters, digits and symbols (a 256 bit hash encoded in base 64?), and the third ellipsis is a random port. Previously, all three would change when invoking the script. Now the sequence in the place of the first ellipsis as well as the port remain fixed, while the sequence in the place of the second ellipsis still changes.
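The two guesses about the encodings are at least consistent with the observed lengths, as a quick computation in plain Guile shows. This only checks the arithmetic (5 bits per base 32 character, 6 bits per base 64 character, no padding); it says nothing about what the sequences actually encode. The helper encoded-length is introduced here purely for illustration.

```scheme
;; Number of characters needed to write BITS bits with
;; BITS-PER-CHAR bits per character, without padding.
(define (encoded-length bits bits-per-char)
  (ceiling-quotient bits bits-per-char))

(display (encoded-length 256 5)) (newline)   ; base 32: prints 52
(display (encoded-length 256 6)) (newline)   ; base 64: prints 43
```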
We therefore need to persist more, in particular the actor that is registered in the network layer. So we replace
(define (^registry bcom clients) …)
(define vat (spawn-vat))
(define clients (with-vat vat (spawn ^cell '())))
(define registry (with-vat vat (spawn ^registry clients)))
by
(define-actor (^registry bcom clients) …)
(define-values (vat clients registry)
  (spawn-persistent-vat
   (make-persistence-env
    (list (list '((registry) ^registry) ^registry))
    #:extends cell-env)
   (lambda ()
     (let ((clients (spawn ^cell '())))
       (values clients (spawn ^registry clients))))
   (make-syrup-store "registry.syrup")
   #:persistence-registry persistence-registry))
Notice the use of define-actor instead of define, which appears to be necessary to achieve persistence. Besides the cell actor known to Goblins from the actor-lib, we also need to declare our self-defined actor of type ^registry in the persistence environment; this is achieved by the rather indigestible boilerplate line creating nested lists. We use a second file, registry.syrup, to store this actor.
However, this fails miserably, as the server crashes with an error message containing keywords such as vat-churn and vat-maybe-persist-changed-objs!.
What happens exactly seems to depend on timing. In this case there is a 176 byte file registry.syrup containing a few strings and binary data. I suppose it stores the empty client list and the corresponding registry. After clients register, there is a “churn” (which I understand as the vat taking a break after a turn is over), and the persistence system tries to update the file. However, the client list now contains an actor coming from the client script, that is, coming over the network from potentially a different machine. Since this is not under the control of the local script, it cannot be stored.
There is apparently a very simple workaround. The spawn-persistent-vat function admits an optional parameter #:persist-on; if this is changed from the default 'churn to something else, then the vat changes are not stored at each churn. In effect, the vat is only stored once in the beginning, and keeps an empty client list forever. This is actually exactly what we need, an empty client list at each restart of the server.
So we end up with the following server.scm:
(use-modules (srfi srfi-1)
             (srfi srfi-26)
             (goblins)
             (goblins actor-lib cell)
             (goblins actor-lib joiners)
             (goblins actor-lib methods)
             (goblins ocapn ids)
             (goblins ocapn captp)
             (goblins ocapn netlayer tcp-tls)
             (goblins persistence-store syrup)
             (goblins vat))
(define persistence-vat (spawn-vat))
(define persistence-registry
  (with-vat persistence-vat (spawn ^persistence-registry)))
(define-values (net capn)
  (spawn-persistent-vat
   (make-persistence-env
    #:extends (list captp-env tcp-tls-netlayer-env))
   (lambda ()
     (spawn-mycapn (spawn ^tcp-tls-netlayer "localhost")))
   (make-syrup-store "ocapn.syrup")
   #:persistence-registry persistence-registry))
(define (print-id prefix id)
  (with-vat net
    (on id
        (lambda (sref)
          (format #t "~a ~a\n" prefix (ocapn-id->string sref))))))
(define-actor (^registry bcom clients)
  (lambda (client name)
    ($ clients (cons client ($ clients)))
    (format #t "Registered ~a\n" name)))
(define-values (vat clients registry)
  (spawn-persistent-vat
   (make-persistence-env
    (list (list '((registry) ^registry) ^registry))
    #:extends cell-env)
   (lambda ()
     (let ((clients (spawn ^cell '())))
       (values clients (spawn ^registry clients))))
   (make-syrup-store "registry.syrup")
   #:persist-on #f
   #:persistence-registry persistence-registry))
(let ((id (with-vat net ($ capn 'register registry 'tcp-tls))))
  (print-id "Server ID" id))
(while (not (eq? (length (with-vat vat ($ clients))) 2))
  (sleep 1))
(define v '(1 2 3 4 5))
(with-vat vat
  (while (< (length ($ clients)) (length v))
    (let ((c ($ clients)))
      ($ clients (append c c)))))
(with-vat vat
  (on (all-of* (map (cut <- <> 'square <>) ($ clients) v))
      (lambda (res)
        (format #t "~a\n" (sqrt (fold + 0 res)))
        (map (cut <- <> 'finish) (delete-duplicates ($ clients))))))
(sleep 10)
It may be prudent now to remove all .syrup files from previous failed attempts. Running a server and two client scripts computes the desired result as before. But now one notices that upon restarting the server script, it prints the exact same ocapn ID as before. So the clients can also be restarted with the exact same commands, and no more copy-pasting is needed.