Communication Protocol
The SpiNNaker partition server uses a simple JSON-based protocol over TCP to communicate with clients. The protocol has no security features whatsoever, just like SpiNNaker hardware, and it is assumed that the server is operated within the same trusted network as the boards it manages.
By default, the server listens on TCP port number 22244. The client and server communicate by sending and receiving newline (\n) delimited JSON objects (i.e. lines of the form {...}). Clients may send commands to the server, to which the server sends return values. The server may also asynchronously send notifications to the client, if requested by the client.
As soon as a client connects to the server it should send a version() command and ensure the server version is compatible with the client.
Sending Commands
Commands map exactly to Python function calls. A command has a name and arguments and returns a value. To send a command, a client must send a JSON object with the following keys, followed by a newline:
- “command”
A string. The name of the command to be executed.
- “args”
An array. A list of positional arguments for the command.
- “kwargs”
An object. A mapping from keyword argument names to values for the command.
For example, if a client sent the following to the server:
{"command": "create_job", "args": [4, 2], "kwargs": {"owner": "me"}}\n
This would be interpreted as a function call like:
create_job(4, 2, owner="me")
Note
In all examples, \n means a newline character (ASCII 10), not the \ and n characters.
The server will then respond with a JSON object with a single key, “return”, whose value is the value returned by the command. For example:
{"return": 42}\n
Commands are processed and return values sent in FIFO order. No blocking commands are implemented by the server and the server will make a best-effort attempt to respond to all commands as quickly as possible. If any command is malformed or causes an error for any reason, the client is immediately disconnected.
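The framing described above can be sketched in a few lines of Python. This is a minimal illustration rather than the reference client: the helper names encode_command and decode_return are hypothetical, and the sketch assumes UTF-8 encoding on the wire.

```python
import json


def encode_command(name, *args, **kwargs):
    """Serialise one command as a newline-terminated JSON line (bytes)."""
    message = {"command": name, "args": list(args), "kwargs": kwargs}
    # json.dumps escapes any newline inside strings, so the only raw
    # newline in the output is the delimiter appended here.
    return (json.dumps(message) + "\n").encode("utf-8")


def decode_return(line):
    """Extract the return value from one response line sent by the server."""
    response = json.loads(line)
    if "return" not in response:
        # Lines without a "return" key are notifications, not responses.
        raise ValueError("not a return value: %r" % (response,))
    return response["return"]
```

A client would typically write the bytes from encode_command to a connected TCP socket and read lines back until one containing "return" arrives. Because a malformed command causes the server to disconnect the client, building messages with a JSON library rather than by hand is the safer choice.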
Receiving Asynchronous Notifications
If the client requests to be notified of certain events (using a command, as described above), the server may send a JSON object to the client which does not contain the key “return”. Notifications may be sent at any time, once requested, including between a function being called and the return value being sent. The exact format of the notification depends on its type. An example notification may look like the following:
{"jobs_changed": [42, 10, 3]}\n
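Since notifications may arrive between a command and its return value, a client has to split the incoming byte stream into lines and sort each line by whether it carries the "return" key. The following is a minimal sketch of that idea; the MessageReader class is a hypothetical helper, not part of the server or any client library.

```python
import json


class MessageReader:
    """Accumulates received bytes and yields complete JSON messages."""

    def __init__(self):
        self._buffer = b""

    def feed(self, data):
        """Add received bytes; return a list of (kind, payload) tuples.

        kind is "return" for return values and "notification" for any
        other object. Incomplete trailing data is kept for the next call.
        """
        self._buffer += data
        messages = []
        while b"\n" in self._buffer:
            line, _, self._buffer = self._buffer.partition(b"\n")
            obj = json.loads(line)
            if "return" in obj:
                messages.append(("return", obj["return"]))
            else:
                messages.append(("notification", obj))
        return messages
```

Feeding the reader whatever a socket read returns keeps the framing logic independent of how the data happened to be chunked by the network.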
Available Commands
The commands supported by the server are enumerated below and expressed in the form of Python functions.
- create_job(*args, **kwargs)
Create a new job (i.e. allocation of boards).
This function should be called in one of the following styles:
# Any single (SpiNN-5) board
job_id = create_job(owner="me")
job_id = create_job(1, owner="me")

# Board x=3, y=2, z=1 on the machine named "m"
job_id = create_job(3, 2, 1, machine="m", owner="me")

# Any machine with at least 4 boards
job_id = create_job(4, owner="me")

# Any 7-or-more board machine with an aspect ratio at least as
# square as 1:2
job_id = create_job(7, min_ratio=0.5, owner="me")

# Any 4x5 triad segment of a machine (may or may-not be a
# torus/full machine)
job_id = create_job(4, 5, owner="me")

# Any torus-connected (full machine) 4x2 machine
job_id = create_job(4, 2, require_torus=True, owner="me")
The ‘other parameters’ enumerated below may be used to further restrict what machines the job may be allocated onto.
Jobs for which no suitable machines are available are immediately destroyed (and the reason given).
Once a job has been created, it must be ‘kept alive’ by a simple watchdog mechanism. Jobs may be kept alive by periodically calling the job_keepalive() command or by calling any other job-specific command. Jobs are culled if no keep alive message is received for keepalive seconds. If absolutely necessary, a job’s keepalive value may be set to None, disabling the keepalive mechanism.
Once a job has been allocated some boards, these boards will be automatically powered on and left unbooted ready for use.
- Parameters:
owner (str) – Required. The name of the owner of this job.
keepalive (float or None) – The maximum number of seconds which may elapse between queries on this job before it is automatically destroyed. If None, no timeout is used. (Default: 60.0)
machine (str or None) – Specify the name of a machine which this job must be executed on. If None, the first suitable machine available will be used, according to the tags selected below. Must be None when tags are given. (Default: None)
tags (list(str) or None) – The set of tags which any machine running this job must have. If None is supplied, only machines with the “default” tag will be used. If machine is given, this argument must be None. (Default: None)
min_ratio (float) – The aspect ratio (h/w) which the allocated region must be ‘at least as square as’. Set to 0.0 for any allowable shape, 1.0 to be exactly square. Ignored when allocating single boards or specific rectangles of triads.
max_dead_boards (int or None) – The maximum number of broken or unreachable boards to allow in the allocated region. If None, any number of dead boards is permitted, as long as the board on the bottom-left corner is alive (Default: None).
max_dead_links (int or None) – The maximum number of broken links to allow in the allocated region. When require_torus is True this includes wrap-around links, otherwise peripheral links are not counted. If None, any number of broken links is allowed. (Default: None).
require_torus (bool) – If True, only allocate blocks with torus connectivity. In general this will only succeed for requests to allocate an entire machine (when the machine is otherwise not in use!). Must be False when allocating boards. (Default: False)
- Returns:
The job ID given to the newly allocated job.
- Return type:
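Because a malformed command causes the client to be disconnected, it can be worth checking create_job keyword arguments before sending them. The sketch below covers only two of the constraints stated above (owner is required; machine and tags are mutually exclusive). The function name is hypothetical and the server's own validation may be stricter.

```python
def check_create_job_kwargs(kwargs):
    """Return a list of problems found in create_job keyword arguments."""
    problems = []
    if "owner" not in kwargs:
        problems.append("owner is required")
    # machine must be None when tags are given, and vice versa.
    if kwargs.get("machine") is not None and kwargs.get("tags") is not None:
        problems.append("machine and tags may not both be given")
    return problems
```

An empty list means none of the checked constraints was violated; it does not guarantee that the server will accept the request.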
- job_keepalive(job_id)
Reset the keepalive timer for the specified job.
Note
All other job-specific commands implicitly do this.
- Parameters:
job_id (int) – A job ID to be kept alive.
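A client that holds a job for a long time usually runs the keepalive in a background thread. The following sketch assumes a hypothetical send_keepalive callable that issues the job_keepalive command, and it sends a message every keepalive/2 seconds; that margin is an assumption chosen for illustration, not something the protocol mandates.

```python
import threading


def keepalive_loop(send_keepalive, job_id, keepalive, stop_event):
    """Call send_keepalive(job_id) repeatedly until stop_event is set.

    Returns the number of keepalive messages sent. If keepalive is None
    the job has no timeout, so nothing is sent.
    """
    if keepalive is None:
        return 0
    sent = 0
    while not stop_event.is_set():
        send_keepalive(job_id)
        sent += 1
        # Sleep for half the timeout, waking early if asked to stop.
        stop_event.wait(keepalive / 2.0)
    return sent
```

Running this via threading.Thread(target=keepalive_loop, args=(...)) and setting the event before destroying the job gives a clean shutdown.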
- get_job_state(job_id)
Poll the state of a running job.
- Parameters:
job_id (int) – A job ID to get the state of.
- Return type:
- Returns:
A dictionary with the following keys:
state
JobState. The current state of the queried job.
power
bool or None. If the job is in the ready or power states, indicates whether the boards are power{ed,ing} on (True), or power{ed,ing} off (False). In other states, this value is None.
keepalive
float or None. The job’s keepalive value: the number of seconds between queries about the job before it is automatically destroyed. None if no timeout is active (or when the job has been destroyed).
reason
str or None. If the job has been destroyed, this may be a string describing the reason the job was terminated.
start_time
float or None. For queued and allocated jobs, gives the Unix time (UTC) at which the job was created (or None otherwise).
- get_job_machine_info(job_id)
Get the list of Ethernet connections to the allocated machine.
- Parameters:
job_id (int) – A job ID to get the machine info for.
- Return type:
- Returns:
A dictionary with the following keys:
width, height
int or None. The dimensions of the machine in chips, e.g. for booting. None if no boards are allocated to the job.
connections
[[[x, y], hostname], …] or None. A list mapping Ethernet-connected chip coordinates in the machine to hostnames. None if no boards are allocated to the job.
machine_name
str or None. The name of the machine the job is allocated on. None if no boards are allocated to the job.
boards
[[x, y, z], …] or None. All the boards allocated to the job, or None if no boards are allocated.
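JSON has no tuple type, so the connections value arrives as nested lists. A client will often want it keyed by chip coordinate instead. A minimal sketch, using a hypothetical helper name:

```python
def connections_to_dict(machine_info):
    """Map (x, y) chip coordinates to hostnames.

    machine_info is the dictionary returned by get_job_machine_info.
    Returns None if no boards are allocated to the job.
    """
    connections = machine_info["connections"]
    if connections is None:
        return None
    return {(x, y): hostname for (x, y), hostname in connections}
```

With this mapping, the hostname for the chip at (0, 0) is simply result[(0, 0)].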
- power_on_job_boards(job_id)
Power on (or reset if already on) boards associated with a job.
Once called, the job will enter the ‘power’ state until the power state change is complete; this may take some time.
- Parameters:
job_id (int) – A job ID to turn boards on for.
- power_off_job_boards(job_id)
Power off boards associated with a job.
Once called, the job will enter the ‘power’ state until the power state change is complete; this may take some time.
- Parameters:
job_id (int) – A job ID to turn boards off for.
- destroy_job(job_id, reason=None)
Destroy a job.
Call this when the job is finished, or to terminate it early. This function releases any resources consumed by the job and removes it from any queues.
- notify_job(job_id=None)
Register to be notified about changes to a specific job ID.
Once registered, a client will asynchronously be sent notifications of the form
{"jobs_changed": [job_id, ...]}\n
enumerating job IDs which have changed. Notifications are sent when a job changes state, for example when created, queued, powering on/off, powered on and destroyed. The specific nature of the change is not reflected in the notification.
- Parameters:
job_id (int or None) – A job ID to be notified of or None if all job state changes should be reported. Defaults to None (i.e., all jobs).
See also
no_notify_job
Stop being notified about a job.
notify_machine
Register to be notified about changes to machines.
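Since a jobs_changed notification does not say what changed, a client normally follows it up with a get_job_state query for each listed job. The sketch below builds those follow-up command lines; the function name is hypothetical and UTF-8 encoding is assumed.

```python
import json


def follow_up_commands(notification):
    """Build get_job_state command lines for each job in a notification.

    Notifications of other kinds (e.g. machines_changed) yield no commands.
    """
    lines = []
    for job_id in notification.get("jobs_changed", []):
        message = {"command": "get_job_state", "args": [job_id], "kwargs": {}}
        lines.append((json.dumps(message) + "\n").encode("utf-8"))
    return lines
```

Return values come back in FIFO order, so the responses can be matched to the job IDs in the order the commands were sent.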
- no_notify_job(job_id=None)
Stop being notified about a specific job ID.
Once this command returns, no further notifications for the specified ID will be received.
- Parameters:
job_id (int or None) – A job ID to no longer be notified of or None to not be notified of any jobs. Note that if all job IDs were registered for notification, this command only has an effect if the specified job_id is None. Defaults to None (i.e., all jobs).
See also
notify_job
Register to be notified about changes to a specific job.
- notify_machine(machine_name=None)
Register to be notified about changes to a specific machine.
Once registered, a client will asynchronously be sent notifications of the form
{"machines_changed": [machine_name, ...]}\n
enumerating machine names which have changed. Notifications are sent when a machine changes state, for example when created, changed, removed, allocated a job or an allocated job is destroyed.
- Parameters:
machine_name (str or None) – A machine name to be notified of or None if all machine state changes should be reported. Defaults to None (i.e., all machines).
See also
no_notify_machine
Stop being notified about a machine.
notify_job
Register to be notified about changes to jobs.
- no_notify_machine(machine_name=None)
Stop being notified about a specific machine name.
Once this command returns, no further notifications for the specified machine name will be received.
- Parameters:
machine_name (str or None) – A machine name to no longer be notified of or None to not be notified of any machines. Note that if all machines were registered for notification, this command only has an effect if the specified machine_name is None. Defaults to None (i.e., all machines).
See also
notify_machine
Register to be notified about changes to a machine.
- list_jobs()
Enumerate all non-destroyed jobs.
- Return type:
- Returns:
A list of allocated/queued jobs in order of creation from oldest (first) to newest (last). Each job is described by a dictionary with the following keys:
job_id
the ID of the job.
owner
the string giving the name of the job’s owner.
start_time
the time the job was created (Unix time, UTC).
keepalive
the maximum time allowed between queries for this job before it is automatically destroyed (or None if the job can remain allocated indefinitely).
state
the current JobState of the job.
power
indicates whether the boards are powered on or not. If the job is in the ready or power states, indicates whether the boards are power{ed,ing} on (True), or power{ed,ing} off (False). In other states, this value is None.
args and kwargs
the arguments to the alloc function which specifies the type/size of allocation requested and the restrictions on dead boards, links and torus connectivity.
allocated_machine_name
the name of the machine the job has been allocated to run on (or None if not allocated yet).
boards
a list [(x, y, z), …] of boards allocated to the job.
keepalivehost
the IP address of the host reckoned to be keeping this job alive (i.e., the host that did a request most recently that updated the internal keep-alive timeout).
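As an illustration of working with the list_jobs return value, the following sketch groups job IDs by owner. It relies only on the job_id and owner keys described above; the function name is hypothetical.

```python
def jobs_by_owner(jobs):
    """Group job IDs by owner, preserving the oldest-first order."""
    grouped = {}
    for job in jobs:
        grouped.setdefault(job["owner"], []).append(job["job_id"])
    return grouped
```

Because list_jobs returns jobs from oldest to newest, each owner's list is also ordered by creation time.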
- list_machines()
Enumerates all machines known to the system.
- Return type:
- Returns:
The list of machines known to the system in order of priority from highest (first) to lowest (last). Each machine is described by a dictionary with the following keys:
name
the name of the machine.
tags
the list [‘tag’, …] of tags the machine has.
width and height
the dimensions of the machine in triads.
dead_boards
a list([(x, y, z), …]) giving the coordinates of known-dead boards.
dead_links
a list([(x, y, z, link), …]) giving the locations of known-dead links from the perspective of the sender. Links to dead boards may or may not be included in this list.
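The list_machines output can be used to preview which machine a set of tags would select. The sketch below picks the highest-priority machine carrying all of the requested tags. It is a client-side approximation under the assumption that tag matching means "has every requested tag"; it ignores machine size and current load, which the server also takes into account.

```python
def first_machine_with_tags(machines, tags):
    """Return the name of the highest-priority machine having all tags.

    machines is the list returned by list_machines (highest priority
    first). Returns None if no machine carries every requested tag.
    """
    required = set(tags)
    for machine in machines:
        if required.issubset(machine["tags"]):
            return machine["name"]
    return None
```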
- get_board_position(machine_name, x, y, z)
Get the physical location of a specified board.
- Parameters:
- Returns:
The physical location of the board (cabinet, frame, board) at the specified location or None if the machine/board are not recognised.
- Return type:
- get_board_at_position(machine_name, x, y, z)
Get the logical location of a board at the specified physical location.
- Parameters:
- Returns:
The logical location of the board (a triple) at the specified location or None if the machine/board are not recognised.
- Return type:
- where_is(**kwargs)
Find out where a SpiNNaker board or chip is located, logically and physically.
May be called in one of the following styles:
>>> # Query by logical board coordinate within a machine.
>>> where_is(machine=..., x=..., y=..., z=...)
>>> # Query by physical board location within a machine.
>>> where_is(machine=..., cabinet=..., frame=..., board=...)
>>> # Query by chip coordinate (as if the machine were booted as
>>> # one large machine).
>>> where_is(machine=..., chip_x=..., chip_y=...)
>>> # Query by chip coordinate, within the boards allocated to a
>>> # job.
>>> where_is(job_id=..., chip_x=..., chip_y=...)
Only these patterns of use are supported; all keyword arguments listed above are mandatory when used for a particular query.
- Return type:
- Returns:
If a board exists at the supplied location, a dictionary giving the location of the board/chip, supplied in a number of alternative forms. If the supplied coordinates do not specify a specific chip, the chip coordinates given are those of the Ethernet connected chip on that board.
If no board exists at the supplied position, None is returned instead.
The dictionary will have the following keys:
machine
the name of the machine containing the board.
logical
the logical board coordinate, (x, y, z) within the machine.
physical
the physical board location, (cabinet, frame, board), within the machine.
chip
the coordinates of the chip, (x, y), if the whole machine were booted as a single machine.
board_chip
the coordinates of the chip, (x, y), within its board.
job_id
the job ID of the job currently allocated to the board identified, or None if the board is not allocated to a job.
job_chip
the coordinates of the chip, (x, y), within its job, if a job is allocated to the board, or None otherwise.
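Since where_is accepts only four keyword patterns and every keyword in a pattern is mandatory, a client can check a query before sending it. A minimal sketch, with hypothetical names, that compares the supplied keyword names against the four documented patterns:

```python
# The four supported keyword-argument patterns for where_is.
WHERE_IS_PATTERNS = [
    {"machine", "x", "y", "z"},
    {"machine", "cabinet", "frame", "board"},
    {"machine", "chip_x", "chip_y"},
    {"job_id", "chip_x", "chip_y"},
]


def is_valid_where_is_query(kwargs):
    """True if the keyword names exactly match a supported pattern."""
    return set(kwargs) in WHERE_IS_PATTERNS
```

This checks only the keyword names, not whether the values refer to a board or job that exists; for those cases the server returns None as described above.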