Communication Protocol

The SpiNNaker partition server uses a simple JSON-based protocol over TCP to communicate with clients. Like the SpiNNaker hardware itself, the protocol has no security features whatsoever, so the server is assumed to operate within the same trusted network as the boards it manages.

By default, the server listens on TCP port 22244. The client and server communicate by exchanging newline-delimited (\n) JSON objects, i.e. lines of the form {...}. Clients send commands to the server, and the server replies with each command's return value. If the client requests it, the server may also send asynchronous notifications.
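TCP delivers a byte stream rather than discrete messages, so a single read from the socket may contain part of a line or several lines at once. The sketch below shows one way a client might reassemble the stream into JSON objects. It assumes UTF-8 on the wire, and `LineDecoder` is a hypothetical name, not part of any published client.

```python
import json


class LineDecoder:
    """Accumulate raw bytes from a TCP socket and return complete JSON objects."""

    def __init__(self):
        self._buffer = b""

    def feed(self, data):
        """Add newly received bytes; return a list of any complete objects."""
        self._buffer += data
        objects = []
        # Every complete message ends with b"\n"; anything after the last
        # newline is a partial message and stays buffered for the next call.
        while b"\n" in self._buffer:
            line, self._buffer = self._buffer.split(b"\n", 1)
            if line.strip():
                objects.append(json.loads(line.decode("utf-8")))
        return objects
```

A client would pass each chunk returned by `socket.recv()` to `feed()` and handle the returned objects in order.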

As soon as a client connects to the server, it should send a version() command and check that the server’s version is compatible with the client.

Sending Commands

Commands map directly onto Python function calls: a command has a name and arguments, and returns a value. To send a command, a client must send a JSON object with the following keys, followed by a newline:

“command”
A string. The name of the command to be executed.
“args”
An array. A list of positional arguments for the command.
“kwargs”
An object. A mapping from keyword-argument names to values.

For example, if a client sent the following to the server:

{"command": "create_job", "args": [4, 2], "kwargs": {"owner": "me"}}\n

This would be interpreted as a function call like:

create_job(4, 2, owner="me")

Note

In all examples, \n means a newline character (ASCII 10), not the \ and n characters.
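As a sketch, the encoding step might look like the following. `encode_command` is a hypothetical helper and UTF-8 is assumed as the wire encoding. With Python's default `json.dumps` separators, calling it as `encode_command("create_job", 4, 2, owner="me")` reproduces the example line above byte-for-byte.

```python
import json


def encode_command(command, *args, **kwargs):
    """Serialise a command as one newline-terminated JSON line (UTF-8 assumed)."""
    message = {"command": command, "args": list(args), "kwargs": kwargs}
    return (json.dumps(message) + "\n").encode("utf-8")
```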

The server will then respond with a JSON object with a single key, “return”, whose value is the value returned by the command. For example:

{"return": 42}\n

Commands are processed and return values sent in FIFO order. No blocking commands are implemented by the server and the server will make a best-effort attempt to respond to all commands as quickly as possible. If any command is malformed or causes an error for any reason, the client is immediately disconnected.

Receiving Asynchronous Notifications

If the client requests to be notified of certain events (using a command, as described above), the server may send a JSON object to the client which does not contain the key “return”. Notifications may be sent at any time, once requested, including between a function being called and the return value being sent. The exact format of the notification depends on its type. An example notification may look like the following:

{"jobs_changed": [42, 10, 3]}\n
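Because notifications can arrive while a client is waiting for a return value, a client should not assume that the next line it reads is the reply. The following minimal sketch assumes a blocking, file-like connection (for example one obtained with `socket.makefile("rwb")`). The `call` helper and its `notifications` list are hypothetical.

```python
import json


def call(conn, notifications, command, *args, **kwargs):
    """Send one command and block until its return value arrives.

    ``conn`` is a binary file-like object wrapping the connection. Any
    notifications that arrive before the reply are appended to
    ``notifications`` instead of being discarded.
    """
    line = json.dumps({"command": command, "args": list(args), "kwargs": kwargs})
    conn.write((line + "\n").encode("utf-8"))
    conn.flush()
    while True:
        raw = conn.readline()
        if not raw:
            # The server disconnects clients whose commands are malformed or fail.
            raise ConnectionError("server closed the connection")
        message = json.loads(raw.decode("utf-8"))
        if "return" in message:
            return message["return"]
        notifications.append(message)
```

The server replies in FIFO order, so a client that sends one command at a time can treat the first object containing "return" as the reply to that command. For example, `call(conn, notes, "version")` performs the version check recommended when first connecting.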

Available Commands

The commands supported by the server are enumerated below and expressed in the form of Python functions.

class JobState

A job may be in any of the following (numbered) states.

Number  State
0       unknown
1       queued
2       power
3       ready
4       destroyed
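A client may find it convenient to mirror these numbers locally. The following sketch uses Python's `enum.IntEnum`. It is a client-side assumption, not the server's own class.

```python
from enum import IntEnum


class JobState(IntEnum):
    """Client-side mirror of the server's numbered job states."""
    unknown = 0
    queued = 1
    power = 2
    ready = 3
    destroyed = 4
```

Because `IntEnum` members compare equal to plain integers, they can be compared directly with the "state" field returned by get_job_state().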

version()
Returns:

str

The server’s version number.

create_job(*args, **kwargs)

Create a new job (i.e. allocation of boards).

This function should be called in one of the following styles:

# Any single (SpiNN-5) board
job_id = create_job(owner="me")
job_id = create_job(1, owner="me")

# Board x=3, y=2, z=1 on the machine named "m"
job_id = create_job(3, 2, 1, machine="m", owner="me")

# Any machine with at least 4 boards
job_id = create_job(4, owner="me")

# Any 7-or-more board machine with an aspect ratio at least as
# square as 1:2
job_id = create_job(7, min_ratio=0.5, owner="me")

# Any 4x5 triad segment of a machine (may or may not be a
# torus/full machine)
job_id = create_job(4, 5, owner="me")

# Any torus-connected (full machine) 4x2 machine
job_id = create_job(4, 2, require_torus=True, owner="me")

The ‘other parameters’ enumerated below may be used to further restrict what machines the job may be allocated onto.

Jobs for which no suitable machines are available are immediately destroyed (and the reason given).

Once a job has been created, it must be ‘kept alive’ by a simple watchdog mechanism: the client must either periodically call the job_keepalive() command or call any other job-specific command. Jobs are culled if no keepalive is received within keepalive seconds. If absolutely necessary, a job’s keepalive value may be set to None, disabling the keepalive mechanism.
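A long-running client might keep a job alive from a background thread. The sketch below assumes a hypothetical `call(command, *args)` function that sends a command and waits for its reply, and that is safe to use from another thread (for example, guarded by a lock or using its own connection). Sending at half the keepalive period is an arbitrary safety margin, not a documented requirement.

```python
import threading


def start_keepalive(call, job_id, keepalive, stop_event):
    """Call job_keepalive(job_id) every keepalive/2 seconds until stopped."""
    if keepalive is None:
        return None  # keepalive disabled for this job: nothing to do
    interval = keepalive / 2.0  # stay well inside the server's timeout

    def loop():
        # Event.wait() returns False on timeout and True once the event is set.
        while not stop_event.wait(interval):
            call("job_keepalive", job_id)

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```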

Once a job has been allocated some boards, these boards will be automatically powered on and left unbooted, ready for use.

Parameters:

owner : str

Required. The name of the owner of this job.

keepalive : float or None

Optional. The maximum number of seconds which may elapse between queries on this job before it is automatically destroyed. If None, no timeout is used. (Default: 60.0)

Returns:

int

The job ID given to the newly allocated job.

Other Parameters:
 

machine : str or None

Optional. Specify the name of a machine which this job must be executed on. If None, the first suitable machine available will be used, according to the tags selected below. Must be None when tags are given. (Default: None)

tags : [str, ...] or None

Optional. The set of tags which any machine running this job must have. If None is supplied, only machines with the “default” tag will be used. If machine is given, this argument must be None. (Default: None)

min_ratio : float

The aspect ratio (h/w) which the allocated region must be ‘at least as square as’. Set to 0.0 for any allowable shape, 1.0 to be exactly square. Ignored when allocating single boards or specific rectangles of triads.

max_dead_boards : int or None

The maximum number of broken or unreachable boards to allow in the allocated region. If None, any number of dead boards is permitted, as long as the board in the bottom-left corner is alive. (Default: None)

max_dead_links : int or None

The maximum number of broken links allowed in the allocated region. When require_torus is True this includes wrap-around links; otherwise peripheral links are not counted. If None, any number of broken links is allowed. (Default: None)

require_torus : bool

If True, only allocate blocks with torus connectivity. In general this will only succeed for requests to allocate an entire machine (when the machine is otherwise not in use!). Must be False when allocating boards. (Default: False)
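Any command that causes an error gets the client disconnected, so it can be worth checking the documented constraints before sending create_job. The following sketch covers only the rules stated above: owner is required, and machine and tags cannot both be given. `check_create_job_kwargs` is hypothetical, and the server may enforce further rules.

```python
def check_create_job_kwargs(kwargs):
    """Raise ValueError if kwargs break a documented create_job() rule."""
    if "owner" not in kwargs:
        raise ValueError("create_job() requires an owner")
    if kwargs.get("machine") is not None and kwargs.get("tags") is not None:
        raise ValueError("give either machine or tags, not both")
    return kwargs
```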

job_keepalive(job_id)

Reset the keepalive timer for the specified job.

Note that all other job-specific commands implicitly do this.

get_job_state(job_id)

Poll the state of a running job.

Returns:

{“state”: state, “power”: power, “keepalive”: keepalive, “reason”: reason, “start_time”: start_time}

Where:

state : JobState

The current state of the queried job.

power : bool or None

If the job is in the ready or power states, this is True if the boards are powered (or being powered) on and False if they are powered (or being powered) off. In other states, this value is None.

keepalive : float or None

The job’s keepalive value: the maximum number of seconds allowed between queries about the job before it is automatically destroyed. None if no timeout is active (or when the job has been destroyed).

reason : str or None

If the job has been destroyed, this may be a string describing the reason the job was terminated.

start_time : float or None

For queued and allocated jobs, gives the Unix time (UTC) at which the job was created (or None otherwise).
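A simple way to wait for an allocation is to poll this command. The sketch below assumes a hypothetical `call(command, *args)` helper that returns a command's return value. It uses the state numbers from the JobState table. Each poll is a job-specific command, so it also resets the job's keepalive timer.

```python
import time

READY, DESTROYED = 3, 4  # JobState numbers


def wait_for_job(call, job_id, poll_interval=1.0, timeout=None):
    """Poll get_job_state until the job is ready; raise if it is destroyed."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        state = call("get_job_state", job_id)
        if state["state"] == DESTROYED:
            raise RuntimeError("job %d destroyed: %s" % (job_id, state["reason"]))
        if state["state"] == READY:
            return state
        if deadline is not None and time.monotonic() > deadline:
            raise TimeoutError("job %d not ready after %s s" % (job_id, timeout))
        time.sleep(poll_interval)
```

Registering with notify_job() and re-querying only when a "jobs_changed" notification names the job avoids polling altogether.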

get_job_machine_info(job_id)

Get the list of Ethernet connections to the allocated machine.

Returns:

{“width”: width, “height”: height, “connections”: connections, “machine_name”: machine_name, “boards”: boards}

Where:

width, height : int or None

The dimensions of the machine in chips, e.g. for booting.

None if no boards are allocated to the job.

connections : [[[x, y], hostname], ...] or None

A list mapping the (x, y) coordinates of each Ethernet-connected chip in the machine to that chip’s hostname.

None if no boards are allocated to the job.

machine_name : str or None

The name of the machine the job is allocated on.

None if no boards are allocated to the job.

boards : [[x, y, z], ...] or None

The logical coordinates of all boards allocated to the job, or None if no boards are allocated.
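The connections list is often easier to use as a lookup table. In the following sketch, `connections_by_chip` is a hypothetical helper. It returns an empty mapping when no boards are allocated.

```python
def connections_by_chip(machine_info):
    """Map each Ethernet-connected chip's (x, y) to its hostname."""
    connections = machine_info["connections"] or []  # None => no boards yet
    # JSON has no tuples, so the [x, y] lists are converted to hashable keys.
    return {tuple(chip): hostname for chip, hostname in connections}
```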

power_on_job_boards(job_id)

Power on (or reset if already on) boards associated with a job.

Once called, the job will enter the ‘power’ state until the power state change is complete; this may take some time.

power_off_job_boards(job_id)

Power off boards associated with a job.

Once called, the job will enter the ‘power’ state until the power state change is complete; this may take some time.

destroy_job(job_id, reason=None)

Destroy a job.

Call this when the job is finished, or to terminate it early. It releases any resources consumed by the job and removes it from any queues.

Parameters:

reason : str or None

Optional. A human-readable string describing the reason for the job’s destruction.

notify_job(job_id=None)

Register to be notified about changes to a specific job ID.

Once registered, the client will be sent asynchronous notifications of the form {"jobs_changed": [job_id, ...]}\n enumerating the IDs of jobs which have changed. Notifications are sent when a job changes state, for example when it is created, queued, powering on/off, powered on or destroyed. The specific nature of the change is not reflected in the notification.

Parameters:

job_id : int or None

A job ID to be notified of or None if all job state changes should be reported.

See also

no_notify_job
Stop being notified about a job.
notify_machine
Register to be notified about changes to machines.

no_notify_job(job_id=None)

Stop being notified about a specific job ID.

Once this command returns, no further notifications for the specified ID will be received.

Parameters:

job_id : int or None

A job ID to no longer be notified of or None to not be notified of any jobs. Note that if all job IDs were registered for notification, this command only has an effect if the specified job_id is None.

See also

notify_job
Register to be notified about changes to a specific job.

notify_machine(machine_name=None)

Register to be notified about changes to a specific machine.

Once registered, the client will be sent asynchronous notifications of the form {"machines_changed": [machine_name, ...]}\n enumerating the names of machines which have changed. Notifications are sent when a machine changes state, for example when it is created, changed or removed, when it is allocated a job, or when an allocated job is destroyed.

Parameters:

machine_name : str or None

A machine name to be notified of or None if all machine state changes should be reported.

See also

no_notify_machine
Stop being notified about a machine.
notify_job
Register to be notified about changes to jobs.
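Notifications only say that something changed, so a client typically routes each one to a handler that re-queries the server. In the following sketch, `dispatch_notification` and its handler arguments are hypothetical. Keys other than the two documented ones are ignored.

```python
def dispatch_notification(message, on_jobs_changed, on_machines_changed):
    """Route one asynchronous notification to the matching handler.

    Handlers receive the list of changed job IDs or machine names and would
    typically call get_job_state() or list_machines() to learn what changed.
    """
    if "jobs_changed" in message:
        on_jobs_changed(message["jobs_changed"])
    if "machines_changed" in message:
        on_machines_changed(message["machines_changed"])
```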

no_notify_machine(machine_name=None)

Stop being notified about a specific machine.

Once this command returns, no further notifications for the specified machine will be received.

Parameters:

machine_name : str or None

A machine name to no longer be notified of or None to not be notified of any machines. Note that if all machines were registered for notification, this command only has an effect if the specified machine_name is None.

See also

notify_machine
Register to be notified about changes to a machine.

list_jobs()

Enumerate all non-destroyed jobs.

Returns:

jobs : [{...}, ...]

A list of allocated/queued jobs in order of creation from oldest (first) to newest (last). Each job is described by a dictionary with the following keys:

“job_id” is the ID of the job.

“owner” is the string giving the name of the Job’s owner.

“start_time” is the time the job was created (Unix time, UTC).

“keepalive” is the maximum time allowed between queries for this job before it is automatically destroyed (or None if the job can remain allocated indefinitely).

“state” is the current JobState of the job.

“power” indicates whether the boards are powered on. If the job is in the ready or power states, it is True if the boards are powered (or being powered) on and False if they are powered (or being powered) off. In other states, this value is None.

“args” and “kwargs” are the arguments passed to create_job(), which specify the type/size of allocation requested and the restrictions on dead boards, links and torus connectivity.

“allocated_machine_name” is the name of the machine the job has been allocated to run on (or None if not allocated yet).

“boards” is a list [(x, y, z), ...] of boards allocated to the job.
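As an illustration of working with this result, the sketch below totals allocated boards per machine. `boards_in_use` is a hypothetical helper. It treats a missing machine name or board list (for example, for a queued job) as nothing allocated.

```python
from collections import Counter


def boards_in_use(jobs):
    """Count allocated boards per machine from a list_jobs() result."""
    usage = Counter()
    for job in jobs:
        machine = job["allocated_machine_name"]
        if machine is not None:  # queued jobs have no machine yet
            usage[machine] += len(job["boards"] or [])
    return dict(usage)
```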

list_machines()

Enumerate all machines known to the system.

Returns:

machines : [{...}, ...]

The list of machines known to the system in order of priority from highest (first) to lowest (last). Each machine is described by a dictionary with the following keys:

“name” is the name of the machine.

“tags” is the list [‘tag’, ...] of tags the machine has.

“width” and “height” are the dimensions of the machine in triads.

“dead_boards” is a list [(x, y, z), ...] giving the coordinates of known-dead boards.

“dead_links” is a list [(x, y, z, link), ...] giving the locations of known-dead links from the perspective of the sender. Links to dead boards may or may not be included in this list.
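Since each triad contains three boards (z = 0, 1, 2), a machine's total board count follows from its width and height in triads. In the following sketch, `live_board_count` is a hypothetical helper. It subtracts only the boards listed in dead_boards and ignores dead links.

```python
def live_board_count(machine):
    """Number of boards not listed as dead in a list_machines() entry."""
    total = machine["width"] * machine["height"] * 3  # three boards per triad
    return total - len(machine["dead_boards"])
```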

get_board_position(machine_name, x, y, z)

Get the physical location of a specified board.

Parameters:

machine_name : str

The name of the machine containing the board.

x, y, z : int

The logical board location within the machine.

Returns:

(cabinet, frame, board) or None

The physical location of the board at the specified location or None if the machine/board are not recognised.

get_board_at_position(machine_name, cabinet, frame, board)

Get the logical location of a board at the specified physical location.

Parameters:

machine_name : str

The name of the machine containing the board.

cabinet, frame, board : int

The physical board location within the machine.

Returns:

(x, y, z) or None

The logical location of the board at the specified location or None if the machine/board are not recognised.

where_is(**kwargs)

Find out where a SpiNNaker board or chip is located, logically and physically.

May be called in one of the following styles:

>>> # Query by logical board coordinate within a machine.
>>> where_is(machine=..., x=..., y=..., z=...)

>>> # Query by physical board location within a machine.
>>> where_is(machine=..., cabinet=..., frame=..., board=...)

>>> # Query by chip coordinate (as if the machine were booted as
>>> # one large machine).
>>> where_is(machine=..., chip_x=..., chip_y=...)

>>> # Query by chip coordinate, within the boards allocated to a
>>> # job.
>>> where_is(job_id=..., chip_x=..., chip_y=...)

Returns:

{“machine”: ..., “logical”: ..., “physical”: ..., “chip”: ..., “board_chip”: ..., “job_chip”: ..., “job_id”: ...} or None

If a board exists at the supplied location, a dictionary giving the location of the board/chip in a number of alternative forms. If the supplied coordinates do not identify a particular chip, the chip coordinates given are those of the Ethernet-connected chip on that board.

If no board exists at the supplied position, None is returned instead.

“machine” is the name of the machine containing the board.

“logical” is the logical board coordinate, (x, y, z), within the machine.

“physical” is the physical board location, (cabinet, frame, board), within the machine.

“chip” is the coordinate of the chip, (x, y), if the whole machine were booted as a single machine.

“board_chip” is the coordinate of the chip, (x, y), within its board.

“job_id” is the ID of the job currently allocated to the board, or None if the board is not allocated to a job.

“job_chip” is the coordinate of the chip, (x, y), within its job, or None if no job is allocated to the board.
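Because an erroneous command gets the client disconnected, a client may want to check that its where_is() arguments match one of the four documented styles before sending them. In the following sketch, `WHERE_IS_STYLES` and `check_where_is_kwargs` are hypothetical names.

```python
# The four query styles accepted by where_is(), as documented above.
WHERE_IS_STYLES = [
    {"machine", "x", "y", "z"},
    {"machine", "cabinet", "frame", "board"},
    {"machine", "chip_x", "chip_y"},
    {"job_id", "chip_x", "chip_y"},
]


def check_where_is_kwargs(**kwargs):
    """Return kwargs unchanged if they match exactly one documented style."""
    if set(kwargs) not in WHERE_IS_STYLES:
        raise ValueError("unsupported where_is() arguments: %s" % sorted(kwargs))
    return kwargs
```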