Communication Protocol

The SpiNNaker partition server uses a simple JSON-based protocol over TCP to communicate with clients. The protocol has no security features whatsoever, just like SpiNNaker hardware, and it is assumed that the server is operated within the same trusted network as the boards it manages.

By default, the server listens on TCP port number 22244. The client and server communicate by sending and receiving newline (\n) delimited JSON objects (i.e. lines of the form {...}). Clients may send commands to the server, to which the server responds with return values. The server may also asynchronously send notifications to the client, if requested by the client.

As soon as a client connects to the server it should send a version() command and ensure the server version is compatible with the client.
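The compatibility check might be sketched as below. The protocol only states that version() returns a string, so the dotted-integer format, the helper names parse_version and is_compatible, and the version bounds are all assumptions made for illustration.

```python
def parse_version(version):
    """Split a dotted version string such as "1.2.3" into a tuple of ints.

    Assumption: the server reports dotted integers. The protocol
    description does not guarantee this format.
    """
    return tuple(int(part) for part in version.split("."))


def is_compatible(server_version, minimum, maximum):
    """Return True if minimum <= server_version < maximum."""
    return minimum <= parse_version(server_version) < maximum


# Hypothetical bounds chosen by the client author.
print(is_compatible("1.0.0", (0, 4, 0), (2, 0, 0)))
```

A client would pass the string returned by the version() command as server_version and disconnect if the check fails.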

Sending Commands

Commands map exactly to Python function calls. A command has a name and arguments and returns a value. To send a command, a client must send a JSON object with the following keys, followed by a newline:

“command”

A string. The name of the command to be executed.

“args”

An array. A list of positional arguments for the command.

“kwargs”

An object. A mapping from keyword argument names to their values for the command.

For example, if a client sent the following to the server:

{"command": "create_job", "args": [4, 2], "kwargs": {"owner": "me"}}\n

This would be interpreted as a function call like:

create_job(4, 2, owner="me")

Note

In all examples, \n means a newline character (ASCII 10), not the \ and n characters.

The server will then respond with a JSON object with a single key, “return”, whose value is the value returned by the command. For example:

{"return": 42}\n

Commands are processed and return values sent in FIFO order. No blocking commands are implemented by the server and the server will make a best-effort attempt to respond to all commands as quickly as possible. If any command is malformed or causes an error for any reason, the client is immediately disconnected.
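The command/response exchange described above could be implemented along the following lines. This is a minimal sketch: the helper names encode_command, call and connect are hypothetical, and call assumes that no notifications have been requested, so the next line received is always the response.

```python
import json
import socket


def encode_command(command, *args, **kwargs):
    """Serialise a command as one newline-terminated JSON object."""
    message = {"command": command, "args": list(args), "kwargs": kwargs}
    # json.dumps escapes newlines inside strings, so the only raw newline
    # in the output is the terminator appended here.
    return json.dumps(message).encode("utf-8") + b"\n"


def call(sock_file, command, *args, **kwargs):
    """Send a command and return the value of the "return" key.

    Assumes no notifications are pending on this connection.
    """
    sock_file.write(encode_command(command, *args, **kwargs))
    sock_file.flush()
    line = sock_file.readline()
    if not line:
        # The server disconnects clients that send malformed commands.
        raise ConnectionError("server closed the connection")
    return json.loads(line)["return"]


def connect(hostname, port=22244):
    """Open a TCP connection and wrap it in a binary file object."""
    sock = socket.create_connection((hostname, port))
    return sock.makefile("rwb")
```

With these helpers, the create_job example above would be sent as call(connection, "create_job", 4, 2, owner="me"), where connection is the object returned by connect().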

Receiving Asynchronous Notifications

If the client requests to be notified of certain events (using a command, as described above), the server may send a JSON object to the client which does not contain the key “return”. Notifications may be sent at any time, once requested, including between a function being called and the return value being sent. The exact format of the notification depends on its type. An example notification may look like the following:

{"jobs_changed": [42, 10, 3]}\n
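Because notifications may arrive between a command and its return value, a client has to separate the two kinds of message. One way to do this is sketched below; the function name read_return is hypothetical, and the only rule it relies on is the one stated above, that a response contains the key "return" and a notification does not.

```python
import json


def read_return(lines):
    """Read lines until a return value arrives.

    `lines` is any iterable of newline-delimited JSON lines, for example
    a socket file object. Returns (return_value, notifications), where
    notifications lists the objects received before the return value.
    """
    notifications = []
    for line in lines:
        message = json.loads(line)
        if "return" in message:
            return message["return"], notifications
        notifications.append(message)
    raise ConnectionError("connection closed before a return value arrived")


incoming = [b'{"jobs_changed": [42, 10, 3]}\n', b'{"return": null}\n']
value, pending = read_return(incoming)
print(value, pending)
```

Collecting the notifications rather than discarding them lets the caller process them once the return value has been handled.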

Available Commands

The commands supported by the server are enumerated below and expressed in the form of Python functions.

version()
Return type:

str

Returns:

The server’s version number.

create_job(*args, **kwargs)

Create a new job (i.e. allocation of boards).

This function should be called in one of the following styles:

# Any single (SpiNN-5) board
job_id = create_job(owner="me")
job_id = create_job(1, owner="me")

# Board x=3, y=2, z=1 on the machine named "m"
job_id = create_job(3, 2, 1, machine="m", owner="me")

# Any machine with at least 4 boards
job_id = create_job(4, owner="me")

# Any 7-or-more board machine with an aspect ratio at least as
# square as 1:2
job_id = create_job(7, min_ratio=0.5, owner="me")

# Any 4x5 triad segment of a machine (may or may not be a
# torus/full machine)
job_id = create_job(4, 5, owner="me")

# Any torus-connected (full machine) 4x2 machine
job_id = create_job(4, 2, require_torus=True, owner="me")

The ‘other parameters’ enumerated below may be used to further restrict what machines the job may be allocated onto.

Jobs for which no suitable machines are available are immediately destroyed (and the reason given).

Once a job has been created, it must be ‘kept alive’ by a simple watchdog mechanism. Jobs may be kept alive by periodically calling the job_keepalive() command or by calling any other job-specific command. Jobs are culled if no keep alive message is received for keepalive seconds. If absolutely necessary, a job’s keepalive value may be set to None, disabling the keepalive mechanism.

Once a job has been allocated some boards, these boards will be automatically powered on and left unbooted ready for use.

Parameters:
  • owner (str) – Required. The name of the owner of this job.

  • keepalive (float or None) – The maximum number of seconds which may elapse between queries on this job before it is automatically destroyed. If None, no timeout is used. (Default: 60.0)

  • machine (str or None) – Specify the name of a machine which this job must be executed on. If None, the first suitable machine available will be used, according to the tags selected below. Must be None when tags are given. (Default: None)

  • tags (list(str) or None) – The set of tags which any machine running this job must have. If None is supplied, only machines with the “default” tag will be used. If machine is given, this argument must be None. (Default: None)

  • min_ratio (float) – The aspect ratio (h/w) which the allocated region must be ‘at least as square as’. Set to 0.0 for any allowable shape, 1.0 to be exactly square. Ignored when allocating single boards or specific rectangles of triads.

  • max_dead_boards (int or None) – The maximum number of broken or unreachable boards to allow in the allocated region. If None, any number of dead boards is permitted, as long as the board on the bottom-left corner is alive. (Default: None)

  • max_dead_links (int or None) – The maximum number of broken links to allow in the allocated region. When require_torus is True this includes wrap-around links, otherwise peripheral links are not counted. If None, any number of broken links is allowed. (Default: None)

  • require_torus (bool) – If True, only allocate blocks with torus connectivity. In general this will only succeed for requests to allocate an entire machine (when the machine is otherwise not in use!). Must be False when allocating boards. (Default: False)

Returns:

The job ID given to the newly allocated job.

Return type:

int
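Since a malformed command causes the client to be disconnected, it can be worth checking create_job arguments before sending them. The sketch below builds the command object and enforces the documented rule that machine and tags are mutually exclusive. The function name create_job_command is hypothetical, and the limit of three positional arguments is inferred from the call styles listed above rather than stated by the server.

```python
def create_job_command(*args, owner, machine=None, tags=None, **other):
    """Build a create_job command object, checking documented constraints."""
    # Inferred from the listed call styles: zero to three positional args.
    if len(args) > 3:
        raise ValueError("create_job takes at most three positional arguments")
    # Documented: machine must be None when tags are given, and vice versa.
    if machine is not None and tags is not None:
        raise ValueError("machine and tags must not both be given")
    kwargs = dict(other, owner=owner)
    if machine is not None:
        kwargs["machine"] = machine
    if tags is not None:
        kwargs["tags"] = list(tags)
    return {"command": "create_job", "args": list(args), "kwargs": kwargs}


print(create_job_command(7, min_ratio=0.5, owner="me"))
```

Making owner a required keyword-only parameter mirrors the fact that it is the only required parameter of the command.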

job_keepalive(job_id)

Reset the keepalive timer for the specified job.

Note

All other job-specific commands implicitly do this.

Parameters:

job_id (int) – A job ID to be kept alive.
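A client that may go for long periods without issuing job-specific commands could keep its job alive from a background thread, as in the sketch below. The class name KeepaliveThread and the send_keepalive callable are hypothetical, and refreshing at half the keepalive period is an assumed safety margin, not something the server mandates. No such thread is needed when the job's keepalive is None.

```python
import threading


class KeepaliveThread(threading.Thread):
    """Call send_keepalive() periodically until stop() is called."""

    def __init__(self, send_keepalive, keepalive):
        super().__init__(daemon=True)
        self._send_keepalive = send_keepalive
        # Assumption: refreshing at half the timeout leaves a safety margin.
        self._interval = keepalive / 2.0
        self._stopped = threading.Event()

    def run(self):
        # Event.wait() returns False on timeout and True once stop() is
        # called, so the loop ends promptly when the thread is stopped.
        while not self._stopped.wait(self._interval):
            self._send_keepalive()

    def stop(self):
        self._stopped.set()
        self.join()
```

Here send_keepalive would typically send the job_keepalive command for one job. If the same connection is also used from another thread, access to it needs to be serialised (for example with a lock) so that commands and responses do not interleave.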

get_job_state(job_id)

Poll the state of a running job.

Parameters:

job_id (int) – A job ID to get the state of.

Return type:

dict(str, …)

Returns:

A dictionary with the following keys:

state (JobState)

The current state of the queried job.

power (bool or None)

If the job is in the ready or power states, indicates whether the boards are power{ed,ing} on (True), or power{ed,ing} off (False). In other states, this value is None.

keepalive (float or None)

The Job’s keepalive value: the number of seconds between queries about the job before it is automatically destroyed. None if no timeout is active (or when the job has been destroyed).

reason (str or None)

If the job has been destroyed, this may be a string describing the reason the job was terminated.

start_time (float or None)

For queued and allocated jobs, gives the Unix time (UTC) at which the job was created (or None otherwise).
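A common use of get_job_state is to wait for a job's boards to become ready. The following sketch polls until the job reaches the ready state, using the state numbers listed under JobState. It assumes that the state is transmitted as its integer number, that get_job_state is a zero-argument callable supplied by the caller which returns the dictionary described above, and that the unknown state should be treated as an error; none of these is confirmed by the protocol description.

```python
import time
from enum import IntEnum


class JobState(IntEnum):
    unknown = 0
    queued = 1
    power = 2
    ready = 3
    destroyed = 4


def wait_until_ready(get_job_state, poll_interval=1.0, sleep=time.sleep):
    """Poll until the job is ready; raise if it can never become ready."""
    while True:
        info = get_job_state()
        # Assumption: the "state" value is the integer state number.
        state = JobState(info["state"])
        if state == JobState.ready:
            return info
        if state in (JobState.destroyed, JobState.unknown):
            raise RuntimeError(
                "job is {}: {}".format(state.name, info["reason"]))
        sleep(poll_interval)
```

Because get_job_state is a job-specific command, each poll also resets the job's keepalive timer, so a poll interval shorter than the keepalive period keeps the job alive while waiting.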

get_job_machine_info(job_id)

Get the list of Ethernet connections to the allocated machine.

Parameters:

job_id (int) – A job ID to get the machine info for.

Return type:

dict(str, …)

Returns:

A dictionary with the following keys:

width, height (int or None)

The dimensions of the machine in chips, e.g. for booting. None if no boards are allocated to the job.

connections ([[[x, y], hostname], …] or None)

A list mapping the coordinates of each Ethernet-connected chip in the machine to its hostname. None if no boards are allocated to the job.

machine_name (str or None)

The name of the machine the job is allocated on. None if no boards are allocated to the job.

boards ([[x, y, z], …] or None)

All the boards allocated to the job or None if no boards allocated.
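The connections value is a list of pairs rather than a JSON object, so a client will usually want to convert it into a lookup table. A minimal sketch follows; the function name connections_to_dict and the hostnames and dimensions in the example are made up for illustration.

```python
def connections_to_dict(machine_info):
    """Map (x, y) chip coordinates to hostnames, or return None."""
    connections = machine_info["connections"]
    if connections is None:
        # No boards are allocated to the job yet.
        return None
    # JSON arrays decode to lists, which cannot be dict keys; use tuples.
    return {tuple(chip): hostname for chip, hostname in connections}


# Hypothetical response for a job allocated three boards.
info = {
    "width": 12,
    "height": 12,
    "connections": [
        [[0, 0], "board-a.example"],
        [[4, 8], "board-b.example"],
        [[8, 4], "board-c.example"],
    ],
    "machine_name": "m",
    "boards": [[0, 0, 0], [0, 0, 1], [0, 0, 2]],
}
hosts = connections_to_dict(info)
print(hosts[(0, 0)])
```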

power_on_job_boards(job_id)

Power on (or reset if already on) boards associated with a job.

Once called, the job will enter the ‘power’ state until the power state change is complete; this may take some time.

Parameters:

job_id (int) – A job ID to turn boards on for.

power_off_job_boards(job_id)

Power off boards associated with a job.

Once called, the job will enter the ‘power’ state until the power state change is complete; this may take some time.

Parameters:

job_id (int) – A job ID to turn boards off for.

destroy_job(job_id, reason=None)

Destroy a job.

Call this when the job is finished, or to terminate it early. This function releases any resources consumed by the job and removes it from any queues.

Parameters:
  • job_id (int) – A job ID to destroy.

  • reason (str) – An optional human-readable description of the reason for the job’s destruction.
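To make sure a job is always released, a client might wrap job creation and destruction in a context manager, as sketched below. The name job and the call parameter are hypothetical: call is assumed to be a function call(command, *args, **kwargs) that sends one command to the server and returns its return value. The reason strings are illustrative.

```python
from contextlib import contextmanager


@contextmanager
def job(call, *args, **kwargs):
    """Create a job and destroy it when the with-block exits."""
    job_id = call("create_job", *args, **kwargs)
    reason = "finished"
    try:
        yield job_id
    except BaseException as exc:
        # Record why the job ended early, then let the error propagate.
        reason = "client error: {!r}".format(exc)
        raise
    finally:
        call("destroy_job", job_id, reason=reason)
```

A client would then write "with job(call, 4, owner="me") as job_id:" and use the job inside the block. If the client crashes without running the cleanup, the keepalive mechanism still reclaims the job eventually.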

notify_job(job_id=None)

Register to be notified about changes to a specific job ID.

Once registered, a client will asynchronously be sent notifications of the form {"jobs_changed": [job_id, ...]}\n enumerating job IDs which have changed. Notifications are sent when a job changes state, for example when created, queued, powering on/off, powered on and destroyed. The specific nature of the change is not reflected in the notification.

Parameters:

job_id (int or None) – A job ID to be notified of or None if all job state changes should be reported. Defaults to None (i.e., all jobs).

See also

no_notify_job

Stop being notified about a job.

notify_machine

Register to be notified about changes to machines.

no_notify_job(job_id=None)

Stop being notified about a specific job ID.

Once this command returns, no further notifications for the specified ID will be received.

Parameters:

job_id (int or None) – A job ID to no longer be notified of or None to not be notified of any jobs. Note that if all job IDs were registered for notification, this command only has an effect if the specified job_id is None. Defaults to None (i.e., all jobs).

See also

notify_job

Register to be notified about changes to a specific job.

notify_machine(machine_name=None)

Register to be notified about a specific machine name.

Once registered, a client will asynchronously be sent notifications of the form {"machines_changed": [machine_name, ...]}\n enumerating machine names which have changed. Notifications are sent when a machine changes state, for example when created, changed, removed, allocated a job or an allocated job is destroyed.

Parameters:

machine_name (str or None) – A machine name to be notified of or None if all machine state changes should be reported. Defaults to None (i.e., all machines).

See also

no_notify_machine

Stop being notified about a machine.

notify_job

Register to be notified about changes to jobs.

no_notify_machine(machine_name=None)

Stop being notified about a specific machine name.

Once this command returns, no further notifications for the specified machine name will be received.

Parameters:

machine_name (str or None) – A machine name to no longer be notified of or None to not be notified of any machines. Note that if all machines were registered for notification, this command only has an effect if the specified machine_name is None. Defaults to None (i.e., all machines).

See also

notify_machine

Register to be notified about changes to a machine.

list_jobs()

Enumerate all non-destroyed jobs.

Return type:

list(dict(str, …))

Returns:

A list of allocated/queued jobs in order of creation from oldest (first) to newest (last). Each job is described by a dictionary with the following keys:

job_id

the ID of the job.

owner

the string giving the name of the Job’s owner.

start_time

the time the job was created (Unix time, UTC).

keepalive

the maximum time allowed between queries for this job before it is automatically destroyed (or None if the job can remain allocated indefinitely).

state

the current JobState of the job.

power

indicates whether the boards are powered on or not. If the job is in the ready or power states, this is True when the boards are power{ed,ing} on, or False when they are power{ed,ing} off. In other states, this value is None.

args and kwargs

the arguments to the alloc function, which specify the type/size of allocation requested and the restrictions on dead boards, links and torus connectivity.

allocated_machine_name

the name of the machine the job has been allocated to run on (or None if not allocated yet).

boards

a list [(x, y, z), …] of boards allocated to the job.

keepalivehost

the IP address of the host reckoned to be keeping this job alive (i.e., the host that did a request most recently that updated the internal keep-alive timeout).
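As an illustration of working with the list_jobs result, the sketch below groups job IDs by owner. The function name jobs_by_owner is hypothetical; it relies only on the job_id and owner keys described above and keeps the oldest-first order in which the server returns jobs.

```python
def jobs_by_owner(jobs):
    """Group job IDs by owner, preserving the server's oldest-first order."""
    grouped = {}
    for job in jobs:
        grouped.setdefault(job["owner"], []).append(job["job_id"])
    return grouped


# Hypothetical, abbreviated list_jobs result.
jobs = [
    {"job_id": 3, "owner": "alice"},
    {"job_id": 10, "owner": "bob"},
    {"job_id": 42, "owner": "alice"},
]
print(jobs_by_owner(jobs))
```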

list_machines()

Enumerates all machines known to the system.

Return type:

list(dict(str, …))

Returns:

The list of machines known to the system in order of priority from highest (first) to lowest (last). Each machine is described by a dictionary with the following keys:

name

the name of the machine.

tags

the list [‘tag’, …] of tags the machine has.

width and height

the dimensions of the machine in triads.

dead_boards

a list([(x, y, z), …]) giving the coordinates of known-dead boards.

dead_links

a list([(x, y, z, link), …]) giving the locations of known-dead links from the perspective of the sender. Links to dead boards may or may not be included in this list.
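The list_machines result can be used to choose a machine before creating a job. The sketch below filters machines by tag and estimates how many boards are usable. The function names are hypothetical, and the board count assumes that every triad contains exactly three boards (z = 0, 1, 2), which this page does not state explicitly.

```python
def working_board_count(machine):
    """Estimate the number of usable boards in a machine description.

    Assumption: every triad contains three boards (z = 0, 1, 2).
    """
    total = machine["width"] * machine["height"] * 3
    return total - len(machine["dead_boards"])


def machines_with_tags(machines, tags):
    """Return the machines carrying every tag in `tags`, in priority order."""
    wanted = set(tags)
    return [m for m in machines if wanted.issubset(m["tags"])]


# Hypothetical list_machines result.
machines = [
    {"name": "a", "tags": ["default"], "width": 2, "height": 1,
     "dead_boards": [[0, 0, 1]], "dead_links": []},
    {"name": "b", "tags": ["default", "big"], "width": 4, "height": 4,
     "dead_boards": [], "dead_links": []},
]
for m in machines_with_tags(machines, ["default"]):
    print(m["name"], working_board_count(m))
```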

get_board_position(machine_name, x, y, z)

Get the physical location of a specified board.

Parameters:
  • machine_name (str) – The name of the machine containing the board.

  • x (int) – Logical address within machine: first coordinate.

  • y (int) – Logical address within machine: second coordinate.

  • z (int) – Logical address within machine: third coordinate.

Returns:

The physical location of the board (cabinet, frame, board) at the specified location or None if the machine/board are not recognised.

Return type:

list(int) or None

get_board_at_position(machine_name, x, y, z)

Get the logical location of a board at the specified physical location.

Parameters:
  • machine_name (str) – The name of the machine containing the board.

  • x (int) – Physical address within machine: cabinet ID.

  • y (int) – Physical address within machine: frame ID.

  • z (int) – Physical address within machine: board ID.

Returns:

The logical location of the board (a triple) at the specified location or None if the machine/board are not recognised.

Return type:

list(int) or None

where_is(**kwargs)

Find out where a SpiNNaker board or chip is located, logically and physically.

May be called in one of the following styles:

>>> # Query by logical board coordinate within a machine.
>>> where_is(machine=..., x=..., y=..., z=...)

>>> # Query by physical board location within a machine.
>>> where_is(machine=..., cabinet=..., frame=..., board=...)

>>> # Query by chip coordinate (as if the machine were booted as
>>> # one large machine).
>>> where_is(machine=..., chip_x=..., chip_y=...)

>>> # Query by chip coordinate, within the boards allocated to a
>>> # job.
>>> where_is(job_id=..., chip_x=..., chip_y=...)

Only these patterns of use are supported; all keyword arguments listed above are mandatory when used for a particular query.

Return type:

dict(str, …) or None

Returns:

If a board exists at the supplied location, a dictionary giving the location of the board/chip, supplied in a number of alternative forms. If the supplied coordinates do not specify a specific chip, the chip coordinates given are those of the Ethernet connected chip on that board.

If no board exists at the supplied position, None is returned instead.

The dictionary will have the following keys:

machine

the name of the machine containing the board.

logical

the logical board coordinate, (x, y, z) within the machine.

physical

the physical board location, (cabinet, frame, board), within the machine.

chip

the coordinates of the chip, (x, y), if the whole machine were booted as a single machine.

board_chip

the coordinates of the chip, (x, y), within its board.

job_id

the job ID of the job currently allocated to the board identified or None if the board is not allocated to a job.

job_chip

the coordinates of the chip, (x, y), within its job, if a job is allocated to the board, or None otherwise.
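Since only four combinations of keyword arguments are supported, and a command that causes an error gets the client disconnected, a client may prefer to validate a where_is query locally first. The sketch below does this; the names WHERE_IS_PATTERNS and where_is_command are hypothetical.

```python
# The four supported sets of keyword arguments, as listed above.
WHERE_IS_PATTERNS = (
    frozenset(["machine", "x", "y", "z"]),
    frozenset(["machine", "cabinet", "frame", "board"]),
    frozenset(["machine", "chip_x", "chip_y"]),
    frozenset(["job_id", "chip_x", "chip_y"]),
)


def where_is_command(**kwargs):
    """Build a where_is command, rejecting unsupported argument sets."""
    if frozenset(kwargs) not in WHERE_IS_PATTERNS:
        raise ValueError(
            "unsupported where_is arguments: {}".format(sorted(kwargs)))
    return {"command": "where_is", "args": [], "kwargs": kwargs}


print(where_is_command(job_id=42, chip_x=8, chip_y=4))
```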

class JobState

A job may be in any of the following (numbered) states.

Number  State
0       unknown
1       queued
2       power
3       ready
4       destroyed