Server Management and Configuration

Installation and Requirements

The server and its dependencies can be installed from PyPI like so:

$ pip install spalloc_server

The server is currently only compatible with Linux due to its use of the poll() system call and inotify subsystem.

Operation

The SpiNNaker partitioning server is started like so:

$ spalloc_server configfile.cfg

The server is configured by a configuration file whose name is supplied as the first command-line argument. See the section below for an overview of the config file format. If the config file is modified while the server is running, the new configuration is loaded on the fly. Where possible, running jobs continue to execute without interruption and queued jobs are automatically enqueued on any newly added machines.

Stopping the server

The server runs until it is terminated by SIGINT, i.e. pressing ctrl+c. When terminated, the server attempts to shut down gracefully, completing any outstanding board power-management commands and saving its state to disk. When the server is subsequently restarted, the saved state is restored and operation may continue as if the server had never been shut down. Alternatively, a cold start may be enforced using the --cold-start argument when starting the server.

When the server is terminated, machines allocated to running jobs are left powered on, meaning that users’ jobs are not interrupted by the partitioning server being restarted. If it is necessary to perform major maintenance and fully shut down the server and all controlled SpiNNaker machines, the recommended approach is to add the following line to the bottom of the server’s config file:

configuration = Configuration()

This causes the server to believe all machines have been removed, which results in all queued and running jobs being cancelled and all previously allocated SpiNNaker boards being powered down. To maximise the chances of all clients realising their jobs have been cancelled, the server should then be left running for a few minutes before finally being shut down.

Configuration file format

A configuration file describes the machines which are to be managed. Configuration files are Python scripts which define a global variable named configuration, which must be an instance of the Configuration class.

Note

Everything in the spalloc_server.configuration and spalloc_server.coordinates modules is implicitly imported into the namespace of the config file.

A minimal (though useless) configuration file would look like so:

configuration = Configuration()

This causes the server to listen on all interfaces on the default port but does not define any machines for the server to manage. As a result, this server will cancel all jobs sent to it for lack of a suitable machine. See the Configuration class for a description of all available configuration options and default values.
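
The way a Python config file of this kind can be consumed may be sketched as follows. This is only an illustration under stated assumptions, not the server's actual loading code: the Configuration class below is a stand-in which holds just three of the documented defaults, and load_config is a hypothetical helper.

```python
class Configuration(object):
    """Stand-in for the real Configuration class (illustration only)."""

    def __init__(self, machines=None, port=22244, ip=""):
        self.machines = list(machines or [])
        self.port = port
        self.ip = ip


def load_config(source):
    """Hypothetical loader: run config source, return its 'configuration'."""
    # Names placed here are visible to the config file without an import.
    namespace = {"Configuration": Configuration}
    exec(source, namespace)
    return namespace["configuration"]


# The minimal config file from the text, supplied as a string.
config = load_config("configuration = Configuration()")
print(config.port, repr(config.ip), config.machines)
```

Because the file is executed as ordinary Python, anything it computes before assigning configuration is available when building the machine list.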

Machines are defined using Machine objects. These specify the dimensions, broken boards and links, physical layout and IP addresses of a SpiNNaker machine. All machines are presumed to be interconnected in a valid hexagonal torus topology constructed from a rectangular array of triads of boards. (See also spalloc_server.coordinates for details of the coordinate systems used when referring to boards.)

Defining Machines

Since defining machines completely by hand can be quite verbose (see example below), some convenience functions are provided to deal with the common case of machines constructed in the standard manner.

To define an isolated single-board machine, the Machine.single_board() constructor may be used as follows:

m = Machine.single_board("my-board",
                         bmp_ip="spinn-board-bmp",
                         spinnaker_ip="spinn-board")
configuration = Configuration(machines=[m])

Most multi-board systems follow a standardised IP addressing scheme and have their physical layout defined by SpiNNer. The board_locations_from_spinner() function reads CSV files produced by spinner-ethernet-chips describing machine layouts and the Machine.with_standard_ips() constructor produces Machines with IP addresses based on the standard IP addressing scheme. These may be used together like so:

# spinner-ethernet-chips -n 1200 > ethernet_chips.csv
m = Machine.with_standard_ips(
    "million-core-machine",
    board_locations=board_locations_from_spinner("ethernet_chips.csv"),
    base_ip="10.2.0.0",
)
configuration = Configuration(machines=[m])

If neither of the above convenience functions applies to your machine, you can also explicitly define your machine’s parameters. (Be sure to read about the coordinates used when referring to boards.) For example, a desktop 3-board machine may look something like:

m = Machine(name="my-three-board-machine",
            board_locations={
                #X  Y  Z    C  F  B
                (0, 0, 0): (0, 0, 0),
                (0, 0, 1): (0, 0, 2),
                (0, 0, 2): (0, 0, 5),
            },
            # Just one BMP
            bmp_ips={
                #C  F
                (0, 0): "192.168.240.0",
            },
            # Each SpiNNaker board has an IP
            spinnaker_ips={
                #X  Y  Z
                (0, 0, 0): "192.168.240.1",
                (0, 0, 1): "192.168.240.17",
                (0, 0, 2): "192.168.240.41",
            })
configuration = Configuration(machines=[m])

Remember, since the configuration file is just a normal Python file, you can use any code you like to programmatically specify the machines you use.
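
As a rough illustration of this, the dictionaries for the three-board example above could be generated rather than typed out. The sketch below is plain Python. The slots list, and the assumption that each board's IP is derived from its slot number with a stride of 8 and an offset of 1, are specific to this example.

```python
# Physical slot (board position) of the boards z = 0, 1, 2 in the triad.
# These values mirror the three-board example and are specific to it.
slots = [0, 2, 5]

# (x, y, z) -> (cabinet, frame, board)
board_locations = {(0, 0, z): (0, 0, b) for z, b in enumerate(slots)}

# (x, y, z) -> IP, assuming 8 addresses per slot with SpiNNaker at offset 1.
spinnaker_ips = {
    (0, 0, z): "192.168.240.{}".format(8 * b + 1)
    for z, b in enumerate(slots)
}
```

In a config file these dictionaries would then be passed to Machine() as the board_locations and spinnaker_ips arguments.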

Configuration File API Reference

class Configuration

Defines the configuration of a server.

Parameters:

machines : [Machine, ...]

The list of machines the server is to manage, highest priority first. (Default: [])

port : int

The port number the server should listen on. (Default: 22244)

ip : str

The IP the server should listen on. (Default: "", i.e. all interfaces)

timeout_check_interval : float

The number of seconds between the server’s checks for job timeouts. (Default: 5.0)

max_retired_jobs : int

The number of retired jobs to keep records of. (Default: 1200)

class Machine

Defines a SpiNNaker machine.

Parameters:

name : str

The name of the machine.

tags : set([str, ...])

A set of tags which jobs may use to filter machines by. Note that by default jobs are assigned the ‘default’ tag and thus machines probably ought to have this tag too.

width, height : int

The dimensions of the machine in triads of boards. If omitted, these are inferred from the boards defined in board_locations and dead_boards.

dead_boards : set([(x, y, z), ...])

The board coordinates of all dead boards in the machine.

dead_links : set([(x, y, z, Links), ...])

The board coordinates of all dead links in the machine. Links to dead boards are implicitly dead and may or may not be included in this set.

board_locations : {(x, y, z): (c, f, b), ...}

Lookup from board coordinate to its physical location in a SpiNNaker machine in terms of cabinet, frame and board position. Must give the coordinates of all working boards.

bmp_ips : {(c, f): hostname, ...}

The IP address of a BMP in every frame of the machine which contains working boards.

spinnaker_ips : {(x, y, z): hostname, ...}

For every working board, gives the IP address of the SpiNNaker board’s Ethernet-connected chip.
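
The inference of width and height mentioned above can be pictured with a short sketch. It assumes the dimensions are simply the smallest triad grid containing every listed board. The infer_dimensions helper is hypothetical and is not part of the documented API.

```python
def infer_dimensions(board_locations, dead_boards=()):
    """Hypothetical helper: smallest (width, height) covering all boards."""
    # Working boards are the keys of board_locations; dead boards are
    # listed separately but still occupy a position in the machine.
    boards = set(board_locations) | set(dead_boards)
    width = max((x for x, _, _ in boards), default=-1) + 1
    height = max((y for _, y, _ in boards), default=-1) + 1
    return width, height


width, height = infer_dimensions(
    {(0, 0, 0): (0, 0, 0), (1, 0, 2): (0, 0, 5)},
    dead_boards={(0, 1, 1)},
)
print(width, height)
```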

classmethod single_board(name, tags=set(['default']), bmp_ip=None, spinnaker_ip=None)

Convenience constructor. Construct a Machine representing a single SpiNNaker board.

Parameters:

name : str

The name of the machine.

tags : set([tag, ...])

The tags to assign to the machine.

bmp_ip : str

The hostname of the BMP controlling the board.

spinnaker_ip : str

The hostname of the SpiNNaker board.

classmethod with_standard_ips(name, tags=set(['default']), width=None, height=None, dead_boards=set([]), dead_links=set([]), board_locations={}, base_ip='192.168.0.0', cabinet_stride='0.0.5.0', frame_stride='0.0.1.0', board_stride='0.0.0.8', bmp_offset='0.0.0.0', spinnaker_offset='0.0.0.1')

Convenience constructor. Construct a Machine which infers IP addresses of the form conventionally used by SpiNNaker installations.

In standard SpiNNaker installations, IP addresses are allocated in a regular fashion as described below.

IP addresses for a particular machine are allocated within an address range, e.g. 192.168.0.0 - 192.168.255.255.

This address range is then subdivided into address ranges for each frame, for example:

  • Cabinet 0, Frame 0: 192.168.0.0 - 192.168.0.255
  • Cabinet 0, Frame 1: 192.168.1.0 - 192.168.1.255
  • Cabinet 0, Frame 2: 192.168.2.0 - 192.168.2.255
  • Cabinet 0, Frame 3: 192.168.3.0 - 192.168.3.255
  • Cabinet 0, Frame 4: 192.168.4.0 - 192.168.4.255
  • Cabinet 1, Frame 0: 192.168.5.0 - 192.168.5.255
  • ...

Boards within a frame are each allocated their own range of IPs, for example:

  • Cabinet 0, Frame 0, Board 0: 192.168.0.0 - 192.168.0.7
  • Cabinet 0, Frame 0, Board 1: 192.168.0.8 - 192.168.0.15
  • Cabinet 0, Frame 0, Board 2: 192.168.0.16 - 192.168.0.23
  • ...

Finally, the IP address of the BMP and Ethernet-connected SpiNNaker chip of each board is at some fixed offset within this range, for example:

  • Cabinet 0, Frame 0, Board 0, BMP: 192.168.0.0
  • Cabinet 0, Frame 0, Board 0, SpiNNaker: 192.168.0.1
  • Cabinet 0, Frame 0, Board 1, BMP: 192.168.0.8
  • Cabinet 0, Frame 0, Board 1, SpiNNaker: 192.168.0.9

In addition, board 0’s BMP is assumed to be the BMP used for controlling all boards in its frame.
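
The arithmetic behind this scheme can be sketched with the standard ipaddress module. This is an illustration rather than the library's implementation: standard_ip and ip_to_int are hypothetical helpers, and their defaults are copied from the with_standard_ips() signature above.

```python
import ipaddress


def ip_to_int(ip):
    """Convert a dotted IPv4 string to an integer."""
    return int(ipaddress.IPv4Address(ip))


def standard_ip(cabinet, frame, board, offset,
                base_ip="192.168.0.0",
                cabinet_stride="0.0.5.0",
                frame_stride="0.0.1.0",
                board_stride="0.0.0.8"):
    """Hypothetical helper: base + scaled strides + offset."""
    value = (ip_to_int(base_ip)
             + cabinet * ip_to_int(cabinet_stride)
             + frame * ip_to_int(frame_stride)
             + board * ip_to_int(board_stride)
             + ip_to_int(offset))
    return str(ipaddress.IPv4Address(value))


# Cabinet 0, frame 0, board 1, as in the lists above.
bmp = standard_ip(0, 0, 1, offset="0.0.0.0")
spinnaker = standard_ip(0, 0, 1, offset="0.0.0.1")
# First address of cabinet 1, frame 0.
cab1 = standard_ip(1, 0, 0, offset="0.0.0.0")
print(bmp, spinnaker, cab1)
```

Expressing strides and offsets as IPv4 addresses means each one is just an integer increment, so the same sum works for any base address.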

Parameters:

name : str

The name of the machine.

tags : set([str, ...])

A set of tags which jobs may use to filter machines by. Note that by default jobs are assigned the ‘default’ tag and thus machines probably ought to have this tag too.

width, height : int

The dimensions of the machine in triads of boards. If omitted, these are inferred from the boards defined in board_locations and dead_boards.

dead_boards : set([(x, y, z), ...])

The board coordinates of all dead boards in the machine.

dead_links : set([(x, y, z, Links), ...])

The board coordinates of all dead links in the machine. Links to dead boards are implicitly dead and may or may not be included in this set.

board_locations : {(x, y, z): (c, f, b), ...}

Lookup from board coordinate to its physical location in a SpiNNaker machine in terms of cabinet, frame and board position. Must give the coordinates of all working boards.

base_ip : str

The IPv4 address from which the IP address range assigned to the machine starts.

cabinet_stride : str

The stride in IP addresses between individual cabinets, expressed as an IPv4 address.

frame_stride : str

The stride in IP addresses between individual frames within a cabinet, expressed as an IPv4 address.

board_stride : str

The stride in IP addresses between individual boards within a frame, expressed as an IPv4 address.

bmp_offset : str

The offset of a board’s BMP IP from the start of a board’s IP address range, expressed as an IPv4 address.

spinnaker_offset : str

The offset of a board’s Ethernet-connected SpiNNaker chip IP from the start of a board’s IP address range, expressed as an IPv4 address.

board_locations_from_spinner(filename)

Utility function which converts a CSV file produced by the spinner-ethernet-chips utility into a board_locations dictionary suitable for defining Machine objects.

Parameters:

filename : str

The name of a CSV file produced by spinner-ethernet-chips defining the relationship between Ethernet connected chip coordinates and physical board locations.

This file is expected to have five columns, named in the first line of the CSV: ‘board’, ‘cabinet’, ‘frame’, ‘x’, and ‘y’.

Returns:

{(x, y, z): (c, f, b), ...}

The mapping from board coordinates to physical locations.
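
To make the expected file layout concrete, the sketch below parses a CSV with these five columns using the standard csv module. It is not the library's implementation: read_ethernet_chips is a hypothetical helper, the sample rows are made up, and the sketch stops at Ethernet chip coordinates, leaving out the conversion to (x, y, z) board coordinates that the real function performs.

```python
import csv
import io

# Made-up sample data in the five-column layout described above.
SAMPLE_CSV = """board,cabinet,frame,x,y
0,0,0,0,0
2,0,0,8,4
5,0,0,4,8
"""


def read_ethernet_chips(csv_file):
    """Map Ethernet chip (x, y) to physical (cabinet, frame, board)."""
    locations = {}
    # DictReader takes the column names from the first line of the file.
    for row in csv.DictReader(csv_file):
        chip = (int(row["x"]), int(row["y"]))
        locations[chip] = (int(row["cabinet"]),
                           int(row["frame"]),
                           int(row["board"]))
    return locations


chip_locations = read_ethernet_chips(io.StringIO(SAMPLE_CSV))
print(chip_locations)
```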

Coordinate systems

Utilities for working with board/triad coordinates.

This software assumes that all machines provided to it are interconnected in a valid hexagonal torus topology (or a subset of one). Board locations are expressed in one of several forms depending on circumstance.

  • Board coordinates (x, y, z) giving the logical location of the board within a hexagonal torus configuration. This is the coordinate system most frequently used by this software.
  • Physical coordinates (cabinet, frame, board) giving the physical location of a board in a cabinet. Generally only used when dealing with board power management.
  • Ethernet chip coordinates (x, y) giving the chip coordinates of the Ethernet connected chip at the bottom-left coordinate of a SpiNNaker board. Generally only used when relating information to a client.

To deal with these coordinate systems, a selection of utility functions is provided in the spalloc_server.coordinates module.

Board coordinates

Board coordinates are given as a tuple (x, y, z).

Systems of SpiNNaker boards are defined in terms of ‘triads’ of boards. The figure below shows a single triad. The ‘z’ part of a board coordinate is the index of the board within its triad; boards are numbered as follows:

 ___
/ 2 \___
\___/ 1 \
/ 0 \___/
\___/

Larger systems are defined by replicating this pattern of triads. Triads are indexed along the X axis as follows:

 ___     ___     ___     ___
/ 2 \___/ 2 \___/ 2 \___/ 2 \___
\___/ 1 \___/ 1 \___/ 1 \___/ 1 \
/ 0 \___/ 0 \___/ 0 \___/ 0 \___/
\___/   \___/   \___/   \___/

    0       1       2       3

And then along the Y axis thus:

 ___     ___     ___     ___
/ 2 \___/ 2 \___/ 2 \___/ 2 \___
\___/ 1 \___/ 1 \___/ 1 \___/ 1 \   3
/ 0 \___/ 0 \___/ 0 \___/ 0 \___/
\___/ 2 \___/ 2 \___/ 2 \___/ 2 \___
    \___/ 1 \___/ 1 \___/ 1 \___/ 1 \   2
    / 0 \___/ 0 \___/ 0 \___/ 0 \___/
    \___/ 2 \___/ 2 \___/ 2 \___/ 2 \___
        \___/ 1 \___/ 1 \___/ 1 \___/ 1 \   1
        / 0 \___/ 0 \___/ 0 \___/ 0 \___/
        \___/ 2 \___/ 2 \___/ 2 \___/ 2 \___
            \___/ 1 \___/ 1 \___/ 1 \___/ 1 \   0
            / 0 \___/ 0 \___/ 0 \___/ 0 \___/
            \___/   \___/   \___/   \___/

                0       1       2       3
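
The indexing shown above implies that a machine of a given width and height in triads contains three boards per triad. A small sketch which enumerates every board coordinate of such a machine follows. The all_boards helper is hypothetical and not part of spalloc_server.coordinates.

```python
def all_boards(width, height):
    """Hypothetical helper: every (x, y, z) in a width x height machine."""
    return [(x, y, z)
            for x in range(width)
            for y in range(height)
            for z in range(3)]  # Three boards per triad.


# The 4x4-triad system drawn above.
boards = all_boards(4, 4)
print(len(boards))
```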

Physical coordinates

Physical coordinates are given as a tuple (cabinet, frame, board).

Physical coordinates give the positions of boards within a set of cabinets containing several frames containing several boards. These are indexed as illustrated below, starting from the top-right corner:

          2             1                0
Cabinet --+-------------+----------------+
          |             |                |
+-------------+  +-------------+  +-------------+    Frame
|             |  |             |  |             |      |
| +---------+ |  | +---------+ |  | +---------+ |      |
| | : : : : | |  | | : : : : | |  | | : : : : |--------+ 0
| | : : : : | |  | | : : : : | |  | | : : : : | |      |
| +---------+ |  | +---------+ |  | +---------+ |      |
| | : : : : | |  | | : : : : | |  | | : : : : |--------+ 1
| | : : : : | |  | | : : : : | |  | | : : : : | |      |
| +---------+ |  | +---------+ |  | +---------+ |      |
| | : : : : | |  | | : : : : | |  | | : : : : |--------+ 2
| | : : : : | |  | | : : : : | |  | | : : : : | |      |
| +---------+ |  | +---------+ |  | +---------+ |      |
| | : : : : | |  | | : : : : | |  | | : : : : |--------+ 3
| | : : : : | |  | | : : : : | |  | | : : : : | |
| +---------+ |  | +|-|-|-|-|+ |  | +---------+ |
|             |  |  | | | | |  |  |             |
+-------------+  +--|-|-|-|-|--+  +-------------+
                    | | | | |
         Board -----+-+-+-+-+
                    4 3 2 1 0

A mapping from board coordinates to physical coordinates is supplied by the user and is unique to the machine being built. A tool such as SpiNNer may be used to generate such mappings.

Ethernet chip coordinates

Ethernet chip coordinates are given as a tuple (x, y).

Ethernet chip coordinates give the chip coordinates of Ethernet connected chips at the bottom-left corner of SpiNNaker boards.

Utilities

The following utilities are provided for working with the above coordinate systems.

A lookup from (z, rig.links.Links) to (dx, dy, dz).

Get the coordinates of the board down the specified link.

Parameters:

x1, y1, z1 : int

The board coordinates from which a link will be traversed.

link : rig.links.Links

The link to follow.

width, height : int

The dimensions of the system in triads.

Returns:

x, y, z : int

The coordinates of the board down the specified link.

wrapped : WrapAround

In what way did we wrap around when following that link?

board_to_chip(x, y, z)

Convert a board coordinate into a chip coordinate.

Assumes a regular torus composed of SpiNN-5 boards.

Parameters:

x, y, z : int

Board coordinates.

Returns:

x, y : int

Chip coordinates.

chip_to_board(x, y, w, h)

Convert a chip coordinate into a board coordinate.

Assumes a regular torus composed of SpiNN-5 boards.

Parameters:

x, y : int

Chip coordinates.

w, h : int

Dimensions of the system, in chips.

Returns:

x, y, z : int

Board coordinates.

triad_dimensions_to_chips(w, h, torus)

Convert the dimensions of a system from numbers of triads to numbers of chips in the underlying network.

Assumes a regular torus composed of SpiNN-5 boards.

Parameters:

w, h : int

Dimensions of the system in triads.

torus : WrapAround

What wrap-around connections are present?

Returns:

w, h : int

Dimensions of the SpiNNaker chip network in the specified machine, e.g. for booting.

class WrapAround

Defines what type of wrap-around links a torus has, if any.

Values chosen have the following useful properties:

>>> # Can be meaningfully cast to bool
>>> assert bool(WrapAround.none) is False
>>> assert bool(WrapAround.x) is True
>>> assert bool(WrapAround.y) is True
>>> assert bool(WrapAround.both) is True

>>> # Bit-operations make sense
>>> assert bool(WrapAround.both & WrapAround.x) is True
>>> assert bool(WrapAround.both & WrapAround.y) is True
>>> assert bool(WrapAround.x & WrapAround.x) is True
>>> assert bool(WrapAround.x & WrapAround.y) is False

none = <WrapAround.none: 0>

No wrap-around links.

x = <WrapAround.x: 1>

Has wrap-around links around the X axis.

y = <WrapAround.y: 2>

Has wrap-around links around the Y axis.

both = <WrapAround.both: 3>

Has wrap-around links on both the X and Y axes.
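
An enumeration with the values and properties listed above can be sketched using the standard enum module. The class below is a stand-in written for illustration and is not necessarily how spalloc_server defines WrapAround. The describe function is a hypothetical helper.

```python
from enum import IntEnum


class WrapAround(IntEnum):
    """Stand-in with the documented values (illustration only)."""
    none = 0
    x = 1
    y = 2
    both = 3


def describe(torus):
    """Hypothetical helper: list the axes which have wrap-around links."""
    axes = []
    # Bit 0 flags X wrap-around and bit 1 flags Y wrap-around.
    if torus & WrapAround.x:
        axes.append("x")
    if torus & WrapAround.y:
        axes.append("y")
    return axes


print(describe(WrapAround.both))
```

Assigning one bit per axis is what makes both the boolean cast and the bitwise tests behave as described.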