Agent names

player_i for i in [0, 1]

Action Space

Discrete(192)
Observation Space

Dict('action_mask': MultiBinary(192), 'observation': Box(0, 1, (8, 8, 2), int8))

Reward Space

Box(-1.0, 1.0, (4,), float32)



Multi-objective Breakthrough.

MO-Breakthrough is a multi-objective variant of the two-player, single-objective, turn-based board game Breakthrough. In Breakthrough, each player starts with two rows of identical pieces on their side of an 8x8 board and tries to reach the opponent’s home row with any piece. The first player to move a piece onto the opponent’s home row wins. Players alternate moves, and each piece can move one square straight forward or diagonally forward. Opponent pieces can be captured, but only by moving diagonally forward, not straight. MO-Breakthrough extends this game with up to three additional objectives: a second objective that incentivizes faster wins, a third for capturing opponent pieces, and a fourth for avoiding the capture of the agent’s own pieces. Additionally, the board width can be set from 3 to 20 squares, and the board height from 5 to 20 squares.

Observation Space

The observation is a dictionary that contains an 'observation' element, which is the usual RL observation described below, and an 'action_mask' element, which holds the legal moves, described in the Legal Actions Mask section below. The main observation space consists of 2 planes of a board_height * board_width grid (a board_height * board_width * 2 tensor). Each plane represents one agent’s pieces, and each location in the grid represents a board square. A 1 indicates that the agent has a piece on that square, and a 0 indicates that it does not (meaning that the square is either empty or occupied by the other agent’s piece).
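As a rough illustration of the two-plane layout described above, the following sketch converts a simple board grid into nested 0/1 lists. The cell codes (EMPTY, PLAYER_0, PLAYER_1), the function name, and the choice to put player 0 in the first plane are assumptions made for this example only; the environment's actual internal representation and plane ordering may differ.

```python
# Hypothetical cell codes used only in this sketch.
EMPTY, PLAYER_0, PLAYER_1 = 0, 1, 2


def to_planes(board):
    """Convert a board (list of rows of cell codes) into a
    height x width x 2 nested list of 0/1 flags.

    Assumption: channel 0 marks player 0's pieces and channel 1 marks
    player 1's pieces. An empty square is 0 in both channels.
    """
    return [
        [[int(cell == PLAYER_0), int(cell == PLAYER_1)] for cell in row]
        for row in board
    ]


# A tiny 2x2 example board: player 0 top-left, player 1 bottom-right.
example_board = [
    [PLAYER_0, EMPTY],
    [EMPTY, PLAYER_1],
]
example_planes = to_planes(example_board)
```

Note that a 0 in one plane does not distinguish an empty square from a square held by the other agent; both planes are needed to tell these cases apart.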

Action Space

The action space is the set of integers from 0 to board_width * board_height * 3 (exclusive). If a piece at coordinates (x, y) is moved, this is encoded as the integer x * 3 * board_height + y * 3 + z, where z is 0 for a left diagonal move, 1 for a straight move, and 2 for a right diagonal move.
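The encoding above can be sketched as a pair of helper functions. The function names are hypothetical and not part of the environment's API; only the formula itself comes from the description above.

```python
def encode_action(x: int, y: int, z: int, board_height: int) -> int:
    """Encode moving the piece at (x, y) in direction z.

    z is 0 for left diagonal, 1 for straight, 2 for right diagonal.
    """
    return x * 3 * board_height + y * 3 + z


def decode_action(action: int, board_height: int) -> tuple:
    """Invert encode_action, returning (x, y, z)."""
    x, rest = divmod(action, 3 * board_height)
    y, z = divmod(rest, 3)
    return x, y, z


# On the default 8x8 board: piece at (2, 5) moving straight ahead.
action = encode_action(2, 5, 1, board_height=8)
x, y, z = decode_action(action, board_height=8)
```

With an 8x8 board, the largest encodable action is the one for x = 7, y = 7, z = 2, which stays below the exclusive upper bound of 8 * 8 * 3.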


Rewards

  • Dimension 0: If an agent moves one of their pieces to the opponent’s home row, they are rewarded 1 point. At the same time, the opponent agent is awarded -1 point. There are no draws in Breakthrough.

  • Dimension 1: If an agent wins, they get a reward of 1-(move_count/max_moves) to incentivize faster wins. The losing opponent gets the negated reward. In case of a draw, both agents get 0.

  • Dimension 2: (optional) The number of opponent pieces (divided by the max number of pieces) an agent has captured.

  • Dimension 3: (optional) The negative number of pieces (divided by the max number of pieces) an agent has lost to the opponent.
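To make the four reward dimensions concrete, here is a minimal sketch of how a reward vector could be assembled for one agent at the end of a decided game. The function name, its parameters, and the assumption that all dimensions are computed once from end-of-game totals are illustrative choices, not a description of the environment's actual implementation.

```python
def final_reward_vector(won, move_count, max_moves, captured, lost,
                        max_pieces, num_objectives=4):
    """Sketch of one agent's reward vector after a decided game.

    Assumptions: the game ended with a winner, and `captured` / `lost`
    are end-of-game piece counts for this agent.
    """
    sign = 1.0 if won else -1.0
    rewards = [
        sign,                                   # dimension 0: win / loss
        sign * (1 - move_count / max_moves),    # dimension 1: faster wins
        captured / max_pieces,                  # dimension 2: captures made
        -lost / max_pieces,                     # dimension 3: pieces lost
    ]
    # Only the first num_objectives dimensions are used.
    return rewards[:num_objectives]


winner = final_reward_vector(True, 25, 100, captured=4, lost=2, max_pieces=16)
loser = final_reward_vector(False, 25, 100, captured=2, lost=4, max_pieces=16)
```

In this sketch the loser's first two dimensions are the negation of the winner's, while dimensions 2 and 3 mirror each other: what one agent records as captures, the other records as losses.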

Starting State

The starting board is empty except for the first two rows that are filled with pieces of player 0, and the last two rows that are filled with pieces of player 1.
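The starting layout described above can be sketched as follows. The cell codes and the convention that row 0 is the "first" row are assumptions for this example; the environment may orient or store the board differently.

```python
# Hypothetical cell codes used only in this sketch.
EMPTY, PLAYER_0, PLAYER_1 = 0, 1, 2


def starting_board(board_width=8, board_height=8):
    """Build the initial board as a list of rows.

    The first two rows hold player 0's pieces, the last two rows hold
    player 1's pieces, and everything in between is empty.
    """
    board = [[EMPTY] * board_width for _ in range(board_height)]
    for row in (0, 1):
        board[row] = [PLAYER_0] * board_width
    for row in (board_height - 2, board_height - 1):
        board[row] = [PLAYER_1] * board_width
    return board


# Smallest allowed board: 3 squares wide, 5 squares high.
small = starting_board(board_width=3, board_height=5)
```

On the smallest allowed board only a single empty row separates the two sides, which is why the minimum height is larger than the minimum width.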


Arguments

  • 'board_width': The width of the board (from 3 to 20)

  • 'board_height': The height of the board (from 5 to 20)

  • 'num_objectives': The number of objectives (from 1 to 4)

  • 'render_mode': The render mode.
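The documented ranges can be checked before an environment is constructed. The helper below is a hypothetical sketch, not part of the environment's API; it validates the three numeric arguments and returns the resulting number of discrete actions, using the board_width * board_height * 3 formula from the Action Space section.

```python
def check_breakthrough_args(board_width, board_height, num_objectives):
    """Validate arguments against the documented ranges.

    Returns the size of the action space for the given board.
    Raises ValueError if any argument is out of range.
    """
    if not 3 <= board_width <= 20:
        raise ValueError("board_width must be between 3 and 20")
    if not 5 <= board_height <= 20:
        raise ValueError("board_height must be between 5 and 20")
    if not 1 <= num_objectives <= 4:
        raise ValueError("num_objectives must be between 1 and 4")
    return board_width * board_height * 3


n_actions = check_breakthrough_args(board_width=8, board_height=8, num_objectives=4)
```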

