Agent names: player_i for i in [0, 1]

Action Space: Discrete(7)

Observation Space: Dict('action_mask': Box(0, 1, (7,), int8), 'observation': Box(0, 1, (6, 7, 2), int8))

Reward Space: Box(-1.0, 1.0, (9,), float32)



Multi-objective Connect Four.

MO-Connect4 is a multi-objective variant of the two-player, single-objective turn-based board game Connect 4. In Connect 4, players can win by connecting four of their tokens vertically, horizontally or diagonally. The players drop their respective token in a column of a standing board (of width 7 and height 6 by default), where each token will fall until it reaches the bottom of the column or lands on top of an existing token. Players cannot place a token in a full column, and the game ends when either a player has made a sequence of 4 tokens, or when all columns have been filled (draw). MO-Connect4 extends this game with a second objective that incentivizes faster wins, and optionally the additional (conflicting) objectives of having more tokens than the opponent in every column. Additionally, width and height of the board can be set to values from 4 to 20.
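The drop and win rules described above can be sketched in a few lines. This is a minimal illustration, not the environment's implementation: the integer board encoding (`-1` for empty, `0` and `1` for the two players), the convention that row 0 is the top of the board, and the helper names `drop_token` and `has_four` are all assumptions made for this example.

```python
import numpy as np

EMPTY = -1  # assumed marker for an empty cell (not taken from the environment)


def drop_token(board, column, player):
    """Drop `player`'s token into `column` and return the row it lands in."""
    height = board.shape[0]
    # Row 0 is assumed to be the top, so scan upward from the bottom row.
    for row in range(height - 1, -1, -1):
        if board[row, column] == EMPTY:
            board[row, column] = player
            return row
    raise ValueError(f"column {column} is full")


def has_four(board, player):
    """Return True if `player` has four tokens in a row in any direction."""
    height, width = board.shape
    # Directions: horizontal, vertical, and the two diagonals.
    for d_row, d_col in ((0, 1), (1, 0), (1, 1), (1, -1)):
        for row in range(height):
            for col in range(width):
                end_row, end_col = row + 3 * d_row, col + 3 * d_col
                if not (0 <= end_row < height and 0 <= end_col < width):
                    continue  # the 4-cell segment would leave the board
                if all(board[row + k * d_row, col + k * d_col] == player
                       for k in range(4)):
                    return True
    return False


board = np.full((6, 7), EMPTY, dtype=np.int8)
for _ in range(4):
    drop_token(board, 3, player=0)  # stack four tokens in column 3
print(has_four(board, 0))  # True: vertical line in column 3
```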

Observation Space

The observation is a dictionary which contains an 'observation' element which is the usual RL observation described below, and an 'action_mask' which holds the legal moves, described in the Legal Actions Mask section below. The main observation space is 2 planes of a board_height * board_width grid (a board_height * board_width * 2 tensor). Each plane represents a specific agent’s tokens, and each location in the grid represents the placement of the corresponding agent’s token. 1 indicates that the agent has a token placed in the given location, and 0 indicates they do not have a token in that location (meaning that either the cell is empty, or the other agent has a token in that location).
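As an illustration of the two-plane layout, the sketch below converts an integer board into a board_height * board_width * 2 tensor. The integer board encoding (`-1` empty, `0` and `1` for the players), the helper name `encode_observation`, and the choice that plane i holds player_i's tokens are assumptions for this example; the text above does not say which plane belongs to which agent.

```python
import numpy as np


def encode_observation(board):
    """Return an int8 tensor of shape (height, width, 2), one plane per agent."""
    # Assumption: plane i marks the cells holding player i's tokens.
    planes = [board == agent for agent in (0, 1)]
    return np.stack(planes, axis=-1).astype(np.int8)


board = np.full((6, 7), -1, dtype=np.int8)  # -1 is an assumed "empty" marker
board[5, 3] = 0  # player_0 token in the bottom row, column 3
board[5, 4] = 1  # player_1 token in the bottom row, column 4

obs = encode_observation(board)
print(obs.shape)  # (6, 7, 2)
```

An empty cell and a cell held by the other agent both appear as 0 in a given plane, so the two planes must be read together to tell them apart.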

Action Space

The action space is the set of integers from 0 to board_width (exclusive), where the number represents which column a token should be dropped in.
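Because a token cannot be placed in a full column, an agent would normally restrict its choice to the columns flagged in 'action_mask'. The sketch below shows one way such a mask could be derived and used. The board encoding (`-1` for empty, row 0 as the top row) and the helper name `legal_action_mask` are assumptions for this example, not the environment's code.

```python
import numpy as np


def legal_action_mask(board, empty=-1):
    """Return an int8 vector with 1 for every column that can still take a token."""
    # Assumption: row 0 is the top row, so a column is full once its top cell is used.
    return (board[0] == empty).astype(np.int8)


board = np.full((6, 7), -1, dtype=np.int8)
board[:, 2] = 0  # fill column 2 completely

mask = legal_action_mask(board)
rng = np.random.default_rng(0)
action = int(rng.choice(np.flatnonzero(mask)))  # sample only among legal columns
print(mask, action)
```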


Rewards

  • Dimension 0: If an agent connects four of their tokens, they are rewarded 1 point and the opponent is rewarded -1 point. If the game ends in a draw, both players are rewarded 0.

  • Dimension 1: If an agent wins, they get a reward of 1-(move_count/board_size) to incentivize faster wins. The losing opponent gets the negated reward. In case of a draw, both agents get 0.

  • Dimensions 2 to board_width+1 (8 by default): (optional) If, at the end of the game, an agent has more tokens than their opponent in column X, they are rewarded 1 point in reward dimension 2+X and the opponent is rewarded -1 point. If the column holds an equal number of tokens from both players, both are rewarded 0.
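The reward dimensions can be made concrete with a small sketch that computes both players' end-of-game reward vectors. Several things here are assumptions rather than documented behaviour: the helper name `final_rewards`, the integer board encoding (`-1` empty, `0` and `1` for the players), that board_size means board_height * board_width, and that move_count is the total number of tokens placed by both players.

```python
import numpy as np


def final_rewards(board, winner, move_count, column_objectives=True):
    """Return a (2, n_objectives) float32 array, one reward row per player.

    `winner` is 0, 1, or None for a draw.
    """
    height, width = board.shape
    n_objectives = 2 + width if column_objectives else 2
    rewards = np.zeros((2, n_objectives), dtype=np.float32)

    if winner is not None:
        loser = 1 - winner
        # Dimension 0: win / loss.
        rewards[winner, 0], rewards[loser, 0] = 1.0, -1.0
        # Dimension 1: faster wins score higher (board_size assumed = height * width).
        speed = 1.0 - move_count / (height * width)
        rewards[winner, 1], rewards[loser, 1] = speed, -speed

    if column_objectives:
        # Dimensions 2 + X: who holds more tokens in column X.
        tokens_0 = (board == 0).sum(axis=0)
        tokens_1 = (board == 1).sum(axis=0)
        majority = np.sign(tokens_0 - tokens_1)
        rewards[0, 2:] = majority
        rewards[1, 2:] = -majority

    return rewards


board = np.full((6, 7), -1, dtype=np.int8)
board[2:, 3] = 0  # player_0 wins with four tokens in column 3
board[3:, 4] = 1  # player_1 has three tokens in column 4
rewards = final_rewards(board, winner=0, move_count=7)
print(rewards[0])  # reward vector for player_0
print(rewards[1])  # reward vector for player_1
```

In this position player_0 wins dimension 0 and dimension 5 (column 3) but loses dimension 6 (column 4), which shows how the column objectives can conflict with each other and with winning.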

Starting State

The game starts with an empty board.


Arguments

  • 'render_mode': The mode to render with. Can be 'human' or 'rgb_array'.

  • 'screen_scaling': The factor by which to scale the screen.

  • 'board_width': The width of the board (from 4 to 20).

  • 'board_height': The height of the board (from 4 to 20).

  • 'column_objectives': Whether to use column objectives. Without them there are 2 objectives; with them there are 2+board_width objectives.
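The relationship between these arguments and the length of the reward vector can be checked with a short helper. This is a hypothetical sketch: `num_objectives` is not part of the environment's API. It only restates the constraints listed above, with the default board size of 7 by 6.

```python
def num_objectives(column_objectives, board_width=7, board_height=6):
    """Return the reward vector length for the given board settings."""
    # Both dimensions must lie in the documented range of 4 to 20.
    for name, value in (("board_width", board_width), ("board_height", board_height)):
        if not 4 <= value <= 20:
            raise ValueError(f"{name} must be between 4 and 20, got {value}")
    # Two base objectives, plus one per column when column objectives are on.
    return 2 + board_width if column_objectives else 2


print(num_objectives(True))                  # 9 on the default 7-wide board
print(num_objectives(False))                 # 2
print(num_objectives(True, board_width=10))  # 12
```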

Version History