Agent names

agent_i for i in [0, num_agents - 1] (with the default num_agents = 1, this is [0, 0])

Action Space

Discrete(225)

Observation Space

Dict('action_mask': Box(0, 1, (225,), int8), 'observation': Box(0, 1, (15, 15, 5), int8))

Reward Space

Box(0.0, 50625.0, (5,), float32)



Multi-objective Multi-agent SameGame.

MO-SameGame is a multi-objective, multi-agent variant of the single-player, single-objective turn-based puzzle game SameGame. 1 to 5 agents can play (default is 1) on a rectangular board with width and height from 3 to 30 squares (defaults are 15), which is initially filled with randomly colored tiles in 2 to 10 different colors (default is 5). Players move in sequential order by selecting any tile in a group of at least 2 vertically and/or horizontally connected tiles of the same color. This group then disappears from the board. Tiles that were above the removed group "fall down" to close any vertical gaps; when entire columns become empty, all columns to their right move left to close the horizontal gap. Single-player, single-objective SameGame rewards the player with n^2 points for removing any group of n tiles. MO-SameGame extends this in two ways. First, agents can either earn points only for their own actions, or all rewards can be shared. Second, points for every color can be counted as separate objectives, or they can be accumulated in a single objective as in the default game variant.
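The group-removal and gravity mechanics described above can be sketched as follows. This is an illustrative reimplementation, not the environment's actual code; the board representation (a list of columns, each listed bottom tile first, with colors as small integers) and the helper names `connected_group` and `remove_group` are assumptions chosen for brevity.

```python
def connected_group(cols, x, y):
    # Flood-fill the group of same-colored, orthogonally connected tiles
    # containing the tile at column x, height y. Per the rules, only
    # groups of size >= 2 are legal to remove.
    color = cols[x][y]
    seen, stack = set(), [(x, y)]
    while stack:
        cx, cy = stack.pop()
        if (cx, cy) in seen:
            continue
        seen.add((cx, cy))
        for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
            if 0 <= nx < len(cols) and 0 <= ny < len(cols[nx]) and cols[nx][ny] == color:
                stack.append((nx, ny))
    return seen

def remove_group(cols, group):
    # Delete the group's tiles. Because each column is stored bottom-up,
    # remaining tiles "fall down" simply by list compaction. Columns that
    # become empty are dropped, shifting all columns to their right one
    # step left, exactly as described in the rules.
    by_col = {}
    for x, y in group:
        by_col.setdefault(x, set()).add(y)
    for x, ys in by_col.items():
        cols[x] = [c for y, c in enumerate(cols[x]) if y not in ys]
    return [col for col in cols if col]

# Example: a 3-column board; selecting the color-1 tile at (0, 0) removes
# the 3-tile group of color 1 and scores 3^2 = 9 points.
cols = [[1, 1, 2], [1, 3, 3], [2, 2, 2]]
group = connected_group(cols, 0, 0)
new_cols = remove_group([list(c) for c in cols], group)
```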

Observation Space

The observation is a dictionary which contains an 'observation' element which is the usual RL observation described below, and an 'action_mask' which holds the legal moves, described in the Legal Actions Mask section below. The main observation space is num_colors planes of a board_height * board_width grid (a board_height * board_width * num_colors tensor). Each plane represents the tiles of a specific color, and each location in the grid represents a location on the board. 1 indicates that a given location has a tile of the given plane’s color, and 0 indicates there is no tile of that color at that location (meaning that either the board location is empty, or filled by a tile of another color).
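A minimal sketch of decoding the one-hot color planes back into a single grid of color indices, assuming a NumPy array with the layout described above (height x width x num_colors); the toy dimensions and the convention 0 = empty, 1..num_colors = tile color are illustrative choices, not part of the environment's API.

```python
import numpy as np

# Toy 2x3 board with 2 colors (the real default is 15x15x5).
obs = np.zeros((2, 3, 2), dtype=np.int8)
obs[0, 0, 0] = 1  # color-1 tile at row 0, column 0
obs[1, 2, 1] = 1  # color-2 tile at row 1, column 2

# A location is occupied iff any plane has a 1 there; the plane index
# of that 1 identifies the tile's color.
occupied = obs.any(axis=-1)
colors = np.where(occupied, obs.argmax(axis=-1) + 1, 0)  # 0 = empty
```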

Action Space

The action space is the set of integers from 0 to board_width * board_height (exclusive). If the group connected to the tile at coordinates (x,y) is removed, this is encoded as the integer y * board_width + x.
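The encoding above is a simple row-major flattening; a sketch of both directions (the helper names are hypothetical):

```python
def encode_action(x, y, board_width):
    # Flatten coordinates (x, y) into the integer action id y * board_width + x.
    return y * board_width + x

def decode_action(action, board_width):
    # Invert the flattening back to (x, y) coordinates.
    return action % board_width, action // board_width
```

With the default 15x15 board, actions range from 0 to 224, matching the 225-element action mask.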


Rewards

Rewards can be team rewards or individual rewards (default is individual).

  • If color_rewards = False: Dimension 0: n^2 points for the removal of any group of size n.

  • If color_rewards = True (default): Dimensions d = 0 to num_objectives - 1: n^2 points for the removal of any group of size n in color d + 1.
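The reward-vector construction can be sketched as below; `group_reward` is a hypothetical helper, not part of the environment's API, but the n^2 scoring and the color-to-dimension mapping follow the description above.

```python
import numpy as np

def group_reward(group_size, color, num_colors, color_rewards=True):
    # Reward for removing `group_size` tiles of `color` (indexed 1..num_colors).
    # With color_rewards=True, the n^2 points land in the objective dimension
    # for that color (color d + 1 maps to dimension d); otherwise a single
    # scalar objective accumulates points from all colors.
    points = group_size ** 2
    if not color_rewards:
        return np.array([points], dtype=np.float32)
    vec = np.zeros(num_colors, dtype=np.float32)
    vec[color - 1] = points
    return vec
```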

Starting State

The starting board is filled with randomly colored tiles in 2 to 10 different colors (default is 5).

Arguments


  • 'board_width': The width of the board (between 3 and 30)

  • 'board_height': The height of the board (between 3 and 30)

  • 'num_colors': The number of colors (between 2 and 10)

  • 'num_agents': The number of agents (between 1 and 5)

  • 'team_rewards': True = agents share all rewards; False = agents get individual rewards

  • 'color_rewards': True = agents get a separate reward for each color; False = agents get a single reward accumulating all colors

  • 'render_mode': The render mode

Version History