Agent names

agent_i for i in [0, 1]

Action Space

Discrete(5)
Observation Space

Tuple(Discrete(2, start=-2), Box(-2, 3, (8, 8), int64))

Reward Space

Box(0.0, 3.0, (3,), float32)



A Parallel multi-objective environment for the Item Gathering problem.

Observation Space

The observation space is a tuple containing the agent ID (a negative integer) and the 2D map observation. In the map, 0 denotes an empty cell, negative integers denote agent IDs, and positive integers denote item types.
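As an illustration of this encoding, agent positions and remaining item counts can be read off a map observation. This is a plain-Python sketch over a hypothetical 4x4 map, not the library's own code:

```python
# Hypothetical 4x4 map in the documented encoding:
# 0 = empty cell, negative integers = agent IDs, positive integers = item types.
obs_map = [
    [ 0,  0,  2,  0],
    [-1,  0,  0,  3],
    [ 0,  1,  0,  0],
    [ 0,  0, -2,  0],
]

# Locate agents (negative entries) and count items (positive entries).
agent_positions = {}
item_counts = {}
for r, row in enumerate(obs_map):
    for c, v in enumerate(row):
        if v < 0:
            agent_positions[v] = (r, c)
        elif v > 0:
            item_counts[v] = item_counts.get(v, 0) + 1

print(agent_positions)  # {-1: (1, 0), -2: (3, 2)}
print(item_counts)      # {2: 1, 3: 1, 1: 1}
```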

Action Space

The action space is a Discrete space, where:

  • 0: stay

  • 1: up

  • 2: down

  • 3: left

  • 4: right
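The mapping from action indices to grid moves can be sketched as follows. This is an illustration only; the clamp-at-border behaviour is an assumption of this sketch, not a statement about the environment's internal movement rules:

```python
# Action index -> (row delta, column delta); rows grow downwards.
ACTION_DELTAS = {
    0: (0, 0),    # stay
    1: (-1, 0),   # up
    2: (1, 0),    # down
    3: (0, -1),   # left
    4: (0, 1),    # right
}

def apply_action(pos, action, size=8):
    """Move within a size x size grid, clamping at the borders
    (border handling is an assumption for this sketch)."""
    dr, dc = ACTION_DELTAS[action]
    r = min(max(pos[0] + dr, 0), size - 1)
    c = min(max(pos[1] + dc, 0), size - 1)
    return (r, c)

print(apply_action((0, 0), 1))  # (0, 0): "up" clamped at the top edge
print(apply_action((3, 4), 4))  # (3, 5): one step right
```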

Reward Space

The reward space is a vector containing one reward per type of item available in the environment.

Starting State

The initial positions of the agents are determined by the entries equal to 1 in the initial map.

Episode Termination

The episode is terminated if all the items have been gathered.
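Under the map encoding above, "all items gathered" is equivalent to no positive entries remaining on the map; a minimal sketch of such a check (not the library's own code):

```python
def all_items_gathered(obs_map):
    """True when no positive (item) entries remain on the map."""
    return not any(v > 0 for row in obs_map for v in row)

print(all_items_gathered([[0, -1], [0, -2]]))  # True: only agents remain
print(all_items_gathered([[0, -1], [2, 0]]))   # False: an item remains
```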

Episode Truncation

The episode is truncated if the maximum number of timesteps is reached.

Arguments

  • ‘num_timesteps’: number of timesteps to run the environment for. Default: 10

  • ‘initial_map’: map of the environment. Default: 8x8 grid, 2 agents, 3 objectives (Källström and Heintz, 2019)

  • ‘randomise’: whether to randomise the map at each episode. Default: False

  • ‘reward_mode’: reward mode for the environment (‘individual’ or ‘team’). Default: ‘individual’

  • ‘render_mode’: render mode for the environment. Default: None
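The effect of ‘randomise’ can be illustrated by reshuffling the non-empty entries of a map onto fresh cells at each episode. This is only a sketch of the idea; the library's actual randomisation procedure may differ:

```python
import random

def randomise_map(initial_map, seed=None):
    """Return a new map with the same entries placed at shuffled positions
    (illustrative helper, not part of the environment's API)."""
    rng = random.Random(seed)
    rows, cols = len(initial_map), len(initial_map[0])
    entries = [v for row in initial_map for v in row if v != 0]
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    rng.shuffle(cells)
    new_map = [[0] * cols for _ in range(rows)]
    for v, (r, c) in zip(entries, cells):
        new_map[r][c] = v
    return new_map

# Two agents (1 entries) and two item types on a 3x3 map.
base = [[1, 0, 2], [0, 3, 0], [0, 0, 1]]
shuffled = randomise_map(base, seed=42)
# The multiset of non-zero entries is preserved; only positions change.
```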