MO-ItemGathering


Agent names	`agent_i for i in [0, 1]`
Action Space	Discrete(5)
Observation Space	Tuple(Discrete(2, start=-2), Box(-2, 3, (8, 8), int64))
Reward Space	Box(0.0, 3.0, (3,), float32)
Import	`momaland.envs.moitem_gathering_v0`

A Parallel multi-objective environment of the Item Gathering problem.

Observation Space

The observation is a tuple containing the agent's ID (a negative integer) and the 2D map. In the map, 0 denotes an empty cell, negative integers denote agent IDs, and positive integers denote items.
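To illustrate this encoding, the sketch below decodes one observation into the agent's own position and a count of visible items per value. It assumes the map is given as nested lists of integers. The environment reports it as an int64 `Box`, and a NumPy array indexes the same way. The helper `decode_observation` is hypothetical and not part of MOMAland.

```python
from collections import Counter


def decode_observation(obs):
    """Split an (agent_id, grid) observation into useful parts.

    Hypothetical helper: assumes 0 = empty cell, negative = agent ID,
    positive = item, as described above.
    """
    agent_id, grid = obs
    own_position = None
    item_counts = Counter()
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            value = int(value)  # also accepts NumPy integer cells
            if value == agent_id:
                own_position = (r, c)
            elif value > 0:
                item_counts[value] += 1
    return own_position, dict(item_counts)


obs = (-1, [[-1, 0, 2], [0, -2, 1], [3, 0, 2]])
print(decode_observation(obs))  # ((0, 0), {2: 2, 1: 1, 3: 1})
```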

Action Space

The action space is a Discrete space, where:

0: stay
1: up
2: down
3: left
4: right
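The action indices can be read as grid displacements. The sketch below makes two illustrative assumptions that the environment does not document. Coordinates are (row, column) with row 0 at the top, so "up" decreases the row index. A move that would leave the 8x8 grid leaves the agent where it is. `ACTION_DELTAS` and `apply_action` are hypothetical names.

```python
# Hypothetical mapping from the Discrete(5) action index to a
# (d_row, d_col) displacement; assumes row 0 is the top of the map.
ACTION_DELTAS = {
    0: (0, 0),   # stay
    1: (-1, 0),  # up
    2: (1, 0),   # down
    3: (0, -1),  # left
    4: (0, 1),   # right
}


def apply_action(position, action, height=8, width=8):
    """Return the new (row, col) after taking `action` from `position`.

    Assumption: moves that would leave the grid are ignored (the agent stays).
    """
    d_row, d_col = ACTION_DELTAS[action]
    row, col = position[0] + d_row, position[1] + d_col
    if 0 <= row < height and 0 <= col < width:
        return (row, col)
    return position


print(apply_action((0, 0), 1))  # (0, 0): moving up from the top row is ignored
print(apply_action((0, 0), 4))  # (0, 1)
```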

Reward Space

The reward is a vector with one entry per type of item available in the environment.
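To make the vector form concrete, the sketch below builds one agent's reward for a single step. It assumes item types are numbered 1 to 3 and that each gathered item adds 1.0 to the entry for its type. Both the unit increment and the helper name `step_reward` are assumptions for illustration. The `Box(0.0, 3.0, (3,), float32)` space above only fixes the vector's length and range.

```python
def step_reward(gathered_types, num_types=3):
    """Return a per-objective reward vector for one agent and one step.

    Hypothetical sketch: item types are assumed to be numbered 1..num_types,
    and each gathered item is assumed to add 1.0 to its type's entry.
    """
    reward = [0.0] * num_types
    for item_type in gathered_types:
        reward[item_type - 1] += 1.0
    return reward


print(step_reward([2]))  # [0.0, 1.0, 0.0]
```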

Starting State

The initial positions of the agents are given by the cells with value 1 in the initial map.
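Reading the starting positions off the map means collecting the coordinates of every cell equal to 1. The sketch below does this for a map given as nested lists and ignores all other values. The function name `initial_agent_positions` is hypothetical. The row-major order of the result is an illustrative choice and is not necessarily how MOMAland assigns positions to `agent_0` and `agent_1`.

```python
def initial_agent_positions(initial_map):
    """Return (row, col) of every cell equal to 1, scanned row by row.

    Hypothetical helper; the row-major order is an illustrative choice.
    """
    return [
        (r, c)
        for r, row in enumerate(initial_map)
        for c, value in enumerate(row)
        if value == 1
    ]


example_map = [
    [1, 0, 0],
    [0, 0, 0],
    [0, 0, 1],
]
print(initial_agent_positions(example_map))  # [(0, 0), (2, 2)]
```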

Episode Termination

The episode is terminated if all the items have been gathered.

Episode Truncation

The episode is truncated if the maximum number of timesteps is reached.
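The two end conditions above correspond to the separate termination and truncation signals that PettingZoo-style environments report. The sketch below computes them for one step. It assumes that items are exactly the positive cells of the map and that `timestep` counts the steps taken so far. The function `episode_flags` is hypothetical, and its default of 10 mirrors the `num_timesteps` default.

```python
def episode_flags(grid, timestep, num_timesteps=10):
    """Return (terminated, truncated) for the current step.

    Hypothetical sketch: an item is assumed to be any positive cell, and
    `timestep` is assumed to count the steps taken so far in the episode.
    """
    items_left = any(int(v) > 0 for row in grid for v in row)
    terminated = not items_left            # all items gathered
    truncated = timestep >= num_timesteps  # step limit reached
    return terminated, truncated


print(episode_flags([[0, -1], [0, -2]], timestep=4))  # (True, False)
```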

Arguments

`num_timesteps`: maximum number of timesteps per episode, after which the episode is truncated. Default: 10
`initial_map`: initial map of the environment. Default: 8x8 grid, 2 agents, 3 objectives (Källström and Heintz, 2019)
`randomise`: whether to randomise the map at each episode. Default: False
`reward_mode`: reward mode for the environment (`'individual'` or `'team'`). Default: `'individual'`
`render_mode`: render mode for the environment. Default: None
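The `reward_mode` argument chooses between individual and team rewards. The sketch below assumes one common reading, not one taken from the MOMAland source. In `'team'` mode, every agent receives the element-wise sum of all agents' individual reward vectors. The helper `apply_reward_mode` is hypothetical.

```python
def apply_reward_mode(individual_rewards, reward_mode="individual"):
    """Map per-agent reward vectors according to `reward_mode`.

    individual_rewards: dict from agent name to a list of floats.
    Assumption: 'team' gives each agent the element-wise sum over agents.
    """
    if reward_mode == "individual":
        return {agent: list(r) for agent, r in individual_rewards.items()}
    if reward_mode == "team":
        team = [sum(values) for values in zip(*individual_rewards.values())]
        return {agent: list(team) for agent in individual_rewards}
    raise ValueError(f"unknown reward_mode: {reward_mode!r}")


rewards = {"agent_0": [1.0, 0.0, 0.0], "agent_1": [0.0, 0.0, 1.0]}
print(apply_reward_mode(rewards, "team"))
# {'agent_0': [1.0, 0.0, 1.0], 'agent_1': [1.0, 0.0, 1.0]}
```

Summing keeps the per-objective structure intact, so each agent in team mode still sees a 3-element vector rather than a scalar.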