AEC

Usage

Similar to PettingZoo, the MOMAland API models environments as simple Python env classes. Creating environment instances and interacting with them is very simple - here’s an example using the “momultiwalker_stability_v0” environment:

from momaland.envs.momultiwalker_stability import momultiwalker_stability_v0 as _env
from momaland.utils.aec_wrappers import LinearizeReward  # used below to scalarize the vector reward
import numpy as np

# .env() function will return an AEC environment, as per PZ standard
env = _env.env(render_mode="human")

env.reset(seed=42)
for agent in env.agent_iter():
    # vec_reward is a numpy array
    observation, vec_reward, termination, truncation, info = env.last()

    if termination or truncation:
        action = None
    else:
        action = env.action_space(agent).sample() # this is where you would insert your policy

    env.step(action)
env.close()

# optionally, you can scalarize the vector reward with per-agent weights
# This turns the vector reward into a scalar reward, shifting the problem to
# single-objective multi-agent RL (i.e. standard PettingZoo).
# Different weights can be assigned to the objectives of each agent.
weights = {
    "walker_0": np.array([0.7, 0.3]),
    "walker_1": np.array([0.5, 0.5]),
    "walker_2": np.array([0.2, 0.8]),
}
env = LinearizeReward(env, weights)
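
After wrapping, env.last() yields a scalarized (float) reward for each agent, so the interaction loop above can be reused exactly as in single-objective PettingZoo code. A minimal sketch, assuming the wrapper has been applied as shown above:

env.reset(seed=42)
for agent in env.agent_iter():
    # reward is now a float: the weighted sum of the objectives for this agent
    observation, reward, termination, truncation, info = env.last()

    if termination or truncation:
        action = None
    else:
        action = env.action_space(agent).sample()

    env.step(action)
env.close()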

For details on multi-objective multi-agent RL definitions, see Multi-Objective Multi-Agent Decision Making: A Utility-based Analysis and Survey.

You can also find more examples in this Colab notebook: MOMAland Demo in Colab

MOMAland environments extend the base MOAEC class, as opposed to PettingZoo’s base AEC class. MOAEC extends AEC and has a reward_space member.
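
For example, the number of objectives of an agent can be read from its reward space. A minimal sketch, assuming the reward space of "momultiwalker_stability_v0" is a gymnasium Box (the exact space type depends on the environment):

from momaland.envs.momultiwalker_stability import momultiwalker_stability_v0 as _env

env = _env.env()
env.reset(seed=42)
agent = env.possible_agents[0]
print(env.reward_space(agent))                     # per-agent reward space, added by MOAEC
num_objectives = env.reward_space(agent).shape[0]  # matches the length of the weight vectors above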

For more detailed documentation on the AEC API, see the PettingZoo documentation.

MOAECEnv

class momaland.utils.env.MOAECEnv[source]

Overrides PettingZoo types to enforce multi-objective rewards.

Attributes

MOAECEnv.agents: list[AgentID]

A list of the names of all current agents, typically strings such as "walker_0". These may be changed as an environment progresses (i.e. agents can be added or removed).

Type:

List[AgentID]

MOAECEnv.num_agents

The length of the agents list.

MOAECEnv.possible_agents: list[AgentID]

A list of all possible agents the environment could generate. Equivalent to the list of agents in the observation and action spaces. This cannot be changed through play or resetting.

Type:

List[AgentID]

MOAECEnv.max_num_agents

The length of the possible_agents list.

MOAECEnv.agent_selection: AgentID

An attribute of the environment corresponding to the currently selected agent that an action can be taken for.

Type:

AgentID

MOAECEnv.terminations: dict[AgentID, bool]

A dict of the termination state of every current agent at the time called, keyed by name.

Type:

Dict[AgentID, bool]

MOAECEnv.truncations: dict[AgentID, bool]

A dict of the truncation state of every current agent at the time called, keyed by name.

Type:

Dict[AgentID, bool]

MOAECEnv.rewards: Dict[AgentID, ndarray[Any, dtype[_ScalarType_co]]]

A dict of the rewards of every current agent at the time called, keyed by name. Contains the instantaneous reward generated after the last step (not accumulated). In MOMAland, each reward is a numpy array with one entry per objective. Note that agents can be added or removed from this attribute. last() does not directly access this attribute, rather the returned reward is stored in an internal variable. The rewards structure looks like:

{0:[first agent reward], 1:[second agent reward] ... n-1:[nth agent reward]}

Type:

Dict[AgentID, np.ndarray]
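
Since each reward is a vector, per-agent episodic returns are accumulated element-wise. A minimal sketch using the environment from the usage example, assuming Box-like reward spaces with a shape attribute:

import numpy as np
from momaland.envs.momultiwalker_stability import momultiwalker_stability_v0 as _env

env = _env.env()
env.reset(seed=42)
returns = {agent: np.zeros(env.reward_space(agent).shape) for agent in env.possible_agents}
for agent in env.agent_iter():
    observation, vec_reward, termination, truncation, info = env.last()
    returns[agent] += vec_reward  # element-wise addition, one entry per objective
    action = None if termination or truncation else env.action_space(agent).sample()
    env.step(action)
env.close()
print(returns)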

MOAECEnv.infos: dict[AgentID, dict[str, Any]]

A dict of info for each current agent, keyed by name. Each agent’s info is also a dict. Note that agents can be added or removed from this attribute. last() accesses this attribute. The returned dict looks like:

infos = {0:[first agent info], 1:[second agent info] ... n-1:[nth agent info]}

Type:

Dict[AgentID, Dict[str, Any]]

MOAECEnv.observation_spaces: dict[AgentID, gymnasium.spaces.Space]

A dict of the observation spaces of every agent, keyed by name. This cannot be changed through play or resetting.

Type:

Dict[AgentID, gymnasium.spaces.Space]

MOAECEnv.action_spaces: dict[AgentID, gymnasium.spaces.Space]

A dict of the action spaces of every agent, keyed by name. This cannot be changed through play or resetting.

Type:

Dict[AgentID, gymnasium.spaces.Space]

MOAECEnv.reward_spaces: Dict[AgentID, Space]

A dict of the reward spaces of every agent, keyed by name. This cannot be changed through play or resetting.

Type:

Dict[AgentID, gymnasium.spaces.Space]

Methods

MOAECEnv.step(action: ActionType) → None

Accepts and executes the action of the current agent_selection in the environment.

Automatically switches control to the next agent.

MOAECEnv.reset(seed: int | None = None, options: dict | None = None) → None

Resets the environment to a starting state.

MOAECEnv.observe(agent: AgentID) → ObsType | None

Returns the observation an agent currently can make.

last() calls this function.
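
The observation returned by last() for the current agent is therefore the same one observe() returns. A minimal sketch:

from momaland.envs.momultiwalker_stability import momultiwalker_stability_v0 as _env

env = _env.env()
env.reset(seed=42)
obs = env.observe(env.agent_selection)  # observation of the currently selected agent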

MOAECEnv.render() → None | np.ndarray | str | list

Renders the environment as specified by self.render_mode.

Render mode can be 'human' to display a window. Other render modes in the default environments are 'rgb_array', which returns a numpy array and is supported by all environments outside of classic, and 'ansi', which returns the strings printed (specific to classic environments).
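
A minimal sketch of capturing a frame, assuming the environment supports the 'rgb_array' render mode:

from momaland.envs.momultiwalker_stability import momultiwalker_stability_v0 as _env

env = _env.env(render_mode="rgb_array")
env.reset(seed=42)
frame = env.render()  # numpy array representing the current frame
env.close()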

MOAECEnv.close()

Closes any resources that should be released.

Closes the rendering window, subprocesses, network connections, or any other resources that should be released.

MOAECEnv.observation_space(agent: AgentID) → Space

Takes in agent and returns the observation space for that agent.

MUST return the same value for the same agent name.

The default implementation returns the agent's entry in the observation_spaces dict.

MOAECEnv.action_space(agent: AgentID) → Space

Takes in agent and returns the action space for that agent.

MUST return the same value for the same agent name.

The default implementation returns the agent's entry in the action_spaces dict.

MOAECEnv.reward_space(agent: AgentID) → Space[source]

Takes in agent and returns the reward space for that agent.

MUST return the same value for the same agent name.

The default implementation returns the agent's entry in the reward_spaces dict.
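
When scalarizing with LinearizeReward as in the usage example, the weight vector of each agent should have one entry per objective, i.e. match the shape of its reward space. A minimal sketch of building such weights programmatically, assuming Box-like reward spaces (the uniform weight values are placeholders):

import numpy as np
from momaland.envs.momultiwalker_stability import momultiwalker_stability_v0 as _env
from momaland.utils.aec_wrappers import LinearizeReward

env = _env.env()
env.reset(seed=42)
weights = {
    agent: np.full(env.reward_space(agent).shape, 1.0 / env.reward_space(agent).shape[0])
    for agent in env.possible_agents
}
env = LinearizeReward(env, weights)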