Parallel

Usage

Parallel environments can be interacted with as follows:

import momaland
import numpy as np

from momaland.envs.momultiwalker_stability import momultiwalker_stability_v0 as _env

# the .parallel_env() function returns a Parallel environment, as per the PZ standard
parallel_env = _env.parallel_env(render_mode="human")

# optionally, you can scalarize the reward with weights
parallel_env = momaland.LinearReward(parallel_env, weight=np.array([0.7, 0.3]))

observations, infos = parallel_env.reset(seed=42)
while parallel_env.agents:
    # this is where you would insert your policy
    actions = {agent: parallel_env.action_space(agent).sample() for agent in parallel_env.agents}

    # vec_reward is a dict[str, numpy array]
    observations, vec_rewards, terminations, truncations, infos = parallel_env.step(actions)
parallel_env.close()

Under the Parallel API, the returned observations, vec_rewards, terminations, truncations, and infos are dictionaries whose keys are agent names and whose values hold the corresponding per-agent data. For vec_rewards, each value is a NumPy array with one entry per objective.
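For example, per-agent episodic vector returns can be accumulated by summing the reward arrays over time. Below is a minimal sketch, assuming a freshly created, unwrapped parallel_env as in the usage example above:

import numpy as np

# one return vector per agent, with one entry per objective
returns = {agent: np.zeros(parallel_env.reward_space(agent).shape) for agent in parallel_env.possible_agents}

observations, infos = parallel_env.reset(seed=42)
while parallel_env.agents:
    actions = {agent: parallel_env.action_space(agent).sample() for agent in parallel_env.agents}
    observations, vec_rewards, terminations, truncations, infos = parallel_env.step(actions)
    for agent, vec_reward in vec_rewards.items():
        returns[agent] += vec_reward  # element-wise sum across objectives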

MOMAland environments extend the base MOParallelEnv class, as opposed to PettingZoo's base ParallelEnv class. MOParallelEnv extends ParallelEnv and adds a reward_spaces member along with a reward_space(agent) method.
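Reward spaces can be inspected like any other Gymnasium space. A minimal sketch, assuming the unwrapped environment from the usage example (the printed shape is illustrative):

for agent in parallel_env.possible_agents:
    reward_space = parallel_env.reward_space(agent)
    print(agent, reward_space.shape)  # e.g. (2,) for a two-objective environment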

For more detailed documentation on the Parallel API, see the PettingZoo documentation.

MOParallelEnv

class momaland.utils.env.MOParallelEnv[source]

Overrides PZ types to enforce multi-objective rewards.

Attributes

MOParallelEnv.agents: list[AgentID]

A list of the names of all current agents, typically integers. These may be changed as an environment progresses (i.e. agents can be added or removed).

Type: list[AgentID]

MOParallelEnv.num_agents

The length of the agents list.

Type: int

MOParallelEnv.possible_agents: list[AgentID]

A list of all possible agents the environment could generate. Equivalent to the list of agent names in the observation and action spaces. This cannot be changed through play or resetting.

Type: list[AgentID]

MOParallelEnv.max_num_agents

The length of the possible_agents list.

Type: int

MOParallelEnv.observation_spaces: dict[AgentID, gymnasium.spaces.Space]

A dict of the observation spaces of every agent, keyed by name. This cannot be changed through play or resetting.

Type: dict[AgentID, gymnasium.spaces.Space]

MOParallelEnv.action_spaces: dict[AgentID, gymnasium.spaces.Space]

A dict of the action spaces of every agent, keyed by name. This cannot be changed through play or resetting.

Type: dict[AgentID, gymnasium.spaces.Space]

MOParallelEnv.reward_spaces: dict[AgentID, gymnasium.spaces.Space]

A dict of the reward spaces of every agent, keyed by name. This cannot be changed through play or resetting.

Type: dict[AgentID, gymnasium.spaces.Space]

Methods

MOParallelEnv.step(actions: dict[AgentID, ActionType]) → tuple[dict[AgentID, ObsType], dict[AgentID, np.ndarray], dict[AgentID, bool], dict[AgentID, bool], dict[AgentID, dict]]

Receives a dictionary of actions keyed by the agent name.

Returns the observation dictionary, reward dictionary, terminated dictionary, truncated dictionary, and info dictionary, each keyed by agent name.

MOParallelEnv.reset(seed: int | None = None, options: dict | None = None) → tuple[dict[AgentID, ObsType], dict[AgentID, dict]]

Resets the environment and returns a dictionary of observations and a dictionary of infos, both keyed by agent name.

MOParallelEnv.render() → None | np.ndarray | str | list

Displays a rendered frame from the environment, if supported.

Alternate render modes in the default environments are ‘rgb_array’, which returns a numpy array and is supported by all environments outside of classic, and ‘ansi’, which returns the strings printed (specific to classic environments).
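For instance, frames can be captured programmatically by constructing the environment with render_mode="rgb_array" and calling render() after reset or step. A minimal sketch, assuming the _env import from the usage example and that the environment supports this mode:

parallel_env = _env.parallel_env(render_mode="rgb_array")
observations, infos = parallel_env.reset(seed=42)
frame = parallel_env.render()  # numpy array of shape (height, width, 3)
parallel_env.close()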

MOParallelEnv.close()

Closes the rendering window.

MOParallelEnv.state() → np.ndarray

Returns the state of the environment.

The state is a global view of the environment, appropriate for centralized training decentralized execution (CTDE) methods such as QMIX.
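A typical CTDE pattern feeds the global state to a centralized critic or mixing network while each agent acts on its own observation. A minimal sketch, assuming an unwrapped parallel_env whose underlying environment implements state():

observations, infos = parallel_env.reset(seed=42)
actions = {agent: parallel_env.action_space(agent).sample() for agent in parallel_env.agents}
observations, vec_rewards, terminations, truncations, infos = parallel_env.step(actions)

global_state = parallel_env.state()  # one array describing the whole environment
# e.g. pass (global_state, actions) to a centralized critic or QMIX-style mixer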

MOParallelEnv.observation_space(agent: AgentID) → Space

Takes in agent and returns the observation space for that agent.

MUST return the same value for the same agent name.

The default implementation looks the agent up in the observation_spaces dict.

MOParallelEnv.action_space(agent: AgentID) → Space

Takes in agent and returns the action space for that agent.

MUST return the same value for the same agent name.

The default implementation looks the agent up in the action_spaces dict.

MOParallelEnv.reward_space(agent: AgentID) → Space[source]

Takes in agent and returns the reward space for that agent.

MUST return the same value for the same agent name.

The default implementation looks the agent up in the reward_spaces dict.
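Putting the pieces together, a custom multi-objective parallel environment subclasses MOParallelEnv and fills in the space dicts plus the reset/step logic; the default observation_space, action_space and reward_space methods then look up the corresponding dicts. The sketch below is a hypothetical two-agent, two-objective environment; all names and dynamics are illustrative and not part of MOMAland:

import numpy as np
from gymnasium.spaces import Box, Discrete

from momaland.utils.env import MOParallelEnv

class ConstantRewardEnv(MOParallelEnv):
    """Hypothetical environment: one step, then a fixed 2-objective reward."""

    metadata = {"name": "constant_reward_v0"}

    def __init__(self):
        self.possible_agents = ["agent_0", "agent_1"]
        self.agents = []
        self.observation_spaces = {a: Box(0.0, 1.0, shape=(1,)) for a in self.possible_agents}
        self.action_spaces = {a: Discrete(2) for a in self.possible_agents}
        # one Box per agent, with one dimension per objective
        self.reward_spaces = {a: Box(-1.0, 1.0, shape=(2,)) for a in self.possible_agents}

    def reset(self, seed=None, options=None):
        self.agents = list(self.possible_agents)
        observations = {a: np.zeros(1, dtype=np.float32) for a in self.agents}
        return observations, {a: {} for a in self.agents}

    def step(self, actions):
        observations = {a: np.zeros(1, dtype=np.float32) for a in self.agents}
        vec_rewards = {a: np.array([1.0, -1.0], dtype=np.float32) for a in self.agents}
        terminations = {a: True for a in self.agents}  # single-step episodes
        truncations = {a: False for a in self.agents}
        infos = {a: {} for a in self.agents}
        self.agents = []  # the episode ends immediately
        return observations, vec_rewards, terminations, truncations, infos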