Parallel¶
Usage¶
Parallel environments can be interacted with as follows:
import numpy as np

import momaland
from momaland.envs.momultiwalker_stability import momultiwalker_stability_v0 as _env

# .parallel_env() function will return a Parallel environment, as per PZ standard
parallel_env = _env.parallel_env(render_mode="human")

# optionally, you can scalarize the reward with weights
parallel_env = momaland.LinearReward(parallel_env, weight=np.array([0.7, 0.3]))

observations, infos = parallel_env.reset(seed=42)
while parallel_env.agents:
    # this is where you would insert your policy
    actions = {agent: parallel_env.action_space(agent).sample() for agent in parallel_env.agents}

    # vec_rewards is a dict[str, numpy array]
    observations, vec_rewards, terminations, truncations, infos = parallel_env.step(actions)
parallel_env.close()
In Parallel environments, the returned values of observations, vec_rewards, etc. are of dict type, where the keys are agent names and the values are the respective data for each agent. So for vec_rewards, the values are numpy arrays.
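For example, each agent's reward vector can be read out of vec_rewards by agent name. The snippet below is a minimal sketch that continues the interaction loop from the usage example above; the printed shapes depend on the environment:

# inside the interaction loop, after parallel_env.step(actions)
for agent, reward_vector in vec_rewards.items():
    # reward_vector is a numpy array with one entry per objective
    print(agent, reward_vector.shape, reward_vector)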
MOMAland environments extend the base MOParallel class, as opposed to PettingZoo’s base Parallel class. MOParallel extends Parallel and has a reward_space member.
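The per-agent reward spaces can be inspected through the reward_spaces attribute documented below. The following is a minimal sketch, assuming the momultiwalker_stability environment from the usage example; the exact spaces depend on the environment:

from momaland.envs.momultiwalker_stability import momultiwalker_stability_v0 as _env

env = _env.parallel_env()
for agent in env.possible_agents:
    # each entry is a Gymnasium space whose shape encodes the number of objectives
    print(agent, env.reward_spaces[agent])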
For more detailed documentation on the Parallel API, see the PettingZoo documentation.
MOParallelEnv¶
Attributes¶
- MOParallelEnv.agents: list[AgentID]¶
A list of the names of all current agents, typically integers. These may be changed as an environment progresses (i.e. agents can be added or removed).
- Type:
list[AgentID]
- MOParallelEnv.num_agents¶
The length of the agents list.
- Type:
int
- MOParallelEnv.possible_agents: list[AgentID]¶
A list of all possible agents the environment could generate. Equivalent to the list of agents in the observation and action spaces. This cannot be changed through play or resetting (see the sketch after this attribute list for an example).
- Type:
list[AgentID]
- MOParallelEnv.max_num_agents¶
The length of the possible_agents list.
- Type:
int
- MOParallelEnv.observation_spaces: dict[AgentID, gymnasium.spaces.Space]¶
A dict of the observation spaces of every agent, keyed by name. This cannot be changed through play or resetting.
- Type:
Dict[AgentID, gym.spaces.Space]
- MOParallelEnv.action_spaces: dict[AgentID, gymnasium.spaces.Space]¶
A dict of the action spaces of every agent, keyed by name. This cannot be changed through play or resetting.
- Type:
Dict[AgentID, gym.spaces.Space]
- MOParallelEnv.reward_spaces: Dict[AgentID, Space]¶
A dict of the reward spaces of every agent, keyed by name. This cannot be changed through play or resetting.
- Type:
Dict[AgentID, gym.spaces.Space]
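Putting the attributes together, the sketch below lists the current and possible agents and their per-agent spaces. It is a minimal sketch, assuming the momultiwalker_stability environment from the usage example; names, counts and spaces differ per environment:

from momaland.envs.momultiwalker_stability import momultiwalker_stability_v0 as _env

env = _env.parallel_env()
env.reset(seed=42)

print(env.possible_agents, env.max_num_agents)  # fixed set of agents the environment can generate
print(env.agents, env.num_agents)               # agents currently in the environment
for agent in env.possible_agents:
    # per-agent observation and action spaces, keyed by agent name
    print(agent, env.observation_spaces[agent], env.action_spaces[agent])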
Methods¶
- MOParallelEnv.step(actions: dict[AgentID, ActionType]) → tuple[dict[AgentID, ObsType], dict[AgentID, float], dict[AgentID, bool], dict[AgentID, bool], dict[AgentID, dict]]¶
Receives a dictionary of actions keyed by the agent name.
Returns the observation dictionary, reward dictionary (in MOMAland the rewards are vectorial, i.e. numpy arrays), terminated dictionary, truncated dictionary and info dictionary, where each dictionary is keyed by the agent.
- MOParallelEnv.reset(seed: int | None = None, options: dict | None = None) → tuple[dict[AgentID, ObsType], dict[AgentID, dict]]¶
Resets the environment and returns a dictionary of observations and a dictionary of infos, both keyed by the agent name.
- MOParallelEnv.render() → None | np.ndarray | str | list¶
Displays a rendered frame from the environment, if supported.
Alternate render modes in the default environments are ‘rgb_array’ which returns a numpy array and is supported by all environments outside of classic, and ‘ansi’ which returns the strings printed (specific to classic environments).
- MOParallelEnv.close()¶
Closes the rendering window.
- MOParallelEnv.state() → ndarray¶
Returns the state.
State returns a global view of the environment, appropriate for centralized training, decentralized execution methods like QMIX.
- MOParallelEnv.observation_space(agent: AgentID) → Space¶
Takes in agent and returns the observation space for that agent.
MUST return the same value for the same agent name.
Default implementation is to return the entry for that agent from the observation_spaces dict.
- MOParallelEnv.action_space(agent: AgentID) → Space¶
Takes in agent and returns the action space for that agent.
MUST return the same value for the same agent name.
Default implementation is to return the entry for that agent from the action_spaces dict (see the sketch after this list).
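As an illustration of these default implementations, the per-agent accessor methods agree with the corresponding space dictionaries. This is a minimal sketch, assuming the momultiwalker_stability environment from the usage example and an environment that keeps the default implementations:

from momaland.envs.momultiwalker_stability import momultiwalker_stability_v0 as _env

env = _env.parallel_env()
agent = env.possible_agents[0]

# with the default implementations, the methods return the matching dict entries
assert env.observation_space(agent) == env.observation_spaces[agent]
assert env.action_space(agent) == env.action_spaces[agent]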