Wrappers and Utils

A wrapper is an environment transformation that takes an environment as input and returns a new environment that is similar to the input environment, but with some transformation or validation applied.

To convert between the AEC and Parallel APIs, the native MOMAland conversion wrappers must be used, since they handle vector rewards. On top of the conversion wrappers, there are also a few utility wrappers.

Wrappers for the AEC and Parallel APIs are split into their own modules and can be accessed like momaland.utils.parallel_wrappers.LinearizeReward.
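
For example, the same utility wrapper exists once per API and is imported from the module matching the environment's API:

>>> # AEC environments
>>> from momaland.utils.aec_wrappers import LinearizeReward
>>> # Parallel environments (same wrapper name, different module)
>>> from momaland.utils.parallel_wrappers import LinearizeReward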

Conversion

AEC to Parallel

class momaland.utils.conversions.mo_aec_to_parallel_wrapper(aec_env)[source]

Converts a multi-objective (MO) AEC environment into an MO Parallel environment.

Overrides the PettingZoo conversion behavior to handle vector rewards; keeping the inheritance avoids code duplication and preserves instance-type checks.
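
A minimal usage sketch; aec_env stands in for any MOMAland AEC environment:

>>> from momaland.utils.conversions import mo_aec_to_parallel_wrapper
>>> parallel_env = mo_aec_to_parallel_wrapper(aec_env)  # aec_env is a placeholder MO AEC environment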

Parallel to AEC

class momaland.utils.conversions.mo_parallel_to_aec_wrapper(parallel_env)[source]

Converts a multi-objective (MO) Parallel environment into an MO AEC environment.

Overrides the PettingZoo conversion behavior to handle vector rewards; keeping the inheritance avoids code duplication and preserves instance-type checks.
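
A minimal usage sketch; parallel_env stands in for any MOMAland Parallel environment:

>>> from momaland.utils.conversions import mo_parallel_to_aec_wrapper
>>> aec_env = mo_parallel_to_aec_wrapper(parallel_env)  # parallel_env is a placeholder MO Parallel environment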

AEC

class momaland.utils.aec_wrappers.LinearizeReward(env, weights: dict)[source]

Convert MO reward vector into scalar SO reward value.

weights represents the weights of each objective in the reward vector space for each agent.

Example

>>> weights = {"agent_0": np.array([0.1, 0.9]), "agent_1": np.array([0.2, 0.8]}
... env = LinearizeReward(env, weights)

Reward linearization class initializer.

Parameters:
  • env – base env to add the wrapper on.

  • weights – a dict where keys are agents and values are vectors representing the weights of their rewards.
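
The linearization itself is a weighted sum: each agent's scalar reward is the dot product of its weight vector with its vector reward. A minimal sketch of that computation (the agent name and numbers are arbitrary):

>>> import numpy as np
>>> weights = {"agent_0": np.array([0.1, 0.9])}
>>> vector_reward = np.array([1.0, -2.0])  # example MO reward received by agent_0
>>> scalar_reward = np.dot(weights["agent_0"], vector_reward)  # 0.1 * 1.0 + 0.9 * (-2.0) = -1.7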

class momaland.utils.aec_wrappers.NormalizeReward(env, agent, idx, gamma: float = 0.99, epsilon: float = 1e-08)[source]

This wrapper will normalize immediate rewards s.t. their exponential moving average has a fixed variance.

The exponential moving average will have variance \((1 - \gamma)^2\).

Note

The scaling depends on past trajectories and rewards will not be scaled correctly if the wrapper was newly instantiated or the policy was changed recently.

Example

>>> for agent in env.possible_agents:
...     for idx in range(env.reward_space(agent).shape[0]):
...         env = AECWrappers.NormalizeReward(env, agent, idx)

Parameters:
  • env – The environment to apply the wrapper

  • agent – the agent whose reward will be normalized

  • idx – the index of the reward that will be normalized.

  • epsilon – A stability parameter

  • gamma – The discount factor that is used in the exponential moving average.
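
Conceptually, for the chosen agent and reward index the wrapper keeps a discounted return and divides the incoming reward by the standard deviation of that return. A simplified, self-contained sketch of the idea (the real wrapper updates an incremental running-variance estimate rather than storing the full history):

>>> import numpy as np
>>> def normalize_stream(rewards, gamma=0.99, epsilon=1e-8):
...     """Sketch: normalize one agent's rewards for a single objective."""
...     ret, history, out = 0.0, [], []
...     for r in rewards:
...         ret = gamma * ret + r                  # discounted return
...         history.append(ret)
...         var = np.var(history)                  # stand-in for the running variance estimate
...         out.append(r / np.sqrt(var + epsilon))
...     return out
>>> normalized = normalize_stream([1.0, 2.0, 0.5])  # rewards rescaled by the return's standard deviation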

Parallel

class momaland.utils.parallel_wrappers.LinearizeReward(env, weights: dict)[source]

Convert MO reward vector into scalar SO reward value.

weights represents the weights of each objective in the reward vector space for each agent.

Example

>>> weights = {"agent_0": np.array([0.1, 0.9]), "agent_1": np.array([0.2, 0.8])}
>>> env = LinearizeReward(env, weights)

Reward linearization class initializer.

Parameters:
  • env – base env to add the wrapper on.

  • weights – a dict where keys are agents and values are vectors representing the weights of their rewards.
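
A usage sketch in a Parallel step loop; parallel_env is a placeholder for any MOMAland Parallel environment, and the two-objective weight vectors are assumed only for illustration:

>>> import numpy as np
>>> from momaland.utils.parallel_wrappers import LinearizeReward
>>> weights = {agent: np.array([0.5, 0.5]) for agent in parallel_env.possible_agents}  # placeholder weights
>>> env = LinearizeReward(parallel_env, weights)
>>> observations, infos = env.reset(seed=42)
>>> actions = {agent: env.action_space(agent).sample() for agent in env.agents}
>>> observations, rewards, terminations, truncations, infos = env.step(actions)
>>> # rewards[agent] is now a scalar instead of a reward vector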

class momaland.utils.parallel_wrappers.NormalizeReward(env, agent, idx, gamma: float = 0.99, epsilon: float = 1e-08)[source]

This wrapper will normalize immediate rewards s.t. their exponential moving average has a fixed variance.

The exponential moving average will have variance \((1 - \gamma)^2\).

Note

The scaling depends on past trajectories and rewards will not be scaled correctly if the wrapper was newly instantiated or the policy was changed recently.

Example

>>> for agent in env.possible_agents:
...     for idx in range(env.reward_space(agent).shape[0]):
...         env = ParallelWrappers.NormalizeReward(env, agent, idx)

Parameters:
  • env – The environment to apply the wrapper

  • agent – the agent whose reward will be normalized

  • idx – the index of the reward that will be normalized.

  • epsilon – A stability parameter

  • gamma – The discount factor that is used in the exponential moving average.
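
Because each wrapper is itself an environment, the per-objective normalization can be stacked with the other utility wrappers. A sketch (weights is a placeholder dict as in the LinearizeReward example above):

>>> from momaland.utils import parallel_wrappers as ParallelWrappers
>>> for agent in env.possible_agents:
...     for idx in range(env.reward_space(agent).shape[0]):
...         env = ParallelWrappers.NormalizeReward(env, agent, idx)
>>> env = ParallelWrappers.LinearizeReward(env, weights)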

class momaland.utils.parallel_wrappers.RecordEpisodeStatistics(env)[source]

This wrapper will record episode statistics and print them at the end of each episode.

Parameters:
  • env – The environment to apply the wrapper.
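
A usage sketch; parallel_env is a placeholder for any MOMAland Parallel environment:

>>> from momaland.utils.parallel_wrappers import RecordEpisodeStatistics
>>> env = RecordEpisodeStatistics(parallel_env)  # parallel_env is a placeholder
>>> observations, infos = env.reset(seed=42)
>>> while env.agents:
...     actions = {agent: env.action_space(agent).sample() for agent in env.agents}
...     observations, rewards, terminations, truncations, infos = env.step(actions)
>>> # episode statistics are printed when the episode ends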