Wrappers and Utils¶
A wrapper is an environment transformation: it takes an environment as input and returns a new environment that behaves like the original, but with some transformation or validation applied.
For conversion between the AEC and Parallel APIs, the native MOMAland wrappers must be used. On top of the conversion wrappers, there are also a few utility wrappers. Wrappers for the AEC and Parallel APIs are split into their own modules and can be accessed like momaland.utils.parallel_wrappers.LinearizeReward.
Conversion¶
AEC to Parallel¶
- class momaland.utils.conversions.mo_aec_to_parallel_wrapper(aec_env)[source]¶
Converts an MO AEC environment into an MO Parallel environment.
Overrides PettingZoo behavior to handle vectorial rewards; keeping the inheritance avoids code duplication and preserves instance-type checks.
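A minimal usage sketch, assuming aec_env is an already-constructed MOMAland AEC environment:
>>> from momaland.utils.conversions import mo_aec_to_parallel_wrapper
>>> parallel_env = mo_aec_to_parallel_wrapper(aec_env)
>>> observations, infos = parallel_env.reset(seed=42)  # Parallel API from here on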
Parallel to AEC¶
- class momaland.utils.conversions.mo_parallel_to_aec_wrapper(parallel_env)[source]¶
Converts an MO Parallel environment into an MO AEC environment.
Overrides PettingZoo behavior to handle vectorial rewards; keeping the inheritance avoids code duplication and preserves instance-type checks.
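A minimal usage sketch, assuming parallel_env is an already-constructed MOMAland Parallel environment. Note that rewards stay vectorial: the reward returned by last() is an array with one entry per objective.
>>> from momaland.utils.conversions import mo_parallel_to_aec_wrapper
>>> aec_env = mo_parallel_to_aec_wrapper(parallel_env)
>>> aec_env.reset(seed=42)
>>> for agent in aec_env.agent_iter():
...     observation, reward, termination, truncation, info = aec_env.last()  # reward is a vector
...     action = None if termination or truncation else aec_env.action_space(agent).sample()
...     aec_env.step(action)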
AEC¶
- class momaland.utils.aec_wrappers.LinearizeReward(env, weights: dict)[source]¶
Converts an MO reward vector into a scalar SO reward value.
weights holds, for each agent, the weight of each objective in the reward vector.
Example
>>> weights = {"agent_0": np.array([0.1, 0.9]), "agent_1": np.array([0.2, 0.8])}
>>> env = LinearizeReward(env, weights)
Reward linearization class initializer.
- Parameters:
env – the base environment to wrap.
weights – a dict mapping each agent to a vector of weights for its reward objectives.
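The scalarization is, per the description above, presumably a per-agent weighted sum: with weights [0.1, 0.9], a reward vector [1.0, 2.0] becomes 0.1 * 1.0 + 0.9 * 2.0 = 1.9. A rollout sketch, assuming env is an MO AEC environment with the two agents and two objectives used above:
>>> import numpy as np
>>> from momaland.utils.aec_wrappers import LinearizeReward
>>> env = LinearizeReward(env, {"agent_0": np.array([0.1, 0.9]), "agent_1": np.array([0.2, 0.8])})
>>> env.reset(seed=42)
>>> for agent in env.agent_iter():
...     _, reward, termination, truncation, _ = env.last()  # reward is now a scalar
...     env.step(None if termination or truncation else env.action_space(agent).sample())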
- class momaland.utils.aec_wrappers.NormalizeReward(env, agent, idx, gamma: float = 0.99, epsilon: float = 1e-08)[source]¶
This wrapper normalizes immediate rewards such that their exponential moving average has a fixed variance.
The exponential moving average will have variance \((1 - \gamma)^2\).
Note
The scaling depends on past trajectories, and rewards will not be scaled correctly if the wrapper is newly instantiated or the policy has changed recently.
Example
>>> for agent in env.possible_agents:
...     for idx in range(env.reward_space(agent).shape[0]):
...         env = AECWrappers.NormalizeReward(env, agent, idx)
- Parameters:
env – the environment to apply the wrapper to.
agent – the agent whose reward will be normalized.
idx – the index of the reward component that will be normalized.
epsilon – a stability parameter.
gamma – the discount factor used in the exponential moving average.
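The description resembles the scheme of Gymnasium's single-objective NormalizeReward wrapper, applied here per (agent, objective) pair. A self-contained sketch of that scheme, not the wrapper's actual internals (names and details here are illustrative assumptions):
>>> import numpy as np
>>> class RunningReturnNormalizer:
...     """Sketch: scale rewards so the running discounted return has unit variance."""
...     def __init__(self, gamma=0.99, epsilon=1e-8):
...         self.gamma, self.epsilon = gamma, epsilon
...         self.ret = 0.0                        # running discounted return
...         self.mean, self.m2, self.count = 0.0, 0.0, 0
...     def normalize(self, reward):
...         self.ret = self.gamma * self.ret + reward
...         self.count += 1
...         delta = self.ret - self.mean          # Welford's online variance update
...         self.mean += delta / self.count
...         self.m2 += delta * (self.ret - self.mean)
...         var = self.m2 / self.count if self.count > 1 else 1.0
...         return reward / np.sqrt(var + self.epsilon)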
Parallel¶
- class momaland.utils.parallel_wrappers.LinearizeReward(env, weights: dict)[source]¶
Converts an MO reward vector into a scalar SO reward value.
weights holds, for each agent, the weight of each objective in the reward vector.
Example
>>> weights = {"agent_0": np.array([0.1, 0.9]), "agent_1": np.array([0.2, 0.8])}
>>> env = LinearizeReward(env, weights)
Reward linearization class initializer.
- Parameters:
env – the base environment to wrap.
weights – a dict mapping each agent to a vector of weights for its reward objectives.
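The Parallel variant acts on the per-step rewards dict. A step sketch, assuming env is an MO Parallel environment with agents "agent_0" and "agent_1":
>>> import numpy as np
>>> from momaland.utils.parallel_wrappers import LinearizeReward
>>> env = LinearizeReward(env, {"agent_0": np.array([0.1, 0.9]), "agent_1": np.array([0.2, 0.8])})
>>> observations, infos = env.reset(seed=42)
>>> actions = {agent: env.action_space(agent).sample() for agent in env.agents}
>>> observations, rewards, terminations, truncations, infos = env.step(actions)
>>> # rewards[agent] is now a scalar rather than a vector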
- class momaland.utils.parallel_wrappers.NormalizeReward(env, agent, idx, gamma: float = 0.99, epsilon: float = 1e-08)[source]¶
This wrapper normalizes immediate rewards such that their exponential moving average has a fixed variance.
The exponential moving average will have variance \((1 - \gamma)^2\).
Note
The scaling depends on past trajectories, and rewards will not be scaled correctly if the wrapper is newly instantiated or the policy has changed recently.
Example
>>> for agent in env.possible_agents:
...     for idx in range(env.reward_space(agent).shape[0]):
...         env = ParallelWrappers.NormalizeReward(env, agent, idx)
- Parameters:
env – the environment to apply the wrapper to.
agent – the agent whose reward will be normalized.
idx – the index of the reward component that will be normalized.
epsilon – a stability parameter.
gamma – the discount factor used in the exponential moving average.
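The normalization and linearization wrappers compose: a common pattern is to first normalize every objective of every agent, then scalarize. A sketch, assuming env is an MO Parallel environment with two objectives per agent and that the wrappers forward possible_agents and reward_space:
>>> import numpy as np
>>> from momaland.utils import parallel_wrappers as ParallelWrappers
>>> for agent in env.possible_agents:
...     for idx in range(env.reward_space(agent).shape[0]):
...         env = ParallelWrappers.NormalizeReward(env, agent, idx)
>>> weights = {agent: np.array([0.5, 0.5]) for agent in env.possible_agents}  # assumes two objectives
>>> env = ParallelWrappers.LinearizeReward(env, weights)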
- class momaland.utils.parallel_wrappers.RecordEpisodeStatistics(env)[source]¶
This wrapper records episode statistics and prints them at the end of each episode.
- Parameters:
env – the environment to apply the wrapper to.
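A rollout sketch, assuming env is an MO Parallel environment; the recorded statistics are printed when the episode ends:
>>> from momaland.utils.parallel_wrappers import RecordEpisodeStatistics
>>> env = RecordEpisodeStatistics(env)
>>> observations, infos = env.reset(seed=42)
>>> while env.agents:  # the agents list empties once the episode is over
...     actions = {agent: env.action_space(agent).sample() for agent in env.agents}
...     observations, rewards, terminations, truncations, infos = env.step(actions)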