Agent names

agent_i for i in [0, 4199]

Action Space

Discrete (per-agent; size equals the number of routes for the agent's OD pair)

Observation Space

Discrete(1)

Reward Space

Box(-3.0, 0.0, (2,), float32)



A Parallel environment where drivers learn to travel from a source to a destination while avoiding congestion.

Multi-objective version of Braess’ Paradox where drivers have two objectives: travel time and monetary cost. The environment is a road network, and the agents are the drivers that need to travel from an origin to a destination point.

Observation Space

This environment is stateless, so the observation space is a constant 0 (a Discrete space with shape (1,)).

Action Space

The action space is a discrete space representing the possible routes that the agent can take. The number of routes is different for each agent, as it depends on the number of possible routes for the OD pair of the agent. Selecting an action corresponds to choosing a route.
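Since the action-space size varies per agent, a minimal sketch of the idea (the OD pairs, route counts, and agent assignments below are illustrative, not taken from the real network):

```python
# Hedged sketch: deriving per-agent discrete action-space sizes from the
# number of routes available for each agent's origin-destination (OD) pair.
# All OD pairs and route counts here are hypothetical.

od_routes = {
    ("A", "B"): 3,  # hypothetical: 3 routes between A and B
    ("A", "C"): 2,  # hypothetical: 2 routes between A and C
}

# Each agent is assigned an OD pair; its action space is Discrete(n_routes),
# so selecting action k means taking the k-th route for that OD pair.
agent_od = {"agent_0": ("A", "B"), "agent_1": ("A", "C")}

action_space_size = {agent: od_routes[od] for agent, od in agent_od.items()}

print(action_space_size)  # {'agent_0': 3, 'agent_1': 2}
```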

Reward Space

The reward space is a 2D vector containing rewards for:

  • Minimizing travel time (latency).

  • Minimizing monetary cost.
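A small sketch of how such a 2D reward vector could be formed, consistent with the Box(-3.0, 0.0, (2,), float32) space above (the travel-time and toll values, and the `make_reward` helper itself, are illustrative assumptions, not the library's implementation):

```python
# Hedged sketch: a 2D reward with two non-positive components, one per
# objective, bounded to the documented Box(-3.0, 0.0, ...) range.
# The numbers used below are made up for illustration.

def make_reward(travel_time: float, toll: float, low: float = -3.0) -> tuple:
    """Return (latency_reward, cost_reward), each clipped to [low, 0]."""
    latency_reward = max(low, -travel_time)  # objective 1: minimize travel time
    cost_reward = max(low, -toll)            # objective 2: minimize monetary cost
    return (latency_reward, cost_reward)

print(make_reward(1.5, 0.4))   # (-1.5, -0.4)
print(make_reward(10.0, 5.0))  # clipped to (-3.0, -3.0)
```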

Starting State

The environment is stateless, so there is no starting state.

Episode Termination

The environment is stateless, so there are no multi-step episodes. Each “episode” therefore terminates after a single timestep.

Episode Truncation

Episodes are not truncated, as they are terminated after each timestep.
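The one-step interaction pattern described above can be sketched with a toy stand-in environment (the `MiniEnv` class is an assumption for illustration, not the real API):

```python
# Hedged sketch: because the environment is stateless, every agent observes
# the constant 0, acts once, and the episode terminates immediately.
import random

class MiniEnv:
    """Toy stand-in for a stateless parallel environment (not the real API)."""

    def __init__(self, agents):
        self.agents = list(agents)

    def reset(self):
        # Stateless: the constant observation 0 for every agent.
        return {a: 0 for a in self.agents}

    def step(self, actions):
        # Dummy 2D rewards; every agent terminates after this single timestep.
        rewards = {a: (-random.random(), -random.random()) for a in actions}
        terminations = {a: True for a in actions}
        return rewards, terminations

env = MiniEnv([f"agent_{i}" for i in range(3)])
obs = env.reset()
actions = {a: 0 for a in env.agents}  # every agent picks route 0
rewards, terminations = env.step(actions)
assert all(terminations.values())  # done after one timestep
```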


Arguments

  • render_mode (str, optional): The mode to display the rendering of the environment. Can be human or None.

  • problem_name (str, optional): The name of the road network that will be used.

  • num_agents (int, optional): The number of drivers in the network.

  • toll_mode (str, optional): The tolling mode that is used: either “random” (tolls are placed randomly) or “mct” (marginal cost tolling).

  • random_toll_percentage (float, optional): In the case of random tolling, the percentage of roads that will be tolled.

  • num_timesteps (int, optional): The number of timesteps (stateless, therefore always 1 timestep).
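The argument set above, gathered as keyword arguments, might look as follows; the specific values (and the network name in particular) are illustrative assumptions, not the library's defaults:

```python
# Hedged sketch: a plausible keyword-argument set for constructing the
# environment. Every value here is illustrative, not an actual default.
env_kwargs = {
    "render_mode": None,             # or "human"
    "problem_name": "Braess_net",    # hypothetical road-network name
    "num_agents": 4200,              # matches agent_i for i in [0, 4199]
    "toll_mode": "mct",              # "random" or "mct"
    "random_toll_percentage": 0.1,   # only relevant when toll_mode="random"
    "num_timesteps": 1,              # stateless: always a single timestep
}
```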


The code was adapted from the codebase of “Toll-Based Learning for Minimising Congestion under Heterogeneous Preferences”.