
Prey treasure hunt reward

This blog post provides an overview of a range of multi-agent reinforcement learning (MARL) environments with their main properties and learning challenges. We list the environments and their properties in the table below, with quick links to their respective sections in this blog post. At the end of this post, we also mention some general frameworks which support a variety of environments and game modes.

We use the term "task" to refer to a specific configuration of an environment (e.g. setting a specific world size, number of agents, etc.), as we did in our SEAC and MARL benchmark papers. We say a task is "cooperative" if all agents receive the same reward at each timestep. A task is "competitive" if there is some form of competition between agents, i.e. one agent's gain comes at the loss of another agent. We loosely call a task "collaborative" if the agents' ultimate goals are aligned and agents cooperate, but their received rewards are not identical. Based on these task types, we say an environment is cooperative, competitive, or collaborative if it only supports tasks of that respective type, and "mixed" if it supports more than one type of task. For observations, we distinguish between discrete feature vectors, continuous feature vectors, and continuous (pixel) image observations. For actions, we distinguish between discrete actions, multi-discrete actions (where agents choose multiple separate discrete actions at each timestep), and continuous actions; the action space is "both" if the environment supports discrete and continuous actions.

The Level-Based Foraging (LBF) environment consists of mixed cooperative-competitive tasks focusing on the coordination of the involved agents. The task for each agent is to navigate the grid-world map and collect items. Each agent and each item is assigned a level, and items are randomly scattered in the environment. To collect an item, agents have to choose the load action while next to the item, and the collection is only successful if the sum of the involved agents' levels is equal to or greater than the item's level. Agents receive a reward equal to the level of the collected item. By default, every agent can observe the whole map, including the positions and levels of all entities, and can choose to act by moving in one of four directions or attempting to load an item. In the partially observable version, denoted with "sight=2", agents can only observe entities in a 5 × 5 grid surrounding them. Rewards are fairly sparse depending on the task, as agents might have to cooperate (pick up the same item at the same timestep) to receive any reward. For more details, see our blog post here. Below, you can see visualisations of a collection of possible tasks:

- LBF-8x8-2p-3f: An 8 × 8 grid-world with two agents and three items placed in random locations. Item levels are random and might require agents to cooperate, depending on the level.
- LBF-8x8-2p-2f-coop: An 8 × 8 grid-world with two agents and two items. This is a cooperative version, and the agents will always need to collect an item simultaneously (cooperate).
- LBF-8x8-3p-1f-coop: An 8 × 8 grid-world with three agents and one item. This is a cooperative version, and all three agents will need to collect the item simultaneously.
- LBF-10x10-2p-8f: A 10 × 10 grid-world with two agents and eight items. The time limit (25 timesteps) is often not enough for all items to be collected, so the agents need to spread out and collect as many items as possible in the short amount of time.
- LBF-8x8-2p-3f, sight=2: Similar to the first variation, but partially observable.

PressurePlate is a multi-agent environment, based on the Level-Based Foraging environment, that requires agents to cooperate during the traversal of a grid-world. The grid is partitioned into a series of connected rooms, each containing a pressure plate and a closed doorway. At the beginning of an episode, each agent is assigned a plate that only they can activate, by moving to its location and staying on it. The agent's vision is limited to a 5 × 5 box centred around the agent.

(Figure: visualisation of the PressurePlate linear task with 4 agents.)
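The "cooperative" definition above has a simple operational test: at every timestep, all agents must receive the same reward. A minimal sketch in Python (the function name and trajectory format are illustrative, not taken from any of the libraries mentioned):

```python
from typing import List

def is_cooperative(reward_history: List[List[float]]) -> bool:
    """Check whether every timestep's rewards are identical across agents.

    reward_history[t][i] is the reward of agent i at timestep t.
    """
    return all(len(set(rewards)) == 1 for rewards in reward_history)

# Identical rewards at every step -> the task is cooperative.
print(is_cooperative([[1.0, 1.0], [0.0, 0.0]]))  # True
# Diverging rewards -> not cooperative (collaborative or competitive).
print(is_cooperative([[1.0, 0.5], [0.0, 0.0]]))  # False
```

The same per-timestep check underlies the environment-level labels: an environment is "cooperative" only if every task it supports passes this test.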


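The LBF collection rule can be sketched in a few lines: a load attempt succeeds only when the summed levels of the loading agents reach the item's level, and the description states the reward equals the collected item's level. This is an illustrative sketch, not the environment's actual implementation; the function name and return convention are assumptions:

```python
def try_collect(agent_levels, item_level):
    """Resolve a simultaneous load attempt on one item.

    agent_levels: levels of the agents attempting to load the item.
    Returns the reward each loading agent receives (0.0 on failure).
    """
    if sum(agent_levels) >= item_level:
        # Success: per the environment description, the reward
        # equals the level of the collected item.
        return float(item_level)
    return 0.0

print(try_collect([1, 2], 3))  # levels sum to 3 >= 3 -> reward 3.0
print(try_collect([1], 3))     # 1 < 3 -> failed load, reward 0.0
```

This threshold is what makes some variants forcibly cooperative: in the `-coop` tasks, no single agent's level is ever enough on its own.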


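The "sight=2" partial observability — each agent only sees a 5 × 5 box centred on itself — is effectively a fixed-size crop of the global grid. A sketch, assuming a NumPy grid and zero-padding at the borders (both assumptions; the real environments define their own observation encodings):

```python
import numpy as np

def crop_observation(grid: np.ndarray, pos: tuple, sight: int = 2) -> np.ndarray:
    """Return the (2*sight+1) x (2*sight+1) window centred on pos.

    Cells outside the grid are padded with zeros, so the window has
    the same shape regardless of the agent's position.
    """
    padded = np.pad(grid, sight, mode="constant", constant_values=0)
    r, c = pos[0] + sight, pos[1] + sight
    return padded[r - sight:r + sight + 1, c - sight:c + sight + 1]

grid = np.arange(64).reshape(8, 8)    # an 8 x 8 world
obs = crop_observation(grid, (0, 0))  # agent in the top-left corner
print(obs.shape)                      # (5, 5): sight=2 -> 5 x 5 box
```

Padding keeps the observation shape constant even when the agent stands at a border, which is what a fixed-size policy network needs.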


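The PressurePlate mechanic — each plate can only be activated by its assigned agent, who must stand on it — can be sketched as follows (the one-plate-per-agent indexing and function name are illustrative assumptions, not the environment's actual code):

```python
def doors_open(plate_positions, agent_positions):
    """For each room, the doorway is open while the plate's assigned
    agent stands on the plate (plate i is assigned to agent i).
    """
    return [agent_positions[i] == plate
            for i, plate in enumerate(plate_positions)]

plates = [(1, 1), (4, 1)]
agents = [(1, 1), (3, 2)]          # agent 0 is on its plate; agent 1 is not
print(doors_open(plates, agents))  # [True, False]
```

This is why the task forces cooperation: an agent must stay behind on its plate to hold a doorway open while the others move through.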