MADDPG¶
Multi-Agent Deep Deterministic Policy Gradient (MADDPG) is a multi-agent reinforcement learning algorithm for continuous action spaces:
The implementation is based on DDPG ✔️
MADDPG initializes n DDPG agents, one per agent in the environment (see the sketch below) ✔️
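A minimal sketch of that structure, where AgentDDPG stands in for the underlying DDPG learner (the class and attribute names here are illustrative, not ElegantRL's exact code):

import torch.nn as nn

class AgentDDPG:  # illustrative stand-in for the underlying DDPG learner
    def __init__(self, net_dim, state_dim, action_dim):
        # actor: state -> action in (-1, +1)
        self.act = nn.Sequential(nn.Linear(state_dim, net_dim), nn.ReLU(),
                                 nn.Linear(net_dim, action_dim), nn.Tanh())
        # critic: (state, action) -> Q-value
        self.cri = nn.Sequential(nn.Linear(state_dim + action_dim, net_dim),
                                 nn.ReLU(), nn.Linear(net_dim, 1))

class MADDPG:  # holds one DDPG learner per agent
    def __init__(self, n_agents, net_dim, state_dim, action_dim):
        self.n_agents = n_agents
        self.agents = [AgentDDPG(net_dim, state_dim, action_dim)
                       for _ in range(n_agents)]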
Code Snippet¶
def update_net(self, buffer, batch_size, repeat_times, soft_update_tau):
    buffer.update_now_len()
    self.batch_size = batch_size
    self.update_tau = soft_update_tau
    # sample one shared batch of joint transitions for all agents
    rewards, dones, actions, observations, next_obs = buffer.sample_batch(self.batch_size)
    # update each agent's actor and critic with the shared batch
    for index in range(self.n_agents):
        self.update_agent(rewards, dones, actions, observations, next_obs, index)
    # soft-update every agent's target networks
    for agent in self.agents:
        self.soft_update(agent.cri_target, agent.cri, self.update_tau)
        self.soft_update(agent.act_target, agent.act, self.update_tau)
    return
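For reference, the soft_update call above typically performs a Polyak average of the network parameters; a generic sketch, not ElegantRL's exact helper:

import torch

def soft_update(target_net, current_net, tau):
    # target <- tau * current + (1 - tau) * target, parameter by parameter
    with torch.no_grad():
        for tgt, cur in zip(target_net.parameters(), current_net.parameters()):
            tgt.data.copy_(tau * cur.data + (1.0 - tau) * tgt.data)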
Parameters¶
- class elegantrl.agents.AgentMADDPG.AgentMADDPG[source]¶
Bases:
AgentBase
Multi-Agent DDPG algorithm. “Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments”. R. Lowe et al. 2017.
- Parameters
net_dim[int] – the dimension of networks (the width of neural networks)
state_dim[int] – the dimension of state (the length of the state vector)
action_dim[int] – the dimension of action (the length of the continuous action vector)
learning_rate[float] – learning rate of optimizer
gamma[float] – the discount factor of future rewards
n_agents[int] – number of agents
if_per_or_gae[bool] – use PER (off-policy) or GAE (on-policy) to handle sparse rewards
env_num[int] – the env number of VectorEnv. env_num == 1 means don’t use VectorEnv
agent_id[int] – if the visible_gpu is ‘1,9,3,4’, agent_id=1 means (1,9,3,4)[agent_id] == 9
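A hypothetical construction call using the parameters above (argument names mirror this list, but verify against the source, since the exact init signature may differ):

from elegantrl.agents.AgentMADDPG import AgentMADDPG

agent = AgentMADDPG()
# sketch only: values are illustrative defaults, not recommended settings
agent.init(net_dim=256, state_dim=16, action_dim=4,
           learning_rate=1e-4, gamma=0.99, n_agents=2)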
- explore_one_env(env, target_step) list [source]¶
Explore the environment for target_step steps.
- Parameters
env – the Environment instance to be explored.
target_step – target steps to explore.
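A hypothetical call, assuming env is an already-constructed environment instance:

trajectory = agent.explore_one_env(env, target_step=1024)  # sketch: collect 1024 steps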
- save_or_load_agent(cwd, if_save)[source]¶
save or load training files for Agent
- Parameters
cwd – Current Working Directory. ElegantRL save training files in CWD.
if_save – True: save files. False: load files.
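Usage follows directly from the two parameters; for example:

agent.save_or_load_agent(cwd='./MADDPG_demo', if_save=True)   # save networks to cwd
agent.save_or_load_agent(cwd='./MADDPG_demo', if_save=False)  # load networks from cwd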
- select_actions(states)[source]¶
Select continuous actions for exploration
- Parameters
states – states.shape == (n_agents, batch_size, state_dim)
- Returns
actions.shape == (n_agents, batch_size, action_dim), with -1 < action < +1
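To make the shapes concrete, a small standalone check with dummy arrays (independent of ElegantRL):

import numpy as np

n_agents, batch_size, state_dim, action_dim = 2, 4, 16, 4
states = np.random.randn(n_agents, batch_size, state_dim).astype(np.float32)
# any conforming output satisfies the documented shape and range
actions = np.tanh(np.random.randn(n_agents, batch_size, action_dim))
assert actions.shape == (n_agents, batch_size, action_dim)
assert (actions > -1.0).all() and (actions < 1.0).all()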
- update_agent(rewards, dones, actions, observations, next_obs, index)[source]¶
Update a single agent's neural networks; called by update_net.
- Parameters
rewards – reward list of the sampled buffer
dones – done list of the sampled buffer
actions – action list of the sampled buffer
observations – observation list of the sampled buffer
next_obs – next-observation list of the sampled buffer
index – ID of the agent
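Conceptually, each call implements the standard MADDPG update: agent index trains a centralized critic on the joint observations and actions of all agents, while its actor conditions only on its own observation. A simplified sketch of the critic's TD target for agent index, assuming agents expose act_target and cri_target networks as in the snippet above:

import torch

def critic_td_target(rewards, dones, next_obs, agents, index, gamma=0.99):
    # next_obs: (batch, n_agents, obs_dim); rewards, dones: (batch, n_agents)
    # joint next action from every agent's target actor (centralized critic input)
    next_actions = torch.cat([agents[i].act_target(next_obs[:, i])
                              for i in range(len(agents))], dim=1)
    next_obs_flat = next_obs.reshape(next_obs.shape[0], -1)
    next_q = agents[index].cri_target(torch.cat((next_obs_flat, next_actions), dim=1))
    # standard one-step TD target for agent `index`
    return rewards[:, index] + gamma * (1.0 - dones[:, index]) * next_q.squeeze(1)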
- update_net(buffer, batch_size, repeat_times, soft_update_tau)[source]¶
Update the neural networks by sampling batch data from the ReplayBuffer.
- Parameters
buffer – the ReplayBuffer instance that stores the trajectories.
batch_size – the size of batch data for Stochastic Gradient Descent (SGD).
repeat_times – the re-using times of each trajectory.
soft_update_tau – the soft update parameter.
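A hypothetical call site after experience collection, assuming buffer is a filled ReplayBuffer instance:

# sketch: one training update per collected rollout
agent.update_net(buffer, batch_size=256, repeat_times=1, soft_update_tau=2 ** -8)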