I have the following RL model that I want to train (see the diagram below). My idea is to have two agents, agent A and agent B. Agent A observes the input I1 and chooses an action action1; immediately afterwards, agent B observes the input (action1, I2), where I2 = f(action1, I1) is obtained with a known function f(.), and chooses an action action2. The two actions action1 and action2 are then used to compute a reward that is common to both agents. After that, the next observation I1' of agent A is obtained and the process continues.

 --------------------------- next state --------------------------------------
|                                                                             |
I1 -> |A| -> action1 ------------------------>|B| -> (action1, action2) -> reward
|                |                             |                
|                |                             |
 -----------------> f(I1, action1) -> I2 ------
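
For concreteness, here is a minimal Python sketch of one round of the interaction as I picture it. The RandomAgent class, the body of f, and reward_fn are placeholders I made up for illustration, not an existing implementation:

```python
import numpy as np

def f(i1, action1):
    # Known transition producing agent B's extra input I2.
    # Placeholder body: replace with the actual known function f(.).
    return np.append(i1, action1)

class RandomAgent:
    # Stand-in for a trainable policy (e.g. a DQN or PPO network).
    def __init__(self, n_actions):
        self.n_actions = n_actions

    def act(self, observation):
        return np.random.randint(self.n_actions)

def one_round(agent_a, agent_b, i1, reward_fn):
    action1 = agent_a.act(i1)             # A observes I1 and picks action1
    i2 = f(i1, action1)                   # I2 = f(I1, action1), known function
    action2 = agent_b.act((action1, i2))  # B observes (action1, I2)
    reward = reward_fn(action1, action2)  # single reward shared by A and B
    return action1, action2, reward

# Example usage with dummy components:
agent_a, agent_b = RandomAgent(4), RandomAgent(4)
i1 = np.zeros(8)
reward_fn = lambda a1, a2: float(a1 == a2)  # made-up reward for illustration
print(one_round(agent_a, agent_b, i1, reward_fn))
```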

I want to know whether it is possible to implement a DRL algorithm for this model or, more importantly, whether a similar two-agent model has already been proposed that I could use?
