
I'm working on a reinforcement learning problem where the environment returns a reward pair $(r_{t+1}^{(a)}, r_{t+1}^{(b)})$ at each step. The goal is to maximize the nonlinear objective $$ \mathbb{E}\left[\lim_{T \to \infty} \frac{\sum_{\tau=t}^{t+T-1} r_{\tau+1}^{(a)}}{\sum_{\tau=t}^{t+T-1} r_{\tau+1}^{(a)} + \sum_{\tau=t}^{t+T-1} r_{\tau+1}^{(b)}}\right], $$ i.e., the agent's cumulative reward as a fraction of the total cumulative reward. My intention was to use Deep Q-Networks (DQN) as the primary reinforcement learning model for this environment. However, this objective is a nonlinear function of the two reward streams, so it does not decompose into the additive, discounted return that the Bellman recursion behind the original DQN algorithm assumes, and I could not apply DQN directly. As an alternative, I am considering framing the problem as a multi-objective reinforcement learning problem, treating $r^{(a)}$ and $r^{(b)}$ as separate objectives. Is this an appropriate way to frame it?
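To make the multi-objective framing concrete, here is a minimal sketch of what I have in mind (assuming PyTorch and non-negative reward streams; all class and function names are illustrative, not an established algorithm): learn a separate Q-estimate per reward component with ordinary scalar TD targets, and only apply the nonlinear ratio at action-selection time.

```python
import torch
import torch.nn as nn

class TwoHeadQNetwork(nn.Module):
    """Shared trunk with one Q-head per reward component: Q_a(s, .) and Q_b(s, .)."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.q_a = nn.Linear(hidden, n_actions)  # estimates cumulative r^(a)
        self.q_b = nn.Linear(hidden, n_actions)  # estimates cumulative r^(b)

    def forward(self, state: torch.Tensor):
        z = self.trunk(state)
        return self.q_a(z), self.q_b(z)

def ratio_greedy_action(net: TwoHeadQNetwork, state: torch.Tensor,
                        eps: float = 1e-8) -> int:
    """Pick the action maximizing the estimated reward share Q_a / (Q_a + Q_b).

    Assumes both reward streams (hence both Q-estimates) are non-negative,
    so the share interpretation is well defined; eps guards the denominator.
    """
    with torch.no_grad():
        q_a, q_b = net(state.unsqueeze(0))
        ratio = q_a / (q_a + q_b + eps)
        return int(ratio.argmax(dim=1).item())
```

The point of this split is that each head is trained on its own additive reward stream, so the standard Bellman recursion stays valid per component; the nonlinearity enters only at decision time, which makes this a heuristic for the ratio objective rather than an exact optimizer of it.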

Alex
