I'm new to reinforcement learning (RL) and am currently training my own agent with the Stable-Baselines3 implementation of PPO. My reward function is a weighted combination of several sub-rewards. While working on this, I've run into a few questions I couldn't find clear answers to.
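For context, here is a stripped-down sketch of the kind of setup I mean. The toy environment, the sub-reward names (`r_progress`, `r_energy`), and the weights are all placeholders I made up for this post, not my real task:

```python
import gymnasium as gym
import numpy as np
from gymnasium import spaces
from stable_baselines3 import PPO


class ToyEnv(gym.Env):
    """1-D point that should reach position 1.0 while using little energy."""

    def __init__(self):
        self.observation_space = spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float32)
        self.action_space = spaces.Box(-0.1, 0.1, shape=(1,), dtype=np.float32)
        self.pos = 0.0

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.pos = 0.0
        return np.array([self.pos], dtype=np.float32), {}

    def step(self, action):
        old_pos = self.pos
        self.pos = float(np.clip(self.pos + action[0], -1.0, 1.0))

        # Composite reward: a weighted sum of sub-rewards (placeholder weights).
        r_progress = self.pos - old_pos    # positive when moving toward the goal
        r_energy = -float(action[0] ** 2)  # penalty for large actions
        reward = 1.0 * r_progress + 0.5 * r_energy

        terminated = self.pos >= 1.0
        return np.array([self.pos], dtype=np.float32), reward, terminated, False, {}


model = PPO("MlpPolicy", ToyEnv(), verbose=0)
model.learn(total_timesteps=10_000)
```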
I’ve come across some discussions where engineers mentioned that using negative rewards might lead to different agent behavior compared to using positive rewards. Is this true? If so, how might the agent’s behavior differ?
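To make the question concrete, here are the two sign conventions I'm asking about, applied to the same hypothetical "reach the goal" task (the numbers are arbitrary):

```python
def reward_negative(reached_goal: bool) -> float:
    # -1 every step until the goal: the return improves the *faster*
    # the episode ends, so presumably the agent is pushed to terminate quickly?
    return 0.0 if reached_goal else -1.0


def reward_positive(reached_goal: bool) -> float:
    # +1 every step: the return improves the *longer* the episode runs,
    # so could the agent learn to avoid the terminal state instead?
    return 10.0 if reached_goal else 1.0
```

Is this the kind of behavioral difference those discussions were referring to, or is there more to it?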