
Say I'm training a model for multiple tasks by minimizing the sum of losses $L_1 + L_2$ via gradient descent.

If these losses are on different scales, the one with the greater range will dominate the optimization. I'm currently trying to fix this by introducing a hyperparameter $\lambda$ and tuning it to bring the losses onto the same scale, i.e., I minimize $L_1 + \lambda \cdot L_2$ where $\lambda > 0$.

However, I'm not sure if this is a good approach. In short, what are some strategies to deal with losses having different scales when doing multi-task learning? I'm particularly interested in deep learning scenarios.
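For concreteness, here is roughly what I'm doing now (a minimal, self-contained PyTorch sketch; the toy two-head model, data, and loss functions just stand in for my actual setup):

```python
import torch
import torch.nn as nn

# Toy two-head model; my real network is larger, this only illustrates the setup.
class TwoHeadNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(16, 32)
        self.head1 = nn.Linear(32, 1)   # task 1: regression
        self.head2 = nn.Linear(32, 5)   # task 2: classification

    def forward(self, x):
        h = torch.relu(self.backbone(x))
        return self.head1(h), self.head2(h)

model = TwoHeadNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
lam = 0.1  # the hyperparameter I currently tune by hand

# Dummy batch standing in for my real data.
x = torch.randn(8, 16)
y1 = torch.randn(8, 1)
y2 = torch.randint(0, 5, (8,))

out1, out2 = model(x)
loss1 = nn.functional.mse_loss(out1, y1)        # L1
loss2 = nn.functional.cross_entropy(out2, y2)   # L2, typically on a different scale
loss = loss1 + lam * loss2                      # L1 + lambda * L2
optimizer.zero_grad()
loss.backward()
optimizer.step()
```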


1 Answer


I am currently working on a similar problem, and I think your approach is good. As for setting $\lambda$: since you are using deep neural networks, you can make it a learnable parameter instead of a hyperparameter you set by hand. That way, as the two losses fluctuate over the training iterations/epochs, the model can adjust $\lambda$ accordingly.
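To make this concrete, note that a raw multiplicative weight on a positive loss would simply be pushed towards zero, so in practice learnable loss weights are usually parameterized with a per-task log-variance term, as in uncertainty weighting (Kendall et al., 2018). Below is a rough PyTorch sketch of that idea; the linear model and the two placeholder losses are just illustrative, not your actual setup:

```python
import torch
import torch.nn as nn

class LearnableLossWeights(nn.Module):
    """Weights each task loss by a learned precision exp(-s_i); the extra +s_i
    term penalizes very large variances so the weights cannot collapse to zero."""
    def __init__(self, num_tasks=2):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))  # s_i = log(sigma_i^2)

    def forward(self, losses):
        total = 0.0
        for s, loss in zip(self.log_vars, losses):
            total = total + torch.exp(-s) * loss + s
        return total

# Optimize the loss weights jointly with the network parameters.
weighting = LearnableLossWeights(num_tasks=2)
model = nn.Linear(16, 2)  # placeholder for the real multi-task network
optimizer = torch.optim.Adam(
    list(model.parameters()) + list(weighting.parameters()), lr=1e-3
)

x = torch.randn(8, 16)
out = model(x)
loss1 = out[:, 0].pow(2).mean()            # placeholder task-1 loss
loss2 = (10.0 * out[:, 1]).pow(2).mean()   # placeholder task-2 loss, deliberately larger scale
total_loss = weighting([loss1, loss2])
optimizer.zero_grad()
total_loss.backward()
optimizer.step()
```

With this formulation, the effective weight on each loss is $\exp(-s_i)$, so a task whose loss stays large tends to get down-weighted automatically rather than requiring you to retune $\lambda$ by hand.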
