
Say I'm training a model for multiple tasks by minimizing the sum of losses $L_1 + L_2$ via gradient descent.

If these losses are on different scales, the one with the greater range will dominate the optimization. I'm currently trying to fix this by introducing a hyperparameter $\lambda$ and tuning it to bring the losses onto the same scale, i.e., I minimize $L_1 + \lambda \cdot L_2$ where $\lambda > 0$.

However, I'm not sure if this is a good approach. In short, what are some strategies to deal with losses having different scales when doing multi-task learning? I'm particularly interested in deep learning scenarios.
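For concreteness, here is roughly what I'm doing now (a minimal, self-contained PyTorch sketch; the toy two-head model, data, and loss functions just stand in for my actual setup):

```python
import torch
import torch.nn as nn

# Toy two-head model; my real network is larger, this only illustrates the setup.
class TwoHeadNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(16, 32)
        self.head1 = nn.Linear(32, 1)   # task 1: regression
        self.head2 = nn.Linear(32, 5)   # task 2: classification

    def forward(self, x):
        h = torch.relu(self.backbone(x))
        return self.head1(h), self.head2(h)

model = TwoHeadNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
lam = 0.1  # the hyperparameter I currently tune by hand

# Dummy batch standing in for my real data.
x = torch.randn(8, 16)
y1 = torch.randn(8, 1)
y2 = torch.randint(0, 5, (8,))

out1, out2 = model(x)
loss1 = nn.functional.mse_loss(out1, y1)        # L1
loss2 = nn.functional.cross_entropy(out2, y2)   # L2, typically on a different scale
loss = loss1 + lam * loss2                      # L1 + lambda * L2
optimizer.zero_grad()
loss.backward()
optimizer.step()
```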


1 Answer


I am currently working on a similar problem, and I think your approach is good. As for setting $\lambda$: since you are using deep neural networks, you can make it a learnable parameter instead of a hyperparameter you set by hand. That way, as the two losses fluctuate over the training iterations/epochs, the model can adjust $\lambda$ accordingly.
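To make this concrete, note that a raw multiplicative weight on a positive loss would simply be pushed towards zero, so in practice learnable loss weights are usually parameterized with a per-task log-variance term, as in uncertainty weighting (Kendall et al., 2018). Below is a rough PyTorch sketch of that idea; the linear model and the two placeholder losses are just illustrative, not your actual setup:

```python
import torch
import torch.nn as nn

class LearnableLossWeights(nn.Module):
    """Weights each task loss by a learned precision exp(-s_i); the extra +s_i
    term penalizes very large variances so the weights cannot collapse to zero."""
    def __init__(self, num_tasks=2):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))  # s_i = log(sigma_i^2)

    def forward(self, losses):
        total = 0.0
        for s, loss in zip(self.log_vars, losses):
            total = total + torch.exp(-s) * loss + s
        return total

# Optimize the loss weights jointly with the network parameters.
weighting = LearnableLossWeights(num_tasks=2)
model = nn.Linear(16, 2)  # placeholder for the real multi-task network
optimizer = torch.optim.Adam(
    list(model.parameters()) + list(weighting.parameters()), lr=1e-3
)

x = torch.randn(8, 16)
out = model(x)
loss1 = out[:, 0].pow(2).mean()            # placeholder task-1 loss
loss2 = (10.0 * out[:, 1]).pow(2).mean()   # placeholder task-2 loss, deliberately larger scale
total_loss = weighting([loss1, loss2])
optimizer.zero_grad()
total_loss.backward()
optimizer.step()
```

With this formulation, the effective weight on each loss is $\exp(-s_i)$, so a task whose loss stays large tends to get down-weighted automatically rather than requiring you to retune $\lambda$ by hand.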
