
Let's say I want to teach a neural network to classify images, and, for some reason, I insist on using reinforcement learning rather than supervised learning.

I have a dataset of images and their matching classes. For each image, I could define a reward function that is $1$ for classifying it correctly and $-1$ for classifying it incorrectly (or perhaps even a more complicated reward function where some mistakes are less costly than others). Then, for each image $x^i$, I can loop through each class $c$ and apply a vanilla REINFORCE step: $\theta \leftarrow \theta + \alpha \, r \, \nabla_{\theta} \log \pi_{\theta}(c \mid x^i)$, where $r$ is the reward for predicting class $c$.
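To make the update concrete, here is a minimal sketch of what I have in mind. It assumes a linear softmax policy written in NumPy and the simple $+1$/$-1$ reward; the names `softmax`, `reinforce_pass`, and the weight matrix `W` are hypothetical and only for illustration.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def reinforce_pass(W, x, y, alpha=0.1):
    """Apply one REINFORCE step per class for a single image.

    Assumptions (not from any library): the policy is a linear softmax,
    pi(c | x) = softmax(W @ x)[c], with W of shape (num_classes, num_features),
    and the reward is +1 for the true class y and -1 for any other class.
    """
    W = np.array(W, dtype=float)  # work on a float copy, leave the input untouched
    num_classes = W.shape[0]
    for c in range(num_classes):
        probs = softmax(W @ x)  # recomputed because W changes after every step
        r = 1.0 if c == y else -1.0
        # For a linear softmax, grad_W log pi(c | x) = outer(onehot(c) - probs, x).
        onehot = np.eye(num_classes)[c]
        grad_log_pi = np.outer(onehot - probs, x)
        W += alpha * r * grad_log_pi
    return W


# Usage: one pass over a single toy "image" with 2 features and 3 classes.
W0 = np.zeros((3, 2))
x = np.array([1.0, 2.0])
W1 = reinforce_pass(W0, x, y=1, alpha=0.1)
```

A custom reward would replace the `r = ...` line with a lookup into a cost table indexed by the predicted and true class.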

Would that be different from using standard supervised learning methods (for example, minimizing the cross-entropy loss)? Should I expect different results?
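To state the comparison precisely, the two updates could be written side by side as follows. The notation here is my own assumption: $y^i$ is the true class of $x^i$, and $r(c, y^i)$ is the reward for predicting class $c$.

$$\text{REINFORCE, for each class } c: \quad \theta \leftarrow \theta + \alpha \, r(c, y^i) \, \nabla_{\theta} \log \pi_{\theta}(c \mid x^i)$$

$$\text{Cross-entropy, gradient descent on } -\log \pi_{\theta}(y^i \mid x^i): \quad \theta \leftarrow \theta + \alpha \, \nabla_{\theta} \log \pi_{\theta}(y^i \mid x^i)$$

The cross-entropy step involves only the true class, whereas the REINFORCE loop also contributes a reward-weighted term for every other class.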

This method actually seems better, since I could define a custom reward for each misclassification, but I've never seen anyone use something like that.

nbro
Gilad Deutsch

0 Answers