Can we use reinforcement learning for sequence-to-sequence tasks? If so, and setting aside whether it is a good choice, how could this be done?
1 Answer
Yes. One well-known example is SeqGAN, which treats sequence generation as a reinforcement learning problem. From the paper's abstract:
Modeling the data generator as a stochastic policy in reinforcement learning (RL), SeqGAN bypasses the generator differentiation problem by directly performing gradient policy update. The RL reward signal comes from the GAN discriminator judged on a complete sequence, and is passed back to the intermediate state-action steps using Monte Carlo search. Extensive experiments on synthetic data and real-world tasks demonstrate significant improvements over strong baselines.
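To make the "generator as a stochastic policy" idea concrete, here is a minimal sketch of a policy-gradient (REINFORCE) update for sequence generation. It is not SeqGAN itself: the learned GAN discriminator is replaced by a fixed, hand-written `reward` function, there is no Monte Carlo search (the complete-sequence reward is simply credited to every step), and the policy is a per-step table of logits rather than a recurrent network. `VOCAB`, `LENGTH`, `TARGET`, and all function names are hypothetical, chosen only for illustration.

```python
import math
import random

VOCAB = 3               # hypothetical toy vocabulary size
LENGTH = 4              # hypothetical sequence length
TARGET = [2, 0, 1, 2]   # stand-in for the "real" data the reward prefers


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_sequence(logits, rng):
    """Sample one token per step from the stochastic policy."""
    seq = []
    for step_logits in logits:
        probs = softmax(step_logits)
        seq.append(rng.choices(range(VOCAB), weights=probs)[0])
    return seq


def reward(seq):
    """Assumption: a fixed score on the complete sequence, standing in
    for the discriminator's judgment in SeqGAN."""
    return sum(a == b for a, b in zip(seq, TARGET)) / LENGTH


def train(steps=3000, lr=0.5, seed=0):
    rng = random.Random(seed)
    logits = [[0.0] * VOCAB for _ in range(LENGTH)]
    baseline = 0.0  # running mean of rewards, used to reduce variance
    for _ in range(steps):
        seq = sample_sequence(logits, rng)
        r = reward(seq)                  # reward only for the full sequence
        advantage = r - baseline
        baseline += 0.05 * (r - baseline)
        for t, token in enumerate(seq):
            probs = softmax(logits[t])
            for v in range(VOCAB):
                # d log pi(token) / d logit_v for a softmax policy
                grad = (1.0 if v == token else 0.0) - probs[v]
                logits[t][v] += lr * advantage * grad
    return logits


def greedy(logits):
    """Most likely token at each step under the trained policy."""
    return [max(range(VOCAB), key=lambda v: step[v]) for step in logits]
```

Calling `greedy(train())` gives the sequence the trained policy prefers, and `reward(...)` on it shows how well the policy has adapted to the reward. The key point is that sampling discrete tokens is not differentiable, but the update only needs the log-probability gradient of the sampled tokens weighted by the reward, so no gradient has to flow through the sampling step. In a SeqGAN-style setup, the reward would come from a discriminator that is trained alongside the generator.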
OmG