Can Reinforcement Learning be used to generate sequences?

Question

Can we use reinforcement learning for sequence-to-sequence tasks? If so, how could this be done, whether or not it is a good choice?

score 3 · Answer 1 · answered May 26 '21 at 18:48

One well-known example of this is SeqGAN. From its abstract:

Modeling the data generator as a stochastic policy in reinforcement learning (RL), SeqGAN bypasses the generator differentiation problem by directly performing gradient policy update. The RL reward signal comes from the GAN discriminator judged on a complete sequence, and is passed back to the intermediate state-action steps using Monte Carlo search. Extensive experiments on synthetic data and real-world tasks demonstrate significant improvements over strong baselines.
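To make the mechanism in that description concrete, here is a minimal toy sketch in plain Python. It is not SeqGAN's implementation. The generator is a tabular softmax policy over a 2-token vocabulary rather than a recurrent network. The discriminator is replaced by a fixed, hypothetical scoring function (`toy_discriminator`) instead of a trained model, so the adversarial half of the training loop is omitted. The Monte Carlo rollouts simply reuse the current generator. The running-average `baseline` is a standard variance-reduction addition that the quoted description does not mention. All names here (`TabularGenerator`, `action_values`, `policy_gradient_step`, etc.) are illustrative. The core idea is preserved: sample a full sequence, score complete sequences with the "discriminator", estimate the value of each intermediate token by averaging the scores of rollouts that complete the prefix, and apply a REINFORCE-style policy-gradient update.

```python
import itertools
import math
import random

VOCAB = 2        # toy vocabulary {0, 1} (assumption)
SEQ_LEN = 4      # fixed sequence length T (assumption)
START = VOCAB    # extra "previous token" index used at position 0


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TabularGenerator:
    """Stochastic policy G(y_t | t, y_{t-1}), stored as a table of logits."""

    def __init__(self):
        # logits[t][prev][token]; prev also covers START at position 0
        self.logits = [[[0.0] * VOCAB for _ in range(VOCAB + 1)]
                       for _ in range(SEQ_LEN)]

    def probs(self, t, prev):
        return softmax(self.logits[t][prev])

    def complete(self, prefix, rng):
        """Sample the remaining tokens after `prefix` (one Monte Carlo rollout)."""
        seq = list(prefix)
        while len(seq) < SEQ_LEN:
            prev = seq[-1] if seq else START
            weights = self.probs(len(seq), prev)
            seq.append(rng.choices(range(VOCAB), weights=weights)[0])
        return seq


def toy_discriminator(seq):
    """Hypothetical stand-in for a trained D: 'realness' = fraction of 1s."""
    return sum(seq) / len(seq)


def action_values(gen, seq, disc, n_rollouts, rng):
    """Q for each step: D itself on the full sequence, else mean D over rollouts."""
    q = []
    for t in range(1, SEQ_LEN + 1):
        if t == SEQ_LEN:
            q.append(disc(seq))
        else:
            rollouts = [gen.complete(seq[:t], rng) for _ in range(n_rollouts)]
            q.append(sum(disc(r) for r in rollouts) / n_rollouts)
    return q


def policy_gradient_step(gen, disc, rng, baseline, lr=0.5, n_rollouts=8):
    """One REINFORCE update on a single sampled sequence; returns its reward."""
    seq = gen.complete([], rng)
    q = action_values(gen, seq, disc, n_rollouts, rng)
    for t, token in enumerate(seq):
        prev = seq[t - 1] if t > 0 else START
        p = gen.probs(t, prev)
        advantage = q[t] - baseline
        for a in range(VOCAB):
            # gradient of log softmax(logits)[token] with respect to logits[a]
            grad = (1.0 if a == token else 0.0) - p[a]
            gen.logits[t][prev][a] += lr * advantage * grad
    return disc(seq)


def expected_reward(gen, disc):
    """Exact E[D(Y)] under the generator, by enumerating every sequence."""
    total = 0.0
    for seq in itertools.product(range(VOCAB), repeat=SEQ_LEN):
        prob, prev = 1.0, START
        for t, token in enumerate(seq):
            prob *= gen.probs(t, prev)[token]
            prev = token
        total += prob * disc(seq)
    return total


def train(steps=500, seed=0):
    rng = random.Random(seed)
    gen = TabularGenerator()
    baseline = 0.0  # running mean of rewards (variance reduction, an addition)
    for _ in range(steps):
        reward = policy_gradient_step(gen, toy_discriminator, rng, baseline)
        baseline = 0.9 * baseline + 0.1 * reward
    return gen


if __name__ == "__main__":
    print(expected_reward(TabularGenerator(), toy_discriminator))  # prints 0.5
    print(expected_reward(train(), toy_discriminator))  # higher after training
```

The key point is that the discriminator only ever scores complete sequences. Each intermediate token still receives its own learning signal through the rollout averages in `action_values`, and no gradient ever has to flow through the discrete sampling step.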
