Which RL algorithms can be used in an environment where actions have to be performed only in specific situations?

Question

I am wondering which RL algorithms can be used in an environment where actions have to be performed only in specific situations. For example, consider a conveyor belt on which a box that fulfills certain conditions must be sorted out. A signal must then be sent at the right time to sort it out correctly. If the signal comes too early or too late, the box is not sorted correctly.

Is it only the reward function that enables the agent to learn this, or do I need to consider special neural networks such as an LSTM, or particular RL algorithms?

Does somebody have some experience in this kind of scenario and can give suggestions?

foreverska · Answer 1 · 2024-01-07T20:33:56.027

The act of pushing something off a conveyor is probably best handled by a real-time controller (microcontroller, PLC, etc.). When a box passes a gate and the pusher is armed, the controller fires the piston that pushes the box. We already understand how to fire a piston, so no ML is needed for that part. The logic for when to arm the pusher could be learned by an RL algorithm.
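To make that split concrete, here is a minimal sketch of the two paths. The `Pusher` class, its method names, and the choice to disarm automatically after each shot are all my own assumptions for illustration, not a real PLC API.

```python
from dataclasses import dataclass


@dataclass
class Pusher:
    """Hypothetical stand-in for the real-time controller (PLC or microcontroller)."""

    armed: bool = False
    fired_count: int = 0

    def set_armed(self, armed: bool) -> None:
        # Slow path: written by the learned policy before the box reaches the gate.
        self.armed = armed

    def on_gate_triggered(self) -> bool:
        # Fast path: runs when the gate detects a box. No ML involved here.
        if not self.armed:
            return False
        self.fired_count += 1
        # Assumption: disarm after firing so every box needs a fresh decision.
        self.armed = False
        return True
```

The RL agent would only ever call `set_armed`, so its timing requirements are relaxed to "decide before the box reaches the gate", while the exact firing moment stays with the deterministic controller.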

The complexity of the model will primarily be driven by how Markovian the problem is. A problem is said to be Markovian if the decision can be made by looking only at the information in the present situation (no history needed). For Markovian problems a simple feed-forward network may be enough. The output would be a signal to either arm or disarm the pusher.
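As a rough sketch of the Markovian case, the snippet below learns an arm/disarm decision from the current box alone. To keep it dependency-free it uses a linear Q-function as a stand-in for the small feed-forward network. The single "weight" feature, the `> 0.5` sorting rule, and the +1/-1 rewards are made-up assumptions.

```python
import random


def features(box_weight):
    # Bias term plus a normalised box weight in [0, 1] (hypothetical observation).
    return (1.0, box_weight)


def q_value(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train(steps=5000, lr=0.05, epsilon=0.3, seed=0):
    rng = random.Random(seed)
    # One weight vector per action: index 0 = disarm, index 1 = arm.
    weights = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(steps):
        box_weight = rng.random()
        x = features(box_weight)
        # Epsilon-greedy action selection.
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = max((0, 1), key=lambda a: q_value(weights[a], x))
        should_sort = box_weight > 0.5  # assumed sorting rule
        reward = 1.0 if (action == 1) == should_sort else -1.0
        # Each box is a one-step episode, so the target is the immediate reward.
        error = reward - q_value(weights[action], x)
        weights[action] = [wi + lr * error * xi
                           for wi, xi in zip(weights[action], x)]
    return weights


def policy(weights, box_weight):
    x = features(box_weight)
    return max((0, 1), key=lambda a: q_value(weights[a], x))
```

After `weights = train()`, `policy(weights, 0.9)` should choose to arm and `policy(weights, 0.1)` should choose to disarm. Note that only the reward told the agent which boxes to sort out; nothing about the rule is hard-coded in the policy.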

If the decision depends on history, for example if whether to push the present box depends on the boxes that passed before it, then one may consider an LSTM so that the network can efficiently integrate the information needed to solve n-step Markov or non-Markovian problems.
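For the n-step case specifically, a simpler alternative to an LSTM is to feed the last n observations to a feed-forward network. The sketch below shows that fixed-window approach, not an LSTM; `HistoryWindow` and its zero-padding are my own illustrative choices.

```python
from collections import deque


class HistoryWindow:
    """Keeps the last n observations and exposes them as one flat state."""

    def __init__(self, n, obs_size):
        self.n = n
        self.obs_size = obs_size
        self.buffer = deque(maxlen=n)
        self.reset()

    def reset(self):
        # Pad with zeros so the state has a fixed length from the first step.
        self.buffer.clear()
        for _ in range(self.n):
            self.buffer.append((0.0,) * self.obs_size)

    def push(self, obs):
        # Appending to a full deque drops the oldest observation.
        self.buffer.append(tuple(obs))
        return self.state()

    def state(self):
        # Flattened, oldest observation first and newest last.
        return tuple(v for obs in self.buffer for v in obs)
```

If the decision depends on at most the previous n boxes, the flattened window is Markovian again and the feed-forward setup applies. An LSTM becomes more attractive when the relevant history has no fixed, small length.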

On the off chance one REALLY wanted to do the whole process by ML, construct it like a video game. Pushing a correct package off gives a positive reward and pushing an incorrect one gives a negative reward. A DQN-family algorithm would likely get the gist after several million frames. Predicting a future position from a single frame is non-Markovian, because one frame carries no velocity information; the original DQN paper handled this by stacking the last four frames as input to a convolutional network. An LSTM would also work there.
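A toy version of such a "video game" might look like the sketch below. Everything about it is assumed for illustration: the 1-D belt, the `(position, is_target)` observation, the reward values, and ending the episode on a push. Because this toy observation already contains the position, it is Markovian; the frame-stacking issue only appears when the agent sees raw images.

```python
import random


class ConveyorEnv:
    """Toy 1-D conveyor: one box moves one cell per step toward the pusher."""

    WAIT, PUSH = 0, 1

    def __init__(self, length=5, seed=0):
        self.length = length  # the pusher sits at cell length - 1
        self.rng = random.Random(seed)

    def reset(self):
        self.position = 0
        # Whether this box fulfils the (assumed) sorting condition.
        self.is_target = self.rng.random() < 0.5
        return self._obs()

    def _obs(self):
        return (self.position, int(self.is_target))

    def step(self, action):
        """Returns (observation, reward, done)."""
        at_pusher = self.position == self.length - 1
        if action == self.PUSH:
            # Too early, or a box that should stay on the belt: penalty.
            reward = 1.0 if (at_pusher and self.is_target) else -1.0
            return self._obs(), reward, True
        if at_pusher:
            # The box leaves the belt; letting a target box pass is a miss.
            reward = -1.0 if self.is_target else 0.0
            return self._obs(), reward, True
        self.position += 1
        return self._obs(), 0.0, False


def run_episode(env, policy):
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total
```

A scripted policy such as `lambda obs: ConveyorEnv.PUSH if obs == (4, 1) else ConveyorEnv.WAIT` never collects a penalty on the default belt, which gives a reference point for what a trained agent should reach.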
