I am confused about the difference between a policy and a plan in reinforcement learning. As I understand it, when we calculate the value of a state using the Bellman equation in a deterministic environment:
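For reference, this is the form I have in mind. The notation is my assumption of the standard convention: $R(s, a)$ is the immediate reward, $\gamma$ is the discount factor, and $s'$ is the single state reached from $s$ by taking action $a$.

$$
V(s) = \max_{a} \bigl( R(s, a) + \gamma \, V(s') \bigr)
$$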
The plan in this case is a strict set of state-action pairs, obtained by choosing the action with the maximum value in each state, and it looks something like the image below, using a maze game as an example:

However, in a stochastic environment the Bellman equation will be:
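For reference, this is the form I have in mind. The notation is my assumption of the standard convention: $R(s, a)$ is the immediate reward, $\gamma$ is the discount factor, and $P(s' \mid s, a)$ is the probability of ending up in state $s'$ after taking action $a$ in state $s$.

$$
V(s) = \max_{a} \Bigl( R(s, a) + \gamma \sum_{s'} P(s' \mid s, a) \, V(s') \Bigr)
$$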
And then we will have something like this:
In this case, to develop the policy we need to know the state-action pair for every state, as in the image above, plus the probability distribution over all actions in every state. We also need to keep in mind that the actions in the image above will not always happen, due to the stochastic nature of the environment.
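To make the two cases concrete, here is a minimal sketch of how I picture them. It is not taken from any library: the 5-cell corridor, the `slip` probability, and every function name are hypothetical stand-ins for the maze. The same value iteration runs in both settings and only the transition probabilities change. With `slip=0` the greedy state-to-action mapping can be unrolled from a start state into a fixed action sequence; with `slip>0` the mapping still covers every state, but the outcome of each action is uncertain.

```python
# Hypothetical 5-cell corridor standing in for the maze: pit at 0, goal at 4.
GAMMA = 0.9
STATES = range(5)
TERMINAL_REWARD = {0: -1.0, 4: 1.0}  # reward received on entering a terminal cell
ACTIONS = {"left": -1, "right": 1}


def transitions(s, a, slip):
    """Return {next_state: probability}; with probability `slip` the move is reversed."""
    probs = {s + ACTIONS[a]: 1.0 - slip}
    if slip > 0:
        probs[s - ACTIONS[a]] = slip
    return probs


def q_value(s, a, V, slip):
    """Expected immediate reward plus discounted value of the successor states."""
    return sum(
        p * (TERMINAL_REWARD.get(s2, 0.0) + GAMMA * V[s2])
        for s2, p in transitions(s, a, slip).items()
    )


def value_iteration(slip, tol=1e-9):
    """Sweep the Bellman optimality update until the values stop changing."""
    V = {s: 0.0 for s in STATES}  # terminal cells keep value 0
    while True:
        delta = 0.0
        for s in STATES:
            if s in TERMINAL_REWARD:
                continue
            best = max(q_value(s, a, V, slip) for a in ACTIONS)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V


def greedy_policy(V, slip):
    """Map every non-terminal state to the action with the highest expected value."""
    return {
        s: max(ACTIONS, key=lambda a: q_value(s, a, V, slip))
        for s in STATES
        if s not in TERMINAL_REWARD
    }


def unroll_plan(policy, start):
    """Deterministic case only: follow the mapping from `start` to get a fixed action list."""
    s, plan = start, []
    while s not in TERMINAL_REWARD:
        plan.append(policy[s])
        s += ACTIONS[policy[s]]
    return plan


# Deterministic environment: every action has exactly one outcome.
V_det = value_iteration(slip=0.0)
policy_det = greedy_policy(V_det, slip=0.0)
plan = unroll_plan(policy_det, start=1)

# Stochastic environment: 20% of the time the agent moves the opposite way.
V_sto = value_iteration(slip=0.2)
policy_sto = greedy_policy(V_sto, slip=0.2)
```

In this toy setup the greedy action happens to be the same in both settings, but the state values are lower in the stochastic one, because an action chosen in a state no longer guarantees the next state.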
Is my understanding of the difference between a policy and a plan correct?
