For questions about the optimality of AI algorithms and solutions.
Questions tagged [optimality]
15 questions
                    
                    5
                    
            votes
                
                2 answers
            
        Given two optimal policies, is an affine combination of them also optimal?
If there are two different optimal policies $\pi_1, \pi_2$ in a reinforcement learning task, will the linear combination (or affine combination) of the two policies $\alpha \pi_1 + \beta \pi_2, \alpha + \beta = 1$ also be an optimal policy?
Here I…
         
    
    
        yang liu
        
- 53
- 4
                    3
                    
            votes
                
                2 answers
            
        How to make minimax optimal?
By optimal I mean that:
If max has a winning strategy then minimax will return the strategy for max with the fewest number of moves to win.
If min has a winning strategy then minimax will return the strategy for max with the most number of moves to…
         
    
    
        Aadit M Shah
        
- 64
- 1
- 10
                    3
                    
            votes
                
                1 answer
            
        Can we achieve optimality with minimax using an evaluation function?
The following quote (from AIMA) refers to the situation in which the minimax algorithm computes its values directly from the terminal states.
(The) definition of optimal play for MAX assumes that MIN also plays optimally—it maximizes the worst-case…
         
    
    
        Kestina
        
- 31
- 1
                    3
                    
            votes
                
                1 answer
            
        Is there an error in A* optimality proof Russel-Norvig 4th edition?
In "AI: A Modern Approach", 4th edition, by Russell and Norvig, they give a purported proof that A* is cost-optimal for any admissible heuristic. The given proof seems most certainly wrong. They want to show that all nodes on the optimal path are…
         
    
    
        vdbuss
        
- 81
- 3
                    3
                    
            votes
                
                1 answer
            
        How is $v_*(s) = \max_{\pi} v_\pi(s)$ also applicable in the case of stochastic policies?
I am reading Sutton & Bartos's Book "Introduction to reinforcement learning". In this book, the defined the optimal value function as:
$$v_*(s) = \max_{\pi} v_\pi(s),$$ for all $s \in \mathcal{S}$.
Do we take the max over all deterministic policies,…
         
    
    
        Tamar
        
- 33
- 3
                    3
                    
            votes
                
                2 answers
            
        If uniform cost search is used for bidirectional search, is it guaranteed the solution is optimal?
If uniform cost search is used for both the forward and backward search in bidirectional search, is it guaranteed the solution is optimal?
        user42125
                    2
                    
            votes
                
                2 answers
            
        Why is the optimal policy for an infinite horizon MDP deterministic?
Could someone please help me gain some intuition as to why the optimal policy for a Markov Decision Process in the infinite horizon case (agent acts forever) is deterministic?
         
    
    
        stoic-santiago
        
- 1,201
- 9
- 22
                    1
                    
            vote
                
                2 answers
            
        Why is the better policy defined with respect to all the states values being greater?
In Sutton & Barto (Section 3.6 - Optimal Policies and Optimal Value Functions), they say that :
Value functions define a partial ordering over policies. A policy $\pi$ is defined to be better than or equal to a policy $\pi^{'}$ if its expected…
         
    
    
        pew31
        
- 13
- 3
                    1
                    
            vote
                
                1 answer
            
        How are these two equations for the optimal state-value function equivalent?
By substituting the optimal policy $\pi_{\star}$ into the Bellman equation, we get the Bellman equation for $v_{\pi_{\star}}(s)=v_{\star}(s)$:
$$ v_{\star}(s) = \sum\limits_a \pi_{\star}(a|s) \sum\limits_{s'} \sum_r p(s', r | s, a)[r + \gamma…
         
    
    
        DSPinfinity
        
- 1,223
- 4
- 10
                    1
                    
            vote
                
                1 answer
            
        Are optimal policies always deterministic, or can there also be optimal policies that are stochastic?
Let $M$ be an MDP with two states, $A$ and $B$, where $A$ is the starting state, and you always transit to the final state $B$ using two possible actions. $A_1$ gives you rewards that are normally distributed $\mathcal{N}(0, 1)$ and $A_2$ gives you…
         
    
    
        Abc1729
        
- 45
- 4
                    1
                    
            vote
                
                0 answers
            
        Minimax algorithm with only partial visibility
I'm trying to implement the minimax algorithm with alpha beta pruning on a game that works like this:
Player 1 plays (x1, y1).
Player 2 can only see the x-value (x1) that Player 1 played (and not the y-value y1). Player 2 plays (x2, y2).
An…
         
    
    
        SpiderManCity
        
- 11
- 1
                    0
                    
            votes
                
                1 answer
            
        Why is R(s) more restrictive than R(s, a) in an MDP?
I am quite new to RL. I would like to know why a state-dependent reward function R(s) is more restrictive than a state-action-dependent reward function R(s, a)? And why is it that a policy can be optimal for the latter R(s, a) but not for the…
         
    
    
        TicTacToemat
        
- 11
                    0
                    
            votes
                
                1 answer
            
        Can A* be non-optimal if it uses an admissible but inconsistent heuristic with graph search?
The book "Artificial Intelligence: A Modern Approach" (4th edition, global version) says
"With an admissible heuristic, A* is cost-optimal...".
An admissible heuristic is one that never overestimates the distance to the goal, while a consistent…
         
    
    
        numq
        
- 1
- 1
                    0
                    
            votes
                
                1 answer
            
        What is the equation for $\pi_*$ in terms of $q_*(s,a)$?
I am trying to solve the following exercise from Sutton and Barto:
Sutton and Barto Exercise 3.27 Give an equation for $\pi_*$ in terms of $q_*(s,a)$
However, I am struggling to do so. I know that $\pi_*$ is the policy which will pick the action…
         
    
    
        user
        
- 195
- 9
                    0
                    
            votes
                
                1 answer
            
        Are hill climbing variations always optimal and complete?
Are hill climbing variations (like steepest ascent hill climbing, stochastic hill climbing, random restart hill climbing, local beam search) always optimal and complete?
        user45792