For questions related to notation (in general).
Questions tagged [notation]
81 questions
11
votes
1 answer
What is the meaning of $V(D,G)$ in the GAN objective function?
Here is the GAN objective function.
$$\min _{G} \max _{D} V(D, G)=\mathbb{E}_{\boldsymbol{x} \sim p_{\text {data }}(\boldsymbol{x})}[\log D(\boldsymbol{x})]+\mathbb{E}_{\boldsymbol{z} \sim p_{\boldsymbol{z}}(\boldsymbol{z})}[\log…
i_rezic
8
votes
1 answer
How is the policy gradient calculated in REINFORCE?
Reading Sutton and Barto, I see the following in describing policy gradients:
How is the gradient calculated with respect to an action (taken at time t)? I've read implementations of the algorithm, but conceptually I'm not sure I understand how the…
Hanzy
7
votes
3 answers
How are the reward functions $R(s)$, $R(s, a)$ and $R(s, a, s')$ equivalent?
In this video, the lecturer states that $R(s)$, $R(s, a)$ and $R(s, a, s')$ are equivalent representations of the reward function. Intuitively, this is the case, according to the same lecturer, because $s$ can be made to represent the state and the…
nbro
5
votes
1 answer
What does the argmax of the expectation of the log likelihood mean?
What does the following equation mean? What does each part of the formula represent or mean?
$$\theta^* = \underset {\theta}{\arg \max} \Bbb E_{x \sim p_{data}} \log {p_{model}(x|\theta) }$$
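One way to read the formula above: the expectation over $x \sim p_{data}$ is approximated by an average over observed samples, and $\arg\max$ picks the $\theta$ that makes that average log-likelihood largest. A minimal sketch, assuming (purely for illustration) that $p_{model}$ is a unit-variance Gaussian whose mean is $\theta$ and that the argmax is found by a coarse grid search; the sample values and the helper name `avg_log_likelihood` are hypothetical:

```python
import math

def avg_log_likelihood(theta, samples):
    """Sample average of log p_model(x | theta), standing in for the expectation.

    Assumption: p_model is a Gaussian with mean theta and variance 1.
    """
    log_norm = -0.5 * math.log(2 * math.pi)
    return sum(log_norm - 0.5 * (x - theta) ** 2 for x in samples) / len(samples)

# Hypothetical draws from p_data.
samples = [1.8, 2.1, 2.4, 1.9, 2.3]

# Candidate values of theta on a grid from 0.00 to 4.00.
candidates = [i / 100 for i in range(0, 401)]

# arg max over theta of the averaged log-likelihood.
theta_star = max(candidates, key=lambda t: avg_log_likelihood(t, samples))

sample_mean = sum(samples) / len(samples)
print(theta_star, sample_mean)
```

Under this Gaussian assumption the maximizer coincides with the sample mean, which is why the two printed values agree up to the grid resolution.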
arash moradi
5
votes
1 answer
What does the notation $\mathcal{N}(z; \mu, \sigma)$ stand for in statistics?
I know that the notation $\mathcal{N}(\mu, \sigma)$ stands for a normal distribution.
But I'm reading the book "An Introduction to Variational Autoencoders" and in it, there is this notation:
$$\mathcal{N}(z; 0, I)$$
What does it mean?
Peyman
5
votes
2 answers
Why are the value functions sometimes written with capital letters and other times with lower-case letters?
Why are the state-value and action-value functions sometimes written in lower-case letters and other times in capitals? For instance, why, in the Q-learning algorithm (page 131 of Barto and Sutton's book, and elsewhere), are capitals used, as in $Q(S,…
d56
5
votes
2 answers
What is a probability distribution in machine learning?
When learning or working in the machine learning field, we frequently come across the term "probability distribution". I know what probability, conditional probability, and probability distribution/density mean in math, but what is its…
Eka
5
votes
1 answer
What is the meaning of the square brackets in ant colony optimization?
I'm studying the paper "Minimizing Total Tardiness on a Single Machine Using Ant Colony Optimization", which proposes applying ant colony optimization to the SMTWTP.
According to this paper:
Each artificial ant iteratively and independently decides…
Pablo
5
votes
1 answer
Understanding the equation of TD(0) in the paper "Learning to predict by the methods of temporal differences"
In the paper "Learning to predict by the methods of temporal differences" (p. 15), the weights in temporal-difference learning are updated according to the equation
$$
\Delta w_t
= \alpha \left(P_{t+1} - P_t\right) \sum_{k=1}^{t}{\lambda^{t-k}…
Amanda
4
votes
1 answer
Notation used in paper on Continuous Time Reinforcement Learning
I am working on implementing the learning shown in this paper:
https://homes.cs.washington.edu/~todorov/courses/amath579/reading/Continuous.pdf
In the paper, the authors devise a continuous time learning extension of the actor critic method in…
Derick Diana
4
votes
1 answer
Is the Bandit Problem an MDP?
I've read Sutton and Barto's introductory RL book. They define a policy as a mapping from states to probabilities of selecting each possible action. If the agent is following policy $\pi$ at time $t$, then $\pi(a|s)$ is the probability of taking…
Snowball
4
votes
2 answers
Why do we use $X_{I_t,t}$ and $v_{I_t}$ to denote the reward received at time step $t$ and the distribution of the chosen arm $I_t$?
I'm doing some introductory research on classical (stochastic) MABs. However, I'm a little confused about the common notation (e.g. in the popular paper of Auer (2002) or Bubeck and Cesa-Bianchi (2012)).
As in the latter study, let us consider an…
MAB_N00B
4
votes
1 answer
What does the term $|\mathcal{A}(s)|$ mean in the $\epsilon$-greedy policy?
I've been looking online for a while for a source that explains these computations, but I can't find anywhere what $|A(s)|$ means. I guess $A$ is the action set, but I'm not sure about that notation:
$$\frac{\varepsilon}{|\mathcal{A}(s)|}…
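For context, in the usual textbook form of the $\epsilon$-greedy policy (assumed here, since the equation above is cut off), $|\mathcal{A}(s)|$ is the number of actions available in state $s$: every action gets probability $\varepsilon / |\mathcal{A}(s)|$, and the greedy action gets the remaining $1 - \varepsilon$ on top of that. A minimal sketch, with hypothetical action names and value estimates:

```python
def epsilon_greedy_probs(q_values, epsilon):
    """Return action probabilities under the standard epsilon-greedy policy.

    q_values: dict mapping each action available in state s to its
              estimated value (hypothetical numbers in the example below).
    """
    n_actions = len(q_values)                 # this is |A(s)|
    greedy = max(q_values, key=q_values.get)  # action with the highest estimate
    # Every action receives the exploration share epsilon / |A(s)|.
    probs = {action: epsilon / n_actions for action in q_values}
    # The greedy action additionally receives the remaining 1 - epsilon.
    probs[greedy] += 1.0 - epsilon
    return probs

probs = epsilon_greedy_probs({"left": 0.2, "right": 1.0, "stay": 0.5}, epsilon=0.3)
print(probs)
```

With three actions and $\varepsilon = 0.3$, each action gets an exploration share of $0.1$, and the greedy action `"right"` ends up with $0.1 + 0.7$.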
Metrician
4
votes
1 answer
What do the subscripts mean in $N_{t,n,\sigma,L}$?
A neural network can apparently be denoted as $N_{t,n,\sigma,L}$. What do these subscripts $t, n, \sigma$ and $L$ mean? Could you link me to a paper, article or webpage with an explanation for this?
J. Doe
4
votes
1 answer
What does $x,y \sim \hat{p}_{data}$ mean in the Deep Learning book by Goodfellow?
In chapter 5 of the Deep Learning book by Ian Goodfellow, some of the notation in the loss function below really confuses me.
I understand $x,y \sim p_{data}$ to mean a sample $(x, y)$ drawn from the original dataset distribution (or $y$ is the…
David Ng