Questions tagged [probability]

For questions involving probability as it relates to AI methods. (This tag is for general usage. Feel free to use it in conjunction with the "math" tag and more specific probability tags.)

https://en.wikipedia.org/wiki/Probability

58 questions
22 votes, 3 answers

Are softmax outputs of classifiers true probabilities?

BACKGROUND: The softmax function is the most common choice for an activation function for the last dense layer of a multiclass neural network classifier. The outputs of the softmax function have mathematical properties of probabilities and are--in…
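As a quick illustration of the property the question turns on, here is a minimal sketch (plain Python, numbers made up): softmax outputs are non-negative and sum to 1, so they have the form of a probability distribution, though nothing in the computation forces them to be calibrated to true class frequencies.

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical last-layer logits for a 3-class classifier.
probs = softmax([2.0, 1.0, 0.1])
# Each output is in (0, 1) and they sum to 1 -- probability-shaped,
# but whether they are *true* probabilities is exactly the question.
```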
10 votes, 2 answers

Is Nassim Taleb right about AI not being able to accurately predict certain types of distributions?

So Taleb has two heuristics to generally describe data distributions. One is Mediocristan, which basically means things that are on a Gaussian distribution such as height and/or weight of people. The other is called Extremistan, which describes a…
7 votes, 2 answers

What is a Markov chain and how can it be used in creating artificial intelligence?

I believe a Markov chain is a sequence of events where each subsequent event depends probabilistically on the current event. What are examples of the application of a Markov chain and can it be used to create artificial intelligence? Would a…
WilliamKF • 2,533 • 1 • 26 • 31
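A minimal sketch of the word-level Markov chain the question describes (toy corpus, my own illustration): each word maps to the words observed to follow it, and generation repeatedly samples a successor, so each step depends only on the current word.

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words observed to follow it."""
    words = text.split()
    chain = defaultdict(list)
    for cur, nxt in zip(words, words[1:]):
        chain[cur].append(nxt)
    return chain

def generate(chain, start, length, rng):
    """Walk the chain: the next word depends only on the current word."""
    out = [start]
    while len(out) < length:
        followers = chain.get(out[-1])
        if not followers:
            break  # dead end: this word was never followed by anything
        out.append(rng.choice(followers))
    return " ".join(out)

chain = build_chain("the cat sat on the mat and the cat ran")
sentence = generate(chain, "the", 6, random.Random(0))
```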
6 votes, 2 answers

Are probabilistic models dead ends in AI?

I am a strong believer in Marvin Minsky's ideas about Artificial General Intelligence (AGI), and one of his thoughts was that probabilistic models are dead ends in the field of AGI. I would really like to know the thoughts and ideas of people who…
Parth Raghav • 345 • 1 • 8
5 votes, 1 answer

What does the argmax of the expectation of the log likelihood mean?

What does the following equation mean? What does each part of the formula represent or mean? $$\theta^* = \underset {\theta}{\arg \max} \Bbb E_{x \sim p_{data}} \log {p_{model}(x|\theta) }$$
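One concrete way to read the formula: $\theta^*$ is the parameter that maximizes the expected log-density of data drawn from $p_{data}$, which in practice is estimated by an average over samples. A toy Monte Carlo sketch (Gaussian model with known unit variance, grid search over the mean; all numbers illustrative):

```python
import math
import random

rng = random.Random(0)
# Samples from the (here known) data distribution p_data = N(3, 1).
data = [rng.gauss(3.0, 1.0) for _ in range(2000)]

def avg_log_likelihood(theta, xs):
    # Monte Carlo estimate of E_{x ~ p_data}[log p_model(x | theta)]
    # with p_model = N(theta, 1).
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (x - theta) ** 2
               for x in xs) / len(xs)

# Grid-search argmax over theta; it should land near the true mean 3.0,
# since the expected log-likelihood of N(theta, 1) peaks at the data mean.
grid = [i / 10 for i in range(0, 61)]
best = max(grid, key=lambda t: avg_log_likelihood(t, data))
```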
3 votes, 1 answer

Why do I get small probabilities when implementing a multinomial naive Bayes text classification model?

When applying multinomial Naive Bayes text classification, I get very small probabilities (around $10^{-48}$), so there's no way for me to know which classes are valid predictions and which ones are not. I'd like the probabilities to be in the interval…
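The usual remedy for such tiny products is to work in log space and normalize with the log-sum-exp trick, which yields posteriors in $[0, 1]$ without ever forming the underflowing raw probabilities. A sketch with made-up per-class log scores:

```python
import math

def normalize_log_probs(log_scores):
    """Turn unnormalized per-class log scores into probabilities (log-sum-exp)."""
    m = max(log_scores)
    # Subtracting the max keeps exp() from underflowing to 0.
    exps = [math.exp(s - m) for s in log_scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical log P(d|c) + log P(c) for three classes; the raw
# probabilities exp(score) would all underflow toward 0.
log_scores = [-110.2, -108.9, -115.4]
posteriors = normalize_log_probs(log_scores)
# The posteriors now sum to 1, so the classes are directly comparable.
```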
3 votes, 1 answer

How can I improve this word-prediction AI?

I'm relatively new to AI, and I've tried to create one that "speaks". Here's how it works: 1. Get training data, e.g. 'Jim ran to the shop to buy candy' 2. The data gets split into overlapping 'chains' of three, e.g. ['Jim ran to', 'ran to the', 'to the…
user117279 • 55 • 2
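The overlapping "chains" of three that the question describes might be sketched like this (illustrative only):

```python
def overlapping_chains(sentence, n=3):
    """Split a sentence into overlapping n-word chains."""
    words = sentence.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

chains = overlapping_chains("Jim ran to the shop to buy candy")
# Produces 'Jim ran to', 'ran to the', 'to the shop', ... --
# each chain shares two words with the next, matching step 2.
```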
3 votes, 1 answer

How does $\mathbb{E}$ suddenly change to $\mathbb{E}_{\pi'}$ in this equation?

In Sutton-Barto's book on page 63 (81 of the pdf): $$\mathbb{E}[R_{t+1} + \gamma v_\pi(S_{t+1}) \mid S_t=s,A_t=\pi'(s)] = \mathbb{E}_{\pi'}[R_{t+1} + \gamma v_\pi(S_{t+1}) \mid S_{t} = s]$$ How does $\mathbb{E}$ suddenly change to…
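One common reading, assuming $\pi'$ is deterministic: conditioning on $A_t = \pi'(s)$ is exactly what "acting according to $\pi'$ for one step from $s$" means, so the plain expectation over the environment's dynamics can be relabeled as an expectation over trajectories generated by $\pi'$; the subscript simply records which policy supplies the action:

$$\mathbb{E}[R_{t+1} + \gamma v_\pi(S_{t+1}) \mid S_t=s, A_t=\pi'(s)] = \mathbb{E}_{\pi'}[R_{t+1} + \gamma v_\pi(S_{t+1}) \mid S_t=s]$$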
3 votes, 2 answers

How can supervised learning be viewed as a conditional probability of the labels given the inputs?

In the literature and textbooks, one often sees supervised learning expressed as a conditional probability, e.g., $$\rho(\vec{y}|\vec{x},\vec{\theta})$$ where $\vec{\theta}$ denotes a learned set of network parameters, $\vec{x}$ is an arbitrary…
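One way the connection is usually drawn (stated with hedging, since the question's notation is only partially shown): training by maximum likelihood picks

$$\hat{\vec{\theta}} = \underset{\vec{\theta}}{\arg \max} \sum_{i} \log \rho(\vec{y}_i \mid \vec{x}_i, \vec{\theta}),$$

and for one-hot labels with a softmax output this objective coincides with minimizing the familiar cross-entropy loss, which is why supervised training is often described as fitting a conditional probability model.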
3 votes, 1 answer

Viterbi versus filtering

In Chapter 15 of Russell and Norvig's Artificial Intelligence -- A Modern Approach (Third Edition), they describe three basic tasks in temporal inference: Filtering, Likelihood, and Finding the Most Likely Sequence. My question is on the…
vdbuss • 81 • 3
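To make the contrast concrete, here is a toy two-state HMM sketch (all probabilities hypothetical): filtering returns a belief over the *current* state given the evidence so far, while Viterbi returns the single most likely *sequence* of states, and the two answers need not agree.

```python
prior = [0.5, 0.5]                    # P(X_1)
trans = [[0.7, 0.3], [0.3, 0.7]]      # trans[i][j] = P(X_{t+1}=j | X_t=i)
emit = [[0.9, 0.1], [0.2, 0.8]]       # emit[i][e] = P(E_t=e | X_t=i)

def filtering(evidence):
    # Forward algorithm: incorporate the first observation, then
    # alternate prediction and Bayesian update, renormalizing each step.
    belief = [prior[j] * emit[j][evidence[0]] for j in range(2)]
    belief = [b / sum(belief) for b in belief]
    for e in evidence[1:]:
        predicted = [sum(belief[i] * trans[i][j] for i in range(2))
                     for j in range(2)]
        updated = [predicted[j] * emit[j][e] for j in range(2)]
        z = sum(updated)
        belief = [u / z for u in updated]
    return belief

def viterbi(evidence):
    # m[j]: probability of the best state path ending in state j.
    m = [prior[j] * emit[j][evidence[0]] for j in range(2)]
    backpointers = []
    for e in evidence[1:]:
        prev, m, step = m, [], []
        for j in range(2):
            best_i = max(range(2), key=lambda i: prev[i] * trans[i][j])
            step.append(best_i)
            m.append(prev[best_i] * trans[best_i][j] * emit[j][e])
        backpointers.append(step)
    # Follow back-pointers from the best final state.
    path = [max(range(2), key=lambda j: m[j])]
    for step in reversed(backpointers):
        path.append(step[path[-1]])
    path.reverse()
    return path

evidence = [0, 0, 1]
```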
2 votes, 1 answer

SEIF motion update algorithm doubt

I want to implement Sparse Extended Information Filter (SEIF) SLAM. There are four steps to implement it. The algorithm is available in the Probabilistic Robotics book at page 310, Table 12.3. In this algorithm, line 13 is not very clear to me. I have 15 landmarks.…
2 votes, 1 answer

What exactly is a Parzen?

I came across the term "Parzen" while reading the research paper titled Generative Adversarial Nets. It has been used in the research paper in two contexts. #1: In phrase "Parzen window" We estimate probability of the test set data under $p_g$ by…
hanugm • 4,102 • 3 • 29 • 63
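For context, a Parzen window is another name for a kernel density estimate: the density at a point is the average of kernel bumps centered on the samples. A minimal 1-D sketch with a Gaussian kernel (samples and bandwidth made up):

```python
import math

def parzen_density(x, samples, h):
    """Parzen-window (kernel density) estimate with a Gaussian kernel of bandwidth h."""
    norm = 1.0 / (h * math.sqrt(2 * math.pi))
    return sum(norm * math.exp(-0.5 * ((x - s) / h) ** 2)
               for s in samples) / len(samples)

# Hypothetical 1-D "generated samples"; the estimated density is high
# near the samples and falls off far away from them.
samples = [0.0, 0.1, -0.2, 0.05]
near = parzen_density(0.0, samples, h=0.3)
far = parzen_density(5.0, samples, h=0.3)
```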
2 votes, 1 answer

How would the probability of a document $P(d)$ be computed in the Naive Bayes classifier?

In naive Bayes classification, we estimate the class of a document as follows $$\hat{c} = \arg \max_{c \in C} P(c \mid d) = \arg \max_{c \in C} \dfrac{ P(d \mid c)P(c) }{P(d)} $$ It has been said in page 4 of this textbook that we can ignore the…
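If one did want $P(d)$, it expands by the law of total probability, $P(d) = \sum_{c \in C} P(d \mid c) P(c)$; but since it is the same for every class it cannot change the argmax, which is why it is dropped. A sketch with hypothetical numbers:

```python
# Hypothetical per-class priors P(c) and likelihoods P(d|c) for one document.
priors = {"sports": 0.5, "politics": 0.3, "tech": 0.2}
likelihoods = {"sports": 0.002, "politics": 0.010, "tech": 0.001}

# P(d) by the law of total probability: sum over classes of P(d|c) P(c).
p_d = sum(likelihoods[c] * priors[c] for c in priors)

# Full posterior vs unnormalized score: dividing by the class-independent
# constant P(d) rescales every score equally, so the argmax is identical.
posterior = {c: likelihoods[c] * priors[c] / p_d for c in priors}
unnorm = {c: likelihoods[c] * priors[c] for c in priors}
best_posterior = max(posterior, key=posterior.get)
best_unnorm = max(unnorm, key=unnorm.get)
```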
2 votes, 1 answer

How do I calculate the probabilities of the BERT model prediction logits?

I might be getting this completely wrong, but please let me first try to explain what I need, and then what's wrong. I have a classification task. The training data has 50 different labels. The customer wants to differentiate the low probability…
2 votes, 0 answers

PPO2: Intuition behind Gumbel Softmax and Exploration?

I'm trying to understand the logic behind the magic of using the Gumbel distribution for action sampling inside the PPO2 algorithm. This code snippet implements the action sampling, taken from here: def sample(self): u =…
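The snippet appears to implement the Gumbel-max trick: with $u \sim \mathrm{Uniform}(0,1)$, $-\log(-\log u)$ is a standard Gumbel sample, and $\arg\max_i(\text{logit}_i + g_i)$ is an exact draw from $\mathrm{softmax}(\text{logits})$. A stdlib sketch with toy logits:

```python
import math
import random
from collections import Counter

def gumbel_max_sample(logits, rng):
    # u ~ Uniform(0,1); -log(-log(u)) is a Gumbel(0,1) sample.
    # argmax(logits + Gumbel noise) draws exactly from softmax(logits).
    noisy = [z - math.log(-math.log(rng.random())) for z in logits]
    return max(range(len(noisy)), key=noisy.__getitem__)

rng = random.Random(0)
logits = [2.0, 1.0, 0.0]
counts = Counter(gumbel_max_sample(logits, rng) for _ in range(20000))
# Empirical frequencies should approximate softmax(logits),
# i.e. roughly [0.665, 0.245, 0.090] for these toy logits.
```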