
I'm looking at the Bernoulli naïve Bayes classifier on Wikipedia, and I understand Bayes' theorem along with Gaussian naïve Bayes. However, I don't understand how $P(x|c_k)$ is calculated. The Wikipedia page says it's calculated as follows:

$$P(x|c_k) = \prod^{n}_{i=1} p^{x_i}_{ki} (1-p_{ki})^{(1-x_i)}. $$

They mention that $p_{ki}$ is the probability of class $c_k$ generating the term $x_i$. Does that mean $P(x|c_k)$? If so, that doesn't make sense, since to calculate $P(x|c_k)$ we would need to have calculated it already. So what is $p_{ki}$?

And in the first part, after the product symbol, are they raising this probability to the power of $x_i$, or does the superscript again just mean 'the probability of class $c_k$ generating the term $x_i$'?

I also don't understand the intuition behind why or how this calculates $P(x|c_k)$.


1 Answer


Bernoulli naïve Bayes

$P(x \mid c_k) = \prod^{n}_{i=1} p^{x_i}_{ki} (1-p_{ki})^{(1-x_i)}$

Let's examine the example of document classification.
Suppose there are $K$ different text classes and $n$ different terms in our vocabulary. Each $x_i$ is a Boolean variable (0 or 1) indicating whether the $i^{th}$ term occurs in document $\mathbf{x}$, so $\mathbf{x}$ is a vector of dimension $n$.

$P(x \mid c_k)$ is the probability that, given class $c_k$, document $\mathbf{x}$ is generated. The equation uses a common trick to represent a multivariate Bernoulli event model, exploiting the fact that when $x_i = 1$, then $1 - x_i = 0$, and vice versa. In other words, for each term, it takes either the probability that the document contains that term or the probability that it does not.
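To see the trick concretely, here is a minimal sketch (function name hypothetical) of a single factor $p^{x_i}(1-p)^{1-x_i}$: the exponents act as a switch that selects $p$ when $x_i = 1$ and $1 - p$ when $x_i = 0$.

```python
def bernoulli_factor(p, x):
    """One factor of the product: p**x * (1 - p)**(1 - x), with x in {0, 1}."""
    return p**x * (1 - p)**(1 - x)

# When x = 1, the second factor is (1 - p)**0 = 1, leaving just p.
print(bernoulli_factor(0.25, 1))  # 0.25
# When x = 0, the first factor is p**0 = 1, leaving 1 - p.
print(bernoulli_factor(0.25, 0))  # 0.75
```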

$p_{ki}$ is the probability of class $c_k$ generating the term $x_i$; that is, the probability that a document belonging to class $c_k$ contains the $i^{th}$ term of the vocabulary. It is not $P(x \mid c_k)$ itself: it is a per-term parameter estimated from the training data, for example as the fraction of class-$c_k$ training documents that contain that term.
