
I am trying to understand the difference between a Bayesian Network and a Markov Chain.

When I search for this on the web, the unanimous answer seems to be that a Bayesian Network is directional (i.e. it's a DAG) and a Markov Chain is not directional.

However, a Markov Chain example is often over time, where the weather today impacts the weather tomorrow, but the weather tomorrow (obviously) does not impact the weather today. So I am quite confused: how is a Markov Chain not directional?

I seem to be missing something here. Can someone please help me understand?

Newskooler

2 Answers


The main difference between a Bayesian network and a Markov chain is not that a Markov chain is undirected. It is that the graph of a Bayesian network can be non-trivial, whereas the graph of a Markov chain is somewhat trivial: the previous $k$ nodes all simply point to the current node. To see why this is trivial, let each node represent a random variable $X_i$. Then each node $X_i$ with $t-k \leq i < t$ is connected by a directed edge to $X_t$; that is, $(X_i, X_t) \in E$ for $t-k \leq i < t$, where $E$ is the set of edges of the graph.

To illustrate this, please see the examples below.

Assume that we have a $k$th-order Markov chain. Then, by definition, for all $t > k$, $\mathbb{P}(X_t = x \mid X_0,\dots,X_{t-1}) = \mathbb{P}(X_t = x \mid X_{t-k},\dots,X_{t-1})$.
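As a sketch of this definition, the snippet below samples from a 2nd-order ($k = 2$) chain over two weather states; only the last two states are consulted when drawing the next one. The transition probabilities are invented purely for illustration.

```python
import random

STATES = ("sunny", "rainy")
# P[(x_{t-2}, x_{t-1})][x_t]: made-up 2nd-order transition probabilities.
P = {
    ("sunny", "sunny"): {"sunny": 0.8, "rainy": 0.2},
    ("sunny", "rainy"): {"sunny": 0.4, "rainy": 0.6},
    ("rainy", "sunny"): {"sunny": 0.6, "rainy": 0.4},
    ("rainy", "rainy"): {"sunny": 0.3, "rainy": 0.7},
}

def sample_chain(start, n, seed=0):
    """Extend a length-2 start sequence to length n by sampling the chain."""
    rng = random.Random(seed)
    seq = list(start)
    while len(seq) < n:
        dist = P[tuple(seq[-2:])]  # only the last k = 2 states matter
        r, acc = rng.random(), 0.0
        for state, p in dist.items():
            acc += p
            if r < acc:
                seq.append(state)
                break
    return seq

print(sample_chain(["sunny", "sunny"], 10))
```

Note that the dependency structure is the same at every time step: $X_t$ always looks back at exactly $X_{t-2}$ and $X_{t-1}$.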

The main difference between the above definition and the definition of a Bayesian network is that, due to the structure of the graph, each $X_t$ can have different dependencies. Consider the Bayesian network in the figure below.

[Figure: a Bayesian network over $X_1, \dots, X_5$]

We would get that $\mathbb{P}(X_4 = x \mid X_1, X_2, X_3) = \mathbb{P}(X_4 = x \mid X_2, X_3)$ and $\mathbb{P}(X_5 = x \mid X_1, X_2, X_3, X_4) = \mathbb{P}(X_5 = x \mid X_3)$.

So, the past events that the current random variable depends on don't have to have the same 'structure' in a Bayesian network as they do in a Markov chain.
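The point can be sketched in code: in a Bayesian network, each node carries its own parent set, and the joint distribution factorizes as $\prod_i \mathbb{P}(X_i \mid \mathrm{parents}(X_i))$. The parent sets for $X_4$ and $X_5$ below match the independencies stated above; the parents of $X_1$–$X_3$ and all the probability numbers are invented for illustration.

```python
from itertools import product

# Hypothetical DAG: each node has its own parent set (unlike a fixed-order
# Markov chain). parents(X4) = {X2, X3} and parents(X5) = {X3} match the
# answer; the rest of the structure is assumed.
parents = {"X1": (), "X2": ("X1",), "X3": ("X1",), "X4": ("X2", "X3"), "X5": ("X3",)}

def make_cpt(num_parents, p_one):
    """Toy binary CPT: same P(node = 1) under every parent configuration."""
    cpt = {}
    for cfg in product([0, 1], repeat=num_parents):
        cpt[(1, cfg)] = p_one
        cpt[(0, cfg)] = 1 - p_one
    return cpt

cpts = {node: make_cpt(len(pa), 0.7) for node, pa in parents.items()}

def joint_prob(assignment):
    """P(X1..X5) = product over nodes of P(node | its parents)."""
    p = 1.0
    for node, pa in parents.items():
        cfg = tuple(assignment[q] for q in pa)
        p *= cpts[node][(assignment[node], cfg)]
    return p

print(joint_prob({node: 1 for node in parents}))
```

A fixed-order Markov chain would force every node's parent set to be its $k$ immediate predecessors; here each entry of `parents` can be chosen freely, as long as the graph stays acyclic.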

David

I am not an expert on this, but I'll try to explain my understanding of it.

A Bayesian Network is a Directed Graphical Model (DGM) with the ordered Markov property, i.e. a node (random variable) depends only on its immediate parents, not on all of its predecessors (this generalizes a first-order Markov process).

A Markov chain, on the other hand, can be of order $\geq 1$. Thus it may depend on predecessors that are not immediate (although there can be no gaps: an order-$k$ chain depends on exactly the $k$ immediately preceding states).

(Definitions according to Kevin Murphy's book, Machine Learning: A Probabilistic Perspective.)

Now, the reason why such confusion/ambiguity exists is, in my opinion, that Bayesian Nets have generally been used to model causal relationships (and hence directed, cause $\rightarrow$ effect) between random variables of different types, i.e. each random variable has a completely different state space (e.g. the sky is cloudy or sunny vs. the ground is wet or dry).

Whereas we generally use Markov Chains to represent a stochastic process (a collection of r.v.'s indexed by time: $X_0, X_1, \dots$) having the Markov property. Thus, for a Markov Chain we have a state-transition matrix: the collection of r.v.'s indexed by time shares the same state space, with a transition matrix specifying the transition probabilities. The directed graph (DGM) in a Markov Chain exists to show that you are moving forward in time, but the state space of each $X_k$ remains the same, and hence no 'real parent' exists.
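A minimal sketch of this shared-state-space view: a first-order chain is fully described by one transition matrix, and stepping the chain is just a vector-matrix product. The two weather states and all probabilities below are invented for illustration.

```python
# T[i][j] = P(next state = j | current state = i), over one shared
# state space (0 = sunny, 1 = rainy). Numbers are made up.
T = [
    [0.9, 0.1],  # sunny -> sunny, sunny -> rainy
    [0.5, 0.5],  # rainy -> sunny, rainy -> rainy
]

def step(dist, T):
    """One step of the chain: new_dist[j] = sum_i dist[i] * T[i][j]."""
    n = len(T)
    return [sum(dist[i] * T[i][j] for i in range(n)) for j in range(n)]

dist = [1.0, 0.0]  # start: certainly sunny
for _ in range(3):
    dist = step(dist, T)
print(dist)
```

Every $X_k$ is stepped with the same matrix `T`, which is exactly why the chain's graph carries no extra structure beyond "time moves forward".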