1

I am trying to understand the probability as used in machine learning. So far I understand there is multipile approaches to probability. The two I know of are bayseian and frequntist approach. So far I understand these two approaches are conceptually different starting points.

In a book introducing ML I read speech and language processing, I find that many results involving bayes is introduced. So I am presuming bayseian view of probability is one being used.

I also heard elsewhere that you need measure theory to do probability but in neither of the approaches to probability I mentioned nor in the book, is there any point any mention of measure theory.

So how do all these things connect together?

Are there equivalent different conceptual starting points to what probability is in machine learning? If so, are they equivalent? Further where does measure theory come in the picture?

1 Answers1

2

ML relies on probability theory founded by Kolmogorov’s measure‑theoretic axioms, which underpin both Bayesian and frequentist interpretations of probability. The frequentist view defines probability as the long‑run relative frequency of events in repeatable trials, while the Bayesian view treats probability as a degree of belief updated via Bayes’ theorem.

Despite these semantic differences, both frameworks share the same underlying mathematical structure, that is, a $3$-tuple probability measure space ${\displaystyle (\Omega ,F,P)}$ with events in a $σ$‑algebra $F$ and a probability measure $P$ satisfying Kolmogorov’s axioms. Measure theory provides the rigorous language of $σ$‑algebras, measures, and integrals that makes it possible to handle both discrete and continuous random variables in ML.

A measure space is a basic object of measure theory, a branch of mathematics that studies generalized notions of volumes. It contains an underlying set, the subsets of this set that are feasible for measuring (the σ-algebra) and the method that is used for measuring (the measure). One important example of a measure space is a probability space.
A measurable space consists of the first two components without a specific measure.

In practice, ML probabilistic models often work with finite sums or well‑behaved densities, abstracting away the full technical measure‑theoretic machinery encompassing all pathological cases.

cinch
  • 11,000
  • 3
  • 8
  • 17