I am trying to understand probability as it is used in machine learning. As far as I understand, there are multiple approaches to probability. The two I know of are the Bayesian and frequentist approaches. My understanding is that these two approaches are conceptually different starting points.
In a book introducing ML that I read, Speech and Language Processing, I find that many results involving Bayes are introduced. So I presume the Bayesian view of probability is the one being used.
I have also heard elsewhere that you need measure theory to do probability, but neither the two approaches to probability I mentioned nor the book mentions measure theory at any point.
So how do all these things connect together?
Are there different conceptual starting points for what probability is in machine learning? If so, are they equivalent? Further, where does measure theory come into the picture?