I don’t know who needs to read this, since it’s fairly clear in textbooks.

In standard probability theory (over discrete/enumerable domains), a probability P(A) is a weight or mass attached to a set of events A = {u, v, …}. Sometimes rigorous presentations will start by defining probabilities over the faces of a die as the possible elementary events; in practice, no one worries about this. Having the notion of a set of elementary events is nevertheless useful because it will have a total mass of 1, and much of the maths for continuous domains becomes easier to swallow if you’re continually thinking about distributing masses and finding centers of gravity.

A *conditional* probability is a probability under a reduced set of events. So we may compare the unconditional P(rain) with the conditional P(rain | cloudy). Now, the actual definition of conditional probabilities is rather gnarly (they’re actually conditional *expectations* of binary functions of events, and conditional expectations are, well, best served with gagh), but the intuitively defensible formula is that P(rain | cloudy) = P(rain and cloudy) / P(cloudy). A simple way of defending it is to claim that (rain | cloudy) implies we *already* know that it is cloudy, so the total mass to be distributed is P(cloudy). But hark: this is a backdoor; we’ve introduced an epistemic interpretation of probabilities that isn’t really warranted by what I presented.

At any rate: P(A|B) = P(A and B)/P(B) is the actual formula, and it’s presented as the *definition* of conditional probability in elementary textbooks.
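The mass-distribution picture above can be played with directly. Here is a minimal sketch of a toy discrete probability space; the weather outcomes and their masses are made up for illustration, chosen only so they sum to 1:

```python
# A toy discrete space of elementary events (rain-or-dry, cloudy-or-clear)
# with illustrative masses summing to 1.
masses = {
    ("rain", "cloudy"): 0.25,
    ("rain", "clear"): 0.05,
    ("dry", "cloudy"): 0.30,
    ("dry", "clear"): 0.40,
}

def prob(event):
    """Total mass of the elementary events satisfying `event`."""
    return sum(m for u, m in masses.items() if event(u))

def cond_prob(event_a, event_b):
    """P(A | B) = P(A and B) / P(B), defined when P(B) > 0."""
    return prob(lambda u: event_a(u) and event_b(u)) / prob(event_b)

rain = lambda u: u[0] == "rain"
cloudy = lambda u: u[1] == "cloudy"

print(prob(rain))               # P(rain) = 0.25 + 0.05 ≈ 0.30
print(cond_prob(rain, cloudy))  # P(rain | cloudy) = 0.25 / 0.55 ≈ 0.4545
```

Note how conditioning is nothing more than re-normalizing: the numerator is the joint mass, and dividing by P(cloudy) just rescales the reduced set of events to total mass 1.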

See, nothing in this sleeve, nothing in that sleeve, except a good-hearted assumption that “A and B” and “B and A” mean the same thing (and what a world it would be otherwise). Then:

P(A|B) = P(B and A)/P(B) = P(B|A) × P(A) / P(B)

*boo yah*. My first probability textbook didn’t call this Bayes’ *theorem* but rather Bayes’ *formula*. There’s nothing to it that isn’t contained in the definition of P(A|B). [An interesting, more exotic formalization of probability *starts* from A|B to show that probability theory is a kind of natural extension of classical logic. But it does nothing to the discussion above.]
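The point that the formula is definitional rather than deep can be checked mechanically. In this sketch (same toy masses as before, purely illustrative) both sides of Bayes’ formula reduce to the same ratio P(A and B)/P(B), so they agree up to floating-point noise:

```python
# Checking Bayes' formula on a toy discrete space (illustrative masses only).
masses = {("rain", "cloudy"): 0.25, ("rain", "clear"): 0.05,
          ("dry", "cloudy"): 0.30, ("dry", "clear"): 0.40}

def prob(event):
    return sum(m for u, m in masses.items() if event(u))

A = lambda u: u[0] == "rain"    # it rains
B = lambda u: u[1] == "cloudy"  # it is cloudy

# Left-hand side: the definition of conditional probability.
p_a_given_b = prob(lambda u: A(u) and B(u)) / prob(B)

# Right-hand side: P(B|A) × P(A) / P(B).
p_b_given_a = prob(lambda u: A(u) and B(u)) / prob(A)
bayes_rhs = p_b_given_a * prob(A) / prob(B)

print(abs(p_a_given_b - bayes_rhs) < 1e-12)  # the identity is definitional
```

Nothing up either sleeve: the only step was multiplying and dividing by P(A).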

So what is with all the Bayesian stuff? Well, it doesn’t use the theorem (a proposition that can be given this name because it admits mathematical proof). Instead, it relates *two* different probability measures P and Q in the following fashion:

Q(B) = Q(B|A) × P(A) + Q(B|not A) × P(not A)

Q(A|B) = Q(B|A) × P(A) / Q(B)

in such a way that (we’re led to believe) P(.) describes old beliefs and Q(.) describes beliefs in the light of new data. Of course, this is patently true if P = Q, and reasonable if P and Q are thought to be approximations to some ground truth.
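The two-measure pattern above can be sketched in a few lines. All the numbers here are hypothetical: P(A) is the old belief, the likelihoods Q(B|A) and Q(B|not A) come from a model of the new data B, and Q(A|B) is the updated belief:

```python
# A sketch of the mixed P/Q update from the text; all numbers hypothetical.
p_a = 0.5              # P(A): prior belief that the hypothesis holds
q_b_given_a = 0.8      # Q(B | A): chance of seeing the data if A holds
q_b_given_not_a = 0.3  # Q(B | not A): chance of seeing it otherwise

# Total probability: Q(B) = Q(B|A) P(A) + Q(B|not A) P(not A)
q_b = q_b_given_a * p_a + q_b_given_not_a * (1 - p_a)

# The update: Q(A|B) = Q(B|A) P(A) / Q(B)
q_a_given_b = q_b_given_a * p_a / q_b

print(q_b)          # ≈ 0.55
print(q_a_given_b)  # ≈ 0.7273
```

Notice that the prior P and the likelihoods Q are simply combined as if they belonged to one measure; nothing in the arithmetic certifies that they do.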

But there’s no epistemology there. The common appeal to Bayes’ formula as a “theorem” (albeit often very useful, yielding powerful tools) hides a prayer — please, Universe, let there be a ground truth that’s amenable to undergraduate-level mathematics.