# Bayes’ theorem is trivial

I don’t know who needs to read this, since it’s fairly clear in textbooks.

In standard probability theory (over discrete/enumerable domains), a probability P(A) is a weight or mass attached to a set of events A={u,v,…}. Rigorous presentations will sometimes start by defining probabilities over the faces of a die as the possible elementary events; in practice, no one worries about this. Having the full set of elementary events is nevertheless useful because it has total mass 1, and much of the maths for continuous domains becomes easier to swallow if you’re continually thinking about distributing masses and finding centers of gravity.

A conditional probability is a probability over a reduced set of events. So we may compare the unconditional P(rain) with the conditional P(rain | cloudy). Now, the actual definition of conditional probabilities is rather gnarly (they’re really conditional expectations of binary functions of events, and conditional expectations are, well, best served with gagh), but the intuitively defensible formula is P(rain | cloudy) = P(rain and cloudy) / P(cloudy). A simple way of defending it is to claim that (rain | cloudy) implies we already know that it is cloudy, so the total mass to be distributed is P(cloudy). But hark, this is a backdoor: we have introduced an epistemic interpretation of probabilities that isn’t really warranted by what I presented.

At any rate: P(A|B) = P(A and B)/P(B) is the actual formula, and it’s presented as the definition of conditional probability in elementary textbooks.
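The mass-over-a-reduced-set reading can be checked on the die from earlier. A minimal sketch (the events “even” and “at most 4” are my illustrative choices, not from the text):

```python
# P(A|B) = P(A and B) / P(B) over a fair six-sided die.
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}                     # elementary events, mass 1/6 each
P = lambda A: Fraction(len(A), len(omega))     # uniform mass on the faces

A = {2, 4, 6}       # "even"
B = {1, 2, 3, 4}    # "at most 4"

cond = P(A & B) / P(B)    # mass of A within the reduced set B
print(cond)               # 1/2: two of B's four outcomes are even
```

Note that `cond` is just A’s share of the mass that B carries, which is the whole “total mass to be distributed is P(cloudy)” argument in one line.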

See, nothing in this sleeve, nothing in that sleeve, except the good-hearted assumption that “A and B” and “B and A” mean the same thing (and what a world it would be otherwise). Then:

P(A|B) = P(B and A)/P(B) = P(B|A) x P(A) / P(B)

Boo yah. My first probability textbook didn’t call this Bayes’ theorem, but rather the Bayes formula. There’s nothing to it that isn’t already contained in the definition of P(A|B). [An interesting, more exotic formalization of probability starts from P(A|B) as the primitive notion, to show that probability theory is a kind of natural extension of classical logic. But it does nothing to the discussion above.]
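The one-line derivation above can be verified numerically on a six-sided die (the events are again my illustrative choices):

```python
# Check that P(A|B) = P(B|A) * P(A) / P(B), with both conditionals
# computed straight from the definition.
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
P = lambda A: Fraction(len(A), len(omega))   # uniform mass on the faces

A = {2, 4, 6}       # "even"
B = {1, 2, 3, 4}    # "at most 4"

P_A_given_B = P(A & B) / P(B)        # definition of P(A|B)
P_B_given_A = P(B & A) / P(A)        # definition of P(B|A)
bayes = P_B_given_A * P(A) / P(B)    # the right-hand side of the formula
assert P_A_given_B == bayes == Fraction(1, 2)
```

The assertion holds by pure algebra, of course; that is exactly the point — the “theorem” is cancellation plus the symmetry of “and”.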

So what is with all the Bayesian stuff? Well, it doesn’t use the theorem (a proposition that earns that name because it admits mathematical proof). Instead, it relates two different probability measures P and Q in the following fashion:

Q(B) = Q(B|A) x P(A) + Q(B|not A) x P(not A)

Q(A|B) = Q(B|A) x P(A) / Q(B)

in such a way that (we’re led to believe) P(.) describes old beliefs and Q(.) describes beliefs in the light of new data. Of course, this is patently true if P = Q, and reasonable if P and Q are thought to be approximations to some ground truth.
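The update pattern in the two formulas can be sketched concretely. This is a minimal example assuming a prior P(A), likelihoods Q(B|A) and Q(B|not A); the numbers (a 1% prior, 90%/10% likelihoods) are made up for illustration:

```python
# The "Bayesian update": combine a prior P with likelihoods under Q,
# exactly as in the two formulas above.
from fractions import Fraction as F

P_A    = F(1, 100)   # P(A): prior mass on the hypothesis ("old beliefs")
Q_B_A  = F(9, 10)    # Q(B|A): chance of the data B if A holds
Q_B_nA = F(1, 10)    # Q(B|not A): chance of B if A fails

Q_B   = Q_B_A * P_A + Q_B_nA * (1 - P_A)   # Q(B) = Q(B|A)P(A) + Q(B|not A)P(not A)
Q_A_B = Q_B_A * P_A / Q_B                  # Q(A|B) = Q(B|A)P(A) / Q(B)
print(Q_A_B)   # 1/12, about 0.083: the data shifted belief in A, but P and Q
               # were silently treated as the same measure along the way
```

Notice that nothing in the arithmetic licenses mixing P and Q in one expression; that license is exactly the assumption discussed below.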

But there’s no epistemology there. The common (albeit often very useful, yielding powerful tools) appeal to Bayes’ formula as a “theorem” hides a prayer — please, Universe, let there be a ground truth that’s amenable to undergraduate-level mathematics.