Apr 1, 2024

The Theorem

Bayes' Theorem states that: \[ P(A|B) = P(A) \times \frac{P(B|A)}{P(B)} \] where \(P(A)\) means the probability of \(A\) and \(P(A|B)\) means the probability of \(A\) given \(B\). To prove the theorem, observe that the probability of both \(A\) and \(B\) occurring, denoted \(P(A\cap B)\), can be factored in two ways: \[ P(A \cap B) = P(A) \times P(B | A) = P(B) \times P(A | B) \] Dividing both sides of the last equality by \(P(B)\) (assuming \(P(B) > 0\)) gives Bayes' Theorem.
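The proof relies on the definition of conditional probability: \(P(B|A) = P(A \cap B)/P(A)\) and \(P(A|B) = P(A \cap B)/P(B)\). The Python sketch below checks both factorizations on a small joint distribution over two events. The numbers in `joint` are made up purely for illustration. It uses `fractions.Fraction` so the equalities hold exactly rather than up to floating-point error.

```python
from fractions import Fraction

# Made-up joint distribution over two events A and B (illustrative only).
# Each key is (A occurs, B occurs); the four probabilities sum to 1.
joint = {
    (True, True): Fraction(1, 10),
    (True, False): Fraction(2, 10),
    (False, True): Fraction(3, 10),
    (False, False): Fraction(4, 10),
}

p_a_and_b = joint[(True, True)]                   # P(A ∩ B)
p_a = joint[(True, True)] + joint[(True, False)]  # P(A), summing over B
p_b = joint[(True, True)] + joint[(False, True)]  # P(B), summing over A
p_b_given_a = p_a_and_b / p_a                     # P(B|A) = P(A ∩ B) / P(A)
p_a_given_b = p_a_and_b / p_b                     # P(A|B) = P(A ∩ B) / P(B)

# Both factorizations of the joint probability agree...
assert p_a * p_b_given_a == p_b * p_a_given_b == p_a_and_b
# ...so dividing by P(B) recovers Bayes' Theorem.
bayes = p_a * p_b_given_a / p_b
assert bayes == p_a_given_b
print(p_a_given_b)  # 1/4
```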

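Here's an example to illustrate how Bayes' Theorem can be used: imagine a disease that affects 1% of the population. The test for the disease is 90% accurate: if you have the disease, the test will be positive 90% of the time, and if you don't have the disease, the test will be negative 90% of the time. If you test positive, what is the probability that you have the disease? To apply Bayes' Theorem, we'll define \(A\) to mean that you have the disease and \(B\) to mean that you test positive. So we're solving for \(P(A|B)\), and we know:

\[ P(A) = 0.01, \qquad P(B|A) = 0.9 \] \[ P(B) = P(B|A)\,P(A) + P(B|\neg A)\,P(\neg A) = 0.9 \times 0.01 + 0.1 \times 0.99 = 0.108 \] where \(P(B)\) comes from the law of total probability: a positive test comes either from someone who has the disease or from a false positive in someone who doesn't.

Plugging these into Bayes' Theorem gives \(P(A|B) = 0.01 \times \frac{0.9}{0.108} \approx 0.083\). Even after a positive result, you have only about an 8% chance of having the disease, because false positives from the healthy 99% far outnumber true positives.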
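The same calculation can be written as a short Python function. The function `posterior` is a hypothetical helper, not something from the post. Its parameter names `sensitivity` and `specificity` are the standard terms for \(P(B|A)\) and \(P(\neg B|\neg A)\), and it assumes \(P(B)\) is computed with the law of total probability.

```python
def posterior(prior, sensitivity, specificity):
    """P(disease | positive test) via Bayes' Theorem.

    prior:        P(A), fraction of the population with the disease
    sensitivity:  P(B|A), chance that a sick person tests positive
    specificity:  P(not B | not A), chance that a healthy person tests negative
    """
    # Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
    p_positive = sensitivity * prior + (1 - specificity) * (1 - prior)
    # Bayes' Theorem: P(A|B) = P(A) * P(B|A) / P(B)
    return prior * sensitivity / p_positive


p = posterior(prior=0.01, sensitivity=0.9, specificity=0.9)
print(round(p, 3))  # 0.083
```

The result depends heavily on the prior. With a 10% prior instead of 1%, `posterior(0.1, 0.9, 0.9)` gives roughly 0.5.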

Beyond the Theorem

Sometimes when people talk about Bayes' Theorem, it sounds like they're describing a way of life. I was confused about how it could be more than just a formula until I read this post.

The future is fundamentally unknowable, but in order to live, we need to make predictions about it. To make predictions, we need a set of assumptions or axioms. Bayes' Theorem gives rise to one such set of assumptions, which is why it can feel like a way of life.

Before describing the Bayesian approach to predicting the future, I will first describe the frequentist approach. Frequentists assume that the probability of an event occurring in the future is equal to the frequency with which the event occurred in the past. For example, if a coin is flipped 100 times and comes up heads 60 times, a frequentist would believe that there is a 60% chance the next flip will come up heads. One problem with this approach is that if we don't have much historical data, our prediction may not generalize to the future. In other words, we may overfit to past events.
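As a minimal sketch of the frequentist rule described above, the prediction is just the observed frequency of heads. The function name `frequentist_estimate` is hypothetical, and the second call uses made-up data to show how a tiny sample produces an extreme estimate.

```python
def frequentist_estimate(heads, flips):
    """Predict P(heads on the next flip) as the observed frequency of heads."""
    if flips == 0:
        raise ValueError("no data: the frequency is undefined")
    return heads / flips


print(frequentist_estimate(60, 100))  # 0.6
# Hypothetical tiny sample: 3 heads in 3 flips predicts heads with certainty,
# an estimate that is unlikely to generalize to future flips.
print(frequentist_estimate(3, 3))  # 1.0
```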

In the Bayesian approach, you start with a belief about the probability that an event occurs before observing any data. This initial belief is called the prior. Then you update your belief after observing the data; the resulting belief is called the posterior. Thus, instead of basing predictions solely on historical data, you also incorporate your prior beliefs. In this way, the Bayesian approach mitigates the overfitting problem of the frequentist approach: the prior acts as a form of regularization. In Bayes' Theorem, \(P(A)\) is the prior, \(P(A|B)\) is the posterior, and \(\frac{P(B|A)}{P(B)}\) is the factor by which you should update your belief given the observed data \(B\).

In practice, the prior could just be your human intuition, so it may be difficult to apply Bayes' Theorem quantitatively. In the coin-tossing example, you would probably start with the prior belief that the coin has a minuscule chance of being unfair. After observing it come up heads 60 times in 100 flips, your posterior belief may be that the coin has a slightly larger, but still minuscule, chance of being unfair. So unlike a frequentist, you would predict that the next flip comes up heads with a probability somewhere between 50% and 60%, rather than exactly 60%.
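The post leaves "unfair" unspecified, so the Python sketch below assumes a hypothetical two-hypothesis model: the coin is either fair (heads with probability 0.5) or unfair with heads probability `p_unfair`. The values `prior_unfair=0.001` and `p_unfair=0.6` are illustrative choices, not numbers from the post. Here \(A\) is "the coin is unfair" and \(B\) is the observed 60 heads in 100 flips, so the update uses exactly the \(P(A)\), \(P(B|A)\), and \(P(B)\) terms of Bayes' Theorem.

```python
from math import comb


def coin_update(heads, flips, prior_unfair, p_unfair):
    """Return (P(unfair | data), predicted P(heads) on the next flip).

    Hypothetical model: the coin is either fair (P(heads) = 0.5)
    or unfair with P(heads) = p_unfair.
    """
    def likelihood(p):
        # Binomial probability of the observed flips if P(heads) = p
        return comb(flips, heads) * p**heads * (1 - p) ** (flips - heads)

    like_unfair = likelihood(p_unfair)  # P(B | unfair)
    like_fair = likelihood(0.5)         # P(B | fair)
    # P(B) by total probability over the two hypotheses
    evidence = prior_unfair * like_unfair + (1 - prior_unfair) * like_fair
    # Bayes' Theorem: posterior = prior * P(B|A) / P(B)
    post_unfair = prior_unfair * like_unfair / evidence
    # Predicted P(heads): each hypothesis's P(heads), weighted by its posterior
    p_next_heads = post_unfair * p_unfair + (1 - post_unfair) * 0.5
    return post_unfair, p_next_heads


post, p_heads = coin_update(heads=60, flips=100, prior_unfair=0.001, p_unfair=0.6)
# Prints roughly 0.0074 and 0.5007 under these assumed numbers
print(f"P(unfair | data) = {post:.4f}, P(next flip is heads) = {p_heads:.4f}")
```

Under these assumptions the chance that the coin is unfair rises from 0.1% to about 0.7%, and the predicted chance of heads lands just above 50%, inside the 50% to 60% range rather than at the frequentist 60%.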

This Reddit post also explores the question of how "Bayesian" can describe a way of thinking.