# Bayes

## The Theorem

Bayes' Theorem states that:

\[ P(A|B) = P(A) \times \frac{P(B|A)}{P(B)} \]

where \(P(A)\) means the probability of \(A\) and \(P(A|B)\) means the probability of \(A\) given \(B\). To prove the theorem, observe that the probability of both \(A\) and \(B\) occurring, denoted \(P(A \cap B)\), is:

\[ P(A \cap B) = P(A) \times P(B | A) = P(B) \times P(A | B) \]

Dividing both sides of the last equality by \(P(B)\) (assuming \(P(B) > 0\)) gives Bayes' Theorem.

Here's an example to illustrate how Bayes' Theorem can be used: imagine there is a disease that affects 1% of the population. The test for the disease is 90% accurate (if you have the disease, then the test will be positive 90% of the time; if you don't have the disease, then the test will be negative 90% of the time). If you test positive, what is the probability that you have the disease? To apply Bayes' Theorem, we'll define \(A\) to mean that you have the disease, and \(B\) to mean that you test positive. So we're solving for \(P(A|B)\), and we know:

- \(P(A) = 0.01\)
- \(P(B|A) = 0.9\) (the probability you test positive given you have the disease)
- \(P(B) = P(A) \times P(B|A) + P(\neg A) \times P(B | \neg A) = 0.01 \times 0.9 + 0.99 \times 0.1 = 0.108\) (the probability that you test positive)

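Plugging these into Bayes' Theorem gives \(P(A|B) = 0.01 \times \frac{0.9}{0.108} \approx 0.083\). So even after testing positive, you have only about an 8% chance of having the disease.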
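The arithmetic above can be checked with a few lines of Python. This is only a sketch of the calculation in this example; the function name `posterior` and its parameter names are made up for illustration.

```python
def posterior(prior, p_pos_given_disease, p_pos_given_healthy):
    """Return P(A|B): the probability of disease given a positive test."""
    # P(B) by the law of total probability:
    # P(B) = P(A) * P(B|A) + P(not A) * P(B|not A)
    p_pos = prior * p_pos_given_disease + (1 - prior) * p_pos_given_healthy
    # Bayes' Theorem: P(A|B) = P(A) * P(B|A) / P(B)
    return prior * p_pos_given_disease / p_pos


# Numbers from the example: 1% prevalence, 90% accurate test.
p = posterior(prior=0.01, p_pos_given_disease=0.9, p_pos_given_healthy=0.1)
print(round(p, 3))  # 0.083
```

Changing `prior` shows how strongly the answer depends on how common the disease is.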

## Beyond the Theorem

Sometimes when people talk about Bayes' Theorem, it feels like they're describing a way of life. I was confused about how it could be more than just a formula until I read this post.

The future is fundamentally unknowable. But in order to live, we need to make predictions about the future. In order to make predictions, we need to have a set of assumptions or axioms. Bayes' Theorem gives rise to one set of assumptions, hence it is akin to a way of life.

Before describing the *Bayesian* approach to predicting the future, I will first describe the
*Frequentist* approach. Frequentists assume that the probability that an event occurs in the future
equals the frequency with which it occurred in the past. For example, if a coin is flipped 100 times and comes
up heads 60 times, a frequentist would believe that there is a 60% chance the next flip will come up heads. One
problem with this approach is that if we don't have much historical data, then our prediction may not
generalize to the future. In other words, we may overfit to past events.

In the Bayesian approach, you start with a belief about the probability that an event occurs, before observing any
data. This initial belief is called the *prior*. Then, you update your belief after observing the data. The
resulting belief is called the *posterior*. Thus, instead of basing predictions solely on historical
data, you also incorporate your own prior beliefs. In this way, the Bayesian approach mitigates the problem of
overfitting to historical data that is present in the frequentist approach: the prior belief acts as a form of
regularization. In Bayes' Theorem, \(P(A)\) is the prior, \(P(A|B)\) is the posterior, and \(\frac{P(B|A)}{P(B)}\)
is the factor by which you should update your belief given the observed data \(B\). In practice, the prior belief could
just be your human intuition, so it may be difficult to apply Bayes' Theorem quantitatively. In the coin tossing
example, you would probably start with the prior belief that the coin has a minuscule chance of being unfair. After
observing it come up heads 60 times in 100 flips, your posterior belief may be that the coin has a slightly
larger, but still minuscule, chance of being unfair. So unlike a frequentist, you would predict that the probability
of heads on the next flip is somewhere between 50% and 60%, instead of exactly 60%.
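To make the coin example concrete, here is a rough sketch under some strong simplifying assumptions that are mine rather than part of the argument above: the coin is either fair (heads 50% of the time) or unfair in one specific way (heads 60% of the time), and the prior chance of it being unfair is 1%. Real priors are rarely this tidy, so treat the numbers as illustrative only.

```python
from math import comb


def binomial_likelihood(p_heads, heads, flips):
    """Probability of seeing `heads` heads in `flips` flips if P(heads) = p_heads."""
    return comb(flips, heads) * p_heads**heads * (1 - p_heads) ** (flips - heads)


# Assumed for illustration: two hypotheses and a 1% prior that the coin is unfair.
prior_unfair = 0.01
p_fair, p_unfair = 0.5, 0.6
heads, flips = 60, 100

# P(B|A) for each hypothesis, where B is the observed data.
like_fair = binomial_likelihood(p_fair, heads, flips)
like_unfair = binomial_likelihood(p_unfair, heads, flips)

# P(B), then Bayes' Theorem for the posterior that the coin is unfair.
evidence = (1 - prior_unfair) * like_fair + prior_unfair * like_unfair
posterior_unfair = prior_unfair * like_unfair / evidence

# Bayesian prediction: average the two hypotheses, weighted by the posterior.
p_next_heads = (1 - posterior_unfair) * p_fair + posterior_unfair * p_unfair

# Frequentist prediction: the observed frequency.
frequentist = heads / flips

print(f"posterior P(unfair) = {posterior_unfair:.3f}")
print(f"Bayesian  P(next flip is heads) = {p_next_heads:.3f}")
print(f"Frequentist P(next flip is heads) = {frequentist:.3f}")
```

Under these assumptions the posterior probability of an unfair coin rises above the 1% prior but stays small, so the Bayesian prediction lands just above 50% while the frequentist one is 60%. A stronger prior belief in unfairness, or more flips, would pull the Bayesian prediction closer to the observed frequency.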

This Reddit post also explores the question of how "Bayesian" can describe a way of thought.