Bayesian statistics is an approach to statistical inference, grounded in Bayes’ theorem, in which the probability of a hypothesis is updated as more evidence or data becomes available. The theorem gives a formal rule for how prior beliefs about uncertain quantities should be combined with newly observed data to produce an updated estimate of the probability of an event or hypothesis.
There are two fundamental paradigms for statistical inference: Bayesian statistics, which treats unknown parameters as random variables characterized by probability distributions, and frequentist statistics, which treats all unknown parameters as fixed but unknown constants.
Bayesian statistics focuses on conditional probability, which is the probability of an event A given event B. This is usually written P(A | B). A classic example of calculating conditional probability is a test for a rare disease. Imagine that a test detects the disease 99% of the time when the patient has it (a true positive) and returns a false positive 1% of the time when the patient does not. However, the disease is relatively rare, occurring in only 1 in 10,000 people, an occurrence rate of 0.01%. If the test returns positive for a patient, how likely is it that they actually have the disease? Conditional probability shows us that the answer is roughly 0.0098, or just under 1%.
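A short Python sketch of that calculation, using the numbers above (the variable names are just illustrative):

```python
# Check the rare-disease numbers with the law of total probability
# and the definition of conditional probability.
p_disease = 1 / 10_000          # prior: prevalence of 0.01%
p_pos_given_disease = 0.99      # sensitivity (true positive rate)
p_pos_given_healthy = 0.01      # false positive rate

# Total probability of a positive test result.
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Conditional probability of disease given a positive test.
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(f"P(disease | positive) = {p_disease_given_pos:.4f}")  # ~0.0098
```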
Bayesian statistics is named for the Reverend Thomas Bayes, an English statistician born in 1701. Bayes became interested in the problem of inverse probability in 1755 and formulated what became known as Bayes’ Theorem (or Bayes’ Rule) sometime between 1755 and his death in 1761. Bayes was exploring techniques to compute a distribution for the probability parameter of a binomial distribution across multiple identical Bernoulli trials. His theorem computes the reverse conditional probability of an event, and it states:

P(θ | D) = P(D | θ) × P(θ) / P(D)

where:
- θ represents a hypothesis (the unknown quantity of interest)
- D is the observed data
- P(θ) is the prior probability distribution, which expresses beliefs about θ before seeing the data. These can be based on previous research or, in the case of an experiment, on the experimental setup. This makes the influence of the world outside the data-generating process explicit in the calculation.
- P(D | θ) is the likelihood function, which represents the probability of observing the data given the hypothesis
- P(D) is the evidence (sometimes called the ‘marginal likelihood’), which serves as a normalizing constant in the equation. It measures how likely the observed data is, averaged over all hypotheses.
- P(θ | D) is the posterior distribution: the updated belief about the hypothesis after incorporating the observed data and the prior.
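To make each term concrete, here is a minimal numerical sketch of the binomial setting Bayes studied: a grid of candidate values for θ (the success probability), a flat prior, a binomial likelihood for the observed data, the evidence computed as a normalizing sum, and the resulting posterior. The data (7 successes in 10 trials), the grid size, and the use of NumPy and SciPy are illustrative assumptions, not part of the original text.

```python
import numpy as np
from scipy.stats import binom

# Illustrative data: 7 successes observed in 10 Bernoulli trials.
successes, trials = 7, 10

# Discretize the hypothesis space: candidate values of theta in (0, 1).
theta = np.linspace(0.001, 0.999, 999)

# Prior P(theta): a flat prior over the grid, expressing no strong belief
# about theta before seeing the data.
prior = np.ones_like(theta)
prior /= prior.sum()

# Likelihood P(D | theta): probability of the observed data under each candidate theta.
likelihood = binom.pmf(successes, trials, theta)

# Evidence P(D): the normalizing constant, obtained by summing over all hypotheses.
evidence = np.sum(likelihood * prior)

# Posterior P(theta | D): updated belief about theta after seeing the data.
posterior = likelihood * prior / evidence

print(f"Posterior mean of theta: {np.sum(theta * posterior):.3f}")  # ~0.67
```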
Bayes’ theorem enables inference in the form of a posterior distribution, which can be used to compute point estimates such as the posterior mean, mode, or median. The posterior also yields interval estimates (credible intervals) as well as predictive distributions for new data [1].
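As a sketch of those summaries, assume the same illustrative data used above (7 successes in 10 trials) with a uniform Beta(1, 1) prior, which yields a Beta(8, 4) posterior in closed form; the point estimates, a 95% credible interval, and a simple posterior predictive probability can then be read off directly (the specific numbers and the use of SciPy are assumptions for illustration):

```python
from scipy.stats import beta

# Beta(1, 1) prior updated with 7 successes and 3 failures -> Beta(8, 4) posterior.
a, b = 1 + 7, 1 + 3
posterior = beta(a, b)

# Point estimates of theta read off the posterior distribution.
post_mean = posterior.mean()            # a / (a + b)          ~ 0.667
post_median = posterior.ppf(0.5)        # 50th percentile      ~ 0.676
post_mode = (a - 1) / (a + b - 2)       # (a - 1)/(a + b - 2)  = 0.700

# A 95% credible interval for theta.
lower, upper = posterior.ppf([0.025, 0.975])

# Posterior predictive probability that the next trial is a success
# (for a Bernoulli trial this is just the posterior mean of theta).
p_next_success = post_mean

print(f"mean={post_mean:.3f}, median={post_median:.3f}, mode={post_mode:.3f}")
print(f"95% credible interval: ({lower:.3f}, {upper:.3f})")
print(f"P(next trial is a success) = {p_next_success:.3f}")
```

The grid approximation in the previous sketch converges to this same Beta(8, 4) posterior as the grid becomes finer; the conjugate form simply makes the summaries available in closed form.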
Conceptually, we use the prior information and the observed data to come up with a new estimate of what is likely to happen, based on previous belief and new information:

posterior ∝ likelihood × prior