
Probability

Probability theory is the branch of mathematics concerned with the analysis of random phenomena.

The mathematical definition of probability typically starts with Kolmogorov’s axioms of probability. As a reminder, axioms are assumed to be true (i.e. they can’t be proven).

  1. The probability of an event is a non-negative real number: \[P(E) \in \mathbb{R}, P(E) \ge 0 \quad\forall E \in F\] where \(F\) is some event space.
  2. The probability that at least one elementary event in the sample space will occur is 1, i.e. \[P(\Omega)=1\] where \(\Omega\) is the sample space.
  3. Any countable sequence of disjoint events \(E_1,E_2,\ldots\) satisfies \[P\left( \bigcup_{i=1}^\infty E_i \right) = \sum_{i=1}^\infty P(E_i).\]
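As a quick check of these axioms (a minimal sketch assuming a fair six-sided die), a valid assignment of probabilities is non-negative, sums to 1 over the sample space, and adds across disjoint events:

# Probabilities for a fair six-sided die
p <- rep(1/6, 6)

# Axiom 1: every probability is a non-negative real number
all(p >= 0)

# Axiom 2: the probability of the sample space is 1
sum(p)

# Axiom 3: the probability of a union of disjoint events is the sum of their
# probabilities, e.g. compare P({1,2} union {5,6}) with P({1,2}) + P({5,6})
sum(p[c(1, 2, 5, 6)])
sum(p[c(1, 2)]) + sum(p[c(5, 6)])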

Independence

Two events are independent if their joint probability is the product of their marginal probabilities, i.e.  \[P(A,B) = P(A)P(B).\]
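For example (a sketch assuming two fair dice rolled independently), the probability both dice show a 6 is the product of the two marginal probabilities:

# Marginal probabilities of each die showing a 6
p_A <- 1/6
p_B <- 1/6

# Joint probability of independent events: P(A,B) = P(A) P(B) = 1/36
p_A * p_B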

Conditional probability

The conditional probability of one event given another event is \[P(A|B) = \frac{P(A,B)}{P(B)} \qquad P(B) > 0.\]
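As a small sketch, consider two fair dice: the probability the sum is 7 given that the first die shows a 3 can be computed directly from this definition.

# All 36 equally likely outcomes for two fair dice
outcomes <- expand.grid(die1 = 1:6, die2 = 1:6)

# P(A,B): the sum is 7 AND the first die is 3
p_joint <- mean(outcomes$die1 == 3 & outcomes$die1 + outcomes$die2 == 7)

# P(B): the first die is 3
p_B <- mean(outcomes$die1 == 3)

# Conditional probability P(A|B) = P(A,B) / P(B)
p_joint / p_B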

Equally likely outcomes

Typically, we start out teaching probability using intuitive approaches like rolling dice or drawing cards out of a deck. These examples all rely on our understanding that the elementary outcomes in the sample space are equally likely, e.g. each side of a die is equally likely to be rolled. From this belief, we can calculate probabilities through the formula: \[P(A) = \frac{|A|}{|\Omega|}\] where \(|\cdot|\) is the cardinality or size of the set.
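A minimal sketch of this formula for a fair six-sided die, where the event is rolling an even number:

# Sample space for one die and the event "roll an even number"
omega <- 1:6
A     <- c(2, 4, 6)

# P(A) = |A| / |Omega| = 3/6
length(A) / length(omega)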

For complicated sets, we rely on the Fundamental Counting Principle which states “if there are \(n\) ways of doing a first thing and \(m\) ways of doing a second thing, then there are \(n\times m\) ways of doing both things.”
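For example (assuming a die roll followed by a coin flip), there are \(6 \times 2 = 12\) possible outcomes:

# Fundamental Counting Principle: 6 die faces times 2 coin faces
n <- 6  # ways to roll the die
m <- 2  # ways to flip the coin
n * m

# Equivalently, enumerate all combinations of the two
nrow(expand.grid(die = 1:6, coin = c("H", "T")))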

Combinations

One application is a combination, i.e. how many ways are there to pull \(k\) items out of a set of \(n\) items? The number of combinations is \[C(n,k) = {n\choose k} = \frac{n!}{(n-k)!k!}.\]

# Example combination: C(10,4)
choose(10, 4) 
## [1] 210

Permutations

Another application is a permutation, i.e. how many ways are there to arrange \(k\) items out of \(n\)? The number of permutations is \[P(n,k) = n(n-1)(n-2)\cdots(n-k+1) = \frac{n!}{(n-k)!}.\] A common question is when \(n=k\), i.e. you are rearranging all \(n\) items. In this case, there are \(n!\) permutations.

# Example permutation: P(10,4) = 10!/(10-4)!
factorial(10) / factorial(10 - 4)
## [1] 5040

Calculating probabilities

A common application of these ideas is to calculate probabilities for the sum of two 6-sided dice.

# Probabilities for the sum of two 6-sided dice
d <- data.frame(sum = 2:12, probability = c(1:6,5:1)/36)
plot(probability ~ sum, data = d)

Another application is to determine the probability of obtaining a particular set of cards drawn from a deck. For example, consider the probability of obtaining a pair, i.e. two cards of the same rank, when drawing two cards from a standard deck of 52 cards with 4 suits. The easiest way to understand this probability is to realize that the first card doesn’t matter, but the second card dealt must match the rank of the first card. Since there are 3 cards left that match the rank and there are 51 cards left in the deck, the probability is

# Probability of a pair when being dealt two cards
3/51
## [1] 0.05882353

While situations with equally likely outcomes are interesting, their primary application is to games of chance and not much else in the scientific world.

Thus to extend the applicability of probability, we introduce the idea of a random variable. The basic definition of a random variable is that it is a function from the outcomes of an experiment to the real numbers, i.e. \[X: \Omega \to \mathbb{R}.\] If the image of \(X\) is finite or countably infinite, then \(X\) is a discrete random variable. Otherwise, \(X\) is a continuous random variable.
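As a sketch, the sum of two dice is a random variable: it maps each outcome (a pair of faces) to a real number.

# A random variable is a function from outcomes to the real numbers,
# e.g. X(outcome) = sum of the two dice
X <- function(outcome) sum(outcome)

# Apply X to a couple of outcomes in the sample space
X(c(1, 1))  # maps to 2
X(c(3, 4))  # maps to 7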

Discrete

Discrete random variables have a probability mass function that provides the probability for each possible value of the random variable, i.e. \(f(x)= P(X=x)\), and a cumulative distribution function that provides the probability the random variable is less than or equal to that value, i.e. \(F(x) = P(X \le x)\).
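For example (using the dice-sum distribution from above), the cumulative distribution function is the running sum of the probability mass function:

# Probability mass function for the sum of two fair dice
x <- 2:12
f <- c(1:6, 5:1) / 36

# Cumulative distribution function F(x) = P(X <= x) is the running sum of f
cdf <- cumsum(f)

# e.g. P(X = 7) and P(X <= 7)
f[x == 7]
cdf[x == 7]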

Discrete random variables can further be classified by whether the image is finite or countably infinite. This is important for the application of statistical methods but less important for the mathematics of discrete random variables.

Independent discrete random variables have the joint probability mass function equal to the marginal probability mass functions. For example, if \(X\) and \(Y\) are two independent, discrete random variables then \[p_{X,Y}(x,y) = P(X=x,Y=y) = P(X=x)P(Y=y) = p_X(x)p_Y(y).\]
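A minimal sketch (assuming two independent fair dice): the joint probability mass function is the outer product of the marginal probability mass functions.

# Marginal pmfs for two independent fair dice
p_X <- rep(1/6, 6)
p_Y <- rep(1/6, 6)

# Joint pmf p_{X,Y}(x,y) = p_X(x) p_Y(y) for every pair (x, y)
joint <- outer(p_X, p_Y)

# Each entry is 1/36 and the entries sum to 1
joint[1, 1]
sum(joint)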

Binomial

The binomial distribution is commonly used when counting the number of successes out of some number of attempts, where each attempt has the same probability of success and the attempts are independent. Common examples are flipping coins and rolling dice.

Let \(Y\sim Bin(n,p)\) indicate a binomial random variable with \(n>0\) attempts and probability of success \(0<p<1\). The probability mass function is \[p(y) = {n\choose y}p^y(1-p)^{n-y}, \quad y=0,1,2,\ldots,n.\] If \(n=1\), then this is also referred to as a Bernoulli random variable.

The expected value (or mean) of a binomial random variable is \(E[Y] = np\) and the variance is \(Var[Y] = np(1-p)\).

Example

Suppose \(Y \sim Bin(25, 0.9)\), i.e. \(Y\) is a binomial random variable with \(25\) attempts and probability of success \(0.9\).

#############################################################################
# Binomial
#############################################################################


# Binomial parameters
n <- 25
p <- 0.9

# Expected value (mean)
n*p
## [1] 22.5
# Variance
n*p*(1-p)
## [1] 2.25

What is the probability that \(Y\) is equal to \(24\)?

# Calculate probability using probability mass function
dbinom(24, size = n, prob = p)
## [1] 0.1994161

What is the probability that \(Y\) is less than \(23\)?

# Calculate probability using the cumulative distribution function
# Remember P(Y < y) = P(Y <= y-1)
pbinom(23 - 1, size = n, prob = p)
## [1] 0.4629059
# Alternatively using the probability mass function
sum(dbinom(0:22, size = n, prob = p))
## [1] 0.4629059

Poisson

The Poisson distribution is commonly used when our data are counts, but there is no clear or obvious maximum possible count. Typically these counts are over some amount of time, space, or space-time. For example,

  - the number of cars passing through an intersection in an hour,
  - the number of blades of grass in a square meter, or
  - the number of clicks on a website in a minute.

Let \(Y\sim Po(\lambda)\) indicate a Poisson random variable with rate \(\lambda>0\). The probability mass function is \[p(y) = \frac{e^{-\lambda} \lambda^y}{y!}, \quad y = 0,1,2,\ldots\] The expected value (or mean) is \(E[Y] = \lambda\) and the variance is \(Var[Y] = \lambda\).

Example

Suppose \(Y \sim Po(5.4)\), i.e. \(Y\) is a Poisson random variable with rate \(5.4\).

#############################################################################
# Poisson
#############################################################################

# Poisson parameter
rate <- 5.4

# Mean
rate
## [1] 5.4
# Variance
rate
## [1] 5.4

What is the probability \(Y\) is \(4\)?

# Calculate Poisson probability using probability mass function
dpois(4, lambda = rate)
## [1] 0.1600198

What is the probability \(Y\) is above 3 and below 9?

Note that \(P(3 < Y < 9) = P(Y \le 8) - P(Y \le 3)\). This allows us to use the cumulative distribution function.

# Calculate probability of a range using the cumulative distribution function
ppois(8, lambda = rate) - ppois(3, lambda = rate) # OR
## [1] 0.6893592
diff(ppois(c(3,8), lambda = rate))
## [1] 0.6893592

Also note that \(P(3 < Y < 9) = P(Y = 4) + P(Y = 5) + P(Y = 6) + P(Y = 7) + P(Y = 8)\) due to Kolmogorov’s third axiom.

# Calculate probability of a range using the sum of probability mass function values
dpois(4, lambda = rate) + 
  dpois(5, lambda = rate) +
  dpois(6, lambda = rate) +
  dpois(7, lambda = rate) +
  dpois(8, lambda = rate)   # OR
## [1] 0.6893592
sum(dpois(4:8, lambda = rate))
## [1] 0.6893592

Continuous

Continuous random variables have an image that is uncountably infinite, and thus the probability that the random variable takes any particular value is 0. To calculate the probability of the random variable falling into an interval \((a,b)\), we integrate (or find the area under) the probability density function between \(a\) and \(b\). Continuous random variables also have a cumulative distribution function that provides the probability the random variable is less than or equal to a particular value, i.e. \(F(x) = P(X \le x) = P(X < x)\).
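As a small sketch (using the standard normal distribution introduced in the next section), the probability of an interval is the integral of the probability density function, which matches the difference of the cumulative distribution function:

# P(a < X < b) for a standard normal by integrating the density
a <- -1
b <-  1
integrate(dnorm, lower = a, upper = b)$value

# Same probability from the cumulative distribution function
pnorm(b) - pnorm(a)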

Normal

The most important continuous random variable is the normal (or Gaussian) random variable. Let \(Y\sim N(\mu,\sigma^2)\) be a normal random variable with mean \(\mu\) and variance \(\sigma^2>0\). The probability density function (PDF) for a normal random variable is \[f(y) = \frac{1}{\sqrt{2\pi \sigma^2}} \exp\left( -\frac{1}{2\sigma^2} (y-\mu)^2\right).\] This is the canonical bell-shaped curve. If \(\mu=0\) and \(\sigma^2=1\), we have a standard normal random variable.

#############################################################################
# Normal
#############################################################################

# Standard normal density, Y ~ N(0,1)
mu    <- 0
sigma <- 1 # Standard deviation

# Probability density function (PDF)
curve(dnorm(x, mean = mu, sd = sigma), from = mu-3*sigma, to = mu+3*sigma)
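To connect the formula to the R function (a quick check using the mu and sigma defined above), evaluating the density formula by hand matches dnorm:

# Evaluate the normal density formula directly at y = 1 and compare to dnorm
y <- 1
1 / sqrt(2 * pi * sigma^2) * exp(-(y - mu)^2 / (2 * sigma^2))
dnorm(y, mean = mu, sd = sigma)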