To follow along, use the lab05 code.
library("tidyverse")
For reproducibility, set the seed.
set.seed(20190219)
Let \(y\sim Bin(n,\theta)\). Recall that the MLE is \(\hat{\theta}_{MLE} = \frac{y}{n}\). If we assume the prior \(\theta \sim Be(a,b)\), then the posterior is \(\theta|y \sim Be(a+y,b+n-y)\) and the posterior expectation is \[ \hat{\theta}_{Bayes} = E[\theta|y] = \frac{a+y}{a+b+n}. \] Specifically, if we assume a uniform prior on (0,1), i.e. \(a=b=1\), then \(\theta|y \sim Be(1+y,1+n-y)\) and the posterior expectation is \[ \hat{\theta}_{Bayes} = E[\theta|y] = \frac{1+y}{2+n}. \] This is equivalent to adding one success and one failure to the data.
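As a quick check of the "one success and one failure" interpretation, here is a sketch with illustrative values (not the lab data): the uniform-prior Bayes estimate is the MLE computed on the augmented data, and it shrinks the MLE toward the prior mean of 0.5.

```r
# Illustrative values
n <- 40
y <- 30

mle   <- y / n               # 0.75
bayes <- (1 + y) / (2 + n)   # uniform prior: one extra success, one extra failure

# Same as the MLE computed on the augmented data (y+1 successes in n+2 trials)
stopifnot(isTRUE(all.equal(bayes, (y + 1) / (n + 2))))

# Shrinkage: the Bayes estimate lies between the prior mean 0.5 and the MLE
stopifnot(0.5 < bayes, bayes < mle)
```

When \(y/n < 0.5\) the shrinkage goes the other way, but always toward 0.5.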
Suppose you randomly sample 22 buildings in Ames and find that 15 of them are residential buildings. Then the MLE for the probability of a building in Ames being residential is
n <- 22
y <- 15
(mle <- y/n)
## [1] 0.6818182
the Bayes estimator is
a <- b <- 1
(bayes <- (a+y)/(a+b+n))
## [1] 0.6666667
and a 95% credible interval is
qbeta(c(.025,.975), a+y, b+n-y)
## [1] 0.4708083 0.8362364
Suppose that you take a larger sample of buildings and find that, out of 222 buildings, 135 of them are residential. Calculate the MLE, Bayes estimator, and a 90% credible interval for the probability of a building in Ames being residential.
Solution:
n <- 222
y <- 135
(mle <- y/n)
## [1] 0.6081081
(bayes <- (a+y)/(a+b+n))
## [1] 0.6071429
qbeta(c(.05,.95), a+y, b+n-y)
## [1] 0.5530313 0.6601642
Can you calculate a posterior median?
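One way to do this (a sketch, reusing the \(n=222\) data and uniform prior from the solution above): the posterior median is just the 0.5 quantile of the \(Be(a+y,b+n-y)\) posterior, available via qbeta.

```r
a <- b <- 1
n <- 222
y <- 135

# Posterior median: the 50% quantile of the Be(a + y, b + n - y) posterior
(post_median <- qbeta(0.5, a + y, b + n - y))
```

For a posterior this concentrated, the median is very close to the posterior mean \((a+y)/(a+b+n)\).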
Recall that estimators have properties such as bias, variance, and consistency.
In addition, uncertainty intervals, i.e. credible intervals, have a property called coverage.
The bias of an estimator is the expected value of the estimator minus the true value. Generally, we don’t know the true value, but we can simulate data with a known truth. We can estimate the bias by performing repeated simulations and taking the average value of the estimator across those simulations. This average value is an estimate of the expected value. If we take enough values in the average, then the Central Limit Theorem tells us the distribution of this average. Specifically
\[\overline{x} \sim N(\mu,\sigma^2/n)\] where \(\mu=E[\overline{x}]\) and we estimate \(\sigma^2\) with the sample variance.
Let’s use simulations, i.e. Monte Carlo, to estimate the bias of the MLE and Bayes estimator. To do this, we are going to repeatedly simulate data \(y \sim Bin(n,\theta)\) with \(\theta\) known and compute the MLE and Bayes estimates from each simulated data set. Then we will average each estimator across the simulations and subtract the true \(\theta\).
n <- 10
theta <- 0.5
n_reps <- 1e4
mle <- numeric(n_reps)
bayes <- numeric(n_reps)
for (i in 1:n_reps) {
y <- rbinom(1, size = n, prob = theta)
mle[i] <- y/n
bayes[i] <- (a+y)/(a+b+n)
}
mean(mle) - theta # estimate of MLE bias
## [1] 0.00096
mean(bayes) - theta # estimate of Bayes bias
## [1] 8e-04
Now, we probably want some idea of how close we are to the true bias. We will use the CLT. \[ \overline{\hat{\theta}} \pm z_{0.025} SE(\overline{\hat{\theta}}) \] where the \(SE(\overline{\hat{\theta}})\) is estimated by the sample standard deviation of \(\hat{\theta}\) divided by the square root of the number of values in the average, i.e. the sample size. The formula for a 95% interval is then \[ \overline{\hat{\theta}} \pm z_{0.025} s_{\hat{\theta}}/\sqrt{n} \]
mean(mle ) + c(-1,1)*qnorm(.975)*sd(mle )/sqrt(length(mle )) - theta
## [1] -0.002153667 0.004073667
mean(bayes) + c(-1,1)*qnorm(.975)*sd(bayes)/sqrt(length(bayes)) - theta
## [1] -0.001794722 0.003394722
We could have written the code a bit more succinctly (and, perhaps, obtusely).
y <- rbinom(n_reps, size = n, prob = theta)
mle <- y/n
bayes <- (a+y)/(a+b+n)
mean(mle ) + c(-1,1)*qnorm(.975)*sd(mle )/sqrt(length(mle )) - theta
## [1] -0.002545906 0.003605906
mean(bayes) + c(-1,1)*qnorm(.975)*sd(bayes)/sqrt(length(bayes)) - theta
## [1] -0.002121588 0.003004922
Repeat the simulation procedure we had above, but using \(n=5\) and \(\theta=0.2\). What do you find for the bias of the MLE and the Bayes estimator?
Solution:
n <- 5
theta <- 0.2
y <- rbinom(n_reps, size = n, prob = theta)
mle <- y/n
bayes <- (a+y)/(a+b+n)
mean(mle ) + c(-1,1)*qnorm(.975)*sd(mle )/sqrt(length(mle )) - theta
## [1] -0.004077641 0.002877641
mean(bayes) + c(-1,1)*qnorm(.975)*sd(bayes)/sqrt(length(bayes)) - theta
## [1] 0.08280169 0.08776974
So the Bayes estimator appears to be quite biased.
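In this setting we can also compute the bias exactly. Since \(E[y]=n\theta\), the expectation of the Bayes estimator is \((a+n\theta)/(a+b+n)\). A sketch using the same \(n=5\), \(\theta=0.2\), and \(a=b=1\) as above:

```r
a <- b <- 1
n <- 5
theta <- 0.2

# Exact bias of the Bayes estimator: E[(a+y)/(a+b+n)] - theta, using E[y] = n*theta
(exact_bias <- (a + n * theta) / (a + b + n) - theta)
## [1] 0.08571429

# The MLE is exactly unbiased here: E[y/n] = theta
```

This exact value falls inside the Monte Carlo interval computed above.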
Let’s study what happens to the bias of the Bayes estimator as we change \(n\) and \(\theta\). The following will look at \(n=1,10,100,1000\) and \(\theta\) from \(0\) to \(1\) in increments of 0.1.
settings <- expand.grid(n = 10^(0:3),
theta = seq(0,1,by=0.1))
We will use the dplyr package to help us iterate over all of these values of \(n\) and \(\theta\). First we need to construct the function to do a simulation study for every value of n and theta.
sim_function <- function(x) {
y <- rbinom(1e4, size = x$n, prob = x$theta)
mle <- y/x$n
bayes <- (a+y)/(a+b+x$n)
d <- data.frame(
estimator = c("mle", "bayes"),
bias = c(mean(mle), mean(bayes)) - x$theta,
var = c(var( mle), var( bayes)))
# d$se <- sqrt(d$var / x$n)
# d$lower <- d$bias-qnorm(.975)*d$se
# d$upper <- d$bias+qnorm(.975)*d$se
return(d)
}
Then we can loop over all the values of n and theta.
sim_study <- settings %>%
group_by(n, theta) %>%
do(sim_function(.))
Let’s plot the Monte Carlo estimated bias.
ggplot(sim_study, aes(x=theta, y=bias, color=estimator)) +
geom_line() +
facet_wrap(~n) +
theme_bw()
We can see that the Bayes estimator has the most bias when 1) the sample size is small and 2) \(\theta\) is far away from 0.5. But the variance of the Bayes estimator is smaller than the variance of the MLE.
ggplot(sim_study, aes(x=theta, y=var, color=estimator)) +
geom_line() +
facet_wrap(~n) +
theme_bw()
We can see that the Bayes estimator always has smaller variance than the MLE.
Thus, we often use a new property of an estimator called the mean squared error (MSE): \[ MSE(\hat{\theta}) = E[(\hat\theta-\theta)^2] = Var(\hat{\theta}) + Bias(\hat{\theta},\theta)^2 \] This property combines both the bias and the variance into a single term. Generally the estimator with the smallest MSE should be utilized.
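We can verify this decomposition numerically with a quick Monte Carlo sketch (illustrative values, not part of the study above; note var() uses an \(n-1\) denominator, so the match is approximate for finite simulations):

```r
set.seed(1)
a <- b <- 1
n <- 10
theta <- 0.3
y <- rbinom(1e4, size = n, prob = theta)
bayes <- (a + y) / (a + b + n)

mean((bayes - theta)^2)               # MSE estimated directly
var(bayes) + (mean(bayes) - theta)^2  # Var + Bias^2
```

The two quantities agree to several decimal places, as the identity predicts.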
sim_study$mse <- sim_study$var + sim_study$bias^2
ggplot(sim_study, aes(x=theta, y=mse, color=estimator)) +
geom_line() +
facet_wrap(~n) +
theme_bw()
Generate a curve for the MSE for the MLE and Bayes estimators for a binomial probability of success when \(n=13\) for various values of \(\theta\).
Solution:
settings <- expand.grid(n = 13,
theta = seq(0,1,by=0.1))
sim_study <- settings %>%
group_by(n, theta) %>%
do(sim_function(.))
sim_study$mse <- sim_study$var + sim_study$bias^2
ggplot(sim_study, aes(x=theta, y=mse, color=estimator)) +
geom_line() +
facet_wrap(~n) + # not necessary here since n takes a single value
theme_bw()
Recall that an estimator is consistent if \[ P(|\hat\theta_n-\theta|<\epsilon) \to 1 \quad\mbox{ as }\quad n\to\infty \] for any \(\epsilon>0\). Here I have made explicit that the estimator depends on \(n\).
To evaluate consistency by simulation, we will perform a sequence of experiments, i.e. one coin flip at a time, and calculate the estimator after each coin flip. Then we will estimate how often the estimator is within \(\epsilon\) of \(\theta\).
theta <- 0.51
n_max <- 999
sim_function <- function(d) {
x <- rbinom(n_max, size = 1, prob = theta)
mle <- bayes <- numeric(n_max)
for (n in 1:n_max) {
y <- sum(x[1:n])
mle[n] <- y/n
bayes[n] <- (a+y)/(a+b+n)
}
data.frame(n = 1:n_max,
mle = mle,
bayes = bayes)
}
d <- data.frame(rep=1:1e3) %>%
group_by(rep) %>%
do(sim_function(.))
epsilon <- 0.05
sum <- d %>%
gather(estimator, estimate, mle, bayes) %>%
group_by(n, estimator) %>%
summarize(prob = mean(abs(estimate - theta) < epsilon))
Now plot the results
ggplot(sum, aes(x=n, y=prob,
color=estimator, linetype = estimator)) +
geom_line() +
theme_bw()
This is an illustration of the property called consistency. For both the Bayes estimator and the MLE, the probability of being within \(\epsilon\) of \(\theta\) (the true success probability) converges to 1 as the number of observations increases.
Via simulation, show that if \(\theta=0.01\) that both the MLE and Bayes estimators are consistent.
Solution:
theta <- 0.01
d <- data.frame(rep=1:1e3) %>%
group_by(rep) %>%
do(sim_function(.))
sum <- d %>%
gather(estimator, estimate, mle, bayes) %>%
group_by(n, estimator) %>%
summarize(prob = mean(abs(estimate - theta) < epsilon))
ggplot(sum, aes(x = n, y = prob, color = estimator, linetype = estimator)) +
geom_line() +
theme_bw()
Uncertainty intervals have a property called coverage. Coverage is the probability the interval will contain the truth when the method used to construct the interval is used repeatedly on different data sets.
We will evaluate coverage using simulations by repeatedly simulating data, constructing a 95% credible interval from each simulated data set, and checking whether the interval contains the true value of \(\theta\).
n <- 100
theta <- 0.5
n_reps <- 1e4
y <- rbinom(n_reps, size = n, prob = theta)
lower <- qbeta(.025, a+y, b+n-y)
upper <- qbeta(.975, a+y, b+n-y)
mean( lower < theta & theta < upper )
## [1] 0.9407
If we want to understand our Monte Carlo uncertainty in this coverage, we can use the CLT for a proportion.
p <- mean( lower < theta & theta < upper)
p + c(-1,1)*qnorm(.975)*sqrt(p*(1-p)/n_reps)
## [1] 0.9360709 0.9453291
Via simulation, determine the coverage for a credible interval when \(n=10\) and \(\theta = 0.2\) and use the CLT to provide an uncertainty on this coverage.
Solution:
n <- 10
theta <- 0.2
n_reps <- 1e4
y <- rbinom(n_reps, size = n, prob = theta)
lower <- qbeta(.025, a+y, b+n-y)
upper <- qbeta(.975, a+y, b+n-y)
(p <- mean( lower < theta & theta < upper))
## [1] 0.9695
p + c(-1,1)*qnorm(.975)*sqrt(p*(1-p)/n_reps)
## [1] 0.9661297 0.9728703
Now try with \(\theta=0.01\).
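A possible solution sketch, following the same pattern as above (with an assumed seed for reproducibility; the exact coverage depends on the simulated data):

```r
set.seed(20190219)  # assumed seed for this sketch
a <- b <- 1
n <- 10
theta <- 0.01
n_reps <- 1e4

y <- rbinom(n_reps, size = n, prob = theta)
lower <- qbeta(.025, a + y, b + n - y)
upper <- qbeta(.975, a + y, b + n - y)
(p <- mean(lower < theta & theta < upper))
p + c(-1, 1) * qnorm(.975) * sqrt(p * (1 - p) / n_reps)

# With theta this close to 0, y = 0 most of the time, and the two-sided
# interval misses whenever y >= 1, so coverage falls below the nominal 95%.
```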
Let’s now look at coverage across a wide range of \(n\) and \(\theta\).
binomial_coverage_function <- function(d) {
y <- rbinom(1e4, size = d$n, prob = d$theta)
mle <- y/d$n
bayes <- (a+y)/(a+b+d$n)
lower <- qbeta(.025, a+y, b+d$n-y)
upper <- qbeta(.975, a+y, b+d$n-y)
data.frame(coverage = mean( lower <= d$theta & d$theta <= upper))
}
sim_study <- expand.grid(n = 10^(0:3),
theta = seq(0,1,by=0.1)) %>%
group_by(n,theta) %>%
do(binomial_coverage_function(.))
Now plot the results
ggplot(sim_study, aes(x=theta, y=coverage)) +
geom_line() +
facet_wrap(~n) +
geom_hline(yintercept = 0.95, color = "red") +
ylim(0,1) +
theme_bw()
The coverage is extremely poor when \(\theta\) is 0 (or 1) because \(y\) is always 0 (or \(n\)). In these cases, a simple way to fix the interval is to use a one-sided interval when \(y\) is 0 (or \(n\)). Specifically,
binomial_coverage_function_fixed <- function(d) {
n <- d$n
theta <- d$theta
y <- rbinom(1e4, size = n, prob = theta)
mle <- y/d$n
bayes <- (a+y)/(a+b+n)
lower <- qbeta(.025, a+y, b+n-y)
upper <- qbeta(.975, a+y, b+n-y)
# Fix intervals when y=0
lower[y==0] <- 0
upper[y==0] <- qbeta(.95, a+0, b+n)
# Fix intervals when y=n
upper[y==n] <- 1
lower[y==n] <- qbeta(.05, a+n, b+0)
data.frame(coverage = mean( lower <= theta & theta <= upper))
}
sim_study <- expand.grid(n = 10^(0:3),
theta = seq(0,1,by=0.1)) %>%
group_by(n,theta) %>%
do(binomial_coverage_function_fixed(.))
Now plot the results
ggplot(sim_study, aes(x=theta, y=coverage)) +
geom_line() +
facet_wrap(~n) +
geom_hline(yintercept = 0.95, color = "red") +
ylim(0,1) +
theme_bw()
The plot here has been constructed to have y-axis limits set to 0 and 1 for comparison with the previous plot.
In this class, we will use a uniform prior for the probability. An alternative is the Jeffreys prior, which is \(Be(0.5,0.5)\). Run a simulation study to look at coverage for the Jeffreys prior.
Solution:
a <- b <- 0.5
sim_study <- expand.grid(n = 10^(0:3),
theta = seq(0,1,by=0.1)) %>%
group_by(n,theta) %>%
do(binomial_coverage_function_fixed(.))
ggplot(sim_study, aes(x=theta, y=coverage)) +
geom_line() +
facet_wrap(~n) +
geom_hline(yintercept = 0.95, color = "red") +
ylim(0,1) +
theme_bw()