D The Theory Behind the Bootstrap Method

We explain the bootstrap method as used to construct an interval estimate for \(\mu\), but the explanation generalizes readily to confidence intervals for other parameters. We let the letter \(F\) represent the population distribution. We can think of this distribution as the set of possible values that can be observed, together with the probability or chance that those values appear in a sample taken from this population. Recall that a histogram can be thought of as a visual representation of this distribution. Moreover, imagine that \(\mu\) is the parameter we wish to estimate. Note that nothing here is special to \(\mu\): the same treatment applies to any other parameter or characteristic of this population.

We then take a sample of size \(n\) to estimate \(\mu\), and we construct an empirical distribution based on this sample by assigning probability \(1/n\) to each value in the sample. We call this empirical distribution \(F_n\). We treat this empirical distribution as if it were an estimate of the population distribution. For example, the population mean of this empirical distribution is precisely the average of all the values in the sample, the sample mean, but we call it \(\mu_n\) to emphasize that we are treating this distribution as a population.
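To make this concrete, here is a minimal sketch in base R (the sample values are made up for illustration). The mean of \(F_n\), a weighted average with weight \(1/n\) on each observed value, is exactly the sample mean:

```r
# A hypothetical sample of size n (values made up for illustration)
x <- c(4.2, 5.1, 3.8, 6.0, 4.9)
n <- length(x)

# F_n puts probability 1/n on each observed value, so the mean of F_n
# is sum(x * 1/n), which is exactly the sample mean
mu_n <- sum(x * (1 / n))
all.equal(mu_n, mean(x))  # TRUE
```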

The bootstrap consists of using the empirical distribution, and nothing else, to construct estimates, confidence intervals, and so on. We are not introducing additional assumptions about the population distribution or about the structure of the sample; we are simply taking advantage of the sample we have collected.

So now \(F_n\) is considered the population distribution, and as described in Section 8.2 we obtain a bootstrap sample by drawing, with replacement, a sample of size \(n\) from this empirical distribution. This bootstrap sample is in turn treated as an empirical distribution, and we call it \(F^*_n\).
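In base R, drawing one bootstrap sample from \(F_n\) amounts to resampling the observed values with replacement (a sketch continuing with the hypothetical x and n from above):

```r
set.seed(76)  # any seed works; fixed here only for reproducibility
# Sampling n values from x with replacement is one draw of size n from F_n;
# the resulting values define the bootstrap empirical distribution F*_n
x_star <- sample(x, size = n, replace = TRUE)
x_star
```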

Let’s recap. We want to study the population distribution \(F\). We take a sample of size \(n\) from this distribution and use the corresponding empirical distribution, \(F_n\), as an estimate of \(F\). We then let \(F_n\) play the role of the population distribution, take a bootstrap sample from \(F_n\) (also of size \(n\)), and use the corresponding empirical distribution, \(F^*_n\), as an estimate of \(F_n\).

So, at the first stage we do not know anything about \(F\) other than the sample we have taken. At the second stage, we know everything about \(F_n\), and we can study how \(F^*_n\) behaves with respect to \(F_n\). The idea of the bootstrap is to translate whatever we learn from the relationship between \(F^*_n\) and \(F_n\) into the possible relationship between \(F_n\) and \(F\).
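One schematic way to write this plug-in idea (an informal statement, not a theorem) is:

\[
\text{distribution of } (\mu_n - \mu) \text{ under } F \;\approx\; \text{distribution of } (\mu^*_n - \mu_n) \text{ under } F_n,
\]

where the right-hand side can actually be computed, since \(F_n\) is completely known.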

In principle, we could obtain every possible bootstrap sample \(F^*_n\) from \(F_n\), but the number of such samples is extremely large, as discussed in Section 8.2. Instead, in practice only a few thousand bootstrap samples are used.
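To get a sense of how large this number is (a quick back-of-the-envelope calculation): ignoring the order of the values, there are \(\binom{2n-1}{n}\) distinct bootstrap samples of size \(n\) from \(n\) distinct values.

```r
# Number of distinct (unordered) bootstrap samples of size n
# drawn from n distinct values: choose(2n - 1, n)
n <- 50
choose(2 * n - 1, n)  # about 5e28; far too many to enumerate
```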

Let’s translate these ideas to parameters and estimators. We are interested in estimating \(\mu\) from \(F\). We have constructed \(\mu_n\) from \(F_n\); that is, we know the mean of our empirical distribution, which is just the sample mean of the sample taken. Now we can obtain the distribution of \(\mu^*_n\), that is, the distribution of the sample means from all the bootstrap samples, study how much variation they have in order to approximate the standard error of \(\mu_n\), and use this information to construct confidence intervals, perform hypothesis tests, and so on.
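Putting the pieces together, here is a minimal, self-contained sketch in base R (the sample values, the 1000 resamples, and the 95% level are all illustrative choices):

```r
set.seed(76)
# Hypothetical sample (same made-up values as above)
x <- c(4.2, 5.1, 3.8, 6.0, 4.9)
n <- length(x)

# The distribution of mu*_n: the mean of each of 1000 bootstrap samples
mu_star <- replicate(1000, mean(sample(x, size = n, replace = TRUE)))

# The spread of the mu*_n values approximates the standard error of mu_n
se_boot <- sd(mu_star)
se_boot

# A 95% percentile bootstrap confidence interval for mu
quantile(mu_star, probs = c(0.025, 0.975))
```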

Hall (1992) provided the theoretical justification for bootstrap confidence intervals using Edgeworth expansions. A description of Edgeworth expansions is beyond the scope of this book, but in simple terms you can think of them as approximations to the distributions of parameter estimates that, in certain contexts, are better than the approximations provided by the Central Limit Theorem.