25.2 Bootstrapping

How can we estimate the variability in a statistic? Let’s say we want a probability interval for the mean of some data.

One option is to rely on normal theory and write,

\[\operatorname{var}(\bar{x}) = \frac{1}{n}\operatorname{var}(x),\] then use the appropriate normal quantile to get the "confidence interval" you want.
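A minimal sketch of the normal-theory interval, assuming some skewed gamma data standing in for the data in the text (the shape, scale, sample size, and seed are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(42)
# Assumed stand-in for the "rather skewed gamma" data discussed in the text.
x = rng.gamma(shape=2.0, scale=3.0, size=50)

n = len(x)
xbar = x.mean()
# Normal theory: var(xbar) = var(x) / n, so the standard error is sd / sqrt(n).
se = x.std(ddof=1) / np.sqrt(n)
z = 1.96  # normal quantile for a 95% interval
ci = (xbar - z * se, xbar + z * se)
```

The question below is exactly whether this `z`-based interval can be trusted when the underlying data are this skewed.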

The accuracy of this estimate depends on the extent to which the data are normal. We know the data come from a rather skewed gamma distribution; how much does that skewness affect the interval? How could we find out?

What if we had some large number of samples, say \(m\), of the mean, each computed from \(n\) observations drawn from this population? This is an awful idea in practice, but bear with me.

We can visualize this:
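The thought experiment can be sketched as follows: draw \(m\) fresh samples of size \(n\) from the population and look at the spread of the resulting means (the gamma parameters, \(m\), \(n\), and seed are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 1000, 50
# Draw m independent samples of size n from the (assumed) gamma population
# and compute the mean of each; the spread of these m means is exactly the
# sampling variability we are trying to estimate.
means = rng.gamma(shape=2.0, scale=3.0, size=(m, n)).mean(axis=1)

print(means.std(ddof=1))           # empirical standard error of the mean
print(np.sqrt(2.0 * 3.0**2 / n))   # theoretical SE: sqrt(var(x) / n)
```

A histogram of `means` is the picture referred to above: the empirical sampling distribution of \(\bar{x}\).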

Obviously, we can’t actually draw a thousand fresh samples to estimate the mean’s variability, but we can do something close by resampling, with replacement, from our original data. This is called bootstrapping.
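A sketch of the bootstrap for the mean, again using an assumed gamma sample in place of the original data:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.gamma(shape=2.0, scale=3.0, size=50)  # assumed stand-in data

B = 2000
# Resample the observed data with replacement, at the original sample size,
# B times, and compute the mean of each resample. The spread of these
# bootstrap means estimates the variability of the sample mean.
boot_means = np.array([
    rng.choice(x, size=len(x), replace=True).mean() for _ in range(B)
])

boot_se = boot_means.std(ddof=1)
normal_se = x.std(ddof=1) / np.sqrt(len(x))
```

For the mean, `boot_se` and `normal_se` come out close, which is the comparison made in the next paragraph.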

We see that this strange resampling technique works well for the mean, at least as well as the normal approximation. Its true strength lies in estimating the variability of more exotic statistics. For example, what is the variability in the width of the 95% probability interval for our gamma data?
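This question is hard to answer with normal theory but easy with the bootstrap: treat the interval width itself as the statistic. As one concrete reading (an assumption on my part), take the width of the normal-theory 95% interval for the mean:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.gamma(shape=2.0, scale=3.0, size=50)  # assumed stand-in data

def ci_width(sample):
    # Width of the normal-theory 95% interval for the mean of `sample`:
    # 2 * 1.96 * standard error.
    se = sample.std(ddof=1) / np.sqrt(len(sample))
    return 2 * 1.96 * se

B = 2000
# Bootstrap the width: recompute it on each resample of the data.
widths = np.array([
    ci_width(rng.choice(x, size=len(x), replace=True)) for _ in range(B)
])

print(widths.std(ddof=1))  # bootstrap estimate of the width's variability
```

Nothing about `ci_width` is special; any function of the sample could go in its place.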

We can do this with any function at all!
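The pattern generalizes into a small helper (the function name `bootstrap` and its signature are hypothetical, not from the text): pass in the data and any statistic, and get back its bootstrap distribution.

```python
import numpy as np

def bootstrap(data, statistic, B=2000, rng=None):
    """Bootstrap distribution of `statistic` applied to `data`.

    Resamples `data` with replacement B times, at the original sample
    size, and evaluates `statistic` on each resample.
    """
    if rng is None:
        rng = np.random.default_rng()
    data = np.asarray(data)
    n = len(data)
    return np.array([
        statistic(rng.choice(data, size=n, replace=True)) for _ in range(B)
    ])

# Any statistic works: median, interquartile range, trimmed mean, ...
rng = np.random.default_rng(3)
x = rng.gamma(shape=2.0, scale=3.0, size=50)  # assumed stand-in data
med = bootstrap(x, np.median, rng=rng)
iqr = bootstrap(x, lambda s: np.percentile(s, 75) - np.percentile(s, 25), rng=rng)
```

The standard deviation of `med` or `iqr` then estimates the variability of that statistic, with no distributional theory required.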