9  The Bootstrap

9.1 Prerequisites

Answer the following questions to see if you can bypass this chapter. You can find the answers at the end of the chapter in Section 9.19.

  1. What is the primary problem the bootstrap is designed to solve?
  2. Given \(B\) bootstrap replications \(\hat\theta^*_1, \dots, \hat\theta^*_B\) of a statistic, how do you compute the bootstrap standard error?
  3. Why is Monte Carlo approximation used in practice rather than computing the exact bootstrap distribution over all \(n^n\) possible resamples?

9.2 Learning objectives

By the end of this chapter you should be able to:

  • Explain the plug-in principle that underlies the bootstrap.
  • Implement a nonparametric bootstrap in base R.
  • Compute percentile, basic, BCa, and studentised bootstrap confidence intervals.
  • Construct bootstrap p-values via null-enforcing resampling.
  • Apply case and residual bootstrap to regression coefficients.
  • Recognise when the bootstrap fails (extreme quantiles, time series, dependent data).
  • Use the boot package for standard cases.

9.3 Orientation

The bootstrap (Efron, 1979; Efron & Tibshirani, 1993) is the most useful general-purpose inference tool of the last 50 years. It works when analytic standard errors are impossible or wrong, and it fails in interesting, learnable ways when it does fail. Every biostatistician should be comfortable rolling their own bootstrap.

9.4 The statistician’s contribution

Before handing a dataset to a large language model with the instruction to ‘bootstrap the standard error’, four decisions require the statistician’s judgment. An LLM will execute a nonparametric iid bootstrap by default. Whether that is the appropriate bootstrap for the problem at hand is a determination the model cannot make without statistician input.

1. Is the bootstrap appropriate for this statistic at all? The bootstrap fails for extrema (the sample maximum or minimum), for non-smooth statistics (change-points, modes), for heavy-tailed populations with infinite variance, and for samples below about \(n = 20\). An LLM will produce a numerical answer in every case; in these cases the number will be misleading. Recognising which case you are in is the statistician’s job.

2. What is the dependence structure of the data? An iid bootstrap on time series, clustered data, or longitudinal repeated measures systematically underestimates variance. The confidence interval will be too narrow, sometimes dramatically so. Only you can detect the dependence in the data and specify a block bootstrap, a cluster bootstrap, or a parametric alternative that respects the structure.

3. Which confidence-interval method matches the statistic’s behaviour? The percentile method is transformation-respecting and easy to explain, but undercoverage is common in small samples. BCa is second-order accurate and preferred for publication. The studentised method has the best coverage but requires a bootstrap-within-bootstrap. An LLM typically defaults to the percentile method regardless of whether the bootstrap distribution is skewed; selecting a more appropriate method is the statistician’s responsibility.

4. Is the bootstrap distribution itself plausible? Always plot the histogram of the \(B\) replicates. If it is wildly skewed, truncated sharply at a boundary, or multimodal, either the bootstrap is failing for one of the reasons above or the sampling distribution is genuinely unusual. Both demand investigation; neither should be silently accepted as an answer.

In summary: the LLM provides an efficient implementation of the bootstrap. The statistician determines what gets implemented, on what data, and how to interpret the output. The remainder of this chapter provides the technical vocabulary needed to make those determinations.

9.5 The plug-in principle

The bootstrap rests on a simple and consequential idea. We do not know the true population distribution \(F\), but we do know the empirical distribution \(\hat F_n\), the distribution that places probability \(1/n\) on each observed data point. The plug-in principle says: to estimate any functional \(\theta = t(F)\), substitute \(\hat F_n\) for \(F\) and compute \(\hat\theta = t(\hat F_n)\). The sample mean, median, quantiles, variance, and correlation are all plug-in estimates.

For sampling variability, the plug-in goes one step further. The sampling distribution of \(\hat\theta - \theta\) under repeated draws from \(F\) is approximated by the distribution of \(\hat\theta^* - \hat\theta\) under repeated draws from \(\hat F_n\). The right-hand side is something we can compute by resampling. The left-hand side is what we actually want to know. The bootstrap is the computational substitution of one for the other.

The substitution is justified by the Glivenko-Cantelli theorem: \(\hat F_n\) converges uniformly to \(F\) as \(n\) grows. For smooth statistics, the higher-order accurate bootstrap intervals (BCa, studentised) achieve a coverage error of \(O(1/n)\), faster than the \(O(1/\sqrt n)\) rate of a central-limit-theorem interval. The plain percentile bootstrap is only first-order accurate (\(O(1/\sqrt n)\) coverage error), the same rate as the CLT interval; it is worth using primarily for its non-parametric applicability rather than for higher-order accuracy.
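
As a small concrete illustration (hypothetical numbers), the plug-in variance substitutes \(\hat F_n\) into the variance functional, which yields the \(1/n\) divisor rather than the \(1/(n-1)\) of the usual sample variance:

```r
# Plug-in estimates from the empirical distribution F_hat_n.
# Illustrative data only.
x <- c(5, 10, 15, 20, 25)
n <- length(x)

# Plug-in mean: identical to the sample mean.
mean(x)

# Plug-in variance: t(F) = E[(X - EX)^2] evaluated at F_hat_n,
# which divides by n, not n - 1 as var() does.
plug_in_var <- mean((x - mean(x))^2)
plug_in_var            # 50, equal to var(x) * (n - 1) / n
var(x) * (n - 1) / n   # same number
```

The two divisors agree as \(n\) grows, which is one face of the \(\sqrt{(n-1)/n}\) narrowing discussed later in the chapter.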

Q: What is the empirical distribution \(\hat F_n\) in the bootstrap context?

A: The discrete distribution that places probability \(1/n\) on each of the \(n\) observed data points in the original sample.

9.6 A simple nonparametric bootstrap

The nonparametric bootstrap is a dozen lines of base R:

# Original data
x <- c(5, 10, 15, 20, 25)
B <- 1000
boot_means <- numeric(B)
set.seed(47)
for (b in seq_len(B)) {
  resample       <- sample(x, size = length(x),
                           replace = TRUE)
  boot_means[b]  <- mean(resample)
}
sd(boot_means)   # bootstrap standard error

Three features of this loop define the nonparametric bootstrap:

  1. Same size. Each resample has the same length as the original sample.
  2. With replacement. Without replacement every resample would be a permutation of the original, with no new variability.
  3. Repeated \(B\) times. \(B\) is the number of Monte Carlo draws; 1000 is adequate for standard errors, 2000 or more for confidence intervals, 10,000 or more for extreme quantiles.

The vector boot_means is the bootstrap distribution of the sample mean. Its sample standard deviation is the bootstrap estimate of the standard error of \(\bar x\). Its quantiles give a percentile confidence interval.

Q: Why do we sample with replacement in the bootstrap?

A: Sampling with replacement creates variability between bootstrap samples. Without replacement, every resample would be a permutation of the original data, with no new information about sampling variability.

9.7 Bootstrap and sampling distributions

The bootstrap distribution and the sampling distribution are not the same thing; they differ in three ways.

First, centering. The sampling distribution is centered at the true parameter \(\theta\). The bootstrap distribution is centered at the observed estimate \(\hat\theta\). For this reason the bootstrap cannot correct a biased estimator: a biased \(\hat\theta\) produces a bootstrap distribution biased by the same amount in the same direction.

Second, support. The sampling distribution can in principle produce any value the true distribution can. The nonparametric bootstrap distribution is restricted to functions of the observed values. This shows up when estimating the maximum of a bounded distribution, where the bootstrap cannot produce values larger than the observed maximum.

Third, variability. The bootstrap distribution has slightly less spread than the sampling distribution, by a factor of roughly \(\sqrt{(n - 1)/n}\). This narrowing is negligible for large \(n\) but compounds with the other small-sample weaknesses of the percentile method discussed below.

The bootstrap does faithfully capture the shape of the sampling distribution: skewness, multimodality, and the boundary effects that arise near constrained parameters (such as a correlation coefficient near \(\pm 1\)). These shape diagnostics inform the choice of confidence-interval method.

Q: Can the bootstrap be used to improve our point estimate \(\hat\theta\)?

A: Generally, no. The bootstrap assesses the variability of \(\hat\theta\); it does not change the point estimate. The mean of the bootstrap distribution is approximately \(\hat\theta\) itself, and a biased \(\hat\theta\) produces a bootstrap distribution biased in the same direction.

9.8 Estimating standard errors

Let \(\hat\theta^*_1, \dots, \hat\theta^*_B\) denote the bootstrap replications and \(\bar\theta^* = B^{-1} \sum_b \hat\theta^*_b\) their mean. The bootstrap estimate of the standard error of \(\hat\theta\) is

\[ \widehat{\mathrm{SE}}_B(\hat\theta) = \sqrt{\frac{1}{B - 1} \sum_{b=1}^B (\hat\theta^*_b - \bar\theta^*)^2}. \]

In R, this is sd(boot_replicates). The principal value of the bootstrap SE is in cases where an analytic formula for \(\mathrm{SE}(\hat\theta)\) is unavailable or depends on assumptions the data do not satisfy: medians, trimmed means, correlation coefficients, and any custom statistic you can compute from a sample.

The Monte Carlo error of the bootstrap SE is approximately \(\widehat{\mathrm{SE}}_B / \sqrt{2B}\). With \(B = 1000\) this is roughly 2.2% of the bootstrap SE itself, which is negligible for most purposes.
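
A sketch of the computation for a statistic with no convenient analytic SE formula, here a 20% trimmed mean on illustrative right-skewed data:

```r
# Bootstrap SE of a 20% trimmed mean (illustrative data).
set.seed(47)
x <- rexp(50, rate = 1)   # right-skewed sample
B <- 1000

boot_reps <- replicate(B, {
  resample <- sample(x, length(x), replace = TRUE)
  mean(resample, trim = 0.2)
})

se_boot <- sd(boot_reps)   # bootstrap standard error
se_boot

# Approximate Monte Carlo error of the SE estimate itself
se_boot / sqrt(2 * B)
```

The same loop works unchanged for any statistic you can write as a function of a sample.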

Q: What statistical problems might become easier if we could repeatedly sample from the population?

A: We could empirically estimate standard errors, create confidence intervals without relying on parametric assumptions, and better understand sampling distributions for complex statistics. The bootstrap approximates this ideal using resampling from the observed data.

9.9 Confidence-interval flavours

Several bootstrap confidence-interval recipes exist, each trading simplicity for coverage accuracy. All assume \(B\) is large enough to estimate the required quantiles.

Percentile interval. The simplest: take the \(\alpha/2\) and \(1 - \alpha/2\) quantiles of the bootstrap distribution.

# Percentile CI by hand
quantile(boot_means, c(0.025, 0.975))

Transformation-respecting (a percentile interval on \(\log \hat\theta\) exponentiates to a percentile interval on \(\hat\theta\)), but first-order accurate only. Tends to undercover in small samples.

Basic interval. Reflects the bootstrap distribution around \(\hat\theta\):

\[ [2\hat\theta - q_{1 - \alpha/2},\ 2\hat\theta - q_{\alpha/2}], \]

where \(q_\alpha\) is the \(\alpha\) quantile of the bootstrap distribution. Corrects a subtle bias the percentile method inherits from centering at \(\hat\theta\), but loses the transformation-respecting property.
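
A sketch of the basic interval by hand, regenerating the Section 9.6 example so the snippet is self-contained:

```r
# Basic CI by hand for the mean of the Section 9.6 data.
x <- c(5, 10, 15, 20, 25)
set.seed(47)
boot_means <- replicate(1000,
                        mean(sample(x, length(x), replace = TRUE)))

theta_hat <- mean(x)
q <- quantile(boot_means, c(0.025, 0.975))
c(lower = 2 * theta_hat - q[[2]],   # 2*theta_hat - q_{1 - alpha/2}
  upper = 2 * theta_hat - q[[1]])   # 2*theta_hat - q_{alpha/2}
```

Note that the upper percentile quantile produces the lower basic endpoint and vice versa: the reflection swaps the tails.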

BCa interval. Bias-corrected and accelerated. Second-order accurate. Adjusts the percentile endpoints using a bias correction (estimated from the proportion of bootstrap replicates below \(\hat\theta\)) and an acceleration constant (estimated by jackknife). Available in boot::boot.ci(type = 'bca'). Preferred for publication-grade intervals.

Studentised interval. Bootstraps a pivot \((\hat\theta^* - \hat\theta) / \widehat{\mathrm{SE}}^*\), then inverts to an interval on \(\theta\). Second-order accurate and often has the best coverage, but requires a bootstrap-within-bootstrap computation to estimate the inner standard error.
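
A sketch of the double bootstrap for the mean (illustrative data; the inner loop of B_inner resamples supplies each replicate's own SE estimate, and the outer SE is taken analytically here for speed):

```r
# Studentised bootstrap CI for the mean, with an inner
# bootstrap estimating each replicate's standard error.
set.seed(47)
x <- rexp(30, rate = 1)
n <- length(x); B <- 2000; B_inner <- 50

theta_hat <- mean(x)
se_hat    <- sd(x) / sqrt(n)   # outer SE (analytic, for speed)

t_star <- replicate(B, {
  xs   <- sample(x, n, replace = TRUE)
  se_s <- sd(replicate(B_inner,
                       mean(sample(xs, n, replace = TRUE))))
  (mean(xs) - theta_hat) / se_s   # studentised pivot
})

q <- quantile(t_star, c(0.025, 0.975))
c(lower = theta_hat - q[[2]] * se_hat,
  upper = theta_hat - q[[1]] * se_hat)
```

The \(B \times B_{\text{inner}}\) cost (here 100,000 resamples) is the price of the improved coverage; for statistics with a cheap analytic SE per resample, the inner loop can be replaced by that formula.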

In practice: reach for the percentile interval first; switch to BCa if the sample is small or the statistic is skewed; use the studentised interval only when the cost of the double bootstrap is justified.

9.10 Hypothesis testing by resampling

A confidence interval and a hypothesis test answer overlapping questions about a parameter, and the bootstrap supplies both. Hypothesis testing requires one extra construction: resampling must occur under conditions in which the null hypothesis is true. This null enforcement step is the load-bearing distinction between a bootstrap confidence interval and a bootstrap test.

For a two-sample mean comparison, \(H_0\!: \mu_A = \mu_B\), the standard construction is pool and center. Concatenate the two samples after centering each at its own sample mean; the pooled vector has mean zero by construction, and resampling from it yields two new groups whose populations have identical means. Compute the observed test statistic on the original data, repeat the calculation on each bootstrap replicate drawn from the centered pool, and report the proportion of replicates whose absolute statistic meets or exceeds the observed.

# Pool-and-center bootstrap test of H0: mu_A = mu_B
a <- sleep$extra[sleep$group == 1]
b <- sleep$extra[sleep$group == 2]
obs_diff <- mean(a) - mean(b)

pooled <- c(a - mean(a), b - mean(b))
B <- 10000
set.seed(47)
null_diffs <- replicate(B, {
  a_star <- sample(pooled, length(a), replace = TRUE)
  b_star <- sample(pooled, length(b), replace = TRUE)
  mean(a_star) - mean(b_star)
})
mean(abs(null_diffs) >= abs(obs_diff))

The two-sided p-value is mean(abs(null_diffs) >= abs(obs_diff)). For a one-sided alternative, drop the absolute values and adjust the comparison direction. Monte Carlo error on the p-value is approximately \(\sqrt{p(1 - p)/B}\), around \(0.002\) near \(p = 0.05\) when \(B = 10{,}000\).

Two comparisons clarify what the bootstrap test is doing. Permutation tests also generate a null distribution by shuffling group labels, but without replacement. Permutation tests are exact under exchangeability and are preferable when the assumption is defensible. The bootstrap test is approximate but extends naturally to settings where exchangeability fails: paired data, regression coefficients, or any statistic with a bootstrap implementation. Classical t-tests make stronger distributional assumptions: a pool-and-center bootstrap parallels the equal-variance two-sample t-test, while t.test() defaults to Welch’s unequal-variance test. The methods produce different p-values not because of Monte Carlo error but because they implement different null hypotheses.

The pool-and-center construction generalises. Testing a correlation against zero is achieved by resampling the \(y\) values independently of the \(x\) values, breaking the pairing so that the resampled population has zero correlation. Testing a single regression coefficient against zero is achieved by bootstrapping the residuals of the reduced model and rebuilding the response under the null. The principle is constant: design a resampling scheme under which the null is true, then read off the tail probability of the observed statistic.
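
A sketch of the correlation case on the built-in faithful data (one way to enforce \(H_0\!: \rho = 0\); resampling x and y separately breaks their pairing):

```r
# Bootstrap test of H0: rho = 0 by resampling y independently
# of x, so the resampled population has zero correlation.
set.seed(47)
x <- faithful$waiting
y <- faithful$eruptions
obs_r <- cor(x, y)

B <- 10000
null_r <- replicate(B, {
  x_star <- sample(x, length(x), replace = TRUE)
  y_star <- sample(y, length(y), replace = TRUE)  # pairing broken
  cor(x_star, y_star)
})
mean(abs(null_r) >= abs(obs_r))   # two-sided p-value
```

For these strongly correlated variables the p-value is at the resolution floor of the simulation; a permutation of the \(y\) labels is the classical alternative construction.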

Q: Why must the bootstrap resampling step enforce the null hypothesis when conducting a test, while no such adjustment is needed for a confidence interval?

A: A confidence interval describes the variability of \(\hat\theta\) around the unknown population parameter; the bootstrap distribution centered at \(\hat\theta\) is exactly the reference distribution wanted. A hypothesis test asks whether the data are compatible with a specified null value \(\theta_0\), which requires the distribution of \(\hat\theta\) under \(\theta = \theta_0\). That in turn requires resampling from a population modified to satisfy the null.

9.11 Monte Carlo implementation

The theoretical bootstrap enumerates all \(n^n\) possible resamples with replacement, and for even modest \(n\) this is astronomical: \(n = 20\) produces \(20^{20} \approx 10^{26}\) resamples, more than there are stars in the observable universe. In practice we sample a manageable number \(B\) of resamples uniformly at random. This introduces a Monte Carlo error distinct from the statistical error of using a finite original sample.

The Monte Carlo error of a percentile confidence-interval endpoint is approximately \(\sqrt{\alpha(1 - \alpha)/B} \cdot \widehat{\mathrm{SE}}_B\). For a 95% interval with \(B = 1000\), this is about \(0.005 \cdot \widehat{\mathrm{SE}}_B\), an order of magnitude smaller than the statistical error for typical \(n\).

Rules of thumb: \(B = 200\) suffices for standard errors; \(B = 1000\) to \(2000\) for percentile or BCa confidence intervals; \(B = 10{,}000\) or more for extreme quantiles or p-values below \(0.01\). When in doubt, rerun with a different seed and confirm results are stable.
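
The rerun-with-a-different-seed check can be mechanised (a sketch on illustrative data; the helper boot_se is hypothetical):

```r
# Stability check: the same bootstrap under two seeds should
# agree to within Monte Carlo error (~ 1/sqrt(2B) relatively).
set.seed(47)
x <- rexp(100, rate = 1)

boot_se <- function(seed, B = 2000) {
  set.seed(seed)
  sd(replicate(B, mean(sample(x, length(x), replace = TRUE))))
}

se1 <- boot_se(1)
se2 <- boot_se(2)
c(se1 = se1, se2 = se2, relative_gap = abs(se1 - se2) / se1)
```

A relative gap much larger than \(1/\sqrt{2B}\) suggests \(B\) is too small for the statistic at hand.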

Q: What can happen if you use too few bootstrap samples (for example, \(B = 50\))?

A: Substantial Monte Carlo error. Standard-error estimates are unstable across seeds, and percentile confidence-interval endpoints are unreliable because a handful of bootstrap replicates anchor each tail quantile.

9.12 When the bootstrap fails

The bootstrap works well when the statistic is a smooth function of the data and the sample size is moderate to large. It breaks down in several recognisable patterns.

Extrema. The sample maximum of a bounded distribution cannot exceed the observed maximum in any bootstrap resample, so the bootstrap distribution has no mass above the observed max. Bootstrap CIs for extrema, modes, and other boundary statistics are unreliable.
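
The failure is easy to demonstrate (a sketch with uniform data):

```r
# The bootstrap maximum never exceeds the observed maximum.
set.seed(47)
x <- runif(50)
boot_max <- replicate(2000,
                      max(sample(x, length(x), replace = TRUE)))
max(boot_max) == max(x)    # TRUE: hard ceiling at the observed max
quantile(boot_max, 0.975)  # upper 'CI' endpoint is just max(x)
```

The true sampling distribution of the maximum of Uniform(0, 1) data concentrates above most observed maxima, so the interval systematically undercovers.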

Non-smooth statistics. Sample quantiles at extreme tails, modes, and change-points are not smooth functions of the empirical distribution; a small change in data can produce a large change in the estimate. The bootstrap distribution of a sample median, for instance, is discrete even for continuous data.

Small samples. For \(n < 20\) the empirical distribution is a poor estimate of the true distribution, especially in the tails. Parametric bootstrap (resampling from a fitted parametric model) or a normal approximation with bias correction may perform better.

Dependent data. An iid bootstrap on time series or clustered data destroys the dependence structure and underestimates variance. Use a block bootstrap for time series (resample contiguous blocks of length \(\ell\)) or a cluster bootstrap for hierarchical data (resample whole clusters, keeping within-cluster structure intact).
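
A minimal cluster bootstrap in base R (a sketch on simulated data with a strong cluster effect; the data frame d and its id column are invented for illustration):

```r
# Cluster bootstrap: resample whole clusters, keeping
# within-cluster structure intact. 10 clusters of 5.
set.seed(47)
d <- data.frame(id = rep(1:10, each = 5),
                y  = rnorm(50) + rep(rnorm(10, sd = 2), each = 5))

B <- 2000
boot_means <- replicate(B, {
  ids  <- sample(unique(d$id), replace = TRUE)       # resample clusters
  star <- do.call(rbind, lapply(ids, function(k) d[d$id == k, ]))
  mean(star$y)
})
sd(boot_means)                                  # cluster-bootstrap SE

# Naive iid bootstrap for comparison: typically far too small here
sd(replicate(B, mean(sample(d$y, nrow(d), replace = TRUE))))
```

With a between-cluster standard deviation twice the within-cluster one, the iid SE understates the cluster-bootstrap SE substantially.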

Heavy tails. When the population has infinite variance, the bootstrap distribution converges slowly or not at all. Sample means and regression coefficients from heavy-tailed data require trimming, robust statistics, or the parametric bootstrap with a heavy-tailed assumption.

Q: Why might bootstrap methods struggle with the sample median when the sample size is small (say, \(n = 5\))?

A: With small samples the bootstrap distribution of the median is discrete and limited to the observed values. This coarse approximation of the sampling distribution yields inaccurate standard errors and confidence intervals.

9.13 The boot package

For production use, the boot package (Canty and Ripley, based on Davison and Hinkley’s book) handles the resampling, parallelism, and confidence-interval calculations. The standard workflow has three steps.

library(boot)

# Step 1: define the statistic as a function of data
#         and indices.
median_stat <- function(data, indices) {
  median(data[indices])
}

# Step 2: run the bootstrap.
set.seed(47)
x <- c(10, 14, 18, 23, 27, 32, 38, 42, 52, 68)
b <- boot(data = x, statistic = median_stat, R = 2000)
print(b)

# Step 3: compute confidence intervals.
boot.ci(b, type = c('perc', 'basic', 'bca'))

The statistic function must accept (data, indices) and return the statistic computed on data[indices] (or data[indices, ] for a data frame). The boot() function handles the resampling via the indices, which is more memory-efficient than constructing each resample as a copy of the data. For stratified designs pass strata = group; for parallel execution set parallel = 'multicore' (Linux/macOS) or 'snow' (portable) with ncpus = parallel::detectCores() - 1.

For specialised bootstraps:

  • car::Boot() wraps boot() for regression models, handling case resampling, residual resampling, and wild bootstrap for heteroscedastic errors.
  • boot::tsboot() implements block bootstrap for time series.
  • rsample::bootstraps() integrates with the tidymodels ecosystem.

Q: What is the role of the indices parameter in the boot statistic function?

A: indices carries the positions of observations selected during each resample. Rather than constructing an actual bootstrap sample and passing it to the function, boot() passes only the indices; the function subsets the original data using data[indices]. This is more memory-efficient than copying the data for every replicate.

Q: In what scenarios would you prefer traditional methods over the bootstrap?

A: When sample sizes are very small, when the statistic has a known sampling distribution under assumptions that the data plausibly satisfy, or when computational resources are limited. Traditional methods also provide analytic insights (closed-form expressions, exact distributions) that numerical methods can obscure.

9.14 Bootstrapping regression coefficients

Linear regression provides a productive testbed for the bootstrap machinery. Two resampling strategies are standard, and a third is worth knowing for diagnostic-failure cases.

Case (paired) bootstrap. Resample whole rows of the data matrix with replacement, refit the model, and record the coefficient. This makes no assumption about the conditional distribution of \(y\) given \(x\) and remains valid under heteroscedasticity, because joint resampling of \((x, y)\) preserves whatever variance structure the data contain.

Residual bootstrap. Fit the model once, extract residuals \(e_i = y_i - \hat y_i\), resample residuals with replacement to obtain \(e_i^*\), and construct synthetic responses \(y_i^* = \hat y_i + e_i^*\). Refit the model on \((x_i, y_i^*)\). This strategy assumes residuals are exchangeable: their distribution does not depend on the predictors. When that assumption holds, residual bootstrap is more efficient than case bootstrap; when it fails, residual-bootstrap intervals are too narrow because they impose a constant-variance error structure the data lack.

Wild bootstrap. When residuals are clearly heteroscedastic but the conditional-mean structure is correct, set \(e_i^* = e_i v_i\) with \(v_i\) an independent random sign or a Mammen two-point variable. This preserves the conditional variance pattern while still generating a valid resampling distribution.
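
A sketch of the wild bootstrap for the mtcars slope, using random signs (Rademacher weights) as the simplest choice of \(v_i\):

```r
# Wild bootstrap for the slope of mpg ~ wt, with v_i = +/- 1
# so each residual keeps its own scale (heteroscedasticity
# pattern preserved).
set.seed(47)
fit  <- lm(mpg ~ wt, data = mtcars)
e    <- resid(fit)
yhat <- fitted(fit)
n    <- nrow(mtcars)

B <- 2000
wild_slopes <- replicate(B, {
  v      <- sample(c(-1, 1), n, replace = TRUE)
  y_star <- yhat + e * v
  coef(lm(y_star ~ mtcars$wt))[[2]]
})

sd(wild_slopes)                        # wild-bootstrap SE of the slope
quantile(wild_slopes, c(0.025, 0.975))
```

The Mammen two-point weights replace the random signs when matching the third moment of the residual distribution matters.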

A worked case bootstrap for the slope of wt predicting mpg in mtcars:

library(boot)
set.seed(47)

slope_stat <- function(data, indices) {
  fit <- lm(mpg ~ wt, data = data[indices, ])
  coef(fit)[['wt']]
}

b_lm <- boot(mtcars, slope_stat, R = 2000)
boot.ci(b_lm, type = c('perc', 'bca'))

# Compare with the classical normal-theory interval
confint(lm(mpg ~ wt, data = mtcars))['wt', ]

The percentile and BCa intervals from the case bootstrap are slightly wider than the classical interval from confint(). The discrepancy is informative: a residuals-versus-fitted plot for this model shows variance increasing with fitted value, the mild heteroscedasticity that the classical interval ignores by assuming constant error variance. The bootstrap absorbs this through joint resampling and produces a more defensible interval.

The choice among strategies follows from the assumptions:

  • Use the case bootstrap when residual diagnostics suggest heteroscedasticity, when the model may be misspecified, or as a robust default in the absence of contrary information.
  • Use the residual bootstrap when residuals are clean and the conditional mean is well captured. The narrower intervals it produces are warranted only when the assumption is defensible.
  • Use the wild bootstrap when heteroscedasticity is severe but the mean structure is sound, particularly in small samples where case bootstrap may resample the same influential observation multiple times.

The same machinery extends to generalised linear models (case bootstrap on glm objects), random-effect models (cluster bootstrap that respects the grouping structure), and robust regression. The car::Boot() function introduced in the previous section handles several of these cases automatically.

Q: Suppose a residuals-versus-fitted plot for a regression model shows residual spread increasing with the fitted value. Which regression bootstrap would you choose, and why?

A: The case (paired) bootstrap, because it resamples whole \((x, y)\) rows and therefore preserves any heteroscedastic variance structure in the original data. The residual bootstrap imposes a single residual distribution that does not depend on \(x\) and would systematically underestimate variability in this setting.

9.15 Collaborating with an LLM on the bootstrap

Section 9.4 identified the four decisions that require the statistician’s judgment. This section provides practice exercises. Each prompt below is adversarial: it is designed to expose a specific LLM failure mode. The model’s response should be treated as a hypothesis requiring verification, not as a result to be accepted at face value.

9.15.1 Correlation and the Fisher-z transformation

Prompt. ‘Bootstrap a 95% confidence interval for the correlation between waiting and eruptions in the faithful dataset.’

What to watch for. A correlation is bounded in \([-1, 1]\), so its sampling distribution is asymmetric and compressed near the boundaries. A default nonparametric bootstrap on the raw correlation is inefficient there: normal and basic intervals can produce endpoints outside \([-1, 1]\), and the percentile interval tends to undercover. The textbook remedy is to bootstrap on the Fisher-z transformation \(\tfrac{1}{2}\log\tfrac{1+r}{1-r}\), where the distribution is nearly symmetric, and then back-transform.

Verification. Inspect the code. Does it wrap the correlation in atanh() before computing quantiles and tanh() after? If not, compare interval widths against cor.test(). A noticeable discrepancy indicates the LLM skipped the transformation; ask it a follow-up on how its chosen interval behaves when \(r\) is close to \(\pm 1\).
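
A sketch of the transformed bootstrap for the faithful correlation, with quantiles taken on the atanh scale and mapped back by tanh:

```r
# Percentile CI on the Fisher-z scale, back-transformed to r.
set.seed(47)
n <- nrow(faithful)
B <- 2000

z_star <- replicate(B, {
  i <- sample(n, replace = TRUE)   # resample row indices
  atanh(cor(faithful$waiting[i], faithful$eruptions[i]))
})

tanh(quantile(z_star, c(0.025, 0.975)))  # endpoints stay in (-1, 1)
```

Because tanh is monotone, this is exactly the transformation-respecting property of the percentile method put to work.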

9.15.2 The bootstrap maximum

Prompt. ‘Write R code to compute a 95% percentile confidence interval for the maximum of a sample drawn from runif(50, 0, 1).’

What to watch for. The bootstrap fails for extrema. The bootstrap maximum cannot exceed the observed sample maximum, so the upper CI endpoint is always the observed max itself, a hard ceiling that is not a property of the true sampling distribution.

Verification. Run the code. Check whether the upper endpoint equals the sample maximum. Then ask the LLM: ‘Is this interval trustworthy? What does standard bootstrap theory say about statistics defined as extrema of the sample?’ See whether it recovers the failure mode on its own or insists the interval is valid.

9.15.3 Time-series mean

Prompt. Supply a ts object of monthly clinical biomarker observations over three years and ask: ‘Bootstrap the mean and a 95% confidence interval.’

What to watch for. Autocorrelated data violate the iid assumption. An iid bootstrap destroys the dependence structure and underestimates variance, typically producing an interval much narrower than the correct one.

Verification. Does the code call boot::tsboot() or a block-bootstrap function? Or does it call plain boot::boot()? If the latter, the answer is wrong. Compute acf(x) and confirm non-trivial autocorrelation, then redo the bootstrap with tsboot(x, ..., l = round(length(x)^(1/3)), sim = 'fixed'). Compare interval widths.

9.15.4 Principle in use

These three prompts exercise the four statistician’s contributions from Section 9.4. The bootstrap maximum and the time-series mean both test decision 1 (is the bootstrap appropriate for this statistic or this data?). The time-series mean additionally tests decision 2 (dependence structure). The correlation prompt tests decisions 3 and 4 (confidence-interval method and bootstrap-distribution shape). None of these verifications requires advanced skill; all require that you ask the question the LLM will not ask for you.

9.16 Exercises

  1. Using only base R, write bootstrap_ci(x, stat, R = 1000, conf = 0.95) that returns a percentile confidence interval for an arbitrary statistic of a numeric vector x.
  2. Apply your function to compute a 95% CI for the median of a right-skewed sample (e.g., rexp(50, 1)). Compare to the interval from boot::boot.ci(type = 'perc').
  3. Compute coverage of the percentile CI in a simulation with 2000 replicates. Does it cover at the nominal 95%? Explain any under-coverage.

9.17 Further reading

9.18 Practice test

The following multiple-choice questions exercise the chapter’s content. Attempt each question before expanding the answer.

9.18.1 Question 1

What is the primary purpose of the bootstrap method?

    A. To replace traditional statistical analysis completely
    B. To assess the accuracy of parameter estimates when closed-form standard errors are difficult to derive
    C. To create larger datasets from small samples
    D. To identify outliers in statistical datasets

B. The bootstrap provides approximate standard errors, bias, and confidence intervals by resampling.

9.18.2 Question 2

How does the bootstrap estimate the standard error of a statistic?

    A. By calculating the exact formula for every possible case
    B. By approximating the sampling distribution through repeatedly resampling with replacement from the observed data
    C. By assuming the data follows a normal distribution
    D. By comparing the statistic to previously established benchmarks

B. The bootstrap treats the empirical distribution of the data as a proxy for the true population distribution and resamples from it with replacement.

9.18.3 Question 3

Given \(B\) bootstrap replications of a statistic, how is the bootstrap standard error calculated?

    A. By taking the square root of the sample variance divided by \(n\)
    B. By computing the standard deviation of the \(B\) bootstrap replications
    C. By using a predefined formula based on the Central Limit Theorem
    D. By applying maximum likelihood estimation to the data

B. The bootstrap SE is the sample SD of the \(B\) bootstrap replicates.

9.18.4 Question 4

Why is Monte Carlo approximation typically used in bootstrap estimation rather than computing the exact bootstrap distribution?

    A. Because the exact bootstrap distribution requires evaluating all \(n^n\) possible resamples, which is computationally infeasible
    B. Because the Central Limit Theorem does not apply to bootstrap estimates
    C. Because the original dataset is always too small to be reliable
    D. Because exact enumeration produces biased results

A. Even for small \(n\), \(n^n\) explodes quickly. Monte Carlo samples a practical number \(B\) of resamples, controlling accuracy by choice of \(B\).

9.18.5 Question 5

How is the bootstrap estimate of bias for a statistic \(\hat\theta\) defined?

    A. The difference between the mean of the bootstrap replications and the original estimate \(\hat\theta\)
    B. The standard deviation of the bootstrap replications
    C. The skewness of the bootstrap distribution
    D. The difference between the median and mean of the bootstrap replications

A. Bootstrap bias is \(\bar\theta^* - \hat\theta\), where \(\bar\theta^*\) is the mean of bootstrap replicates.

9.19 Prerequisites answers

  1. The bootstrap assesses the sampling distribution of a statistic (its standard error, bias, and confidence interval) when a closed-form expression is unavailable, complicated, or depends on distributional assumptions the data do not satisfy.
  2. The bootstrap standard error is the sample standard deviation of the \(B\) bootstrap replications: \(\widehat{\mathrm{SE}}_B(\hat\theta) = \sqrt{\tfrac{1}{B-1} \sum_{b=1}^B (\hat\theta^*_b - \bar\theta^*)^2}\).
  3. The exact bootstrap distribution requires evaluating the statistic on all \(n^n\) possible resamples (with replacement from \(n\) observations), which is computationally infeasible for any meaningful \(n\). Monte Carlo approximation draws a manageable number \(B\) of resamples at random.