Nonparametric Methods
Distribution-free inference, resampling methods, and flexible estimation
Historical Context
Nonparametric statistics arose from dissatisfaction with the strong distributional assumptions required by classical parametric methods. Frank Wilcoxon introduced his signed-rank test in 1945, and Henry Mann and Donald Whitney developed the rank-sum test independently in 1947, providing distribution-free alternatives to the t-test. These rank-based methods drew on the insight that the ranks of observations carry substantial information about location differences while being invariant to the underlying distribution.
Kernel density estimation was formalized by Murray Rosenblatt in 1956 and Emanuel Parzen in 1962, providing a smooth nonparametric alternative to histograms. The bootstrap, introduced by Bradley Efron in 1979, revolutionized statistical inference by showing that resampling from the data itself could approximate the sampling distribution of virtually any statistic. Permutation tests, rooted in Fisher's exact test from the 1930s, were made practical by modern computing. Together, these methods provide a flexible toolkit that makes minimal assumptions about the data-generating process.
3.1 Order Statistics and Rank Tests
Given a sample $X_1, \ldots, X_n$, the order statistics $X_{(1)} \leq X_{(2)} \leq \cdots \leq X_{(n)}$ form the sorted sample. The rank of $X_i$ is $R_i = \#\{j : X_j \leq X_i\}$. Under the null hypothesis of exchangeability, all $n!$ permutations of the ranks are equally likely, which provides exact null distributions for rank-based tests.
Wilcoxon Signed-Rank Test
For paired differences $D_i = X_i - Y_i$, we test $H_0$: the distribution of $D_i$ is symmetric about zero. Compute:
$$W^+ = \sum_{i : D_i > 0} R_i^*,$$
where $R_i^*$ is the rank of $|D_i|$ among $|D_1|, \ldots, |D_n|$. Under $H_0$, $\mathbb{E}[W^+] = n(n+1)/4$ and $\text{Var}(W^+) = n(n+1)(2n+1)/24$.
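As a concrete illustration, $W^+$ and its normal approximation can be computed directly. This is a minimal sketch, not a standard API: the function name is our own, zero differences are simply dropped, and ties in $|D_i|$ are not given averaged ranks.

```python
import numpy as np

def wilcoxon_signed_rank(x, y):
    """W+ and its normal-approximation z-score for paired samples."""
    d = np.asarray(x, float) - np.asarray(y, float)
    d = d[d != 0]                     # drop zero differences (common convention)
    n = len(d)
    ranks = np.empty(n)
    ranks[np.abs(d).argsort()] = np.arange(1, n + 1)  # ranks of |D_i|; ties not averaged
    w_plus = ranks[d > 0].sum()       # sum ranks of positive differences
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    return w_plus, (w_plus - mean) / np.sqrt(var)
```

For example, paired data with differences $(1, -2, 3, -4, 5)$ give $W^+ = 1 + 3 + 5 = 9$ against a null mean of $5 \cdot 6 / 4 = 7.5$.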
Mann-Whitney U Test (Wilcoxon Rank-Sum)
For two independent samples of sizes $m$ and $n$, the test statistic is based on the sum of ranks of the first sample in the combined ranking:
$$U = \sum_{i=1}^{m} R_i - \frac{m(m+1)}{2},$$
where $R_i$ is the rank of $X_i$ in the pooled sample of all $m + n$ observations.
$U$ counts the number of pairs $(X_i, Y_j)$ with $X_i > Y_j$. Under $H_0$, $\mathbb{E}[U] = mn/2$. Against logistic location-shift alternatives the Mann-Whitney test is the locally most powerful rank test; it has asymptotic relative efficiency $3/\pi \approx 0.955$ relative to the t-test under normality, and can be far more efficient under heavy-tailed distributions.
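The pair-counting definition of $U$ translates directly into code. A minimal sketch (our own function name; no tie correction in the variance):

```python
import numpy as np

def mann_whitney_u(x, y):
    """U statistic (pairs with X_i > Y_j) and normal-approximation z-score."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    u = np.sum(x[:, None] > y[None, :])   # count pairs with X_i > Y_j
    m, n = len(x), len(y)
    mean = m * n / 2
    var = m * n * (m + n + 1) / 12        # null variance, no tie correction
    return u, (u - mean) / np.sqrt(var)
```

With $x = (3, 4, 5)$ and $y = (1, 2)$, every one of the $mn = 6$ pairs has $X_i > Y_j$, so $U = 6$ against a null mean of $3$.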
3.2 Kernel Density Estimation
The kernel density estimator (KDE) provides a smooth, nonparametric estimate of a probability density function from a random sample.
Definition: Kernel Density Estimator
Given data $X_1, \ldots, X_n$, the KDE with kernel $K$ and bandwidth $h > 0$ is:
$$\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right),$$
where $K$ is a non-negative function integrating to 1 (typically the Gaussian kernel $K(u) = (2\pi)^{-1/2}e^{-u^2/2}$).
The bandwidth $h$ controls the bias-variance tradeoff. Asymptotically, the mean integrated squared error (MISE) decomposes as:
$$\text{AMISE}(h) = \frac{h^4}{4}\,\mu_2(K)^2 R(f'') + \frac{R(K)}{nh},$$
where the first term is integrated squared bias and the second is integrated variance.
The asymptotically optimal bandwidth minimizing this expression is:
$$h_{\text{opt}} = \left(\frac{R(K)}{\mu_2(K)^2\, R(f'')}\right)^{1/5} n^{-1/5},$$
where $R(g) = \int g^2$ and $\mu_2(K) = \int u^2 K(u) \, du$. Silverman's rule of thumb uses a Gaussian reference: $h = 1.06 \hat{\sigma} n^{-1/5}$. Cross-validation methods (least-squares or likelihood) provide data-driven bandwidth selection without reference distribution assumptions.
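A Gaussian-kernel KDE with Silverman's rule takes only a few lines. A self-contained sketch (function names are illustrative):

```python
import numpy as np

def silverman_bandwidth(data):
    """Gaussian reference rule: h = 1.06 * sigma_hat * n^(-1/5)."""
    data = np.asarray(data, float)
    return 1.06 * data.std(ddof=1) * len(data) ** (-0.2)

def kde(x_grid, data, h):
    """Gaussian-kernel estimate f_hat(x) = (1/nh) * sum K((x - X_i)/h)."""
    x_grid, data = np.asarray(x_grid, float), np.asarray(data, float)
    u = (x_grid[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))
```

Because each kernel integrates to 1, the estimate is itself a density: a Riemann sum of `kde(grid, data, h)` over a wide grid returns a value very close to 1.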
3.3 Bootstrap Methods
The bootstrap principle approximates the sampling distribution of a statistic $T_n = t(X_1, \ldots, X_n)$ by resampling from the empirical distribution $\hat{F}_n$.
The Bootstrap Algorithm
1. Draw $X_1^*, \ldots, X_n^*$ i.i.d. from $\hat{F}_n$ (sample with replacement).
2. Compute $T_n^* = t(X_1^*, \ldots, X_n^*)$.
3. Repeat $B$ times to obtain $T_n^{*(1)}, \ldots, T_n^{*(B)}$. The empirical distribution of $T_n^{*(b)} - T_n$ approximates the distribution of $T_n - \theta$.
The percentile method uses quantiles of $T_n^{*(b)}$ directly as confidence limits. The basic bootstrap reflects those quantiles about the observed statistic, yielding the interval $(2T_n - q^*_{1-\alpha/2},\ 2T_n - q^*_{\alpha/2})$. The most refined approach is the BCa (bias-corrected and accelerated) method:
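The percentile and basic intervals can be sketched together, since both are read off the same resampled statistics (a minimal illustration; the function name and defaults are our own):

```python
import numpy as np

def bootstrap_ci(data, stat, B=2000, alpha=0.05, seed=0):
    """Percentile and basic bootstrap intervals for stat(data)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    t_obs = stat(data)
    # resample with replacement B times and recompute the statistic
    t_star = np.array([stat(rng.choice(data, size=len(data), replace=True))
                       for _ in range(B)])
    q_lo, q_hi = np.quantile(t_star, [alpha / 2, 1 - alpha / 2])
    percentile = (q_lo, q_hi)
    basic = (2 * t_obs - q_hi, 2 * t_obs - q_lo)  # reflect quantiles about t_obs
    return percentile, basic
```

For symmetric bootstrap distributions the two intervals nearly coincide; they diverge when the resampled statistics are skewed.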
BCa Confidence Interval
The BCa interval adjusts the percentile boundaries using a bias correction $\hat{z}_0$ and an acceleration constant $\hat{a}$:
$$\alpha_1 = \Phi\!\left(\hat{z}_0 + \frac{\hat{z}_0 + z_{\alpha/2}}{1 - \hat{a}(\hat{z}_0 + z_{\alpha/2})}\right), \qquad \alpha_2 = \Phi\!\left(\hat{z}_0 + \frac{\hat{z}_0 + z_{1-\alpha/2}}{1 - \hat{a}(\hat{z}_0 + z_{1-\alpha/2})}\right),$$
and the interval is $(q^*_{\alpha_1}, q^*_{\alpha_2})$, where $\hat{z}_0 = \Phi^{-1}(\#\{T_n^{*(b)} < T_n\}/B)$ corrects for median bias, and $\hat{a} = \sum_i \ell_i^3 / (6(\sum_i \ell_i^2)^{3/2})$ uses jackknife influence values $\ell_i$ to correct for skewness. BCa intervals are second-order accurate and transformation-respecting.
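The BCa recipe can be sketched end-to-end. This is an illustrative implementation, not a library routine: the normal quantile is inverted by bisection to keep the sketch dependency-free, and the influence values are taken as $\ell_i = \bar{T}_{(\cdot)} - T_{(i)}$ from leave-one-out jackknife replicates.

```python
import numpy as np
from math import erf, sqrt

def norm_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def norm_ppf(p, lo=-10.0, hi=10.0):
    """Invert the standard normal CDF by bisection (adequate for a sketch)."""
    for _ in range(80):
        mid = (lo + hi) / 2.0
        if norm_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def bca_interval(data, stat, B=2000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    t_obs = stat(data)
    t_star = np.array([stat(rng.choice(data, n, replace=True)) for _ in range(B)])
    # bias correction: z0 from the fraction of resamples below the observed statistic
    z0 = norm_ppf((t_star < t_obs).mean())
    # acceleration: jackknife influence values l_i = mean(jack) - jack_i
    jack = np.array([stat(np.delete(data, i)) for i in range(n)])
    l = jack.mean() - jack
    a = (l ** 3).sum() / (6.0 * (l ** 2).sum() ** 1.5)
    # adjusted percentile levels alpha_1, alpha_2
    adj = [norm_cdf(z0 + (z0 + norm_ppf(q)) / (1 - a * (z0 + norm_ppf(q))))
           for q in (alpha / 2, 1 - alpha / 2)]
    return tuple(np.quantile(t_star, adj))
```

When $\hat{z}_0 \approx 0$ and $\hat{a} \approx 0$ (little bias, little skewness), the adjusted levels reduce to $\alpha/2$ and $1 - \alpha/2$ and BCa collapses to the plain percentile interval.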
3.4 Permutation Tests
Permutation tests derive exact (or approximate via Monte Carlo) p-values by exploiting the exchangeability of observations under the null hypothesis. Unlike rank tests, they can use any test statistic.
Permutation Test Framework
Given two samples $\mathbf{x} = (x_1, \ldots, x_m)$ and $\mathbf{y} = (y_1, \ldots, y_n)$, we test $H_0$: the two samples come from the same distribution.
1. Compute the observed statistic $T_{\text{obs}} = T(\mathbf{x}, \mathbf{y})$ (e.g., difference in means).
2. Pool all $N = m + n$ observations and consider all $\binom{N}{m}$ ways to partition them into groups of size $m$ and $n$.
3. The p-value is $p = P(T \geq T_{\text{obs}} \mid H_0) = \#\{T_\pi \geq T_{\text{obs}}\} / \binom{N}{m}$.
When $\binom{N}{m}$ is too large for exact enumeration, a Monte Carlo approximation draws $B$ random permutations and estimates $\hat{p} = (\#\{T_\pi^{(b)} \geq T_{\text{obs}}\} + 1) / (B + 1)$. The $+1$ in numerator and denominator ensures the p-value is never zero and counts the observed arrangement as one of the permutations. Permutation tests control the Type I error exactly at level $\alpha$ under exchangeability, without any distributional assumptions. They generalize naturally to multivariate settings, correlation tests, and complex experimental designs.
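The Monte Carlo version of the framework above fits in a short function (an illustrative sketch; the one-sided default statistic and the function name are our own choices):

```python
import numpy as np

def permutation_test(x, y, stat=lambda a, b: a.mean() - b.mean(), B=9999, seed=0):
    """One-sided Monte Carlo permutation p-value for H0: same distribution."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, float), np.asarray(y, float)
    pooled = np.concatenate([x, y])
    m = len(x)
    t_obs = stat(x, y)
    count = 0
    for _ in range(B):
        perm = rng.permutation(pooled)          # random relabeling of the pool
        if stat(perm[:m], perm[m:]) >= t_obs:
            count += 1
    return (count + 1) / (B + 1)                # add-one rule: p-value never zero
```

Any statistic can be plugged in via `stat`, e.g. a difference in medians or a trimmed-mean difference, without changing the inferential logic.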
3.5 Nonparametric Regression
Nonparametric regression estimates the function $m(x) = \mathbb{E}[Y \mid X = x]$ without imposing a parametric form. The Nadaraya-Watson kernel regression estimator is:
$$\hat{m}_h(x) = \frac{\sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right) Y_i}{\sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right)}.$$
This is a weighted average of $Y_i$ values, with weights proportional to the kernel evaluated at the distance from $x$ to $X_i$. Local polynomial regression generalizes this by fitting a polynomial in a neighborhood of each prediction point, reducing boundary bias.
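The weighted-average form of the estimator is essentially its implementation. A minimal Gaussian-kernel sketch (function name is illustrative):

```python
import numpy as np

def nadaraya_watson(x_grid, X, Y, h):
    """Gaussian-kernel Nadaraya-Watson estimate of E[Y | X = x] on x_grid."""
    x_grid, X, Y = (np.asarray(v, float) for v in (x_grid, X, Y))
    w = np.exp(-0.5 * ((x_grid[:, None] - X[None, :]) / h) ** 2)  # kernel weights
    return (w * Y[None, :]).sum(axis=1) / w.sum(axis=1)           # weighted average
```

On a symmetric design with a linear response, the symmetric weights reproduce the true value at an interior point exactly, which makes a convenient sanity check.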
Regression Splines
A cubic spline with knots $\xi_1 < \cdots < \xi_K$ is a piecewise cubic polynomial that is continuous with continuous first and second derivatives at each knot. The truncated power basis representation is:
$$s(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \sum_{k=1}^{K} \theta_k (x - \xi_k)_+^3,$$
where $(x - \xi_k)_+ = \max(0, x - \xi_k)$. The smoothing spline minimizes $\sum_{i=1}^n (Y_i - s(X_i))^2 + \lambda \int s''(x)^2 \, dx$, where $\lambda \geq 0$ controls the smoothness. The solution is a natural cubic spline with knots at all data points, and the effective degrees of freedom $\text{df}(\lambda) = \text{tr}(\mathbf{S}_\lambda)$ decreases from $n$ to 2 as $\lambda$ increases from 0 to $\infty$.
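The truncated power basis turns regression-spline fitting into ordinary least squares. A sketch of that construction (function names are our own; smoothing splines would add a penalty term not shown here):

```python
import numpy as np

def truncated_power_basis(x, knots):
    """Design matrix with columns 1, x, x^2, x^3, (x - xi_k)_+^3."""
    x = np.asarray(x, float)
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.maximum(0.0, x - k) ** 3 for k in knots]
    return np.column_stack(cols)

def fit_cubic_spline(x, y, knots):
    """Least-squares regression spline in the truncated power basis."""
    beta, *_ = np.linalg.lstsq(truncated_power_basis(x, knots), y, rcond=None)
    return lambda x_new: truncated_power_basis(x_new, knots) @ beta
```

Since the basis contains all cubics, a global cubic polynomial such as $y = x^3 - x$ is reproduced exactly regardless of knot placement.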
3.6 Computational Lab
We implement kernel density estimation with multiple bandwidths, bootstrap confidence intervals, a permutation test for two samples, and kernel regression.
Nonparametric Methods: KDE, Bootstrap, Permutation Tests, and Kernel Regression
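The lab itself runs interactively; a condensed, self-contained sketch covering the same four pieces might look as follows (all simulation settings, bandwidths, and sample sizes are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(42)

# ---- 1. KDE with three bandwidths (Gaussian kernel) on a bimodal sample ----
data = np.concatenate([rng.normal(-2, 0.5, 150), rng.normal(2, 1.0, 150)])
grid = np.linspace(-12, 12, 1201)

def kde(grid, data, h):
    u = (grid[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

h0 = 1.06 * data.std(ddof=1) * len(data) ** (-0.2)     # Silverman's rule
integrals = []
for h in (h0 / 4, h0, 3 * h0):                         # under-, reference-, over-smoothed
    f = kde(grid, data, h)
    integrals.append(f.sum() * (grid[1] - grid[0]))
    print(f"h = {h:5.2f}: estimate integrates to {integrals[-1]:.3f}")

# ---- 2. Bootstrap percentile CI for the median of skewed data ----
x = rng.exponential(scale=2.0, size=100)
meds = np.array([np.median(rng.choice(x, len(x))) for _ in range(2000)])
lo, hi = np.quantile(meds, [0.025, 0.975])
print(f"sample median = {np.median(x):.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")

# ---- 3. Monte Carlo permutation test for a mean difference ----
a, b = rng.normal(0.0, 1.0, 30), rng.normal(1.2, 1.0, 30)
t_obs = abs(a.mean() - b.mean())
pool, m = np.concatenate([a, b]), len(a)
count = 0
for _ in range(4999):
    perm = rng.permutation(pool)
    if abs(perm[:m].mean() - perm[m:].mean()) >= t_obs:
        count += 1
p_val = (count + 1) / (4999 + 1)
print(f"two-sided permutation p-value = {p_val:.4f}")

# ---- 4. Nadaraya-Watson kernel regression on noisy sin(x) ----
X = np.sort(rng.uniform(0, 2 * np.pi, 200))
Y = np.sin(X) + rng.normal(0, 0.3, 200)
w = np.exp(-0.5 * ((X[:, None] - X[None, :]) / 0.3) ** 2)
m_hat = (w * Y[None, :]).sum(axis=1) / w.sum(axis=1)
rmse = np.sqrt(np.mean((m_hat - np.sin(X)) ** 2))
print(f"kernel regression RMSE against sin(x) = {rmse:.3f}")
```

Each piece prints a one-line diagnostic: the KDE integrals should all be near 1, the bootstrap interval should bracket the sample median, the permutation p-value should be small for the shifted groups, and the regression RMSE should be well below the noise level's visual scale.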
3.7 Summary and Key Takeaways
Rank Tests
The Wilcoxon and Mann-Whitney tests provide distribution-free inference for location differences, with nearly the efficiency of parametric tests under normality and superior efficiency under heavy tails.
Kernel Density Estimation
KDE smoothly estimates densities, with the bandwidth controlling the bias-variance tradeoff. The optimal MISE rate is $O(n^{-4/5})$, attained by bandwidths of order $n^{-1/5}$ such as Silverman's rule or cross-validation selectors.
Bootstrap
Resampling from the empirical distribution approximates the sampling distribution of any statistic. BCa intervals provide second-order accurate, transformation-respecting confidence intervals.
Permutation Tests
Exact p-values under exchangeability, using any test statistic. Monte Carlo approximation makes them practical for any sample size.
Nonparametric Regression
Kernel regression and splines estimate the conditional mean without parametric assumptions, with smoothing parameters controlling complexity.