Part IV: Advanced Topics | Chapter 1

Time Series Analysis

Modeling temporal dependence, spectral decomposition, and state space methods

Historical Context

Time series analysis has roots stretching back to the 18th century, when astronomers sought to extract periodic signals from noisy observations. Fourier's 1807 work on heat conduction introduced harmonic decomposition, which Schuster later applied to meteorological data through the periodogram in 1898. The autoregressive model was formalized by Yule in 1927 to study sunspot cycles, and Walker extended it to moving averages for monsoon prediction.

The modern era began with Box and Jenkins, whose 1970 monograph systematized the ARIMA methodology into an iterative identification-estimation-diagnostic framework that remains a cornerstone of applied time series analysis. Simultaneously, Kalman's 1960 recursive filtering algorithm revolutionized engineering and econometrics by providing optimal state estimation in linear dynamical systems. Spectral methods matured through the work of Wiener, Kolmogorov, and Blackman-Tukey, connecting the time and frequency domains via the spectral representation theorem.

1.1 Stationarity and Autocorrelation

A time series $\{X_t\}_{t \in \mathbb{Z}}$ is a collection of random variables indexed by time. The fundamental concept enabling tractable analysis is stationarity, which asserts that the statistical properties of the process do not change over time.

Definition: Strict Stationarity

A process $\{X_t\}$ is strictly stationary if for all$k, t_1, \ldots, t_k$ and $h$, the joint distribution of$(X_{t_1+h}, \ldots, X_{t_k+h})$ equals that of $(X_{t_1}, \ldots, X_{t_k})$.

Definition: Weak (Second-Order) Stationarity

A process is weakly stationary (or covariance stationary) if:

(i) $\mathbb{E}[X_t] = \mu$ is constant, (ii) $\text{Var}(X_t) < \infty$, (iii) $\text{Cov}(X_t, X_{t+h}) = \gamma(h)$ depends only on the lag $h$.

The autocovariance function (ACVF) is $\gamma(h) = \text{Cov}(X_t, X_{t+h})$, and the autocorrelation function (ACF) is:

$$\rho(h) = \frac{\gamma(h)}{\gamma(0)} = \frac{\text{Cov}(X_t, X_{t+h})}{\text{Var}(X_t)}$$

The partial autocorrelation function (PACF) at lag $h$ measures the correlation between $X_t$ and $X_{t+h}$ after removing the linear effect of $X_{t+1}, \ldots, X_{t+h-1}$. Formally, the PACF$\phi_{hh}$ is the last coefficient in the best linear predictor of$X_{t+h}$ from $X_t, \ldots, X_{t+h-1}$. The ACF and PACF together serve as the primary diagnostic tools for model identification: an AR($p$) process has PACF that cuts off after lag $p$, while an MA($q$) process has ACF that cuts off after lag $q$.

Key Property: Positive Semi-Definiteness

A function $\gamma: \mathbb{Z} \to \mathbb{R}$ is a valid autocovariance function if and only if it is even ($\gamma(h) = \gamma(-h)$) and positive semi-definite:$\sum_{i=1}^n \sum_{j=1}^n a_i a_j \gamma(t_i - t_j) \geq 0$ for all$n, a_1, \ldots, a_n \in \mathbb{R}$ and $t_1, \ldots, t_n \in \mathbb{Z}$.

1.2 Autoregressive, Moving Average, and ARMA Models

The building blocks of linear time series models are the autoregressive (AR) and moving average (MA) components.

Definition: AR(p) Process

An autoregressive process of order $p$ satisfies:

$$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} + \varepsilon_t$$

where $\varepsilon_t \sim \text{WN}(0, \sigma^2)$ is white noise. Using the backshift operator $B$, this is $\phi(B) X_t = \varepsilon_t$ where$\phi(z) = 1 - \phi_1 z - \cdots - \phi_p z^p$. The process is causal (stationary) if all roots of $\phi(z) = 0$ lie outside the unit circle.

Definition: MA(q) Process

A moving average process of order $q$ is:

$$X_t = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q} = \theta(B)\varepsilon_t$$

The MA($q$) process is always stationary. It is invertible if all roots of $\theta(z) = 0$ lie outside the unit circle, enabling a unique AR($\infty$) representation.

Definition: ARMA(p, q) Process

The general linear model combines both components:

$$\phi(B) X_t = \theta(B) \varepsilon_t$$

The ARMA($p,q$) is causal if $\phi(z) \neq 0$ for $|z| \leq 1$and invertible if $\theta(z) \neq 0$ for $|z| \leq 1$.

The Yule-Walker equations provide a direct link between the AR parameters and the autocovariance function. For an AR($p$) process:

$$\begin{pmatrix} \gamma(0) & \gamma(1) & \cdots & \gamma(p-1) \\ \gamma(1) & \gamma(0) & \cdots & \gamma(p-2) \\ \vdots & \vdots & \ddots & \vdots \\ \gamma(p-1) & \gamma(p-2) & \cdots & \gamma(0) \end{pmatrix} \begin{pmatrix} \phi_1 \\ \phi_2 \\ \vdots \\ \phi_p \end{pmatrix} = \begin{pmatrix} \gamma(1) \\ \gamma(2) \\ \vdots \\ \gamma(p) \end{pmatrix}$$

These equations can be solved efficiently via the Levinson-Durbin recursion in$O(p^2)$ operations, which simultaneously computes the PACF at all lags up to $p$.

1.3 ARIMA and Seasonal Models (Box-Jenkins)

Many real-world time series exhibit trends and non-stationarity. The ARIMA framework handles this by differencing. An ARIMA($p, d, q$) model applies$d$ differences before fitting an ARMA($p, q$):

$$\phi(B)(1 - B)^d X_t = \theta(B)\varepsilon_t$$

The Box-Jenkins methodology proceeds in three iterative stages:

1. Identification: Examine ACF/PACF of the (differenced) series to determine tentative orders $p$ and $q$. Unit root tests (ADF, KPSS) determine $d$.

2. Estimation: Estimate parameters via maximum likelihood or conditional least squares. The exact likelihood for Gaussian ARMA is computed via the innovations algorithm or Kalman filter.

3. Diagnostics: Check residuals for white noise using the Ljung-Box test: $Q = n(n+2) \sum_{k=1}^{m} \hat{\rho}_k^2 / (n-k)$, which is asymptotically $\chi^2(m - p - q)$.

For data with seasonal patterns of period $s$, the seasonal ARIMA (SARIMA) model is written as ARIMA$(p,d,q) \times (P,D,Q)_s$:

$$\phi(B)\Phi(B^s)(1-B)^d(1-B^s)^D X_t = \theta(B)\Theta(B^s)\varepsilon_t$$

Here $\Phi$ and $\Theta$ are the seasonal AR and MA polynomials. The seasonal differencing operator $(1 - B^s)^D$ removes seasonal non-stationarity. Information criteria (AIC, BIC) are commonly used for model selection among candidate orders.

1.4 Spectral Analysis

The frequency domain provides a complementary view of time series. The fundamental result is Herglotz's theorem: a function $\gamma(h)$ is a valid autocovariance function if and only if it can be represented as:

$$\gamma(h) = \int_{-\pi}^{\pi} e^{i\omega h} \, dF(\omega)$$

where $F$ is a bounded, non-decreasing function called the spectral distribution function. If $F$ is absolutely continuous, its density$f(\omega) = dF/d\omega$ is the power spectral density (PSD):

$$f(\omega) = \frac{1}{2\pi} \sum_{h=-\infty}^{\infty} \gamma(h) e^{-i\omega h}, \qquad \omega \in [-\pi, \pi]$$

The Periodogram

The sample analogue of the PSD is the periodogram:

$$I(\omega_j) = \frac{1}{n}\left|\sum_{t=1}^{n} X_t e^{-i\omega_j t}\right|^2$$

evaluated at Fourier frequencies $\omega_j = 2\pi j / n$. The periodogram is an asymptotically unbiased but inconsistent estimator of $f(\omega)$. Smoothing via Welch's method or multitaper estimates yields consistent spectral estimates.

For an ARMA($p,q$) process, the spectral density takes the rational form:

$$f(\omega) = \frac{\sigma^2}{2\pi} \cdot \frac{|\theta(e^{-i\omega})|^2}{|\phi(e^{-i\omega})|^2}$$

Peaks in the spectral density correspond to dominant periodicities in the data. The spectral density of white noise is flat at $\sigma^2 / (2\pi)$, reflecting equal power at all frequencies.

1.5 State Space Models and the Kalman Filter

State space models provide a unifying framework that encompasses ARMA models, structural time series models, and dynamic factor models. The linear Gaussian state space model is:

State Space Representation

$$\alpha_{t+1} = T_t \alpha_t + R_t \eta_t, \qquad \eta_t \sim N(0, Q_t) \quad \text{(state equation)}$$

$$y_t = Z_t \alpha_t + \varepsilon_t, \qquad \varepsilon_t \sim N(0, H_t) \quad \text{(observation equation)}$$

Here $\alpha_t$ is the unobserved state vector, $y_t$ is the observation, and $T_t, R_t, Z_t$ are system matrices.

The Kalman filter recursively computes the optimal (minimum mean squared error) estimate of the state given observations up to time $t$:

Kalman Filter Recursion

Prediction step:

$$\hat{\alpha}_{t|t-1} = T_t \hat{\alpha}_{t-1|t-1}, \qquad P_{t|t-1} = T_t P_{t-1|t-1} T_t' + R_t Q_t R_t'$$

Update step:

$$v_t = y_t - Z_t \hat{\alpha}_{t|t-1}, \qquad F_t = Z_t P_{t|t-1} Z_t' + H_t$$

$$K_t = P_{t|t-1} Z_t' F_t^{-1} \qquad \text{(Kalman gain)}$$

$$\hat{\alpha}_{t|t} = \hat{\alpha}_{t|t-1} + K_t v_t, \qquad P_{t|t} = (I - K_t Z_t) P_{t|t-1}$$

The Kalman filter also provides the prediction error decomposition of the log-likelihood: $\log L = -\frac{n}{2}\log(2\pi) - \frac{1}{2}\sum_{t=1}^n \left(\log|F_t| + v_t' F_t^{-1} v_t\right)$, enabling maximum likelihood estimation of unknown system parameters. The Kalman smoother extends the filter to compute$\hat{\alpha}_{t|n}$ using all observations, via a backward recursion.

1.6 Computational Lab

We simulate AR and MA processes, compute ACF/PACF, estimate the power spectral density, and implement a Kalman filter for local level tracking.

Time Series: AR/MA Simulation, Spectral Analysis, and Kalman Filter

Python

script.py154 lines

import numpy as np
np.random.seed(42)

# =============================================
# 1. Simulate AR(2) and MA(2) processes
# =============================================
n = 500
eps = np.random.normal(0, 1, n + 100)

# AR(2): X_t = 0.6 X_{t-1} - 0.3 X_{t-2} + eps_t
phi1, phi2 = 0.6, -0.3
x_ar = np.zeros(n + 100)
for t in range(2, n + 100):
    x_ar[t] = phi1 * x_ar[t-1] + phi2 * x_ar[t-2] + eps[t]
x_ar = x_ar[100:]

# MA(2): X_t = eps_t + 0.5 eps_{t-1} - 0.3 eps_{t-2}
theta1, theta2 = 0.5, -0.3
eps2 = np.random.normal(0, 1, n + 100)
x_ma = np.zeros(n + 100)
for t in range(2, n + 100):
    x_ma[t] = eps2[t] + theta1 * eps2[t-1] + theta2 * eps2[t-2]
x_ma = x_ma[100:]

# Compute sample ACF
def sample_acf(x, max_lag=20):
    n = len(x)
    xbar = np.mean(x)
    c0 = np.sum((x - xbar)**2) / n
    acf_vals = []
    for h in range(max_lag + 1):
        ch = np.sum((x[:n-h] - xbar) * (x[h:] - xbar)) / n
        acf_vals.append(ch / c0)
    return np.array(acf_vals)

# Compute sample PACF via Levinson-Durbin
def sample_pacf(x, max_lag=20):
    acf_vals = sample_acf(x, max_lag)
    pacf_vals = [1.0]
    for k in range(1, max_lag + 1):
        # Solve Yule-Walker for order k
        R = np.array([[acf_vals[abs(i-j)] for j in range(k)] for i in range(k)])
        r = np.array([acf_vals[i+1] for i in range(k)])
        try:
            phi = np.linalg.solve(R, r)
            pacf_vals.append(phi[-1])
        except np.linalg.LinAlgError:
            pacf_vals.append(0.0)
    return np.array(pacf_vals)

acf_ar = sample_acf(x_ar, 15)
pacf_ar = sample_pacf(x_ar, 15)
acf_ma = sample_acf(x_ma, 15)
pacf_ma = sample_pacf(x_ma, 15)

print("=== AR(2) Process: phi1=0.6, phi2=-0.3 ===")
print("Lag   ACF       PACF")
for h in range(1, 16):
    print("  %2d   %7.4f   %7.4f" % (h, acf_ar[h], pacf_ar[h]))

print("")
print("=== MA(2) Process: theta1=0.5, theta2=-0.3 ===")
print("Lag   ACF       PACF")
for h in range(1, 16):
    print("  %2d   %7.4f   %7.4f" % (h, acf_ma[h], pacf_ma[h]))

# =============================================
# 2. Spectral analysis via periodogram
# =============================================
print("")
print("=== Spectral Analysis of AR(2) ===")

# Compute periodogram
from numpy.fft import fft
n_fft = len(x_ar)
X_fft = fft(x_ar - np.mean(x_ar))
periodogram = (np.abs(X_fft[:n_fft//2+1])**2) / n_fft
freqs = np.arange(n_fft//2+1) / n_fft * 2 * np.pi

# Theoretical PSD for AR(2)
sigma2 = 1.0
psd_theo = []
for w in freqs:
    phi_eiw = 1 - phi1 * np.exp(-1j*w) - phi2 * np.exp(-2j*w)
    psd_theo.append(sigma2 / (2 * np.pi * np.abs(phi_eiw)**2))
psd_theo = np.array(psd_theo)

# Report at selected frequencies
print("Freq(rad)   Periodogram   Theoretical PSD")
idx_sel = np.linspace(1, len(freqs)-1, 12, dtype=int)
for i in idx_sel:
    print("  %5.3f      %8.4f       %8.4f" % (freqs[i], periodogram[i], psd_theo[i]))

peak_freq = freqs[np.argmax(psd_theo[1:])+1]
print("Peak theoretical PSD at frequency: %.4f rad (period ~ %.1f)" % (peak_freq, 2*np.pi/peak_freq if peak_freq > 0 else float('inf')))

# =============================================
# 3. Kalman filter for local level model
# =============================================
print("")
print("=== Kalman Filter: Local Level Model ===")

# Generate local level model data
# State: alpha_{t+1} = alpha_t + eta_t, eta ~ N(0, q)
# Obs:   y_t = alpha_t + eps_t,  eps ~ N(0, r)
n_kf = 200
q_true = 0.1   # state noise variance
r_true = 1.0   # observation noise variance
alpha_true = np.zeros(n_kf)
alpha_true[0] = 0.0
for t in range(1, n_kf):
    alpha_true[t] = alpha_true[t-1] + np.sqrt(q_true) * np.random.normal()
y_obs = alpha_true + np.sqrt(r_true) * np.random.normal(size=n_kf)

# Kalman filter
alpha_filt = np.zeros(n_kf)
P_filt = np.zeros(n_kf)
alpha_pred = np.zeros(n_kf)
P_pred = np.zeros(n_kf)

# Initialize
alpha_filt[0] = 0.0
P_filt[0] = 10.0

for t in range(1, n_kf):
    # Predict
    alpha_pred[t] = alpha_filt[t-1]
    P_pred[t] = P_filt[t-1] + q_true
    # Update
    v_t = y_obs[t] - alpha_pred[t]
    F_t = P_pred[t] + r_true
    K_t = P_pred[t] / F_t
    alpha_filt[t] = alpha_pred[t] + K_t * v_t
    P_filt[t] = (1 - K_t) * P_pred[t]

# Evaluate performance
mse_filter = np.mean((alpha_filt - alpha_true)**2)
mse_obs = np.mean((y_obs - alpha_true)**2)
avg_P = np.mean(P_filt[10:])

print("MSE of Kalman filter:   %.4f" % mse_filter)
print("MSE of raw observations: %.4f" % mse_obs)
print("Improvement ratio:       %.2fx" % (mse_obs / mse_filter))
print("Avg filter uncertainty:  %.4f" % avg_P)
print("Steady-state Kalman gain: %.4f" % (P_filt[-1] / (P_filt[-1] + r_true)))

# Show some filtered vs true values
print("")
print("  t    True     Obs      Filtered   |Err_filt|  |Err_obs|")
for t in [10, 50, 100, 150, 199]:
    ef = abs(alpha_filt[t] - alpha_true[t])
    eo = abs(y_obs[t] - alpha_true[t])
    print("  %3d  %7.3f  %7.3f  %7.3f    %6.4f     %6.4f" % (t, alpha_true[t], y_obs[t], alpha_filt[t], ef, eo))

Click Run to execute the Python code

Code will be executed with Python 3 on the server

1.7 Summary and Key Takeaways

Stationarity

Weak stationarity (constant mean, lag-dependent autocovariance) is the foundational assumption enabling consistent estimation of ACF and PACF, the primary model identification tools.

ARMA Models

AR($p$) models have PACF cutoff at lag $p$; MA($q$) models have ACF cutoff at lag $q$. ARMA combines both for parsimonious modeling of complex autocorrelation structures.

Box-Jenkins Methodology

The iterative identify-estimate-diagnose cycle, extended to SARIMA for seasonal data, remains the standard approach for applied time series modeling.

Spectral Analysis

The power spectral density decomposes variance by frequency. The periodogram is asymptotically unbiased but requires smoothing for consistency.

Kalman Filter

The Kalman filter is the optimal recursive estimator for linear Gaussian state space models, providing both filtered estimates and the prediction error decomposition of the likelihood.

← Model Selection Multivariate Analysis →

Share:X Reddit LinkedIn