10.4 Ocean Data Analysis
Modern oceanography generates massive datasets in formats like netCDF and HDF5, served via OPeNDAP. Analysis techniques span classical statistics (EOF/PCA, spectral analysis, optimal interpolation), time series methods, and increasingly machine learning—all essential for extracting physical insight from observations, reanalyses, and model output.
EOF / PCA Analysis
Empirical Orthogonal Functions (EOFs) decompose a space-time dataset into orthogonal spatial patterns and their temporal coefficients (Principal Components). Given a data matrix $\mathbf{X}$ (time × space), the anomaly covariance matrix is:
$$\mathbf{C} = \frac{1}{n-1} \mathbf{X}'^T \mathbf{X}', \quad \mathbf{C} \mathbf{e}_k = \lambda_k \mathbf{e}_k$$
$\mathbf{X}' = \mathbf{X} - \bar{\mathbf{X}}$ (anomalies), $\mathbf{e}_k$ = EOF pattern, $\lambda_k$ = eigenvalue (variance explained)
The principal component time series is: $\text{PC}_k(t) = \mathbf{X}'(t) \cdot \mathbf{e}_k$. The fraction of total variance explained by mode $k$ is:
$$f_k = \frac{\lambda_k}{\sum_j \lambda_j} \times 100\%$$
EOF1 of tropical Pacific SST captures ENSO (~50% of variance)
Spectral & Wavelet Analysis
Fourier spectral analysis identifies periodicities in ocean time series (tidal harmonics, seasonal cycles, ENSO). The discrete Fourier transform of a time series $x(t_n)$ is:
$$\hat{X}(f_k) = \sum_{n=0}^{N-1} x(t_n) \, e^{-2\pi i f_k t_n}, \quad S(f_k) = \frac{2\Delta t}{N}|\hat{X}(f_k)|^2$$
$S(f_k)$ = power spectral density; peaks indicate dominant periodicities
For non-stationary signals (e.g., ENSO whose period and amplitude vary over time), wavelet analysis provides time-frequency localization. The continuous wavelet transform using a Morlet mother wavelet $\psi_0(\eta) = \pi^{-1/4} e^{i\omega_0 \eta} e^{-\eta^2/2}$ is:
$$W(s, \tau) = \frac{1}{\sqrt{s}} \sum_{n=0}^{N-1} x(t_n) \, \psi^*\!\left(\frac{t_n - \tau}{s}\right)$$
$s$ = scale (related to period), $\tau$ = time shift
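The discrete transform above can be evaluated directly in numpy (a slow but transparent double loop; the toy sinusoid and the scale values are illustrative, not from the text):

```python
import numpy as np

def morlet(eta, omega0=6.0):
    """Morlet mother wavelet psi_0(eta) = pi^{-1/4} exp(i*omega0*eta) exp(-eta^2/2)."""
    return np.pi**(-0.25) * np.exp(1j * omega0 * eta) * np.exp(-eta**2 / 2.0)

def cwt(x, t, scales):
    """W(s, tau) = s^{-1/2} * sum_n x(t_n) psi*((t_n - tau)/s), by direct summation."""
    W = np.empty((len(scales), len(t)), dtype=complex)
    for i, s in enumerate(scales):
        for j, tau in enumerate(t):
            W[i, j] = np.sum(x * np.conj(morlet((t - tau) / s))) / np.sqrt(s)
    return W

t = np.arange(256.0)                        # unit sampling interval
x = np.sin(2 * np.pi * t / 16.0)            # pure oscillation, period 16
power = np.abs(cwt(x, t, np.array([2.0, 4.0, 8.0, 16.0])))**2

# For omega0 = 6 the equivalent Fourier period is ~1.03*s, so the
# scale-16 row should carry the most power away from the edges
print(np.argmax(power[:, 64:192].mean(axis=1)))   # -> 3 (scale 16)
```

For real applications, established implementations (e.g., PyWavelets) handle edge effects (the cone of influence) and scale selection more carefully.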
Optimal Interpolation (Objective Analysis)
Optimal interpolation (OI) maps irregularly-spaced ocean observations onto a regular grid by minimizing the expected mean-squared error. The analysis at grid point $\mathbf{x}_0$ is:
$$\hat{\phi}(\mathbf{x}_0) = \bar{\phi} + \mathbf{c}^T (\mathbf{C} + \epsilon^2 \mathbf{I})^{-1} (\mathbf{d} - \bar{\phi})$$
$\mathbf{c}$ = covariance between grid point and observations, $\mathbf{C}$ = observation-observation covariance matrix, $\epsilon^2$ = noise variance
The background covariance is typically modeled as a Gaussian function of distance: $C(r) = \sigma^2 \exp(-r^2 / (2L^2))$, where $L$ is the decorrelation length scale (~100–300 km for mesoscale SST). The analysis error variance is:
$$\sigma_a^2 = \sigma^2 - \mathbf{c}^T (\mathbf{C} + \epsilon^2 \mathbf{I})^{-1} \mathbf{c}$$
Error is reduced where observations are dense and nearby
Derivation: EOF/PCA from the Covariance Matrix
Step 1: Form the Anomaly Matrix
Given a data matrix $\mathbf{X}$ of dimensions $n \times p$ ($n$ time steps, $p$ spatial locations), remove the temporal mean at each location to form the anomaly matrix:
$$\mathbf{X}' = \mathbf{X} - \mathbf{1}\bar{\mathbf{x}}^T, \quad \bar{x}_j = \frac{1}{n}\sum_{i=1}^n X_{ij}$$
Step 2: Compute the Covariance Matrix
The $p \times p$ spatial covariance matrix is:
$$\mathbf{C} = \frac{1}{n-1}\mathbf{X}'^T\mathbf{X}'$$
$C_{jk}$ gives the covariance between locations $j$ and $k$. The matrix is symmetric and positive semi-definite.
Step 3: Eigenvalue Problem
Find the eigenvectors $\mathbf{e}_k$ (EOF patterns) and eigenvalues $\lambda_k$ of $\mathbf{C}$:
$$\mathbf{C}\mathbf{e}_k = \lambda_k \mathbf{e}_k, \quad \mathbf{e}_k^T \mathbf{e}_l = \delta_{kl}$$
The EOFs are orthonormal: each represents an independent spatial pattern of variability.
Step 4: Optimality Property
EOF1 is the unit vector that maximizes the projected variance. Using the method of Lagrange multipliers with constraint $\|\mathbf{e}\| = 1$:
$$\max_{\mathbf{e}} \; \mathbf{e}^T \mathbf{C} \mathbf{e} \quad \text{s.t.} \quad \mathbf{e}^T\mathbf{e} = 1$$
$$\nabla_{\mathbf{e}}\left[\mathbf{e}^T\mathbf{C}\mathbf{e} - \lambda(\mathbf{e}^T\mathbf{e} - 1)\right] = 2\mathbf{C}\mathbf{e} - 2\lambda\mathbf{e} = 0 \quad \Rightarrow \quad \mathbf{C}\mathbf{e} = \lambda\mathbf{e}$$
Step 5: Principal Component Time Series
Project the anomaly data onto EOF pattern $k$ to obtain the $k$-th principal component (PC) time series:
$$\text{PC}_k(t) = \mathbf{X}'(t,:) \cdot \mathbf{e}_k = \sum_{j=1}^{p} X'_{tj} \, e_{k,j}$$
The PCs are uncorrelated: $\text{Cov}(\text{PC}_k, \text{PC}_l) = \lambda_k \delta_{kl}$. The variance of PC$_k$ equals $\lambda_k$.
Step 6: Variance Explained and Truncation
The fraction of total variance captured by mode $k$ is:
$$f_k = \frac{\lambda_k}{\sum_{j=1}^{p}\lambda_j} = \frac{\lambda_k}{\text{tr}(\mathbf{C})}$$
The data can be reconstructed from $m \ll p$ modes: $\mathbf{X}' \approx \sum_{k=1}^{m} \text{PC}_k \, \mathbf{e}_k^T$. The North et al. (1982) rule of thumb tests whether eigenvalues are well separated: the sampling uncertainty of an eigenvalue is $\delta\lambda_k \approx \lambda_k \sqrt{2/n}$, and neighboring modes whose eigenvalues differ by less than $\delta\lambda_k$ are effectively degenerate, so their individual EOF patterns should not be physically interpreted.
Step 7: SVD Equivalence (Practical Computation)
For large datasets, the SVD of the anomaly matrix is more efficient than eigendecomposition of $\mathbf{C}$:
$$\mathbf{X}' = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^T, \quad \mathbf{e}_k = \mathbf{V}_{:,k}, \quad \text{PC}_k = \mathbf{U}_{:,k}\sigma_k, \quad \lambda_k = \frac{\sigma_k^2}{n-1}$$
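This equivalence is easy to verify numerically; the check below uses random data, so it exercises only the algebra (eigendecomposition vs. SVD), not ocean physics:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 30                       # time steps x spatial points
X = rng.standard_normal((n, p))
Xa = X - X.mean(axis=0)              # anomaly matrix

# Route 1: eigendecomposition of the spatial covariance matrix
C = Xa.T @ Xa / (n - 1)
lam, E = np.linalg.eigh(C)           # ascending order
lam, E = lam[::-1], E[:, ::-1]       # reorder descending

# Route 2: SVD of the anomaly matrix
U, s, Vt = np.linalg.svd(Xa, full_matrices=False)

# lambda_k = sigma_k^2 / (n-1); EOFs agree up to sign
assert np.allclose(lam, s**2 / (n - 1))
assert np.allclose(np.abs(E[:, 0]), np.abs(Vt[0]))

# PC_k = U[:,k] * sigma_k equals the projection of anomalies on EOF k
pc1_svd = U[:, 0] * s[0]
pc1_proj = Xa @ Vt[0]
assert np.allclose(pc1_svd, pc1_proj)
print(pc1_svd.var(ddof=1), lam[0])   # PC1 variance equals lambda_1
```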
Derivation: Spectral Analysis & the Periodogram
Step 1: Discrete Fourier Transform
Given a uniformly sampled time series $x_n = x(n\Delta t)$ for $n = 0, 1, \ldots, N-1$, the discrete Fourier transform at frequency $f_k = k/(N\Delta t)$ is:
$$\hat{X}(f_k) = \sum_{n=0}^{N-1} x_n \, e^{-2\pi i k n / N}$$
Step 2: Parseval's Theorem (Energy Conservation)
The total energy is preserved between time and frequency domains:
$$\sum_{n=0}^{N-1} |x_n|^2 = \frac{1}{N}\sum_{k=0}^{N-1}|\hat{X}(f_k)|^2$$
Step 3: Define the Periodogram
The periodogram is the sample estimate of the power spectral density (PSD). It distributes the total variance across frequency bins:
$$S(f_k) = \frac{2\Delta t}{N}|\hat{X}(f_k)|^2 = \frac{2\Delta t}{N}\left|\sum_{n=0}^{N-1}x_n \, e^{-2\pi i f_k n\Delta t}\right|^2$$
The factor of 2 accounts for folding negative frequencies onto positive ones (one-sided spectrum). Units are $[\text{unit}]^2/\text{Hz}$.
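A minimal numpy check of this definition, using synthetic white noise: the one-sided periodogram, with the DC and Nyquist bins left undoubled, integrates to the sample variance (Parseval):

```python
import numpy as np

rng = np.random.default_rng(1)
N, dt = 1024, 1.0
x = rng.standard_normal(N)
x = x - x.mean()

Xh = np.fft.rfft(x)                     # one-sided DFT, k = 0 .. N/2
f = np.fft.rfftfreq(N, dt)
S = 2.0 * dt / N * np.abs(Xh)**2        # S(f_k) = (2*dt/N) |X_k|^2
S[0] /= 2.0                             # DC bin has no negative-frequency twin
S[-1] /= 2.0                            # neither does the Nyquist bin (N even)

df = 1.0 / (N * dt)                     # Rayleigh frequency resolution
# Parseval: the one-sided PSD integrates to the sample variance
print(np.sum(S) * df, x.var())          # the two numbers agree
```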
Step 4: Frequency Resolution and Nyquist Limit
The fundamental frequency resolution is $\Delta f = 1/(N\Delta t)$ (the Rayleigh resolution). The maximum resolvable frequency (Nyquist) is:
$$f_{\text{Nyquist}} = \frac{1}{2\Delta t}$$
For monthly ocean data with $\Delta t = 1$ month: $f_{\text{Ny}} = 0.5$ cycle/month, and $N = 300$ months gives $\Delta f = 1/300$ cycle/month.
Step 5: Statistical Properties and Smoothing
The raw periodogram is an inconsistent estimator: its variance does not decrease with more data. Each periodogram ordinate follows a $\chi^2_2$ distribution. Smoothing (Welch's method, Bartlett averaging, or multitaper) reduces variance:
$$\hat{S}_{\text{smooth}}(f) = \frac{1}{M}\sum_{m=1}^{M} S_m(f), \quad \text{d.o.f.} = 2M$$
Confidence intervals for the smoothed PSD: $\frac{\nu \hat{S}(f)}{\chi^2_{\nu,\alpha/2}} \leq S(f) \leq \frac{\nu \hat{S}(f)}{\chi^2_{\nu,1-\alpha/2}}$ where $\nu = 2M$ degrees of freedom.
Step 6: Red Noise Background and Significance Testing
Ocean time series typically have red spectra (more variance at low frequencies). The AR(1) red noise null spectrum for a process with lag-1 autocorrelation $r_1$ is:
$$S_{\text{red}}(f) = \frac{S_0 (1 - r_1^2)}{1 - 2r_1\cos(2\pi f\Delta t) + r_1^2}$$
Spectral peaks exceeding the 95% confidence level above this background are considered statistically significant periodicities.
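A sketch of the red-noise background (AR(1) parameters chosen for illustration): averaging many periodograms of a simulated AR(1) process recovers the theoretical spectrum above; with unit innovation variance the mean spectral level is $S_0 = 2\Delta t/(1-r_1^2)$:

```python
import numpy as np

rng = np.random.default_rng(2)
N, dt, r1, M = 512, 1.0, 0.7, 400     # length, sampling, lag-1 corr, realizations

# Simulate M AR(1) realizations with unit innovation variance
E = rng.standard_normal((M, N))
X = np.empty((M, N))
X[:, 0] = E[:, 0] / np.sqrt(1.0 - r1**2)      # stationary initial condition
for t in range(1, N):
    X[:, t] = r1 * X[:, t - 1] + E[:, t]

# Average the one-sided periodograms (drop the DC-adjacent and Nyquist bins)
Xh = np.fft.rfft(X - X.mean(axis=1, keepdims=True), axis=1)
f = np.fft.rfftfreq(N, dt)[2:-1]
S_avg = (2.0 * dt / N * np.abs(Xh[:, 2:-1])**2).mean(axis=0)

# Theoretical AR(1) spectrum; mean level S0 = 2*dt*Var(x) = 2*dt/(1 - r1^2)
S0 = 2.0 * dt / (1.0 - r1**2)
S_red = S0 * (1.0 - r1**2) / (1.0 - 2.0*r1*np.cos(2*np.pi*f*dt) + r1**2)

assert np.allclose(S_avg, S_red, rtol=0.35)   # red background recovered
print(S_red[0] / S_red[-1])                   # low/high frequency power ratio
```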
Derivation: Objective Analysis / Optimal Interpolation
Step 1: Statement of the Problem
We have $m$ observations $d_i = \phi(\mathbf{x}_i) + \epsilon_i$ of a field $\phi$ at irregular locations, where $\epsilon_i$ is measurement noise with variance $\epsilon^2$. We seek the best linear unbiased estimate at an arbitrary point $\mathbf{x}_0$:
$$\hat{\phi}(\mathbf{x}_0) = \bar{\phi} + \sum_{i=1}^{m} w_i (d_i - \bar{\phi})$$
Step 2: Minimize the Expected Mean-Squared Error
Define the analysis error as $e = \hat{\phi}(\mathbf{x}_0) - \phi(\mathbf{x}_0)$. Minimize the expected squared error $\langle e^2 \rangle$ with respect to the weights $w_i$:
$$\frac{\partial}{\partial w_j}\left\langle\left[\sum_i w_i(d_i - \bar{\phi}) - (\phi_0 - \bar{\phi})\right]^2\right\rangle = 0$$
Step 3: Expand Using Covariances
Define the signal covariance between two points: $C(\mathbf{x}_i, \mathbf{x}_j) = \langle\phi'(\mathbf{x}_i)\phi'(\mathbf{x}_j)\rangle$. The observation covariance matrix includes noise: $[\mathbf{C}_{obs}]_{ij} = C(\mathbf{x}_i, \mathbf{x}_j) + \epsilon^2\delta_{ij}$. Taking the derivative yields:
$$\sum_{i=1}^{m}\left[C(\mathbf{x}_j, \mathbf{x}_i) + \epsilon^2\delta_{ji}\right]w_i = C(\mathbf{x}_j, \mathbf{x}_0) \quad \forall j$$
Step 4: Matrix Form of the Solution
In matrix notation, $(\mathbf{C} + \epsilon^2\mathbf{I})\mathbf{w} = \mathbf{c}$, giving:
$$\mathbf{w} = (\mathbf{C} + \epsilon^2\mathbf{I})^{-1}\mathbf{c}$$
$$\hat{\phi}(\mathbf{x}_0) = \bar{\phi} + \mathbf{c}^T(\mathbf{C} + \epsilon^2\mathbf{I})^{-1}(\mathbf{d} - \bar{\phi})$$
Step 5: Analysis Error Variance
Substituting the optimal weights back into the error expression gives the minimum achievable error variance:
$$\sigma_a^2(\mathbf{x}_0) = \sigma^2 - \mathbf{c}^T(\mathbf{C} + \epsilon^2\mathbf{I})^{-1}\mathbf{c}$$
When $\mathbf{x}_0$ coincides with an observation, the error reduces to $\sigma_a^2 \to \epsilon^2 \sigma^2/(\sigma^2 + \epsilon^2)$. Far from any observation, $\mathbf{c} \to 0$ and $\sigma_a^2 \to \sigma^2$ (background variance).
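Both limits can be checked numerically with a Gaussian covariance model (the $\sigma^2$, $L$, $\epsilon^2$ values and observation positions below are illustrative):

```python
import numpy as np

sigma2, L, eps2 = 1.0, 200.0, 0.25       # signal variance, length (km), noise

def cov(r):
    # Gaussian covariance C(r) = sigma^2 exp(-r^2 / (2 L^2))
    return sigma2 * np.exp(-r**2 / (2.0 * L**2))

obs_x = np.array([0.0, 150.0, 400.0])    # observation positions (km)
A = np.linalg.inv(cov(obs_x[:, None] - obs_x[None, :]) + eps2 * np.eye(3))

def error_var(x0):
    # sigma_a^2 = sigma^2 - c^T (C + eps^2 I)^{-1} c
    c = cov(x0 - obs_x)
    return float(sigma2 - c @ A @ c)

near = error_var(0.0)       # analysis point on top of an observation
far  = error_var(5000.0)    # analysis point far from all observations

# Single isolated observation limit: eps^2 sigma^2 / (sigma^2 + eps^2)
single = eps2 * sigma2 / (sigma2 + eps2)
print(near, single, far)
assert near <= single + 1e-9         # extra nearby observations can only help
assert abs(far - sigma2) < 1e-9      # error reverts to background variance
```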
Step 6: Choice of Covariance Model
The background covariance function must be specified a priori. Common choices for isotropic, homogeneous fields:
$$C(r) = \sigma^2 \exp\!\left(-\frac{r^2}{2L^2}\right) \quad \text{(Gaussian)}, \qquad C(r) = \sigma^2\left(1 + \frac{r}{L}\right)e^{-r/L} \quad \text{(SOAR)}$$
The decorrelation length $L$ controls the smoothness of the analysis. For SST: $L \sim 100\text{--}300$ km; for SSH: $L \sim 100\text{--}200$ km. Anisotropic models use different length scales along and across mean flow.
Step 7: Connection to Kriging and Kalman Filtering
OI is mathematically identical to simple Kriging in geostatistics and to the analysis step of the Kalman filter. The Kalman gain matrix is:
$$\mathbf{K} = \mathbf{P}^f\mathbf{H}^T(\mathbf{H}\mathbf{P}^f\mathbf{H}^T + \mathbf{R})^{-1}$$
where $\mathbf{P}^f$ is the forecast error covariance, $\mathbf{H}$ is the observation operator, and $\mathbf{R}$ is the observation error covariance. In OI, $\mathbf{P}^f$ is replaced by the static background covariance.
Machine Learning in Oceanography
Machine learning has transformed oceanographic data analysis, enabling pattern recognition in massive satellite datasets, gap-filling of sparse in-situ observations, and emulation of computationally expensive numerical models. The key techniques span supervised learning (regression, classification), unsupervised learning (clustering, dimensionality reduction), and physics-informed approaches that combine data-driven methods with domain knowledge.
Neural Networks for SST Prediction
Sea surface temperature forecasting is one of the most successful applications of deep learning in oceanography. Traditional approaches include persistence (tomorrow's SST = today's SST), linear inverse models (LIM), and analog forecasting. Neural networks can outperform these baselines, particularly for ENSO prediction at lead times of 6–18 months.
LSTM (Long Short-Term Memory): Recurrent neural network designed for sequential data. The cell state carries information over long time lags, making LSTMs ideal for capturing ENSO's multi-year memory. The gating mechanism (forget gate, input gate, output gate) selectively retains or discards information:
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f), \quad i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i), \quad o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tanh(W_C \cdot [h_{t-1}, x_t] + b_C), \quad h_t = o_t \odot \tanh(C_t)$$
ConvLSTM: Extends LSTM by replacing matrix multiplications with convolutions, preserving spatial structure. Input is a spatiotemporal field (e.g., monthly SST maps). The convolution kernels capture spatial patterns (warm pool, cold tongue) while the LSTM architecture captures temporal evolution. Relatedly, Ham et al. (2019) demonstrated deep-learning (CNN-based) ENSO forecasts with correlation skill >0.5 at 17-month lead time, outperforming state-of-the-art dynamical forecast systems.
Transfer Learning: Pre-train on CMIP6 climate model output (abundant data), then fine-tune on short observational record (1982–present). This overcomes the limited training data problem that plagues purely data-driven ocean forecasting. The model learns physics-like relationships from simulations and adapts to real-world biases.
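The gate equations can be exercised in a few lines of numpy: a single LSTM cell stepped through a toy predictor sequence. Sizes and random weights are illustrative, not a trained forecast model:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
nx, nh = 4, 8                                # input and hidden dimensions
W = {g: 0.1 * rng.standard_normal((nh, nh + nx)) for g in "fico"}
b = {g: np.zeros(nh) for g in "fico"}

def lstm_step(x_t, h_prev, C_prev):
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f = sigmoid(W["f"] @ z + b["f"])         # forget gate
    i = sigmoid(W["i"] @ z + b["i"])         # input gate
    o = sigmoid(W["o"] @ z + b["o"])         # output gate
    C = f * C_prev + i * np.tanh(W["c"] @ z + b["c"])   # cell state
    h = o * np.tanh(C)                       # hidden state
    return h, C

h, C = np.zeros(nh), np.zeros(nh)
for x_t in rng.standard_normal((24, nx)):    # 24 "months" of toy predictors
    h, C = lstm_step(x_t, h, C)
print(h.shape)                               # (8,)
```

In practice a trained model (e.g., in PyTorch or TensorFlow) learns the weight matrices by backpropagation through time; the cell-state recurrence above is what carries ENSO's multi-year memory.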
Water Mass Classification
Traditional water mass identification relies on expert analysis of T-S diagrams. Machine learning automates this process and reveals structure invisible to manual inspection:
K-means Clustering: Partitions T-S-depth-nutrient profiles into K clusters by minimizing within-cluster variance. The objective function is:
$$J = \sum_{k=1}^{K} \sum_{i \in C_k} \|\mathbf{x}_i - \boldsymbol{\mu}_k\|^2$$
Applied to Argo float profiles, K-means identifies water masses (e.g., AAIW, NADW, AABW) and their boundaries more objectively than manual methods. The silhouette score or BIC helps determine optimal K.
Gaussian Mixture Models (GMM): Soft clustering — each profile has a probability of belonging to each water mass. The EM algorithm fits a mixture of K Gaussians. Captures overlapping water masses and mixing zones that hard clustering misses. Probability maps show transition zones between water masses.
Self-Organizing Maps (SOM): Neural network that projects high-dimensional T/S/nutrient profiles onto a 2D map while preserving topological relationships. Adjacent neurons represent similar water masses. SOMs have been used to classify biogeochemical provinces in the Southern Ocean and to track water mass transformation along Argo float trajectories.
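A minimal K-means (Lloyd's algorithm) sketch on synthetic T-S data, assuming two idealized water masses with illustrative property values; it directly minimizes the objective $J$ above:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic (T [degC], S [psu]) samples for two idealized water masses
# (values loosely AAIW- and NADW-like; purely illustrative)
wm_a = rng.normal([4.0, 34.2], [0.4, 0.05], size=(200, 2))
wm_b = rng.normal([2.5, 34.9], [0.3, 0.05], size=(200, 2))
X = np.vstack([wm_a, wm_b])

def kmeans(X, K, iters=50):
    mu = X[rng.choice(len(X), K, replace=False)]       # init at data points
    for _ in range(iters):
        labels = ((X[:, None, :] - mu[None, :, :])**2).sum(-1).argmin(1)
        mu = np.array([X[labels == k].mean(0) if np.any(labels == k) else mu[k]
                       for k in range(K)])
    J = ((X - mu[labels])**2).sum()                    # within-cluster variance
    return labels, mu, J

# A few restarts; keep the solution with the lowest objective J
labels, mu, J = min((kmeans(X, 2) for _ in range(5)), key=lambda r: r[2])
print(np.round(np.sort(mu[:, 0]), 1))                  # cluster-mean temperatures
```

In practice the variables should be standardized first (temperature and salinity have very different numeric ranges); here the temperature separation dominates the Euclidean distance by construction.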
Physics-Informed Neural Networks (PINNs)
PINNs embed physical laws directly into the neural network loss function, combining the flexibility of neural networks with the constraints of known physics:
$$\mathcal{L} = \underbrace{\frac{1}{N_d}\sum_{i=1}^{N_d}|u_\theta(\mathbf{x}_i) - u_i^{\text{obs}}|^2}_{\text{data loss}} + \underbrace{\lambda \frac{1}{N_r}\sum_{j=1}^{N_r}|\mathcal{N}[u_\theta](\mathbf{x}_j)|^2}_{\text{physics loss}}$$
where $\mathcal{N}$ is a differential operator encoding the PDE (e.g., advection-diffusion, shallow water equations) and $\lambda$ balances data fidelity against physical consistency.
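The composite loss can be made concrete with a toy 1-D steady advection-diffusion problem, replacing the neural network $u_\theta$ with grid values so the minimization becomes linear least squares (coefficients, grid, and observation points are illustrative):

```python
import numpy as np

# N[u] = c u' - kappa u'' = 0 on [0,1]; u_theta represented as grid values
c, kappa, lam = 1.0, 0.2, 1.0
n = 101
x = np.linspace(0.0, 1.0, n)
dx = x[1] - x[0]
u_true = (np.exp(c*x/kappa) - 1.0) / (np.exp(c/kappa) - 1.0)  # exact solution

obs_i = np.array([0, 20, 50, 80, 100])        # sparse observation points
d = u_true[obs_i]

# Discrete PDE residual operator (central differences, interior points)
A = np.zeros((n - 2, n))
for j in range(1, n - 1):
    A[j-1, j-1] = -c/(2*dx) - kappa/dx**2
    A[j-1, j]   = 2.0*kappa/dx**2
    A[j-1, j+1] =  c/(2*dx) - kappa/dx**2
P = np.zeros((len(obs_i), n)); P[np.arange(len(obs_i)), obs_i] = 1.0

# Minimizing |P u - d|^2/N_d + lam |A u|^2/N_r is a linear least-squares problem
H = P.T @ P / len(d) + lam * A.T @ A / (n - 2)
u = np.linalg.solve(H, P.T @ d / len(d))

print(np.max(np.abs(u - u_true)))             # small: the physics fills the gaps
```

A true PINN parameterizes $u_\theta$ with a network and differentiates through it by automatic differentiation; the balance between data and physics terms is the same.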
Ocean applications: Inferring subsurface temperature from surface observations and sparse Argo profiles; reconstructing velocity fields from altimetry while satisfying geostrophic balance; estimating turbulent diffusivity from tracer observations while obeying the advection-diffusion equation.
Neural Operator approaches: Fourier Neural Operators (FNO) and DeepONet learn the solution operator of PDEs, enabling 1000× faster ocean model emulation. Trained on NEMO or MOM6 output, they can predict ocean state evolution in milliseconds instead of hours.
Ocean Reanalyses and Data Assimilation
Ocean reanalyses combine numerical ocean models with observations through data assimilation algorithms to produce the best estimate of the 3D ocean state over time:
4D-Var (Variational): Minimizes the cost function $J(\mathbf{x}_0) = \frac{1}{2}(\mathbf{x}_0 - \mathbf{x}_b)^T \mathbf{B}^{-1}(\mathbf{x}_0 - \mathbf{x}_b) + \frac{1}{2}\sum_k (\mathbf{y}_k - H_k[\mathbf{x}_k])^T \mathbf{R}_k^{-1}(\mathbf{y}_k - H_k[\mathbf{x}_k])$ over a time window, requiring the adjoint model for gradient computation. Used in ECMWF's ORAS5.
Ensemble Kalman Filter (EnKF): Propagates an ensemble of model states forward, then updates each member using observations. The ensemble spread estimates the forecast error covariance. Used in GLORYS12 (Mercator Océan). Advantage: no adjoint model needed; handles nonlinearity better.
Key products: ORAS5 (ECMWF, 1958–present, 0.25°), GLORYS12 (Mercator, 1993–present, 1/12°), SODA (UMD, 1980–present, 0.25°). These provide global 3D temperature, salinity, currents, and sea level fields used for climate monitoring, initialization of seasonal forecasts, and validation of ocean models.
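The EnKF analysis step is compact enough to sketch in numpy (a toy 1-D state with synthetic smooth errors; all sizes and variances are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, Ne = 50, 25, 40                    # state size, observations, members

def smooth(z, scale=3.0):
    """Correlated random field: Gaussian-smoothed white noise, ~unit variance."""
    k = np.exp(-0.5 * (np.arange(-10, 11) / scale)**2)
    k /= np.sqrt((k**2).sum())
    return np.convolve(z, k, mode="same")

truth = np.sin(np.linspace(0.0, 2 * np.pi, n))
# Forecast ensemble: truth + smooth mean error + smooth member perturbations
Xf = (truth + 0.5 * smooth(rng.standard_normal(n)))[:, None] \
     + 0.5 * np.column_stack([smooth(rng.standard_normal(n)) for _ in range(Ne)])

H = np.zeros((m, n)); H[np.arange(m), np.arange(0, n, 2)] = 1.0   # obs operator
R = 0.05 * np.eye(m)
y = H @ truth + np.sqrt(0.05) * rng.standard_normal(m)

# Analysis step: ensemble covariance -> Kalman gain -> perturbed-obs update
Xp = Xf - Xf.mean(axis=1, keepdims=True)
Pf = Xp @ Xp.T / (Ne - 1)
K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)
Y = y[:, None] + np.sqrt(0.05) * rng.standard_normal((m, Ne))
Xa = Xf + K @ (Y - H @ Xf)

err_f = np.abs(Xf.mean(axis=1) - truth).mean()
err_a = np.abs(Xa.mean(axis=1) - truth).mean()
print(err_f, err_a)                      # analysis error should be smaller
```

Operational systems add localization and inflation to tame the sampling noise in the ensemble covariance; this sketch omits both.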
Python: Water Mass Classification & SST Prediction
K-means clustering, GMM, and LSTM-like SST forecasting
Fortran: Physics-Informed Temperature Interpolation
This Fortran program demonstrates a physics-informed approach to ocean temperature interpolation. Instead of purely statistical interpolation, it solves the steady-state advection-diffusion equation constrained by sparse observations — the core idea behind PINNs, implemented as a relaxation solver.
Advection-diffusion constrained interpolation
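The Fortran listing itself is not reproduced here; as a stand-in, the following Python sketch implements the same idea, relaxing the steady advection-diffusion balance while nudging toward sparse observations (grid, coefficients, and observation values are illustrative):

```python
import numpy as np

# Steady 1-D advection-diffusion relaxation constrained by sparse observations
n = 101
x = np.linspace(0.0, 1.0, n)
dx = x[1] - x[0]
c, kappa = 1.0, 0.2
w = 0.5                                   # nudging weight at observation points

obs_i = np.array([0, 30, 70, 100])        # observation locations (grid indices)
u_true = (np.exp(c*x/kappa) - 1.0) / (np.exp(c/kappa) - 1.0)
obs = u_true[obs_i]                       # "observed" temperatures

u = np.zeros(n)
u[obs_i] = obs                            # initialize from the observations
for _ in range(20000):
    un = u.copy()
    # Jacobi update from c u' = kappa u'' (central differences)
    un[1:-1] = ((kappa/dx**2) * (u[2:] + u[:-2])
                - (c/(2*dx)) * (u[2:] - u[:-2])) / (2.0*kappa/dx**2)
    un[obs_i] = (1 - w)*un[obs_i] + w*obs  # nudge toward observations
    u = un

print(np.max(np.abs(u - u_true)))          # interpolated field matches the truth
```

Between observations the field is filled by the PDE rather than by a statistical covariance model, which is the contrast with the optimal interpolation scheme below.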
Python: EOF Analysis, Wavelet Transform & Optimal Interpolation
Fortran: Optimal Interpolation for Irregularly-Spaced Ocean Data
This program implements the Bretherton, Davis, and Fandry (1976) optimal interpolation scheme for mapping scattered ocean observations (e.g., ship tracks, Argo floats) onto a regular grid with uncertainty estimates.
Optimal interpolation (objective analysis) for ocean data
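The Fortran program is not reproduced here; a Python sketch of the same mapping follows, using synthetic observations along a 1-D track (the Bretherton et al. scheme reduces to the OI equations of the derivation above; all parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

sigma2, L, eps2 = 1.0, 150.0, 0.1      # signal variance, length (km), noise

def cov(dx):
    # Gaussian background covariance C(r) = sigma^2 exp(-r^2 / (2 L^2))
    return sigma2 * np.exp(-dx**2 / (2.0 * L**2))

def field(x):                           # synthetic "SST anomaly" truth
    return np.sin(2 * np.pi * x / 600.0)

obs_x = np.sort(rng.uniform(0.0, 1000.0, 15))          # scattered positions (km)
d = field(obs_x) + np.sqrt(eps2) * rng.standard_normal(len(obs_x))

grid = np.linspace(0.0, 1000.0, 101)
Cinv = np.linalg.inv(cov(obs_x[:, None] - obs_x[None, :])
                     + eps2 * np.eye(len(obs_x)))
c = cov(grid[:, None] - obs_x[None, :])                # grid-to-obs covariances

analysis = c @ Cinv @ d                                # zero background mean
err_var = sigma2 - np.einsum('ij,jk,ik->i', c, Cinv, c)

print(float(err_var.min()), float(err_var.max()))      # error map bounds
```

The error map `err_var` is small near observations and reverts toward the background variance $\sigma^2$ in data gaps, giving the uncertainty estimate that accompanies the mapped field.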