Protein Folding & Misfolding
From Anfinsen's thermodynamic hypothesis through Levinthal's paradox to modern funnel theory — with full derivations of folding kinetics, the Zimm-Bragg helix-coil transition, and computational simulations.
Learning Objectives
- Understand Anfinsen's thermodynamic hypothesis and the concept of the native state
- Derive the numbers behind Levinthal's paradox and explain why it demands a directed search
- Analyze the free energy landscape and funnel theory of protein folding
- Derive two-state folding kinetics, chevron plots, and $\phi$-value analysis
- Apply the Zimm-Bragg model to helix-coil transitions
- Connect folding theory to amyloid diseases and modern structure prediction
1. Introduction — Anfinsen's Thermodynamic Hypothesis
In 1961, Christian Anfinsen demonstrated that the enzyme ribonuclease A, once fully denatured and reduced, could refold spontaneously into its catalytically active form when the denaturant was removed and disulfide bonds were allowed to re-form. This landmark experiment established the thermodynamic hypothesis:
The native structure of a protein is the thermodynamically most stable state — the unique conformation that minimizes the Gibbs free energy under physiological conditions.
Mathematically, the native state N satisfies:
$G(N) = \min_{\{\text{all conformations } C\}} G(C)$
The Gibbs free energy of the native state relative to the unfolded ensemble is:
$\Delta G_{\text{fold}} = G_N - G_U = \Delta H_{\text{fold}} - T\Delta S_{\text{fold}}$
For a typical small globular protein, $\Delta G_{\text{fold}} \approx -5$ to $-15 \; \text{kcal/mol}$, which is remarkably small compared to the total enthalpic and entropic contributions, each of which can be hundreds of kcal/mol. The native state is thus only marginally stable, a delicate balance between large opposing forces.
Anfinsen's insight implies that the amino acid sequence alone encodes all the information needed to determine the three-dimensional structure. This is the foundation of the entire field of protein structure prediction — from homology modeling to the revolutionary AlphaFold.
The key insight of Ken Dill, Jose Onuchic, Peter Wolynes, and others is that the energy landscape for a foldable protein is funnel-shaped. As the chain forms increasing numbers of native contacts, the energy decreases on average, creating a downhill bias toward the native state.
Quantifying the Funnel
Define a reaction coordinate $Q$ as the fraction of native contacts formed ($0 \leq Q \leq 1$). The free energy as a function of $Q$ is:
$G(Q) = E(Q) - T S(Q)$
The energy decreases roughly linearly with $Q$:
$E(Q) \approx -Q \cdot n \cdot \epsilon_0$
where $\epsilon_0$ is the average energy per native contact. The conformational entropy decreases as native contacts constrain the chain:
$S(Q) \approx (1 - Q) \cdot n \cdot k_B \ln(\Omega_0)$
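With these linear forms, $G(Q)$ is itself linear in $Q$, with slope $n\,(k_B T \ln\Omega_0 - \epsilon_0)$, so the funnel guides folding downhill when $\epsilon_0 > k_B T \ln(\Omega_0)$. A free energy barrier at intermediate $Q$, whose height determines the folding rate, appears only when the two terms do not compensate each other evenly along $Q$, for example when chain entropy is lost before the stabilizing contact energy is fully gained.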
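The linear model above can be tabulated directly. The following is a minimal sketch, assuming illustrative values for $n$, $\epsilon_0$ and $\Omega_0$ that are not taken from the text; it evaluates $G(Q)$ at three temperatures and the temperature at which $G(0) = G(1)$, that is, where $\epsilon_0 = k_B T \ln\Omega_0$.

```python
import numpy as np

K_B = 0.0019872  # Boltzmann constant per mole, kcal/(mol*K)

def free_energy(Q, T, n=100, eps0=1.0, omega0=4.0):
    """G(Q) = E(Q) - T*S(Q) with the linear E(Q) and S(Q) defined above.

    n, eps0 (kcal/mol per contact) and omega0 are illustrative assumptions.
    """
    E = -Q * n * eps0
    S = (1.0 - Q) * n * K_B * np.log(omega0)
    return E - T * S

def folding_temperature(eps0=1.0, omega0=4.0):
    """Temperature at which G(Q=0) equals G(Q=1): eps0 = k_B * T * ln(omega0)."""
    return eps0 / (K_B * np.log(omega0))

Q = np.linspace(0.0, 1.0, 5)
T_f = folding_temperature()
for T in (0.8 * T_f, T_f, 1.2 * T_f):
    print(f"T = {T:6.1f} K  G(Q) =", np.round(free_energy(Q, T), 1))
```

Below `T_f` the profile slopes down toward $Q = 1$, above it toward $Q = 0$. Because both terms are linear here, this sketch cannot produce a barrier; that would require nonlinear $E(Q)$ or $S(Q)$.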
The principle of minimal frustration (Bryngelson and Wolynes, 1987) states that natural proteins have evolved sequences where the energy landscape is minimally frustrated — meaning that most local energy minima still point downhill toward the native state, avoiding deep kinetic traps.
4. Derivation 3: Two-State Folding Kinetics
Many small single-domain proteins fold in a cooperative, two-state manner with no detectable intermediates. The folding reaction is:
$N \rightleftharpoons U$
Equilibrium Thermodynamics
The equilibrium constant for unfolding is:
$K_U = \frac{[U]}{[N]} = \exp\left(\frac{-\Delta G_{N \to U}}{RT}\right) = \exp\left(\frac{\Delta G_{\text{fold}}}{RT}\right)$
Since $\Delta G_{\text{fold}} = G_N - G_U < 0$, we have $K_U < 1$, and the native state is favored. The fraction of unfolded protein is:
$f_U = \frac{K_U}{1 + K_U} = \frac{1}{1 + \exp(-\Delta G_{\text{fold}}/RT)}$
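To give a sense of scale, the sketch below evaluates $f_U$ for a few values of $\Delta G_{\text{fold}}$ at 25 °C. The function name and the chosen free energies are illustrative assumptions.

```python
import math

R = 1.987e-3  # gas constant, kcal/(mol*K)

def fraction_unfolded(dG_fold, T=298.15):
    """f_U = K_U / (1 + K_U) with K_U = exp(dG_fold / RT); dG_fold in kcal/mol."""
    K_U = math.exp(dG_fold / (R * T))
    return K_U / (1.0 + K_U)

# Illustrative stabilities spanning the typical range quoted earlier
for dG in (-15.0, -5.0, 0.0):
    print(f"dG_fold = {dG:6.1f} kcal/mol -> f_U = {fraction_unfolded(dG):.2e}")
```

Even the marginal stability of a few kcal/mol leaves only a tiny unfolded population, because $RT \approx 0.6$ kcal/mol at room temperature.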
Linear Free Energy Relationships and Denaturant Dependence
The stability of a protein varies linearly with denaturant concentration [D] (urea or guanidinium chloride):
$\Delta G_U([\text{D}]) = \Delta G_U^{H_2O} - m_{\text{eq}} \cdot [\text{D}]$
where $\Delta G_U^{H_2O}$ is the stability in the absence of denaturant and $m_{\text{eq}}$ (the m-value) reflects the change in solvent-accessible surface area upon unfolding, typically $1$–$5$ kcal/(mol$\cdot$M).
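Combining the linear relationship with the two-state expression for $f_U$ gives a sigmoidal denaturation curve whose midpoint is $[\text{D}]_{1/2} = \Delta G_U^{H_2O}/m_{\text{eq}}$. The sketch below assumes illustrative values (`DG_WATER`, `M_EQ`) that are not from the text.

```python
import numpy as np

R = 1.987e-3      # gas constant, kcal/(mol*K)
DG_WATER = 6.0    # assumed stability in water, kcal/mol
M_EQ = 2.0        # assumed m-value, kcal/(mol*M)

def unfolding_free_energy(D):
    """dG_U([D]) = dG_U^H2O - m_eq * [D]."""
    return DG_WATER - M_EQ * np.asarray(D, dtype=float)

def fraction_unfolded(D, T=298.15):
    """Two-state f_U = 1 / (1 + exp(dG_U / RT)), using K_U = exp(-dG_U / RT)."""
    return 1.0 / (1.0 + np.exp(unfolding_free_energy(D) / (R * T)))

D_half = DG_WATER / M_EQ  # concentration at which dG_U = 0
D = np.array([0.0, 2.0, D_half, 4.0, 6.0])
for d, f in zip(D, fraction_unfolded(D)):
    print(f"[D] = {d:.1f} M  f_U = {f:.4f}")
```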
Folding and Unfolding Rate Constants
Using transition state theory, the microscopic rate constants for folding ($k_f$) and unfolding ($k_u$) also depend linearly on denaturant:
$\ln k_f([\text{D}]) = \ln k_f^{H_2O} - \frac{m_f}{RT} \cdot [\text{D}]$
$\ln k_u([\text{D}]) = \ln k_u^{H_2O} + \frac{m_u}{RT} \cdot [\text{D}]$
where $m_f$ and $m_u$ are the kinetic m-values, with $m_{\text{eq}} = m_f + m_u$.
Derivation of the Chevron Plot
In a kinetic experiment (e.g., stopped-flow fluorescence), the observed relaxation rate is the sum of folding and unfolding rates:
$k_{\text{obs}} = k_f + k_u$
Substituting the denaturant dependencies:
$k_{\text{obs}}([\text{D}]) = k_f^{H_2O} \exp\left(-\frac{m_f}{RT}[\text{D}]\right) + k_u^{H_2O} \exp\left(\frac{m_u}{RT}[\text{D}]\right)$
Taking the logarithm of $k_{\text{obs}}$:
$\ln k_{\text{obs}}([\text{D}]) = \ln\left[k_f^{H_2O} e^{-m_f[\text{D}]/RT} + k_u^{H_2O} e^{+m_u[\text{D}]/RT}\right]$
Shape of the Chevron Plot
Plotting $\ln k_{\text{obs}}$ vs [D] yields the characteristic V-shaped chevron plot:
- Left arm (low [D]): $k_f \gg k_u$, so $\ln k_{\text{obs}} \approx \ln k_f^{H_2O} - (m_f/RT)[\text{D}]$, a line of negative slope
- Minimum: close to the transition midpoint $[\text{D}]_{1/2}$, where $k_f = k_u$ and $\Delta G_U = 0$. The minimum itself lies where $m_f k_f = m_u k_u$, so it coincides with the midpoint exactly only when $m_f = m_u$
- Right arm (high [D]): $k_u \gg k_f$, so $\ln k_{\text{obs}} \approx \ln k_u^{H_2O} + (m_u/RT)[\text{D}]$, a line of positive slope
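The chevron expression is easy to evaluate numerically. The sketch below uses assumed rate constants and kinetic m-values (none are from the text) to compute $k_{\text{obs}}$ on a grid, the midpoint where $k_f = k_u$, and the analytic minimum, which satisfies $m_f k_f = m_u k_u$ and therefore matches the midpoint only when $m_f = m_u$.

```python
import numpy as np

RT = 1.987e-3 * 298.15  # kcal/mol at 25 C

# Assumed, illustrative parameters
kf0, ku0 = 1.0e3, 1.0e-2   # rate constants in water, s^-1
m_f, m_u = 1.2, 0.6        # kinetic m-values, kcal/(mol*M)

def k_fold(D):
    return kf0 * np.exp(-m_f * D / RT)

def k_unfold(D):
    return ku0 * np.exp(m_u * D / RT)

def k_obs(D):
    return k_fold(D) + k_unfold(D)

# Midpoint: k_f = k_u
D_half = RT * np.log(kf0 / ku0) / (m_f + m_u)
# Minimum of k_obs: d k_obs / d[D] = 0  <=>  m_f * k_f = m_u * k_u
D_min = RT * np.log(m_f * kf0 / (m_u * ku0)) / (m_f + m_u)

D = np.linspace(0.0, 8.0, 8001)
D_grid_min = D[np.argmin(k_obs(D))]
print(f"midpoint [D]_1/2 = {D_half:.3f} M")
print(f"chevron minimum  = {D_min:.3f} M (grid search: {D_grid_min:.3f} M)")
```

Fitting the two arms of measured data to this expression is how $k_f^{H_2O}$, $k_u^{H_2O}$, $m_f$ and $m_u$ are usually extracted; the grid search here only confirms the analytic minimum.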
$\phi$-Value Analysis
Alan Fersht developed $\phi$-value analysis to map the structure of the transition state ensemble. For a mutation that changes the stability by $\Delta\Delta G_{N-U}$ and the activation free energy of folding by $\Delta\Delta G_{\ddagger-U}$:
$\phi = \frac{\Delta\Delta G_{\ddagger - U}}{\Delta\Delta G_{N - U}} = \frac{RT \ln(k_f^{\text{wt}}/k_f^{\text{mut}})}{\Delta\Delta G_{N-U}}$
Interpretation of $\phi$-values:
- $\phi = 1$: The mutated residue is fully structured in the transition state (native-like interactions fully formed)
- $\phi = 0$: The mutated residue is fully unstructured in the transition state (no native contacts formed)
- $0 < \phi < 1$: Partial structure formation, indicating the residue is involved in partially formed interactions at the transition state
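As a worked illustration of the definition, the sketch below computes $\phi$ from folding rate constants and an equilibrium destabilization. The function name and the mutant numbers are hypothetical, and $\Delta\Delta G$ is taken as mutant minus wild type, the sign convention consistent with the formula above.

```python
import math

R = 1.987e-3  # gas constant, kcal/(mol*K)

def phi_value(kf_wt, kf_mut, ddG_NU, T=298.15):
    """phi = RT * ln(kf_wt / kf_mut) / ddG_NU, with ddG_NU in kcal/mol."""
    if abs(ddG_NU) < 1e-12:
        raise ValueError("ddG_NU is zero; phi is undefined")
    ddG_ts = R * T * math.log(kf_wt / kf_mut)  # change in activation free energy
    return ddG_ts / ddG_NU

# Hypothetical mutants: (label, kf_wt, kf_mut in s^-1, ddG_NU in kcal/mol)
mutants = [("A", 100.0, 100.0, 1.5),   # folding rate unchanged
           ("B", 100.0, 20.0, 1.2),    # folding slowed
           ("C", 100.0, 60.0, 1.0)]
for label, kf_wt, kf_mut, ddG in mutants:
    print(f"mutant {label}: phi = {phi_value(kf_wt, kf_mut, ddG):.2f}")
```

Because $\phi$ is a ratio, small values of $\Delta\Delta G_{N-U}$ amplify experimental error, which is why the function refuses a zero denominator.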
5. Derivation 4: Helix-Coil Transition (Zimm-Bragg Model)
The helix-coil transition is one of the simplest and most thoroughly understood conformational transitions in biophysics. The Zimm-Bragg model (1959) treats a polypeptide chain as a one-dimensional Ising-like system where each residue is either in a helical (h) or coil (c) state.
Model Parameters
- Propagation parameter $s$: The equilibrium constant for adding a helical residue to an existing helix, $s = \exp(-\Delta G_{\text{prop}}/k_BT)$. When $s > 1$, helix extension is favorable. The transition occurs near $s = 1$.
- Nucleation parameter $\sigma$: The statistical weight penalty for initiating a new helical segment. $\sigma \ll 1$ (typically $10^{-3}$ to $10^{-4}$ for $\alpha$-helices) because forming the first turn of a helix requires constraining $\sim 3$ residues without gaining a hydrogen bond.
Transfer Matrix Method
The statistical weight of each residue depends on its own state and that of its predecessor. We define a $2 \times 2$ transfer matrix $\mathbf{M}$:
$\mathbf{M} = \begin{pmatrix} 1 & \sigma s \\ 1 & s \end{pmatrix}$
Rows: predecessor state (c, h). Columns: current state (c, h).
The element $M_{ij}$ gives the statistical weight for residue $k$ being in state $j$ given that residue $k-1$ is in state $i$:
- $c \to c$: weight = 1 (reference state)
- $c \to h$: weight = $\sigma s$ (nucleation penalty $\times$ propagation)
- $h \to c$: weight = 1 (helix termination, no penalty)
- $h \to h$: weight = $s$ (helix propagation)
Partition Function
For a chain of $N$ residues, the partition function is obtained by multiplying transfer matrices:
$Z = \mathbf{v}_0^T \cdot \mathbf{M}^N \cdot \mathbf{v}_f$
where $\mathbf{v}_0 = (1, 0)^T$ (chain starts in coil) and $\mathbf{v}_f = (1, 1)^T$ (sum over final states).
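One way to check the transfer-matrix expression is to compare it with an explicit sum over all $2^N$ helix/coil sequences for a short chain. The sketch below does this with NumPy; the function names and the parameter values in the final lines are illustrative assumptions.

```python
import itertools
import numpy as np

def partition_function(N, s, sigma):
    """Z = v0^T M^N vf with the Zimm-Bragg transfer matrix defined above."""
    M = np.array([[1.0, sigma * s],
                  [1.0, s]])
    v0 = np.array([1.0, 0.0])   # chain starts from a (virtual) coil state
    vf = np.array([1.0, 1.0])   # sum over both final states
    return v0 @ np.linalg.matrix_power(M, N) @ vf

def partition_function_brute(N, s, sigma):
    """Explicit sum over all 2^N sequences; only feasible for small N."""
    Z = 0.0
    for states in itertools.product("ch", repeat=N):
        weight, prev = 1.0, "c"
        for cur in states:
            if cur == "h":
                # h after h propagates (s); h after c nucleates (sigma * s)
                weight *= s if prev == "h" else sigma * s
            prev = cur
        Z += weight
    return Z

print(partition_function(8, 1.3, 1e-2))
print(partition_function_brute(8, 1.3, 1e-2))
```

The brute-force sum grows as $2^N$, while the matrix product scales with $\log N$ matrix multiplications, which is the practical reason for the transfer-matrix formulation.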
Eigenvalue Solution
The eigenvalues of $\mathbf{M}$ are found from $\det(\mathbf{M} - \lambda \mathbf{I}) = 0$:
$(1 - \lambda)(s - \lambda) - \sigma s = 0$
$\lambda^2 - (1 + s)\lambda + s(1 - \sigma) = 0$
Using the quadratic formula:
$\lambda_{\pm} = \frac{(1 + s) \pm \sqrt{(1 - s)^2 + 4\sigma s}}{2}$
For large $N$, the partition function is dominated by the larger eigenvalue $\lambda_+$:
$Z \approx c_+ \lambda_+^N$
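The closed-form eigenvalues and the large-$N$ approximation can both be checked numerically. The sketch below compares the quadratic-formula result with `numpy.linalg.eigvals` and shows $\ln Z / N$ approaching $\ln \lambda_+$ as $N$ grows; the values of $s$ and $\sigma$ are illustrative assumptions.

```python
import numpy as np

def eigenvalues_closed_form(s, sigma):
    """Return (lambda_plus, lambda_minus) from the quadratic formula above."""
    root = np.sqrt((1.0 - s) ** 2 + 4.0 * sigma * s)
    return 0.5 * ((1.0 + s) + root), 0.5 * ((1.0 + s) - root)

s, sigma = 1.05, 1e-3  # assumed values near the transition
M = np.array([[1.0, sigma * s],
              [1.0, s]])

lam_plus, lam_minus = eigenvalues_closed_form(s, sigma)
numeric = np.sort(np.linalg.eigvals(M).real)  # eigenvalues are real here
print("closed form:", lam_minus, lam_plus)
print("numerical:  ", numeric)

v0, vf = np.array([1.0, 0.0]), np.array([1.0, 1.0])
for N in (10, 100, 1000):
    Z = v0 @ np.linalg.matrix_power(M, N) @ vf
    print(f"N = {N:4d}  ln(Z)/N = {np.log(Z) / N:.5f}  ln(lambda_+) = {np.log(lam_plus):.5f}")
```

The residual difference at finite $N$ comes from the prefactor $c_+$ and the subdominant eigenvalue $\lambda_-$, both of which contribute terms that vanish as $1/N$ or faster.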
Fraction Helix
The average fraction of residues in the helical state is:
$\theta = \frac{s}{N} \frac{\partial \ln Z}{\partial s} \approx \frac{s}{\lambda_+}\frac{\partial \lambda_+}{\partial s}$
Evaluating the derivative:
$\frac{\partial \lambda_+}{\partial s} = \frac{1}{2}\left(1 + \frac{-(1-s) + 2\sigma}{\sqrt{(1-s)^2 + 4\sigma s}}\right)$
At the transition midpoint $s = 1$:
$\theta(s=1) = \frac{1}{2}$
Sharpness of the Transition
The sharpness of the helix-coil transition is controlled by $\sigma$. Smaller $\sigma$ (stronger nucleation penalty) gives a sharper, more cooperative transition. In the limit $\sigma \to 0$, the transition becomes an all-or-nothing phase transition. For real $\alpha$-helices with $\sigma \approx 10^{-3}$–$10^{-4}$, the transition is fairly sharp and occurs over a narrow temperature range of $\sim 10$–$20$ K.
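The dependence of cooperativity on $\sigma$ can be made concrete with the large-$N$ expression for $\theta$. The sketch below evaluates $\theta(s)$ and measures a transition width, defined here (as an assumption for illustration) as the range of $\ln s$ over which $\theta$ rises from 0.1 to 0.9.

```python
import numpy as np

def fraction_helix(s, sigma):
    """theta = (s / lambda_+) * d(lambda_+)/ds in the large-N limit."""
    root = np.sqrt((1.0 - s) ** 2 + 4.0 * sigma * s)
    lam_plus = 0.5 * ((1.0 + s) + root)
    dlam_ds = 0.5 * (1.0 + (-(1.0 - s) + 2.0 * sigma) / root)
    return s * dlam_ds / lam_plus

def transition_width(sigma, lo=0.1, hi=0.9):
    """Width in ln(s) between theta = lo and theta = hi, found on a grid."""
    ln_s = np.linspace(-3.0, 3.0, 60001)
    theta = fraction_helix(np.exp(ln_s), sigma)  # increases monotonically with s
    return ln_s[np.searchsorted(theta, hi)] - ln_s[np.searchsorted(theta, lo)]

for sigma in (1e-2, 1e-3, 1e-4):
    print(f"sigma = {sigma:.0e}  theta(s=1) = {fraction_helix(1.0, sigma):.3f}  "
          f"width in ln(s) = {transition_width(sigma):.3f}")
```

Converting a width in $\ln s$ to a temperature range requires the enthalpy of helix propagation, which this sketch does not model.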
6. Applications
AlphaFold & Structure Prediction
DeepMind's AlphaFold2 (2020) achieved near-experimental accuracy in protein structure prediction at CASP14, providing strong computational support for Anfinsen's premise that sequence determines structure. It uses multiple sequence alignments and attention-based neural networks to predict 3D coordinates directly from sequence. AlphaFold2 has predicted structures for over 200 million proteins, transforming structural biology.
Amyloid Diseases
Protein misfolding leads to aggregation into amyloid fibrils — ordered, cross-$\beta$ sheet structures that are thermodynamically stable but kinetically trapped. These are implicated in:
- Alzheimer's disease: A$\beta$ peptide and tau protein
- Parkinson's disease: $\alpha$-synuclein
- Prion diseases: PrP$^{\text{Sc}}$
- Type II diabetes: IAPP (amylin)
Molecular Chaperones
Chaperones (GroEL/GroES, Hsp70, Hsp90) do not provide folding information but rather prevent aggregation by sequestering unfolded or partially folded intermediates. GroEL provides an isolated cavity where a single protein can fold without intermolecular contacts. The iterative annealing mechanism (Thirumalai and Lorimer) suggests chaperones unfold kinetically trapped intermediates, giving them another chance to reach the native state.
Drug Design Targeting Misfolding
Therapeutic strategies include:
- Kinetic stabilizers: Tafamidis binds transthyretin (TTR) and prevents amyloid formation
- Pharmacological chaperones: Small molecules that stabilize the native fold
- Aggregation inhibitors: Compounds that block fibril elongation
- Immunotherapy: Antibodies targeting amyloid plaques (e.g., lecanemab for Alzheimer's)
7. Historical Context
Christian Anfinsen demonstrates spontaneous refolding of ribonuclease A, establishing the thermodynamic hypothesis. He receives the Nobel Prize in Chemistry in 1972 "for his work on ribonuclease, especially concerning the connection between the amino acid sequence and the biologically active conformation."
Cyrus Levinthal articulates the paradox bearing his name at a conference, published in 1969. The paradox motivates the search for folding pathways and intermediates.
Bryngelson & Wolynes introduce the energy landscape theory and the principle of minimal frustration, laying the groundwork for the funnel picture.
Ken Dill, Jose Onuchic, and Peter Wolynes formalize the folding funnel concept and develop the "new view" of protein folding based on statistical mechanics of heteropolymers.
David Baker and colleagues design the first computationally designed protein (Top7) and develop the Rosetta software suite, which becomes a cornerstone of computational protein design.
DeepMind's AlphaFold2 achieves near-experimental accuracy at CASP14, solving the protein structure prediction problem for single-domain proteins. Demis Hassabis and John Jumper share the 2024 Nobel Prize in Chemistry with David Baker for computational protein design and structure prediction.
8. Python Simulation
Below we simulate three key aspects of protein folding theory using only NumPy: (1) the free energy profile as a function of the reaction coordinate at different temperatures, (2) the chevron plot for two-state folding kinetics, and (3) the Zimm-Bragg helix-coil transition showing how the nucleation parameter $\sigma$ controls cooperativity.
Protein Folding: Free Energy Landscape & Kinetics
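A minimal NumPy sketch of these three calculations is given here. All parameter values are illustrative assumptions, and the arrays it produces (`G`, `ln_k_obs`, `theta`) are the curves one would plot.

```python
import numpy as np

R = 1.987e-3  # gas constant, kcal/(mol*K); also used as k_B per mole

# (1) Free energy profile G(Q) at several temperatures (assumed n, eps0, omega0)
n, eps0, omega0 = 100, 1.0, 4.0
Q = np.linspace(0.0, 1.0, 11)
temps = np.array([290.0, 363.0, 430.0])  # K; 363 K is close to eps0 / (R ln omega0)
G = -Q * n * eps0 - temps[:, None] * (1.0 - Q) * n * R * np.log(omega0)

# (2) Chevron plot: ln k_obs vs [D] (assumed rates in s^-1, m-values in kcal/(mol*M))
RT = R * 298.15
kf0, ku0, m_f, m_u = 1.0e3, 1.0e-2, 1.2, 0.6
D = np.linspace(0.0, 8.0, 81)
ln_k_obs = np.log(kf0 * np.exp(-m_f * D / RT) + ku0 * np.exp(m_u * D / RT))

# (3) Zimm-Bragg fraction helix theta(s) for several sigma (large-N limit)
s = np.linspace(0.5, 1.5, 101)
sigmas = np.array([1e-2, 1e-3, 1e-4])[:, None]
root = np.sqrt((1.0 - s) ** 2 + 4.0 * sigmas * s)
lam_plus = 0.5 * ((1.0 + s) + root)
theta = s * 0.5 * (1.0 + (-(1.0 - s) + 2.0 * sigmas) / root) / lam_plus

print("G(Q=0), G(Q=1) per temperature:\n", np.round(G[:, [0, -1]], 1))
print("chevron minimum on grid at [D] =", D[np.argmin(ln_k_obs)], "M")
print("theta at s = 1 for each sigma:", theta[:, 50])
```

Each row of `G` and `theta` corresponds to one temperature or one value of $\sigma$, so the arrays can be passed directly to any plotting library.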
Summary of Key Equations
Levinthal's Paradox
$\Omega = 3^{100} \approx 5 \times 10^{47}, \quad t_{\text{search}} = \frac{\Omega}{10^{13}\,\text{s}^{-1}} \approx 10^{27}\,\text{years}$
Folding Free Energy
$\Delta G_{\text{fold}} = \Delta H_{\text{fold}} - T\Delta S_{\text{fold}}$
Two-State Equilibrium
$K_U = \frac{[U]}{[N]} = \exp\!\left(\frac{\Delta G_{\text{fold}}}{RT}\right), \quad \Delta G_U([\text{D}]) = \Delta G_U^{H_2O} - m_{\text{eq}}[\text{D}]$
Chevron Plot
$k_{\text{obs}} = k_f^{H_2O}\,e^{-m_f[\text{D}]/RT} + k_u^{H_2O}\,e^{+m_u[\text{D}]/RT}$
$\phi$-Value Analysis
$\phi = \frac{\Delta\Delta G_{\ddagger - U}}{\Delta\Delta G_{N - U}}$
Zimm-Bragg Eigenvalues
$\lambda_{\pm} = \frac{(1 + s) \pm \sqrt{(1 - s)^2 + 4\sigma s}}{2}$