Protein Folding Dynamics

From two-state folding kinetics and chevron analysis through phi-value mapping of transition states to the Zimm-Bragg helix-coil theory and chaperone mechanisms.

Derivation 1: Two-State Folding & the Chevron Plot

Many small single-domain proteins fold in a two-state manner, with only the native (N) and unfolded (U) states significantly populated:

$$\text{U} \;\underset{k_u}{\overset{k_f}{\rightleftharpoons}}\; \text{N}$$

Equilibrium Thermodynamics

The equilibrium constant and free energy of folding are:

$$K = \frac{[\text{U}]}{[\text{N}]} = \frac{k_u}{k_f} = \exp\!\left(\frac{\Delta G}{RT}\right)$$

where $\Delta G = G_N - G_U$ is the free energy of folding, so $\Delta G < 0$ for a stable protein. At the midpoint denaturant concentration $C_m$, $\Delta G = 0$ and $K = 1$ ($k_f = k_u$).

The fraction unfolded at equilibrium is:

$$f_U = \frac{K}{1 + K} = \frac{1}{1 + \exp(-\Delta G/RT)}$$

Kinetics: The Chevron Plot

For a two-state protein, relaxation to equilibrium follows single-exponential kinetics with observed rate:

$$\boxed{k_{\text{obs}} = k_f + k_u}$$

Both rates depend on denaturant concentration [D] according to linear free energy relationships:

$$\ln k_f = \ln k_f^{\text{H}_2\text{O}} + \frac{m_f[\text{D}]}{RT}, \qquad \ln k_u = \ln k_u^{\text{H}_2\text{O}} + \frac{m_u[\text{D}]}{RT}$$

where $m_f < 0$ (folding slows with denaturant) and $m_u > 0$ (unfolding accelerates). A plot of $\ln k_{\text{obs}}$ vs [D] produces the characteristic V-shaped chevron plot: the left arm reflects folding and the right arm reflects unfolding.
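
As a rough numerical illustration of the chevron, the sketch below evaluates $k_{\text{obs}} = k_f + k_u$ on a grid of denaturant concentrations. All parameter values (rates in water, m-values, temperature) are hypothetical and were chosen only to give a plausible-looking chevron; they do not describe any particular protein.

```python
import math

R = 8.314e-3  # gas constant, kJ/(mol K)
T = 298.15    # temperature, K

# Hypothetical parameters (assumed for illustration only)
KF_WATER = 100.0   # folding rate in water, 1/s
KU_WATER = 0.01    # unfolding rate in water, 1/s
M_F = -5.0         # kJ/(mol M), negative: denaturant slows folding
M_U = 2.0          # kJ/(mol M), positive: denaturant speeds unfolding


def rates(d):
    """Return (k_f, k_u) at denaturant concentration d (molar)."""
    k_f = KF_WATER * math.exp(M_F * d / (R * T))
    k_u = KU_WATER * math.exp(M_U * d / (R * T))
    return k_f, k_u


def ln_k_obs(d):
    """Natural log of the observed relaxation rate k_obs = k_f + k_u."""
    k_f, k_u = rates(d)
    return math.log(k_f + k_u)


# Sample the chevron between 0 and 8 M denaturant in 0.5 M steps
concentrations = [0.5 * i for i in range(17)]
chevron = [(d, ln_k_obs(d)) for d in concentrations]

# Locate the lowest sampled point of the V-shaped curve
d_min, ln_k_min = min(chevron, key=lambda point: point[1])
for d, ln_k in chevron:
    print(f"[D] = {d:3.1f} M   ln k_obs = {ln_k:6.2f}")
print(f"chevron minimum near [D] = {d_min:.1f} M")
```

With these assumed values the minimum falls slightly to the right of the point where $k_f = k_u$, because the unfolding arm is shallower than the folding arm.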

Tanford $\beta$-Value

The Tanford $\beta$ value quantifies the position of the transition state on the folding reaction coordinate:

$$\boxed{\beta_T = \frac{m_f}{m_f - m_u} = \frac{m_f}{m_{\text{eq}}}}$$

where $m_{\text{eq}} = m_f - m_u$ is the equilibrium m-value. A $\beta_T \approx 0.7$ indicates that the transition state has ~70% of the solvent-accessible surface area buried relative to the native state — it is compact and native-like.
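
One way to estimate $\beta_T$ is to read the slopes of the two chevron arms well away from the minimum, convert them to m-values, and take the ratio. The sketch below does this with two points per arm. The readings are made-up numbers and the function names are illustrative, not part of any standard package.

```python
R = 8.314e-3  # gas constant, kJ/(mol K)
T = 298.15    # temperature, K


def m_value_from_arm(d1, ln_k1, d2, ln_k2):
    """Kinetic m-value in kJ/(mol M) from two points on a linear chevron arm."""
    slope = (ln_k2 - ln_k1) / (d2 - d1)   # d(ln k)/d[D], per molar
    return slope * R * T


def tanford_beta(m_f, m_u):
    """beta_T = m_f / (m_f - m_u), with m_f < 0 and m_u > 0."""
    return m_f / (m_f - m_u)


# Hypothetical arm readings: ([D] in M, ln k with k in 1/s)
m_f = m_value_from_arm(0.0, 4.6, 1.0, 2.6)   # folding arm, slope about -2.0 per M
m_u = m_value_from_arm(6.0, 0.2, 7.0, 1.0)   # unfolding arm, slope about +0.8 per M
beta_t = tanford_beta(m_f, m_u)
print(f"m_f = {m_f:.2f}, m_u = {m_u:.2f} kJ/(mol M), beta_T = {beta_t:.2f}")
```

Because $RT$ multiplies both m-values, $\beta_T$ depends only on the ratio of the arm slopes.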

Linear Extrapolation Method (LEM)

The free energy of folding in water is obtained by linear extrapolation:

$$\Delta G([\text{D}]) = \Delta G^{\text{H}_2\text{O}} - m_{\text{eq}}[\text{D}]$$

At $C_m$, $\Delta G = 0$, so $\Delta G^{\text{H}_2\text{O}} = m_{\text{eq}} \cdot C_m$. Kinetic and equilibrium measurements must agree: $\Delta G^{\text{H}_2\text{O}} = -RT\ln(k_f^{\text{H}_2\text{O}}/k_u^{\text{H}_2\text{O}})$.
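
The linear extrapolation can be sketched numerically as follows, taking $\Delta G$ as the free energy of folding ($G_N - G_U$, negative for a stable protein) and $m_{\text{eq}} = m_f - m_u < 0$. The kinetic parameters are hypothetical; the point is only to show that the stability in water, the midpoint $C_m$, and the denaturation curve all follow from the same four numbers.

```python
import math

R = 8.314e-3  # gas constant, kJ/(mol K)
T = 298.15    # temperature, K


def folding_free_energy(d, dg_water, m_eq):
    """Linear extrapolation: dG([D]) = dG_water - m_eq * [D]."""
    return dg_water - m_eq * d


def fraction_unfolded(dg):
    """f_U = 1 / (1 + exp(-dG/RT)) for the folding free energy dG."""
    return 1.0 / (1.0 + math.exp(-dg / (R * T)))


# Hypothetical kinetic parameters (assumed, not measured)
kf_water, ku_water = 100.0, 0.01   # 1/s
m_f, m_u = -5.0, 2.0               # kJ/(mol M)

m_eq = m_f - m_u                                     # negative, kJ/(mol M)
dg_water = -R * T * math.log(kf_water / ku_water)    # stability in water from kinetics
c_m = dg_water / m_eq                                # midpoint, where dG = 0

for d in (0.0, 2.0, c_m, 5.0):
    dg = folding_free_energy(d, dg_water, m_eq)
    print(f"[D] = {d:4.2f} M  dG = {dg:7.2f} kJ/mol  f_U = {fraction_unfolded(dg):.4f}")
```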

Derivation 2: $\Phi$-Value Analysis

$\Phi$-value analysis, developed by Alan Fersht, is the most powerful experimental method for mapping the structure of the transition state (TS) at residue-level resolution.

Definition

For a mutation that destabilizes the native state by $\Delta\Delta G°$ and raises the folding barrier by $\Delta\Delta G^\ddagger$:

$$\boxed{\Phi = \frac{\Delta\Delta G^\ddagger}{\Delta\Delta G°}}$$

where the individual terms are computed from kinetic data:

$$\Delta\Delta G^\ddagger = -RT\ln\!\left(\frac{k_f^{\text{mut}}}{k_f^{\text{wt}}}\right)$$
$$\Delta\Delta G° = -RT\ln\!\left(\frac{K_{\text{eq}}^{\text{mut}}}{K_{\text{eq}}^{\text{wt}}}\right) = -RT\ln\!\left(\frac{k_f^{\text{mut}}/k_u^{\text{mut}}}{k_f^{\text{wt}}/k_u^{\text{wt}}}\right)$$

Interpretation

  • $\Phi = 1$: The mutation affects the TS as much as it affects the native state. The residue is fully structured (native-like) in the TS. The mutation slows folding but does not affect unfolding.
  • $\Phi = 0$: The mutation does not affect the TS at all. The residue is fully unstructured in the TS. The mutation accelerates unfolding but does not affect folding.
  • $0 < \Phi < 1$: Partial structure formation at the TS. Could reflect fractional native contacts, or an average over parallel pathways.
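
The three cases above can be reproduced with a short calculation from wild-type and mutant rate constants. This is a minimal sketch: the rate constants are invented, and the 2 kJ/mol cutoff simply mirrors the reliability threshold quoted in this section.

```python
import math

R = 8.314e-3      # gas constant, kJ/(mol K)
T = 298.15        # temperature, K
MIN_DDG_EQ = 2.0  # kJ/mol, reliability threshold quoted in the text


def phi_value(kf_wt, ku_wt, kf_mut, ku_mut):
    """Return (phi, ddG_ts, ddG_eq) from folding and unfolding rate constants."""
    ddg_ts = -R * T * math.log(kf_mut / kf_wt)
    ddg_eq = -R * T * math.log((kf_mut / ku_mut) / (kf_wt / ku_wt))
    if abs(ddg_eq) < MIN_DDG_EQ:
        raise ValueError("ddG_eq is too small for a reliable phi value")
    return ddg_ts / ddg_eq, ddg_ts, ddg_eq


# Hypothetical wild type and three hypothetical mutants
KF_WT, KU_WT = 100.0, 0.01   # 1/s
root10 = math.sqrt(10.0)

phi_one = phi_value(KF_WT, KU_WT, KF_WT / 10.0, KU_WT)[0]            # only folding slows
phi_zero = phi_value(KF_WT, KU_WT, KF_WT, KU_WT * 10.0)[0]           # only unfolding speeds up
phi_half = phi_value(KF_WT, KU_WT, KF_WT / root10, KU_WT * root10)[0]  # effect split evenly

print(f"phi = {phi_one:.2f}, {abs(phi_zero):.2f}, {phi_half:.2f}")
```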

Requirements for Reliable $\Phi$-Values

Valid $\Phi$-value analysis requires:

  • Conservative mutations (e.g., $\text{Val} \rightarrow \text{Ala}$, deletion of a methyl group) that remove non-covalent interactions without introducing new ones
  • Sufficiently large $\Delta\Delta G° > 2\;\text{kJ/mol}$ to avoid noise artifacts
  • The protein must remain two-state (no change in folding mechanism upon mutation)
  • Linear chevron plots without rollover (curvature indicates non-two-state behavior)

Key Findings from $\Phi$-Value Studies

Decades of $\Phi$-value analysis across many proteins have revealed that:

  • Transition states are heterogeneous but compact — generally 60–80% of native contacts are formed
  • The nucleus (high-$\Phi$ residues) is typically formed by residues from multiple secondary structure elements, confirming the nucleation-condensation mechanism
  • Proteins with similar topology tend to have similar $\Phi$-value patterns, supporting the idea that topology determines the folding mechanism

Derivation 3: Zimm-Bragg Helix-Coil Transition

The Zimm-Bragg model (1959) provides an exact statistical mechanical treatment of the helix-coil transition in polypeptides using a transfer matrix formalism.

The Two Parameters

  • $s$ (propagation parameter): The equilibrium constant for adding a helical residue to an existing helix: $\cdots\text{hh}\text{c} \rightleftharpoons \cdots\text{hh}\text{h}$. When $s > 1$, helix propagation is favorable.
  • $\sigma$ (nucleation parameter): The equilibrium-constant penalty for initiating a new helix segment: $\cdots\text{cc}\text{c} \rightleftharpoons \cdots\text{cc}\text{h}$. The statistical weight for nucleation is $\sigma s$. For polypeptides, $\sigma \approx 10^{-3}$ to $10^{-4}$, reflecting the entropic cost of fixing three consecutive residues to form the first H-bond.

Transfer Matrix Formulation

Each residue is in state c (coil) or h (helix). The statistical weight of each pair transition defines the transfer matrix:

$$\mathbf{M} = \begin{pmatrix} 1 & \sigma s \\ 1 & s \end{pmatrix}$$

The rows index the state of residue $i$ (top: c, bottom: h) and the columns index the state of residue $i+1$. The partition function for a chain of $N$ residues is:

$$Z = \mathbf{e}_1^T \cdot \mathbf{M}^N \cdot \mathbf{e}_2$$

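where $\mathbf{e}_1$ and $\mathbf{e}_2$ are boundary vectors that fix the chain ends. A common choice is $\mathbf{e}_1^T = (1, 0)$, so that the chain starts from a coil state, and $\mathbf{e}_2 = (1, 1)^T$, which sums over both states of the last residue.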

Eigenvalue Solution

The eigenvalues of $\mathbf{M}$ are:

$$\lambda_{1,2} = \frac{(1+s) \pm \sqrt{(1-s)^2 + 4\sigma s}}{2}$$

For large $N$, $Z \approx \lambda_1^N$ (the larger eigenvalue dominates), and the helix fraction is:

$$\boxed{\theta = \frac{1}{N}\frac{\partial \ln Z}{\partial \ln s} \approx \frac{1}{2}\left(1 + \frac{s - 1}{\sqrt{(1-s)^2 + 4\sigma s}}\right)}$$

Key features of the Zimm-Bragg model:

  • The transition midpoint occurs at $s = 1$
  • The sharpness of the transition is governed by $\sigma$: smaller $\sigma$ gives a sharper (more cooperative) transition
  • The width of the transition scales as $\Delta s \sim \sqrt{\sigma}$
  • Longer chains show sharper transitions (finite-size effects)
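
These features can be checked numerically by multiplying out the transfer matrix for a finite chain and comparing with the large-$N$ closed form. The sketch below assumes boundary vectors $\mathbf{e}_1^T = (1, 0)$ and $\mathbf{e}_2 = (1, 1)^T$, and it obtains $\theta$ from a finite-difference derivative of $\ln Z$; both choices are assumptions made for illustration.

```python
import math


def log_partition(n, s, sigma):
    """ln Z for n residues via repeated multiplication by the transfer matrix."""
    # Row vector (w_c, w_h); assumed boundary e1 = (1, 0): a coil state precedes residue 1
    w_c, w_h = 1.0, 0.0
    log_z = 0.0
    for _ in range(n):
        new_c = w_c + w_h                    # column c of M is (1, 1)
        new_h = w_c * sigma * s + w_h * s    # column h of M is (sigma*s, s)
        norm = new_c + new_h                 # rescale each step to avoid overflow
        w_c, w_h = new_c / norm, new_h / norm
        log_z += math.log(norm)
    # Assumed boundary e2 = (1, 1)^T sums both end states; the rescaled vector sums to 1
    return log_z


def helix_fraction_numeric(n, s, sigma, h=1e-4):
    """theta = (1/n) d ln Z / d ln s by central finite difference."""
    upper = log_partition(n, s * math.exp(h), sigma)
    lower = log_partition(n, s * math.exp(-h), sigma)
    return (upper - lower) / (2.0 * h * n)


def helix_fraction_closed(s, sigma):
    """Large-n closed form from the dominant eigenvalue."""
    return 0.5 * (1.0 + (s - 1.0) / math.sqrt((1.0 - s) ** 2 + 4.0 * sigma * s))


sigma = 1e-3
for s in (0.9, 1.0, 1.1):
    print(f"s = {s:.1f}  closed form = {helix_fraction_closed(s, sigma):.3f}  "
          f"N=2000: {helix_fraction_numeric(2000, s, sigma):.3f}  "
          f"N=50: {helix_fraction_numeric(50, s, sigma):.3f}")
```

The long chain tracks the closed form closely, while the short chain has a noticeably lower helix content at $s = 1$, since end effects and the nucleation penalty weigh more heavily when $N$ is small.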

Temperature Dependence

The propagation parameter $s$ depends on temperature through:

$$s(T) = \exp\!\left(-\frac{\Delta H_{\text{res}}}{R}\left(\frac{1}{T} - \frac{1}{T_m}\right)\right)$$

where $\Delta H_{\text{res}} \approx -4\;\text{kJ/mol}$ is the enthalpy change per residue for helix formation and $T_m$ is the melting temperature. At $T = T_m$, $s = 1$ and the helix fraction is 0.5.
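
Combining $s(T)$ with the closed-form helix fraction gives a thermal melting curve. In the sketch below, $\Delta H_{\text{res}}$ is the value quoted above, while the melting temperature and $\sigma$ are hypothetical choices.

```python
import math

R = 8.314e-3          # gas constant, kJ/(mol K)
DELTA_H_RES = -4.0    # kJ/mol per residue, value quoted in the text
T_M = 320.0           # K, hypothetical melting temperature
SIGMA = 1e-3          # nucleation parameter, within the quoted range


def propagation(temp):
    """s(T) = exp(-(dH_res/R) * (1/T - 1/T_m))."""
    return math.exp(-(DELTA_H_RES / R) * (1.0 / temp - 1.0 / T_M))


def helix_fraction(temp):
    """Large-N Zimm-Bragg helix fraction at temperature temp (K)."""
    s = propagation(temp)
    return 0.5 * (1.0 + (s - 1.0) / math.sqrt((1.0 - s) ** 2 + 4.0 * SIGMA * s))


for temp in (280.0, 300.0, 320.0, 340.0, 360.0):
    print(f"T = {temp:.0f} K  s = {propagation(temp):.3f}  theta = {helix_fraction(temp):.3f}")
```

Because helix formation is exothermic, $s > 1$ below $T_m$ and the helix melts as the temperature rises through $T_m$.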

Derivation 4: Chaperone Mechanisms

The GroEL/GroES System

GroEL is a tetradecameric (14-subunit) chaperonin arranged as two stacked heptameric rings, forming a barrel with a central cavity (~45 Å diameter). GroES is a heptameric co-chaperonin lid.

Mechanism (Iterative Annealing):

  • Capture: Unfolded/misfolded substrate binds to the hydrophobic inner surface of the open (trans) ring of GroEL
  • Encapsulation: ATP binding triggers a conformational change; GroES caps the ring, creating an enclosed, hydrophilic chamber (~65 Å diameter, ~175,000 Å$^3$ volume)
  • Folding: The substrate folds in the chamber for ~10 s (the time for ATP hydrolysis), protected from aggregation. The confined space may accelerate folding by limiting the conformational search
  • Release: ATP hydrolysis weakens GroES binding. ATP binding to the opposite (trans) ring triggers GroES and substrate release
  • Reiteration: If not yet native, the substrate can rebind for another round. This is the "iterative annealing" mechanism

The energetic cost is 7 ATP per folding cycle. Approximately 10–15% of E. coli proteins are GroEL substrates, primarily those with complex $\alpha/\beta$ topologies (TIM barrels, Rossmann folds).

The Hsp70 System

Hsp70 (DnaK in bacteria) is the most abundant cellular chaperone. It works with the co-chaperone Hsp40 (DnaJ) and nucleotide exchange factor (GrpE/BAG). The mechanism:

  • ATP-bound state: Substrate-binding domain is open (lid up), fast on/off kinetics, low affinity
  • ATP hydrolysis (stimulated by Hsp40): Lid closes, trapping the substrate. High affinity, slow off-rate
  • Nucleotide exchange (by GrpE/BAG): ADP is replaced by ATP, lid opens, substrate is released

Hsp70 recognizes short hydrophobic segments (~5 residues) that are exposed in unfolded or misfolded proteins. By repeatedly binding and releasing, Hsp70 prevents aggregation and gives the substrate multiple opportunities to fold correctly — another form of iterative annealing.

Kinetic Partitioning Model

The competition between productive folding and aggregation can be described by kinetic partitioning:

$$\text{Yield}_{\text{native}} = \frac{k_{\text{fold}}}{k_{\text{fold}} + k_{\text{agg}}[\text{U}]}$$

Chaperones act by reducing $k_{\text{agg}}$ (sequestering aggregation-prone intermediates) and effectively increasing the folding yield. This is particularly important during heat shock, where the concentration of unfolded proteins rises dramatically.
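
A small calculation shows how strongly the yield depends on the unfolded-protein concentration. The rate constants and the tenfold reduction of $k_{\text{agg}}$ attributed to chaperones are hypothetical values chosen to make the competition visible.

```python
def native_yield(k_fold, k_agg, unfolded_conc):
    """Fraction of chains reaching the native state under kinetic partitioning."""
    return k_fold / (k_fold + k_agg * unfolded_conc)


# Hypothetical rate constants (assumed for illustration)
K_FOLD = 1.0            # first-order folding rate, 1/s
K_AGG = 1.0e5           # second-order aggregation rate, 1/(M s)
CHAPERONE_FACTOR = 0.1  # assumed tenfold reduction of k_agg by chaperones

for conc in (1e-7, 1e-6, 1e-5, 1e-4):   # unfolded protein concentration, M
    without = native_yield(K_FOLD, K_AGG, conc)
    with_chaperone = native_yield(K_FOLD, K_AGG * CHAPERONE_FACTOR, conc)
    print(f"[U] = {conc:.0e} M  yield = {without:.3f}  with chaperone = {with_chaperone:.3f}")
```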

Applications: Protein Misfolding & Disease

Amyloid Diseases

Amyloidoses are characterized by the deposition of cross-$\beta$ fibrils — highly ordered, insoluble protein aggregates with a shared structural core: a stack of $\beta$-strands running perpendicular to the fibril axis with inter-strand H-bonds running parallel to the axis (the cross-$\beta$ motif).

Major Amyloid Diseases

  • Alzheimer's disease: $\text{A}\beta_{42}$ peptide and tau protein fibrils
  • Parkinson's disease: $\alpha$-synuclein Lewy body fibrils
  • Type 2 diabetes: IAPP (amylin) fibrils in pancreatic islets
  • Huntington's disease: polyglutamine (polyQ) expansion in huntingtin protein
  • ALS: SOD1, TDP-43, FUS aggregates
  • Systemic amyloidosis: immunoglobulin light chain (AL), transthyretin (ATTR)

The thermodynamic driving force for amyloid formation is that the cross-$\beta$ structure is often the global free energy minimum for polypeptide chains. Native protein structures are kinetically trapped metastable states separated from the amyloid state by large barriers.

Prion Diseases

Prions ($\text{PrP}^{\text{Sc}}$) are infectious misfolded forms of the prion protein ($\text{PrP}^{\text{C}}$). The protein-only hypothesis (Prusiner, Nobel 1997) states that $\text{PrP}^{\text{Sc}}$ propagates by templating the conversion of normal $\text{PrP}^{\text{C}}$ to the misfolded form. The conversion involves a dramatic structural rearrangement: $\text{PrP}^{\text{C}}$ is predominantly $\alpha$-helical, while $\text{PrP}^{\text{Sc}}$ is rich in $\beta$-sheet. The mechanism follows a nucleated polymerization model:

$$\text{Rate} = k_{\text{elong}}[\text{PrP}^{\text{C}}][\text{seeds}] + k_{\text{nuc}}[\text{PrP}^{\text{C}}]^n$$

where the first term describes elongation (fast) and the second describes de novo nucleation (extremely slow, accounting for long incubation periods).
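
To get a feel for the two terms, the sketch below evaluates the rate expression with and without pre-existing seeds. Every number here, including the nucleus size $n = 2$, is a hypothetical placeholder rather than a measured value.

```python
def conversion_rate(prp_c, seeds, k_elong, k_nuc, n):
    """Rate = k_elong*[PrP^C]*[seeds] + k_nuc*[PrP^C]**n, in M/s."""
    elongation = k_elong * prp_c * seeds
    nucleation = k_nuc * prp_c ** n
    return elongation + nucleation


# Hypothetical values for illustration only
PRP_C = 1.0e-6    # normal prion protein concentration, M
K_ELONG = 1.0e4   # elongation rate constant, 1/(M s)
K_NUC = 1.0e-2    # nucleation rate constant, M^(1-n)/s
N_NUCLEUS = 2     # assumed reaction order of nucleation

unseeded = conversion_rate(PRP_C, 0.0, K_ELONG, K_NUC, N_NUCLEUS)
seeded = conversion_rate(PRP_C, 1.0e-9, K_ELONG, K_NUC, N_NUCLEUS)
print(f"unseeded: {unseeded:.2e} M/s   seeded: {seeded:.2e} M/s")
```

Under these assumptions even nanomolar seeds raise the conversion rate by about three orders of magnitude, which is the qualitative point of the seeding term.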

Drug Design Targeting Misfolded Proteins

  • Kinetic stabilizers: Tafamidis stabilizes the native tetrameric form of transthyretin (TTR), preventing dissociation to monomers that form amyloid. FDA-approved for ATTR cardiomyopathy.
  • Anti-amyloid antibodies: Lecanemab and aducanumab target $\text{A}\beta$ aggregates. Lecanemab (FDA-approved 2023) shows modest but significant slowing of cognitive decline.
  • Chemical chaperones: Small molecules (e.g., 4-phenylbutyrate) that stabilize native protein conformations, investigated for cystic fibrosis (stabilizing $\Delta\text{F508}$ CFTR).
  • Aggregation inhibitors: Compounds that cap growing fibril ends or redirect aggregation pathways toward off-pathway, non-toxic species.

Python Simulation: Two-State Chevron Plot & Kinetics

This simulation generates the chevron plot showing how $\ln(k_{\text{obs}})$ varies with denaturant concentration, along with the equilibrium denaturation curve and the free energy dependence on [denaturant].

Two-State Folding: Chevron Plot, Free Energy, and Denaturation Curve

Python Simulation: Zimm-Bragg Helix-Coil Transition

Exploring the Zimm-Bragg model with the transfer matrix eigenvalue solution. The three panels show the effects of the nucleation parameter $\sigma$, chain length $N$, and temperature on the sharpness of the helix-coil transition.

Zimm-Bragg Helix-Coil Transition: Nucleation, Chain Length, and Temperature

Python Simulation: Folding Kinetics & $\Phi$-Value Analysis

This simulation shows: (1) two-state folding kinetics at different denaturant concentrations, (2) a simulated $\Phi$-value analysis scatter plot mapping transition state structure, and (3) free energy profiles comparing two-state, three-state, and downhill folding mechanisms.

Folding Kinetics, Phi-Value Analysis, and Free Energy Profiles

Folding Rate Predictors & Contact Order

Relative Contact Order (RCO)

Plaxco, Simons, and Baker (1998) discovered that the folding rates of two-state proteins correlate remarkably well with a simple topological property: the relative contact order.

$$\boxed{\text{RCO} = \frac{1}{L \cdot N_c}\sum_{i < j}^{N_c} |i - j|}$$

where $L$ is the total number of residues, $N_c$ is the number of native contacts, and $|i - j|$ is the sequence separation between contacting residues $i$ and $j$. The correlation with folding rate is:

$$\ln k_f \approx a - b \times \text{RCO}$$

with correlation coefficient $r \approx 0.8$. Proteins with predominantly local contacts ($\alpha$-helical, low RCO) fold faster than those with many non-local contacts ($\beta$-sheet rich, high RCO). This supports the idea that topology is the primary determinant of folding rate.
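
The RCO is easy to compute once a list of native contacts is available. The sketch below uses two invented 20-residue contact maps, one helix-like and one hairpin-like. How contacts are defined (distance cutoff, atom selection) is left open here and would have to be fixed for real structures.

```python
def relative_contact_order(contacts, length):
    """RCO = sum of |i - j| over native contacts, divided by (L * N_c)."""
    contacts = list(contacts)
    if not contacts:
        raise ValueError("at least one contact is required")
    total_separation = sum(abs(i - j) for i, j in contacts)
    return total_separation / (length * len(contacts))


LENGTH = 20  # residues in the hypothetical chain

# Helix-like map: each residue i contacts residue i + 4 (local contacts)
helix_contacts = [(i, i + 4) for i in range(1, 17)]

# Hairpin-like map: residue i pairs with residue 21 - i (non-local contacts)
hairpin_contacts = [(i, 21 - i) for i in range(1, 9)]

rco_helix = relative_contact_order(helix_contacts, LENGTH)
rco_hairpin = relative_contact_order(hairpin_contacts, LENGTH)
print(f"helix-like RCO = {rco_helix:.2f}, hairpin-like RCO = {rco_hairpin:.2f}")
```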

Chain Length Dependence

For two-state folders, the folding rate also depends on chain length:

$$\ln k_f \approx c - d \cdot L^{0.6}$$

The exponent $\sim 0.6$ is consistent with polymer theory: the conformational search time scales with the number of effective chain segments raised to a power related to the Flory exponent.

Nucleation-Condensation Mechanism

The dominant mechanism for two-state folding is nucleation-condensation (Fersht, 1995). In contrast to the earlier framework model (secondary structure forms first) and hydrophobic collapse model (collapse occurs first), nucleation-condensation proposes that:

  • A diffuse folding nucleus forms in the transition state, comprising elements of secondary and tertiary structure simultaneously
  • The nucleus is stabilized by a combination of local (secondary structure) and non-local (tertiary) interactions
  • Once the nucleus forms, the rest of the chain rapidly condenses around it
  • $\Phi$-values for nucleus residues are typically 0.3–0.7 (fractional, not fully native-like)

Diffusion-Collision Model

For larger proteins, the diffusion-collision model (Karplus and Weaver) describes folding as a hierarchical process: pre-formed microdomains (secondary structure elements) diffuse and collide to form the native tertiary structure. The rate depends on the diffusion rate of the microdomains and the probability that a collision is productive:

$$k_{\text{fold}} = k_{\text{diff}} \times P_{\text{productive}} \times P_{\text{correct}}$$

where $P_{\text{productive}}$ is the probability that colliding microdomains are correctly oriented and $P_{\text{correct}}$ accounts for the combinatorics of forming all necessary contacts.

Experimental Methods for Studying Folding

Stopped-Flow Kinetics

The workhorse for measuring folding/unfolding kinetics on the millisecond timescale. Two solutions (protein + denaturant at different concentrations) are rapidly mixed (dead time ~1 ms), and the signal (fluorescence, CD, absorbance) is monitored as a function of time. The observed rate constant is extracted by fitting to single or multi-exponential functions:

$$S(t) = S_\infty + \sum_i A_i \exp(-k_i t)$$
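
For a single exponential with a known plateau $S_\infty$, the rate constant can be recovered by a straight-line fit to $\ln(S - S_\infty)$. The sketch below applies this to a noise-free synthetic trace with invented parameters. It is a simplification: for noisy or multi-exponential data a nonlinear least-squares fit is usually preferable.

```python
import math


def fit_single_exponential(times, signal, s_inf):
    """Log-linear least squares for S(t) = s_inf + A*exp(-k*t); returns (A, k)."""
    xs, ys = [], []
    for t, s in zip(times, signal):
        residual = s - s_inf
        if residual > 0:                 # the log transform needs positive values
            xs.append(t)
            ys.append(math.log(residual))
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic trace with hypothetical parameters: A = 0.8, k = 12 /s, plateau 0.2
TRUE_A, TRUE_K, S_INF = 0.8, 12.0, 0.2
times = [0.005 * i for i in range(60)]   # 0 to 0.295 s
signal = [S_INF + TRUE_A * math.exp(-TRUE_K * t) for t in times]

a_fit, k_fit = fit_single_exponential(times, signal, S_INF)
print(f"fitted amplitude = {a_fit:.3f}, fitted rate = {k_fit:.3f} 1/s")
```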

Temperature Jump (T-Jump)

Ultrafast heating (nanoseconds) using infrared laser pulses or electrical discharge perturbs the folding equilibrium, enabling the study of folding dynamics on the microsecond timescale. The temperature change is typically 5–15°C. Combined with fluorescence or IR spectroscopy, T-jump can reveal the earliest events in folding: helix formation, hydrophobic collapse, and the formation of the folding nucleus.

Hydrogen/Deuterium Exchange (HDX)

Backbone amide hydrogens exchange with solvent D$_2$O at rates that depend on their structural environment. In a native protein, amides involved in H-bonds or buried in the core exchange slowly (protection factors of $10^3$ to $10^8$). The exchange rate is:

$$k_{\text{ex}} = k_{\text{int}} \cdot \frac{k_{\text{op}}}{k_{\text{op}} + k_{\text{cl}}} \approx \frac{k_{\text{int}} \cdot k_{\text{op}}}{k_{\text{cl}}} = \frac{k_{\text{int}}}{P_f}$$

where $k_{\text{int}}$ is the intrinsic (unprotected) exchange rate, $k_{\text{op}}$ and $k_{\text{cl}}$ are the local opening and closing rates, and the protection factor $P_f = k_{\text{cl}}/k_{\text{op}}$. Under EX2 conditions (most physiological):

$$\Delta G_{\text{HDX}} = -RT\ln\!\left(\frac{k_{\text{ex}}}{k_{\text{int}}}\right) = RT \ln P_f$$

HDX monitored by NMR provides residue-level information; HDX-MS provides peptide-level resolution for larger proteins and complexes.
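
In the EX2 limit the conversion from measured exchange rates to an opening free energy is a one-line calculation. The rates below are hypothetical, picked so that the protection factor lands inside the range quoted above.

```python
import math

R = 8.314e-3  # gas constant, kJ/(mol K)
T = 298.15    # temperature, K


def protection_factor(k_int, k_ex):
    """P_f = k_int / k_ex (EX2 limit)."""
    return k_int / k_ex


def opening_free_energy(k_int, k_ex):
    """dG_HDX = RT ln P_f, in kJ/mol."""
    return R * T * math.log(protection_factor(k_int, k_ex))


# Hypothetical amide: intrinsic exchange 10 /s, observed exchange 1e-4 /s
p_f = protection_factor(10.0, 1.0e-4)
dg_hdx = opening_free_energy(10.0, 1.0e-4)
print(f"P_f = {p_f:.1e}, dG_HDX = {dg_hdx:.1f} kJ/mol")
```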

Single-Molecule FRET

Fluorescence resonance energy transfer between donor and acceptor dyes attached to specific sites on the protein reports on intramolecular distances in real time. The FRET efficiency is:

$$E = \frac{1}{1 + (r/R_0)^6}$$

where $r$ is the donor-acceptor distance and $R_0$ is the Förster radius (typically 40–60 Å). Single-molecule experiments reveal conformational heterogeneity and rare folding intermediates that are hidden in ensemble-averaged experiments.
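
The steep sixth-power dependence is easiest to appreciate numerically. The sketch below evaluates the efficiency at a few distances and inverts the relation to recover a distance from a measured efficiency. The Förster radius of 50 Å is an assumed value within the typical range.

```python
def fret_efficiency(r, r0):
    """E = 1 / (1 + (r/R0)^6), with r and r0 in the same units."""
    return 1.0 / (1.0 + (r / r0) ** 6)


def distance_from_efficiency(e, r0):
    """Invert the efficiency relation: r = R0 * (1/E - 1)^(1/6)."""
    if not 0.0 < e < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0 * (1.0 / e - 1.0) ** (1.0 / 6.0)


R0 = 50.0  # assumed Forster radius, angstroms

for r in (25.0, 40.0, 50.0, 60.0, 75.0):
    print(f"r = {r:4.1f} A  E = {fret_efficiency(r, R0):.3f}")

recovered = distance_from_efficiency(fret_efficiency(35.0, R0), R0)
print(f"round trip at 35 A: {recovered:.2f} A")
```

The efficiency changes most rapidly near $r = R_0$, which is why dye pairs are chosen so that the distances of interest bracket the Förster radius.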

Key Equations Summary

Two-State Equilibrium

$$K = \frac{[\text{U}]}{[\text{N}]} = \frac{k_u}{k_f} = \exp\!\left(\frac{\Delta G}{RT}\right)$$

$\Phi$-Value Definition

$$\Phi = \frac{\Delta\Delta G^\ddagger}{\Delta\Delta G°} = \frac{-RT\ln(k_f^{\text{mut}}/k_f^{\text{wt}})}{-RT\ln(K_{\text{eq}}^{\text{mut}}/K_{\text{eq}}^{\text{wt}})}$$

Zimm-Bragg Helix Fraction

$$\theta \approx \frac{1}{2}\left(1 + \frac{s - 1}{\sqrt{(1-s)^2 + 4\sigma s}}\right)$$

Tanford $\beta$-Value

$$\beta_T = \frac{m_f}{m_f - m_u}$$

FRET Efficiency

$$E = \frac{1}{1 + (r/R_0)^6}$$

Relative Contact Order

$$\text{RCO} = \frac{1}{L \cdot N_c}\sum_{i < j}^{N_c} |i - j|$$