Protein Structure & Folding

How do linear chains of amino acids spontaneously fold into precise three-dimensional structures that perform all the functions of life? Explore the physics, chemistry, and quantum mechanics of protein folding.

🧬 Primary to Quaternary Structure · 📐 Anfinsen's Principle · ⏱️ Levinthal's Paradox · 🌄 Energy Landscapes

The Protein Folding Problem

Proteins are the workhorses of biology: enzymes, structural elements, transporters, signaling molecules, and more. Each protein is a polymer built from a set of 20 standard amino acids, and its sequence determines its structure, which in turn determines its function. But how does a protein "know" how to fold?

Anfinsen's Dogma (1973)

Christian Anfinsen demonstrated that the native structure of a protein is determined by its amino acid sequence and corresponds to the minimum of free energy under physiological conditions.

His experiments with ribonuclease A showed that the denatured enzyme can spontaneously refold to its native, active state, indicating that all the information needed for folding is encoded in the sequence.

Levinthal's Paradox (1969)

If a protein were to sample all possible conformations randomly, even a small protein with 100 residues would require > 10²⁷ years to find the correct fold. Yet proteins fold in microseconds to seconds!

Resolution: Proteins do not search randomly—they follow folding pathways guided by energy landscapes.
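A back-of-the-envelope check of the Levinthal estimate. The parameters are assumptions borrowed from the usual textbook version of the argument, not from the text above: three backbone conformations per residue and 10¹³ conformations sampled per second (roughly one per molecular vibration). The function name is hypothetical.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600

def levinthal_search_time_years(n_residues, states_per_residue=3, sampling_rate_hz=1e13):
    """Years needed to visit every backbone conformation one at a time.

    Assumed (illustrative) parameters: states_per_residue conformations per residue,
    sampled at sampling_rate_hz conformations per second.
    """
    n_conformations = states_per_residue ** n_residues   # exact integer count
    seconds = n_conformations / sampling_rate_hz
    return seconds / SECONDS_PER_YEAR

t = levinthal_search_time_years(100)
print(f"100 residues: ~10^{math.log10(t):.1f} years")
```

Changing the assumed number of states per residue changes the exponent but not the conclusion: with only two states per residue, the same calculation still gives about 4 × 10⁹ years.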

📺 Video Lectures

Comprehensive lecture series from MIT 5.08J Biological Chemistry II with Prof. Elizabeth Nolan, covering protein folding mechanisms, kinetics, energy landscapes, and experimental techniques.

Lecture 8: Protein Folding 1

Introduction to protein folding: primary through quaternary structure, forces stabilizing folded proteins, and the thermodynamics of folding. Prof. Nolan explores Anfinsen's principle and Levinthal's paradox.

Recitation 3: Pre-Steady State and Steady-State Kinetic Methods Applied to Translation

Application of kinetic methods to study biological processes, focusing on translation machinery. Essential background for understanding protein folding kinetics and experimental approaches.

Lecture 9: Protein Folding 2

Energy landscapes, folding funnels, and kinetic pathways. Discussion of molten globule states, folding intermediates, and the role of conformational entropy in guiding folding.

Lecture 10: Protein Folding 3

Experimental methods for studying protein folding: circular dichroism, fluorescence spectroscopy, hydrogen-deuterium exchange, and NMR. Introduction to chaperones and protein quality control.

Lecture 11: Protein Folding 4

Molecular chaperones (GroEL/GroES, Hsp70, Hsp90), protein misfolding diseases (Alzheimer's, Parkinson's, prion diseases), and the cellular protein quality control system. Proteostasis and aggregation.

Connecting Molecular Details to Macroscopic Behaviors with Thermodynamics and Information Theory

An advanced lecture exploring how thermodynamic principles and information theory bridge molecular-scale details (amino acid sequences, atomic interactions, conformational states) to macroscopic observables (folding rates, stability, function). This connects statistical mechanics, entropy, free energy landscapes, and information content in sequences to the emergent behavior of protein systems.

Key Concepts:

  • Statistical mechanics of protein ensembles
  • Thermodynamic principles governing folding transitions
  • Information theory and sequence entropy
  • Bridging microscopic and macroscopic descriptions
  • Free energy landscapes from molecular interactions
  • Evolutionary information in protein families

Course Information

MIT 5.08J Biological Chemistry II
Instructor: Prof. Elizabeth Nolan
These lectures provide comprehensive coverage of protein folding from thermodynamic, kinetic, and structural perspectives, essential for understanding how sequence determines structure and function.

Four Levels of Protein Structure

1. Primary Structure

The linear sequence of amino acids connected by peptide bonds. This is the genetic information translated from mRNA.

NH₂ - Gly - Ala - Val - Leu - ... - Tyr - Trp - COOH

The sequence completely determines all higher levels of structure (Anfinsen's principle).

2. Secondary Structure

Local folding patterns stabilized by hydrogen bonds between backbone atoms:

  • α-helix: Right-handed helix with 3.6 residues per turn (φ ≈ -60°, ψ ≈ -45°)
  • β-sheet: Extended strands connected by hydrogen bonds (parallel or antiparallel)
  • β-turn: Reversal of chain direction, often connecting β-strands
  • Random coil: Unstructured regions with no regular pattern

Ramachandran angles define allowed conformations:

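$\phi_i$: dihedral angle defined by atoms $\text{C}_{i-1}, \text{N}_i, \text{C}_{\alpha,i}, \text{C}_i$ (rotation about the $\text{N}_i$–$\text{C}_{\alpha,i}$ bond)
$\psi_i$: dihedral angle defined by atoms $\text{N}_i, \text{C}_{\alpha,i}, \text{C}_i, \text{N}_{i+1}$ (rotation about the $\text{C}_{\alpha,i}$–$\text{C}_i$ bond)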

Dihedral angles defining backbone conformation
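The four-atom definitions translate directly into a torsion-angle calculation. Below is a minimal pure-Python sketch, assuming Cartesian coordinates given as (x, y, z) tuples; the helper names are hypothetical. It projects the two outer bonds onto the plane perpendicular to the central bond and takes the signed angle between the projections, following the IUPAC sign convention (positive when, viewed along the central bond, the near bond must turn clockwise to eclipse the far bond).

```python
import math

def _sub(a, b):
    return tuple(ai - bi for ai, bi in zip(a, b))

def _dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees, in [-180, 180], defined by four atom positions.

    phi_i: dihedral(C_{i-1}, N_i, CA_i, C_i); psi_i: dihedral(N_i, CA_i, C_i, N_{i+1}).
    """
    b0 = _sub(p0, p1)                 # first bond, pointing back toward p0
    b1 = _sub(p2, p1)                 # central bond (rotation axis)
    b2 = _sub(p3, p2)                 # last bond
    length = math.sqrt(_dot(b1, b1))
    axis = tuple(c / length for c in b1)
    # Remove the components along the axis, leaving projections onto the perpendicular plane
    v = _sub(b0, tuple(_dot(b0, axis) * c for c in axis))
    w = _sub(b2, tuple(_dot(b2, axis) * c for c in axis))
    x = _dot(v, w)                    # cosine component
    y = _dot(_cross(axis, v), w)      # sine component (sets the sign)
    return math.degrees(math.atan2(y, x))

# Toy coordinates with the central bond along z: cis, trans, and a +90 degree twist
for p3 in [(1, 0, 1), (-1, 0, 1), (0, 1, 1)]:
    print(p3, round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), p3), 6))
```

With real backbone coordinates (for example, parsed from a PDB file), calling this for every residue yields the (φ, ψ) pairs plotted on a Ramachandran diagram.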

Derivation: Zimm-Bragg Helix-Coil Transition

The Zimm-Bragg model describes the cooperative transition between random coil and alpha-helix using a nucleation-propagation framework with two parameters.

Step 1: Define the nucleation and propagation parameters

Each residue is either in coil (c) or helix (h) state. Define the propagation parameter $s$ (equilibrium constant for adding a helical residue to an existing helix) and the nucleation parameter $\sigma$ (penalty for initiating a new helical segment, typically $\sigma \sim 10^{-4}$ to $10^{-3}$):

$$s = \exp\left(-\frac{\Delta G_{\text{prop}}}{k_BT}\right), \quad \sigma = \exp\left(-\frac{\Delta G_{\text{nuc}}}{k_BT}\right)$$

Step 2: Construct the transfer matrix

The statistical weight of each residue depends on its own state and that of its predecessor. The $2\times 2$ transfer matrix $\mathbf{M}$ collects these conditional weights:

$$\mathbf{M} = \begin{pmatrix} 1 & \sigma s \\ 1 & s \end{pmatrix}$$

where rows represent the current state (c, h) and columns the next state (c, h).

Step 3: Write the partition function

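For a chain of N residues, the partition function is obtained by matrix multiplication:

$$Z_N = (1, 0) \cdot \mathbf{M}^N \cdot \begin{pmatrix} 1 \\ 1 \end{pmatrix}$$

The row vector $(1, 0)$ starts the chain from a fictitious coil residue, so the first helical residue pays the nucleation weight $\sigma s$, and the column vector sums over both possible states of the last residue.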

Step 4: Solve via eigenvalues

For large N, $Z_N$ is dominated by the largest eigenvalue $\lambda_+$ of $\mathbf{M}$:

$$\lambda_{\pm} = \frac{(1+s) \pm \sqrt{(1-s)^2 + 4\sigma s}}{2}$$

Step 5: Calculate the helical fraction

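Each helical residue contributes one factor of $s$, so the mean number of helical residues is $\partial \ln Z_N / \partial \ln s$. Dividing by N and using $Z_N \approx \lambda_+^N$ for large N gives the helical fraction:

$$\theta = \frac{1}{N}\frac{\partial \ln Z_N}{\partial \ln s} \approx \frac{\partial \ln \lambda_+}{\partial \ln s} = \frac{1}{2} + \frac{s - 1}{2\sqrt{(1-s)^2 + 4\sigma s}}$$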

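The transition is centered at $s = 1$, where $\theta = 1/2$, and its sharpness is controlled by $\sigma$: the slope there is $d\theta/ds = 1/(4\sqrt{\sigma})$, so smaller $\sigma$ gives a more cooperative (sharper) transition. Small values of $\sigma$ reflect the free-energy cost of nucleation: several residues must be fixed in helical conformation before the first backbone hydrogen bond can form.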
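A minimal numerical check of Steps 3 to 5, assuming an illustrative $\sigma = 10^{-3}$ and a long chain. It evaluates $Z_N$ by repeatedly multiplying the row vector by $\mathbf{M}$ (rescaling at each step to avoid overflow), takes $\theta$ from a central finite difference of $\ln Z_N$ with respect to $\ln s$, and compares it with the large-N formula. The function names are hypothetical.

```python
import math

def log_partition_function(s, sigma, N):
    """ln Z_N for Z_N = (1, 0) . M^N . (1, 1)^T with M = [[1, sigma*s], [1, s]]."""
    c, h = 1.0, 0.0          # row vector: start from a fictitious coil residue
    log_z = 0.0
    for _ in range(N):
        # Multiply the row vector by M: coil <- coil + helix, helix <- sigma*s*coil + s*helix
        c, h = c + h, sigma * s * c + s * h
        norm = c + h         # rescale to keep numbers finite; accumulate the log of the scale
        c, h = c / norm, h / norm
        log_z += math.log(norm)
    return log_z             # the final (c + h) equals 1 after rescaling

def helix_fraction_finite(s, sigma, N, d=1e-4):
    """theta = (1/N) d ln Z_N / d ln s, via a central finite difference in ln s."""
    up = log_partition_function(s * math.exp(d), sigma, N)
    down = log_partition_function(s * math.exp(-d), sigma, N)
    return (up - down) / (2 * d * N)

def helix_fraction_large_N(s, sigma):
    """Large-N result: theta = 1/2 + (s - 1) / (2 sqrt((1 - s)^2 + 4 sigma s))."""
    return 0.5 + (s - 1) / (2 * math.sqrt((1 - s) ** 2 + 4 * sigma * s))

for s in (0.9, 1.0, 1.1):
    print(s, helix_fraction_finite(s, 1e-3, 20000), helix_fraction_large_N(s, 1e-3))
```

Near $s = 1$ the finite-chain value differs from the large-N formula by an amount of order 1/N, because the chain ends contribute a fixed correction to $\ln Z_N$; that is why a long chain is used here.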

Derivation: Polymer Scaling Laws for Protein Chains

The end-to-end distance of an unfolded protein chain follows scaling laws from polymer physics, with the exponent $\nu$ depending on the chain model and solvent conditions.

Step 1: The ideal (Gaussian) chain — random walk

Model the unfolded protein as N freely jointed segments of length b. Each step is independent and random. The mean-square end-to-end distance is the sum of uncorrelated steps:

$$\langle R^2 \rangle = \sum_{i=1}^{N}\sum_{j=1}^{N}\langle \mathbf{b}_i \cdot \mathbf{b}_j \rangle = Nb^2 \quad (\text{since } \langle \mathbf{b}_i \cdot \mathbf{b}_j \rangle = b^2\delta_{ij})$$

Step 2: Extract the scaling exponent for the ideal chain

The root-mean-square end-to-end distance scales as:

$$R = \sqrt{\langle R^2 \rangle} = bN^{1/2} \implies R \sim N^{\nu} \text{ with } \nu = \frac{1}{2}$$

This applies when the chain has no net excluded volume, for example in a theta solvent, where solvent-mediated attraction between monomers exactly balances their steric repulsion so that the chain behaves as ideal.

Step 3: Self-avoiding walk — the Flory argument

Real chains cannot overlap. Flory balanced the entropic elasticity (which favors $R \sim N^{1/2}$) against the excluded volume repulsion energy. The free energy as a function of R:

$$F(R) \sim k_BT\left(\frac{R^2}{Nb^2} + \frac{vN^2}{R^3}\right)$$

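where $v$ is the excluded volume per monomer. The first term is the entropic elastic free energy of stretching; the second is a mean-field estimate of the excluded-volume repulsion: $N^2$ monomer pairs, each overlapping with probability of order $v/R^3$.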

Step 4: Minimize to find the Flory exponent

Setting $\partial F/\partial R = 0$:

$$\frac{2R}{Nb^2} - \frac{3vN^2}{R^4} = 0 \implies R^5 \sim vN^3b^2$$

$$R \sim N^{3/5} \implies \nu = \frac{3}{5} = 0.6$$

Step 5: Biological significance

Renormalization-group calculations give $\nu \approx 0.588$ in 3D, close to Flory's estimate. For denatured proteins in good solvent, $R_g \approx 2.0 \times N^{0.59}$ Å. In poor solvent the chain collapses into a compact globule with $\nu = 1/3$. Folding therefore involves a change from $\nu \approx 0.6$ scaling (unfolded ensemble) to compact, $\nu \approx 1/3$ scaling (molten globule and native state) as the chain moves down the folding funnel.
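Steps 1 and 2 can be checked with a quick Monte Carlo sketch of the ideal chain: N bonds of fixed length b with independent, uniformly random directions (a freely jointed chain with no excluded volume). Fitting $\ln R$ against $\ln N$ should recover $\nu \approx 1/2$; reproducing $\nu \approx 0.588$ would require adding self-avoidance, which this sketch deliberately omits. Function names, chain lengths, and sample sizes are illustrative choices.

```python
import math
import random

def random_unit_vector(rng):
    """Uniformly distributed direction in 3D (uniform z, uniform azimuth)."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return r * math.cos(phi), r * math.sin(phi), z

def mean_square_end_to_end(N, b=1.0, samples=1000, seed=0):
    """Monte Carlo estimate of <R^2> for a freely jointed chain of N bonds of length b."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        x = y = z = 0.0
        for _ in range(N):
            dx, dy, dz = random_unit_vector(rng)
            x += b * dx
            y += b * dy
            z += b * dz
        total += x * x + y * y + z * z
    return total / samples

def fit_exponent(Ns, Rs):
    """Least-squares slope of ln R versus ln N, i.e. the scaling exponent nu."""
    xs = [math.log(n) for n in Ns]
    ys = [math.log(r) for r in Rs]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den

Ns = [16, 32, 64, 128, 256]
Rs = [math.sqrt(mean_square_end_to_end(n)) for n in Ns]
print("fitted nu =", round(fit_exponent(Ns, Rs), 3))
```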

3. Tertiary Structure

The overall 3D shape of a single polypeptide chain, determined by:

  • Disulfide bonds (S-S): Covalent bonds between cysteine residues
  • Hydrophobic effect: Nonpolar residues buried in core, polar residues on surface
  • Hydrogen bonds: Between side chains and backbone
  • Electrostatic interactions: Salt bridges between charged residues
  • Van der Waals forces: Weak attractions between atoms in close proximity

Free energy of folding:

$\Delta G_{\text{fold}} = \Delta H - T\Delta S$

Typically $\Delta G_{\text{fold}} \approx -5$ to $-15$ kcal/mol (marginally stable!)

The balance between favorable interactions (enthalpic contacts plus the solvent entropy gained by burying nonpolar groups) and the entropic cost of restricting the chain's conformational freedom is delicate, so proteins are only marginally stable.
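Marginal stability still means an overwhelmingly folded population. Assuming two-state behavior with $K = [\text{N}]/[\text{U}] = \exp(-\Delta G_{\text{fold}}/RT)$ (molar units, so $R$ replaces $k_B$), this sketch converts the quoted range into equilibrium populations at 25 °C. The function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def fraction_unfolded(dG_fold_kcal, T=298.15):
    """Two-state fraction unfolded, 1 / (1 + K), with K = [N]/[U] = exp(-dG_fold / RT)."""
    K = math.exp(-dG_fold_kcal / (R_KCAL * T))
    return 1.0 / (1.0 + K)

for dG in (-5.0, -10.0, -15.0):
    print(f"dG_fold = {dG:5.1f} kcal/mol -> fraction unfolded = {fraction_unfolded(dG):.1e}")
```

At -5 kcal/mol roughly 2 molecules in 10,000 are unfolded at any instant; because $RT \approx 0.6$ kcal/mol, each further ~1.4 kcal/mol of stability lowers that fraction about tenfold.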

4. Quaternary Structure

The assembly of multiple polypeptide chains (subunits) into a functional complex.

Examples:

  • Hemoglobin: α₂β₂ tetramer (4 subunits) — oxygen transport
  • DNA polymerase III: 10+ subunits — DNA replication
  • Proteasome: 28 subunits — protein degradation
  • Ribosome: 50+ proteins + RNA — protein synthesis

Subunit interactions provide allosteric regulation, increased stability, and functional diversity.

Energy Landscapes and Folding Funnels

Modern understanding of protein folding views the process as navigation through a funnel-shaped energy landscape. This explains how proteins fold quickly despite the astronomical number of possible conformations.

The Folding Funnel Model

Instead of a single folding pathway, proteins fold via many parallel routes through conformational space, all leading downhill toward the native state:

  • Unfolded ensemble: High entropy, high energy, many conformations
  • Molten globule: Partially collapsed with some secondary structure
  • Transition states: Rate-limiting barriers along folding pathways
  • Native state: Low entropy, low energy, unique structure

Conformational entropy vs. energy:

$S_{\text{conf}} = k_B \ln \Omega(\mathbf{r})$

Ω(r) = number of microscopic conformations consistent with a given value of a coarse-grained coordinate r (for example, the fraction of native contacts)

$F(\mathbf{r}) = E(\mathbf{r}) - TS_{\text{conf}}(\mathbf{r})$

Free energy surface: folding proceeds downhill in F, not just E
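To see why folding is governed by F rather than E alone, consider a deliberately simple toy model (a hypothetical illustration, not a model from the text): each of N residues is either native, lowering the energy by ε, or in one of g − 1 non-native states. For n native residues, $\Omega(n) = \binom{N}{n}(g-1)^{N-n}$, and the minimum of $F(n) = E(n) - TS_{\text{conf}}(n)$ moves from the fully native state at low temperature to a mostly non-native ensemble at high temperature. Because residues are independent here, the model has no folding barrier; real cooperative folders add one.

```python
import math

def toy_free_energy(n, N, g, eps, kT):
    """F(n) = E(n) - T*S_conf(n) for n native residues out of N (hypothetical toy model).

    E(n) = -eps * n, and S_conf(n) = k_B ln Omega(n) with
    Omega(n) = C(N, n) * (g - 1)**(N - n).
    """
    energy = -eps * n
    ln_omega = (math.lgamma(N + 1) - math.lgamma(n + 1) - math.lgamma(N - n + 1)
                + (N - n) * math.log(g - 1))
    return energy - kT * ln_omega

def most_probable_native_count(N, g, eps, kT):
    """Value of n that minimizes F(n), i.e. the most populated macrostate."""
    return min(range(N + 1), key=lambda n: toy_free_energy(n, N, g, eps, kT))

for kT in (0.1, 0.5, 1.0, 10.0):
    print(kT, most_probable_native_count(N=100, g=10, eps=1.0, kT=kT))
```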

Folding Kinetics

Two-state folding model (applicable to small, single-domain proteins):

$$\text{U} \underset{k_u}{\overset{k_f}{\rightleftharpoons}} \text{N}$$

U = unfolded, N = native, $k_f$ = folding rate constant, $k_u$ = unfolding rate constant

$$k_f = k_0 \exp\left(-\frac{\Delta G^\ddagger}{k_B T}\right)$$

Arrhenius/Eyring-type equation: the folding rate depends exponentially on the transition-state barrier height $\Delta G^\ddagger$; $k_0$ is the pre-exponential attempt frequency

Typical folding times range from microseconds (ultra-fast folders) to seconds (complex multi-domain proteins).

Derivation: Two-State Folding Rate from the Free Energy Landscape

Starting from the free energy landscape picture, we derive the folding rate constant for a two-state folder that crosses a single dominant barrier.

Step 1: Define the two-state equilibrium

For a protein that folds without detectable intermediates, the equilibrium between unfolded (U) and native (N) states is characterized by:

$$K_{eq} = \frac{[\text{N}]}{[\text{U}]} = \frac{k_f}{k_u} = \exp\left(-\frac{\Delta G_{\text{fold}}}{k_BT}\right)$$

Step 2: Apply Kramers' theory for barrier crossing

In the high-friction (overdamped) regime relevant to protein folding in aqueous solution, the rate of barrier crossing is given by Kramers' theory. The rate depends on the curvature at the well and the barrier top:

$$k_f = \frac{\omega_U \omega^{\ddagger}}{2\pi\gamma}\exp\left(-\frac{\Delta G^{\ddagger}_{U \to N}}{k_BT}\right)$$

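where $\omega_U$ and $\omega^{\ddagger}$ are the angular frequencies characterizing the curvature of the unfolded well and of the barrier top, and $\gamma$ is the friction (damping) rate, i.e., the friction coefficient divided by the effective mass.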

Step 3: Simplify to the Arrhenius-like form

Collecting the prefactor terms into a single attempt frequency $k_0$ (typically $10^5$ to $10^7$ s$^{-1}$ for protein folding):

$$k_f = k_0 \exp\left(-\frac{\Delta G^{\ddagger}}{k_BT}\right)$$

Step 4: Relate barrier height to folding speed

For a barrier of $\Delta G^{\ddagger} \approx 5 k_BT$ (a fast folder) with $k_0 \approx 10^6$ s$^{-1}$: $k_f \approx 10^6 \times e^{-5} \approx 6700$ s$^{-1}$, giving a folding time of ~150 $\mu$s. For a barrier of $\Delta G^{\ddagger} \approx 15 k_BT$: $k_f \approx 10^6 \times e^{-15} \approx 0.3$ s$^{-1}$, giving a folding time of ~3 s.
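The arithmetic in Step 4 can be reproduced in a few lines, using the same attempt frequency $k_0 = 10^6$ s$^{-1}$ and expressing the barrier in units of $k_BT$. The function name is hypothetical.

```python
import math

def folding_rate_and_time(barrier_in_kT, k0=1e6):
    """k_f = k0 * exp(-dG_barrier / k_B T) in s^-1, and the folding time 1/k_f in s."""
    kf = k0 * math.exp(-barrier_in_kT)
    return kf, 1.0 / kf

for barrier in (5, 10, 15):
    kf, tau = folding_rate_and_time(barrier)
    print(f"barrier {barrier:2d} k_BT: k_f = {kf:.3g} s^-1, tau = {tau:.3g} s")
```

Every additional $k_BT$ of barrier slows folding by a factor of $e \approx 2.7$, so a 10 $k_BT$ difference spans more than four orders of magnitude in folding time.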

Step 5: The observed relaxation rate

In a perturbation experiment (e.g., temperature jump), the observed relaxation rate to equilibrium is the sum of folding and unfolding rates:

$$k_{\text{obs}} = k_f + k_u$$

The fraction folded relaxes as $f_N(t) = f_N^{eq} + (f_N(0) - f_N^{eq})e^{-k_{\text{obs}}t}$, giving a single-exponential decay characteristic of two-state kinetics.
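A short numerical check that relaxation after a perturbation proceeds at $k_{\text{obs}} = k_f + k_u$ rather than at $k_f$ alone. The sketch integrates the two-state rate equation $df_N/dt = k_f(1 - f_N) - k_u f_N$ with a classic fourth-order Runge-Kutta step and compares the result with the single-exponential solution. The rate constants are illustrative values, not data, and the function names are hypothetical.

```python
import math

def integrate_two_state(kf, ku, fN0, t_end, n_steps=10000):
    """RK4 integration of d f_N/dt = kf*(1 - f_N) - ku*f_N from f_N(0) = fN0 to t_end."""
    def rate(f):
        return kf * (1.0 - f) - ku * f
    dt = t_end / n_steps
    f = fN0
    for _ in range(n_steps):
        k1 = rate(f)
        k2 = rate(f + 0.5 * dt * k1)
        k3 = rate(f + 0.5 * dt * k2)
        k4 = rate(f + dt * k3)
        f += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return f

def analytic_two_state(kf, ku, fN0, t):
    """f_N(t) = f_eq + (f_N(0) - f_eq) exp(-(kf + ku) t), with f_eq = kf / (kf + ku)."""
    f_eq = kf / (kf + ku)
    return f_eq + (fN0 - f_eq) * math.exp(-(kf + ku) * t)

kf, ku = 100.0, 10.0   # illustrative rate constants in s^-1
for t in (0.005, 0.01, 0.05):
    print(t, integrate_two_state(kf, ku, 0.0, t), analytic_two_state(kf, ku, 0.0, t))
```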

Derivation: Arrhenius Temperature Dependence of Folding Rates

The Arrhenius equation describes how reaction rates depend on temperature, providing a powerful tool for extracting activation energies from experimental kinetic data.

Step 1: Start from the Eyring/Kramers rate expression

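$$k = k_0 \exp\left(-\frac{\Delta G^{\ddagger}}{k_BT}\right) = k_0\, e^{\Delta S^{\ddagger}/k_B} \exp\left(-\frac{\Delta H^{\ddagger}}{k_BT}\right) = A \exp\left(-\frac{E_a}{RT}\right)$$

where $\Delta G^{\ddagger} = \Delta H^{\ddagger} - T\Delta S^{\ddagger}$, $A = k_0\, e^{\Delta S^{\ddagger}/k_B}$ is the pre-exponential (frequency) factor, and $E_a$ is the molar activation energy, equal to $N_A \Delta H^{\ddagger}$ when $k_0$, $\Delta H^{\ddagger}$, and $\Delta S^{\ddagger}$ do not depend on temperature.

Step 2: Take the natural logarithm of both sides

$$\ln(k) = \ln(A) - \frac{E_a}{R}\cdot\frac{1}{T}$$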

Step 3: Identify the Arrhenius plot

Plotting $\ln(k)$ vs. $1/T$ yields a straight line with slope $= -E_a/R$ and y-intercept $= \ln(A)$. For protein folding, this plot is often curved (non-Arrhenius behavior) due to the large heat capacity change $\Delta C_p^{\ddagger}$ associated with burying hydrophobic surface area.

Step 4: Extract activation energy from two temperatures

Measuring rates at two temperatures $T_1$ and $T_2$ allows direct calculation:

$$E_a = \frac{R \ln(k_2/k_1)}{1/T_1 - 1/T_2}$$

For typical small protein folding, $E_a \approx 30$ to $80$ kJ/mol, corresponding to roughly a 1.5- to 3-fold rate increase per 10 K rise near room temperature.
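The two-temperature formula in Step 4 can be checked numerically. This sketch (function names hypothetical) recovers $E_a$ from two synthetic rates generated with an assumed $E_a$ and pre-exponential factor, then computes the fold change in rate for a 10 K rise near room temperature. It assumes strictly Arrhenius behavior, so it ignores the curvature from $\Delta C_p^{\ddagger}$ mentioned in Step 3.

```python
import math

R = 8.314462618  # gas constant, J/(mol*K)

def arrhenius_rate(A, Ea, T):
    """k = A exp(-Ea / RT), with Ea in J/mol."""
    return A * math.exp(-Ea / (R * T))

def activation_energy(k1, T1, k2, T2):
    """Two-temperature estimate: Ea = R ln(k2/k1) / (1/T1 - 1/T2), in J/mol."""
    return R * math.log(k2 / k1) / (1.0 / T1 - 1.0 / T2)

def fold_change_per_step(Ea, T, dT=10.0):
    """Ratio k(T + dT) / k(T) for an Arrhenius process."""
    return math.exp(Ea / R * (1.0 / T - 1.0 / (T + dT)))

# Synthetic "measurements" from an assumed Ea = 50 kJ/mol and A = 1e6 s^-1
k1 = arrhenius_rate(1e6, 50e3, 290.0)
k2 = arrhenius_rate(1e6, 50e3, 310.0)
print("recovered Ea (kJ/mol):", activation_energy(k1, 290.0, k2, 310.0) / 1e3)

for Ea_kJ in (30, 50, 80):
    print(Ea_kJ, "kJ/mol ->", round(fold_change_per_step(Ea_kJ * 1e3, 298.15), 2), "x per 10 K")
```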

Quantum Mechanics in Protein Folding

While protein folding is often treated classically, quantum effects contribute at multiple levels:

1. Hydrogen Bonding

H-bonds stabilizing secondary structures involve quantum mechanical effects:

  • Proton delocalization and tunneling between donor/acceptor
  • Zero-point vibrational energy affects bond strengths
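  • Cooperative (non-additive) H-bond strengthening in α-helices and β-sheets arises from electronic polarization, which requires a quantum (or at least polarizable) treatment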
$E_{\text{H-bond}} \approx -5 \text{ kcal/mol} \approx -0.2 \text{ eV}$

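This is roughly 8 $k_BT$ at room temperature ($k_BT \approx 0.6$ kcal/mol $\approx 0.026$ eV): strong enough to stabilize secondary structure, yet weak enough to break and re-form thermally. Quantum nuclear effects matter because the N–H stretching quantum ($\hbar\omega \approx 0.4$ eV) far exceeds $k_BT$, so the proton's zero-point motion cannot be neglected.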
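The energy scales in this comparison are easy to check. The sketch below converts the H-bond energy quoted above, and an assumed amide N–H stretching frequency of about 3300 cm⁻¹ (a typical value, not given in the text), into units of $k_BT$ at 298 K using standard unit conversions. The function name is hypothetical.

```python
import math

K_B_EV = 8.617333262e-5             # Boltzmann constant, eV/K
EV_PER_KCAL_MOL = 0.0433641         # 1 kcal/mol expressed as eV per molecule
EV_PER_WAVENUMBER = 1.239841984e-4  # energy of a 1 cm^-1 photon, eV

def in_units_of_kT(energy_ev, T=298.15):
    """Energy divided by k_B T."""
    return energy_ev / (K_B_EV * T)

e_hbond = 5.0 * EV_PER_KCAL_MOL      # magnitude of E_H-bond from the text
e_nh = 3300.0 * EV_PER_WAVENUMBER    # assumed N-H stretch quantum, hbar*omega
print(f"k_B T      = {K_B_EV * 298.15:.4f} eV")
print(f"|E_H-bond| = {e_hbond:.3f} eV = {in_units_of_kT(e_hbond):.1f} k_B T")
print(f"hbar*omega = {e_nh:.3f} eV = {in_units_of_kT(e_nh):.1f} k_B T")
# Boltzmann weight of the first excited N-H stretch level relative to the ground state
print(f"excited-state weight ~ {math.exp(-in_units_of_kT(e_nh)):.1e}")
```

Under the assumed frequency, the stretch quantum is about 16 $k_BT$, so the vibration sits almost entirely in its ground state at room temperature and zero-point motion, rather than thermal excitation, sets the proton's spread along the H-bond.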

2. Electronic Structure of Side Chains

Aromatic residues (Phe, Tyr, Trp), disulfide bonds, and metal coordination sites require quantum chemical treatment:

  • π-π stacking interactions between aromatic rings
  • Cation-π interactions (Arg/Lys with aromatics)
  • Charge transfer and dispersion forces

3. Hydrophobic Effect

The primary driving force for folding, the burial of nonpolar residues, is mediated by water, whose hydrogen-bonded structure is itself shaped by quantum effects:

Water structure around hydrophobic groups involves hydrogen bonding networks that depend on quantum nuclear effects (particularly important for accurate computational modeling).

4. Conformational Dynamics

Proteins are not static—they undergo thermal fluctuations and conformational changes:

  • Quantum tunneling through torsional barriers, significant mainly for light groups such as methyl rotors and at low temperature
  • Zero-point motion affects conformational sampling
  • Protein breathing motions and allosteric transitions may involve quantum coherence

Protein Misfolding and Disease

When proteins fail to fold correctly, the consequences can be catastrophic. Misfolded proteins are implicated in numerous diseases.

Protein Aggregation Diseases

  • Alzheimer's Disease: Amyloid-β plaques and tau tangles
  • Parkinson's Disease: α-synuclein aggregation in Lewy bodies
  • Huntington's Disease: Huntingtin protein with expanded polyglutamine repeats
  • Prion Diseases: Infectious misfolded PrP proteins (Creutzfeldt-Jakob disease, mad cow disease)
  • Type 2 Diabetes: Islet amyloid polypeptide aggregation

Molecular Chaperones

Cells employ specialized proteins called chaperones to assist folding and prevent aggregation:

  • Hsp70 family: Bind hydrophobic patches on nascent chains
  • GroEL/GroES (Hsp60): Barrel-shaped chamber providing isolated folding environment
  • Hsp90: Stabilizes metastable conformations of signaling proteins
  • Small HSPs: Prevent aggregation under stress conditions

Computational Protein Folding

Predicting protein structure from sequence is one of the grand challenges in molecular biology.

Molecular Dynamics Simulations

Classical MD simulations propagate Newton's equations of motion for all atoms:

$m_i \frac{d^2\mathbf{r}_i}{dt^2} = -\nabla_i V(\mathbf{r}_1, \dots, \mathbf{r}_N)$

Force on atom i from potential energy function V

$V = V_{\text{bonds}} + V_{\text{angles}} + V_{\text{torsions}} + V_{\text{nonbonded}}$

Limitations: Force fields are approximate, long timescales difficult to reach, quantum effects neglected.
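MD codes integrate these equations with Verlet-family schemes (for example leap-frog or velocity Verlet). The sketch below applies velocity Verlet to a single hypothetical harmonic "bond" in one dimension, $V(x) = \tfrac{1}{2}kx^2$, in reduced units; a real force field would sum bond, angle, torsion, and nonbonded terms over all atoms. It illustrates that total energy stays bounded over many oscillation periods, which is why such time-reversible, symplectic integrators are preferred.

```python
import math

def velocity_verlet(force, x0, v0, mass, dt, n_steps):
    """Integrate m d^2x/dt^2 = F(x) with velocity Verlet; return the final (x, v)."""
    x, v = x0, v0
    a = force(x) / mass
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # advance position with current velocity and acceleration
        a_new = force(x) / mass              # force evaluated at the new position
        v = v + 0.5 * (a + a_new) * dt       # advance velocity with the average acceleration
        a = a_new
    return x, v

K_SPRING = 1.0   # hypothetical bond stiffness (reduced units)
MASS = 1.0       # hypothetical atomic mass (reduced units)

def harmonic_force(x):
    """F = -dV/dx for V(x) = 0.5 * K_SPRING * x**2."""
    return -K_SPRING * x

dt = 0.01
n_steps = 6283  # about 10 periods, since the period is 2*pi in these units
x, v = velocity_verlet(harmonic_force, 1.0, 0.0, MASS, dt, n_steps)
energy = 0.5 * MASS * v * v + 0.5 * K_SPRING * x * x
print(f"x = {x:.5f} (exact {math.cos(n_steps * dt):.5f}), energy = {energy:.6f} (initial 0.5)")
```

The time step must stay well below the fastest vibrational period; in all-atom simulations, bonds to hydrogen set that limit, which is one reason reaching long folding timescales is hard.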

AlphaFold Revolution

DeepMind's AlphaFold2 (2020) achieved near-experimental accuracy in structure prediction using deep learning:

  • Transformer-based neural network trained on PDB structures
  • Multiple sequence alignments capture evolutionary constraints
  • Attention mechanisms model residue-residue contacts
  • Predicts 3D atomic coordinates end-to-end through a structure module, with inter-residue distance distributions as an auxiliary output

Impact:

AlphaFold has predicted structures for more than 200 million proteins, covering nearly every sequence catalogued in UniProt, revolutionizing structural biology and drug discovery.

📚 Key References

Anfinsen, C. B. (1973)

"Principles that Govern the Folding of Protein Chains"

Science 181(4096): 223-230. Nobel Prize lecture.

Dill, K. A. & MacCallum, J. L. (2012)

"The Protein-Folding Problem, 50 Years On"

Science 338(6110): 1042-1046.

Jumper, J. et al. (2021)

"Highly accurate protein structure prediction with AlphaFold"

Nature 596: 583-589.

Dobson, C. M. (2003)

"Protein folding and misfolding"

Nature 426: 884-890.