Protein Structure & Folding

From secondary structure geometry through the hydrophobic effect and Levinthal's paradox to experimental methods for structure determination.

Derivation 1: Secondary Structure Geometry

The $\alpha$-Helix

The $\alpha$-helix, first predicted by Pauling and Corey (1951), is a right-handed helix stabilized by backbone hydrogen bonds between $\text{C=O}_i$ and $\text{N-H}_{i+4}$. Its geometry is defined by the following parameters:

Residues per turn (n): 3.6

Rise per residue (d): 1.5 Å

Pitch (p = n × d): 3.6 × 1.5 = 5.4 Å

H-bond pattern: $i \rightarrow i+4$

Dihedral angles: $\phi = -57°, \; \psi = -47°$

The angular rotation per residue about the helix axis is:

$$\theta = \frac{360°}{n} = \frac{360°}{3.6} = 100°\;\text{per residue}$$

The H-bond length ($\text{N}\cdots\text{O}$) is approximately 2.8 Å, close to optimal for an $\text{N-H}\cdots\text{O=C}$ hydrogen bond. The peptide-bond dipoles (about 3.5 Debye each) point nearly parallel to the helix axis, so they add up to a helix macrodipole with a partial positive charge at the N-terminus and a partial negative charge at the C-terminus, often approximated as roughly half a unit charge at each end. This dipole often stabilizes negatively charged ligands bound near the N-terminal end.

The $\beta$-Sheet

$\beta$-Sheets consist of extended strands connected by inter-strand hydrogen bonds. Two arrangements occur:

Parallel $\beta$-Sheet

$\phi = -119°, \; \psi = +113°$. Strands run in the same direction. H-bonds are evenly spaced but slightly angled. Rise per residue: 3.2 Å.

Antiparallel $\beta$-Sheet

$\phi = -139°, \; \psi = +135°$. Strands run in opposite directions. H-bonds are roughly perpendicular to the strands, a more linear and generally more stable geometry. Rise per residue: 3.4 Å.

$3_{10}$ and $\pi$ Helices

Two other helical forms exist but are less common:

$3_{10}$-helix: H-bond pattern $i \rightarrow i+3$. 3.0 residues/turn, rise 2.0 Å/residue, pitch 6.0 Å. $\phi = -49°, \; \psi = -26°$. Tighter than $\alpha$, often found at helix termini. The name means 3 residues per turn, 10 atoms in the H-bonded ring.

$\pi$-helix: H-bond pattern $i \rightarrow i+5$. 4.4 residues/turn, rise 1.15 Å/residue, pitch 5.0 Å. $\phi = -57°, \; \psi = -70°$. Wider and less compact. Rare, found as insertions within $\alpha$-helices.

Deriving Helix Parameters from Bond Geometry

For any regular helix with $n$ residues per turn, rise per residue $d$, and H-bond from residue $i$ to residue $i+k$, the number of atoms in the H-bonded ring is:

$$\text{Ring size} = 3k + 1$$

For $\alpha$-helix ($k = 4$): ring size = 13. For $3_{10}$ ($k = 3$): ring size = 10. For $\pi$ ($k = 5$): ring size = 16. The relationship between pitch, rise, and residues per turn:

$$\boxed{p = n \cdot d}, \qquad \theta = \frac{360°}{n}$$
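The three relations above ($p = n \cdot d$, $\theta = 360°/n$, ring size $= 3k + 1$) can be checked against the tabulated values with a few lines of Python. This is a minimal sketch; the `Helix` class and its field names are illustrative choices, not part of the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Helix:
    name: str
    residues_per_turn: float  # n
    rise: float               # d, angstroms per residue
    hbond_offset: int         # k in the i -> i+k H-bond pattern

    @property
    def pitch(self) -> float:
        """Pitch p = n * d, in angstroms."""
        return self.residues_per_turn * self.rise

    @property
    def rotation_per_residue(self) -> float:
        """Rotation about the helix axis, theta = 360 / n, in degrees."""
        return 360.0 / self.residues_per_turn

    @property
    def ring_size(self) -> int:
        """Number of atoms in the H-bonded ring, 3k + 1."""
        return 3 * self.hbond_offset + 1


# Values taken from the text above.
HELICES = [
    Helix("3_10", 3.0, 2.0, 3),
    Helix("alpha", 3.6, 1.5, 4),
    Helix("pi", 4.4, 1.15, 5),
]

for h in HELICES:
    print(f"{h.name:>5}: pitch = {h.pitch:.2f} A, "
          f"rotation = {h.rotation_per_residue:.1f} deg/residue, "
          f"ring = {h.ring_size} atoms")
```

Note that the $\pi$-helix pitch computed this way is 5.06 Å, which the text rounds to 5.0 Å.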

Derivation 2: Ramachandran Analysis from van der Waals Radii

G.N. Ramachandran and colleagues (1963) showed that the allowed backbone conformations of polypeptides can be predicted from hard-sphere contact distances between non-bonded atoms.

Coordinate Generation

Given a set of $(\phi, \psi)$ values and a fixed $\omega = 180°$, the Cartesian coordinates of all backbone atoms can be computed using standard bond lengths and angles:

  • $d(\text{N-C}_\alpha) = 1.47\;\text{\AA}$
  • $d(\text{C}_\alpha\text{-C}) = 1.53\;\text{\AA}$
  • $d(\text{C-N}) = 1.33\;\text{\AA}$ (partial double bond)
  • $d(\text{C=O}) = 1.24\;\text{\AA}$
  • $\angle(\text{N-C}_\alpha\text{-C}) = 111°$
  • $\angle(\text{C}_\alpha\text{-C-N}) = 116°$
  • $\angle(\text{C-N-C}_\alpha) = 122°$
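One common way to turn dihedral angles into Cartesian coordinates is to place each atom from the three before it, given a bond length, a bond angle, and a torsion (often called the NeRF construction). The sketch below builds N, C$_\alpha$, and C positions for a chain with uniform $(\phi, \psi)$ using the geometry listed above. The function names are illustrative, and this is not claimed to be the procedure Ramachandran and colleagues used.

```python
import math

# Bond lengths (angstroms) and bond angles (degrees) from the list above.
BOND = {"N-CA": 1.47, "CA-C": 1.53, "C-N": 1.33}
ANGLE = {"N-CA-C": 111.0, "CA-C-N": 116.0, "C-N-CA": 122.0}


def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def unit(u):
    length = math.sqrt(dot(u, u))
    return tuple(a / length for a in u)


def distance(u, v):
    return math.sqrt(dot(sub(u, v), sub(u, v)))


def place_atom(a, b, c, length, angle_deg, torsion_deg):
    """Return d with |cd| = length, angle(b, c, d) = angle_deg, dihedral(a, b, c, d) = torsion_deg."""
    theta, tau = math.radians(angle_deg), math.radians(torsion_deg)
    bc = unit(sub(c, b))
    n = unit(cross(sub(b, a), bc))   # normal to the a-b-c plane
    m = cross(n, bc)                 # completes the local right-handed frame
    x = -length * math.cos(theta)
    y = length * math.sin(theta) * math.cos(tau)
    z = length * math.sin(theta) * math.sin(tau)
    return tuple(c[i] + x * bc[i] + y * m[i] + z * n[i] for i in range(3))


def build_backbone(phi, psi, n_res, omega=180.0):
    """Return a list of (N, CA, C) coordinate triples for a uniform (phi, psi) chain."""
    t = math.radians(ANGLE["N-CA-C"])
    n0 = (0.0, 0.0, 0.0)
    ca0 = (BOND["N-CA"], 0.0, 0.0)
    c0 = (ca0[0] - BOND["CA-C"] * math.cos(t), BOND["CA-C"] * math.sin(t), 0.0)
    residues = [(n0, ca0, c0)]
    for _ in range(n_res - 1):
        n_prev, ca_prev, c_prev = residues[-1]
        n_new = place_atom(n_prev, ca_prev, c_prev, BOND["C-N"], ANGLE["CA-C-N"], psi)
        ca_new = place_atom(ca_prev, c_prev, n_new, BOND["N-CA"], ANGLE["C-N-CA"], omega)
        c_new = place_atom(c_prev, n_new, ca_new, BOND["CA-C"], ANGLE["N-CA-C"], phi)
        residues.append((n_new, ca_new, c_new))
    return residues


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees, used here to verify the construction."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    return math.degrees(math.atan2(dot(cross(n1, n2), unit(b2)), dot(n1, n2)))


helix = build_backbone(-57.0, -47.0, 10)
strand = build_backbone(-139.0, 135.0, 10)
print("CA1-CA10 distance, helix :", round(distance(helix[0][1], helix[-1][1]), 1))
print("CA1-CA10 distance, strand:", round(distance(strand[0][1], strand[-1][1]), 1))
```

Side-chain, carbonyl oxygen, and amide hydrogen atoms can be added with the same `place_atom` call once their bond lengths, angles, and torsions are specified. The helical chain should come out much more compact end to end than the extended strand.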

Van der Waals Contact Criteria

For each $(\phi, \psi)$ combination, interatomic distances between non-bonded atoms are computed. A conformation is disallowed if any distance is shorter than the minimum contact distance:

$$d_{ij} < r_i^{\text{vdW}} + r_j^{\text{vdW}} \quad \Rightarrow \quad \text{steric clash}$$

The critical van der Waals radii used are:

  • Carbon: $r_{\text{vdW}} = 1.7\;\text{\AA}$ (normal), $1.5\;\text{\AA}$ (outer limit)
  • Nitrogen: $r_{\text{vdW}} = 1.55\;\text{\AA}$ (normal), $1.4\;\text{\AA}$ (outer limit)
  • Oxygen: $r_{\text{vdW}} = 1.52\;\text{\AA}$ (normal), $1.35\;\text{\AA}$ (outer limit)
  • Hydrogen: $r_{\text{vdW}} = 1.2\;\text{\AA}$ (normal), $1.0\;\text{\AA}$ (outer limit)
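As a sketch of how the two radius sets translate into a three-way classification, the code below applies the sum-of-radii criterion exactly as stated in this section. The helper names and the tuple layout for atom pairs are assumptions made for illustration.

```python
import math

# van der Waals radii in angstroms, copied from the list above.
RADII = {
    "normal": {"C": 1.7, "N": 1.55, "O": 1.52, "H": 1.2},
    "outer": {"C": 1.5, "N": 1.4, "O": 1.35, "H": 1.0},
}


def min_contact(elem_i, elem_j, limit="normal"):
    """Minimum allowed distance r_i + r_j for the chosen radius set."""
    radii = RADII[limit]
    return radii[elem_i] + radii[elem_j]


def is_clash(elem_i, pos_i, elem_j, pos_j, limit="normal"):
    """True if the two non-bonded atoms are closer than the sum of their radii."""
    return math.dist(pos_i, pos_j) < min_contact(elem_i, elem_j, limit)


def classify(pairs):
    """Classify a conformation from its non-bonded pairs (elem_i, pos_i, elem_j, pos_j)."""
    pairs = list(pairs)
    if not any(is_clash(*p, limit="normal") for p in pairs):
        return "fully allowed"
    if not any(is_clash(*p, limit="outer") for p in pairs):
        return "partially allowed"
    return "disallowed"


# A single O...C pair at three trial separations (angstroms).
for d in (3.3, 3.0, 2.5):
    pair = ("O", (0.0, 0.0, 0.0), "C", (d, 0.0, 0.0))
    print(d, classify([pair]))
```

Scanning $(\phi, \psi)$ on a grid, generating coordinates for each point, and calling `classify` on the resulting non-bonded pairs would produce a hard-sphere Ramachandran map.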

The most critical contacts that define the boundaries of the allowed regions are:

$$\text{O}_i \cdots \text{C}_{\beta,i+1}, \quad \text{C}_{\beta,i} \cdots \text{N}_{i+1}, \quad \text{O}_i \cdots \text{O}_{i+1}, \quad \text{H}_{N,i} \cdots \text{O}_{i-1}$$

Using normal radii yields the fully allowed regions (only ~7.5% of the plot). Using outer limit radii yields partially allowed regions (~22% of the plot). Analysis of high-resolution crystal structures confirms that >99% of non-glycine residues fall within the partially allowed regions.

Derivation 3: The Hydrophobic Effect & Folding Thermodynamics

The hydrophobic effect is the dominant driving force for protein folding. When nonpolar residues are buried in the protein interior, water molecules that were ordered around the exposed hydrophobic surface are released to bulk solvent, increasing the overall entropy of the system.

Free Energy of Folding

The Gibbs free energy change for the unfolded-to-native transition is:

$$\boxed{\Delta G_{\text{fold}} = \Delta H_{\text{fold}} - T\Delta S_{\text{fold}}}$$

Crucially, native proteins are only marginally stable: $\Delta G_{\text{fold}} \approx -20$ to $-60\;\text{kJ/mol}$, which is the equivalent of just a few hydrogen bonds. This marginal stability is the net result of large, opposing contributions:

  • Stabilizing: van der Waals contacts, H-bonds, hydrophobic effect, ion pairs, disulfide bonds
  • Destabilizing: loss of conformational entropy ($-T\Delta S_{\text{conf}} \approx +600\;\text{kJ/mol}$ for a 100-residue protein)

Temperature Dependence: The Stability Curve

The large negative heat capacity change $\Delta C_p < 0$ for folding (due to burial of hydrophobic surface) causes the free energy to be a parabolic function of temperature. Using the Kirchhoff equations:

$$\Delta H(T) = \Delta H_m + \Delta C_p(T - T_m)$$
$$\Delta S(T) = \Delta S_m + \Delta C_p \ln\!\left(\frac{T}{T_m}\right) = \frac{\Delta H_m}{T_m} + \Delta C_p \ln\!\left(\frac{T}{T_m}\right)$$

Since $\Delta G(T_m) = 0$ (by definition of the melting temperature), $\Delta S_m = \Delta H_m/T_m$. Therefore:

$$\boxed{\Delta G(T) = \Delta H_m\!\left(1 - \frac{T}{T_m}\right) - \Delta C_p\!\left[(T_m - T) + T\ln\!\left(\frac{T}{T_m}\right)\right]}$$

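In the folding convention used here, this stability curve is approximately parabolic with a minimum: because $\Delta C_p < 0$, the curvature $d^2\Delta G/dT^2 = -\Delta C_p/T$ is positive. (Plotted as the unfolding free energy, $-\Delta G_{\text{fold}}$, it is the familiar inverted parabola.) The protein is stable ($\Delta G < 0$) between two temperatures: the heat denaturation temperature $T_m$ and a lower cold denaturation temperature $T_c$. Maximum stability occurs at $T_s$ where $d\Delta G/dT = 0$, which gives: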

$$\frac{d\Delta G}{dT} = -\Delta S(T) = 0 \quad \Rightarrow \quad T_s = T_m \exp\!\left(-\frac{\Delta H_m}{T_m \Delta C_p}\right)$$
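The stability curve and its two special temperatures can be evaluated numerically. The sketch below uses the boxed expression for $\Delta G(T)$ and the closed form for $T_s$, and finds $T_c$ by bisection. The parameter values ($\Delta H_m = -400$ kJ/mol, $T_m = 333$ K, $\Delta C_p = -8$ kJ mol$^{-1}$ K$^{-1}$, all in the folding convention) are hypothetical and chosen only to give a curve of realistic magnitude.

```python
import math


def delta_g_fold(T, dH_m, T_m, dCp):
    """Folding free energy (kJ/mol) at temperature T (K), from the boxed stability equation."""
    return dH_m * (1.0 - T / T_m) - dCp * ((T_m - T) + T * math.log(T / T_m))


def temperature_of_max_stability(dH_m, T_m, dCp):
    """T_s, where Delta S(T) = 0."""
    return T_m * math.exp(-dH_m / (T_m * dCp))


def cold_denaturation_temperature(dH_m, T_m, dCp, T_low=150.0, tol=1e-6):
    """Bisect for the root of delta_g_fold between T_low and T_s."""
    lo = T_low
    hi = temperature_of_max_stability(dH_m, T_m, dCp)
    if delta_g_fold(lo, dH_m, T_m, dCp) <= 0.0:
        raise ValueError("T_low is not below the cold denaturation temperature")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if delta_g_fold(mid, dH_m, T_m, dCp) > 0.0:
            lo = mid   # still unfolded: root lies at higher T
        else:
            hi = mid   # already stable: root lies at lower T
    return 0.5 * (lo + hi)


# Hypothetical parameters (folding convention, so all three are negative or T_m > 0).
DH_M = -400.0   # kJ/mol
T_M = 333.0     # K
DCP = -8.0      # kJ/(mol K)

T_S = temperature_of_max_stability(DH_M, T_M, DCP)
T_C = cold_denaturation_temperature(DH_M, T_M, DCP)
print(f"T_c = {T_C:.1f} K, T_s = {T_S:.1f} K, T_m = {T_M:.1f} K")
print(f"Delta G at T_s = {delta_g_fold(T_S, DH_M, T_M, DCP):.1f} kJ/mol")
```

With these inputs the maximum stability falls within the $-20$ to $-60$ kJ/mol range quoted earlier, and $T_c$ lies well below the freezing point of water, which is one reason cold denaturation is rarely observed without added denaturant.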

Entropy-Enthalpy Compensation

The hydrophobic contribution near room temperature is entropy-driven: $\Delta H_{\text{hphob}} \approx 0$ but $T\Delta S_{\text{hphob}} > 0$ (release of ordered water). At higher temperatures, the ordered water cage is already partially disrupted, and the hydrophobic contribution becomes enthalpy-driven. This temperature crossover is captured by the large negative $\Delta C_p$:

$$\Delta C_p \approx -50\;\text{J mol}^{-1}\text{K}^{-1}\;\text{per buried methylene group}$$

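The total $\Delta C_p$ for folding correlates with the change in accessible nonpolar surface area: $\Delta C_p \approx -1.34 \cdot \Delta A_{\text{np}}$, where $\Delta A_{\text{np}}$ is the nonpolar area buried on folding (in Å$^2$) and the coefficient has units of J mol$^{-1}$ K$^{-1}$ Å$^{-2}$ (about 0.32 cal mol$^{-1}$ K$^{-1}$ Å$^{-2}$).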

Derivation 4: Levinthal's Paradox & the Folding Funnel

The Paradox

Cyrus Levinthal (1969) noted that if each residue in a polypeptide can adopt just 3 backbone conformations (corresponding to the three allowed regions of the Ramachandran plot), a protein of $N$ residues has $3^N$ possible conformations. For a modest 100-residue protein:

$$3^{100} \approx 5.15 \times 10^{47}\;\text{conformations}$$

If the protein samples conformations at the fastest possible rate (one per bond vibration, $\sim 10^{13}\;\text{s}^{-1}$), the time to exhaustively search all conformations would be:

$$t = \frac{3^{100}}{10^{13}} \approx 5 \times 10^{34}\;\text{seconds} \approx 1.6 \times 10^{27}\;\text{years}$$

This is astronomically longer than the age of the universe ($\sim 1.4 \times 10^{10}$ years). Yet real proteins fold in milliseconds to seconds. Therefore, protein folding cannot proceed by random conformational search.
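The arithmetic behind the paradox is easy to reproduce. The sketch below uses Python's exact integers for $3^N$ and converts the search time to years; the function name and the year length of 365.25 days are illustrative choices.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.4e10


def levinthal_search_time(n_residues, states_per_residue=3, rate_per_s=1e13):
    """Return (number of conformations, search time in seconds, search time in years).

    The conversion to float overflows for very long chains (several hundred
    residues), so this is meant for the chain lengths discussed in the text.
    """
    n_conformations = states_per_residue ** n_residues  # exact integer
    seconds = n_conformations / rate_per_s
    return n_conformations, seconds, seconds / SECONDS_PER_YEAR


for n in (50, 100, 150):
    count, seconds, years = levinthal_search_time(n)
    ratio = years / AGE_OF_UNIVERSE_YEARS
    print(f"N = {n:3d}: {float(count):.2e} conformations, "
          f"{years:.2e} years ({ratio:.1e} x age of universe)")
```

Even the 50-residue case gives a search time of tens of thousands of years, so the conclusion does not depend on the exact chain length chosen.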

Resolution: The Folding Funnel

The resolution comes from the energy landscape theory (Bryngelson, Wolynes, Onuchic, 1990s). The free energy landscape is not flat but funnel-shaped: there is a general thermodynamic bias toward the native state. The protein need not sample all conformations; instead, it follows a downhill gradient.

The folding funnel can be described by plotting free energy $G$ as a function of the fraction of native contacts $Q$ (0 for the fully unfolded chain, 1 for the native state):

$$G(Q) = E(Q) - TS(Q)$$

where $E(Q)$ decreases as native contacts form (energy stabilization) and $S(Q)$ decreases as the chain becomes more ordered (fewer accessible microstates). The configurational entropy at a given $Q$ can be approximated as:

$$S(Q) \approx k_B N(1 - Q)\ln\Omega_0$$

where $\Omega_0$ is the number of conformations per residue in the unfolded state. The funnel arises because $|dE/dQ| > T|dS/dQ|$ for a foldable protein — the energy gain from native contacts outpaces the entropy loss from ordering. The folding rate is then limited not by a combinatorial search but by the roughness of the landscape:

$$\boxed{k_{\text{fold}} \approx k_0 \exp\!\left(-\frac{\Delta G^\ddagger}{k_BT}\right)}$$

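where $\Delta G^\ddagger$ is the free energy barrier at the transition state, typically $10\text{--}60\;\text{kJ/mol}$ (for a barrier quoted per mole, $RT$ takes the place of $k_BT$ in the exponent), and $k_0$ is the prefactor from the Kramers theory of diffusion over barriers. An empirical estimate is $k_0^{-1} \approx N/100\;\mu\text{s}$, so a 100-residue protein could fold in about a microsecond if there were no barrier.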
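To get a feel for the numbers, the sketch below evaluates the rate expression with a molar barrier (so $RT$ replaces $k_BT$) and checks the funnel condition per residue. Two assumptions are made here and are not from the source: the prefactor is read as the inverse of a speed-limit folding time of $N/100$ microseconds, and the energy is taken to be linear in $Q$, $E(Q) = -\varepsilon N Q$, with $\varepsilon$ a hypothetical stabilization energy per residue.

```python
import math

R = 8.314462618e-3  # gas constant, kJ/(mol K)


def kramers_prefactor(n_residues):
    """k0 in s^-1, assuming a barrierless folding time of N/100 microseconds."""
    tau0_seconds = (n_residues / 100.0) * 1e-6
    return 1.0 / tau0_seconds


def folding_rate(barrier_kj_mol, n_residues, T=298.15):
    """k_fold = k0 * exp(-dG_barrier / RT), in s^-1."""
    return kramers_prefactor(n_residues) * math.exp(-barrier_kj_mol / (R * T))


def funnel_is_downhill(epsilon_kj_mol, omega0=3, T=298.15):
    """Check |dE/dQ| > T|dS/dQ| per residue.

    Assumes E(Q) = -epsilon * N * Q (hypothetical linear form) and
    S(Q) = R * N * (1 - Q) * ln(omega0), the molar version of the text's S(Q).
    """
    return abs(epsilon_kj_mol) > R * T * math.log(omega0)


for barrier in (10.0, 30.0, 60.0):
    k = folding_rate(barrier, n_residues=100)
    print(f"barrier = {barrier:4.0f} kJ/mol -> k_fold = {k:.3g} s^-1")

print("threshold energy per residue (kJ/mol):", round(R * 298.15 * math.log(3), 2))
```

Under these assumptions the quoted barrier range spans folding times from microseconds to hours, and the per-residue stabilization must exceed roughly $RT \ln 3$ for the landscape to slope toward the native state.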

Applications: Experimental Structure Determination

X-Ray Crystallography

The dominant method for atomic-resolution protein structures. A protein crystal diffracts X-rays ($\lambda \approx 1.5\;\text{\AA}$) according to Bragg's law:

$$n\lambda = 2d\sin\theta$$

The diffraction pattern provides amplitudes $|F_{hkl}|$, but the phases are lost (the phase problem). Phases are recovered by molecular replacement, isomorphous replacement (heavy atoms), or anomalous scattering (MAD/SAD).
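Bragg's law links the spacing of lattice planes to the angle at which a reflection appears, and it sets the finest spacing a given wavelength can resolve ($d_{\min} = \lambda/2$, reached at $\theta = 90°$). A minimal sketch, with illustrative function names and the 1.5 Å wavelength from the text as the default:

```python
import math


def bragg_angle_deg(d_spacing, wavelength=1.5, order=1):
    """Bragg angle theta (degrees) for plane spacing d and wavelength, both in angstroms."""
    s = order * wavelength / (2.0 * d_spacing)
    if s > 1.0:
        raise ValueError("no diffraction possible: n * lambda exceeds 2d")
    return math.degrees(math.asin(s))


def min_resolvable_spacing(wavelength=1.5):
    """Smallest plane spacing that can satisfy Bragg's law for first order."""
    return wavelength / 2.0


for d in (10.0, 3.0, 2.0, 1.0):
    print(f"d = {d:4.1f} A -> theta = {bragg_angle_deg(d):5.2f} deg")
print("d_min =", min_resolvable_spacing(), "A")
```

Finer structural detail (smaller $d$) appears at higher scattering angles, which is why high-resolution reflections lie at the edge of the detector.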

Nuclear Magnetic Resonance (NMR)

NMR determines structures in solution (no crystal needed). Key experiments:

  • NOESY: Nuclear Overhauser Effect — measures through-space distances ($< 5\;\text{\AA}$)
  • COSY/TOCSY: Through-bond correlations for sequential assignment
  • HSQC: $^{1}\text{H}$-$^{15}\text{N}$ correlation — fingerprint of protein
  • Residual dipolar couplings: Long-range orientational constraints

Practical size limit is approximately 40–50 kDa (with deuteration and TROSY, up to ~100 kDa).

Cryo-Electron Microscopy (Cryo-EM)

The resolution revolution in cryo-EM (Nobel Prize 2017) now routinely achieves 2–3 Å resolution for large complexes. Single particle analysis images thousands of individual protein particles frozen in vitreous ice, computationally classifies their orientations, and reconstructs a 3D map. No crystallization is needed, and the method can capture multiple conformational states.

AlphaFold & Computational Prediction

AlphaFold2 (DeepMind, 2020) achieved near-experimental accuracy in protein structure prediction. The model uses a novel architecture combining multiple sequence alignments (MSAs) with a structure module that iteratively refines atomic coordinates. AlphaFold has predicted structures for essentially all known protein sequences (~200 million), dramatically expanding structural coverage.

Comparison of Methods

Method    | Resolution   | Size Range    | Key Requirement
X-ray     | 1–3 Å        | Any size      | Crystal
NMR       | 2–4 Å        | < 50 kDa      | Isotope labeling
Cryo-EM   | 2–4 Å        | > 50 kDa      | Homogeneous sample
AlphaFold | ~1–2 Å (GDT) | Single chains | MSA / templates

Python Simulation: Folding Landscape, Stability Curve & Levinthal's Paradox

This simulation visualizes: (1) the folding free energy landscape as a function of native contacts, (2) the protein stability curve $\Delta G(T)$ showing both heat and cold denaturation, and (3) Levinthal's paradox demonstrating the impossibility of random conformational search.

Folding Funnel, Stability Curve, and Levinthal Paradox

Python listing: folding_landscape.py (134 lines)

Python Simulation: Hydrophobic Effect Thermodynamics

This simulation shows the enthalpy-entropy compensation in protein folding and the temperature dependence of the hydrophobic contribution, illustrating why the hydrophobic effect is entropy-driven at room temperature but enthalpy-driven at higher temperatures.

Enthalpy-Entropy Compensation and Hydrophobic Effect

Python listing: hydrophobic_thermodynamics.py (94 lines)

Tertiary & Quaternary Structure

Tertiary Structure: The Full 3D Fold

Tertiary structure describes the complete three-dimensional arrangement of all atoms in a single polypeptide chain. It arises from the packing of secondary structure elements and is stabilized by:

  • Hydrophobic packing: Nonpolar side chains cluster in the protein interior, minimizing contact with water. The hydrophobic core typically has a packing density of $\sim 0.75$, comparable to crystalline amino acids.
  • Hydrogen bonds: Between side chains, between side chains and backbone, and between backbone and water. A buried unsatisfied H-bond donor/acceptor costs approximately $20\;\text{kJ/mol}$.
  • Ionic interactions (salt bridges): Between oppositely charged side chains (e.g., Asp—Lys). Contribute $\sim 5\text{--}20\;\text{kJ/mol}$ depending on solvent exposure.
  • Disulfide bonds: Covalent $\text{S-S}$ bridges between Cys residues. Common in extracellular proteins but rare in the reducing cytoplasm. Bond energy $\sim 250\;\text{kJ/mol}$.
  • Van der Waals interactions: Individually weak ($\sim 2\text{--}4\;\text{kJ/mol}$) but collectively significant due to the large number of contacts in a tightly packed core.

Common Structural Motifs

  • Coiled coil: Two or more $\alpha$-helices wrapped around each other with a heptad repeat (abcdefg), where positions a and d are hydrophobic. Superhelical pitch $\approx 140\;\text{\AA}$. Examples: leucine zippers, keratin, myosin.
  • Greek key motif: Four antiparallel $\beta$-strands connected by hairpins in a pattern resembling the Greek key border design. Found in immunoglobulin domains.
  • $\beta$-barrel: A closed structure of $\beta$-strands. TIM barrel ($(\beta/\alpha)_8$) is the most common enzyme fold, found in ~10% of all known enzyme structures.
  • Rossmann fold: $\beta\alpha\beta\alpha\beta$ motif for dinucleotide (NAD$^+$/FAD) binding. One of the most ancient and conserved folds.
  • EF-hand: Helix-loop-helix Ca$^{2+}$-binding motif. The loop coordinates Ca$^{2+}$ through 6–7 oxygen ligands. Found in calmodulin, troponin C, parvalbumin.
  • Zinc finger: Small structural motif (~30 residues) stabilized by coordination of Zn$^{2+}$ by Cys and/or His residues. The C$_2$H$_2$ type is the most common DNA-binding domain in the human genome.

Quaternary Structure

Quaternary structure refers to the arrangement of two or more polypeptide subunits in a multisubunit complex. Subunit interfaces are stabilized by the same noncovalent forces as the protein interior. Oligomerization provides several advantages:

  • Cooperativity: Allosteric regulation requires multiple subunits (e.g., hemoglobin tetramer, $\alpha_2\beta_2$)
  • Genetic economy: Large structures from small genes (e.g., viral capsids use 60 copies of one or a few subunits)
  • Error reduction: Self-assembly provides quality control — misfolded subunits often fail to assemble
  • Active site formation: Catalytic sites at subunit interfaces (e.g., aspartate transcarbamoylase)

Symmetry in Oligomers

Most oligomeric proteins display point group symmetry:

  • C$_n$ (cyclic): n-fold rotational symmetry. Example: C$_7$ for the GroES heptamer.
  • D$_n$ (dihedral): n-fold axis plus perpendicular 2-fold axes. Example: pseudo-D$_2$ for the hemoglobin $\alpha_2\beta_2$ tetramer (exact D$_2$ would require four identical subunits), D$_7$ for GroEL.
  • Cubic symmetries: T (tetrahedral, 12 subunits), O (octahedral, 24), I (icosahedral, 60). Example: ferritin (O, 24-mer), viral capsids (I, 60n-mer).

The subunit dissociation constant is related to the free energy of association:

$$K_d = \frac{[\text{monomer}]^n}{[\text{oligomer}]} = \exp\!\left(\frac{\Delta G_{\text{assoc}}}{RT}\right)$$

Typical subunit interfaces bury $600\text{--}4000\;\text{\AA}^2$ of surface area per subunit, with $K_d$ values ranging from $\text{pM}$ (very stable, e.g., hemoglobin) to $\mu\text{M}$ (transient complexes).
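The relation $K_d = \exp(\Delta G_{\text{assoc}}/RT)$ converts directly between dissociation constants and association free energies. The sketch below assumes a 1 M standard state and a temperature of 298.15 K; the function names are illustrative.

```python
import math

R = 8.314462618e-3  # gas constant, kJ/(mol K)


def kd_from_dg(dg_assoc_kj_mol, T=298.15):
    """Dissociation constant (relative to a 1 M standard state) from Delta G of association."""
    return math.exp(dg_assoc_kj_mol / (R * T))


def dg_from_kd(kd_molar, T=298.15):
    """Association free energy in kJ/mol from a dissociation constant in mol/L."""
    return R * T * math.log(kd_molar)


for label, kd in (("micromolar", 1e-6), ("nanomolar", 1e-9), ("picomolar", 1e-12)):
    print(f"{label:>10}: K_d = {kd:.0e} M -> dG_assoc = {dg_from_kd(kd):6.1f} kJ/mol")
```

Each factor of 1000 in $K_d$ corresponds to about 17 kJ/mol at room temperature, so the picomolar-to-micromolar range quoted above spans roughly 34 kJ/mol of binding free energy.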

Intrinsically Disordered Proteins (IDPs)

A significant fraction of eukaryotic proteins (~30–40%) contain long disordered regions that lack a fixed three-dimensional structure under physiological conditions. These intrinsically disordered proteins (IDPs) or regions (IDRs) challenge the classical structure-function paradigm.

Sequence Characteristics

IDPs are enriched in charged and polar residues (Glu, Lys, Arg, Gln, Ser, Pro) and depleted in hydrophobic residues (Trp, Phe, Ile, Leu, Val). This low hydrophobicity and high net charge prevent the formation of a compact hydrophobic core. The mean hydrophobicity $\langle H \rangle$ and mean net charge $\langle |R| \rangle$ can predict disorder using the Uversky plot:

$$\langle |R| \rangle = 2.785 \langle H \rangle - 1.151$$

Proteins above this line in the $(\langle H \rangle, \langle |R| \rangle)$ plane tend to be disordered; those below tend to be globular.
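The boundary line can be applied as a simple classifier once $\langle H \rangle$ and $\langle |R| \rangle$ are known. The sketch below assumes both are already computed on the scale for which the line was fitted (a hydropathy scale normalized to the range 0 to 1 and net charge per residue); the function names and the example input values are hypothetical.

```python
def uversky_boundary(mean_hydropathy):
    """Mean net charge on the boundary line for a given mean hydropathy."""
    return 2.785 * mean_hydropathy - 1.151


def predicted_disordered(mean_hydropathy, mean_net_charge):
    """True if the point lies above the boundary line (disordered side)."""
    return abs(mean_net_charge) > uversky_boundary(mean_hydropathy)


# Hypothetical (mean hydropathy, mean net charge) pairs for illustration.
examples = [(0.40, 0.05), (0.50, 0.05), (0.50, 0.30)]
for h, r in examples:
    side = "disordered" if predicted_disordered(h, r) else "globular"
    print(f"<H> = {h:.2f}, <|R|> = {r:.2f} -> {side}")
```

Because the boundary crosses zero net charge near $\langle H \rangle \approx 0.41$, any sequence with lower mean hydropathy than that lands on the disordered side regardless of its charge.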

Functional Advantages of Disorder

  • Binding promiscuity: IDPs can fold upon binding to different partners, enabling one protein to interact with many targets (hub proteins in signaling networks)
  • Large interaction surfaces: Extended conformations provide larger binding interfaces than compact domains of the same sequence length
  • Post-translational modification: Disordered regions are primary sites for phosphorylation, ubiquitination, and other modifications
  • Entropic springs: Disordered linkers between domains allow conformational flexibility (e.g., titin PEVK domain)
  • Liquid-liquid phase separation: IDPs with multivalent interaction motifs drive the formation of membraneless organelles (P-bodies, stress granules, nucleoli)

Polymer Physics of IDPs

IDPs can be modeled as polymer chains. The radius of gyration scales with chain length as:

$$R_g = R_0 N^\nu$$

where $N$ is the number of residues, $R_0 \approx 2\;\text{\AA}$, and the Flory exponent $\nu$ depends on solvent conditions: $\nu = 0.588$ for a good solvent (self-avoiding random walk), $\nu = 0.5$ for a theta solvent (ideal chain), and $\nu = 1/3$ for a poor solvent (compact globule). Most IDPs in physiological conditions have $\nu \approx 0.5\text{--}0.6$.
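A quick evaluation of the scaling law shows how strongly the exponent affects chain dimensions. This sketch uses the same prefactor $R_0 = 2$ Å for all three regimes, which is a simplification; in practice the prefactor also depends on the regime.

```python
def radius_of_gyration(n_residues, nu, r0=2.0):
    """R_g = R_0 * N**nu, in angstroms."""
    return r0 * n_residues ** nu


REGIMES = {
    "good solvent": 0.588,
    "theta solvent": 0.5,
    "poor solvent": 1.0 / 3.0,
}

for n in (50, 100, 400):
    sizes = ", ".join(f"{name}: {radius_of_gyration(n, nu):5.1f} A"
                      for name, nu in REGIMES.items())
    print(f"N = {n:3d} -> {sizes}")
```

For a 100-residue chain the good-solvent estimate is about three times the compact-globule estimate, which is why small-angle scattering can readily distinguish disordered from folded proteins of the same length.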

Key Equations Summary

Helix Pitch

$$p = n \cdot d = 3.6 \times 1.5\;\text{\AA} = 5.4\;\text{\AA}$$

Protein Stability Curve

$$\Delta G(T) = \Delta H_m\!\left(1 - \frac{T}{T_m}\right) - \Delta C_p\!\left[(T_m - T) + T\ln\!\left(\frac{T}{T_m}\right)\right]$$

Levinthal's Number

$$\Omega = 3^N, \qquad t_{\text{search}} = \frac{3^N}{10^{13}\;\text{s}^{-1}}$$

Bragg's Law

$$n\lambda = 2d\sin\theta$$

Radius of Gyration (IDP)

$$R_g = R_0 N^\nu, \qquad \nu \approx 0.5\text{--}0.6$$

Steric Clash Criterion

$$d_{ij} < r_i^{\text{vdW}} + r_j^{\text{vdW}} \quad \Rightarrow \quad \text{disallowed}$$

Folding Rate (Kramers)

$$k_{\text{fold}} \approx k_0 \exp\!\left(-\frac{\Delta G^\ddagger}{k_BT}\right)$$

Subunit Dissociation

$$K_d = \frac{[\text{monomer}]^n}{[\text{oligomer}]}$$

Quantifying Non-Covalent Forces in Proteins

Protein stability results from a delicate balance of large opposing forces. Understanding each contribution quantitatively is essential for rational protein design and stability engineering.

Van der Waals Interactions

The Lennard-Jones potential describes the distance dependence of van der Waals interactions between non-bonded atoms:

$$V(r) = 4\epsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right]$$

where $\epsilon$ is the well depth ($\sim 0.5\text{--}1.0\;\text{kJ/mol}$), $\sigma$ is the distance at which $V = 0$, and the minimum occurs at $r_{\min} = 2^{1/6}\sigma$. The $r^{-12}$ term models Pauli repulsion and the $r^{-6}$ term models London dispersion attraction.
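The stated properties of the potential (zero at $r = \sigma$, minimum of depth $\epsilon$ at $r = 2^{1/6}\sigma$) can be confirmed numerically. The values $\epsilon = 0.8$ kJ/mol and $\sigma = 3.4$ Å below are hypothetical, chosen only to fall in a plausible range for a carbon-carbon contact.

```python
def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones energy; r and sigma in the same length unit."""
    x6 = (sigma / r) ** 6
    return 4.0 * epsilon * (x6 * x6 - x6)


EPSILON = 0.8   # kJ/mol, hypothetical well depth
SIGMA = 3.4     # angstroms, hypothetical zero-crossing distance
R_MIN = 2.0 ** (1.0 / 6.0) * SIGMA

for r in (3.0, SIGMA, R_MIN, 4.5, 6.0):
    print(f"r = {r:5.2f} A -> V = {lennard_jones(r, EPSILON, SIGMA):8.3f} kJ/mol")
```

The energy rises steeply inside $\sigma$ and decays quickly beyond about $1.5\,\sigma$, which is why only atoms in direct contact contribute appreciably.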

A typical protein buries 3,000–10,000 Å$^2$ of surface area, making thousands of van der Waals contacts. The collective contribution is roughly $-100$ to $-400\;\text{kJ/mol}$ for a typical small protein.

Hydrogen Bonds

The strength of an H-bond depends on geometry (distance and angle) and the dielectric environment. In vacuum, an N-H$\cdots$O=C H-bond is worth approximately $-20\;\text{kJ/mol}$. In aqueous solution, the net contribution to stability is smaller ($\sim -2$ to $-8\;\text{kJ/mol}$) because the protein must pay the cost of desolvating the donor and acceptor.

The electrostatic contribution to H-bonding follows Coulomb's law in a dielectric medium:

$$E = \frac{q_1 q_2}{4\pi\epsilon_0 \epsilon_r r}$$

where $\epsilon_r \approx 2\text{--}4$ in the protein interior (vs $\epsilon_r = 80$ in water), greatly amplifying electrostatic interactions in the hydrophobic core.
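The effect of the dielectric constant is easiest to see with numbers. The sketch below evaluates Coulomb's law in SI units and converts to kJ/mol; the 3 Å separation and unit charges are illustrative inputs, and treating the protein interior as a uniform dielectric is itself a simplification.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
N_A = 6.02214076e23          # Avogadro constant, 1/mol


def coulomb_energy_kj_mol(q1, q2, r_angstrom, eps_r):
    """Coulomb energy in kJ/mol for charges in units of e separated by r (angstroms)."""
    r_m = r_angstrom * 1e-10
    joules = q1 * q2 * E_CHARGE ** 2 / (4.0 * math.pi * EPS0 * eps_r * r_m)
    return joules * N_A / 1000.0


for label, eps_r in (("protein interior", 4.0), ("water", 80.0)):
    energy = coulomb_energy_kj_mol(+1, -1, 3.0, eps_r)
    print(f"{label:>16} (eps_r = {eps_r:4.0f}): {energy:8.1f} kJ/mol")
```

The same ion pair is twenty times stronger at $\epsilon_r = 4$ than at $\epsilon_r = 80$, although burying the charges also carries a large desolvation cost that this expression does not include.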

Conformational Entropy

The major cost of folding is the loss of backbone and side-chain conformational entropy. Each residue loses conformational entropy worth roughly $4\text{--}6\;\text{kJ/mol}$ (expressed as $-T\Delta S$ near 300 K) upon folding. For a 100-residue protein, this amounts to $\sim 400\text{--}600\;\text{kJ/mol}$, a massive penalty that must be overcome by the sum of all favorable interactions.

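$$-T\Delta S_{\text{conf}} \approx +RT \cdot N \cdot \ln\!\left(\frac{\Omega_U}{\Omega_N}\right) \approx +5N\;\text{kJ/mol at 300 K}$$

Here $\Omega_U/\Omega_N$ is the per-residue ratio of accessible conformations in the unfolded and native states, and $R = N_A k_B$ puts the result on a molar basis. The sign is positive because $\Delta S_{\text{conf}} < 0$ on folding.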

This is the fundamental reason why proteins are only marginally stable: the large favorable enthalpy from packing interactions and the large unfavorable conformational entropy nearly cancel, leaving a small net $\Delta G_{\text{fold}} \approx -20$ to $-60\;\text{kJ/mol}$.