Protein Structure & Folding
From secondary structure geometry through the hydrophobic effect and Levinthal's paradox to experimental methods for structure determination.
Derivation 1: Secondary Structure Geometry
The $\alpha$-Helix
The $\alpha$-helix, first predicted by Pauling and Corey (1951), is a right-handed helix stabilized by backbone hydrogen bonds between $\text{C=O}_i$ and $\text{N-H}_{i+4}$. Its geometry is defined by the following parameters:
Residues per turn (n): 3.6
Rise per residue (d): 1.5 Å
Pitch (p = n × d): 3.6 × 1.5 = 5.4 Å
H-bond pattern: $i \rightarrow i+4$
Dihedral angles: $\phi = -57°, \; \psi = -47°$
The angular rotation per residue about the helix axis is:
$$\theta = \frac{360°}{n} = \frac{360°}{3.6} = 100°$$
The H-bond length ($\text{N}\cdots\text{O}$) is approximately 2.8 Å, close to optimal for an $\text{N-H}\cdots\text{O=C}$ hydrogen bond. Because the peptide-bond dipoles (about 3.5 Debye each) all point roughly along the helix axis, they add up to a helix dipole of roughly $3.5N$ Debye (where $N$ is the number of residues), equivalent to a partial charge of about $+0.5$ at the N-terminus and $-0.5$ at the C-terminus. This dipole often stabilizes negatively charged ligands bound near the N-terminal end.
The $\beta$-Sheet
$\beta$-Sheets consist of extended strands connected by inter-strand hydrogen bonds. The strands can be arranged in two ways:
Parallel $\beta$-Sheet
$\phi = -119°, \; \psi = +113°$. Strands run same direction. H-bonds are evenly spaced but slightly angled. Rise per residue: 3.2 Å.
Antiparallel $\beta$-Sheet
$\phi = -139°, \; \psi = +135°$. Strands run opposite directions. H-bonds are perpendicular to strands (more stable). Rise per residue: 3.4 Å.
$3_{10}$ and $\pi$ Helices
Two other helical forms exist but are less common:
$3_{10}$-helix: H-bond pattern $i \rightarrow i+3$. 3.0 residues/turn, rise 2.0 Å/residue, pitch 6.0 Å. $\phi = -49°, \; \psi = -26°$. Tighter than $\alpha$, often found at helix termini. The name means 3 residues per turn, 10 atoms in the H-bonded ring.
$\pi$-helix: H-bond pattern $i \rightarrow i+5$. 4.4 residues/turn, rise 1.15 Å/residue, pitch 5.0 Å. $\phi = -57°, \; \psi = -70°$. Wider and less compact. Rare, found as insertions within $\alpha$-helices.
Deriving Helix Parameters from Bond Geometry
For any regular helix with $n$ residues per turn, rise per residue $d$, and H-bond from residue $i$ to residue $i+k$, the number of atoms in the H-bonded ring is:
$$N_{\text{ring}} = 3k + 1$$
For $\alpha$-helix ($k = 4$): ring size = 13. For $3_{10}$ ($k = 3$): ring size = 10. For $\pi$ ($k = 5$): ring size = 16. The relationship between pitch, rise, and residues per turn:
$$p = n \times d$$
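The relations above are easy to check numerically. The following is a minimal sketch that recomputes pitch, rotation per residue, and ring size from the values quoted in this section; the function name `helix_summary` and the dictionary layout are illustrative choices, not part of the original material.

```python
# Helix parameters quoted in the text:
# (residues per turn n, rise per residue d in Å, H-bond offset k)
HELICES = {
    "3_10": (3.0, 2.0, 3),
    "alpha": (3.6, 1.5, 4),
    "pi": (4.4, 1.15, 5),
}

def helix_summary(n, d, k):
    """Return pitch (Å), rotation per residue (degrees) and H-bonded ring size."""
    pitch = n * d              # p = n x d
    rotation = 360.0 / n       # one full turn shared among n residues
    ring_size = 3 * k + 1      # atoms in the ring closed by the i -> i+k H-bond
    return pitch, rotation, ring_size

for name, (n, d, k) in HELICES.items():
    pitch, rotation, ring = helix_summary(n, d, k)
    print(f"{name:>5}: pitch = {pitch:.2f} Å, "
          f"rotation = {rotation:.1f}°, ring = {ring} atoms")
```

For the $\pi$-helix the computed pitch is 5.06 Å, which the text rounds to 5.0 Å.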
Derivation 2: Ramachandran Analysis from van der Waals Radii
G.N. Ramachandran and colleagues (1963) showed that the allowed backbone conformations of polypeptides can be predicted from hard-sphere contact distances between non-bonded atoms.
Coordinate Generation
Given a set of $(\phi, \psi)$ values and a fixed $\omega = 180°$, the Cartesian coordinates of all backbone atoms can be computed using standard bond lengths and angles:
- $d(\text{N-C}_\alpha) = 1.47\;\text{\AA}$
- $d(\text{C}_\alpha\text{-C}) = 1.53\;\text{\AA}$
- $d(\text{C-N}) = 1.33\;\text{\AA}$ (partial double bond)
- $d(\text{C=O}) = 1.24\;\text{\AA}$
- $\angle(\text{N-C}_\alpha\text{-C}) = 111°$
- $\angle(\text{C}_\alpha\text{-C-N}) = 116°$
- $\angle(\text{C-N-C}_\alpha) = 122°$
Van der Waals Contact Criteria
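For each $(\phi, \psi)$ combination, interatomic distances between non-bonded atoms are computed. A conformation is disallowed if any distance is shorter than the minimum contact distance:
$$d_{ij} < r_i + r_j$$
where $d_{ij}$ is the distance between non-bonded atoms $i$ and $j$, and $r_i$, $r_j$ are their van der Waals radii.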
The critical van der Waals radii used are:
- Carbon: $r_{\text{vdW}} = 1.7\;\text{\AA}$ (normal), $1.5\;\text{\AA}$ (outer limit)
- Nitrogen: $r_{\text{vdW}} = 1.55\;\text{\AA}$ (normal), $1.4\;\text{\AA}$ (outer limit)
- Oxygen: $r_{\text{vdW}} = 1.52\;\text{\AA}$ (normal), $1.35\;\text{\AA}$ (outer limit)
- Hydrogen: $r_{\text{vdW}} = 1.2\;\text{\AA}$ (normal), $1.0\;\text{\AA}$ (outer limit)
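The boundaries of the allowed regions are set by a small number of critical short contacts, chiefly between backbone atoms of neighboring peptide units (carbonyl O, amide N and H) and between these atoms and the side-chain C$_\beta$.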
Using normal radii yields the fully allowed regions (only ~7.5% of the plot). Using outer limit radii yields partially allowed regions (~22% of the plot). Analysis of high-resolution crystal structures confirms that >99% of non-glycine residues fall within the partially allowed regions.
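The two-tier contact test can be sketched as follows. This is a simplified illustration that sums the per-atom radii listed above to get a contact limit for each pair; the function names and the example distances are hypothetical, and a full Ramachandran scan would first have to generate the backbone coordinates for every $(\phi, \psi)$ pair.

```python
# van der Waals radii (Å) from the list above: (normal, outer limit)
RADII = {"C": (1.70, 1.50), "N": (1.55, 1.40),
         "O": (1.52, 1.35), "H": (1.20, 1.00)}

RANK = {"fully allowed": 0, "partially allowed": 1, "disallowed": 2}

def classify_contact(elem_i, elem_j, distance):
    """Classify one non-bonded contact by the summed-radii criterion."""
    normal = RADII[elem_i][0] + RADII[elem_j][0]
    outer = RADII[elem_i][1] + RADII[elem_j][1]
    if distance >= normal:
        return "fully allowed"
    if distance >= outer:
        return "partially allowed"
    return "disallowed"

def classify_conformation(contacts):
    """A conformation is only as good as its worst contact.

    contacts: iterable of (element_i, element_j, distance_in_angstrom).
    """
    labels = [classify_contact(a, b, d) for a, b, d in contacts]
    return max(labels, key=RANK.get, default="fully allowed")

# Hypothetical distances for one (phi, psi) pair
example = [("O", "H", 2.8), ("C", "C", 3.2), ("N", "C", 3.4)]
print(classify_conformation(example))
```

The example prints `partially allowed` because the C···C contact at 3.2 Å lies between the outer limit (3.0 Å) and the normal limit (3.4 Å).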
Derivation 3: The Hydrophobic Effect & Folding Thermodynamics
The hydrophobic effect is the dominant driving force for protein folding. When nonpolar residues are buried in the protein interior, water molecules that were ordered around the exposed hydrophobic surface are released to bulk solvent, increasing the overall entropy of the system.
Free Energy of Folding
The Gibbs free energy change for the unfolded-to-native transition is:
$$\Delta G_{\text{fold}} = \Delta H_{\text{fold}} - T\,\Delta S_{\text{fold}}$$
Crucially, native proteins are only marginally stable: $\Delta G_{\text{fold}} \approx -20$ to $-60\;\text{kJ/mol}$, which is the equivalent of just a few hydrogen bonds. This marginal stability is the net result of large, opposing contributions:
- Stabilizing: van der Waals contacts, H-bonds, hydrophobic effect, ion pairs, disulfide bonds
- Destabilizing: loss of conformational entropy ($-T\Delta S_{\text{conf}} \approx +400$ to $+600\;\text{kJ/mol}$ for a 100-residue protein)
Temperature Dependence: The Stability Curve
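The large negative heat capacity change $\Delta C_p < 0$ for folding (due to burial of hydrophobic surface) makes the free energy a curved, approximately parabolic function of temperature. Taking $\Delta C_p$ as independent of temperature, the Kirchhoff equations give:
$$\Delta H(T) = \Delta H_m + \Delta C_p\,(T - T_m), \qquad \Delta S(T) = \Delta S_m + \Delta C_p \ln\frac{T}{T_m}$$
Since $\Delta G(T_m) = 0$ (by definition of the melting temperature), $\Delta S_m = \Delta H_m/T_m$. Therefore:
$$\Delta G(T) = \Delta H_m\left(1 - \frac{T}{T_m}\right) + \Delta C_p\left[(T - T_m) - T\ln\frac{T}{T_m}\right]$$
Plotted as stability ($-\Delta G$, the free energy of unfolding), this curve has an inverted parabolic shape. The protein is stable ($\Delta G < 0$) between two temperatures: the heat denaturation temperature $T_m$ and a lower cold denaturation temperature $T_c$. Maximum stability occurs at $T_s$ where $d\Delta G/dT = -\Delta S(T_s) = 0$, which gives:
$$T_s = T_m \exp\left(-\frac{\Delta H_m}{T_m\,\Delta C_p}\right)$$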
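A short numerical sketch makes the shape of the stability curve concrete. It assumes a temperature-independent $\Delta C_p$ and uses the folding sign convention of this section ($\Delta H_m < 0$, $\Delta C_p < 0$); the three parameter values are hypothetical, chosen only to resemble a small globular protein, and the cold denaturation temperature is located by simple bisection.

```python
import math

# Hypothetical folding parameters (assumptions, not taken from the text)
T_M = 333.15    # heat denaturation midpoint, K
DH_M = -400.0   # folding enthalpy at T_M, kJ/mol
DCP = -8.0      # folding heat capacity change, kJ/(mol K)

def dg_fold(T, t_m=T_M, dh_m=DH_M, dcp=DCP):
    """Folding free energy (kJ/mol) with constant heat capacity change."""
    return dh_m * (1.0 - T / t_m) + dcp * ((T - t_m) - T * math.log(T / t_m))

def t_max_stability(t_m=T_M, dh_m=DH_M, dcp=DCP):
    """Temperature where the folding entropy change is zero."""
    return t_m * math.exp(-dh_m / (t_m * dcp))

def t_cold(t_low=200.0, tol=1e-9):
    """Cold denaturation temperature by bisection between t_low and T_s.

    Assumes dg_fold(t_low) > 0 (unfolded) and dg_fold(T_s) < 0 (folded).
    """
    lo, hi = t_low, t_max_stability()
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if dg_fold(mid) > 0.0:   # still unfolded: the root lies above mid
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

t_s, t_c = t_max_stability(), t_cold()
print(f"T_s = {t_s:.1f} K, dG(T_s) = {dg_fold(t_s):.1f} kJ/mol, T_c = {t_c:.1f} K")
```

With these assumed inputs the maximum stability falls in the marginal range quoted earlier (a few tens of kJ/mol), and the cold denaturation point lies below the freezing point of water, which is one reason cold denaturation is rarely observed without destabilizing conditions.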
Entropy-Enthalpy Compensation
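The hydrophobic contribution near room temperature is entropy-driven: $\Delta H_{\text{hphob}} \approx 0$ but $T\Delta S_{\text{hphob}} > 0$ (release of ordered water). At higher temperatures, the ordered water cage is already partially disrupted, and the hydrophobic contribution becomes enthalpy-driven. This temperature crossover is captured by the large negative $\Delta C_p$:
$$\frac{d\Delta H}{dT} = \Delta C_p, \qquad \frac{d\Delta S}{dT} = \frac{\Delta C_p}{T}$$
With $\Delta C_p < 0$, both $\Delta H_{\text{hphob}}$ and $\Delta S_{\text{hphob}}$ decrease as the temperature rises: the favorable entropy shrinks toward zero while the enthalpy becomes increasingly negative.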
The total $\Delta C_p$ for folding correlates with the nonpolar surface area buried on folding: $\Delta C_p \approx -1.34\,\Delta A_{\text{np}}$ J mol$^{-1}$ K$^{-1}$, with $\Delta A_{\text{np}}$ in Å$^2$ (equivalently about $-0.32$ cal mol$^{-1}$ K$^{-1}$ per Å$^2$ buried).
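The entropy-to-enthalpy crossover can be illustrated with a two-reference-temperature model, $\Delta H(T) = \Delta C_p (T - T_H)$ and $\Delta S(T) = \Delta C_p \ln(T/T_S)$, which follows from integrating the constant-$\Delta C_p$ relations. The sketch below is illustrative only: the reference temperatures `T_H` and `T_S` and the value of `DCP` are assumptions chosen to resemble hydrocarbon-transfer data, not values given in this section.

```python
import math

# Assumed illustrative values (not from the text)
T_H = 295.0   # K, temperature at which the burial enthalpy is zero
T_S = 386.0   # K, temperature at which the burial entropy is zero
DCP = -0.3    # kJ/(mol K), heat capacity change on burying nonpolar surface

def dh(T):
    """Enthalpy of hydrophobic burial, kJ/mol."""
    return DCP * (T - T_H)

def ds(T):
    """Entropy of hydrophobic burial, kJ/(mol K)."""
    return DCP * math.log(T / T_S)

def dg(T):
    """Free energy of hydrophobic burial, kJ/mol."""
    return dh(T) - T * ds(T)

for T in (280.0, 298.0, 330.0, 386.0):
    print(f"T = {T:5.1f} K: dH = {dh(T):7.2f}, "
          f"-T dS = {-T * ds(T):7.2f}, dG = {dg(T):7.2f} kJ/mol")
```

Near 298 K almost all of the favorable free energy comes from the $-T\Delta S$ term, while at `T_S` the entropy term vanishes and the (similar) free energy is entirely enthalpic, which is the compensation pattern described above.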
Derivation 4: Levinthal's Paradox & the Folding Funnel
The Paradox
Cyrus Levinthal (1969) noted that if each residue in a polypeptide can adopt just 3 backbone conformations (corresponding to the three allowed regions of the Ramachandran plot), a protein of $N$ residues has $3^N$ possible conformations. For a modest 100-residue protein:
$$3^{100} \approx 5 \times 10^{47}\ \text{conformations}$$
If the protein samples conformations at the fastest possible rate (one per bond vibration, $\sim 10^{13}\;\text{s}^{-1}$), the time to exhaustively search all conformations would be:
$$t = \frac{5 \times 10^{47}}{10^{13}\;\text{s}^{-1}} = 5 \times 10^{34}\;\text{s} \approx 1.6 \times 10^{27}\;\text{years}$$
This is astronomically longer than the age of the universe ($\sim 1.4 \times 10^{10}$ years). Yet real proteins fold in milliseconds to seconds. Therefore, protein folding cannot proceed by random conformational search.
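The arithmetic behind the paradox takes only a few lines. This sketch uses the same inputs as the text (3 states per residue, $10^{13}$ conformations sampled per second); the function name and the year length of 365.25 days are illustrative choices.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.4e10

def levinthal_search(n_residues, states_per_residue=3, rate_per_s=1e13):
    """Return (number of conformations, exhaustive search time in years)."""
    conformations = states_per_residue ** n_residues   # exact integer
    seconds = conformations / rate_per_s
    return conformations, seconds / SECONDS_PER_YEAR

for n in (20, 50, 100):
    count, years = levinthal_search(n)
    print(f"N = {n:3d}: {count:.2e} conformations, {years:.2e} years "
          f"({years / AGE_OF_UNIVERSE_YEARS:.1e} x age of universe)")
```

Even a 50-residue chain already exceeds any biologically relevant timescale under this counting, which is why the argument does not depend on the exact number of states per residue.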
Resolution: The Folding Funnel
The resolution comes from the energy landscape theory (Bryngelson, Wolynes, Onuchic, 1990s). The free energy landscape is not flat but funnel-shaped: there is a general thermodynamic bias toward the native state. The protein need not sample all conformations; instead, it follows a downhill gradient.
The folding funnel can be described by plotting free energy $G$ as a function of the number of native contacts $Q$:
$$G(Q) = E(Q) - T\,S(Q)$$
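where $E(Q)$ decreases as native contacts form (energy stabilization) and $S(Q)$ decreases as the chain becomes more ordered (fewer accessible microstates). With $Q$ normalized to the fraction of native contacts ($0 \le Q \le 1$), the configurational entropy at a given $Q$ can be approximated, in the simplest linear form, as:
$$S(Q) \approx N k_B\,(1 - Q)\ln\Omega_0$$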
where $\Omega_0$ is the number of conformations per residue in the unfolded state. The funnel arises because $|dE/dQ| > T|dS/dQ|$ for a foldable protein: the energy gain from native contacts outpaces the entropy loss from ordering. The folding rate is then limited not by a combinatorial search but by the roughness of the landscape:
$$k_f = k_0 \exp\left(-\frac{\Delta G^\ddagger}{RT}\right)$$
where $\Delta G^\ddagger$ is the free energy barrier at the transition state, typically $10\text{--}60\;\text{kJ/mol}$, and $k_0$ is the prefactor from the Kramers theory of diffusion over barriers. Empirically $1/k_0 \approx N/100\;\mu\text{s}$, that is, a barrierless chain of $N$ residues would fold in roughly $N/100$ microseconds.
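To see how strongly the barrier controls the timescale, the rate expression $k_f = k_0 \exp(-\Delta G^\ddagger / RT)$ can be evaluated directly. This is a rough sketch that reads the prefactor as one barrierless folding event per $N/100$ microseconds (an interpretation of the estimate above); the function names and the chosen barrier heights are illustrative.

```python
import math

R = 8.314462618e-3   # gas constant, kJ/(mol K)

def prefactor_per_s(n_residues):
    """k_0 in s^-1, assuming a barrierless folding time of N/100 microseconds."""
    return 1.0 / (n_residues / 100.0 * 1e-6)

def folding_rate(n_residues, barrier_kj_mol, temperature=298.15):
    """Kramers/Arrhenius-type estimate of the folding rate in s^-1."""
    return prefactor_per_s(n_residues) * math.exp(
        -barrier_kj_mol / (R * temperature))

for barrier in (10.0, 30.0, 60.0):
    k = folding_rate(100, barrier)
    print(f"barrier = {barrier:4.1f} kJ/mol: k_f = {k:.3e} s^-1, "
          f"folding time = {1.0 / k:.3e} s")
```

For a 100-residue chain, a 30 kJ/mol barrier gives a folding time of a fraction of a second, inside the millisecond-to-second window mentioned earlier.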
Applications: Experimental Structure Determination
X-Ray Crystallography
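X-ray crystallography is the dominant method for atomic-resolution protein structures. A protein crystal diffracts X-rays ($\lambda \approx 1.5\;\text{\AA}$) according to Bragg's law:
$$n\lambda = 2d\sin\theta$$
where $d$ is the spacing between lattice planes, $\theta$ is the angle between the incident beam and the planes, and $n$ is the diffraction order.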
The diffraction pattern provides amplitudes $|F_{hkl}|$, but the phases are lost (the phase problem). Phases are recovered by molecular replacement, isomorphous replacement (heavy atoms), or anomalous scattering (MAD/SAD).
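Bragg's law $n\lambda = 2d\sin\theta$ links the scattering angle to the spacing being resolved. The sketch below solves it for $\theta$ at a few plane spacings, using the 1.5 Å wavelength quoted above; the function name and the list of spacings are illustrative.

```python
import math

def bragg_angle_deg(d_spacing, wavelength=1.5, order=1):
    """Bragg angle theta (degrees) for plane spacing d_spacing (Å)."""
    s = order * wavelength / (2.0 * d_spacing)
    if s > 1.0:
        raise ValueError("no diffraction: spacing is below wavelength/2")
    return math.degrees(math.asin(s))

for d in (10.0, 3.0, 2.0, 1.0):
    theta = bragg_angle_deg(d)
    print(f"d = {d:4.1f} Å -> theta = {theta:5.2f}°, 2-theta = {2 * theta:5.2f}°")
```

Finer detail (smaller $d$) diffracts to higher angles, and no reflection is possible below $d = \lambda/2$, which sets the theoretical resolution limit for a given wavelength.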
Nuclear Magnetic Resonance (NMR)
NMR determines structures in solution (no crystal needed). Key experiments:
- NOESY: Nuclear Overhauser Effect — measures through-space distances ($< 5\;\text{\AA}$)
- COSY/TOCSY: Through-bond correlations for sequential assignment
- HSQC: $^{1}\text{H}$-$^{15}\text{N}$ correlation — fingerprint of protein
- Residual dipolar couplings: Long-range orientational constraints
Practical size limit is approximately 40–50 kDa (with deuteration and TROSY, up to ~100 kDa).
Cryo-Electron Microscopy (Cryo-EM)
The resolution revolution in cryo-EM (Nobel Prize 2017) now routinely achieves 2–3 Å resolution for large complexes. Single particle analysis images thousands of individual protein particles frozen in vitreous ice, computationally classifies their orientations, and reconstructs a 3D map. No crystallization is needed, and the method can capture multiple conformational states.
AlphaFold & Computational Prediction
AlphaFold2 (DeepMind, 2020) achieved near-experimental accuracy in protein structure prediction. The model uses a novel architecture combining multiple sequence alignments (MSAs) with a structure module that iteratively refines atomic coordinates. AlphaFold has predicted structures for essentially all known protein sequences (~200 million), dramatically expanding structural coverage.
Comparison of Methods
| Method | Resolution | Size Range | Key Requirement |
|---|---|---|---|
| X-ray | 1–3 Å | Any size | Crystal |
| NMR | 2–4 Å | < 50 kDa | Isotope labeling |
| Cryo-EM | 2–4 Å | > 50 kDa | Homogeneous sample |
| AlphaFold | ~1–2 Å backbone RMSD (accuracy, not resolution) | Single chains | MSA / templates |
Python Simulation: Folding Landscape, Stability Curve & Levinthal's Paradox
This simulation visualizes: (1) the folding free energy landscape as a function of native contacts, (2) the protein stability curve $\Delta G(T)$ showing both heat and cold denaturation, and (3) Levinthal's paradox demonstrating the impossibility of random conformational search.
Folding Funnel, Stability Curve, and Levinthal Paradox
Python Simulation: Hydrophobic Effect Thermodynamics
This simulation shows the enthalpy-entropy compensation in protein folding and the temperature dependence of the hydrophobic contribution, illustrating why the hydrophobic effect is entropy-driven at room temperature but enthalpy-driven at higher temperatures.
Enthalpy-Entropy Compensation and Hydrophobic Effect
Tertiary & Quaternary Structure
Tertiary Structure: The Full 3D Fold
Tertiary structure describes the complete three-dimensional arrangement of all atoms in a single polypeptide chain. It arises from the packing of secondary structure elements and is stabilized by:
- Hydrophobic packing: Nonpolar side chains cluster in the protein interior, minimizing contact with water. The hydrophobic core typically has a packing density of $\sim 0.75$, comparable to crystalline amino acids.
- Hydrogen bonds: Between side chains, between side chains and backbone, and between backbone and water. A buried unsatisfied H-bond donor/acceptor costs approximately $20\;\text{kJ/mol}$.
- Ionic interactions (salt bridges): Between oppositely charged side chains (e.g., Asp—Lys). Contribute $\sim 5\text{--}20\;\text{kJ/mol}$ depending on solvent exposure.
- Disulfide bonds: Covalent $\text{S-S}$ bridges between Cys residues. Common in extracellular proteins but rare in the reducing cytoplasm. Bond energy $\sim 250\;\text{kJ/mol}$.
- Van der Waals interactions: Individually weak ($\sim 2\text{--}4\;\text{kJ/mol}$) but collectively significant due to the large number of contacts in a tightly packed core.
Common Structural Motifs
- Coiled coil: Two or more $\alpha$-helices wrapped around each other with a heptad repeat (abcdefg), where positions a and d are hydrophobic. Superhelical pitch $\approx 140\;\text{\AA}$. Examples: leucine zippers, keratin, myosin.
- Greek key motif: Four antiparallel $\beta$-strands connected by hairpins in a pattern resembling the Greek key border design. Found in immunoglobulin domains.
- $\beta$-barrel: A closed structure of $\beta$-strands. TIM barrel ($(\beta/\alpha)_8$) is the most common enzyme fold, found in ~10% of all known enzyme structures.
- Rossmann fold: $\beta\alpha\beta\alpha\beta$ motif for dinucleotide (NAD$^+$/FAD) binding. One of the most ancient and conserved folds.
- EF-hand: Helix-loop-helix Ca$^{2+}$-binding motif. The loop coordinates Ca$^{2+}$ through 6–7 oxygen ligands. Found in calmodulin, troponin C, parvalbumin.
- Zinc finger: Small structural motif (~30 residues) stabilized by coordination of Zn$^{2+}$ by Cys and/or His residues. The C$_2$H$_2$ type is the most common DNA-binding domain in the human genome.
Quaternary Structure
Quaternary structure refers to the arrangement of two or more polypeptide subunits in a multisubunit complex. Subunit interfaces are stabilized by the same noncovalent forces as the protein interior. Oligomerization provides several advantages:
- Cooperativity: Allosteric regulation requires multiple subunits (e.g., hemoglobin tetramer, $\alpha_2\beta_2$)
- Genetic economy: Large structures from small genes (e.g., viral capsids use 60 copies of one or a few subunits)
- Error reduction: Self-assembly provides quality control — misfolded subunits often fail to assemble
- Active site formation: Catalytic sites at subunit interfaces (e.g., aspartate transcarbamoylase)
Symmetry in Oligomers
Most oligomeric proteins display point group symmetry:
- C$_n$ (cyclic): n-fold rotational symmetry. Example: C$_7$ for the GroES heptamer.
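- D$_n$ (dihedral): n-fold axis plus perpendicular 2-fold axes. Example: D$_7$ for GroEL (two stacked heptameric rings). The hemoglobin $\alpha_2\beta_2$ tetramer is pseudo-D$_2$: its exact symmetry is C$_2$ because the $\alpha$ and $\beta$ chains are similar but not identical.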
- Cubic symmetries: T (tetrahedral, 12 subunits), O (octahedral, 24), I (icosahedral, 60). Example: ferritin (O, 24-mer), viral capsids (I, 60n-mer).
The subunit dissociation constant is related to the free energy of association:
$$\Delta G^\circ_{\text{assoc}} = -RT\ln K_a = RT\ln K_d$$
with $K_d$ expressed relative to the 1 M standard state. Typical subunit interfaces bury $600\text{--}4000\;\text{\AA}^2$ of surface area per subunit, with $K_d$ values ranging from $\text{pM}$ (very stable, e.g., hemoglobin) to $\mu\text{M}$ (transient complexes).
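Converting between $K_d$ and the standard free energy of association via $\Delta G^\circ_{\text{assoc}} = RT\ln K_d$ shows what the pM-to-µM range means energetically. This is a small sketch assuming a 1 M standard state and 25 °C; the function names are illustrative.

```python
import math

R = 8.314462618e-3   # gas constant, kJ/(mol K)

def dg_assoc(kd_molar, temperature=298.15):
    """Standard free energy of association (kJ/mol) from K_d in mol/L."""
    return R * temperature * math.log(kd_molar)

def kd_from_dg(dg_kj_mol, temperature=298.15):
    """Inverse relation: K_d (mol/L) from the association free energy."""
    return math.exp(dg_kj_mol / (R * temperature))

for label, kd in (("1 pM", 1e-12), ("1 nM", 1e-9), ("1 uM", 1e-6)):
    print(f"K_d = {label}: dG_assoc = {dg_assoc(kd):6.1f} kJ/mol")
```

Each factor of 1000 in $K_d$ corresponds to about 17 kJ/mol at room temperature, so the whole pM-to-µM range spans roughly 34 kJ/mol.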
Intrinsically Disordered Proteins (IDPs)
A significant fraction of eukaryotic proteins (~30–40%) contain long disordered regions that lack a fixed three-dimensional structure under physiological conditions. These intrinsically disordered proteins (IDPs) or regions (IDRs) challenge the classical structure-function paradigm.
Sequence Characteristics
IDPs are enriched in charged, polar, and other disorder-promoting residues (Glu, Lys, Arg, Gln, Ser, Pro) and depleted in hydrophobic residues (Trp, Phe, Ile, Leu, Val). This low hydrophobicity and high net charge prevent the formation of a compact hydrophobic core. The mean hydrophobicity $\langle H \rangle$ and mean net charge $\langle |R| \rangle$ can predict disorder using the boundary line of the Uversky plot:
$$\langle |R| \rangle = 2.785\,\langle H \rangle - 1.151$$
Proteins above this line in the $(\langle H \rangle, \langle |R| \rangle)$ plane tend to be disordered; those below tend to be globular.
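A charge-hydropathy classifier along these lines can be sketched in a few lines. The version below makes several simplifying assumptions that are not stated in the text: hydrophobicity is the Kyte-Doolittle value rescaled to 0–1 and averaged over the whole sequence (no sliding window), His is treated as neutral at pH 7, and the boundary is $\langle |R| \rangle = 2.785\langle H \rangle - 1.151$. The two test sequences are made up for illustration.

```python
# Kyte-Doolittle hydropathy values
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}   # assumption: His neutral at pH 7

def mean_hydrophobicity(seq):
    """Mean Kyte-Doolittle hydropathy rescaled to the 0-1 range."""
    return sum((KD[aa] + 4.5) / 9.0 for aa in seq) / len(seq)

def mean_net_charge(seq):
    """Absolute net charge per residue."""
    return abs(sum(CHARGE.get(aa, 0) for aa in seq)) / len(seq)

def predicted_disordered(seq):
    """True if the sequence lies above the charge-hydropathy boundary."""
    h, r = mean_hydrophobicity(seq), mean_net_charge(seq)
    return r > 2.785 * h - 1.151

for seq in ("EEKEESEEPEEQEEKE", "LIVAFLLVAIGLVAMF"):   # hypothetical sequences
    print(seq, f"<H> = {mean_hydrophobicity(seq):.2f}",
          f"<|R|> = {mean_net_charge(seq):.2f}",
          "disordered" if predicted_disordered(seq) else "ordered")
```

A whole-sequence average is a coarse predictor; it is meant to illustrate the geometry of the plot rather than to replace a dedicated disorder predictor.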
Functional Advantages of Disorder
- Binding promiscuity: IDPs can fold upon binding to different partners, enabling one protein to interact with many targets (hub proteins in signaling networks)
- Large interaction surfaces: Extended conformations provide larger binding interfaces than compact domains of the same sequence length
- Post-translational modification: Disordered regions are primary sites for phosphorylation, ubiquitination, and other modifications
- Entropic springs: Disordered linkers between domains allow conformational flexibility (e.g., titin PEVK domain)
- Liquid-liquid phase separation: IDPs with multivalent interaction motifs drive the formation of membraneless organelles (P-bodies, stress granules, nucleoli)
Polymer Physics of IDPs
IDPs can be modeled as polymer chains. The radius of gyration scales with chain length as:
$$R_g = R_0 N^{\nu}$$
where $N$ is the number of residues, $R_0 \approx 2\;\text{\AA}$, and the Flory exponent $\nu$ depends on solvent conditions: $\nu = 0.588$ for a good solvent (self-avoiding random walk), $\nu = 0.5$ for a theta solvent (ideal chain), and $\nu = 1/3$ for a poor solvent (compact globule). Most IDPs in physiological conditions have $\nu \approx 0.5\text{--}0.6$.
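The scaling law $R_g = R_0 N^{\nu}$ can be evaluated for the three solvent regimes to get a feel for the size differences. This sketch uses $R_0 = 2$ Å from the text; the function names are illustrative, and applying the same prefactor to the compact-globule case is a simplification.

```python
import math

R0 = 2.0   # Å, prefactor quoted in the text

def radius_of_gyration(n_residues, nu, r0=R0):
    """R_g (Å) from the Flory scaling law R_g = R0 * N**nu."""
    return r0 * n_residues ** nu

def flory_exponent(rg, n_residues, r0=R0):
    """Invert the scaling law to estimate nu from a measured R_g."""
    return math.log(rg / r0) / math.log(n_residues)

for label, nu in (("good solvent", 0.588), ("theta solvent", 0.5),
                  ("poor solvent", 1.0 / 3.0)):
    print(f"{label:>13}: R_g(N=100) = {radius_of_gyration(100, nu):5.1f} Å")
```

For 100 residues the expanded coil is roughly three times larger in $R_g$ than the compact globule, which is why small-angle scattering readily distinguishes IDPs from folded proteins of the same length.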
Key Equations Summary
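Helix Pitch
$$p = n \times d$$
Protein Stability Curve
$$\Delta G(T) = \Delta H_m\left(1 - \frac{T}{T_m}\right) + \Delta C_p\left[(T - T_m) - T\ln\frac{T}{T_m}\right]$$
Levinthal's Number
$$3^{N}, \qquad 3^{100} \approx 5 \times 10^{47}$$
Bragg's Law
$$n\lambda = 2d\sin\theta$$
Radius of Gyration (IDP)
$$R_g = R_0 N^{\nu}$$
Steric Clash Criterion
$$d_{ij} < r_i + r_j$$
Folding Rate (Kramers)
$$k_f = k_0 \exp\left(-\frac{\Delta G^\ddagger}{RT}\right)$$
Subunit Dissociation
$$\Delta G^\circ_{\text{assoc}} = RT\ln K_d$$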
Quantifying Non-Covalent Forces in Proteins
Protein stability results from a delicate balance of large opposing forces. Understanding each contribution quantitatively is essential for rational protein design and stability engineering.
Van der Waals Interactions
The Lennard-Jones potential describes the distance dependence of van der Waals interactions between non-bonded atoms:
$$V(r) = 4\epsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right]$$
where $\epsilon$ is the well depth ($\sim 0.5\text{--}1.0\;\text{kJ/mol}$), $\sigma$ is the distance at which $V = 0$, and the minimum occurs at $r_{\min} = 2^{1/6}\sigma$. The $r^{-12}$ term models Pauli repulsion and the $r^{-6}$ term models London dispersion attraction.
A typical protein buries 3,000–10,000 Å$^2$ of surface area, making thousands of van der Waals contacts. The collective contribution is roughly $-100$ to $-400\;\text{kJ/mol}$ for a typical small protein.
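The stated properties of the Lennard-Jones potential $V(r) = 4\epsilon[(\sigma/r)^{12} - (\sigma/r)^{6}]$, namely the zero crossing at $\sigma$ and the minimum of depth $\epsilon$ at $2^{1/6}\sigma$, can be confirmed numerically. The values of `EPSILON` and `SIGMA` below are hypothetical, picked to be in the range typical of a carbon-carbon contact.

```python
EPSILON = 0.8   # kJ/mol, hypothetical well depth
SIGMA = 3.4     # Å, hypothetical zero-crossing distance

def lennard_jones(r, epsilon=EPSILON, sigma=SIGMA):
    """12-6 Lennard-Jones energy (kJ/mol) at separation r (Å)."""
    x6 = (sigma / r) ** 6
    return 4.0 * epsilon * (x6 * x6 - x6)

r_min = 2.0 ** (1.0 / 6.0) * SIGMA

for r in (3.0, SIGMA, r_min, 4.5, 6.0):
    print(f"r = {r:4.2f} Å: V = {lennard_jones(r):8.3f} kJ/mol")
```

The steep rise below $\sigma$ compared with the shallow attractive tail is what makes the hard-sphere approximation used in the Ramachandran analysis workable.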
Hydrogen Bonds
The strength of an H-bond depends on geometry (distance and angle) and the dielectric environment. In vacuum, an N-H$\cdots$O=C H-bond is worth approximately $-20\;\text{kJ/mol}$. In aqueous solution, the net contribution to stability is smaller ($\sim -2$ to $-8\;\text{kJ/mol}$) because the protein must pay the cost of desolvating the donor and acceptor.
The electrostatic contribution to H-bonding follows Coulomb's law in a dielectric medium:
$$E = \frac{q_1 q_2}{4\pi\epsilon_0\,\epsilon_r\, r}$$
where $\epsilon_r \approx 2\text{--}4$ in the protein interior (vs $\epsilon_r = 80$ in water), greatly amplifying electrostatic interactions in the hydrophobic core.
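The effect of the dielectric constant is easy to quantify with Coulomb's law, $E = q_1 q_2 / (4\pi\epsilon_0\epsilon_r r)$. The sketch below compares a pair of opposite unit charges 3 Å apart in a low-dielectric interior and in water; the charge values, the distance, and the use of a uniform dielectric are simplifying assumptions for illustration.

```python
# e^2 * N_A / (4 * pi * eps0), in kJ Å / mol
COULOMB_CONSTANT = 1389.35

def coulomb_energy(q1, q2, r_angstrom, eps_r):
    """Coulomb energy (kJ/mol) of two point charges (in units of e)."""
    return COULOMB_CONSTANT * q1 * q2 / (eps_r * r_angstrom)

for label, eps_r in (("protein interior", 4.0), ("bulk water", 80.0)):
    energy = coulomb_energy(+1.0, -1.0, 3.0, eps_r)
    print(f"{label:>16} (eps_r = {eps_r:4.1f}): E = {energy:7.1f} kJ/mol")
```

The 20-fold difference between the two environments is the amplification referred to above; in a real protein it is partly offset by the cost of removing the charges from water.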
Conformational Entropy
The major cost of folding is the loss of backbone and side-chain conformational entropy. Each residue loses conformational entropy worth roughly $4\text{--}6\;\text{kJ/mol}$ (as $T\Delta S$) upon folding. For a 100-residue protein, this amounts to $\sim 400\text{--}600\;\text{kJ/mol}$, a massive penalty that must be overcome by the sum of all favorable interactions.
This is the fundamental reason why proteins are only marginally stable: the large favorable enthalpy from packing interactions and the large unfavorable conformational entropy nearly cancel, leaving a small net $\Delta G_{\text{fold}} \approx -20$ to $-60\;\text{kJ/mol}$.