Part 7: Protein Structure and Function
The Workhorses of the Cell
Proteins are the most versatile macromolecules in living systems. They catalyze reactions, provide structural support, transport molecules, transmit signals, and defend against pathogens. Their function is determined by their three-dimensional structure, which is in turn dictated by the linear sequence of amino acids encoded in the genome.
This chapter covers the full hierarchy of protein structure from individual amino acids through quaternary assemblies, the physical chemistry of protein folding, and the principles of enzyme catalysis.
1. Amino Acids — The Building Blocks
General Structure
All 20 standard amino acids share a common backbone: a central α-carbon bonded to an amino group (–NH3+), a carboxyl group (–COO−), a hydrogen atom, and a variable side chain (R group). At physiological pH (~7.4), amino acids exist as zwitterions — the amino group is protonated and the carboxyl group is deprotonated simultaneously.
\( \text{H}_3\text{N}^+ - \text{C}_\alpha\text{HR} - \text{COO}^- \)
All standard amino acids except glycine have a chiral α-carbon. Biological proteins use exclusively the L-enantiomer.
The 20 Standard Amino Acids by Properties
Nonpolar (Hydrophobic) — 9 residues
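| Code | Amino acid | Side chain / notes |
|---|---|---|
| G | Glycine | H side chain, achiral |
| A | Alanine | Methyl group |
| V | Valine | Branched chain |
| L | Leucine | Branched chain |
| I | Isoleucine | 2 chiral centers |
| P | Proline | Cyclic imino acid |
| F | Phenylalanine | Benzyl group |
| W | Tryptophan | Indole ring |
| M | Methionine | Thioether |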
Nonpolar residues are typically buried in the protein interior, driving folding via the hydrophobic effect. Proline is unique: its side chain cyclizes back to the backbone nitrogen, restricting φ to ~−60° and introducing a rigid kink. Glycine, with only a hydrogen as its R group, is the most conformationally flexible residue.
Polar Uncharged — 6 residues
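| Code | Amino acid | Side chain / notes |
|---|---|---|
| S | Serine | Hydroxyl (OH) |
| T | Threonine | Hydroxyl, 2 chiral centers |
| C | Cysteine | Sulfhydryl (SH) |
| Y | Tyrosine | Phenol OH |
| N | Asparagine | Amide |
| Q | Glutamine | Amide |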
Cysteine is particularly important — two cysteine residues can form a disulfide bond (–S–S–) under oxidizing conditions. Serine and threonine are common phosphorylation sites.
Positively Charged (Basic) — 3 residues
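| Code | Amino acid | Side chain / notes |
|---|---|---|
| K | Lysine | pKa ~10.5, positively charged at physiological pH |
| R | Arginine | pKa ~12.5, guanidinium |
| H | Histidine | pKa ~6.0, imidazole |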
Histidine is uniquely important: its imidazole side chain has a pKa near physiological pH (~6.0), making it an excellent proton shuttle in enzyme active sites (e.g., the catalytic triad of serine proteases).
Negatively Charged (Acidic) — 2 residues
| Code | Amino acid | Side chain pKa |
|---|---|---|
| D | Aspartate | ~3.65 |
| E | Glutamate | ~4.25 |
Deprotonated at physiological pH, these residues carry a net negative charge and often participate in salt bridges and metal ion coordination.
Acid-Base Chemistry of Amino Acids
Key pKa Values
| Group | pKa | Conjugate Acid / Base |
|---|---|---|
| α-COOH | ~2.2 | –COOH / –COO− |
| α-NH3+ | ~9.4 | –NH3+ / –NH2 |
| Asp (R) | 3.65 | –COOH / –COO− |
| Glu (R) | 4.25 | –COOH / –COO− |
| His (R) | 6.00 | ImH+ / Im |
| Cys (R) | 8.18 | –SH / –S− |
| Tyr (R) | 10.07 | –OH / –O− |
| Lys (R) | 10.53 | –NH3+ / –NH2 |
| Arg (R) | 12.48 | Guanidinium+ / Guanidine |
Henderson-Hasselbalch Equation
\( \text{pH} = \text{p}K_a + \log\frac{[\text{A}^-]}{[\text{HA}]} \)
When pH = pKa, exactly half the molecules are protonated. This equation is essential for predicting the charge state of amino acid side chains at any given pH.
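As a rough illustration of the Henderson-Hasselbalch relation above, the following Python sketch computes the deprotonated fraction of a single ionizable group. The function name `fraction_deprotonated` is illustrative, and each group is treated as an independent site, which ignores the local environment inside a real protein.

```python
def fraction_deprotonated(pH: float, pKa: float) -> float:
    """Fraction of an ionizable group present in its deprotonated (A-) form."""
    ratio = 10 ** (pH - pKa)  # [A-]/[HA] from Henderson-Hasselbalch
    return ratio / (1 + ratio)

# Side-chain pKa values taken from the table above
for name, pKa in [("Asp", 3.65), ("His", 6.00), ("Lys", 10.53)]:
    f = fraction_deprotonated(7.4, pKa)
    print(f"{name} side chain: {100 * f:.1f}% deprotonated at pH 7.4")
```

At pH = pKa the function returns exactly one half, matching the statement above.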
Isoelectric Point (pI)
The isoelectric point is the pH at which the amino acid (or protein) carries zero net charge. For a simple amino acid with no ionizable side chain:
\( \text{pI} = \frac{\text{p}K_{a1} + \text{p}K_{a2}}{2} \)
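For amino acids with ionizable side chains, take the average of the two pKa values that bracket the zwitterionic (neutral) species. For example, for aspartate the neutral species lies between the two carboxyl ionizations, so pI = (pKa,α-COOH + pKa,R-COOH)/2 = (2.09 + 3.86)/2 ≈ 2.98. These aspartate-specific values come from a different data set than the table above (which lists 3.65 for the side chain); published pKa values vary slightly between sources.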
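The averaging rule can be cross-checked numerically by finding the pH at which the net charge crosses zero. The sketch below is one way to do that, assuming independent ionizable sites. It uses the aspartate carboxyl values quoted in the example (2.09 and 3.86) together with the generic α-NH3+ value of ~9.4 from the table, and the function names are illustrative.

```python
def net_charge(pH, acidic_pKas, basic_pKas):
    """Average net charge from independent Henderson-Hasselbalch sites."""
    negative = sum(1 / (1 + 10 ** (pKa - pH)) for pKa in acidic_pKas)  # deprotonated acids carry -1
    positive = sum(1 / (1 + 10 ** (pH - pKa)) for pKa in basic_pKas)   # protonated bases carry +1
    return positive - negative

def isoelectric_point(acidic_pKas, basic_pKas, lo=0.0, hi=14.0, tol=1e-6):
    """Bisection on pH; the net charge decreases monotonically as pH rises."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(mid, acidic_pKas, basic_pKas) > 0:
            lo = mid  # still net positive, so pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2

pI_asp = isoelectric_point(acidic_pKas=[2.09, 3.86], basic_pKas=[9.4])
print(f"Asp pI from charge balance: {pI_asp:.3f}")
```

Because the amino group is essentially fully protonated near pH 3, the numerical result agrees with the simple average of the two carboxyl pKa values.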
The Peptide Bond
Amino acids are joined by peptide bonds formed via a condensation (dehydration) reaction between the α-carboxyl of one amino acid and the α-amino of the next, releasing water.
Partial Double Bond Character
Resonance between C=O and C–N gives the peptide bond ~40% double-bond character. Bond length is 1.33 Å (between C–N single 1.47 Å and C=N double 1.25 Å). This restricts rotation around the C–N bond.
Planarity and Trans Preference
The six atoms of the peptide unit (Cα, C, O, N, H, Cα) are coplanar. The trans configuration is strongly preferred (~99.95%) due to steric clash in cis. Exception: X–Pro bonds are ~5% cis because proline's ring reduces the energy difference.
Ramachandran Angles φ and ψ
The backbone conformation of each residue is defined by two dihedral (torsion) angles:
\( \phi \text{ (phi)}: \text{C}_{i-1}\text{-N}_i\text{-C}_{\alpha,i}\text{-C}_i \quad \text{(rotation around N-C}_\alpha \text{ bond)} \)
\( \psi \text{ (psi)}: \text{N}_i\text{-C}_{\alpha,i}\text{-C}_i\text{-N}_{i+1} \quad \text{(rotation around C}_\alpha\text{-C bond)} \)
The peptide bond torsion angle ω is fixed at ~180° (trans). Thus φ and ψ are the only two backbone degrees of freedom per residue. Not all (φ, ψ) combinations are sterically allowed; this is visualized in the Ramachandran plot.
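Given atomic coordinates, φ and ψ are ordinary four-point torsion angles. The following is a minimal pure-Python sketch of that calculation. The helper names are illustrative and the points in the usage lines are toy geometry rather than real atom positions.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_deg(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the central bond
    # Components of the outer bonds perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))

# For a residue i (hypothetical variable names):
#   phi = dihedral_deg(C_prev, N, CA, C)
#   psi = dihedral_deg(N, CA, C, N_next)
print(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))   # eclipsed (cis) arrangement
print(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))  # anti (trans) arrangement
```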
2. Primary Structure
Amino Acid Sequence
The primary structure is the linear sequence of amino acids in a polypeptide chain, read from the N-terminus (free amino group) to the C-terminus (free carboxyl group). This convention matches the direction of ribosomal synthesis (N→C).
The primary structure encodes all the information necessary to determine the three-dimensional fold of the protein (Anfinsen's thermodynamic hypothesis). A single amino acid change can dramatically alter function — as in sickle cell disease, where Glu6→Val in β-globin causes hemoglobin polymerization.
Peptide Bond Formation (Dehydration Synthesis)
\( \text{AA}_1\text{-COOH} + \text{H}_2\text{N-AA}_2 \xrightarrow{\text{ribosome}} \text{AA}_1\text{-CO-NH-AA}_2 + \text{H}_2\text{O} \)
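In vivo, this is catalyzed by the ribosome (a ribozyme) using aminoacyl-tRNAs as substrates. Condensation of free amino acids is thermodynamically unfavorable (ΔG°′ ≈ +3.5 kcal/mol per bond); in the cell the reaction is driven by the activated ester linkage of the aminoacyl-tRNA, which is formed at the expense of ATP by aminoacyl-tRNA synthetases. GTP hydrolysis by elongation factors powers tRNA delivery and translocation rather than bond formation itself.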
Sequence Determination Methods
Edman Degradation
Sequentially removes and identifies one amino acid at a time from the N-terminus using phenylisothiocyanate (PITC). Practical for ~50–60 residues before cumulative errors degrade accuracy. Requires a free N-terminus, so a blocked (e.g., acetylated) N-terminus must be deblocked first.
Mass Spectrometry (MS/MS)
Modern method. Proteins are digested with trypsin (cleaves after K, R), peptide fragments are ionized (ESI or MALDI), separated by m/z, and fragmented again (tandem MS). Fragment ion series (b-ions, y-ions) reveal the sequence. Can identify post-translational modifications.
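The tryptic digestion step can be mimicked in a few lines. This sketch cleaves after K and R as described above. The optional rule that skips cleavage when the next residue is proline is a commonly used convention that is not stated in the text, and the function name and test sequence are made up for illustration.

```python
def trypsin_digest(sequence, skip_before_proline=True):
    """Split a one-letter protein sequence after each K or R."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            next_is_pro = i + 1 < len(sequence) and sequence[i + 1] == "P"
            if skip_before_proline and next_is_pro:
                continue  # assumed convention: no cleavage before proline
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide without K/R
    return peptides

print(trypsin_digest("MKWVTFISLLRPAGK"))
print(trypsin_digest("MKWVTFISLLRPAGK", skip_before_proline=False))
```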
3. Secondary Structure
α-Helix
The α-helix is the most common secondary structure element in proteins, first predicted by Linus Pauling and Robert Corey in 1951. It is a right-handed helical structure stabilized by backbone hydrogen bonds.
Geometric Parameters
- Residues per turn: 3.6
- Rise per residue: 1.5 Å
- Pitch (rise/turn): 5.4 Å (= 3.6 × 1.5)
- Hydrogen bond: C=O of residue i → N–H of residue i+4
- φ, ψ angles: −57°, −47°
- Helix diameter: ~12 Å (including side chains)
Important Properties
- Helix dipole: The aligned N–H and C=O groups create a macrodipole with a partial positive charge at the N-terminus and partial negative at the C-terminus (~0.5–0.7 charge units)
- Helix breakers: Proline (lacks N–H for H-bond, restricted φ) and Glycine (too flexible, entropic penalty)
- Helix formers: Ala, Leu, Met, Glu have high helix propensity
\( \text{Pitch} = n \times d = 3.6 \times 1.5\,\text{\AA} = 5.4\,\text{\AA} \)
\( \text{H-bond}: \quad \text{C=O}_i \cdots \text{H-N}_{i+4} \)
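The geometric parameters above are enough to place Cα atoms on an ideal right-handed helix. The sketch below does this with 3.6 residues per turn and a 1.5 Å rise. The Cα radius of 2.3 Å is an assumed value that does not appear in the text, and the names are illustrative.

```python
import math

RESIDUES_PER_TURN = 3.6
RISE_PER_RESIDUE = 1.5  # Å, from the parameter list above
CA_RADIUS = 2.3         # Å, assumed distance of Cα from the helix axis

def helix_ca_coords(n_residues):
    """Cα positions on an ideal right-handed helix along the z axis."""
    step = 2 * math.pi / RESIDUES_PER_TURN  # 100° rotation per residue
    return [(CA_RADIUS * math.cos(i * step),
             CA_RADIUS * math.sin(i * step),
             i * RISE_PER_RESIDUE)
            for i in range(n_residues)]

pitch = RESIDUES_PER_TURN * RISE_PER_RESIDUE
coords = helix_ca_coords(19)
print(f"pitch = {pitch:.1f} Å")
print(f"adjacent Cα-Cα distance = {math.dist(coords[0], coords[1]):.2f} Å")
print(f"axial length spanned by 19 residues = {coords[-1][2] - coords[0][2]:.1f} Å")
```

With the assumed radius, the spacing between consecutive Cα atoms comes out close to the ~3.8 Å expected for a trans peptide unit, which is a useful sanity check on the parameters.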
β-Sheet
β-sheets are formed by extended polypeptide strands (β-strands) aligned side-by-side, connected by hydrogen bonds between backbone C=O and N–H groups of adjacent strands.
Antiparallel β-Sheet
Adjacent strands run in opposite directions (N→C next to C→N). Hydrogen bonds are nearly perpendicular to the strand direction, forming a regular pattern. More stable than parallel sheets due to optimal H-bond geometry.
φ, ψ: −139°, +135°
Parallel β-Sheet
Adjacent strands run in the same direction. Hydrogen bonds are angled, making parallel sheets slightly less stable. Require at least ~5 strands to be stable. Common in α/β proteins (e.g., Rossmann fold).
φ, ψ: −119°, +113°
β-Sheet Features
- Twist: β-sheets are not flat but have a right-handed twist of ~15–20° per strand when viewed along the strand direction
- β-Bulge: A disruption where one residue in a strand does not form a hydrogen bond with its partner, creating a local bulge. Often found at edge strands.
- Side chain orientation: R groups alternate above and below the sheet plane
Turns, Loops, and Random Coil
β-Turns
Tight 180° reversals involving 4 residues. Stabilized by an H-bond from C=O of residue i to N–H of residue i+3. Type I: φ2=−60°, ψ2=−30°, φ3=−90°, ψ3=0°. Type II: φ2=−60°, ψ2=120°, φ3=80°, ψ3=0°. Glycine is favored at position 3 in Type II turns. Pro is common at position 2.
Ω-Loops
Longer loops (6–16 residues) that resemble the Greek letter Ω. Found on protein surfaces, often form parts of active sites or antigen-binding regions (CDR loops in antibodies). No regular H-bonding pattern.
Random Coil
Regions that lack regular secondary structure. Not truly "random" — they have defined conformations in folded proteins but do not repeat a regular pattern. Intrinsically disordered regions (IDRs) are genuinely dynamic and play roles in signaling.
Ramachandran Plot
The Ramachandran plot maps allowed (φ, ψ) angle combinations for a polypeptide backbone. Most combinations are sterically forbidden due to clashes between backbone atoms and Cβ.
Allowed Regions
- α-helix region: φ ≈ −57°, ψ ≈ −47°
- β-sheet region: φ ≈ −120° to −140°, ψ ≈ +110° to +135°
- Left-handed helix: φ ≈ +57°, ψ ≈ +47° (rare, less stable)
- Polyproline II: φ ≈ −75°, ψ ≈ +145°
Special Residues
- Glycine: No Cβ, so virtually all (φ, ψ) regions are accessible. The Ramachandran plot for Gly is much more permissive.
- Proline: Ring constrains φ to ~−63° ± 15°, severely restricting its Ramachandran space to a narrow vertical strip.
Derivation: Ramachandran Plot Constraints from Steric Clash Analysis
Starting from the hard-sphere model of atoms, we derive which backbone dihedral angles (φ, ψ) are sterically allowed.
Step 1: Define the steric exclusion criterion
Two non-bonded atoms overlap if their interatomic distance falls below the sum of their van der Waals radii. For atoms i and j:
$$d_{ij} < r_{\text{vdW},i} + r_{\text{vdW},j} \implies \text{steric clash (forbidden)}$$
Step 2: Identify the critical atom pairs
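For a general L-amino acid residue with a Cβ atom, the critical contacts are those whose distance changes with φ or ψ: pairs of atoms in the two peptide planes that flank Cα (for example carbonyl oxygen Oᵢ₋₁ with Oᵢ, or amide hydrogen Hᵢ with Oᵢ), and pairs between Cβ and backbone atoms of the adjacent peptide units. Atoms within one rigid peptide plane, such as Oᵢ and Hᵢ₊₁, keep a fixed separation and cannot be brought into contact by rotation. The key van der Waals radii are: C = 1.7 Å, N = 1.55 Å, O = 1.52 Å, H = 1.2 Å.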
$$d_{\text{min}}(\text{O} \cdots \text{H}) = r_{\text{O}} + r_{\text{H}} = 1.52 + 1.20 = 2.72\;\text{\AA}$$
Step 3: Express interatomic distances as functions of φ and ψ
Using the rigid peptide plane geometry (bond lengths: N–Cα = 1.47 Å, Cα–C = 1.53 Å, C=O = 1.24 Å, C–N = 1.33 Å; bond angles ~120° at C, ~110° at Cα), the distance between any two atoms separated by rotatable bonds is:
$$d_{ij}(\phi, \psi) = \left|\mathbf{r}_j(\phi, \psi) - \mathbf{r}_i\right| = f(\phi, \psi)$$
Step 4: Enumerate forbidden regions by scanning (φ, ψ) space
For each (φ, ψ) pair from −180° to +180°, compute all pairwise distances. A conformation is disallowed if any contact distance falls below the allowed minimum:
$$\text{Allowed}(\phi, \psi) = \begin{cases} 1 & \text{if } d_{ij}(\phi, \psi) \geq d_{\min} \;\;\forall\; i,j \\ 0 & \text{otherwise} \end{cases}$$
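The allowed/forbidden test in Step 4 reduces to a hard-sphere check over a list of contacts. The sketch below implements only that check, using the van der Waals radii quoted in Step 2. Computing the distances from (φ, ψ) requires building backbone coordinates, which is not attempted here, so the distances in the usage lines are hypothetical.

```python
VDW_RADIUS = {"C": 1.70, "N": 1.55, "O": 1.52, "H": 1.20}  # Å, from Step 2

def min_allowed_distance(elem_i, elem_j):
    """Sum of van der Waals radii for a non-bonded pair."""
    return VDW_RADIUS[elem_i] + VDW_RADIUS[elem_j]

def is_allowed(contacts):
    """contacts: iterable of (element_i, element_j, distance in Å) for one (phi, psi)."""
    return all(d >= min_allowed_distance(a, b) for a, b, d in contacts)

# Hypothetical contact lists for two conformations
print(is_allowed([("O", "H", 3.0), ("C", "O", 3.5)]))  # no pair closer than its radius sum
print(is_allowed([("O", "H", 2.5), ("C", "O", 3.5)]))  # O···H closer than 2.72 Å
```

A full scan would call `is_allowed` once per grid point in (φ, ψ) after regenerating the contact distances for that conformation.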
Step 5: Identify the allowed basins
The steric scan reveals only ~20% of (φ, ψ) space is accessible for L-amino acids with a Cβ. The allowed regions cluster into well-defined basins corresponding to known secondary structures:
$$\alpha\text{-helix:}\;\phi \approx -57°,\;\psi \approx -47° \qquad \beta\text{-sheet:}\;\phi \approx -120°,\;\psi \approx +130°$$
Step 6: Special cases — Glycine and Proline
Glycine (R = H) lacks Cβ, eliminating the dominant steric clash and allowing ~60% of (φ, ψ) space. Proline's pyrrolidine ring fixes φ ≈ −63° ± 15° by covalent constraint, restricting Ramachandran space to a narrow vertical strip. This analysis was first performed by Ramachandran, Ramakrishnan, and Sasisekharan (1963).
$$\text{Gly: } \sim 60\%\;\text{allowed} \qquad \text{Ala (general): } \sim 20\%\;\text{allowed} \qquad \text{Pro: } \phi \approx -63° \pm 15°$$
4. Tertiary Structure
Stabilizing Forces
Tertiary structure is the complete three-dimensional arrangement of all atoms in a single polypeptide chain. It is stabilized by a combination of non-covalent and covalent interactions.
Hydrophobic Core Packing
The dominant driving force of protein folding. Nonpolar side chains are buried in the protein interior, away from water. The hydrophobic effect arises primarily from the entropy gain of water molecules released from ordered cages around nonpolar groups. Packing efficiency in the core approaches that of crystalline organic solids (~0.75 packing fraction).
Derivation: Kauzmann's Hydrophobic Effect — Transfer Free Energy
Starting from experimental solubility data, we derive the free energy of transferring nonpolar groups from water to the protein interior.
Step 1: Define the transfer process
Consider transferring a nonpolar solute from water (w) to a nonpolar solvent (np), mimicking burial in the protein core:
$$\text{Solute}_{\text{(water)}} \rightarrow \text{Solute}_{\text{(nonpolar)}} \qquad \Delta G_{\text{transfer}} = \Delta H_{\text{tr}} - T\Delta S_{\text{tr}}$$
Step 2: Relate transfer free energy to solubility
The partition coefficient Kp between nonpolar solvent and water is related to the transfer free energy:
$$\Delta G_{\text{transfer}} = -RT \ln K_p = -RT \ln\frac{[\text{Solute}]_{\text{np}}}{[\text{Solute}]_{\text{water}}}$$
Step 3: Experimental values for amino acid side chains
Nozaki and Tanford (1971) measured ΔGtransfer for amino acid side chains from water to organic solvents (ethanol and dioxane). The values scale with the accessible surface area (ASA) buried:
$$\Delta G_{\text{transfer}} \approx -25 \;\text{cal/mol/\AA}^2 \times \Delta\text{ASA}$$
Step 4: Decompose into enthalpic and entropic contributions
Calorimetric measurements reveal the hydrophobic effect is primarily entropic at 25°C. Water molecules form ordered “iceberg” cages around nonpolar groups, losing entropy. Transfer releases these waters:
$$\Delta H_{\text{tr}} \approx 0 \;\text{(near 25°C)} \qquad \Delta S_{\text{tr}} > 0 \;\text{(water released)}$$
$$\therefore\;\Delta G_{\text{transfer}} \approx -T\Delta S_{\text{tr}} < 0 \;\text{(favorable)}$$
Step 5: Sum over all buried residues in a protein
For a typical 150-residue globular protein burying ~10,000 Å² of nonpolar surface area upon folding:
$$\Delta G_{\text{hydrophobic}} = \sum_i \Delta G_{\text{tr},i} \approx -25 \times 10{,}000 \;\text{cal/mol} \approx -250\;\text{kcal/mol}$$
This is the largest single contribution to protein stability, although it is largely offset by the conformational entropy penalty (∼+200 kcal/mol for the same protein), yielding a net stability of only 5–15 kcal/mol.
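The two relations used in this derivation, ΔG = −RT ln Kp and the area-proportional estimate, are easy to evaluate directly. The sketch below assumes R = 1.987 × 10⁻³ kcal/(mol·K) and a default temperature of 298.15 K; the function names are illustrative.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol·K)

def transfer_free_energy(K_p, T=298.15):
    """ΔG_transfer (kcal/mol) from a partition coefficient K_p = [np]/[water]."""
    return -R_KCAL * T * math.log(K_p)

def hydrophobic_free_energy(buried_asa, coeff=-25.0):
    """ΔG (kcal/mol) from buried nonpolar area in Å², with coeff in cal/mol/Å²."""
    return coeff * buried_asa / 1000.0

print(f"K_p = 100 -> {transfer_free_energy(100):.2f} kcal/mol")
print(f"10,000 Å² buried -> {hydrophobic_free_energy(10_000):.0f} kcal/mol")
```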
Hydrogen Bonds
Beyond backbone H-bonds in secondary structure, side chains form H-bonds with each other, backbone atoms, and water molecules. Typical strength: 2–5 kcal/mol. Their contribution to stability is partly offset by the loss of H-bonds to water in the unfolded state.
Salt Bridges (Ion Pairs)
Electrostatic attraction between oppositely charged side chains (e.g., Lys–Asp, Arg–Glu). Strength depends on the dielectric environment: ~1–5 kcal/mol in the protein interior (low dielectric) vs. ~0.5 kcal/mol on the surface (high dielectric, water screening).
Van der Waals Interactions
Individually weak (~0.1–1 kcal/mol), but their vast number (thousands) in a tightly packed core makes them collectively significant. Optimized by close packing of complementary surfaces. Follow a Lennard-Jones potential with attractive r⁻⁶ and repulsive r⁻¹² terms.
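The Lennard-Jones form mentioned above can be written out explicitly. In this sketch the well depth `EPSILON` and contact parameter `SIGMA` are hypothetical values chosen only to give a curve on the scale of the energies quoted in the text.

```python
EPSILON = 0.1  # kcal/mol, hypothetical well depth
SIGMA = 3.4    # Å, hypothetical separation at which the potential is zero

def lennard_jones(r, epsilon=EPSILON, sigma=SIGMA):
    """12-6 potential: repulsive r^-12 term plus attractive r^-6 term."""
    sr6 = (sigma / r) ** 6
    return 4 * epsilon * (sr6 ** 2 - sr6)

r_min = 2 ** (1 / 6) * SIGMA  # separation of the energy minimum
for r in (3.0, SIGMA, r_min, 5.0):
    print(f"r = {r:.2f} Å -> V = {lennard_jones(r):+.3f} kcal/mol")
```

The minimum lies at 2^(1/6)·σ with depth −ε, so close packing that places atoms near this separation captures the full attractive energy without incurring the steep repulsive wall.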
Disulfide Bonds (–S–S–)
Covalent bonds between cysteine residues. Common in extracellular proteins (oxidizing environment) but rare in the cytoplasm (reducing environment, maintained by glutathione and thioredoxin). Stabilize the folded state by ~2–5 kcal/mol per bond by reducing the conformational entropy of the unfolded state.
Structural Domains and Motifs
A domain is an independently folding unit, typically 50–300 residues, that often corresponds to a functional module. A motif (or supersecondary structure) is a recognizable combination of secondary structure elements.
Rossmann Fold
Alternating β-α-β units forming a parallel β-sheet. Binds dinucleotides (NAD⁺, FAD). Found in dehydrogenases.
Greek Key
Four antiparallel β-strands arranged in a pattern resembling the Greek key ornament. Common in β-barrel proteins.
β-Barrel
Closed β-sheet curved into a barrel. Examples: TIM barrel (8 parallel strands), OmpF porin (antiparallel), GFP.
Coiled-Coil
Two or more α-helices wound around each other with heptad repeat (abcdefg) — hydrophobic at positions a,d. Leucine zippers, tropomyosin, keratin.
Zinc Finger
Small domain (~30 residues) coordinating Zn²⁺ via Cys/His. C₂H₂ type: DNA-binding transcription factors. C₄ type: nuclear receptors.
Leucine Zipper
Coiled-coil dimerization motif with Leu at every 7th position. Often linked to basic DNA-binding region (bZIP). Examples: Fos/Jun, GCN4.
5. Quaternary Structure
Subunit Assembly and Symmetry
Quaternary structure refers to the arrangement of multiple polypeptide subunits (protomers) into a functional complex. Subunit interfaces are stabilized by the same forces as tertiary structure: hydrophobic packing, H-bonds, salt bridges, and sometimes disulfide bonds.
C2 Symmetry
Single 2-fold rotation axis. Common in homodimers. Example: HIV protease (C2 homodimer with active site at the interface).
D2 (Dihedral) Symmetry
Three perpendicular 2-fold axes. Tetramers like hemoglobin (α2β2) have pseudo-D2 symmetry (not true because α ≠ β).
Cubic Symmetry
Higher-order: tetrahedral (T, 12-mer), octahedral (O, 24-mer), icosahedral (I, 60-mer). Viral capsids often use icosahedral symmetry.
Cooperativity — Hemoglobin as a Paradigm
Hemoglobin (α2β2) is the classic example of cooperative ligand binding. O2 binding to one subunit increases the affinity of neighboring subunits through conformational changes that shift the equilibrium between the T state (tense, low affinity) and R state (relaxed, high affinity).
The Hill Equation
\( Y = \frac{[L]^{n_H}}{K_d + [L]^{n_H}} = \frac{(p\text{O}_2)^{n_H}}{(P_{50})^{n_H} + (p\text{O}_2)^{n_H}} \)
\( \log\frac{Y}{1-Y} = n_H \log[L] - \log K_d = n_H \log(p\text{O}_2) - n_H \log P_{50} \)
Where Y is fractional saturation, nH is the Hill coefficient (a measure of cooperativity), and P50 is the partial pressure at half-saturation. For hemoglobin, nH ≈ 2.8 (maximum possible = 4 for 4 binding sites). Myoglobin has nH = 1 (no cooperativity).
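To see what the Hill coefficient means physiologically, the following sketch compares oxygen unloading for a cooperative and a non-cooperative carrier. The hemoglobin value nH = 2.8 is from the text. The P50 values (26 torr for hemoglobin, 2.8 torr for myoglobin) and the lung and tissue oxygen pressures are typical textbook figures assumed here for illustration.

```python
def hill_saturation(pO2, p50, n_hill):
    """Fractional saturation Y from the Hill equation."""
    return pO2 ** n_hill / (p50 ** n_hill + pO2 ** n_hill)

P_LUNG, P_TISSUE = 100.0, 30.0  # torr, assumed values

for name, p50, n_hill in [("hemoglobin", 26.0, 2.8), ("myoglobin", 2.8, 1.0)]:
    y_lung = hill_saturation(P_LUNG, p50, n_hill)
    y_tissue = hill_saturation(P_TISSUE, p50, n_hill)
    print(f"{name}: Y(lung) = {y_lung:.2f}, Y(tissue) = {y_tissue:.2f}, "
          f"released = {y_lung - y_tissue:.2f}")
```

Under these assumptions the cooperative carrier releases a much larger fraction of its bound oxygen between lung and tissue, which is the functional payoff of a sigmoidal binding curve.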
Derivation: Hill Equation for Cooperative Binding
Starting from a simplified model where n ligand molecules bind simultaneously to a macromolecule.
Step 1: Write the all-or-none binding equilibrium
Assume all n binding sites are filled in a single concerted step (the Hill approximation):
$$P + nL \rightleftharpoons PL_n \qquad K_d = \frac{[P][L]^n}{[PL_n]}$$
Step 2: Define fractional saturation Y
The fractional saturation is the ratio of occupied binding sites to total sites:
$$Y = \frac{[PL_n]}{[P] + [PL_n]}$$
Step 3: Substitute the equilibrium expression
From the equilibrium: \( [PL_n] = [P][L]^n / K_d \). Substituting:
$$Y = \frac{[P][L]^n / K_d}{[P] + [P][L]^n / K_d} = \frac{[L]^n / K_d}{1 + [L]^n / K_d}$$
Step 4: Simplify to the Hill equation
Multiply numerator and denominator by \( K_d \) and write \( K_d = (K_{0.5})^n \), where \( K_{0.5} \) is the ligand concentration at half-saturation:
$$Y = \frac{[L]^n}{K_d + [L]^n} = \frac{[L]^n}{(K_{0.5})^n + [L]^n}$$
Step 5: Derive the Hill plot (linearized form)
Take the ratio Y/(1−Y) and apply logarithms:
$$\frac{Y}{1-Y} = \frac{[L]^n}{K_d} \quad \Longrightarrow \quad \log\frac{Y}{1-Y} = n\log[L] - \log K_d$$
A plot of log(Y/(1−Y)) vs log[L] gives a straight line with slope nH (the Hill coefficient). For hemoglobin, nH ≈ 2.8 indicates strong positive cooperativity (4 sites, but binding is not perfectly concerted).
Step 6: Verify half-saturation condition
When Y = 0.5, confirm that [L] = K0.5 (= P50 for O2 binding):
$$0.5 = \frac{[L]^n}{K_{0.5}^n + [L]^n} \implies K_{0.5}^n = [L]^n \implies [L] = K_{0.5} \;\checkmark$$
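The linearized form from Step 5 can be checked numerically: generating saturation values from the Hill equation and fitting log(Y/(1−Y)) against log[L] should return the input exponent as the slope. This sketch uses synthetic, noise-free data with n = 2.8 and an assumed K0.5 of 26 (arbitrary concentration units), plus a small least-squares helper written here for self-containment.

```python
import math

def hill_saturation(L, K_half, n):
    return L ** n / (K_half ** n + L ** n)

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

N_TRUE, K_HALF = 2.8, 26.0  # synthetic inputs
ligand = [5.0, 10.0, 20.0, 40.0, 80.0]
log_L = [math.log10(L) for L in ligand]
log_ratio = []
for L in ligand:
    Y = hill_saturation(L, K_HALF, N_TRUE)
    log_ratio.append(math.log10(Y / (1 - Y)))

slope, intercept = fit_line(log_L, log_ratio)
print(f"fitted slope = {slope:.3f}")
print(f"fitted intercept = {intercept:.3f}, expected -n*log10(K_half) = "
      f"{-N_TRUE * math.log10(K_HALF):.3f}")
```

Real hemoglobin data are only approximately linear in the middle of the saturation range, so an experimental Hill plot is usually fitted near Y = 0.5.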
T ↔ R Transition
In the T state, a salt bridge between His HC3(β146) and Asp FG1(β94) constrains the structure. O2 binding to Fe2+ pulls the iron into the porphyrin plane, shifting helix F, breaking the salt bridge, and destabilizing the T state. BPG (2,3-bisphosphoglycerate) binds in the central cavity of the T state, stabilizing it and lowering O2 affinity (right-shifting the binding curve).
Bohr Effect
Low pH and high CO2 promote O2 release by stabilizing the T state. CO2 binds to N-terminal amino groups forming carbamate, and H+ protonates His146, strengthening the salt bridge. This facilitates O2 delivery to metabolically active tissues (low pH, high CO2).
MWC Concerted Model
Monod, Wyman & Changeux (1965). All subunits switch states simultaneously. The protein exists in equilibrium between T and R states (characterized by the allosteric constant L = [T0]/[R0]). Ligand binds preferentially to R, shifting the equilibrium.
\( Y = \frac{L c \alpha (1+c\alpha)^{n-1} + \alpha(1+\alpha)^{n-1}}{L(1+c\alpha)^n + (1+\alpha)^n} \)
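where \( L = [T_0]/[R_0] \), \( c = K_R/K_T \), and \( \alpha = [S]/K_R \); \( K_R \) and \( K_T \) are the ligand dissociation constants of the R and T states.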
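The MWC saturation function above is straightforward to evaluate. In the sketch below the parameter values (L = 1000, c = 0.01, n = 4) are hypothetical and chosen only to produce a visibly sigmoidal curve; they are not fitted hemoglobin constants.

```python
def mwc_saturation(alpha, L, c, n):
    """Fractional saturation in the Monod-Wyman-Changeux model."""
    numerator = (L * c * alpha * (1 + c * alpha) ** (n - 1)
                 + alpha * (1 + alpha) ** (n - 1))
    denominator = L * (1 + c * alpha) ** n + (1 + alpha) ** n
    return numerator / denominator

for alpha in (0.5, 1.0, 2.0, 5.0, 10.0, 20.0):
    y_coop = mwc_saturation(alpha, L=1000.0, c=0.01, n=4)
    y_simple = alpha / (1 + alpha)  # non-cooperative reference
    print(f"alpha = {alpha:5.1f}: MWC Y = {y_coop:.3f}, hyperbolic Y = {y_simple:.3f}")
```

Two limiting cases are a useful check on the formula: with c = 1 (no preference between states) or L = 0 (no T state) it collapses to the simple hyperbola α/(1+α).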
KNF Sequential Model
Koshland, Némethy & Filmer (1966). Induced fit: each subunit changes conformation individually upon ligand binding, and this conformational change affects neighboring subunits. Does not require pre-existing T/R equilibrium. Can explain negative cooperativity (unlike MWC), as seen in aspartate transcarbamoylase with CTP inhibition.
6. Protein Folding
Thermodynamic Principles
Anfinsen's Thermodynamic Hypothesis
Christian Anfinsen demonstrated (1961, Nobel Prize 1972) that denatured ribonuclease A could refold spontaneously to its active conformation. This proved the native structure represents the global free energy minimum of the polypeptide chain — all information needed for folding is contained in the amino acid sequence.
Levinthal's Paradox
A 100-residue protein with 3 possible conformations per residue would need to sample 3¹⁰⁰ ≈ 5 × 10⁴⁷ states. At 10¹³ transitions/second, this would take ~10²⁷ years, far exceeding the age of the universe. Yet proteins fold in milliseconds to seconds. This paradox demonstrates that folding cannot be a random search but must follow directed pathways.
Derivation: Levinthal's Paradox — Random Search Folding Time
Starting from the assumption that protein folding proceeds by random conformational search, we estimate the time required and show it is astronomically large.
Step 1: Count conformational degrees of freedom
Each residue has two backbone dihedral angles (φ, ψ). Assume each angle can adopt approximately 3 discrete states (gauche+, gauche−, trans). For a protein of N residues:
$$\Omega = 3^{2N} = 9^N \approx 10^{0.95N}$$
Step 2: Estimate the conformational sampling rate
Bond rotations occur on the sub-picosecond to picosecond timescale. Each new conformation is sampled in approximately:
$$\tau_{\text{step}} \approx 10^{-13}\;\text{s (100 fs per rotational isomerization)}$$
Step 3: Calculate total search time
If the protein must sample all conformations to find the native state by random search:
$$t_{\text{fold}} = \Omega \times \tau_{\text{step}} = 10^{0.95N} \times 10^{-13}\;\text{s} \approx 10^{(0.95N - 13)}\;\text{s}$$
Step 4: Apply to a small protein (N = 100 residues)
For a typical 100-residue protein:
$$t_{\text{fold}} = 10^{(0.95 \times 100 - 13)} = 10^{82}\;\text{s}$$
Compare: the age of the universe is only ~\( 4.3 \times 10^{17} \) s ≈ \( 10^{17.6} \) s. Random search would take ~\( 10^{64} \) times the age of the universe!
Step 5: The general formula
A cruder rule of thumb rounds the count up to ~10 conformations per residue and assumes a slower sampling step (~10⁻⁸ s), which gives a different exponent but the same conclusion:
$$t_{\text{random}} \sim 10^{(N - 8)}\;\text{s} \quad \text{(simplified approximation)}$$
Step 6: Resolution — the folding funnel
Since real proteins fold in 10⁻³ to 10⁰ s, folding cannot be a random search. The resolution is that the energy landscape is funnel-shaped: local interactions rapidly form, progressively constraining the search space. At each step, the number of accessible conformations decreases dramatically:
$$t_{\text{actual}} \sim N / k_{\text{local}} \sim 10^{-6}\;\text{to}\;10^{0}\;\text{s} \ll 10^{82}\;\text{s}$$
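The estimate in Steps 1 to 4 can be reproduced in a few lines by working with base-10 logarithms, which avoids handling astronomically large numbers directly. The defaults (3 states per angle, 2 angles per residue, 10⁻¹³ s per step) follow the derivation; the function name is illustrative. Because the exact value of log₁₀ 9 is 0.954 rather than the rounded 0.95, the exponents printed here come out slightly larger than those in the text.

```python
import math

AGE_OF_UNIVERSE_S = 4.3e17  # seconds, value quoted in Step 4

def log10_search_time(n_residues, states_per_angle=3, tau_step=1e-13):
    """log10 of the exhaustive-search time in seconds."""
    log10_conformations = 2 * n_residues * math.log10(states_per_angle)
    return log10_conformations + math.log10(tau_step)

for n in (50, 100, 150):
    exponent = log10_search_time(n)
    ratio = exponent - math.log10(AGE_OF_UNIVERSE_S)
    print(f"N = {n}: t ~ 10^{exponent:.1f} s, about 10^{ratio:.1f} universe ages")
```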
Gibbs Free Energy of Folding
\( \Delta G_{\text{fold}} = \Delta H_{\text{fold}} - T\Delta S_{\text{fold}} \)
\( \Delta G_{\text{fold}} = G_{\text{native}} - G_{\text{unfolded}} \approx -5 \text{ to } -15 \text{ kcal/mol} \)
The net stability of a folded protein is surprisingly marginal: only 5–15 kcal/mol, which is the small difference between large opposing terms. Folding is enthalpically driven (H-bonds, van der Waals, electrostatics) but entropically opposed (loss of chain conformational entropy). The hydrophobic effect provides a favorable entropy term (release of ordered water).
Derivation: Protein Stability — Decomposition of ΔGfolding
Starting from the individual thermodynamic contributions, we derive the net free energy of protein folding as a sum of competing terms.
Step 1: Identify all contributing free energy terms
The total free energy of folding is the sum of all interactions gained and entropy lost upon transitioning from unfolded (U) to native (N) state:
$$\Delta G_{\text{fold}} = G_N - G_U = \Delta G_{\text{H-bond}} + \Delta G_{\text{hydrophobic}} + \Delta G_{\text{conf.entropy}} + \Delta G_{\text{vdW}} + \Delta G_{\text{electrostatic}}$$
Step 2: Hydrogen bond contribution
Each intramolecular H-bond in the folded state replaces an H-bond to water in the unfolded state. The net contribution per H-bond is small but collectively significant. For ~200 backbone H-bonds in a 150-residue protein:
$$\Delta G_{\text{H-bond}} \approx n_{\text{H-bond}} \times (\Delta G_{\text{intra}} - \Delta G_{\text{water}}) \approx 200 \times (-0.5) = -100\;\text{kcal/mol}$$
Step 3: Hydrophobic contribution (favorable)
Burial of nonpolar surface area drives folding through the hydrophobic effect. Using the Kauzmann transfer free energy:
$$\Delta G_{\text{hydrophobic}} \approx -25\;\text{cal/mol/\AA}^2 \times \Delta\text{ASA}_{\text{nonpolar}} \approx -250\;\text{kcal/mol}$$
Step 4: Conformational entropy penalty (unfavorable)
Each residue loses ~4.3 cal/mol/K of conformational entropy upon folding (from ~9 backbone states to 1). For N residues at T = 300 K:
$$\Delta G_{\text{conf.entropy}} = -T\Delta S_{\text{conf}} \approx N \times T \times 4.3\;\text{cal/mol/K} \approx 150 \times 300 \times 4.3 \approx +194\;\text{kcal/mol}$$
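The per-residue entropy figure follows from Boltzmann's relation ΔS = R ln(Ω_unfolded/Ω_folded) with 9 states collapsing to 1. The short sketch below evaluates that and the resulting penalty for the 150-residue example; R = 1.987 cal/(mol·K) is assumed and the function names are illustrative.

```python
import math

R_CAL = 1.987  # gas constant in cal/(mol·K)

def entropy_loss_per_residue(states_unfolded=9, states_folded=1):
    """Conformational entropy lost per residue, in cal/(mol·K)."""
    return R_CAL * math.log(states_unfolded / states_folded)

def conformational_penalty_kcal(n_residues, T=300.0, dS=4.3):
    """-TΔS_conf in kcal/mol for n_residues, with dS in cal/(mol·K)."""
    return n_residues * T * dS / 1000.0

print(f"R ln 9 = {entropy_loss_per_residue():.2f} cal/(mol·K)")
print(f"penalty for 150 residues at 300 K = {conformational_penalty_kcal(150):.1f} kcal/mol")
```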
Step 5: Van der Waals packing contribution
Tight packing in the core provides favorable van der Waals contacts. Each atom contributes ~0.03 kcal/mol, and a 150-residue protein has ~1,000 atoms in the core:
$$\Delta G_{\text{vdW}} \approx -0.03 \times n_{\text{contacts}} \approx -0.03 \times 1{,}000 = -30\;\text{kcal/mol}$$
Step 6: Net stability is marginal
Summing all terms reveals that protein stability is a small difference between large opposing numbers:
$$\Delta G_{\text{fold}} \approx (-100) + (-250) + (+194) + (-30) + (+176) \approx -10\;\text{kcal/mol}$$
The ∼+176 kcal/mol lumps together the remaining unfavorable terms (loss of solvation entropy for polar groups, backbone strain, etc.), taken here as the residual needed to match the observed net stability. This marginal stability (−5 to −15 kcal/mol) is biologically crucial: it allows proteins to be conformationally dynamic, to be regulated by post-translational modifications, and to be degraded when no longer needed.
Energy Landscape and Folding Pathways
Folding Funnel Model
Modern view: the energy landscape is a rugged funnel. The unfolded state sits at the top (high energy, high entropy, many conformations). As the protein folds, it descends the funnel toward the native state at the bottom (low energy, low entropy, single conformation). The funnel shape ensures that many different starting conformations converge on the native state through multiple parallel pathways.
Local minima on the funnel surface correspond to kinetic traps — misfolded intermediates that must overcome energy barriers to continue folding.
Molten Globule Intermediate
An early folding intermediate with native-like secondary structure and overall compactness but without the tight packing and fixed tertiary contacts of the native state. It has a hydrophobic core but with "liquid-like" interior packing. Observable experimentally by circular dichroism (secondary structure present), ANS fluorescence (exposed hydrophobic patches), and hydrodynamic radius (compact but expanded relative to native).
Molecular Chaperones
While Anfinsen showed small proteins can refold spontaneously in vitro, many proteins require assistance in the crowded cellular environment (~300 mg/mL total protein). Chaperones prevent aggregation and provide protected folding environments.
Hsp70 System (DnaK/DnaJ/GrpE in bacteria)
Mechanism: Hsp70 binds exposed hydrophobic segments of nascent or misfolded proteins via its substrate-binding domain. The ATP cycle controls substrate affinity: ATP-bound state has an open lid with fast on/off rates; DnaJ (Hsp40 co-chaperone) stimulates ATP hydrolysis, trapping the substrate under a closed lid; GrpE (a nucleotide exchange factor) promotes ADP release, and rebinding of ATP reopens the lid and releases the substrate. Multiple bind-release cycles give the substrate opportunities to fold. Prevents aggregation of ~15–20% of newly synthesized E. coli proteins.
GroEL/GroES System (Hsp60/Hsp10 — the Anfinsen Cage)
Structure: GroEL is a double-ring barrel, each ring with 7 subunits forming a central cavity. GroES is a dome-shaped heptameric lid. Mechanism: Misfolded protein (<60 kDa) binds to hydrophobic residues lining the GroEL cavity (cis ring). ATP and GroES binding trigger a massive conformational change — the cavity doubles in volume and switches from hydrophobic to hydrophilic walls. The encapsulated protein folds in isolation for ~10 seconds (the time of ATP hydrolysis). Then GroES and substrate are released. If not yet folded, the protein can rebind for another cycle. Essential for ~10% of E. coli proteins, notably TIM barrel proteins.
Hsp90
A homodimeric chaperone that acts late in the folding pathway, stabilizing near-native conformations of specific clients: steroid hormone receptors, kinases, p53 tumor suppressor. Requires co-chaperones (Hop, p23, Cdc37). The Hsp90 ATPase cycle is slow and regulated. Target of anti-cancer drug geldanamycin, which blocks the ATP-binding pocket.
Protein Misfolding Diseases
When proteins fail to fold correctly, they can aggregate into toxic structures. Many neurodegenerative diseases are associated with the accumulation of misfolded protein aggregates.
Amyloid Fibrils
Cross-β structure: β-strands perpendicular to the fibril axis, forming a continuous hydrogen-bonded sheet. Extremely stable (resistant to proteases, detergents). Detected by Congo red birefringence and ThT fluorescence. The amyloid fold is a generic property of polypeptide chains — many proteins can form amyloids under appropriate conditions.
Prion Diseases
Caused by PrPC (normal, α-helix rich) converting to PrPSc (misfolded, β-sheet rich). PrPSc acts as a template, catalyzing the conversion of PrPC — a self-propagating conformational change. Diseases: Creutzfeldt-Jakob (CJD), bovine spongiform encephalopathy (BSE), kuru, scrapie. Transmissible without nucleic acid (protein-only hypothesis, Stanley Prusiner, Nobel 1997).
Alzheimer's Disease
Accumulation of amyloid-β (Aβ) peptide (40–42 residues, cleaved from APP by β- and γ-secretases) into extracellular plaques, and intracellular neurofibrillary tangles of hyperphosphorylated tau protein. Aβ42 is more aggregation-prone than Aβ40. Oligomeric intermediates may be more toxic than mature fibrils.
Parkinson's Disease
Aggregation of α-synuclein into Lewy bodies in dopaminergic neurons of the substantia nigra. α-Synuclein is intrinsically disordered in its native state and adopts β-sheet structure upon aggregation. Point mutations (A53T, A30P, E46K) increase aggregation propensity. Propagation between neurons suggests prion-like behavior.
7. Enzyme Catalysis
Principles of Enzymatic Catalysis
Enzymes accelerate reactions by factors of 10⁶–10¹⁷ without altering the equilibrium. They achieve this by lowering the activation energy (ΔG‡) through stabilization of the transition state.
\( k = \frac{k_B T}{h} e^{-\Delta G^\ddagger / RT} \)
Eyring equation: rate constant depends exponentially on activation energy
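To make the exponential dependence concrete, the following minimal sketch evaluates the Eyring equation and converts a rate enhancement into the equivalent reduction in ΔG‡. The temperature of 298.15 K and the function names `eyring_rate` and `barrier_reduction` are illustrative assumptions, not part of the chapter.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J*s
R = 8.314462618      # gas constant, J/(mol*K)


def eyring_rate(dg_act_kj_mol, temp_k=298.15):
    """Rate constant (s^-1) from the Eyring equation for a barrier given in kJ/mol."""
    return (K_B * temp_k / H) * math.exp(-dg_act_kj_mol * 1000.0 / (R * temp_k))


def barrier_reduction(rate_enhancement, temp_k=298.15):
    """Drop in activation free energy (kJ/mol) that yields the given rate enhancement."""
    return R * temp_k * math.log(rate_enhancement) / 1000.0


if __name__ == "__main__":
    # The two ends of the enhancement range quoted in the text.
    for factor in (1e6, 1e17):
        print(f"{factor:.0e}-fold faster: barrier lowered by "
              f"{barrier_reduction(factor):.1f} kJ/mol")
```

Under these assumptions, a 10⁶-fold enhancement corresponds to lowering the barrier by about 34 kJ/mol, and a 10¹⁷-fold enhancement to about 97 kJ/mol.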
Transition State Stabilization
Enzymes bind the transition state more tightly than the substrate or product (Pauling's postulate). The enzyme active site is complementary to the transition state geometry, not the ground state. This is why transition state analogs are potent inhibitors (e.g., phosphonamidate for carboxypeptidase A).
Proximity & Orientation (Propinquity)
Binding brings reactive groups into close proximity in the correct orientation, converting an intermolecular reaction (entropically unfavorable) to an effectively intramolecular one. Estimated to contribute 10²–10⁵-fold rate enhancement. The effective molarity of reactants in the active site can exceed 10 M.
General Acid-Base Catalysis
Amino acid side chains donate or accept protons during the reaction, stabilizing charged transition states. His (pKa ~ 6) is the ideal proton shuttle at physiological pH. Glu, Asp, Lys, Cys, and Tyr also participate. Water molecules in the active site can serve as proton relays.
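The reason a pKa near 6 suits a proton shuttle can be illustrated with the Henderson-Hasselbalch relation. The sketch below computes the protonated fraction of a side chain at a given pH. The His value (6.0) comes from the text; the Asp and Lys values are assumed typical free-side-chain pKa values, and pKa values inside a real active site are often shifted.

```python
def fraction_protonated(ph, pka):
    """Fraction of an ionizable group in its protonated form (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# His pKa is taken from the text; Asp and Lys are assumed approximate values.
SIDE_CHAIN_PKA = {"His": 6.0, "Asp": 3.9, "Lys": 10.5}

if __name__ == "__main__":
    for residue, pka in SIDE_CHAIN_PKA.items():
        frac = fraction_protonated(7.4, pka)
        print(f"{residue}: {100 * frac:.2f}% protonated at pH 7.4")
```

With these values, His is about 4% protonated at pH 7.4, so both forms are populated and the residue can either donate or accept a proton. Asp is almost entirely deprotonated and Lys almost entirely protonated, which makes them less versatile unless the protein environment shifts their pKa.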
Covalent Catalysis
A nucleophilic group on the enzyme attacks the substrate, forming a transient covalent enzyme-substrate intermediate. The intermediate then breaks down to release product. Examples: Ser in serine proteases, Cys in cysteine proteases, Lys forming Schiff bases (aldolase, transaminases), His in phosphotransferases.
Metal Ion Catalysis
Metal ions (Zn²⁺, Mg²⁺, Mn²⁺, Fe²⁺/Fe³⁺) can stabilize negative charges on intermediates (Lewis acid catalysis), generate nucleophiles by lowering the pKa of bound water (Zn²⁺-bound hydroxide in carbonic anhydrase), participate in redox reactions, or orient substrates. ~30% of all enzymes require metal ions.
Electrostatic Catalysis
The low dielectric environment of the active site amplifies electrostatic interactions. Oxyanion holes stabilize developing negative charges on tetrahedral intermediates using backbone N–H groups (serine proteases) or positively charged residues. Helix dipoles can stabilize charged intermediates.
Serine Protease Mechanism — The Catalytic Triad
Serine proteases (chymotrypsin, trypsin, elastase, subtilisin) use a conserved catalytic triad of Ser-His-Asp to cleave peptide bonds. This is one of the best-studied enzyme mechanisms in biochemistry.
Step 1: Nucleophilic Attack (Acylation)
Asp102 orients and polarizes His57 through a hydrogen bond (proposed, though still debated, to be a low-barrier hydrogen bond). His57 acts as a general base, abstracting a proton from Ser195's hydroxyl, making it a potent nucleophile. The activated Ser195 attacks the carbonyl carbon of the scissile peptide bond, forming a tetrahedral intermediate. The developing negative charge on the carbonyl oxygen is stabilized by the oxyanion hole (backbone N–H of Gly193 and Ser195).
Step 2: Collapse & Amine Leaving
The tetrahedral intermediate collapses: His57 donates a proton (general acid catalysis) to the leaving group amine nitrogen. The C-terminal fragment departs. An acyl-enzyme intermediate remains (ester bond between Ser195 and the N-terminal fragment of the substrate).
Step 3: Deacylation
Water enters the active site and is activated by His57 (general base). The resulting hydroxide attacks the acyl-enzyme ester, forming a second tetrahedral intermediate (again stabilized by the oxyanion hole). This intermediate collapses, releasing the N-terminal product and regenerating free Ser195. The enzyme returns to its resting state.
Substrate Specificity
Determined by the S1 binding pocket: Chymotrypsin has a large hydrophobic pocket (cleaves after Phe, Trp, Tyr). Trypsin has Asp189 at the base of S1 (cleaves after Lys, Arg — positive charges). Elastase has Val/Thr partially blocking S1 (cleaves after small residues: Ala, Gly, Ser). Same mechanism, different specificity — a beautiful example of evolutionary divergence.
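The S1 rules above can be turned into a simple in-silico digest. This is a minimal sketch that considers only the residue preceding the scissile bond (P1). It ignores effects such as a following proline, extended subsite preferences, and kinetics, and the peptide sequence is a hypothetical test input.

```python
# P1 residues accepted by each protease, taken from the specificity rules above.
P1_PREFERENCES = {
    "chymotrypsin": set("FWY"),
    "trypsin": set("KR"),
    "elastase": set("AGS"),
}


def cleavage_sites(sequence, protease):
    """Return 1-based positions of P1 residues; cleavage occurs after each one."""
    allowed = P1_PREFERENCES[protease]
    # The last residue has no peptide bond after it, so it is skipped.
    return [i + 1 for i, aa in enumerate(sequence[:-1]) if aa in allowed]


def digest(sequence, protease):
    """Split a one-letter sequence into fragments from a complete digestion."""
    fragments, start = [], 0
    for pos in cleavage_sites(sequence, protease):
        fragments.append(sequence[start:pos])
        start = pos
    fragments.append(sequence[start:])
    return fragments


if __name__ == "__main__":
    peptide = "MKWVTFISLLR"  # hypothetical test peptide
    for name in P1_PREFERENCES:
        print(name, digest(peptide, name))
```

For this peptide the three enzymes produce different fragment sets from the same sequence, which is the practical consequence of sharing a mechanism while differing in the S1 pocket.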
Python: Ramachandran Plot & Cooperative Oxygen Binding
Ramachandran Plot: Allowed Backbone Conformations
Python: Compute steric clash map and plot allowed phi/psi regions for general, glycine, and proline residues
Cooperative Oxygen Binding: Hill Equation & T/R States
Python: Simulate hemoglobin vs myoglobin O2 binding curves with Hill equation and MWC model
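The page's original simulation code is not reproduced here. As a stand-in, the following minimal sketch covers only the Hill-equation part of the comparison and leaves out the MWC model. It writes the Hill equation in terms of oxygen partial pressure, Y = pO2^n / (P50^n + pO2^n), which matches the chapter's form with Kd = P50^n. The P50 and Hill coefficient values are assumed typical textbook figures, not values taken from this chapter.

```python
def hill_saturation(p_o2, p50, n_hill):
    """Fractional saturation Y = pO2^n / (P50^n + pO2^n); here Kd = P50^n."""
    return p_o2 ** n_hill / (p50 ** n_hill + p_o2 ** n_hill)


# Assumed typical values (pressures in torr); real values depend on conditions.
MYOGLOBIN = {"p50": 2.8, "n_hill": 1.0}
HEMOGLOBIN = {"p50": 26.0, "n_hill": 2.8}

if __name__ == "__main__":
    print("pO2(torr)  Y_Mb   Y_Hb")
    for p in (5, 20, 40, 100):
        y_mb = hill_saturation(p, **MYOGLOBIN)
        y_hb = hill_saturation(p, **HEMOGLOBIN)
        print(f"{p:9d}  {y_mb:.2f}   {y_hb:.2f}")
```

With these parameters, at 20 torr myoglobin is close to 90% saturated while hemoglobin is only about a third saturated. Near 100 torr both are almost fully loaded. The steep, sigmoidal hemoglobin curve between those pressures reflects cooperativity (n > 1), whereas myoglobin's curve is hyperbolic (n = 1).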
Fortran: Amino Acid pI Calculator
This Fortran program computes the isoelectric point (pI) and net charge at physiological pH for a given protein sequence, using the Henderson-Hasselbalch equation with standard pKa values for all ionizable groups.
Fortran: Protein Isoelectric Point (pI) Calculator
Fortran: Calculate pI from pKa values using bisection method for a given protein sequence
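The Fortran source is not reproduced here. As an illustration of the same approach (Henderson-Hasselbalch charges combined with bisection on pH), here is a minimal Python sketch. It is not the chapter's program. All pKa values are assumed approximations, since published tables differ by several tenths of a unit. The sketch treats every Cys as free (not disulfide-bonded) and ignores the effect of the local environment on pKa.

```python
# Assumed approximate pKa values; published tables differ between sources.
PKA_POSITIVE = {"K": 10.5, "R": 12.5, "H": 6.0}            # +1 when protonated
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}   # -1 when deprotonated
PKA_N_TERM, PKA_C_TERM = 8.0, 3.1


def net_charge(sequence, ph):
    """Net charge of a one-letter sequence at the given pH."""
    charge = 1.0 / (1.0 + 10.0 ** (ph - PKA_N_TERM))    # N-terminal amino group
    charge -= 1.0 / (1.0 + 10.0 ** (PKA_C_TERM - ph))   # C-terminal carboxyl group
    for aa in sequence:
        if aa in PKA_POSITIVE:
            charge += 1.0 / (1.0 + 10.0 ** (ph - PKA_POSITIVE[aa]))
        elif aa in PKA_NEGATIVE:
            charge -= 1.0 / (1.0 + 10.0 ** (PKA_NEGATIVE[aa] - ph))
    return charge


def isoelectric_point(sequence, tol=1e-4):
    """Find the pH of zero net charge by bisection on the interval [0, 14]."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        # Net charge falls monotonically with pH, so a positive charge
        # means the pI lies above the midpoint.
        if net_charge(sequence, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    seq = "MKWVTFISLLR"  # hypothetical test sequence
    print(f"pI = {isoelectric_point(seq):.2f}")
    print(f"net charge at pH 7.4 = {net_charge(seq, 7.4):+.2f}")
```

Bisection works here because the net charge decreases monotonically as pH rises, so there is exactly one zero crossing between strongly acidic and strongly basic conditions.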
Chapter Summary
Structural Hierarchy
- Primary: Amino acid sequence (N→C), peptide bonds with partial double-bond character
- Secondary: α-helix (3.6 res/turn, i→i+4 H-bonds), β-sheet (parallel/antiparallel), turns, loops
- Tertiary: Hydrophobic core, disulfides, salt bridges, H-bonds, van der Waals; domains and motifs
- Quaternary: Multi-subunit assembly, symmetry, cooperativity
Key Equations
- Henderson-Hasselbalch: pH = pKa + log([A−]/[HA])
- Hill equation: \( Y = \frac{[L]^{n_H}}{K_d + [L]^{n_H}} \)
- Gibbs free energy: ΔG = ΔH − TΔS
- Eyring equation: k = (kBT/h) exp(−ΔG‡/RT)
Folding
- Anfinsen: sequence determines structure (thermodynamic control)
- Levinthal: folding is a directed process, not random search
- Funnel landscape: rugged funnel converging on native state
- Chaperones: Hsp70 (prevent aggregation), GroEL/GroES (Anfinsen cage), Hsp90 (late-stage)
- Misfolding: amyloid, prions (PrPC→PrPSc), Alzheimer's Aβ, Parkinson's α-synuclein
Enzyme Catalysis
- Transition state stabilization (Pauling's postulate)
- Proximity/orientation, acid-base, covalent, metal ion catalysis
- Serine protease triad: Ser-His-Asp (nucleophilic acylation/deacylation)
- Substrate specificity via binding pocket geometry (chymotrypsin vs trypsin vs elastase)