Part 7: Protein Structure and Function

The Workhorses of the Cell

Proteins are the most versatile macromolecules in living systems. They catalyze reactions, provide structural support, transport molecules, transmit signals, and defend against pathogens. Their function is determined by their three-dimensional structure, which is in turn dictated by the linear sequence of amino acids encoded in the genome.

This chapter covers the full hierarchy of protein structure from individual amino acids through quaternary assemblies, the physical chemistry of protein folding, and the principles of enzyme catalysis.

1. Amino Acids — The Building Blocks

General Structure

All 20 standard amino acids share a common backbone: a central α-carbon bonded to an amino group (–NH₃⁺), a carboxyl group (–COO⁻), a hydrogen atom, and a variable side chain (R group). At physiological pH (~7.4), amino acids exist as zwitterions — the amino group is protonated and the carboxyl group is deprotonated simultaneously.

$ \text{H}_3\text{N}^+ - \text{C}_\alpha\text{HR} - \text{COO}^- $

All standard amino acids except glycine have a chiral α-carbon. Biological proteins use exclusively the L-enantiomer.

The 20 Standard Amino Acids by Properties

Nonpolar (Hydrophobic) — 9 residues

Residue	Side chain
Glycine	H side chain, achiral
Alanine	Methyl group
Valine	Branched chain
Leucine	Branched chain
Isoleucine	2 chiral centers
Proline	Cyclic imino acid
Phenylalanine	Benzyl group
Tryptophan	Indole ring
Methionine	Thioether

Nonpolar residues are typically buried in the protein interior, driving folding via the hydrophobic effect. Proline is unique: its side chain cyclizes back to the backbone nitrogen, restricting φ to ~−60° and introducing a rigid kink. Glycine, with only a hydrogen as its R group, is the most conformationally flexible residue.

Polar Uncharged — 6 residues

Residue	Side chain
Serine	Hydroxyl (OH)
Threonine	Hydroxyl; 2 chiral centers
Cysteine	Sulfhydryl (SH)
Tyrosine	Phenol OH
Asparagine	Amide
Glutamine	Amide

Cysteine is particularly important — two cysteine residues can form a disulfide bond (–S–S–) under oxidizing conditions. Serine and threonine are common phosphorylation sites.

Positively Charged (Basic) — 3 residues

Residue	Side chain
Lysine	pK_a ~10.5; ε-amino group, protonated (+) at physiological pH
Arginine	pK_a ~12.5; guanidinium
Histidine	pK_a ~6.0; imidazole

Histidine is uniquely important: its imidazole side chain has a pK_a near physiological pH (~6.0), making it an excellent proton shuttle in enzyme active sites (e.g., the catalytic triad of serine proteases).

Negatively Charged (Acidic) — 2 residues

Residue	Side-chain pK_a
Aspartate	~3.65
Glutamate	~4.25

Deprotonated at physiological pH, these residues carry a net negative charge and often participate in salt bridges and metal ion coordination.

Acid-Base Chemistry of Amino Acids

Key pK_a Values

Group	pK_a	Conjugate Acid / Base
α-COOH	~2.2	–COOH / –COO⁻
α-NH₃⁺	~9.4	–NH₃⁺ / –NH₂
Asp (R)	3.65	–COOH / –COO⁻
Glu (R)	4.25	–COOH / –COO⁻
His (R)	6.00	ImH⁺ / Im
Cys (R)	8.18	–SH / –S⁻
Tyr (R)	10.07	–OH / –O⁻
Lys (R)	10.53	–NH₃⁺ / –NH₂
Arg (R)	12.48	Guanidinium⁺ / Guanidine

Henderson-Hasselbalch Equation

$ \text{pH} = \text{p}K_a + \log\frac{[\text{A}^-]}{[\text{HA}]} $

When pH = pK_a, exactly half the molecules are protonated. This equation is essential for predicting the charge state of amino acid side chains at any given pH.
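Rearranging the equation gives the protonated fraction of a single group directly: [A⁻]/[HA] = 10^(pH − pK_a), so the fraction in the HA form is 1/(1 + 10^(pH − pK_a)). A minimal Python sketch; the function name is illustrative and the pK_a values come from the table above.

```python
def fraction_protonated(ph: float, pka: float) -> float:
    """Fraction of an ionizable group in its protonated (HA) form.

    From pH = pKa + log([A-]/[HA]):  [A-]/[HA] = 10**(pH - pKa),
    so [HA] / ([HA] + [A-]) = 1 / (1 + 10**(pH - pKa)).
    """
    return 1.0 / (1.0 + 10 ** (ph - pka))


# His side chain (pKa ~6.0) at physiological pH 7.4: mostly neutral imidazole
print(f"His imidazole protonated at pH 7.4: {fraction_protonated(7.4, 6.00):.3f}")
# At pH == pKa the group is exactly half protonated
print(fraction_protonated(6.00, 6.00))
```

The same function explains why His is a useful proton shuttle: a few percent of it is protonated at pH 7.4, and that fraction shifts steeply with small pH or pK_a changes.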

Isoelectric Point (pI)

The isoelectric point is the pH at which the amino acid (or protein) carries zero net charge. For a simple amino acid with no ionizable side chain:

$ \text{pI} = \frac{\text{p}K_{a1} + \text{p}K_{a2}}{2} $

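For amino acids with ionizable side chains, take the average of the two pK_a values that bracket the zwitterionic (neutral) species. For aspartate, the neutral species lies between the α-COOH and side-chain COOH dissociations. Using the generic α-COOH value (~2.2) and the Asp side-chain value (3.65) from the table above, pI ≈ (2.2 + 3.65)/2 ≈ 2.9; with the Asp-specific values quoted in some tables (α-COOH 2.09, side chain 3.86), pI = (2.09 + 3.86)/2 ≈ 2.98.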
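The bracketing-average rule can be checked numerically: sum the Henderson-Hasselbalch charge of every ionizable group and find the pH where the net charge crosses zero. A minimal sketch, assuming the generic α-COOH (2.2) and α-NH₃⁺ (9.4) values from the table; real per-residue α values differ slightly. Function names are illustrative.

```python
def net_charge(ph, acidic_pkas, basic_pkas):
    """Net charge at a given pH from Henderson-Hasselbalch.

    acidic_pkas: groups neutral when protonated (-COOH, -SH, Tyr -OH), -1 when deprotonated.
    basic_pkas:  groups +1 when protonated (-NH3+, imidazolium, guanidinium).
    """
    positive = sum(1.0 / (1.0 + 10 ** (ph - pka)) for pka in basic_pkas)
    negative = sum(1.0 / (1.0 + 10 ** (pka - ph)) for pka in acidic_pkas)
    return positive - negative


def isoelectric_point(acidic_pkas, basic_pkas, lo=0.0, hi=14.0, tol=1e-6):
    """pH of zero net charge, found by bisection.

    Net charge falls monotonically as pH rises, so there is one root in [lo, hi].
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(mid, acidic_pkas, basic_pkas) > 0:
            lo = mid  # still net positive: the pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2


gly = isoelectric_point(acidic_pkas=[2.2], basic_pkas=[9.4])
asp = isoelectric_point(acidic_pkas=[2.2, 3.65], basic_pkas=[9.4])
print(f"Gly pI ~ {gly:.2f}, Asp pI ~ {asp:.2f}")
```

For Asp the α-NH₃⁺ group is essentially fully protonated near pH 3, so only the two carboxyl pK_a values matter, which is why the numeric root lands on their average.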

The Peptide Bond

Amino acids are joined by peptide bonds formed via a condensation (dehydration) reaction between the α-carboxyl of one amino acid and the α-amino of the next, releasing water.

Partial Double Bond Character

Resonance between C=O and C–N gives the peptide bond ~40% double-bond character. Bond length is 1.33 Å (between C–N single 1.47 Å and C=N double 1.25 Å). This restricts rotation around the C–N bond.

Planarity and Trans Preference

The six atoms of the peptide unit (C_α, C, O, N, H, C_α) are coplanar. The trans configuration is strongly preferred (~99.95%) due to steric clash in cis. Exception: X–Pro bonds are ~5% cis because proline's ring reduces the energy difference.

Ramachandran Angles φ and ψ

The backbone conformation of each residue is defined by two dihedral (torsion) angles:

$ \phi \text{ (phi)}: \text{C}_{i-1}\text{-N}_i\text{-C}_{\alpha,i}\text{-C}_i \quad \text{(rotation around N-C}_\alpha \text{ bond)} $

$ \psi \text{ (psi)}: \text{N}_i\text{-C}_{\alpha,i}\text{-C}_i\text{-N}_{i+1} \quad \text{(rotation around C}_\alpha\text{-C bond)} $

The peptide-bond dihedral angle ω is fixed at ~180° (trans). Thus φ and ψ are the only two backbone degrees of freedom per residue. Not all (φ, ψ) combinations are sterically allowed; this is visualized in the Ramachandran plot.
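Given Cartesian coordinates, any of these torsion angles is the signed angle between the planes (A, B, C) and (B, C, D). A minimal pure-Python sketch using the atan2 formulation; the helper names are illustrative, and the coordinates in the example are toy points rather than a real residue. With atom order (C_{i−1}, N, C_α, C) it returns φ, and with (N, C_α, C, N_{i+1}) it returns ψ.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees, between -180 and 180.

    Positive when, looking down the p1->p2 bond, the front bond (to p0)
    must rotate clockwise to eclipse the back bond (to p3).
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy coordinates (hypothetical): a planar zig-zag is trans, a planar U is cis
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)))  # trans
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))   # cis
```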

2. Primary Structure

Amino Acid Sequence

The primary structure is the linear sequence of amino acids in a polypeptide chain, read from the N-terminus (free amino group) to the C-terminus (free carboxyl group). This convention matches the direction of ribosomal synthesis (N→C).

The primary structure encodes all the information necessary to determine the three-dimensional fold of the protein (Anfinsen's thermodynamic hypothesis). A single amino acid change can dramatically alter function — as in sickle cell disease, where Glu6→Val in β-globin causes hemoglobin polymerization.

Peptide Bond Formation (Dehydration Synthesis)

$ \text{AA}_1\text{-COOH} + \text{H}_2\text{N-AA}_2 \xrightarrow{\text{ribosome}} \text{AA}_1\text{-CO-NH-AA}_2 + \text{H}_2\text{O} $

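In vivo, this is catalyzed by the ribosome (a ribozyme) using aminoacyl-tRNAs as substrates. Direct condensation of free amino acids is thermodynamically unfavorable (ΔG°′ ≈ +3.5 kcal/mol per bond), so the cell pays this cost up front by activating each amino acid as an aminoacyl-tRNA ester, using ATP during tRNA charging. GTP hydrolysis by elongation factors (EF-Tu, EF-G) drives tRNA delivery and translocation rather than the peptidyl-transfer step itself.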

Sequence Determination Methods

Edman Degradation

Sequentially removes and identifies one amino acid at a time from the N-terminus using phenylisothiocyanate (PITC). Practical for ~50–60 residues before cumulative errors degrade accuracy. A chemically blocked N-terminus (e.g., acetylated) must be deblocked first, because PITC requires a free α-amino group.

Mass Spectrometry (MS/MS)

Modern method. Proteins are digested with trypsin (cleaves after K, R), peptide fragments are ionized (ESI or MALDI), separated by m/z, and fragmented again (tandem MS). Fragment ion series (b-ions, y-ions) reveal the sequence. Can identify post-translational modifications.
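The trypsin step is usually simulated in silico to predict which peptides should appear in the spectrum. A minimal sketch, assuming the common simplification that trypsin cleaves after K or R except when the next residue is proline, and ignoring missed cleavages; the function name and the test sequence are hypothetical.

```python
def trypsin_digest(sequence: str) -> list[str]:
    """Idealized tryptic digest: cut C-terminal to K or R, but not before P."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):  # C-terminal peptide that does not end in K/R
        peptides.append(sequence[start:])
    return peptides


# Hypothetical sequence: the R-P bond is protected, so R stays with the next peptide
print(trypsin_digest("AKRPGLKAAR"))
```

Real search engines also allow a few missed cleavages and add fixed or variable modification masses; those options are deliberately left out here.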

3. Secondary Structure

α-Helix

The α-helix is the most common secondary structure element in proteins, first predicted by Linus Pauling and Robert Corey in 1951. It is a right-handed helical structure stabilized by backbone hydrogen bonds.

Geometric Parameters

Residues per turn: 3.6
Rise per residue: 1.5 Å
Pitch (rise/turn): 5.4 Å (= 3.6 × 1.5)
Hydrogen bond: C=O of residue i → N–H of residue i+4
φ, ψ angles: −57°, −47°
Helix diameter: ~12 Å (including side chains)

Important Properties

Helix dipole: The aligned N–H and C=O groups create a macrodipole with a partial positive charge at the N-terminus and partial negative at the C-terminus (~0.5–0.7 charge units)
Helix breakers: Proline (lacks N–H for H-bond, restricted φ) and Glycine (too flexible, entropic penalty)
Helix formers: Ala, Leu, Met, Glu have high helix propensity

$ \text{Pitch} = n \times d = 3.6 \times 1.5\,\text{\AA} = 5.4\,\text{\AA} $

$ \text{H-bond}: \quad \text{C=O}_i \cdots \text{H-N}_{i+4} $
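The parameters above fix the dimensions of an idealized helix. A small sketch, assuming every residue is helical with exactly 3.6 residues per turn and 1.5 Å rise; the function name is illustrative.

```python
RISE_PER_RESIDUE = 1.5    # angstrom, from the parameter list above
RESIDUES_PER_TURN = 3.6


def helix_geometry(n_residues: int):
    """Length, number of turns, and i -> i+4 H-bond pairs of an ideal alpha-helix."""
    length = n_residues * RISE_PER_RESIDUE          # angstrom along the helix axis
    turns = n_residues / RESIDUES_PER_TURN
    # C=O of residue i accepts an H-bond from N-H of residue i+4 (0-based indices)
    hbonds = [(i, i + 4) for i in range(n_residues - 4)]
    return length, turns, hbonds


length, turns, hbonds = helix_geometry(18)
print(f"{length:.1f} angstrom, {turns:.1f} turns, {len(hbonds)} backbone H-bonds")
```

The count n − 4 reflects that the first four N–H groups and the last four C=O groups of a helix have no intrahelical partner, which is also the origin of the exposed charges behind the helix dipole.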

β-Sheet

β-sheets are formed by extended polypeptide strands (β-strands) aligned side-by-side, connected by hydrogen bonds between backbone C=O and N–H groups of adjacent strands.

Antiparallel β-Sheet

Adjacent strands run in opposite directions (N→C next to C→N). Hydrogen bonds are nearly perpendicular to the strand direction, forming a regular pattern. More stable than parallel sheets due to optimal H-bond geometry.

φ, ψ: −139°, +135°

Parallel β-Sheet

Adjacent strands run in the same direction. Hydrogen bonds are angled, making parallel sheets slightly less stable. Parallel sheets with fewer than ~5 strands are rare. Common in α/β proteins (e.g., Rossmann fold).

φ, ψ: −119°, +113°

β-Sheet Features

Twist: β-sheets are not flat but have a right-handed twist of ~15–20° per strand when viewed along the strand direction
β-Bulge: A disruption where one residue in a strand does not form a hydrogen bond with its partner, creating a local bulge. Often found at edge strands.
Side chain orientation: R groups alternate above and below the sheet plane

Turns, Loops, and Random Coil

β-Turns

Tight 180° reversals involving 4 residues (i to i+3). Stabilized by an H-bond from C=O of residue i to N–H of residue i+3. Subscripts below denote turn positions 2 and 3 (residues i+1 and i+2). Type I: φ₂=−60°, ψ₂=−30°, φ₃=−90°, ψ₃=0°. Type II: φ₂=−60°, ψ₂=120°, φ₃=80°, ψ₃=0°. Glycine is favored at position 3 in Type II turns. Pro is common at position 2.
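The ideal angles above can be used to label an observed turn. A minimal sketch, assuming a ±30° tolerance on every angle (an arbitrary choice for illustration, not a published cutoff) and handling the wrap-around at ±180°; names are illustrative.

```python
# Ideal (phi, psi) at turn positions 2 and 3, taken from the text above
TURN_TYPES = {
    "I":  ((-60, -30), (-90, 0)),
    "II": ((-60, 120), (80, 0)),
}


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees (wraps at +/-180)."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def classify_turn(phi2, psi2, phi3, psi3, tol=30.0):
    """Return 'I', 'II', or None if no ideal turn matches every angle within tol."""
    observed = (phi2, psi2, phi3, psi3)
    for name, ((p2, s2), (p3, s3)) in TURN_TYPES.items():
        ideal = (p2, s2, p3, s3)
        if all(angle_diff(o, i) <= tol for o, i in zip(observed, ideal)):
            return name
    return None


print(classify_turn(-65, 125, 85, -5))     # close to the Type II ideal
print(classify_turn(-120, 130, -120, 130)) # extended strand, not a turn
```

The positive φ₃ ≈ 80° of the Type II ideal is the reason Gly is favored there: that region is strained for residues with a C_β.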

Ω-Loops

Longer loops (6–16 residues) that resemble the Greek letter Ω. Found on protein surfaces, often form parts of active sites or antigen-binding regions (CDR loops in antibodies). No regular H-bonding pattern.

Random Coil

Regions that lack regular secondary structure. Not truly "random" — they have defined conformations in folded proteins but do not repeat a regular pattern. Intrinsically disordered regions (IDRs) are genuinely dynamic and play roles in signaling.

Ramachandran Plot

The Ramachandran plot maps allowed (φ, ψ) angle combinations for a polypeptide backbone. Most combinations are sterically forbidden due to clashes between backbone atoms and C_β.

Allowed Regions

α-helix region: φ ≈ −57°, ψ ≈ −47°
β-sheet region: φ ≈ −120° to −140°, ψ ≈ +110° to +135°
Left-handed helix: φ ≈ +57°, ψ ≈ +47° (rare, less stable)
Polyproline II: φ ≈ −75°, ψ ≈ +145°

Special Residues

Glycine: No C_β, so virtually all (φ, ψ) regions are accessible. The Ramachandran plot for Gly is much more permissive.
Proline: Ring constrains φ to ~−63° ± 15°, severely restricting its Ramachandran space to a narrow vertical strip.

Derivation: Ramachandran Plot Constraints from Steric Clash Analysis

Starting from the hard-sphere model of atoms, we derive which backbone dihedral angles (φ, ψ) are sterically allowed.

Step 1: Define the steric exclusion criterion

Two non-bonded atoms overlap if their interatomic distance falls below the sum of their van der Waals radii. For atoms i and j:

$$d_{ij} < r_{\text{vdW},i} + r_{\text{vdW},j} \implies \text{steric clash (forbidden)}$$

Step 2: Identify the critical atom pairs

For a general L-amino acid residue with a Cβ atom, the closest contacts occur between backbone carbonyl oxygen O_i and amide hydrogen H_(i+1), and between Cβ and backbone atoms of adjacent peptide units. The key van der Waals radii are: C = 1.7 Å, N = 1.55 Å, O = 1.52 Å, H = 1.2 Å.

$$d_{\text{min}}(\text{O} \cdots \text{H}) = r_{\text{O}} + r_{\text{H}} = 1.52 + 1.20 = 2.72\;\text{\AA}$$

Step 3: Express interatomic distances as functions of φ and ψ

Using the rigid peptide plane geometry (bond lengths: N–Cα = 1.47 Å, Cα–C = 1.53 Å, C=O = 1.24 Å, C–N = 1.33 Å; bond angles ~120° at C, ~110° at Cα), the distance between any two atoms separated by rotatable bonds is:

$$d_{ij}(\phi, \psi) = \left|\mathbf{r}_j(\phi, \psi) - \mathbf{r}_i\right| = f(\phi, \psi)$$

Step 4: Enumerate forbidden regions by scanning (φ, ψ) space

For each (φ, ψ) pair from −180° to +180°, compute all pairwise distances. A conformation is disallowed if any contact distance falls below the allowed minimum:

$$\text{Allowed}(\phi, \psi) = \begin{cases} 1 & \text{if } d_{ij}(\phi, \psi) \geq d_{\min} \;\;\forall\; i,j \\ 0 & \text{otherwise} \end{cases}$$
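Steps 1, 2 and 4 translate directly into code once the contact distances for one (φ, ψ) pair are known. A minimal sketch of the hard-sphere indicator, assuming the caller supplies the element pairs and their distances; computing those distances from φ and ψ requires building coordinates from the Step 3 geometry, which is omitted here. Names are illustrative.

```python
# van der Waals radii (angstrom) quoted in Step 2
VDW_RADII = {"C": 1.70, "N": 1.55, "O": 1.52, "H": 1.20}


def d_min(elem_i: str, elem_j: str) -> float:
    """Minimum allowed non-bonded distance: sum of the two vdW radii (Step 1)."""
    return VDW_RADII[elem_i] + VDW_RADII[elem_j]


def allowed(contacts) -> int:
    """Hard-sphere indicator for one (phi, psi) conformation (Step 4).

    contacts: iterable of (elem_i, elem_j, distance_in_angstrom) for every
    non-bonded pair checked at this conformation.
    Returns 1 if no pair is closer than d_min, else 0.
    """
    return int(all(d >= d_min(a, b) for a, b, d in contacts))


print(round(d_min("O", "H"), 2))             # the O...H limit from Step 2
print(allowed([("O", "H", 2.80), ("C", "O", 3.30)]))
print(allowed([("O", "H", 2.50)]))
```

Scanning a grid of (φ, ψ) values and plotting this indicator is what produces the allowed basins described in Step 5.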

Step 5: Identify the allowed basins

The steric scan reveals only ~20% of (φ, ψ) space is accessible for L-amino acids with a Cβ. The allowed regions cluster into well-defined basins corresponding to known secondary structures:

$$\alpha\text{-helix:}\;\phi \approx -57°,\;\psi \approx -47° \qquad \beta\text{-sheet:}\;\phi \approx -120°,\;\psi \approx +130°$$

Step 6: Special cases — Glycine and Proline

Glycine (R = H) lacks Cβ, eliminating the dominant steric clash and allowing ~60% of (φ, ψ) space. Proline's pyrrolidine ring fixes φ ≈ −63° ± 15° by covalent constraint, restricting Ramachandran space to a narrow vertical strip. This analysis was first performed by Ramachandran, Ramakrishnan, and Sasisekharan (1963).

$$\text{Gly: } \sim 60\%\;\text{allowed} \qquad \text{Ala (general): } \sim 20\%\;\text{allowed} \qquad \text{Pro: } \phi \approx -63° \pm 15°$$

4. Tertiary Structure

Stabilizing Forces

Tertiary structure is the complete three-dimensional arrangement of all atoms in a single polypeptide chain. It is stabilized by a combination of non-covalent and covalent interactions.

Hydrophobic Core Packing

The dominant driving force of protein folding. Nonpolar side chains are buried in the protein interior, away from water. The hydrophobic effect arises primarily from the entropy gain of water molecules released from ordered cages around nonpolar groups. Packing efficiency in the core approaches that of crystalline organic solids (~0.75 packing fraction).

Derivation: Kauzmann's Hydrophobic Effect — Transfer Free Energy

Starting from experimental solubility data, we derive the free energy of transferring nonpolar groups from water to the protein interior.

Step 1: Define the transfer process

Consider transferring a nonpolar solute from water (w) to a nonpolar solvent (np), mimicking burial in the protein core:

$$\text{Solute}_{\text{(water)}} \rightarrow \text{Solute}_{\text{(nonpolar)}} \qquad \Delta G_{\text{transfer}} = \Delta H_{\text{tr}} - T\Delta S_{\text{tr}}$$

Step 2: Relate transfer free energy to solubility

The partition coefficient K_p between nonpolar solvent and water is related to the transfer free energy:

$$\Delta G_{\text{transfer}} = -RT \ln K_p = -RT \ln\frac{[\text{Solute}]_{\text{np}}}{[\text{Solute}]_{\text{water}}}$$

Step 3: Experimental values for amino acid side chains

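Nozaki and Tanford (1971) measured the free energies of transferring amino acids from water to ethanol and dioxane solutions; subtracting the glycine value isolates the contribution of each side chain. The values scale approximately with the accessible surface area (ASA) buried: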

$$\Delta G_{\text{transfer}} \approx -25 \;\text{cal/mol/\AA}^2 \times \Delta\text{ASA}$$
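Steps 2 and 3 are two routes to the same quantity: one from a measured partition coefficient, one from buried area. A small sketch in kcal/mol, assuming T = 298.15 K and the −25 cal/mol/Å² coefficient quoted above; function names are illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def dg_from_partition(kp: float, temp_k: float = 298.15) -> float:
    """Step 2: dG_transfer = -RT ln K_p, with K_p = [solute]_np / [solute]_water."""
    return -R_KCAL * temp_k * math.log(kp)


def dg_from_area(buried_nonpolar_area_a2: float, per_a2_cal: float = -25.0) -> float:
    """Step 3 surface-area scaling; converts cal/mol to kcal/mol."""
    return per_a2_cal * buried_nonpolar_area_a2 / 1000.0


# A solute 10x more soluble in the nonpolar phase, and the Step 5 protein
print(f"K_p = 10:        {dg_from_partition(10.0):.2f} kcal/mol")
print(f"10,000 A^2 area: {dg_from_area(10_000):.0f} kcal/mol")
```

Each factor of 10 in K_p is worth about 1.36 kcal/mol at room temperature, which is a convenient yardstick for the other energies in this section.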

Step 4: Decompose into enthalpic and entropic contributions

Calorimetric measurements reveal the hydrophobic effect is primarily entropic at 25°C. Water molecules form ordered “iceberg” cages around nonpolar groups, losing entropy. Transfer releases these waters:

$$\Delta H_{\text{tr}} \approx 0 \;\text{(near 25°C)} \qquad \Delta S_{\text{tr}} > 0 \;\text{(water released)}$$

$$\therefore\;\Delta G_{\text{transfer}} \approx -T\Delta S_{\text{tr}} < 0 \;\text{(favorable)}$$

Step 5: Sum over all buried residues in a protein

For a typical 150-residue globular protein burying ~10,000 Å² of nonpolar surface area upon folding:

$$\Delta G_{\text{hydrophobic}} = \sum_i \Delta G_{\text{tr},i} \approx -25 \times 10{,}000 \;\text{cal/mol} \approx -250\;\text{kcal/mol}$$

This is the largest single contribution to protein stability, although it is largely offset by the conformational entropy penalty (∼+200 kcal/mol for the same protein), yielding a net stability of only 5–15 kcal/mol.

Hydrogen Bonds

Beyond backbone H-bonds in secondary structure, side chains form H-bonds with each other, backbone atoms, and water molecules. Typical strength: 2–5 kcal/mol. Their contribution to stability is partly offset by the loss of H-bonds to water in the unfolded state.

Salt Bridges (Ion Pairs)

Electrostatic attraction between oppositely charged side chains (e.g., Lys–Asp, Arg–Glu). Strength depends on the dielectric environment: ~1–5 kcal/mol in the protein interior (low dielectric) vs. ~0.5 kcal/mol on the surface (high dielectric, water screening).

Van der Waals Interactions

Individually weak (~0.1–1 kcal/mol), but their vast number (thousands) in a tightly packed core makes them collectively significant. Optimized by close packing of complementary surfaces. Follow a Lennard-Jones potential with attractive r⁻⁶ and repulsive r⁻¹² terms.
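The 12-6 form can be written down directly. A minimal sketch; the ε and σ values in the example are illustrative placeholders, not parameters from any particular force field.

```python
def lennard_jones(r: float, epsilon: float, sigma: float) -> float:
    """12-6 potential: V(r) = 4*eps*[(sigma/r)^12 - (sigma/r)^6].

    The r^-12 term models short-range repulsion, the r^-6 term dispersion
    attraction. V(sigma) = 0 and the minimum V = -eps lies at r = 2^(1/6)*sigma.
    """
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


EPS, SIGMA = 0.2, 3.4   # kcal/mol, angstrom (illustrative only)
r_min = 2 ** (1 / 6) * SIGMA
for r in (3.2, SIGMA, r_min, 5.0, 8.0):
    print(f"r = {r:.2f} A  V = {lennard_jones(r, EPS, SIGMA):+.3f} kcal/mol")
```

The steep wall below σ and the shallow well of depth ε show why each contact is weak and why tight, complementary packing is needed to accumulate a significant total.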

Disulfide Bonds (–S–S–)

Covalent bonds between cysteine residues. Common in extracellular proteins (oxidizing environment) but rare in the cytoplasm (reducing environment, maintained by glutathione and thioredoxin). Stabilize the folded state by ~2–5 kcal/mol per bond by reducing the conformational entropy of the unfolded state.

Structural Domains and Motifs

A domain is an independently folding unit, typically 50–300 residues, that often corresponds to a functional module. A motif (or supersecondary structure) is a recognizable combination of secondary structure elements.

Rossmann Fold

Alternating β-α-β units forming a parallel β-sheet. Binds dinucleotides (NAD⁺, FAD). Found in dehydrogenases.

Greek Key

Four antiparallel β-strands arranged in a pattern resembling the Greek key ornament. Common in β-barrel proteins.

β-Barrel

Closed β-sheet curved into a barrel. Examples: TIM barrel (8 parallel strands), OmpF porin (antiparallel), GFP.

Coiled-Coil

Two or more α-helices wound around each other with heptad repeat (abcdefg) — hydrophobic at positions a,d. Leucine zippers, tropomyosin, keratin.

Zinc Finger

Small domain (~30 residues) coordinating Zn²⁺ via Cys/His. C₂H₂ type: DNA-binding transcription factors. C₄ type: nuclear receptors.

Leucine Zipper

Coiled-coil dimerization motif with Leu at every 7th position. Often linked to basic DNA-binding region (bZIP). Examples: Fos/Jun, GCN4.
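For coiled coils and leucine zippers, the heptad register determines which residues face the interface. A minimal sketch that assigns positions a–g from a given starting register and lists the a/d residues; it assumes the register is already known (it does not predict it), and the example sequence is hypothetical.

```python
HEPTAD = "abcdefg"


def heptad_register(sequence: str, first: str = "a") -> list[str]:
    """Assign a heptad position (a-g) to each residue, starting at `first`."""
    offset = HEPTAD.index(first)
    return [HEPTAD[(offset + i) % 7] for i in range(len(sequence))]


def core_residues(sequence: str, first: str = "a") -> list[tuple[int, str, str]]:
    """(index, position, residue) for positions a and d, which form the hydrophobic seam."""
    register = heptad_register(sequence, first)
    return [(i, pos, aa) for i, (pos, aa) in enumerate(zip(register, sequence))
            if pos in "ad"]


# Hypothetical two-heptad sequence with Val at a and Leu at d
print(core_residues("VEELKKKVEELKKK"))
```

Because the d positions recur every seven residues, a leucine zipper shows Leu at every 7th position exactly as described above.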

5. Quaternary Structure

Subunit Assembly and Symmetry

Quaternary structure refers to the arrangement of multiple polypeptide subunits (protomers) into a functional complex. Subunit interfaces are stabilized by the same forces as tertiary structure: hydrophobic packing, H-bonds, salt bridges, and sometimes disulfide bonds.

C₂ Symmetry

Single 2-fold rotation axis. Common in homodimers. Example: HIV protease (C₂ homodimer with active site at the interface).

D₂ (Dihedral) Symmetry

Three perpendicular 2-fold axes. Tetramers like hemoglobin (α₂β₂) have pseudo-D₂ symmetry (not true because α ≠ β).

Cubic Symmetry

Higher-order: tetrahedral (T, 12-mer), octahedral (O, 24-mer), icosahedral (I, 60-mer). Viral capsids often use icosahedral symmetry.

Cooperativity — Hemoglobin as a Paradigm

Hemoglobin (α₂β₂) is the classic example of cooperative ligand binding. O₂ binding to one subunit increases the affinity of neighboring subunits through conformational changes that shift the equilibrium between the T state (tense, low affinity) and R state (relaxed, high affinity).

The Hill Equation

$ Y = \frac{[L]^{n_H}}{K_d + [L]^{n_H}} = \frac{(p\text{O}_2)^{n_H}}{(P_{50})^{n_H} + (p\text{O}_2)^{n_H}} $

$ \log\frac{Y}{1-Y} = n_H \log[L] - \log K_d = n_H \log[L] - n_H \log P_{50} $

Where Y is fractional saturation, n_H is the Hill coefficient (a measure of cooperativity), and P₅₀ is the partial pressure at half-saturation. For hemoglobin, n_H ≈ 2.8 (maximum possible = 4 for 4 binding sites). Myoglobin has n_H = 1 (no cooperativity).

Derivation: Hill Equation for Cooperative Binding

Starting from a simplified model where n ligand molecules bind simultaneously to a macromolecule.

Step 1: Write the all-or-none binding equilibrium

Assume all n binding sites are filled in a single concerted step (the Hill approximation):

$$P + nL \rightleftharpoons PL_n \qquad K_d = \frac{[P][L]^n}{[PL_n]}$$

Step 2: Define fractional saturation Y

The fractional saturation is the ratio of occupied binding sites to total sites:

$$Y = \frac{[PL_n]}{[P] + [PL_n]}$$

Step 3: Substitute the equilibrium expression

From the equilibrium: [PL_n] = [P][L]ⁿ/K_d. Substituting:

$$Y = \frac{[P][L]^n / K_d}{[P] + [P][L]^n / K_d} = \frac{[L]^n / K_d}{1 + [L]^n / K_d}$$

Step 4: Simplify to the Hill equation

Multiply numerator and denominator by K_d and write K_d = (K_0.5)ⁿ where K_0.5 is the ligand concentration at half-saturation:

$$Y = \frac{[L]^n}{K_d + [L]^n} = \frac{[L]^n}{(K_{0.5})^n + [L]^n}$$

Step 5: Derive the Hill plot (linearized form)

Take the ratio Y/(1−Y) and apply logarithms:

$$\frac{Y}{1-Y} = \frac{[L]^n}{K_d} \quad \Longrightarrow \quad \log\frac{Y}{1-Y} = n\log[L] - \log K_d$$

A plot of log(Y/(1−Y)) vs log[L] gives a straight line with slope n_H (the Hill coefficient). For hemoglobin, n_H ≈ 2.8 indicates strong positive cooperativity (4 sites, but binding is not perfectly concerted).

Step 6: Verify half-saturation condition

When Y = 0.5, confirm that [L] = K_0.5 (= P₅₀ for O₂ binding):

$$0.5 = \frac{[L]^n}{K_{0.5}^n + [L]^n} \implies K_{0.5}^n = [L]^n \implies [L] = K_{0.5} \;\checkmark$$
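The derivation can be checked numerically: Y equals 0.5 at K_0.5, and the slope of the Hill plot between any two points returns n_H. A minimal sketch; P₅₀ = 26 Torr is an assumed illustrative value for hemoglobin (it is not given in the text), while n_H = 2.8 is the value quoted above.

```python
import math


def hill_saturation(ligand: float, k_half: float, n_h: float) -> float:
    """Fractional saturation Y = [L]^n / (K_0.5^n + [L]^n)."""
    return ligand ** n_h / (k_half ** n_h + ligand ** n_h)


def hill_slope(l1: float, y1: float, l2: float, y2: float) -> float:
    """Slope of log(Y/(1-Y)) vs log[L] between two points (estimates n_H)."""
    def logit(y):
        return math.log10(y / (1.0 - y))
    return (logit(y2) - logit(y1)) / (math.log10(l2) - math.log10(l1))


P50 = 26.0   # Torr, assumed illustrative value
N_H = 2.8    # Hill coefficient quoted in the text
for po2 in (10.0, 26.0, 40.0, 100.0):
    print(f"pO2 = {po2:5.1f} Torr  Y = {hill_saturation(po2, P50, N_H):.3f}")

y_lo, y_hi = hill_saturation(10.0, P50, N_H), hill_saturation(100.0, P50, N_H)
print(f"recovered n_H = {hill_slope(10.0, y_lo, 100.0, y_hi):.2f}")
```

Setting n_h = 1 in the same function gives the hyperbolic myoglobin-like curve for comparison.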

T ↔ R Transition

In the T state, a salt bridge between His HC3(β146) and Asp FG1(β94) constrains the structure. O₂ binding to Fe²⁺ pulls the iron into the porphyrin plane, shifting helix F, breaking the salt bridge, and destabilizing the T state. BPG (2,3-bisphosphoglycerate) binds in the central cavity of the T state, stabilizing it and lowering O₂ affinity (right-shifting the binding curve).

Bohr Effect

Low pH and high CO₂ promote O₂ release by stabilizing the T state. CO₂ binds to N-terminal amino groups forming carbamate, and H⁺ protonates His146, strengthening the salt bridge. This facilitates O₂ delivery to metabolically active tissues (low pH, high CO₂).

MWC Concerted Model

Monod, Wyman & Changeux (1965). All subunits switch states simultaneously. The protein exists in equilibrium between T and R states (characterized by the allosteric constant L = [T₀]/[R₀]). Ligand binds preferentially to R, shifting the equilibrium.

$ Y = \frac{L c \alpha (1+c\alpha)^{n-1} + \alpha(1+\alpha)^{n-1}}{L(1+c\alpha)^n + (1+\alpha)^n} $

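L = [T₀]/[R₀] (ratio of unliganded T and R states), c = K_R/K_T (ratio of the ligand dissociation constants of the R and T states), α = [S]/K_R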
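The MWC expression is easy to evaluate and probe numerically. A minimal sketch; the parameters L = 10⁴, c = 0.01, n = 4 are illustrative choices (strongly T-favoring, with R binding 100-fold tighter), not fitted hemoglobin constants. The Hill slope is estimated by a central difference in ln α.

```python
import math


def mwc_saturation(alpha: float, L: float, c: float, n: int) -> float:
    """MWC fractional saturation.

    alpha = [S]/K_R, L = [T0]/[R0], c = K_R/K_T (dissociation constants).
    """
    num = L * c * alpha * (1 + c * alpha) ** (n - 1) + alpha * (1 + alpha) ** (n - 1)
    den = L * (1 + c * alpha) ** n + (1 + alpha) ** n
    return num / den


def hill_slope_at(alpha, L, c, n, h=1e-4):
    """Numerical d ln(Y/(1-Y)) / d ln(alpha) by central difference."""
    def logit(a):
        y = mwc_saturation(a, L, c, n)
        return math.log(y / (1 - y))
    return (logit(alpha * math.exp(h)) - logit(alpha * math.exp(-h))) / (2 * h)


L_, c_, n_ = 1e4, 0.01, 4   # illustrative, not fitted
print(mwc_saturation(10.0, L_, c_, n_))          # ~0.5 for these parameters
print(round(hill_slope_at(10.0, L_, c_, n_), 2))
```

Two limits are worth checking: with L = 0 (no T state) or c = 1 (equal affinities) the expression collapses to α/(1 + α), i.e. no cooperativity, consistent with the model only producing positive cooperativity when L > 0 and c < 1.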

KNF Sequential Model

Koshland, Némethy & Filmer (1966). Induced fit: each subunit changes conformation individually upon ligand binding, and this conformational change affects neighboring subunits. Does not require pre-existing T/R equilibrium. Can explain negative cooperativity (unlike MWC), as seen in aspartate transcarbamoylase with CTP inhibition.

6. Protein Folding

Thermodynamic Principles

Anfinsen's Thermodynamic Hypothesis

Christian Anfinsen demonstrated (1961, Nobel Prize 1972) that denatured ribonuclease A could refold spontaneously to its active conformation. This led to the conclusion that the native structure represents the global free energy minimum of the polypeptide chain, and that all information needed for folding is contained in the amino acid sequence.

Levinthal's Paradox

A 100-residue protein with 3 possible conformations per residue would need to sample 3¹⁰⁰ ≈ 5 × 10⁴⁷ states. At 10¹³ transitions/second, this would take ~10²⁷ years — far exceeding the age of the universe. Yet proteins fold in milliseconds to seconds. This paradox demonstrates that folding cannot be a random search but must follow directed pathways.

Derivation: Levinthal's Paradox — Random Search Folding Time

Starting from the assumption that protein folding proceeds by random conformational search, we estimate the time required and show it is astronomically large.

Step 1: Count conformational degrees of freedom

Each residue has two backbone dihedral angles (φ, ψ). Assume each angle can adopt approximately 3 discrete states (gauche+, gauche−, trans). For a protein of N residues:

$$\Omega = 3^{2N} = 9^N \approx 10^{0.95N}$$

Step 2: Estimate the conformational sampling rate

Bond rotations occur on sub-picosecond to picosecond timescales. Levinthal-style estimates take each new conformation to be sampled in approximately:

$$\tau_{\text{step}} \approx 10^{-13}\;\text{s (100 fs per rotational isomerization)}$$

Step 3: Calculate total search time

If the protein must sample all conformations to find the native state by random search:

$$t_{\text{fold}} = \Omega \times \tau_{\text{step}} = 10^{0.95N} \times 10^{-13}\;\text{s} \approx 10^{(0.95N - 13)}\;\text{s}$$

Step 4: Apply to a small protein (N = 100 residues)

For a typical 100-residue protein:

$$t_{\text{fold}} = 10^{(0.95 \times 100 - 13)} = 10^{82}\;\text{s}$$

Compare: the age of the universe is only ~4.3 × 10¹⁷ s ≈ 10^17.6 s. Random search would take 10⁶⁴ times the age of the universe!
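The arithmetic is easiest in log10 units, which avoids overflowing floating point. A small sketch covering both the 9-states-per-residue derivation and the 3-states-per-residue version in the opening paragraph; the 10⁻¹³ s step and 4.3 × 10¹⁷ s age of the universe are the values used in the text, and the seconds-per-year constant is a standard approximation.

```python
import math

AGE_OF_UNIVERSE_S = 4.3e17   # value used in the text
SECONDS_PER_YEAR = 3.156e7


def levinthal_log10_seconds(n_residues: int, states_per_residue: float,
                            tau_step_s: float = 1e-13) -> float:
    """log10 of exhaustive random-search time: states^N * tau_step."""
    return n_residues * math.log10(states_per_residue) + math.log10(tau_step_s)


t9 = levinthal_log10_seconds(100, 9)   # 3 states x 2 dihedrals per residue
t3 = levinthal_log10_seconds(100, 3)   # 3 states per residue
print(f"9^N: 10^{t9:.1f} s, 10^{t9 - math.log10(AGE_OF_UNIVERSE_S):.1f} x age of universe")
print(f"3^N: 10^{t3 - math.log10(SECONDS_PER_YEAR):.1f} years")
```

With the exact log10(9) ≈ 0.954 instead of 0.95, the exponent comes out slightly above 82; either way the conclusion is unchanged.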

Step 5: The general formula

Rounding 9^N up to 10^N gives a commonly used order-of-magnitude rule of thumb:

$$t_{\text{random}} \sim 10^{N} \times 10^{-13}\;\text{s} = 10^{(N - 13)}\;\text{s} \quad \text{(simplified approximation)}$$

Step 6: Resolution — the folding funnel

Since real proteins fold in microseconds to seconds (~10⁻⁶ to 10⁰ s), folding cannot be a random search. The resolution is that the energy landscape is funnel-shaped: local interactions form rapidly, progressively constraining the search space. At each step, the number of accessible conformations decreases dramatically:

$$t_{\text{actual}} \sim N / k_{\text{local}} \sim 10^{-6}\;\text{to}\;10^{0}\;\text{s} \ll 10^{82}\;\text{s}$$

Gibbs Free Energy of Folding

$ \Delta G_{\text{fold}} = \Delta H_{\text{fold}} - T\Delta S_{\text{fold}} $

$ \Delta G_{\text{fold}} = G_{\text{native}} - G_{\text{unfolded}} \approx -5 \text{ to } -15 \text{ kcal/mol} $

The net stability of a folded protein is surprisingly marginal: only 5–15 kcal/mol, which is the small difference between large opposing terms. Folding is enthalpically driven (H-bonds, van der Waals, electrostatics) but entropically opposed (loss of chain conformational entropy). The hydrophobic effect provides a favorable entropy term (release of ordered water).

Derivation: Protein Stability — Decomposition of ΔG_folding

Starting from the individual thermodynamic contributions, we derive the net free energy of protein folding as a sum of competing terms.

Step 1: Identify all contributing free energy terms

The total free energy of folding is the sum of all interactions gained and entropy lost upon transitioning from unfolded (U) to native (N) state:

$$\Delta G_{\text{fold}} = G_N - G_U = \Delta G_{\text{H-bond}} + \Delta G_{\text{hydrophobic}} + \Delta G_{\text{conf.entropy}} + \Delta G_{\text{vdW}} + \Delta G_{\text{electrostatic}}$$

Step 2: Hydrogen bond contribution

Each intramolecular H-bond in the folded state replaces an H-bond to water in the unfolded state. The net contribution per H-bond is small but collectively significant. For ~200 backbone H-bonds in a 150-residue protein:

$$\Delta G_{\text{H-bond}} \approx n_{\text{H-bond}} \times (\Delta G_{\text{intra}} - \Delta G_{\text{water}}) \approx 200 \times (-0.5) = -100\;\text{kcal/mol}$$

Step 3: Hydrophobic contribution (favorable)

Burial of nonpolar surface area drives folding through the hydrophobic effect. Using the Kauzmann transfer free energy:

$$\Delta G_{\text{hydrophobic}} \approx -25\;\text{cal/mol/\AA}^2 \times \Delta\text{ASA}_{\text{nonpolar}} \approx -250\;\text{kcal/mol}$$

Step 4: Conformational entropy penalty (unfavorable)

Each residue loses ~4.3 cal/mol/K of conformational entropy upon folding (from ~9 backbone states to 1). For N residues at T = 300 K:

$$\Delta G_{\text{conf.entropy}} = -T\Delta S_{\text{conf}} \approx N \times T \times 4.3\;\text{cal/mol/K} \approx 150 \times 300 \times 4.3\;\text{cal/mol} \approx +194\;\text{kcal/mol}$$

Step 5: Van der Waals packing contribution

Tight packing in the core provides favorable van der Waals contacts. Each contact contributes ~0.03 kcal/mol; the ~1,000 core atoms of a 150-residue protein make on the order of 10³ such contacts, giving a few tens of kcal/mol (≈ −40 kcal/mol is used below):

$$\Delta G_{\text{vdW}} \approx -0.03\;\text{kcal/mol} \times n_{\text{contacts}} \approx -40\;\text{kcal/mol}$$

Step 6: Net stability is marginal

Summing all terms reveals that protein stability is a small difference between large opposing numbers:

$$\Delta G_{\text{fold}} \approx (-100) + (-250) + (+194) + (-40) + (+186) \approx -10\;\text{kcal/mol}$$

The ∼+186 kcal/mol includes additional unfavorable terms (loss of solvation entropy for polar groups, backbone strain, etc.). This marginal stability (−5 to −15 kcal/mol) is biologically crucial: it allows proteins to be conformationally dynamic, to be regulated by post-translational modifications, and to be degraded when no longer needed.
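To see what "marginal" means in practice, the summed terms can be converted into a population. A minimal sketch, assuming a simple two-state (N ⇌ U) equilibrium at 300 K and reusing the order-of-magnitude terms from Steps 2–6; the dictionary labels are illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)

# Order-of-magnitude terms from Steps 2-6 (kcal/mol)
terms = {
    "H-bonds": -100.0,
    "hydrophobic": -250.0,
    "conformational entropy": +194.0,
    "van der Waals": -40.0,
    "other unfavorable terms": +186.0,
}
dg_fold = sum(terms.values())


def fraction_unfolded(dg_fold_kcal: float, temp_k: float = 300.0) -> float:
    """Two-state model: K_unfold = [U]/[N] = exp(dG_fold / RT), f_U = K / (1 + K)."""
    k_unfold = math.exp(dg_fold_kcal / (R_KCAL * temp_k))
    return k_unfold / (1.0 + k_unfold)


print(f"Net dG_fold = {dg_fold:.0f} kcal/mol")
print(f"Fraction unfolded at 300 K: {fraction_unfolded(dg_fold):.1e}")
print(f"R ln 9 = {1000 * R_KCAL * math.log(9):.2f} cal/mol/K (Step 4 check)")
```

Even −10 kcal/mol leaves roughly one molecule in 10⁷ unfolded at any instant, and a single destabilizing mutation worth a few kcal/mol raises that fraction by orders of magnitude.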

Energy Landscape and Folding Pathways

Folding Funnel Model

Modern view: the energy landscape is a rugged funnel. The unfolded state sits at the top (high energy, high entropy, many conformations). As the protein folds, it descends the funnel toward the native state at the bottom (low energy, low entropy, single conformation). The funnel shape ensures that many different starting conformations converge on the native state through multiple parallel pathways.

Local minima on the funnel surface correspond to kinetic traps — misfolded intermediates that must overcome energy barriers to continue folding.

Molten Globule Intermediate

An early folding intermediate with native-like secondary structure and overall compactness but without the tight packing and fixed tertiary contacts of the native state. It has a hydrophobic core but with "liquid-like" interior packing. Observable experimentally by circular dichroism (secondary structure present), ANS fluorescence (exposed hydrophobic patches), and hydrodynamic radius (compact but expanded relative to native).

Molecular Chaperones

While Anfinsen showed small proteins can refold spontaneously in vitro, many proteins require assistance in the crowded cellular environment (~300 mg/mL total protein). Chaperones prevent aggregation and provide protected folding environments.

Hsp70 System (DnaK/DnaJ/GrpE in bacteria)

Mechanism: Hsp70 binds exposed hydrophobic segments of nascent or misfolded proteins via its substrate-binding domain. The ATP cycle controls substrate affinity: the ATP-bound state has an open lid with fast on/off rates; DnaJ (an Hsp40 co-chaperone) stimulates ATP hydrolysis, trapping the substrate under a closed lid; GrpE, a nucleotide exchange factor, promotes ADP release, and rebinding of ATP reopens the lid and releases the substrate. Multiple bind-release cycles give the substrate opportunities to fold. Prevents aggregation of ~15–20% of newly synthesized E. coli proteins.

GroEL/GroES System (Hsp60/Hsp10 — the Anfinsen Cage)

Structure: GroEL is a double-ring barrel, each ring with 7 subunits forming a central cavity. GroES is a dome-shaped heptameric lid. Mechanism: Misfolded protein (<60 kDa) binds to hydrophobic residues lining the GroEL cavity (cis ring). ATP and GroES binding trigger a massive conformational change — the cavity doubles in volume and switches from hydrophobic to hydrophilic walls. The encapsulated protein folds in isolation for ~10 seconds (the time of ATP hydrolysis). Then GroES and substrate are released. If not yet folded, the protein can rebind for another cycle. Essential for ~10% of E. coli proteins, notably TIM barrel proteins.

Hsp90

A homodimeric chaperone that acts late in the folding pathway, stabilizing near-native conformations of specific clients: steroid hormone receptors, kinases, p53 tumor suppressor. Requires co-chaperones (Hop, p23, Cdc37). The Hsp90 ATPase cycle is slow and regulated. Inhibited by geldanamycin, an anticancer lead compound that occupies the N-terminal ATP-binding pocket.

Protein Misfolding Diseases

When proteins fail to fold correctly, they can aggregate into toxic structures. Many neurodegenerative diseases are caused by accumulation of misfolded protein aggregates.

Amyloid Fibrils

Cross-β structure: β-strands perpendicular to the fibril axis, forming a continuous hydrogen-bonded sheet. Extremely stable (resistant to proteases, detergents). Detected by Congo red birefringence and ThT fluorescence. The amyloid fold is a generic property of polypeptide chains — many proteins can form amyloids under appropriate conditions.

Prion Diseases

Caused by PrP^C (normal, α-helix rich) converting to PrP^Sc (misfolded, β-sheet rich). PrP^Sc acts as a template, catalyzing the conversion of PrP^C — a self-propagating conformational change. Diseases: Creutzfeldt-Jakob (CJD), bovine spongiform encephalopathy (BSE), kuru, scrapie. Transmissible without nucleic acid (protein-only hypothesis, Stanley Prusiner, Nobel 1997).

Alzheimer's Disease

Accumulation of amyloid-β (Aβ) peptide (40–42 residues, cleaved from the amyloid precursor protein, APP, by β- and γ-secretases) into extracellular plaques, together with intracellular neurofibrillary tangles of hyperphosphorylated tau protein. Aβ42 is more aggregation-prone than Aβ40. Oligomeric intermediates may be more toxic than mature fibrils.

Parkinson's Disease

Aggregation of α-synuclein into Lewy bodies in dopaminergic neurons of the substantia nigra. α-Synuclein is intrinsically disordered in its native state and adopts β-sheet structure upon aggregation. Point mutations (A53T, A30P, E46K) increase aggregation propensity. Propagation between neurons suggests prion-like behavior.

7. Enzyme Catalysis

Principles of Enzymatic Catalysis

Enzymes accelerate reactions by factors of 10⁶–10¹⁷ without altering the equilibrium. They achieve this by lowering the activation energy (ΔG^‡) through stabilization of the transition state.

$ k = \frac{k_B T}{h} e^{-\Delta G^\ddagger / RT} $

Eyring equation: the rate constant decreases exponentially as the activation free energy ΔG^‡ increases
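Because ΔG^‡ sits in an exponent, modest changes in barrier height translate into large rate changes. The sketch below evaluates the Eyring equation with the transmission coefficient taken as 1 (an assumption) and converts a barrier reduction ΔΔG^‡ into a fold rate enhancement, exp(ΔΔG^‡/RT). The function names are illustrative.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J*s
R = 8.314462618      # gas constant, J/(mol*K)

def eyring_rate(dG_act_kJ, T=298.15):
    """Rate constant (s^-1) from the Eyring equation, transmission coefficient assumed = 1."""
    return (K_B * T / H) * math.exp(-dG_act_kJ * 1e3 / (R * T))

def rate_enhancement(ddG_kJ, T=298.15):
    """Fold speed-up from lowering the activation free energy by ddG_kJ (kJ/mol)."""
    return math.exp(ddG_kJ * 1e3 / (R * T))

# Barrier lowering that corresponds to one factor of 10 in rate at 25 C
per_decade = R * 298.15 * math.log(10) / 1e3
print(f"{per_decade:.2f} kJ/mol per 10-fold rate increase")
for ddG in (17.1, 34.2, 57.1):
    print(f"ddG = {ddG:5.1f} kJ/mol -> {rate_enhancement(ddG):.1e}-fold faster")
```

At 25 °C each ~5.7 kJ/mol of transition-state stabilization buys about one order of magnitude, so enhancements of 10⁶ to 10¹⁷ correspond to barrier reductions of roughly 34 to 97 kJ/mol.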

Transition State Stabilization

Enzymes bind the transition state more tightly than the substrate or product (Pauling's postulate). The enzyme active site is complementary to the transition state geometry, not the ground state. This is why transition state analogs are potent inhibitors (e.g., phosphonamidate for carboxypeptidase A).

Proximity & Orientation (Propinquity)

Binding brings reactive groups into close proximity in the correct orientation, converting an intermolecular reaction (entropically unfavorable) into an effectively intramolecular one. This effect is estimated to contribute a 10²–10⁵-fold rate enhancement. The effective molarity of reactants in the active site can exceed 10 M.
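Effective molarity (EM) is conventionally defined as the ratio of a first-order intramolecular rate constant to the second-order rate constant of the equivalent intermolecular reaction, which gives it units of concentration. The rate constants and partner concentration below are hypothetical, chosen only to show the arithmetic and how EM translates into a rate advantage.

```python
def effective_molarity(k_intra, k_inter):
    """EM (M) = k_intra (s^-1) / k_inter (M^-1 s^-1)."""
    if k_intra <= 0 or k_inter <= 0:
        raise ValueError("rate constants must be positive")
    return k_intra / k_inter

# Hypothetical rate constants (illustration only)
k_inter = 1e-3       # M^-1 s^-1: the two reactive groups on separate molecules
k_intra = 1e-2       # s^-1: the same groups held together in one complex
partner_conc = 1e-3  # M: assumed concentration of the free reaction partner

em = effective_molarity(k_intra, k_inter)
# At fixed partner concentration the intermolecular reaction is pseudo-first-order
k_pseudo = k_inter * partner_conc
advantage = k_intra / k_pseudo  # equals em / partner_conc
print(f"Effective molarity: {em:.1f} M")
print(f"Rate advantage at {partner_conc * 1e3:.0f} mM partner: {advantage:.1e}-fold")
```

With EM = 10 M and a 1 mM partner, holding the groups together is worth about 10⁴-fold, inside the range quoted above.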

General Acid-Base Catalysis

Amino acid side chains donate or accept protons during the reaction, stabilizing charged transition states. His (pK_a ~ 6) is the ideal proton shuttle at physiological pH. Glu, Asp, Lys, Cys, and Tyr also participate. Water molecules in the active site can serve as proton relays.
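The Henderson-Hasselbalch relation makes the point about histidine quantitative: a group can shuttle protons efficiently only when appreciable fractions of both its protonated and neutral forms are present, which happens near pH = pK_a. The sketch uses textbook free side-chain pK_a values (His 6.0, Asp 3.65, Lys 10.53); the pK_a of 7.0 is a hypothetical value standing in for an active-site environment that shifts the histidine pK_a.

```python
def fraction_protonated(pH, pKa):
    """Henderson-Hasselbalch: fraction of an ionizable group in its protonated form."""
    return 1.0 / (1.0 + 10.0 ** (pH - pKa))

# His with the textbook pKa of 6.0, and a hypothetical active-site-shifted pKa of 7.0
for pKa in (6.0, 7.0):
    for pH in (6.0, 7.0, 7.4):
        f = fraction_protonated(pH, pKa)
        print(f"His pKa {pKa:.1f}, pH {pH:.1f}: {100 * f:5.1f}% protonated (can donate H+), "
              f"{100 * (1 - f):5.1f}% neutral (can accept H+)")

# For contrast, Asp (pKa 3.65) and Lys (pKa 10.53) are almost entirely one form at pH 7.4
print(f"Asp at pH 7.4: {100 * fraction_protonated(7.4, 3.65):.2f}% protonated")
print(f"Lys at pH 7.4: {100 * fraction_protonated(7.4, 10.53):.2f}% protonated")
```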

Covalent Catalysis

A nucleophilic group on the enzyme attacks the substrate, forming a transient covalent enzyme-substrate intermediate. The intermediate then breaks down to release product. Examples: Ser in serine proteases, Cys in cysteine proteases, Lys forming Schiff bases (aldolase, transaminases), His in phosphotransferases.

Metal Ion Catalysis

Metal ions (Zn²⁺, Mg²⁺, Mn²⁺, Fe²⁺/Fe³⁺) can stabilize negative charges on intermediates (Lewis acid catalysis), generate nucleophiles by lowering the pK_a of bound water (Zn²⁺–OH⁻ in carbonic anhydrase), participate in redox reactions, or orient substrates. About 30% of all enzymes require metal ions.
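To see why lowering the pK_a of bound water matters, compare the fraction present as hydroxide at pH 7.4. The pK_a of ~7 for Zn²⁺-bound water and the commonly quoted 15.7 for free water are assumed round values here (some texts use 14 for water); the comparison, not the exact numbers, is the point.

```python
def fraction_deprotonated(pH, pKa):
    """Henderson-Hasselbalch: fraction of an acid present as its conjugate base."""
    return 1.0 / (1.0 + 10.0 ** (pKa - pH))

PH = 7.4
PKA_FREE_WATER = 15.7  # commonly quoted value (assumption; some texts use 14.0)
PKA_ZN_WATER = 7.0     # assumed round value for Zn2+-bound water

free_oh = fraction_deprotonated(PH, PKA_FREE_WATER)
zn_oh = fraction_deprotonated(PH, PKA_ZN_WATER)
print(f"Free water present as OH-:       {free_oh:.1e}")
print(f"Zn2+-bound water present as OH-: {zn_oh:.2f}")
print(f"Nucleophile enrichment: ~{zn_oh / free_oh:.0e}-fold")
```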

Electrostatic Catalysis

The low dielectric environment of the active site amplifies electrostatic interactions. Oxyanion holes stabilize developing negative charges on tetrahedral intermediates using backbone N–H groups (serine proteases) or positively charged residues. Helix dipoles can stabilize charged intermediates.
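A crude way to see the dielectric effect is Coulomb's law in a uniform medium, E = q₁q₂/(4πε₀ε_r r). The sketch compares the same ion pair in water (ε_r ≈ 80) and in a low-dielectric protein interior. The value ε_r ≈ 4 and the 3 Å separation are assumptions, and real active sites are heterogeneous, so treat the numbers as order-of-magnitude only.

```python
# e^2 * N_A / (4 * pi * eps0) expressed in kJ * angstrom / mol
COULOMB_KJ_A = 1389.35

def coulomb_energy(q1, q2, r_angstrom, eps_r):
    """Point-charge interaction energy in kJ/mol (negative = attractive)."""
    return COULOMB_KJ_A * q1 * q2 / (eps_r * r_angstrom)

r = 3.0  # assumed separation in angstroms
for medium, eps_r in (("water, eps_r ~ 80", 80.0),
                      ("protein interior, eps_r ~ 4 (assumed)", 4.0)):
    print(f"{medium}: {coulomb_energy(+1, -1, r, eps_r):7.1f} kJ/mol")
```

The twentyfold difference comes entirely from ε_r, which is why placing charged groups and oxyanion holes in a low-dielectric pocket strengthens their electrostatic contribution.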

Serine Protease Mechanism — The Catalytic Triad

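Serine proteases (chymotrypsin, trypsin, elastase, subtilisin) cleave peptide bonds using a Ser-His-Asp catalytic triad. Chymotrypsin, trypsin, and elastase are homologs; subtilisin arrived at the same triad independently, a classic case of convergent evolution. This is one of the best-studied enzyme mechanisms in biochemistry.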

Step 1: Nucleophilic Attack (Acylation)

Asp102 orients and polarizes His57 through a hydrogen bond (proposed by some to be a low-barrier hydrogen bond). His57 acts as a general base, abstracting a proton from the hydroxyl of Ser195 and making it a potent nucleophile. The activated Ser195 attacks the carbonyl carbon of the scissile peptide bond, forming a tetrahedral intermediate. The developing negative charge on the carbonyl oxygen is stabilized by the oxyanion hole (backbone N–H groups of Gly193 and Ser195; residue numbers follow chymotrypsin).

Step 2: Collapse & Amine Leaving

The tetrahedral intermediate collapses: His57 donates a proton (general acid catalysis) to the leaving group amine nitrogen. The C-terminal fragment departs. An acyl-enzyme intermediate remains (ester bond between Ser195 and the N-terminal fragment of the substrate).

Step 3: Deacylation

Water enters the active site and is activated by His57 (general base). The resulting hydroxide attacks the acyl-enzyme ester, forming a second tetrahedral intermediate (again stabilized by the oxyanion hole). This intermediate collapses, releasing the N-terminal product and regenerating free Ser195. The enzyme returns to its resting state.

Substrate Specificity

Determined by the S1 binding pocket: Chymotrypsin has a large hydrophobic pocket (cleaves after Phe, Trp, Tyr). Trypsin has Asp189 at the base of S1 (cleaves after Lys, Arg — positive charges). Elastase has Val/Thr partially blocking S1 (cleaves after small residues: Ala, Gly, Ser). Same mechanism, different specificity — a beautiful example of evolutionary divergence.
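The S1 rules above translate directly into a toy cleavage-site predictor. This is a sketch under strong simplifying assumptions: it checks only the residue on the N-terminal side of the scissile bond (P1), treats each enzyme's preference as an all-or-nothing set drawn from the text, and ignores everything else that affects real digestion. The function names and the test peptide are hypothetical.

```python
from typing import Dict, List, Set

# Simplified P1 rules from the text: the enzyme cuts C-terminal to these residues
P1_RULES: Dict[str, Set[str]] = {
    "chymotrypsin": set("FWY"),  # large hydrophobic S1 pocket
    "trypsin": set("KR"),        # Asp189 at the base of S1
    "elastase": set("AGS"),      # S1 partly blocked by Val/Thr
}

def cleavage_sites(sequence: str, enzyme: str) -> List[int]:
    """1-based positions i such that the bond between residue i and i+1 is cut.

    Deliberately simplified: looks only at the P1 residue and ignores neighbouring
    residues (e.g. reduced trypsin cleavage before Pro), accessibility, and rates.
    """
    allowed = P1_RULES[enzyme]
    # The final residue has no following peptide bond, so it cannot be a P1 site
    return [i + 1 for i, aa in enumerate(sequence[:-1]) if aa in allowed]

def fragments(sequence: str, enzyme: str) -> List[str]:
    """Split the sequence at every predicted cleavage site."""
    cuts = [0] + cleavage_sites(sequence, enzyme) + [len(sequence)]
    return [sequence[a:b] for a, b in zip(cuts, cuts[1:])]

peptide = "GAFKWSRLYA"  # hypothetical test peptide
for enzyme in P1_RULES:
    print(f"{enzyme:>12}: {fragments(peptide, enzyme)}")
```

Running the same peptide through all three rule sets shows how one shared mechanism yields three different fragment patterns, purely because of the S1 pocket.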

Python: Ramachandran Plot & Cooperative Oxygen Binding

Ramachandran Plot: Allowed Backbone Conformations

Python

Approximate allowed phi/psi regions as elliptical zones and plot them for general, glycine, and proline residues


#!/usr/bin/env python3
"""ramachandran.py - Compute and plot Ramachandran diagram (allowed phi,psi regions)"""
import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

# --- Ramachandran allowed regions (empirical approximation) ---
# Allowed regions are modeled as hand-placed elliptical zones in (phi, psi)
# space that roughly follow the outlines obtained from steric clash analysis.

phi = np.linspace(-180, 180, 360)
psi = np.linspace(-180, 180, 360)
PHI, PSI = np.meshgrid(phi, psi)

def ellipse_mask(phi_c, psi_c, dphi, dpsi, grid_phi, grid_psi):
    """Return a mask for an elliptical allowed region."""
    # Handle wrapping at +/-180
    dphi_arr = grid_phi - phi_c
    dpsi_arr = grid_psi - psi_c
    dphi_arr = np.where(dphi_arr > 180, dphi_arr - 360, dphi_arr)
    dphi_arr = np.where(dphi_arr < -180, dphi_arr + 360, dphi_arr)
    dpsi_arr = np.where(dpsi_arr > 180, dpsi_arr - 360, dpsi_arr)
    dpsi_arr = np.where(dpsi_arr < -180, dpsi_arr + 360, dpsi_arr)
    return (dphi_arr / dphi)**2 + (dpsi_arr / dpsi)**2

# --- General amino acids (all except Gly, Pro) ---
# Core allowed regions
general = np.zeros_like(PHI)
# Right-handed alpha-helix
general += (ellipse_mask(-63, -43, 25, 25, PHI, PSI) <= 1).astype(float)
# Beta-sheet
general += (ellipse_mask(-120, 130, 35, 30, PHI, PSI) <= 1).astype(float)
# Left-handed alpha-helix (rare)
general += (ellipse_mask(57, 47, 20, 20, PHI, PSI) <= 1).astype(float) * 0.5
# Polyproline II
general += (ellipse_mask(-75, 150, 15, 15, PHI, PSI) <= 1).astype(float) * 0.6

# Generously allowed (slightly larger, lower density)
general_generous = np.zeros_like(PHI)
general_generous += (ellipse_mask(-63, -43, 40, 40, PHI, PSI) <= 1).astype(float) * 0.3
general_generous += (ellipse_mask(-120, 130, 50, 45, PHI, PSI) <= 1).astype(float) * 0.3
general_generous += (ellipse_mask(-80, 0, 30, 25, PHI, PSI) <= 1).astype(float) * 0.2

general_total = np.clip(general + general_generous, 0, 1)

# --- Glycine (no Cb, much more allowed) ---
glycine = np.zeros_like(PHI)
glycine += (ellipse_mask(-63, -43, 50, 50, PHI, PSI) <= 1).astype(float)
glycine += (ellipse_mask(63, 43, 50, 50, PHI, PSI) <= 1).astype(float)
glycine += (ellipse_mask(-120, 130, 60, 50, PHI, PSI) <= 1).astype(float)
glycine += (ellipse_mask(120, -130, 60, 50, PHI, PSI) <= 1).astype(float)
glycine += (ellipse_mask(-75, 150, 35, 35, PHI, PSI) <= 1).astype(float) * 0.7
glycine += (ellipse_mask(75, -150, 35, 35, PHI, PSI) <= 1).astype(float) * 0.7
glycine += (ellipse_mask(-80, 0, 40, 35, PHI, PSI) <= 1).astype(float) * 0.5
glycine += (ellipse_mask(80, 0, 40, 35, PHI, PSI) <= 1).astype(float) * 0.5
glycine = np.clip(glycine, 0, 1)

# --- Proline (phi restricted to ~-63 +/- 15) ---
proline = np.zeros_like(PHI)
phi_mask = np.exp(-0.5 * ((PHI + 63) / 12)**2)  # Narrow Gaussian at phi=-63
proline += phi_mask * (ellipse_mask(-63, -43, 15, 25, PHI, PSI) <= 1).astype(float)
proline += phi_mask * (ellipse_mask(-63, 145, 15, 25, PHI, PSI) <= 1).astype(float)
proline = np.clip(proline, 0, 1)

# --- Plot ---
fig, axes = plt.subplots(1, 3, figsize=(15, 5), dpi=100)
fig.patch.set_facecolor('#0f172a')

titles = ['General Residues', 'Glycine', 'Proline']
data = [general_total, glycine, proline]
# One 3-color palette per panel: 4 contour levels -> 3 filled bands
palettes = [['#3a2a10', '#7a5a20', '#c09040'],   # general residues (amber)
            ['#2a3a20', '#5a7a40', '#8bc060'],   # glycine (green)
            ['#4a2040', '#8b3a62', '#d4567a']]   # proline (pink)

for ax, d, title, colors in zip(axes, data, titles, palettes):
    ax.set_facecolor('#1e293b')
    ax.contourf(PHI, PSI, d, levels=[0.15, 0.4, 0.7, 1.0],
                colors=colors, alpha=0.85)
    ax.contour(PHI, PSI, d, levels=[0.15, 0.4, 0.7],
               colors='white', linewidths=0.5, alpha=0.4)

    # Mark key regions
    if title == 'General Residues':
        ax.plot(-63, -43, 'w*', markersize=10, label='alpha-helix')
        ax.plot(-120, 130, 'ws', markersize=7, label='beta-sheet')
        ax.plot(57, 47, 'w^', markersize=6, label='L-alpha (rare)')
        ax.legend(loc='lower left', fontsize=7, facecolor='#1e293b',
                  edgecolor='gray', labelcolor='white')

    ax.set_xlim(-180, 180)
    ax.set_ylim(-180, 180)
    ax.set_xlabel('phi (degrees)', color='white', fontsize=10)
    ax.set_ylabel('psi (degrees)', color='white', fontsize=10)
    ax.set_title(title, color='white', fontsize=12, fontweight='bold')
    ax.axhline(0, color='gray', linewidth=0.3, alpha=0.5)
    ax.axvline(0, color='gray', linewidth=0.3, alpha=0.5)
    ax.set_xticks([-180, -90, 0, 90, 180])
    ax.set_yticks([-180, -90, 0, 90, 180])
    ax.tick_params(colors='white', labelsize=8)
    for spine in ax.spines.values():
        spine.set_color('gray')
        spine.set_alpha(0.5)

fig.suptitle('Ramachandran Plots: Allowed Backbone Conformations',
             color='white', fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.savefig('output.png', dpi=100, bbox_inches='tight',
            facecolor='#0f172a', edgecolor='none')
print("Ramachandran plot saved.")
print()
print("KEY REGIONS:")
print(f"  alpha-helix:  phi=-63, psi=-43  (3.6 res/turn, i->i+4 H-bonds)")
print(f"  beta-sheet:   phi=-120, psi=130 (extended, inter-strand H-bonds)")
print(f"  left-alpha:   phi=+57, psi=+47  (rare, energetically less favorable)")
print(f"  PPII helix:   phi=-75, psi=+150 (polyproline II, found in collagen)")
print()
print("Glycine: ~3x more allowed phi,psi space (no Cb steric clash)")
print("Proline: phi restricted to ~-63 +/- 15 (pyrrolidine ring constraint)")

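The Ramachandran script draws idealized regions; for a real structure, φ and ψ are measured from atomic coordinates as torsion (dihedral) angles: φ from C(i−1), N(i), Cα(i), C(i) and ψ from N(i), Cα(i), C(i), N(i+1). The sketch below is one common vector formulation of a signed dihedral angle. It assumes the four points are supplied in backbone order and is not tied to any particular structure-parsing library; `dihedral` is a hypothetical helper name.

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees, in [-180, 180], for four points in order.

    phi: C(i-1), N(i), CA(i), C(i)    psi: N(i), CA(i), C(i), N(i+1)
    """
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1   # first bond, pointing back toward p0
    b1 = p2 - p1   # central bond (the rotation axis)
    b2 = p3 - p2   # last bond
    b1 /= np.linalg.norm(b1)
    # Project the outer bonds onto the plane perpendicular to the central bond
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))

# Simple geometric checks: cis, +90 degree, and trans arrangements
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))
```

Using `arctan2` rather than `arccos` keeps the sign of the angle, which is what distinguishes, for example, the right-handed α-helix region (φ ≈ −63°) from the left-handed one (φ ≈ +57°).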

Cooperative Oxygen Binding: Hill Equation & T/R States

Python

Simulate hemoglobin vs myoglobin O2 binding curves with Hill equation and MWC model


#!/usr/bin/env python3
"""cooperative_binding.py - Hill equation & MWC model for hemoglobin O2 binding"""
import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

# --- Hill Equation ---
def hill(pO2, P50, nH):
    """Fractional saturation Y as a function of pO2."""
    return pO2**nH / (P50**nH + pO2**nH)

# --- MWC (Monod-Wyman-Changeux) Concerted Model ---
def mwc(pO2, L, KR, KT, n=4):
    """
    MWC model for allosteric binding.
    L = [T0]/[R0], allosteric constant
    KR = dissociation constant for R state
    KT = dissociation constant for T state
    n = number of subunits
    """
    alpha = pO2 / KR
    c = KR / KT
    numerator = L * c * alpha * (1 + c * alpha)**(n-1) + alpha * (1 + alpha)**(n-1)
    denominator = L * (1 + c * alpha)**n + (1 + alpha)**n
    return numerator / denominator

pO2 = np.linspace(0, 150, 500)

# Binding curves
Y_myoglobin = hill(pO2, P50=2.8, nH=1.0)
Y_hemoglobin = hill(pO2, P50=26, nH=2.8)
Y_hb_no_coop = hill(pO2, P50=26, nH=1.0)  # hypothetical non-cooperative

# MWC model parameters for hemoglobin
Y_mwc_normal = mwc(pO2, L=9054, KR=2.5, KT=100, n=4)
Y_mwc_bpg = mwc(pO2, L=50000, KR=2.5, KT=100, n=4)     # +BPG shifts right
Y_mwc_no_bpg = mwc(pO2, L=500, KR=2.5, KT=100, n=4)     # no BPG shifts left

# --- Plot 1: Hill equation comparison ---
fig, axes = plt.subplots(1, 3, figsize=(16, 5), dpi=100)
fig.patch.set_facecolor('#0f172a')

ax = axes[0]
ax.set_facecolor('#1e293b')
ax.plot(pO2, Y_myoglobin, color='#60a5fa', linewidth=2.5, label='Myoglobin (nH=1.0)')
ax.plot(pO2, Y_hemoglobin, color='#f87171', linewidth=2.5, label='Hemoglobin (nH=2.8)')
ax.plot(pO2, Y_hb_no_coop, color='#fbbf24', linewidth=1.5, linestyle='--',
        label='Hb hypothetical (nH=1.0)')
ax.axhline(0.5, color='gray', linewidth=0.5, linestyle=':')
ax.axvline(26, color='#f87171', linewidth=0.5, linestyle=':', alpha=0.5)
ax.axvline(2.8, color='#60a5fa', linewidth=0.5, linestyle=':', alpha=0.5)

# Mark tissue and lung pO2
ax.axvspan(20, 40, alpha=0.08, color='red', label='Tissues (~20-40 mmHg)')
ax.axvspan(80, 110, alpha=0.08, color='blue', label='Lungs (~100 mmHg)')

ax.set_xlabel('pO2 (mmHg)', color='white', fontsize=10)
ax.set_ylabel('Fractional Saturation (Y)', color='white', fontsize=10)
ax.set_title('Hill Equation: O2 Binding', color='white', fontsize=12, fontweight='bold')
ax.legend(fontsize=7, facecolor='#1e293b', edgecolor='gray', labelcolor='white')
ax.set_xlim(0, 150)
ax.set_ylim(0, 1.05)
ax.tick_params(colors='white', labelsize=8)
for spine in ax.spines.values():
    spine.set_color('gray')
    spine.set_alpha(0.5)

# --- Plot 2: Hill plot (log-log) ---
ax = axes[1]
ax.set_facecolor('#1e293b')
pO2_hill = np.linspace(1, 100, 200)
for nH, P50, color, label in [(1.0, 2.8, '#60a5fa', 'Myoglobin'),
                                (2.8, 26, '#f87171', 'Hemoglobin'),
                                (4.0, 26, '#a78bfa', 'Max coop')]:
    Y = hill(pO2_hill, P50, nH)
    # log(Y/(1-Y)) vs log(pO2)
    mask = (Y > 0.01) & (Y < 0.99)
    log_ratio = np.log10(Y[mask] / (1 - Y[mask]))
    log_pO2 = np.log10(pO2_hill[mask])
    ax.plot(log_pO2, log_ratio, color=color, linewidth=2, label=f'{label} (nH={nH})')

ax.axhline(0, color='gray', linewidth=0.5, linestyle=':')
ax.set_xlabel('log(pO2)', color='white', fontsize=10)
ax.set_ylabel('log(Y / (1-Y))', color='white', fontsize=10)
ax.set_title('Hill Plot', color='white', fontsize=12, fontweight='bold')
ax.legend(fontsize=7, facecolor='#1e293b', edgecolor='gray', labelcolor='white')
ax.set_ylim(-3, 3)
ax.tick_params(colors='white', labelsize=8)
for spine in ax.spines.values():
    spine.set_color('gray')
    spine.set_alpha(0.5)

# --- Plot 3: MWC model with different L ---
ax = axes[2]
ax.set_facecolor('#1e293b')
ax.plot(pO2, Y_mwc_no_bpg, color='#34d399', linewidth=2, label='Low L (no BPG)')
ax.plot(pO2, Y_mwc_normal, color='#f87171', linewidth=2.5, label='Normal (L=9054)')
ax.plot(pO2, Y_mwc_bpg, color='#818cf8', linewidth=2, label='High L (+BPG, low pH)')
ax.axhline(0.5, color='gray', linewidth=0.5, linestyle=':')

ax.set_xlabel('pO2 (mmHg)', color='white', fontsize=10)
ax.set_ylabel('Fractional Saturation (Y)', color='white', fontsize=10)
ax.set_title('MWC Model: BPG & Bohr Effect', color='white', fontsize=12, fontweight='bold')
ax.legend(fontsize=7, facecolor='#1e293b', edgecolor='gray', labelcolor='white')
ax.set_xlim(0, 150)
ax.set_ylim(0, 1.05)
ax.tick_params(colors='white', labelsize=8)
for spine in ax.spines.values():
    spine.set_color('gray')
    spine.set_alpha(0.5)

plt.tight_layout()
plt.savefig('output.png', dpi=100, bbox_inches='tight',
            facecolor='#0f172a', edgecolor='none')

print("Oxygen binding simulation complete.")
print()
print("HILL EQUATION RESULTS:")
print(f"  Myoglobin:   P50 = 2.8 mmHg, nH = 1.0 (hyperbolic, no cooperativity)")
print(f"  Hemoglobin:  P50 = 26  mmHg, nH = 2.8 (sigmoidal, positive cooperativity)")
print(f"  Max possible nH for 4 sites = 4.0")
print()
print("PHYSIOLOGICAL SIGNIFICANCE:")
Y_lungs_mb = hill(100, 2.8, 1.0)
Y_tissue_mb = hill(30, 2.8, 1.0)
Y_lungs_hb = hill(100, 26, 2.8)
Y_tissue_hb = hill(30, 26, 2.8)
print(f"  Myoglobin:   Y(lungs)={Y_lungs_mb:.3f}, Y(tissue)={Y_tissue_mb:.3f}, delivery={Y_lungs_mb-Y_tissue_mb:.3f}")
print(f"  Hemoglobin:  Y(lungs)={Y_lungs_hb:.3f}, Y(tissue)={Y_tissue_hb:.3f}, delivery={Y_lungs_hb-Y_tissue_hb:.3f}")
print(f"  Hemoglobin delivers {(Y_lungs_hb-Y_tissue_hb)/(Y_lungs_mb-Y_tissue_mb):.1f}x more O2 to tissues!")
print()
print("MWC MODEL:")
print(f"  L = [T0]/[R0]: controls sigmoid shape")
print(f"  BPG increases L -> right-shifted curve (lower affinity, better delivery)")
print(f"  Bohr effect (low pH) also increases L")

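The oxygen-binding script generates curves from known Hill coefficients. Going the other way, n_H is usually estimated as the slope of the Hill plot near half-saturation. The sketch fits synthetic data generated from the same parameters (P₅₀ = 26 mmHg, n_H = 2.8) with added Gaussian noise; the noise level, the 0.1 < Y < 0.9 fitting window, and the helper names are assumptions chosen for illustration.

```python
import numpy as np

def hill(p, p50, n_h):
    """Hill equation: fractional saturation at ligand partial pressure p."""
    return p**n_h / (p50**n_h + p**n_h)

def fit_hill(p, y, lo=0.1, hi=0.9):
    """Estimate (n_H, P50) from a straight-line fit of the Hill plot.

    Uses only points with lo < Y < hi, where log(Y/(1-Y)) is well conditioned.
    Slope = n_H; the x-intercept (log(Y/(1-Y)) = 0) gives log10(P50).
    """
    p, y = np.asarray(p, dtype=float), np.asarray(y, dtype=float)
    keep = (y > lo) & (y < hi)
    x = np.log10(p[keep])
    z = np.log10(y[keep] / (1 - y[keep]))
    slope, intercept = np.polyfit(x, z, 1)
    return slope, 10 ** (-intercept / slope)

# Synthetic 'measurements' (assumed noise level of 0.005 in Y)
rng = np.random.default_rng(0)
p_obs = np.linspace(5, 100, 40)
y_obs = np.clip(hill(p_obs, 26.0, 2.8) + rng.normal(0, 0.005, p_obs.size), 1e-6, 1 - 1e-6)
n_h, p50 = fit_hill(p_obs, y_obs)
print(f"Fitted nH = {n_h:.2f}, P50 = {p50:.1f} mmHg")
```

Restricting the fit to intermediate saturation matters for real data: near Y = 0 or Y = 1 small measurement errors are hugely amplified by the logit transform, and the true Hill plot of hemoglobin bends toward slope 1 at the extremes.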

Fortran: Amino Acid pI Calculator

This Fortran program computes the isoelectric point (pI) and the net charge at physiological pH (7.4) for several example protein sequences, using the Henderson-Hasselbalch equation with standard pK_a values of free amino acids for all ionizable groups and locating the pI by bisection.

Fortran: Protein Isoelectric Point (pI) Calculator

Fortran

Calculate pI from pKa values by bisection for several example protein sequences


program protein_pi_calculator
  ! Calculate isoelectric point (pI) and charge vs pH for a protein sequence
  ! Uses Henderson-Hasselbalch equation with standard pKa values
  ! pI found by bisection: find pH where net_charge = 0
  implicit none

  character(len=200) :: sequence
  integer :: seq_len, i, n_steps
  real(8) :: pH, charge, pI_low, pI_high, pI_mid
  real(8) :: charge_low, charge_mid, tol
  integer :: n_asp, n_glu, n_his, n_cys, n_tyr, n_lys, n_arg
  real(8) :: pKa_nterm, pKa_cterm
  real(8) :: pKa_asp, pKa_glu, pKa_his, pKa_cys, pKa_tyr, pKa_lys, pKa_arg
  ! Example sequences (declared here: all declarations must precede executable statements)
  character(len=200) :: sequences(4)
  character(len=40) :: names(4)
  integer :: s

  ! Standard pKa values (free amino acids)
  pKa_nterm = 9.69d0   ! alpha-amino
  pKa_cterm = 2.34d0   ! alpha-carboxyl
  pKa_asp   = 3.65d0
  pKa_glu   = 4.25d0
  pKa_his   = 6.00d0
  pKa_cys   = 8.18d0
  pKa_tyr   = 10.07d0
  pKa_lys   = 10.53d0
  pKa_arg   = 12.48d0

  ! Example sequences (one-letter codes; every C is counted as a free thiol)
  sequences(1) = 'MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH'
  names(1) = 'Hemoglobin alpha (first 51)'
  sequences(2) = 'MKWVTFISLLFLFSSAYSRGVFRRDAHKSEVAHRFKDLGEENFKALVLIA'
  names(2) = 'Serum albumin precursor (first 50)'
  ! Split with // so no source line exceeds the 132-column free-form limit
  sequences(3) = 'KETAAAKFERQHMDSSTSAASSSNYCNQMMKSRNLTKDRCKPVNTFVHESLADVQAVCSQ' // &
                 'KNVACKNGQTNCYQSYSTMSITDCRETGSSKYPNCAYKTTQANKHIIVACEGNPYVPVHFDASV'
  names(3) = 'Ribonuclease A (bovine)'
  sequences(4) = 'DDDDEEEEKKKRRR'
  names(4) = 'Charged test peptide'

  print '(A)', '============================================================'
  print '(A)', '  PROTEIN ISOELECTRIC POINT (pI) CALCULATOR'
  print '(A)', '  Using Henderson-Hasselbalch with standard pKa values'
  print '(A)', '============================================================'
  print '(A)', ''

  do s = 1, 4
    sequence = sequences(s)
    seq_len = len_trim(sequence)

    ! Count ionizable residues
    n_asp = 0; n_glu = 0; n_his = 0; n_cys = 0
    n_tyr = 0; n_lys = 0; n_arg = 0

    do i = 1, seq_len
      select case (sequence(i:i))
        case ('D'); n_asp = n_asp + 1
        case ('E'); n_glu = n_glu + 1
        case ('H'); n_his = n_his + 1
        case ('C'); n_cys = n_cys + 1
        case ('Y'); n_tyr = n_tyr + 1
        case ('K'); n_lys = n_lys + 1
        case ('R'); n_arg = n_arg + 1
      end select
    end do

    print '(A,A)', 'Protein: ', trim(names(s))
    print '(A,A)', 'Sequence: ', trim(sequence)
    print '(A,I4)', 'Length: ', seq_len
    print '(A)', 'Ionizable residue counts:'
    print '(A,I3,A,I3,A,I3,A,I3,A,I3,A,I3,A,I3)', &
      '  D=',n_asp,' E=',n_glu,' H=',n_his,' C=',n_cys, &
      ' Y=',n_tyr,' K=',n_lys,' R=',n_arg

    ! Bisection to find pI (pH where net charge = 0)
    tol = 1.0d-4
    pI_low = 0.0d0
    pI_high = 14.0d0

    do while (pI_high - pI_low > tol)
      pI_mid = (pI_low + pI_high) / 2.0d0
      charge_mid = net_charge(pI_mid, n_asp, n_glu, n_his, n_cys, &
                              n_tyr, n_lys, n_arg, &
                              pKa_nterm, pKa_cterm, pKa_asp, pKa_glu, &
                              pKa_his, pKa_cys, pKa_tyr, pKa_lys, pKa_arg)
      charge_low = net_charge(pI_low, n_asp, n_glu, n_his, n_cys, &
                              n_tyr, n_lys, n_arg, &
                              pKa_nterm, pKa_cterm, pKa_asp, pKa_glu, &
                              pKa_his, pKa_cys, pKa_tyr, pKa_lys, pKa_arg)
      if (charge_mid * charge_low > 0.0d0) then
        pI_low = pI_mid
      else
        pI_high = pI_mid
      end if
    end do

    print '(A,F8.4)', 'Isoelectric point (pI) = ', pI_mid
    print '(A,F8.4)', 'Net charge at pH 7.4   = ', &
      net_charge(7.4d0, n_asp, n_glu, n_his, n_cys, n_tyr, n_lys, n_arg, &
                 pKa_nterm, pKa_cterm, pKa_asp, pKa_glu, &
                 pKa_his, pKa_cys, pKa_tyr, pKa_lys, pKa_arg)

    ! Print charge vs pH table
    print '(A)', ''
    print '(A)', '  pH    Net Charge'
    print '(A)', '  ----  ----------'
    do n_steps = 0, 14
      pH = dble(n_steps)
      charge = net_charge(pH, n_asp, n_glu, n_his, n_cys, n_tyr, n_lys, n_arg, &
                          pKa_nterm, pKa_cterm, pKa_asp, pKa_glu, &
                          pKa_his, pKa_cys, pKa_tyr, pKa_lys, pKa_arg)
      print '(F6.1,F10.3)', pH, charge
    end do
    print '(A)', ''
    print '(A)', '------------------------------------------------------------'
    print '(A)', ''
  end do

  print '(A)', 'NOTES:'
  print '(A)', '  - Positive groups (protonated): N-term, His, Lys, Arg'
  print '(A)', '  - Negative groups (deprotonated): C-term, Asp, Glu, Cys, Tyr'
  print '(A)', '  - Henderson-Hasselbalch: pH = pKa + log([A-]/[HA])'
  print '(A)', '  - pI = pH where sum of all charges = 0'
  print '(A)', '  - Real pI can differ due to local environment effects'

contains

  function net_charge(pH, nD, nE, nH, nC, nY, nK, nR, &
                      pKnt, pKct, pKD, pKE, pKH, pKC, pKY, pKK, pKR) result(q)
    real(8), intent(in) :: pH, pKnt, pKct, pKD, pKE, pKH, pKC, pKY, pKK, pKR
    integer, intent(in) :: nD, nE, nH, nC, nY, nK, nR
    real(8) :: q

    ! Positive charges (protonated form dominates below pKa)
    ! fraction protonated = 1 / (1 + 10^(pH - pKa))
    q = 0.0d0

    ! N-terminus (positive when protonated)
    q = q + 1.0d0 / (1.0d0 + 10.0d0**(pH - pKnt))

    ! His (positive when protonated)
    q = q + dble(nH) / (1.0d0 + 10.0d0**(pH - pKH))

    ! Lys (positive when protonated)
    q = q + dble(nK) / (1.0d0 + 10.0d0**(pH - pKK))

    ! Arg (positive when protonated)
    q = q + dble(nR) / (1.0d0 + 10.0d0**(pH - pKR))

    ! Negative charges (deprotonated form dominates above pKa)
    ! fraction deprotonated = 1 / (1 + 10^(pKa - pH))
    ! C-terminus
    q = q - 1.0d0 / (1.0d0 + 10.0d0**(pKct - pH))

    ! Asp
    q = q - dble(nD) / (1.0d0 + 10.0d0**(pKD - pH))

    ! Glu
    q = q - dble(nE) / (1.0d0 + 10.0d0**(pKE - pH))

    ! Cys
    q = q - dble(nC) / (1.0d0 + 10.0d0**(pKC - pH))

    ! Tyr
    q = q - dble(nY) / (1.0d0 + 10.0d0**(pKY - pH))

  end function net_charge

end program protein_pi_calculator


Chapter Summary

Structural Hierarchy

Primary: Amino acid sequence (N→C), peptide bonds with partial double-bond character
Secondary: α-helix (3.6 res/turn, i→i+4 H-bonds), β-sheet (parallel/antiparallel), turns, loops
Tertiary: Hydrophobic core, disulfides, salt bridges, H-bonds, van der Waals; domains and motifs
Quaternary: Multi-subunit assembly, symmetry, cooperativity

Key Equations

Henderson-Hasselbalch: pH = pK_a + log([A⁻]/[HA])
Hill equation: Y = [L]^n_H / (K_0.5^n_H + [L]^n_H), where K_0.5 is the half-saturating ligand concentration (P₅₀ for O₂)
Gibbs free energy: ΔG = ΔH − TΔS
Eyring equation: k = (k_BT/h) exp(−ΔG^‡/RT)

Folding

Anfinsen: sequence determines structure (thermodynamic control)
Levinthal: folding is a directed process, not random search
Funnel landscape: rugged funnel converging on native state
Chaperones: Hsp70 (prevent aggregation), GroEL/GroES (Anfinsen cage), Hsp90 (late-stage)
Misfolding: amyloid, prions (PrP^C→PrP^Sc), Alzheimer's Aβ, Parkinson's α-synuclein

Enzyme Catalysis

Transition state stabilization (Pauling's postulate)
Proximity/orientation, acid-base, covalent, metal ion catalysis
Serine protease triad: Ser-His-Asp (nucleophilic acylation/deacylation)
Substrate specificity via binding pocket geometry (chymotrypsin vs trypsin vs elastase)
