Amino Acids & Protein Chemistry
From ionizable functional groups and titration behavior through peptide bond geometry to mass spectrometric determination of protein molecular weight.
Derivation 1: Henderson-Hasselbalch & Amino Acid Titration
Every amino acid possesses at least two ionizable groups: the $\alpha$-carboxyl group ($\text{pK}_{a1} \approx 2$) and the $\alpha$-amino group ($\text{pK}_{a2} \approx 9\text{--}10$). Some have ionizable side chains, adding a third (or even fourth) equilibrium. Understanding these equilibria is foundational for predicting charge state, solubility, electrophoretic mobility, and chromatographic behavior.
The Henderson-Hasselbalch Equation
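Consider a weak acid $\text{HA}$ in equilibrium with water:
$$\text{HA} \rightleftharpoons \text{H}^+ + \text{A}^-$$
The acid dissociation constant is:
$$K_a = \frac{[\text{H}^+][\text{A}^-]}{[\text{HA}]}$$
Taking the negative logarithm of both sides:
$$-\log K_a = -\log[\text{H}^+] - \log\frac{[\text{A}^-]}{[\text{HA}]}$$
Since $\text{p}K_a = -\log K_a$ and $\text{pH} = -\log[\text{H}^+]$:
$$\text{pH} = \text{p}K_a + \log\frac{[\text{A}^-]}{[\text{HA}]}$$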
This is the Henderson-Hasselbalch equation. When $[\text{A}^-] = [\text{HA}]$, the log term vanishes and $\text{pH} = \text{p}K_a$. This is the midpoint of each buffering region in the titration curve, where buffering capacity is maximal.
Zwitterion Equilibria
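At physiological pH, amino acids exist predominantly as zwitterions: the carboxyl group is deprotonated ($\text{COO}^-$) while the amino group is protonated ($\text{NH}_3^+$). A simple amino acid like glycine undergoes two ionization steps:
$$\text{H}_3\text{N}^+\text{CH}_2\text{COOH} \;\overset{\text{pK}_{a1}}{\rightleftharpoons}\; \text{H}_3\text{N}^+\text{CH}_2\text{COO}^- \;\overset{\text{pK}_{a2}}{\rightleftharpoons}\; \text{H}_2\text{N}\text{CH}_2\text{COO}^-$$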
The fully protonated form (cation) dominates at low pH, the zwitterion dominates at intermediate pH, and the fully deprotonated form (anion) dominates at high pH.
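The shift from cation to zwitterion to anion can be made concrete with a short calculation. The following is a minimal sketch, assuming a diprotic amino acid with independent equilibria; the function name `species_fractions` and the glycine constants (2.34 and 9.60, commonly tabulated values) are assumptions of this example, not part of the text above.

```python
def species_fractions(pH, pKa1, pKa2):
    """Return (cation, zwitterion, anion) mole fractions at the given pH."""
    h = 10.0 ** (-pH)
    ka1 = 10.0 ** (-pKa1)
    ka2 = 10.0 ** (-pKa2)
    # Each term is proportional to one species: H2A+, HA (zwitterion), A-.
    denom = h * h + ka1 * h + ka1 * ka2
    return (h * h / denom, ka1 * h / denom, ka1 * ka2 / denom)


# Assumed glycine constants (commonly tabulated values).
GLY_PKA1, GLY_PKA2 = 2.34, 9.60

for pH in (1.0, 2.34, 6.0, 9.60, 12.0):
    cation, zwitterion, anion = species_fractions(pH, GLY_PKA1, GLY_PKA2)
    print(f"pH {pH:5.2f}: cation {cation:.3f}, "
          f"zwitterion {zwitterion:.3f}, anion {anion:.3f}")
```

At each pKa the two species on either side of that equilibrium are present in equal amounts, which is the Henderson-Hasselbalch midpoint condition.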
Derivation of the Isoelectric Point (pI)
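The isoelectric point is the pH at which the net charge on the amino acid is zero. For a simple amino acid with only $\text{pK}_{a1}$ and $\text{pK}_{a2}$:
$$\text{pI} = \frac{\text{pK}_{a1} + \text{pK}_{a2}}{2}$$
At the pI, the concentration of the cationic form equals the concentration of the anionic form. Let $f_+ = \frac{[\text{H}^+]}{[\text{H}^+] + K_{a1}}$ be the fraction in the cationic state and $f_- = \frac{K_{a2}}{[\text{H}^+] + K_{a2}}$ the fraction in the anionic state, each relative to the zwitterion. Setting $f_+ = f_-$ and cross-multiplying:
$$\frac{[\text{H}^+]}{[\text{H}^+] + K_{a1}} = \frac{K_{a2}}{[\text{H}^+] + K_{a2}} \quad\Longrightarrow\quad [\text{H}^+]^2 = K_{a1}K_{a2}$$
Taking $-\log$ of both sides:
$$2\,\text{pI} = \text{pK}_{a1} + \text{pK}_{a2} \quad\Longrightarrow\quad \text{pI} = \frac{\text{pK}_{a1} + \text{pK}_{a2}}{2}$$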
For amino acids with ionizable side chains, the pI is the average of the two pK values that flank the zwitterionic species. For aspartate (acidic side chain): $\text{pI} = (\text{pK}_{a1} + \text{pK}_{a,\text{R}})/2$. For lysine (basic side chain): $\text{pI} = (\text{pK}_{a2} + \text{pK}_{a,\text{R}})/2$.
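The flanking-pK rule can be checked numerically by finding the pH at which the net charge crosses zero. The sketch below assumes independent Henderson-Hasselbalch sites and uses rounded, illustrative pKa values (tabulated values vary by source); the names `net_charge`, `isoelectric_point`, and `AMINO_ACIDS` are hypothetical.

```python
def net_charge(pH, basic_pKas, acidic_pKas):
    """Mean net charge, treating every group as an independent site."""
    # Basic groups (e.g. amino) carry +1 when protonated.
    positive = sum(1.0 / (1.0 + 10.0 ** (pH - pKa)) for pKa in basic_pKas)
    # Acidic groups (e.g. carboxyl) carry -1 when deprotonated.
    negative = sum(1.0 / (1.0 + 10.0 ** (pKa - pH)) for pKa in acidic_pKas)
    return positive - negative


def isoelectric_point(basic_pKas, acidic_pKas, lo=0.0, hi=14.0, tol=1e-6):
    """Bisect for the pH where the (monotonically falling) charge is zero."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if net_charge(mid, basic_pKas, acidic_pKas) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Illustrative, rounded pKa values (assumptions of this sketch).
AMINO_ACIDS = {
    "Gly": {"basic": [9.6], "acidic": [2.3]},
    "Asp": {"basic": [9.8], "acidic": [2.1, 3.9]},
    "Lys": {"basic": [9.0, 10.5], "acidic": [2.2]},
}

for name, groups in AMINO_ACIDS.items():
    pI = isoelectric_point(groups["basic"], groups["acidic"])
    print(f"{name}: pI = {pI:.2f}")
```

With these inputs the numerical result agrees with the averages of the flanking pK values: (2.3 + 9.6)/2 for glycine, (2.1 + 3.9)/2 for aspartate, and (9.0 + 10.5)/2 for lysine.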
Buffering Capacity
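The buffering capacity $\beta$ is defined as the amount of strong acid or base needed to change the pH by one unit, $\beta = dC_b/d\text{pH}$. From the Henderson-Hasselbalch equation, for a single weak acid at total concentration $C_{\text{total}}$:
$$\beta = 2.303\, C_{\text{total}}\, \frac{K_a[\text{H}^+]}{(K_a + [\text{H}^+])^2}$$
Maximum buffering occurs when $\text{pH} = \text{pK}_a$ (i.e., $[\text{H}^+] = K_a$), giving $\beta_{\max} = 0.576 \cdot C_{\text{total}}$ (since $2.303/4 \approx 0.576$). The effective buffering range is $\text{pK}_a \pm 1$.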
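A quick numerical check shows why $\text{pK}_a \pm 1$ is a sensible working range. This sketch assumes a single weak acid and ignores the contribution of water itself; `buffer_capacity` and the example pKa of 7.0 are hypothetical choices for illustration.

```python
import math


def buffer_capacity(pH, pKa, c_total):
    """Buffer capacity of one weak acid/base pair (water terms ignored)."""
    h = 10.0 ** (-pH)
    ka = 10.0 ** (-pKa)
    return math.log(10) * c_total * ka * h / (ka + h) ** 2


pKa = 7.0          # hypothetical buffer
c_total = 1.0      # mol/L, so results read directly as a fraction of C_total
beta_max = buffer_capacity(pKa, pKa, c_total)
for offset in (-2, -1, 0, 1, 2):
    beta = buffer_capacity(pKa + offset, pKa, c_total)
    print(f"pH = pKa {offset:+d}: beta = {beta:.3f} "
          f"({100 * beta / beta_max:.0f}% of maximum)")
```

One pH unit away from the pKa the capacity has already dropped to roughly a third of its maximum, and two units away it is only a few percent.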
Derivation 2: Peptide Bond Geometry & Ramachandran Analysis
Resonance Stabilization & Planarity
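The peptide bond ($\text{C-N}$) connecting adjacent amino acid residues has significant partial double-bond character due to resonance between two contributing structures:
$$\text{O}{=}\text{C}{-}\text{N}{-}\text{H} \;\longleftrightarrow\; {}^{-}\text{O}{-}\text{C}{=}\text{N}^{+}{-}\text{H}$$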
The resonance energy of the peptide bond is approximately $\Delta E_{\text{res}} \approx 80\;\text{kJ/mol}$. This partial double-bond character has profound consequences:
- The six atoms of the peptide unit ($\text{C}_\alpha$, C, O, N, H, $\text{C}_\alpha$) are coplanar
- The C-N bond length is 1.33 Å (between a single bond at 1.49 Å and a double bond at 1.27 Å)
- Rotation about the C-N bond (angle $\omega$) is restricted; $\omega = 180°$ (trans) is strongly favored over $\omega = 0°$ (cis)
- The barrier to rotation about $\omega$ is approximately $80\;\text{kJ/mol}$
The trans configuration is preferred by a factor of approximately 1000:1 over cis for most residues, because in the cis form the successive $\text{C}_\alpha$ atoms and their substituents are sterically crowded. The exception is proline, where the cyclic side chain reduces the energy difference, making cis proline peptide bonds occur in about 6% of cases.
Ramachandran Angles: $\phi$ and $\psi$
Since $\omega$ is effectively fixed at 180°, backbone conformational freedom resides in the two dihedral angles at each $\text{C}_\alpha$:
$\phi$ (phi): Rotation about the $\text{N-C}_\alpha$ bond. Defined by atoms $\text{C}_{i-1}\text{-N}_i\text{-C}_{\alpha,i}\text{-C}_i$
$\psi$ (psi): Rotation about the $\text{C}_\alpha\text{-C}$ bond. Defined by atoms $\text{N}_i\text{-C}_{\alpha,i}\text{-C}_i\text{-N}_{i+1}$
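Both angles are ordinary four-atom dihedrals, so they can be computed from Cartesian coordinates with one routine. The sketch below uses the standard atan2 formulation; the helper names and the toy coordinates are assumptions for illustration, not real protein geometry.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond.

    0 degrees is cis (eclipsed), +/-180 degrees is trans.
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy coordinates (not a real peptide): four points with a known geometry.
print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0)))   # cis arrangement
print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0)))  # trans arrangement
```

For a residue $i$, $\phi$ would be `dihedral(C_prev, N, CA, C)` and $\psi$ would be `dihedral(N, CA, C, N_next)`, following the atom orders given above.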
Deriving Allowed Regions from Steric Constraints
The Ramachandran plot maps all possible $(\phi, \psi)$ combinations. Not all are sterically permitted. The allowed regions are determined by the hard-sphere approximation: no two non-bonded atoms may approach closer than the sum of their van der Waals radii.
The key interatomic distances that determine steric clashes are those between atoms of the two peptide units joined at $\text{C}_\alpha$ (carbonyl O and C, amide N and H) and between these atoms and $\text{C}_\beta$.
For each $(\phi, \psi)$ pair, one computes the coordinates of all backbone and $\text{C}_\beta$ atoms. If any interatomic distance falls below the sum of van der Waals radii, that $(\phi, \psi)$ combination is disallowed. The fully allowed regions (using strict radii) cover only about 7.5% of the plot; partially allowed regions (using slightly reduced radii) cover about 22%.
For glycine (no $\text{C}_\beta$), the allowed region is much larger because there is no side-chain steric clash. For proline, the cyclic side chain constrains $\phi \approx -63°$, drastically limiting its Ramachandran space.
Canonical Secondary Structure Angles
- $\alpha$-helix: $\phi = -57°, \; \psi = -47°$
- $3_{10}$-helix: $\phi = -49°, \; \psi = -26°$
- $\pi$-helix: $\phi = -57°, \; \psi = -70°$
- Parallel $\beta$-sheet: $\phi = -119°, \; \psi = +113°$
- Antiparallel $\beta$-sheet: $\phi = -139°, \; \psi = +135°$
- Polyproline II: $\phi = -75°, \; \psi = +145°$
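As a rough illustration of how these reference angles are used, a measured $(\phi, \psi)$ pair can be compared with each canonical value. The sketch below is a simple nearest-neighbor lookup, with angle differences wrapped at ±180°; it is not a secondary-structure assignment method, and the names `CANONICAL`, `angular_difference`, and `nearest_conformation` are hypothetical.

```python
import math

# Canonical (phi, psi) values in degrees, taken from the list above.
CANONICAL = {
    "alpha-helix": (-57, -47),
    "3_10-helix": (-49, -26),
    "pi-helix": (-57, -70),
    "parallel beta-sheet": (-119, 113),
    "antiparallel beta-sheet": (-139, 135),
    "polyproline II": (-75, 145),
}


def angular_difference(a, b):
    """Smallest signed difference a - b in degrees, wrapped to [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def nearest_conformation(phi, psi):
    """Name of the canonical conformation closest to (phi, psi)."""
    def distance(item):
        ref_phi, ref_psi = item[1]
        return math.hypot(angular_difference(phi, ref_phi),
                          angular_difference(psi, ref_psi))
    name, _ = min(CANONICAL.items(), key=distance)
    return name


print(nearest_conformation(-60, -45))
print(nearest_conformation(-140, 140))
```

Real assignment programs rely on hydrogen-bonding patterns across several residues, so a lookup like this is only a first orientation on the plot.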
Derivation 3: Protein Molecular Weight from Mass Spectrometry
Electrospray ionization mass spectrometry (ESI-MS) is the workhorse for intact protein mass determination. In ESI, proteins acquire multiple protons as they are transferred from solution into the gas phase, producing a characteristic charge-state envelope.
The ESI Charge State Equation
For a protein of molecular mass $M$ carrying $z$ protons (each of mass $H = 1.00728\;\text{Da}$), the measured mass-to-charge ratio is:
$$\frac{m}{z} = \frac{M + zH}{z}$$
Each charge state $z$ produces a distinct peak. A protein of mass 14 kDa might show peaks from $z = 8$ to $z = 15$, appearing between m/z 950 and 1800.
Deconvolution: Solving for M and z
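Given two adjacent peaks with charge states $z$ and $z+1$ (the peak with charge $z$ lies at the higher m/z):
$$(m/z)_1 = \frac{M + zH}{z}, \qquad (m/z)_2 = \frac{M + (z+1)H}{z+1}$$
From the first equation: $M = z \cdot (m/z)_1 - zH$. From the second: $M = (z+1)(m/z)_2 - (z+1)H$. Setting equal:
$$z\left[(m/z)_1 - H\right] = (z+1)\left[(m/z)_2 - H\right]$$
Expanding and solving for $z$:
$$z = \frac{(m/z)_2 - H}{(m/z)_1 - (m/z)_2}$$
Once $z$ is determined (rounded to the nearest integer), $M$ is calculated directly:
$$M = z\left[(m/z)_1 - H\right]$$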
In practice, multiple charge-state pairs are used to obtain an average $M$ with improved precision. Modern software uses maximum-entropy or Bayesian deconvolution algorithms.
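The pairwise procedure can be sketched in a few lines. The example below generates an idealized, noise-free charge envelope for the lysozyme mass quoted elsewhere in this document (14,305 Da) and recovers the mass from adjacent peaks. It is not a maximum-entropy or Bayesian method; the function names are hypothetical, and $H$ is taken as the proton mass, 1.00728 Da, which is an assumption of this sketch.

```python
PROTON_MASS = 1.00728  # Da; assumed value for H in this sketch


def mz(mass, z, h=PROTON_MASS):
    """m/z of a protein of the given mass carrying z protons."""
    return (mass + z * h) / z


def charge_from_adjacent(mz_high, mz_low, h=PROTON_MASS):
    """Charge z of the peak at mz_high, given the (z + 1) peak at mz_low."""
    return round((mz_low - h) / (mz_high - mz_low))


def deconvolute(peaks, h=PROTON_MASS):
    """Average the mass estimates from every adjacent pair of peaks."""
    ordered = sorted(peaks, reverse=True)  # descending m/z = ascending z
    masses = []
    for mz_high, mz_low in zip(ordered, ordered[1:]):
        z = charge_from_adjacent(mz_high, mz_low, h)
        masses.append(z * (mz_high - h))
    return sum(masses) / len(masses)


true_mass = 14305.0  # Da
peaks = [mz(true_mass, z) for z in range(8, 16)]  # idealized envelope
print([round(p, 1) for p in peaks])
print(round(deconvolute(peaks), 2))
```

With measured spectra the peak positions carry error, so rounding $z$ to the nearest integer before computing $M$ matters: it prevents small m/z errors from being amplified by the small denominator.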
MALDI-TOF as an Alternative
In matrix-assisted laser desorption/ionization (MALDI), proteins typically acquire only one or two charges ($z = 1$ or $z = 2$), so the $m/z$ value directly approximates $M + H$. MALDI-TOF is simpler to interpret but less precise than ESI for intact proteins, with mass accuracy typically $\pm 0.01\%$ to $\pm 0.1\%$.
Applications: Gel Electrophoresis & Protein Separation
SDS-PAGE: Separation by Molecular Weight
Sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) is the most widely used method for separating proteins by size. SDS denatures proteins and coats them with a uniform negative charge proportional to mass (approximately 1.4 g SDS per g protein), so migration depends almost entirely on molecular weight.
The electrophoretic mobility of a charged particle is, to a first (Stokes) approximation, given by:
$$\mu = \frac{q}{6\pi\eta r}$$
where $q$ is the net charge, $\eta$ is the viscosity of the medium, and $r$ is the effective hydrodynamic radius (Stokes radius). For SDS-protein complexes, since $q \propto M$ (from uniform SDS binding), the charge-to-friction ratio is nearly the same for all proteins, so separation by size comes from the sieving effect of the gel.
In a sieving gel, larger proteins encounter more frictional resistance. The empirical relationship is:
$$\log M = a - b\,R_f$$
where $R_f$ is the relative mobility (distance migrated / distance of front) and $a, b$ are constants for a given gel concentration. This gives a linear relationship between $\log M$ and $R_f$ over a range determined by the acrylamide concentration.
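In practice the constants $a$ and $b$ come from a calibration lane of marker proteins. The sketch below fits the line by ordinary least squares and then interpolates an unknown band. The marker masses and $R_f$ values are hypothetical illustration data, not measurements, and the function names are assumptions.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical marker lane: (mass in kDa, relative mobility R_f).
markers = [(97.0, 0.10), (66.0, 0.25), (45.0, 0.41),
           (31.0, 0.55), (21.5, 0.70), (14.4, 0.86)]

slope, intercept = fit_line([rf for _, rf in markers],
                            [math.log10(m) for m, _ in markers])
a, b = intercept, -slope  # log10(M) = a - b * R_f


def estimate_mass_kda(rf):
    """Interpolate a mass from R_f using the fitted calibration line."""
    return 10.0 ** (a - b * rf)


print(f"a = {a:.3f}, b = {b:.3f}")
print(f"Band at R_f = 0.48: about {estimate_mass_kda(0.48):.0f} kDa")
```

Estimates are only meaningful for bands that fall between the largest and smallest markers, where the linear relationship holds.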
Isoelectric Focusing (IEF)
IEF separates proteins by their isoelectric point. A pH gradient is established in the gel (using carrier ampholytes or immobilized pH gradients). Each protein migrates until it reaches the position where $\text{pH} = \text{pI}$, at which point its net charge is zero and migration stops.
The resolving power of IEF is remarkable. Two proteins differing in pI by as little as $\Delta\text{pI} = 0.01$ can be separated. The resolution depends on the steepness of the pH gradient and the diffusion coefficient of the protein:
$$\Delta\text{pI} = 3\sqrt{\frac{D\,(d\text{pH}/dx)}{E\,(-d\mu/d\text{pH})}}$$
where $D$ is the diffusion coefficient, $d\text{pH}/dx$ is the steepness of the pH gradient, $E$ is the electric field strength, and $-d\mu/d\text{pH}$ is the rate of change of mobility with pH near the pI.
Two-Dimensional Gel Electrophoresis
2D-PAGE combines IEF (first dimension, separation by pI) with SDS-PAGE (second dimension, separation by molecular weight) to achieve extraordinary resolution. A complex protein mixture can be resolved into thousands of individual spots. Each spot corresponds to a unique protein species characterized by its $(\text{pI}, M_w)$ coordinates.
Practical Considerations
- Resolving capacity: 2D gels can resolve 2,000–10,000 protein spots per gel
- Detection: Coomassie blue (~50 ng sensitivity), silver stain (~1 ng), fluorescent dyes (SYPRO Ruby, ~1 ng)
- Limitations: Membrane proteins, very basic proteins ($\text{pI} > 10$), and low-abundance proteins are often underrepresented
- Quantification: DIGE (difference gel electrophoresis) labels samples with Cy3/Cy5 dyes for ratiometric comparison on a single gel
Python Simulation: Amino Acid Titration Curves
This simulation uses the Henderson-Hasselbalch equation to plot titration curves for glycine (no ionizable side chain), aspartate (acidic side chain), and lysine (basic side chain). The pI is computed and annotated for each.
Titration Curves for Gly, Asp, and Lys with pKa/pI Annotations
Python Simulation: Ramachandran Plot
This simulation generates a Ramachandran plot showing the allowed backbone conformational space based on approximate steric energy calculations. The canonical positions of all major secondary structures are annotated.
Ramachandran Plot: Allowed Backbone Conformations
Python Simulation: ESI-MS Charge State Deconvolution
This simulation demonstrates charge-state deconvolution for lysozyme (M = 14,305 Da). It simulates the ESI charge envelope, determines charge states from adjacent m/z peaks, and reconstructs the molecular weight.
ESI-MS Deconvolution: From m/z Peaks to Molecular Weight
Amino Acid Classification & Properties
The 20 standard amino acids are classified by the chemical nature of their side chains (R groups). This classification is fundamental because side-chain chemistry governs protein folding, enzyme catalysis, and molecular recognition.
Nonpolar (Hydrophobic) Amino Acids
- Glycine (Gly, G): $\text{R} = \text{H}$. The simplest amino acid. Unique conformational flexibility (no $\text{C}_\beta$), often found in tight turns and at active sites.
- Alanine (Ala, A): $\text{R} = \text{CH}_3$. The reference amino acid for helix propensity measurements. Strong helix former.
- Valine (Val, V), Leucine (Leu, L), Isoleucine (Ile, I): Branched aliphatic chains. Found in protein interiors. $\beta$-branching (Val, Ile) disfavors $\alpha$-helix formation.
- Proline (Pro, P): Cyclic imino acid. The pyrrolidine ring constrains $\phi \approx -63°$ and introduces a kink. Often found at helix caps and in collagen ($\text{Gly-X-Y}$ repeats, where X is frequently Pro and Y is frequently hydroxyproline).
- Phenylalanine (Phe, F), Tryptophan (Trp, W): Aromatic side chains. Trp has the largest side chain and the highest UV absorption ($\varepsilon_{280} = 5,690\;\text{M}^{-1}\text{cm}^{-1}$).
- Methionine (Met, M): Contains a thioether linkage. The initiator amino acid in translation (AUG codon). Susceptible to oxidation to methionine sulfoxide.
Polar Uncharged Amino Acids
- Serine (Ser, S), Threonine (Thr, T): Hydroxyl groups capable of H-bonding and phosphorylation. Thr is $\beta$-branched.
- Asparagine (Asn, N), Glutamine (Gln, Q): Amide side chains. Asn is the site of N-linked glycosylation (Asn-X-Ser/Thr sequon).
- Tyrosine (Tyr, Y): Phenolic hydroxyl with $\text{pK}_a \approx 10.5$. Can be phosphorylated. Contributes to UV absorption at 280 nm ($\varepsilon_{280} = 1,280\;\text{M}^{-1}\text{cm}^{-1}$).
- Cysteine (Cys, C): Thiol group ($\text{pK}_a \approx 8.3$). Forms disulfide bonds ($\text{Cys-S-S-Cys}$). Critical in redox chemistry and metal coordination (zinc fingers).
Charged Amino Acids
- Aspartate (Asp, D): $\text{pK}_a \approx 3.9$. Negative charge at physiological pH. Common in Ca$^{2+}$-binding sites.
- Glutamate (Glu, E): $\text{pK}_a \approx 4.1$. Negative charge at physiological pH. Frequent in enzyme active sites as general acid/base catalyst.
- Lysine (Lys, K): $\text{pK}_a \approx 10.5$. Positive charge at physiological pH. Subject to acetylation, methylation, ubiquitination (post-translational modifications important in epigenetics).
- Arginine (Arg, R): Guanidinium group, $\text{pK}_a \approx 12.5$. Always protonated at physiological pH. Forms multiple H-bonds; important for substrate binding (e.g., phosphate recognition).
- Histidine (His, H): Imidazole ring, $\text{pK}_a \approx 6.0$. The only standard side chain that titrates near physiological pH, making it ideal for acid-base catalysis (e.g., in the catalytic triad of serine proteases).
UV Absorption of Proteins
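Protein concentration can be estimated by UV absorbance at 280 nm. The molar extinction coefficient is predicted from the amino acid composition, and the concentration then follows from the Beer-Lambert law:
$$\varepsilon_{280} = 5{,}690\,n_{\text{Trp}} + 1{,}280\,n_{\text{Tyr}} + 120\,n_{\text{Cys-Cys}} \;\;(\text{M}^{-1}\text{cm}^{-1}), \qquad A_{280} = \varepsilon_{280}\,c\,l$$
With these coefficients this is the Gill and von Hippel (Edelhoch) method; the Pace method uses the same sum with re-fitted coefficients (5,500, 1,490, and 125). Either estimates the extinction coefficient from sequence alone and is widely used to determine protein concentration without a standard curve.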
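The composition-based estimate is simple enough to sketch directly. The example below uses the Trp and Tyr coefficients quoted in this document and an assumed 120 M⁻¹cm⁻¹ per disulfide bond; the sequence is a hypothetical peptide, and the function names are illustrative.

```python
# Molar extinction coefficients at 280 nm (M^-1 cm^-1).
EPS_TRP = 5690      # value quoted in this document
EPS_TYR = 1280      # value quoted in this document
EPS_CYSTINE = 120   # assumed value per disulfide bond


def epsilon_280(sequence, n_disulfides=0):
    """Estimate the molar extinction coefficient from a one-letter sequence."""
    seq = sequence.upper()
    return (seq.count("W") * EPS_TRP
            + seq.count("Y") * EPS_TYR
            + n_disulfides * EPS_CYSTINE)


def concentration_molar(a280, epsilon, path_cm=1.0):
    """Beer-Lambert law solved for concentration: c = A / (epsilon * l)."""
    return a280 / (epsilon * path_cm)


peptide = "MKWVTFYSLLWACY"  # hypothetical sequence: 2 Trp, 2 Tyr
eps = epsilon_280(peptide)
print(f"epsilon_280 = {eps} M^-1 cm^-1")
print(f"c = {concentration_molar(0.697, eps) * 1e6:.1f} uM at A280 = 0.697")
```

The estimate assumes the chromophores absorb independently, so it is least reliable for proteins with no Trp and few Tyr residues.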
Protein Purification Strategies
Protein purification exploits differences in physical and chemical properties between the target protein and contaminants. A typical purification scheme uses 3–5 chromatographic steps.
Ion Exchange Chromatography
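Separates proteins by net charge. The protein binds to the column at low ionic strength and is eluted by a salt gradient. The charge on a protein is pH-dependent:
$$Z(\text{pH}) = \sum_i \frac{1}{1 + 10^{\,\text{pH} - \text{pK}_{a,i}}} - \sum_j \frac{1}{1 + 10^{\,\text{pK}_{a,j} - \text{pH}}}$$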
where the first sum is over cationic groups and the second over anionic groups. At pH above the pI, the protein is negatively charged and binds an anion exchanger (e.g., DEAE, Q). Below the pI, it is positively charged and binds a cation exchanger (e.g., CM, S).
Size Exclusion Chromatography (SEC)
Also called gel filtration. Separates by hydrodynamic radius (Stokes radius $R_s$). Larger proteins are excluded from pore space and elute first. The partition coefficient is:
$$K_{av} = \frac{V_e - V_0}{V_t - V_0}$$
where $V_e$ is elution volume, $V_0$ is void volume, and $V_t$ is total column volume. A plot of $\log M_w$ vs $K_{av}$ is linear over the fractionation range.
Affinity Chromatography
The most selective purification method. A ligand specific for the target protein is immobilized on a solid support. The target binds while contaminants wash through. Common systems include:
- Ni-NTA: Binds His-tagged recombinant proteins ($K_d \sim \mu\text{M}$). Eluted with imidazole.
- Glutathione-Sepharose: Binds GST-tagged proteins. Eluted with reduced glutathione.
- Protein A/G: Binds the Fc region of IgG antibodies. Used for antibody purification.
- Substrate analogs: Immobilized inhibitors or cofactors for enzyme purification.
Purification Table
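Progress is tracked by a purification table, with one row per step and these key metrics:
- Total protein (mg) and total activity (units) recovered at each step
- Specific activity = total activity / total protein (units/mg)
- Yield (%) = total activity at the step / total activity in the crude extract × 100
- Purification fold = specific activity at the step / specific activity of the crude extract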
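The bookkeeping behind such a table (specific activity, yield, and purification fold relative to the crude extract) can be sketched as follows. The step names and all numbers are hypothetical illustration values, not data from a real purification, and `purification_table` is an assumed helper name.

```python
def purification_table(steps):
    """Compute per-step metrics.

    steps: list of (name, total_protein_mg, total_activity_units);
    the first entry is taken as the crude extract.
    """
    _, protein0, activity0 = steps[0]
    crude_specific = activity0 / protein0
    rows = []
    for name, protein, activity in steps:
        specific = activity / protein
        rows.append({
            "step": name,
            "specific_activity": specific,               # units/mg
            "yield_percent": 100.0 * activity / activity0,
            "fold": specific / crude_specific,
        })
    return rows


# Hypothetical three-step purification.
steps = [
    ("Crude extract", 1000.0, 10000.0),
    ("Ion exchange", 100.0, 8000.0),
    ("Affinity", 5.0, 6000.0),
]

rows = purification_table(steps)
for row in rows:
    print(f"{row['step']:<14} {row['specific_activity']:>8.1f} U/mg "
          f"{row['yield_percent']:>6.1f}% {row['fold']:>7.1f}-fold")
```

Yield and fold usually move in opposite directions: each step discards some target protein while removing a larger share of contaminants.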
Key Equations Summary
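Henderson-Hasselbalch Equation
$$\text{pH} = \text{p}K_a + \log\frac{[\text{A}^-]}{[\text{HA}]}$$
Isoelectric Point
$$\text{pI} = \frac{\text{pK}_{a1} + \text{pK}_{a2}}{2}$$
Electrophoretic Mobility
$$\mu = \frac{q}{6\pi\eta r}$$
ESI-MS Charge State
$$\frac{m}{z} = \frac{M + zH}{z}$$
Beer-Lambert Law
$$A = \varepsilon\,c\,l$$
Buffering Capacity
$$\beta = 2.303\, C_{\text{total}}\, \frac{K_a[\text{H}^+]}{(K_a + [\text{H}^+])^2}$$
SDS-PAGE Mobility
$$\log M = a - b\,R_f$$
Size Exclusion Partition Coefficient
$$K_{av} = \frac{V_e - V_0}{V_t - V_0}$$
Net Charge as a Function of pH
$$Z(\text{pH}) = \sum_i \frac{1}{1 + 10^{\,\text{pH} - \text{pK}_{a,i}}} - \sum_j \frac{1}{1 + 10^{\,\text{pK}_{a,j} - \text{pH}}}$$
Purification Fold
$$\text{Fold} = \frac{\text{specific activity at step}}{\text{specific activity of crude extract}}$$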
Stereochemistry of Amino Acids
All standard amino acids except glycine have a chiral $\text{C}_\alpha$ (four different substituents). Biological proteins exclusively use the L-configuration (S-configuration by Cahn-Ingold-Prelog rules, except for cysteine which is R due to the sulfur priority).
Threonine and isoleucine have two chiral centers ($\text{C}_\alpha$ and $\text{C}_\beta$), giving rise to potential diastereomers. Only the L-threonine (2S,3R) and L-isoleucine (2S,3S) forms are found in proteins.
Optical rotation is measured with a polarimeter. The specific rotation is:
$$[\alpha] = \frac{\alpha_{\text{obs}}}{l \cdot c}$$
where $\alpha_{\text{obs}}$ is the observed rotation, $l$ is the path length in dm, and $c$ is the concentration in g/mL. L-amino acids can be either dextrorotatory (+) or levorotatory (−); the D/L designation refers to configuration, not the sign of rotation.
D-amino acids do occur in biology: in bacterial cell walls (D-Ala, D-Glu in peptidoglycan), certain antibiotics (gramicidin), and some neuropeptides. Their presence in peptidoglycan confers resistance to most proteases, which are specific for L-amino acids.