Amino Acids & Protein Chemistry

From ionizable functional groups and titration behavior through peptide bond geometry to mass spectrometric determination of protein molecular weight.

Derivation 1: Henderson-Hasselbalch & Amino Acid Titration

Every amino acid possesses at least two ionizable groups: the $\alpha$-carboxyl group ($\text{pK}_{a1} \approx 2$) and the $\alpha$-amino group ($\text{pK}_{a2} \approx 9\text{--}10$). Some have ionizable side chains, adding a third (or even fourth) equilibrium. Understanding these equilibria is foundational for predicting charge state, solubility, electrophoretic mobility, and chromatographic behavior.

The Henderson-Hasselbalch Equation

Consider a weak acid $\text{HA}$ in equilibrium with water:

$$\text{HA} \;\rightleftharpoons\; \text{H}^+ + \text{A}^-$$

The acid dissociation constant is:

$$K_a = \frac{[\text{H}^+][\text{A}^-]}{[\text{HA}]}$$

Taking the negative logarithm of both sides:

$$-\log K_a = -\log[\text{H}^+] - \log\frac{[\text{A}^-]}{[\text{HA}]}$$

Since $\text{p}K_a = -\log K_a$ and $\text{pH} = -\log[\text{H}^+]$:

$$\boxed{\text{pH} = \text{p}K_a + \log\frac{[\text{A}^-]}{[\text{HA}]}}$$

This is the Henderson-Hasselbalch equation. When $[\text{A}^-] = [\text{HA}]$, the log term vanishes and $\text{pH} = \text{p}K_a$. This is the midpoint of each buffering region in the titration curve, where buffering capacity is maximal.
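As a quick numerical check of the equation, the sketch below converts a pH and a $\text{p}K_a$ into the ratio $[\text{A}^-]/[\text{HA}]$ and the fraction deprotonated. The function names are illustrative and not part of the original material; the value 2.34 is the glycine $\alpha$-carboxyl $\text{p}K_a$ used in the simulation later in this section.

```python
def base_to_acid_ratio(pH: float, pKa: float) -> float:
    """Return [A-]/[HA] from the Henderson-Hasselbalch equation."""
    return 10.0 ** (pH - pKa)


def fraction_deprotonated(pH: float, pKa: float) -> float:
    """Return [A-] / ([HA] + [A-]) for a single ionizable group."""
    ratio = base_to_acid_ratio(pH, pKa)
    return ratio / (1.0 + ratio)


if __name__ == "__main__":
    pKa = 2.34  # glycine alpha-carboxyl group
    # One pH unit below, at, and above the pKa
    for pH in (pKa - 1.0, pKa, pKa + 1.0):
        print(f"pH {pH:.2f}: ratio = {base_to_acid_ratio(pH, pKa):.2f}, "
              f"fraction deprotonated = {fraction_deprotonated(pH, pKa):.3f}")
```

Each pH unit away from the $\text{p}K_a$ changes the ratio tenfold, which is why a group is essentially fully protonated or deprotonated two units away from its $\text{p}K_a$.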

Zwitterion Equilibria

At physiological pH, amino acids exist predominantly as zwitterions — the carboxyl group is deprotonated ($\text{COO}^-$) while the amino group is protonated ($\text{NH}_3^+$). A simple amino acid like glycine undergoes two ionization steps:

$$\text{H}_3\text{N}^+\text{-CH}_2\text{-COOH} \;\xrightarrow{\text{pK}_{a1}}\; \text{H}_3\text{N}^+\text{-CH}_2\text{-COO}^- \;\xrightarrow{\text{pK}_{a2}}\; \text{H}_2\text{N-CH}_2\text{-COO}^-$$

The fully protonated form (cation) dominates at low pH, the zwitterion dominates at intermediate pH, and the fully deprotonated form (anion) dominates at high pH.

Derivation of the Isoelectric Point (pI)

The isoelectric point is the pH at which the net charge on the amino acid is zero. For a simple amino acid with only $\text{pK}_{a1}$ and $\text{pK}_{a2}$:

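At the pI, the concentration of the cationic form equals the concentration of the anionic form. Let $f_+ = \frac{[\text{H}^+]}{[\text{H}^+] + K_{a1}}$ be the fraction in the cationic state and $f_- = \frac{K_{a2}}{[\text{H}^+] + K_{a2}}$ the fraction in the anionic state, each relative to the zwitterion. Setting $f_+ = f_-$ is equivalent to equating the cation-to-zwitterion ratio $[\text{H}^+]/K_{a1}$ with the anion-to-zwitterion ratio $K_{a2}/[\text{H}^+]$ (cross-multiply and cancel the common term $K_{a2}[\text{H}^+]$):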

$$\frac{[\text{H}^+]}{K_{a1}} = \frac{K_{a2}}{[\text{H}^+]}$$

$$[\text{H}^+]^2 = K_{a1} \cdot K_{a2}$$

Taking $-\log$ of both sides:

$$\boxed{\text{pI} = \frac{\text{pK}_{a1} + \text{pK}_{a2}}{2}}$$

For amino acids with ionizable side chains, the pI is the average of the two pK values that flank the zwitterionic species. For aspartate (acidic side chain): $\text{pI} = (\text{pK}_{a1} + \text{pK}_{a,\text{R}})/2$. For lysine (basic side chain): $\text{pI} = (\text{pK}_{a2} + \text{pK}_{a,\text{R}})/2$.
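The flanking-pK rule can be sketched in a few lines. The helper below is a hypothetical illustration, and it assumes the groups titrate in order of increasing $\text{pK}_a$ and are well enough separated that only adjacent species matter. The pKa values are the ones used in the titration simulation later in this section.

```python
def isoelectric_point(pKas, n_cationic):
    """
    Average the two pKa values that flank the net-neutral species.

    pKas       : all pKa values of the amino acid
    n_cationic : number of groups that carry +1 when protonated
                 (the alpha-amino group plus any Lys/Arg/His side chain)
    """
    ordered = sorted(pKas)
    if not 1 <= n_cationic < len(ordered):
        raise ValueError("need at least one cationic and one anionic group")
    # The fully protonated species has charge +n_cationic. Each deprotonation
    # removes one unit of charge, so the neutral species is reached after
    # n_cationic deprotonation steps.
    below = ordered[n_cationic - 1]  # step that produces the neutral species
    above = ordered[n_cationic]      # step that consumes the neutral species
    return (below + above) / 2.0


if __name__ == "__main__":
    print("Gly:", round(isoelectric_point([2.34, 9.60], n_cationic=1), 2))
    print("Asp:", round(isoelectric_point([2.09, 3.86, 9.82], n_cationic=1), 2))
    print("Lys:", round(isoelectric_point([2.18, 8.95, 10.53], n_cationic=2), 2))
```

Counting cationic groups is one way to locate the neutral species without drawing every protonation state; for aspartate it selects the two carboxyl pK values, and for lysine the two amino pK values, as stated above.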

Buffering Capacity

The buffering capacity $\beta$ is defined as the amount of strong acid or base needed to change the pH by one unit. From the Henderson-Hasselbalch equation:

$$\beta = 2.303 \cdot C_{\text{total}} \cdot \frac{K_a [\text{H}^+]}{(K_a + [\text{H}^+])^2}$$

Maximum buffering occurs when $\text{pH} = \text{pK}_a$ (i.e., $[\text{H}^+] = K_a$), giving $\beta_{\max} = 0.576 \cdot C_{\text{total}}$. The effective buffering range is $\text{pK}_a \pm 1$.
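The expression for $\beta$ is easy to evaluate directly. The following is a minimal sketch (the function name is an assumption) that includes only the weak-acid term given above and ignores the contribution of water itself at extreme pH.

```python
def buffer_capacity(pH, pKa, c_total):
    """beta = 2.303 * C_total * Ka*[H+] / (Ka + [H+])**2 (weak-acid term only)."""
    h = 10.0 ** (-pH)
    ka = 10.0 ** (-pKa)
    return 2.303 * c_total * ka * h / (ka + h) ** 2


if __name__ == "__main__":
    pKa = 9.60      # glycine alpha-amino group
    c_total = 0.1   # mol/L, an arbitrary example concentration
    for offset in (-2, -1, 0, 1, 2):
        beta = buffer_capacity(pKa + offset, pKa, c_total)
        print(f"pH = pKa {offset:+d}: beta = {beta:.4f} mol/(L*pH)")
```

At $\text{pK}_a \pm 1$ the capacity has already fallen to about a third of its maximum, which is the usual justification for quoting $\text{pK}_a \pm 1$ as the useful buffering range.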

Derivation 2: Peptide Bond Geometry & Ramachandran Analysis

Resonance Stabilization & Planarity

The peptide bond ($\text{C-N}$) connecting adjacent amino acid residues has significant partial double-bond character due to resonance between two contributing structures:

$$\text{O}=\text{C}-\text{N}-\text{H} \quad \longleftrightarrow \quad {}^-\text{O}-\text{C}=\text{N}^+-\text{H}$$

The resonance energy of the peptide bond is approximately $\Delta E_{\text{res}} \approx 80\;\text{kJ/mol}$. This partial double-bond character has profound consequences:

The six atoms of the peptide unit ($\text{C}_\alpha$, C, O, N, H, $\text{C}_\alpha$) are coplanar
The C-N bond length is 1.33 Å (between a single bond at 1.49 Å and a double bond at 1.27 Å)
Rotation about the C-N bond (angle $\omega$) is restricted; $\omega = 180°$ (trans) is strongly favored over $\omega = 0°$ (cis)
The barrier to rotation about $\omega$ is approximately $80\;\text{kJ/mol}$

The trans configuration is preferred by a factor of approximately 1000:1 over cis for most residues, because in the cis form the successive $\text{C}_\alpha$ atoms and their substituents are sterically crowded. The exception is proline, where the cyclic side chain reduces the energy difference, making cis proline peptide bonds occur in about 6% of cases.
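A population ratio can be translated into a free-energy difference with $\Delta G = RT \ln K$. The sketch below assumes a simple two-state equilibrium at 298.15 K, which the text above does not state explicitly, and the function name is illustrative.

```python
import math

R = 8.314  # gas constant, J/(mol*K)


def free_energy_gap(population_ratio, temperature=298.15):
    """Return the free-energy difference (kJ/mol) implied by a two-state ratio."""
    return R * temperature * math.log(population_ratio) / 1000.0


if __name__ == "__main__":
    # Non-proline peptide bonds: trans favored about 1000:1
    print(f"1000:1 trans:cis -> {free_energy_gap(1000.0):.1f} kJ/mol")
    # X-Pro peptide bonds: about 6% cis, i.e. a 94:6 ratio
    print(f"94:6 trans:cis   -> {free_energy_gap(0.94 / 0.06):.1f} kJ/mol")
```

Under these assumptions the trans preference corresponds to roughly 17 kJ/mol for ordinary peptide bonds and roughly 7 kJ/mol for X-Pro bonds. This equilibrium difference is distinct from the much larger barrier to rotation about $\omega$.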

Ramachandran Angles: $\phi$ and $\psi$

Since $\omega$ is effectively fixed at 180°, backbone conformational freedom resides in the two dihedral angles at each $\text{C}_\alpha$:

$\phi$ (phi): Rotation about the $\text{N-C}_\alpha$ bond. Defined by atoms $\text{C}_{i-1}\text{-N}_i\text{-C}_{\alpha,i}\text{-C}_i$

$\psi$ (psi): Rotation about the $\text{C}_\alpha\text{-C}$ bond. Defined by atoms $\text{N}_i\text{-C}_{\alpha,i}\text{-C}_i\text{-N}_{i+1}$
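Given Cartesian coordinates for the four defining atoms, either angle can be computed with a standard signed-dihedral formula. The following is a minimal NumPy sketch; the function name is an assumption, and the coordinates in the usage lines are toy points chosen to give known angles, not real backbone geometry.

```python
import numpy as np


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points, in the range (-180, 180]."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1            # from the second atom back to the first
    b1 = p2 - p1            # central bond
    b2 = p3 - p2            # from the third atom on to the fourth
    b1 /= np.linalg.norm(b1)
    # Components of b0 and b2 perpendicular to the central bond
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return np.degrees(np.arctan2(y, x))


if __name__ == "__main__":
    # Toy geometry with the central bond along z
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))   # eclipsed (cis)
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))  # anti (trans)
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))   # quarter turn
```

For $\phi$, pass the coordinates of $\text{C}_{i-1}$, $\text{N}_i$, $\text{C}_{\alpha,i}$, $\text{C}_i$ in that order; for $\psi$, pass $\text{N}_i$, $\text{C}_{\alpha,i}$, $\text{C}_i$, $\text{N}_{i+1}$. The sign is positive when the far bond is rotated clockwise relative to the near bond, viewed along the central bond.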

Deriving Allowed Regions from Steric Constraints

The Ramachandran plot maps all possible $(\phi, \psi)$ combinations. Not all are sterically permitted. The allowed regions are determined by the hard-sphere approximation: no two non-bonded atoms may approach closer than the sum of their van der Waals radii.

The key interatomic distances that determine steric clashes are:

$$d_{\min}(\text{H}\cdots\text{H}) = 2.0\;\text{\AA}, \quad d_{\min}(\text{H}\cdots\text{O}) = 2.4\;\text{\AA}, \quad d_{\min}(\text{H}\cdots\text{N}) = 2.4\;\text{\AA}$$

$$d_{\min}(\text{O}\cdots\text{O}) = 2.8\;\text{\AA}, \quad d_{\min}(\text{C}\cdots\text{C}) = 3.0\;\text{\AA}, \quad d_{\min}(\text{N}\cdots\text{O}) = 2.7\;\text{\AA}$$

For each $(\phi, \psi)$ pair, one computes the coordinates of all backbone and $\text{C}_\beta$ atoms. If any interatomic distance falls below the sum of van der Waals radii, that $(\phi, \psi)$ combination is disallowed. The fully allowed regions (using strict radii) cover only about 7.5% of the plot; partially allowed regions (using slightly reduced radii) cover about 22%.

For glycine (no $\text{C}_\beta$), the allowed region is much larger because there is no side-chain steric clash. For proline, the cyclic side chain constrains $\phi \approx -63°$, drastically limiting its Ramachandran space.
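The hard-sphere test itself is a pairwise distance check. The sketch below is a simplified illustration: the function and variable names are assumptions, it uses only the minimum contact distances listed above, it silently ignores pair types not in that list, and the caller must supply which pairs to exclude (bonded atoms and atoms sharing a bonded neighbor). Building the coordinates for a given $(\phi, \psi)$ is not shown.

```python
import itertools
import numpy as np

# Minimum contact distances in angstroms, taken from the list above
MIN_CONTACT = {
    frozenset(["H"]): 2.0,
    frozenset(["H", "O"]): 2.4,
    frozenset(["H", "N"]): 2.4,
    frozenset(["O"]): 2.8,
    frozenset(["C"]): 3.0,
    frozenset(["N", "O"]): 2.7,
}


def find_clashes(atoms, excluded_pairs=()):
    """
    atoms          : list of (element, (x, y, z)) tuples
    excluded_pairs : index pairs to skip, e.g. bonded atoms
    Returns (i, j, distance) for every contact shorter than its limit.
    """
    skip = {frozenset(pair) for pair in excluded_pairs}
    clashes = []
    for (i, (el_i, xyz_i)), (j, (el_j, xyz_j)) in itertools.combinations(enumerate(atoms), 2):
        if frozenset((i, j)) in skip:
            continue
        limit = MIN_CONTACT.get(frozenset((el_i, el_j)))
        if limit is None:
            continue  # pair type not listed above; ignored in this sketch
        dist = float(np.linalg.norm(np.subtract(xyz_i, xyz_j)))
        if dist < limit:
            clashes.append((i, j, dist))
    return clashes


if __name__ == "__main__":
    # Toy coordinates, not a real peptide
    toy = [("O", (0.0, 0.0, 0.0)), ("O", (2.5, 0.0, 0.0)), ("H", (0.0, 5.0, 0.0))]
    print(find_clashes(toy))
```

A conformation would be marked disallowed whenever `find_clashes` returns a non-empty list; repeating the check with slightly reduced limits gives the partially allowed regions.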

Canonical Secondary Structure Angles

$\alpha$-helix: $\phi = -57°, \; \psi = -47°$
$3_{10}$-helix: $\phi = -49°, \; \psi = -26°$
$\pi$-helix: $\phi = -57°, \; \psi = -70°$
Parallel $\beta$-sheet: $\phi = -119°, \; \psi = +113°$
Antiparallel $\beta$-sheet: $\phi = -139°, \; \psi = +135°$
Polyproline II: $\phi = -75°, \; \psi = +145°$

Derivation 3: Protein Molecular Weight from Mass Spectrometry

Electrospray ionization mass spectrometry (ESI-MS) is the workhorse for intact protein mass determination. In ESI, proteins acquire multiple protons as they are transferred from solution into the gas phase, producing a characteristic charge-state envelope.

The ESI Charge State Equation

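For a protein of molecular mass $M$ carrying $z$ protons (each contributing $H = 1.00794\;\text{Da}$, the average atomic mass of hydrogen, used here as a close approximation to the proton mass of 1.00728 Da), the measured mass-to-charge ratio is: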

$$\boxed{\frac{m}{z} = \frac{M + zH}{z} = \frac{M}{z} + H}$$

Each charge state $z$ produces a distinct peak. A protein of mass 14 kDa might show peaks from $z = 8$ to $z = 15$, appearing between roughly m/z 930 and 1750.

Deconvolution: Solving for M and z

Given two adjacent peaks with charge states $z$ and $z+1$:

$$(m/z)_1 = \frac{M + zH}{z}, \qquad (m/z)_2 = \frac{M + (z+1)H}{z+1}$$

From the first equation: $M = z \cdot (m/z)_1 - zH$. From the second: $M = (z+1)(m/z)_2 - (z+1)H$. Setting equal:

$$z \cdot (m/z)_1 - zH = (z+1)(m/z)_2 - (z+1)H$$

Expanding and solving for $z$:

$$z\bigl[(m/z)_1 - (m/z)_2\bigr] = (m/z)_2 - H$$

$$\boxed{z = \frac{(m/z)_2 - H}{(m/z)_1 - (m/z)_2}}$$

Once $z$ is determined (rounded to the nearest integer), $M$ is calculated directly:

$$\boxed{M = z\bigl[(m/z)_1 - H\bigr]}$$

In practice, multiple charge-state pairs are used to obtain an average $M$ with improved precision. Modern software uses maximum-entropy or Bayesian deconvolution algorithms.

MALDI-TOF as an Alternative

In matrix-assisted laser desorption/ionization (MALDI), proteins typically acquire only one or two charges ($z = 1$ or $z = 2$), so the $m/z$ value directly approximates $M + H$. MALDI-TOF is simpler to interpret but less precise than ESI for intact proteins, with mass accuracy typically $\pm 0.01\%$ to $\pm 0.1\%$.

Applications: Gel Electrophoresis & Protein Separation

SDS-PAGE: Separation by Molecular Weight

Sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) is the most widely used method for separating proteins by size. SDS denatures proteins and coats them with a uniform negative charge proportional to mass (approximately 1.4 g SDS per g protein), so migration depends solely on molecular weight.

The free-solution electrophoretic mobility of a charged sphere follows from balancing the electric force against Stokes drag:

$$\boxed{\mu = \frac{q}{6\pi\eta r}}$$

where $q$ is the net charge, $\eta$ is the viscosity of the medium, and $r$ is the effective hydrodynamic radius (Stokes radius). For SDS-protein complexes, since $q \propto M$ (from uniform SDS binding), the charge-to-friction ratio determines mobility.

In a sieving gel, larger proteins encounter more frictional resistance. The empirical relationship is:

$$\log M = a - b \cdot R_f$$

where $R_f$ is the relative mobility (distance migrated / distance of front) and $a, b$ are constants for a given gel concentration. This gives a linear relationship between $\log M$ and $R_f$ over a range determined by the acrylamide concentration.
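In practice the constants $a$ and $b$ come from a ladder of marker proteins run on the same gel. The sketch below shows one way to fit and use such a calibration with `numpy.polyfit`. The marker masses and the calibration line used to generate their mobilities ($a = 5.2$, $b = 1.2$) are hypothetical values chosen for illustration, not measured data.

```python
import numpy as np

# Hypothetical marker ladder (Da). The relative mobilities are generated from
# an assumed calibration line (a = 5.2, b = 1.2) rather than measured.
marker_mass = np.array([97000, 66000, 45000, 31000, 21500, 14400], dtype=float)
marker_rf = (5.2 - np.log10(marker_mass)) / 1.2

# Least-squares fit of log10(M) = a - b * Rf
slope, intercept = np.polyfit(marker_rf, np.log10(marker_mass), 1)
a, b = intercept, -slope


def estimate_mass(rf):
    """Estimate molecular mass (Da) from relative mobility using the fit."""
    return 10.0 ** (a - b * rf)


print(f"a = {a:.2f}, b = {b:.2f}")
print(f"Estimated mass at Rf = 0.50: {estimate_mass(0.50):.0f} Da")
```

Estimates are only meaningful for mobilities inside the range spanned by the markers, since the linear relationship breaks down outside the gel's fractionation range.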

Isoelectric Focusing (IEF)

IEF separates proteins by their isoelectric point. A pH gradient is established in the gel (using carrier ampholytes or immobilized pH gradients). Each protein migrates until it reaches the position where $\text{pH} = \text{pI}$, at which point its net charge is zero and migration stops.

The resolving power of IEF is remarkable. Two proteins differing in pI by as little as $\Delta\text{pI} = 0.01$ can be separated. The resolution depends on the steepness of the pH gradient and the diffusion coefficient of the protein:

$$\Delta(\text{pI})_{\min} = 3\sqrt{\frac{D \cdot (d\text{pH}/dx)}{E \cdot (-d\mu/d\text{pH})}}$$

where $D$ is the diffusion coefficient, $E$ is the electric field strength, and $-d\mu/d\text{pH}$ is the rate of change of mobility with pH near the pI.
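To see how the terms trade off, the resolution formula can be evaluated for a set of inputs. The sketch below uses placeholder values chosen only to give a plausible order of magnitude; they are assumptions, not measurements, and the function name is illustrative.

```python
import math


def min_resolvable_delta_pI(D, dpH_dx, E, neg_dmu_dpH):
    """
    D           : diffusion coefficient, cm^2/s
    dpH_dx      : pH gradient, pH units per cm
    E           : field strength, V/cm
    neg_dmu_dpH : -(d mu / d pH) near the pI, cm^2 V^-1 s^-1 per pH unit
    Returns the minimum resolvable pI difference in pH units.
    """
    return 3.0 * math.sqrt(D * dpH_dx / (E * neg_dmu_dpH))


if __name__ == "__main__":
    # Placeholder inputs (assumed, for illustration only)
    base = dict(D=1e-7, dpH_dx=0.1, E=100.0, neg_dmu_dpH=1e-5)
    print(f"baseline:         {min_resolvable_delta_pI(**base):.4f} pH units")
    # Quadrupling the field halves the minimum resolvable difference
    stronger = dict(base, E=400.0)
    print(f"4x field:         {min_resolvable_delta_pI(**stronger):.4f} pH units")
    # A shallower gradient also sharpens resolution
    shallower = dict(base, dpH_dx=0.025)
    print(f"4x shallower pH:  {min_resolvable_delta_pI(**shallower):.4f} pH units")
```

Because of the square root, a fourfold change in field strength or gradient steepness is needed to change the resolution by a factor of two.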

Two-Dimensional Gel Electrophoresis

2D-PAGE combines IEF (first dimension, separation by pI) with SDS-PAGE (second dimension, separation by molecular weight) to achieve extraordinary resolution. A complex protein mixture can be resolved into thousands of individual spots. Each spot corresponds to a unique protein species characterized by its $(\text{pI}, M_w)$ coordinates.

Practical Considerations

Resolving capacity: 2D gels can resolve 2,000–10,000 protein spots per gel
Detection: Coomassie blue (~50 ng sensitivity), silver stain (~1 ng), fluorescent dyes (SYPRO Ruby, ~1 ng)
Limitations: Membrane proteins, very basic proteins ($\text{pI} > 10$), and low-abundance proteins are often underrepresented
Quantification: DIGE (difference gel electrophoresis) labels samples with Cy3/Cy5 dyes for ratiometric comparison on a single gel

Python Simulation: Amino Acid Titration Curves

This simulation uses the Henderson-Hasselbalch equation to plot titration curves for glycine (no ionizable side chain), aspartate (acidic side chain), and lysine (basic side chain). The pI is computed and annotated for each.

Titration Curves for Gly, Asp, and Lys with pKa/pI Annotations

amino_acid_titration.py

import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

# Henderson-Hasselbalch titration curve simulator
# For each amino acid, compute fraction of base added vs pH

def titration_curve(pKa_values, pH_range):
    """
    Compute equivalents of OH- added as a function of pH
    for an amino acid with given pKa values.
    Each pKa contributes a sigmoidal buffering step.
    """
    equiv = np.zeros_like(pH_range)
    for i, pKa in enumerate(pKa_values):
        # Henderson-Hasselbalch: pH = pKa + log([A-]/[HA])
        # fraction deprotonated = 1 / (1 + 10^(pKa - pH))
        fraction = 1.0 / (1.0 + 10.0**(pKa - pH_range))
        equiv += fraction
    return equiv

pH = np.linspace(0, 14, 1000)

# Glycine: pKa1 = 2.34 (alpha-COOH), pKa2 = 9.60 (alpha-NH3+)
gly_pKa = [2.34, 9.60]
gly_equiv = titration_curve(gly_pKa, pH)
gly_pI = (2.34 + 9.60) / 2  # = 5.97

# Aspartate: pKa1 = 2.09, pKa2 = 3.86 (side chain COOH), pKa3 = 9.82
asp_pKa = [2.09, 3.86, 9.82]
asp_equiv = titration_curve(asp_pKa, pH)
asp_pI = (2.09 + 3.86) / 2  # = 2.98

# Lysine: pKa1 = 2.18, pKa2 = 8.95, pKa3 = 10.53 (side chain NH3+)
lys_pKa = [2.18, 8.95, 10.53]
lys_equiv = titration_curve(lys_pKa, pH)
lys_pI = (8.95 + 10.53) / 2  # = 9.74

fig, axes = plt.subplots(1, 3, figsize=(16, 5))

# --- Glycine ---
ax = axes[0]
ax.plot(gly_equiv, pH, color='#10b981', linewidth=2.5)
ax.axhline(y=gly_pKa[0], color='#f59e0b', linestyle='--', alpha=0.7, label=f'pKa1 = {gly_pKa[0]}')
ax.axhline(y=gly_pKa[1], color='#ef4444', linestyle='--', alpha=0.7, label=f'pKa2 = {gly_pKa[1]}')
ax.axhline(y=gly_pI, color='#8b5cf6', linestyle=':', alpha=0.7, label=f'pI = {gly_pI:.2f}')
for k, pKa in enumerate(gly_pKa):
    # Each pKa is the midpoint of its buffering step: 0.5 and 1.5 equivalents
    ax.annotate(f'pKa = {pKa}', xy=(k + 0.5, pKa), fontsize=9, color='white',
                bbox=dict(boxstyle='round,pad=0.3', facecolor='#1e293b', edgecolor='#10b981'))
ax.set_xlabel('Equivalents of OH⁻', fontsize=11, color='white')
ax.set_ylabel('pH', fontsize=11, color='white')
ax.set_title('Glycine (no ionizable side chain)', fontsize=12, color='#10b981', fontweight='bold')
ax.legend(fontsize=8, facecolor='#1e293b', edgecolor='#10b981', labelcolor='white')
ax.set_facecolor('#0f172a')
ax.tick_params(colors='white')
ax.set_ylim(0, 14)
ax.grid(True, alpha=0.2, color='#10b981')

# --- Aspartate ---
ax = axes[1]
ax.plot(asp_equiv, pH, color='#06b6d4', linewidth=2.5)
ax.axhline(y=asp_pKa[0], color='#f59e0b', linestyle='--', alpha=0.7, label=f'pKa1 = {asp_pKa[0]}')
ax.axhline(y=asp_pKa[1], color='#ef4444', linestyle='--', alpha=0.7, label=f'pKa2 = {asp_pKa[1]}')
ax.axhline(y=asp_pKa[2], color='#22c55e', linestyle='--', alpha=0.7, label=f'pKa3 = {asp_pKa[2]}')
ax.axhline(y=asp_pI, color='#8b5cf6', linestyle=':', alpha=0.7, label=f'pI = {asp_pI:.2f}')
ax.set_xlabel('Equivalents of OH⁻', fontsize=11, color='white')
ax.set_ylabel('pH', fontsize=11, color='white')
ax.set_title('Aspartate (acidic side chain)', fontsize=12, color='#06b6d4', fontweight='bold')
ax.legend(fontsize=8, facecolor='#1e293b', edgecolor='#06b6d4', labelcolor='white')
ax.set_facecolor('#0f172a')
ax.tick_params(colors='white')
ax.set_ylim(0, 14)
ax.grid(True, alpha=0.2, color='#06b6d4')

# --- Lysine ---
ax = axes[2]
ax.plot(lys_equiv, pH, color='#f472b6', linewidth=2.5)
ax.axhline(y=lys_pKa[0], color='#f59e0b', linestyle='--', alpha=0.7, label=f'pKa1 = {lys_pKa[0]}')
ax.axhline(y=lys_pKa[1], color='#ef4444', linestyle='--', alpha=0.7, label=f'pKa2 = {lys_pKa[1]}')
ax.axhline(y=lys_pKa[2], color='#22c55e', linestyle='--', alpha=0.7, label=f'pKa3 = {lys_pKa[2]}')
ax.axhline(y=lys_pI, color='#8b5cf6', linestyle=':', alpha=0.7, label=f'pI = {lys_pI:.2f}')
ax.set_xlabel('Equivalents of OH⁻', fontsize=11, color='white')
ax.set_ylabel('pH', fontsize=11, color='white')
ax.set_title('Lysine (basic side chain)', fontsize=12, color='#f472b6', fontweight='bold')
ax.legend(fontsize=8, facecolor='#1e293b', edgecolor='#f472b6', labelcolor='white')
ax.set_facecolor('#0f172a')
ax.tick_params(colors='white')
ax.set_ylim(0, 14)
ax.grid(True, alpha=0.2, color='#f472b6')

fig.patch.set_facecolor('#0f172a')
plt.tight_layout()
plt.savefig('output.png', dpi=150, bbox_inches='tight', facecolor='#0f172a')
plt.close()
print("Titration curves plotted for Gly, Asp, Lys with pKa and pI annotations.")

Python Simulation: Ramachandran Plot

This simulation generates a schematic Ramachandran plot. Rather than computing steric clashes from atomic coordinates, it models the allowed backbone conformational space as Gaussian energy wells centered on known conformations. The canonical positions of all major secondary structures are annotated.

Ramachandran Plot: Allowed Backbone Conformations

ramachandran_plot.py

import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

# Ramachandran plot: schematic allowed phi/psi regions
# The energy landscape below stands in for a hard-sphere van der Waals calculation

phi = np.linspace(-180, 180, 360)
psi = np.linspace(-180, 180, 360)
PHI, PSI = np.meshgrid(phi, psi)

# Simplified steric energy function
# Real Ramachandran plots use atomic coordinates; we approximate
# with empirical Gaussian-based allowed regions

def allowed_energy(phi_grid, psi_grid):
    """
    Approximate Ramachandran energy landscape.
    Lower energy = more allowed. We model known allowed regions
    as Gaussian wells centered on canonical secondary structures.
    """
    energy = np.ones_like(phi_grid) * 10.0  # high energy baseline (disallowed)

    # Alpha-helix region: phi ~ -57, psi ~ -47
    energy -= 9.0 * np.exp(-(((phi_grid + 57)**2) / (2 * 25**2) + ((psi_grid + 47)**2) / (2 * 25**2)))

    # Beta-sheet region: phi ~ -120, psi ~ +130
    energy -= 9.0 * np.exp(-(((phi_grid + 120)**2) / (2 * 30**2) + ((psi_grid - 130)**2) / (2 * 30**2)))

    # Left-handed alpha-helix: phi ~ +60, psi ~ +60 (partially allowed)
    energy -= 5.0 * np.exp(-(((phi_grid - 60)**2) / (2 * 20**2) + ((psi_grid - 60)**2) / (2 * 20**2)))

    # Polyproline II helix: phi ~ -75, psi ~ +145
    energy -= 7.0 * np.exp(-(((phi_grid + 75)**2) / (2 * 20**2) + ((psi_grid - 145)**2) / (2 * 20**2)))

    # 3_10 helix: phi ~ -49, psi ~ -26
    energy -= 6.0 * np.exp(-(((phi_grid + 49)**2) / (2 * 15**2) + ((psi_grid + 26)**2) / (2 * 15**2)))

    # pi-helix: phi ~ -57, psi ~ -70
    energy -= 4.0 * np.exp(-(((phi_grid + 57)**2) / (2 * 15**2) + ((psi_grid + 70)**2) / (2 * 15**2)))

    return energy

E = allowed_energy(PHI, PSI)

fig, ax = plt.subplots(figsize=(9, 8))

# Filled contour plot
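levels = np.linspace(-2, 10, 20)
# extend='both' fills the overlapping helix wells, where energy drops below the lowest level
contour = ax.contourf(PHI, PSI, E, levels=levels, cmap='YlGn_r', extend='both')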

# Contour lines for allowed/partially allowed boundaries
ax.contour(PHI, PSI, E, levels=[1.5, 3.0], colors=['#10b981', '#f59e0b'], linewidths=[2, 1.5])

# Mark canonical secondary structure positions
structures = {
    'alpha-helix': (-57, -47, '#10b981'),
    'beta-sheet': (-120, 130, '#06b6d4'),
    'L-alpha': (60, 60, '#f472b6'),
    'PPII': (-75, 145, '#f59e0b'),
    '3_10 helix': (-49, -26, '#a78bfa'),
    'pi-helix': (-57, -70, '#ef4444'),
}

for name, (p, s, color) in structures.items():
    ax.plot(p, s, 'o', markersize=10, color=color, markeredgecolor='white', markeredgewidth=1.5)
    ax.annotate(name, (p, s), textcoords="offset points", xytext=(12, 8),
                fontsize=9, color=color, fontweight='bold',
                bbox=dict(boxstyle='round,pad=0.2', facecolor='#0f172a', edgecolor=color, alpha=0.8))

ax.set_xlabel('phi (degrees)', fontsize=13, color='white')
ax.set_ylabel('psi (degrees)', fontsize=13, color='white')
ax.set_title('Ramachandran Plot: Allowed Backbone Conformations', fontsize=14, color='#10b981', fontweight='bold')
ax.set_xlim(-180, 180)
ax.set_ylim(-180, 180)
ax.set_xticks(range(-180, 181, 60))
ax.set_yticks(range(-180, 181, 60))
ax.axhline(0, color='white', alpha=0.3, linewidth=0.5)
ax.axvline(0, color='white', alpha=0.3, linewidth=0.5)
ax.set_facecolor('#0f172a')
ax.tick_params(colors='white')
ax.set_aspect('equal')

cbar = plt.colorbar(contour, ax=ax, shrink=0.8)
cbar.set_label('Relative Steric Energy', color='white', fontsize=11)
cbar.ax.tick_params(colors='white')

fig.patch.set_facecolor('#0f172a')
plt.tight_layout()
plt.savefig('output.png', dpi=150, bbox_inches='tight', facecolor='#0f172a')
plt.close()
print("Ramachandran plot generated with allowed regions for all major secondary structures.")

Python Simulation: ESI-MS Charge State Deconvolution

This simulation demonstrates charge-state deconvolution for lysozyme (M = 14,305 Da). It simulates the ESI charge envelope, determines charge states from adjacent m/z peaks, and reconstructs the molecular weight.

ESI-MS Deconvolution: From m/z Peaks to Molecular Weight

esi_ms_deconvolution.py

import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

# Mass spectrometry: ESI charge state deconvolution
# Given m/z peaks for different charge states z, determine molecular weight M

# For ESI-MS: m/z = (M + z * H) / z where H = 1.00794 Da (mass added per proton)
# Rearranging: M = z * (m/z) - z * H
# From two adjacent charge states z and z+1:
#   (m/z)_z = (M + z*H) / z
#   (m/z)_{z+1} = (M + (z+1)*H) / (z+1)
# Solving: z = [(m/z)_{z+1} - H] / [(m/z)_z - (m/z)_{z+1}]

H = 1.00794  # Da; average atomic mass of hydrogen, approximating the proton mass (1.00728 Da)
M_true = 14305.2  # true molecular weight (lysozyme)

# Simulate ESI charge state envelope (z = 8 to 15)
z_values = np.arange(8, 16)
mz_values = (M_true + z_values * H) / z_values

# Add small random noise
np.random.seed(42)
mz_observed = mz_values + np.random.normal(0, 0.5, len(z_values))

# Deconvolution: determine z from adjacent peak pairs
print("=" * 65)
print("ESI-MS Charge State Deconvolution for Lysozyme")
print("=" * 65)
print(f"\nTrue molecular weight: {M_true:.1f} Da\n")
print(f"{'Peak':>6} {'m/z observed':>14} {'z (calc)':>10} {'z (round)':>10} {'M (Da)':>12}")
print("-" * 65)

M_estimates = []
for i in range(len(mz_observed) - 1):
    mz1 = mz_observed[i]    # charge z
    mz2 = mz_observed[i+1]  # charge z+1

    # z = (mz2 - H) / (mz1 - mz2)
    z_calc = (mz2 - H) / (mz1 - mz2)
    z_round = int(round(z_calc))  # cast so the integer format below always works
    M_calc = z_round * (mz1 - H)
    M_estimates.append(M_calc)

    print(f"  {i+1:>4}   {mz1:>12.2f}   {z_calc:>8.2f}   {z_round:>8d}   {M_calc:>10.1f}")

M_avg = np.mean(M_estimates)
M_std = np.std(M_estimates)
print(f"\nDeconvolved M = {M_avg:.1f} +/- {M_std:.1f} Da")
print(f"Error from true M: {abs(M_avg - M_true):.1f} Da ({abs(M_avg - M_true)/M_true*100:.3f}%)")

# Plot the ESI mass spectrum
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Simulated ESI spectrum with Gaussian peaks
mz_range = np.linspace(850, 1850, 2000)
intensity = np.zeros_like(mz_range)
for i, (mz, z) in enumerate(zip(mz_observed, z_values)):
    peak_height = np.exp(-0.5 * ((z - 11) / 2.0)**2)  # Gaussian envelope
    intensity += peak_height * np.exp(-0.5 * ((mz_range - mz) / 3.0)**2)

ax1.plot(mz_range, intensity, color='#10b981', linewidth=1.5)
ax1.fill_between(mz_range, intensity, alpha=0.3, color='#10b981')
for mz, z in zip(mz_observed, z_values):
    peak_h = np.exp(-0.5 * ((z - 11) / 2.0)**2)
    ax1.annotate(f'z={z}+', (mz, peak_h), textcoords="offset points",
                xytext=(0, 10), fontsize=8, color='#f59e0b', ha='center', fontweight='bold')

ax1.set_xlabel('m/z', fontsize=12, color='white')
ax1.set_ylabel('Relative Intensity', fontsize=12, color='white')
ax1.set_title('ESI Mass Spectrum (Lysozyme)', fontsize=13, color='#10b981', fontweight='bold')
ax1.set_facecolor('#0f172a')
ax1.tick_params(colors='white')
ax1.grid(True, alpha=0.2, color='#10b981')

# Deconvolved spectrum (single peak at M)
M_range = np.linspace(14000, 14600, 500)
deconv_peak = np.exp(-0.5 * ((M_range - M_avg) / 5.0)**2)
ax2.plot(M_range, deconv_peak, color='#06b6d4', linewidth=2.5)
ax2.fill_between(M_range, deconv_peak, alpha=0.3, color='#06b6d4')
ax2.axvline(x=M_avg, color='#f59e0b', linestyle='--', alpha=0.8)
ax2.annotate(f'M = {M_avg:.1f} Da', (M_avg, 1.0), textcoords="offset points",
            xytext=(15, -5), fontsize=11, color='#f59e0b', fontweight='bold',
            bbox=dict(boxstyle='round,pad=0.3', facecolor='#1e293b', edgecolor='#f59e0b'))

ax2.set_xlabel('Molecular Weight (Da)', fontsize=12, color='white')
ax2.set_ylabel('Relative Intensity', fontsize=12, color='white')
ax2.set_title('Deconvolved Spectrum', fontsize=13, color='#06b6d4', fontweight='bold')
ax2.set_facecolor('#0f172a')
ax2.tick_params(colors='white')
ax2.grid(True, alpha=0.2, color='#06b6d4')

fig.patch.set_facecolor('#0f172a')
plt.tight_layout()
plt.savefig('output.png', dpi=150, bbox_inches='tight', facecolor='#0f172a')
plt.close()
print("\nESI mass spectrum and deconvolved spectrum plotted.")

Amino Acid Classification & Properties

The 20 standard amino acids are classified by the chemical nature of their side chains (R groups). This classification is fundamental because it determines protein folding, enzyme catalysis, and molecular recognition.

Nonpolar (Hydrophobic) Amino Acids

Glycine (Gly, G): $\text{R} = \text{H}$. The simplest amino acid. Unique conformational flexibility (no $\text{C}_\beta$), often found in tight turns and at active sites.
Alanine (Ala, A): $\text{R} = \text{CH}_3$. The reference amino acid for helix propensity measurements. Strong helix former.
Valine (Val, V), Leucine (Leu, L), Isoleucine (Ile, I): Branched aliphatic chains. Found in protein interiors. $\beta$-branching (Val, Ile) disfavors $\alpha$-helix formation.
Proline (Pro, P): Cyclic imino acid. The pyrrolidine ring constrains $\phi \approx -63°$ and introduces a kink. Often found at helix caps and in collagen ($\text{Gly-X-Y}$ repeats, where X is often Pro and Y is often hydroxyproline).
Phenylalanine (Phe, F), Tryptophan (Trp, W): Aromatic side chains. Trp has the largest side chain and the highest UV absorption ($\varepsilon_{280} = 5,690\;\text{M}^{-1}\text{cm}^{-1}$).
Methionine (Met, M): Contains a thioether linkage. The initiator amino acid in translation (AUG codon). Susceptible to oxidation to methionine sulfoxide.

Polar Uncharged Amino Acids

Serine (Ser, S), Threonine (Thr, T): Hydroxyl groups capable of H-bonding and phosphorylation. Thr is $\beta$-branched.
Asparagine (Asn, N), Glutamine (Gln, Q): Amide side chains. Asn is the site of N-linked glycosylation (Asn-X-Ser/Thr sequon, where X is any residue except Pro).
Tyrosine (Tyr, Y): Phenolic hydroxyl with $\text{pK}_a \approx 10.5$. Can be phosphorylated. Contributes to UV absorption at 280 nm ($\varepsilon_{280} = 1,280\;\text{M}^{-1}\text{cm}^{-1}$).
Cysteine (Cys, C): Thiol group ($\text{pK}_a \approx 8.3$). Forms disulfide bonds ($\text{Cys-S-S-Cys}$). Critical in redox chemistry and metal coordination (zinc fingers).

Charged Amino Acids

Aspartate (Asp, D): $\text{pK}_a \approx 3.9$. Negative charge at physiological pH. Common in Ca$^{2+}$-binding sites.
Glutamate (Glu, E): $\text{pK}_a \approx 4.1$. Negative charge at physiological pH. Frequent in enzyme active sites as general acid/base catalyst.
Lysine (Lys, K): $\text{pK}_a \approx 10.5$. Positive charge at physiological pH. Subject to acetylation, methylation, ubiquitination (post-translational modifications important in epigenetics).
Arginine (Arg, R): Guanidinium group, $\text{pK}_a \approx 12.5$. Always protonated at physiological pH. Forms multiple H-bonds; important for substrate binding (e.g., phosphate recognition).
Histidine (His, H): Imidazole ring, $\text{pK}_a \approx 6.0$. The only amino acid that titrates near physiological pH, making it ideal for acid-base catalysis (e.g., in the catalytic triad of serine proteases).

UV Absorption of Proteins

Protein concentration can be estimated from UV absorbance at 280 nm using the Beer-Lambert law, with the molar extinction coefficient predicted from the amino acid composition:

$$A_{280} = \varepsilon_{280} \cdot c \cdot l$$

$$\varepsilon_{280} = n_{\text{Trp}} \times 5690 + n_{\text{Tyr}} \times 1280 + n_{\text{Cys-Cys}} \times 125 \;\;\text{M}^{-1}\text{cm}^{-1}$$

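This sequence-based estimate is widely used to determine protein concentration without a standard curve. The Trp and Tyr coefficients above are the values used by Gill and von Hippel; the Pace method uses 5500 and 1490 $\text{M}^{-1}\text{cm}^{-1}$, with 125 for each cystine.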
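The two equations combine into a short calculation. The sketch below uses the coefficients from the equation above; the function names and the residue counts in the usage lines are illustrative assumptions, not values for any particular protein discussed here.

```python
def epsilon_280(n_trp, n_tyr, n_cystine):
    """Molar extinction coefficient at 280 nm in M^-1 cm^-1."""
    return n_trp * 5690 + n_tyr * 1280 + n_cystine * 125


def molar_concentration(a280, epsilon, path_cm=1.0):
    """Beer-Lambert law solved for concentration: c = A / (epsilon * l)."""
    return a280 / (epsilon * path_cm)


if __name__ == "__main__":
    # Hypothetical composition: 6 Trp, 3 Tyr, 4 disulfide bonds
    eps = epsilon_280(n_trp=6, n_tyr=3, n_cystine=4)
    conc = molar_concentration(a280=0.50, epsilon=eps)
    print(f"epsilon_280 = {eps} M^-1 cm^-1")
    print(f"concentration = {conc * 1e6:.1f} micromolar")
```

Tryptophan dominates the sum, so proteins with few or no Trp residues give weak and less reliable $A_{280}$ readings.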

Protein Purification Strategies

Protein purification exploits differences in physical and chemical properties between the target protein and contaminants. A typical purification scheme uses 3–5 chromatographic steps.

Ion Exchange Chromatography

Separates proteins by net charge. The protein binds to the column at low ionic strength and is eluted by a salt gradient. The charge on a protein is pH-dependent:

$$Q_{\text{net}}(\text{pH}) = \sum_i \frac{1}{1 + 10^{(\text{pH} - \text{pK}_{a,i})}} - \sum_j \frac{1}{1 + 10^{(\text{pK}_{a,j} - \text{pH})}}$$

where the first sum is over cationic groups and the second over anionic groups. At pH above the pI, the protein is negatively charged and binds an anion exchanger (e.g., DEAE, Q). Below the pI, it is positively charged and binds a cation exchanger (e.g., CM, S).
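The net-charge expression translates directly into code, and a simple bisection on it gives a numerical pI. The sketch below is a minimal illustration with assumed function names; it treats every group as independent with a fixed $\text{pK}_a$, and uses the free-lysine pKa values from the titration simulation as test input.

```python
def net_charge(pH, cationic_pKas, anionic_pKas):
    """Net charge from independent Henderson-Hasselbalch terms."""
    positive = sum(1.0 / (1.0 + 10.0 ** (pH - pKa)) for pKa in cationic_pKas)
    negative = sum(1.0 / (1.0 + 10.0 ** (pKa - pH)) for pKa in anionic_pKas)
    return positive - negative


def numeric_pI(cationic_pKas, anionic_pKas, lo=0.0, hi=14.0, tol=1e-6):
    """Bisection on net_charge, which decreases monotonically with pH."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(mid, cationic_pKas, anionic_pKas) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def suggested_exchanger(pH, pI):
    """Pick the resin type that binds the protein at the working pH."""
    return "anion exchanger" if pH > pI else "cation exchanger"


if __name__ == "__main__":
    cationic, anionic = [8.95, 10.53], [2.18]  # free lysine
    pI = numeric_pI(cationic, anionic)
    print(f"net charge at pH 7: {net_charge(7.0, cationic, anionic):+.2f}")
    print(f"numerical pI: {pI:.2f}")
    print(f"at pH 7 use a {suggested_exchanger(7.0, pI)}")
```

For a real protein the two lists would hold one entry per ionizable side chain plus the termini, and the effective $\text{pK}_a$ values can shift with local environment, so the result is an estimate.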

Size Exclusion Chromatography (SEC)

Also called gel filtration. Separates by hydrodynamic radius (Stokes radius $R_s$). Larger proteins are excluded from pore space and elute first. The partition coefficient is:

$$K_{av} = \frac{V_e - V_0}{V_t - V_0}$$

where $V_e$ is elution volume, $V_0$ is void volume, and $V_t$ is total column volume. A plot of $\log M_w$ vs $K_{av}$ is linear over the fractionation range.
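As a small worked example, the partition coefficient can be computed from three column volumes. The function name and the volumes below are hypothetical and serve only to show the limits of the scale.

```python
def k_av(v_e, v_0, v_t):
    """Partition coefficient K_av = (V_e - V_0) / (V_t - V_0)."""
    if v_t <= v_0:
        raise ValueError("total column volume must exceed the void volume")
    return (v_e - v_0) / (v_t - v_0)


if __name__ == "__main__":
    v_0, v_t = 8.0, 24.0  # mL, hypothetical column
    for v_e in (8.0, 15.0, 24.0):
        print(f"V_e = {v_e:4.1f} mL -> K_av = {k_av(v_e, v_0, v_t):.3f}")
```

A fully excluded protein elutes at the void volume ($K_{av} = 0$), while a molecule small enough to access all the pore space elutes near the total volume ($K_{av} \approx 1$).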

Affinity Chromatography

The most selective purification method. A ligand specific for the target protein is immobilized on a solid support. The target binds while contaminants wash through. Common systems include:

Ni-NTA: Binds His-tagged recombinant proteins ($K_d \sim \mu\text{M}$). Eluted with imidazole.
Glutathione-Sepharose: Binds GST-tagged proteins. Eluted with reduced glutathione.
Protein A/G: Binds the Fc region of IgG antibodies. Used for antibody purification.
Substrate analogs: Immobilized inhibitors or cofactors for enzyme purification.

Purification Table

Progress is tracked by a purification table with key metrics:

$$\text{Specific Activity} = \frac{\text{Total Activity (units)}}{\text{Total Protein (mg)}}$$

$$\text{Purification Fold} = \frac{\text{Specific Activity (step } n\text{)}}{\text{Specific Activity (crude)}}$$

$$\text{Yield} = \frac{\text{Total Activity (step } n\text{)}}{\text{Total Activity (crude)}} \times 100\%$$
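The three metrics can be computed for every step at once. The sketch below uses made-up activity and protein totals for a three-step scheme; the numbers and the function name are hypothetical and only illustrate the bookkeeping.

```python
# (step name, total activity in units, total protein in mg) -- hypothetical values
steps = [
    ("Crude extract", 100000.0, 5000.0),
    ("Ion exchange", 80000.0, 400.0),
    ("Affinity", 60000.0, 15.0),
]


def purification_table(steps):
    """Return specific activity, fold purification, and yield for each step."""
    crude_activity, crude_protein = steps[0][1], steps[0][2]
    crude_specific = crude_activity / crude_protein
    rows = []
    for name, activity, protein in steps:
        specific = activity / protein
        rows.append({
            "step": name,
            "specific_activity": specific,              # units per mg
            "fold": specific / crude_specific,          # relative to crude
            "yield_percent": 100.0 * activity / crude_activity,
        })
    return rows


if __name__ == "__main__":
    for row in purification_table(steps):
        print(f"{row['step']:<14} {row['specific_activity']:>8.0f} U/mg  "
              f"{row['fold']:>6.0f}-fold  {row['yield_percent']:>5.0f}% yield")
```

Fold purification rises while yield falls at every step, so a scheme is usually judged by how much specific activity each step buys per unit of activity lost.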

Key Equations Summary

Henderson-Hasselbalch Equation

$$\text{pH} = \text{pK}_a + \log\frac{[\text{A}^-]}{[\text{HA}]}$$

Isoelectric Point

$$\text{pI} = \frac{\text{pK}_{a,\text{flanking1}} + \text{pK}_{a,\text{flanking2}}}{2}$$

Electrophoretic Mobility

$$\mu = \frac{q}{6\pi\eta r}$$

ESI-MS Charge State

$$z = \frac{(m/z)_2 - H}{(m/z)_1 - (m/z)_2}, \qquad M = z[(m/z)_1 - H]$$

Beer-Lambert Law

$$A = \varepsilon c l$$

Buffering Capacity

$$\beta = 2.303 \cdot C_{\text{total}} \cdot \frac{K_a [\text{H}^+]}{(K_a + [\text{H}^+])^2}$$

SDS-PAGE Mobility

$$\log M = a - b \cdot R_f$$

Size Exclusion Partition Coefficient

$$K_{av} = \frac{V_e - V_0}{V_t - V_0}$$

Net Charge as a Function of pH

$$Q_{\text{net}}(\text{pH}) = \sum_i \frac{1}{1 + 10^{(\text{pH} - \text{pK}_{a,i})}} - \sum_j \frac{1}{1 + 10^{(\text{pK}_{a,j} - \text{pH})}}$$

Purification Fold

$$\text{Fold} = \frac{\text{Specific Activity}_n}{\text{Specific Activity}_{\text{crude}}}$$

Stereochemistry of Amino Acids

All standard amino acids except glycine have a chiral $\text{C}_\alpha$ (four different substituents). Biological proteins exclusively use the L-configuration (S-configuration by Cahn-Ingold-Prelog rules, except for cysteine which is R due to the sulfur priority).

Threonine and isoleucine have two chiral centers ($\text{C}_\alpha$ and $\text{C}_\beta$), giving rise to potential diastereomers. Only the L-threonine (2S,3R) and L-isoleucine (2S,3S) forms are found in proteins.

Optical rotation is measured with a polarimeter. The specific rotation is:

$$[\alpha]^T_\lambda = \frac{\alpha_{\text{obs}}}{l \cdot c}$$

where $\alpha_{\text{obs}}$ is the observed rotation, $l$ is the path length in dm, and $c$ is the concentration in g/mL. L-amino acids can be either dextrorotatory (+) or levorotatory (−); the D/L designation refers to configuration, not the sign of rotation.

D-amino acids do occur in biology: in bacterial cell walls (D-Ala, D-Glu in peptidoglycan), certain antibiotics (gramicidin), and some neuropeptides. Their presence in peptidoglycan confers resistance to most proteases, which are specific for L-amino acids.
