Peptide Bonds & Primary Structure
Covalent linkage of amino acids, backbone geometry, and the information content of protein sequence
The Peptide Bond
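Amino acids are joined together in proteins by peptide bonds, which are amide linkages formed between the $\alpha$-carboxyl group of one amino acid and the $\alpha$-amino group of the next. This is a condensation reaction (also called a dehydration synthesis) that releases one molecule of water:
$$\text{amino acid}_1 + \text{amino acid}_2 \rightarrow \text{dipeptide} + \text{H}_2\text{O}$$
More precisely, the reaction can be written as:
$$\text{H}_3\text{N}^+\text{-CHR}_1\text{-COO}^- + \text{H}_3\text{N}^+\text{-CHR}_2\text{-COO}^- \rightarrow \text{H}_3\text{N}^+\text{-CHR}_1\text{-CO-NH-CHR}_2\text{-COO}^- + \text{H}_2\text{O}$$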
Thermodynamics of Peptide Bond Formation
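Peptide bond formation is thermodynamically unfavorable under standard conditions. The standard free energy change for hydrolysis of a peptide bond is negative, on the order of:
$$\Delta G^{\circ\prime}_{\text{hydrolysis}} \approx -10 \text{ kJ/mol}$$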
This means that the equilibrium favors hydrolysis, not synthesis. In biological systems, peptide bond formation is catalyzed by the ribosome and driven by the energy stored in the aminoacyl-tRNA ester bond, together with GTP hydrolysis during elongation. The overall energetic cost of adding one amino acid residue during translation is approximately four high-energy phosphate bonds:
$$4 \times 30.5 \text{ kJ/mol} \approx 122 \text{ kJ/mol}$$
(accounting for aminoacyl-tRNA synthetase activation, GTP hydrolysis during elongation factor binding, and translocation steps).
Bond Characteristics
The peptide bond has several important physical properties:
- Bond length: The C-N bond in a peptide is ~1.33 Å, intermediate between a typical C-N single bond (~1.49 Å) and a C=N double bond (~1.27 Å)
- Bond dissociation energy: ~330 kJ/mol, indicating significant stability
- Planarity: The six atoms $C_\alpha$-C-O-N-H-$C_\alpha$ lie in a single plane (the peptide plane)
- Trans configuration: The $C_\alpha$ atoms on either side of the peptide bond are trans to each other in >99.9% of peptide bonds (exception: X-Pro bonds, ~10% cis)
- No free rotation: The partial double-bond character prevents rotation around the C-N bond (rotation barrier ~80 kJ/mol)
Resonance and Planarity
The peptide bond exhibits partial double-bond character (~40%) due to resonance (delocalization of electrons) between the carbonyl oxygen and the amide nitrogen. This can be represented by two major resonance structures:
Resonance Structures
In Structure I, the C=O is a full double bond and C-N is a single bond, with the lone pair localized on nitrogen. In Structure II, the nitrogen lone pair is donated into the C-N bond, so C=N is a double bond while the C-O bond becomes a single bond, with a negative charge on oxygen and a positive charge on nitrogen.
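The true electronic structure is a resonance hybrid of these two forms. The fraction of double-bond character in the C-N bond can be estimated by linear interpolation between the single-bond and double-bond lengths:
$$f_{\text{double}} \approx \frac{d_{\text{C-N}} - d_{\text{peptide}}}{d_{\text{C-N}} - d_{\text{C=N}}} = \frac{1.49 - 1.33}{1.49 - 1.27} \approx 0.73$$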
This simple calculation overestimates the double-bond character. More sophisticated analyses (including electron density maps from X-ray crystallography) place it at approximately 40%.
The Peptide Plane
As a consequence of the partial double-bond character, the six atoms involved in the peptide bond unit are constrained to lie in a single plane:
- $C_\alpha^{(i)}$ -- the alpha carbon of residue $i$
- $\text{C}^{(i)}$ -- the carbonyl carbon
- $\text{O}^{(i)}$ -- the carbonyl oxygen
- $\text{N}^{(i+1)}$ -- the amide nitrogen
- $\text{H}^{(i+1)}$ -- the amide hydrogen
- $C_\alpha^{(i+1)}$ -- the alpha carbon of residue $i+1$
The dihedral angle $\omega$ (omega) describes rotation around the C-N peptide bond. For the trans configuration, $\omega = 180°$; for the cis configuration, $\omega = 0°$. The energy barrier to rotation about $\omega$ is approximately:
$$\Delta G^{\ddagger} \approx 80 \text{ kJ/mol}$$
This barrier is high enough that interconversion between cis and trans is slow and usually requires enzymatic catalysis (e.g., peptidyl-prolyl isomerases).
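To get a feel for what a rotation barrier of ~80 kJ/mol means kinetically, the sketch below applies the Eyring equation from transition-state theory. It assumes the barrier can be treated as a free energy of activation, a transmission coefficient of 1, and a temperature of 298.15 K. None of these assumptions come from the text, and the function name `eyring_rate` is illustrative.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J*s
R = 8.314462618      # gas constant, J/(mol*K)

def eyring_rate(barrier_kj_per_mol, temperature_k=298.15):
    """First-order rate constant (1/s) from the Eyring equation, kappa = 1."""
    prefactor = K_B * temperature_k / H
    exponent = -barrier_kj_per_mol * 1000.0 / (R * temperature_k)
    return prefactor * math.exp(exponent)

# Barrier value taken from the text (~80 kJ/mol); everything else is assumed
k = eyring_rate(80.0)
half_life_s = math.log(2) / k
print(f"k = {k:.3g} 1/s, half-life = {half_life_s:.3g} s")
```

Under these assumptions the uncatalyzed half-life comes out on the order of ten seconds, which is slow compared with many folding events and consistent with the need for isomerases. Because the rate depends exponentially on the barrier, a change of a few kJ/mol shifts the estimate several-fold, so the number should be read as an order of magnitude only.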
The Ramachandran Plot
While the peptide bond itself is rigid and planar, the protein backbone retains flexibility through rotation about two bonds flanking each $C_\alpha$. These rotations are described by two dihedral angles:
Backbone Dihedral Angles
- $\phi$ (phi): rotation about the $\text{N} - C_\alpha$ bond, defined by the atoms $\text{C}_{i-1} - \text{N}_i - C_{\alpha,i} - \text{C}_i$
- $\psi$ (psi): rotation about the $C_\alpha - \text{C}$ bond, defined by the atoms $\text{N}_i - C_{\alpha,i} - \text{C}_i - \text{N}_{i+1}$
Both angles range from $-180°$ to $+180°$. A plot of $\psi$ versus $\phi$ for all residues in a protein is called a Ramachandran plot (or Ramachandran diagram), first introduced by G.N. Ramachandran in 1963.
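The four-atom definitions of $\phi$ and $\psi$ translate directly into a calculation on atomic coordinates. The following is a minimal sketch in plain Python that computes a signed dihedral angle from four points using the usual right-handed (IUPAC-style) sign convention. The helper names and the toy coordinates are illustrative and do not come from a real structure.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    b2_len = math.sqrt(_dot(b2, b2))
    y = b2_len * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Toy coordinates: central bond along z, outer atoms in or out of the xz-plane
cis = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
quarter = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

For $\phi$ of residue $i$, the four points would be $\text{C}_{i-1}$, $\text{N}_i$, $C_{\alpha,i}$, $\text{C}_i$; for $\psi$ they would be $\text{N}_i$, $C_{\alpha,i}$, $\text{C}_i$, $\text{N}_{i+1}$. Using `atan2` rather than `acos` keeps the sign of the angle and covers the full $-180°$ to $+180°$ range.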
Steric Constraints and Allowed Regions
Not all ($\phi, \psi$) combinations are sterically allowed. Van der Waals repulsions between backbone atoms and the $C_\beta$ atom of the side chain restrict the accessible conformational space. The major allowed regions correspond to well-known secondary structures:
| Secondary Structure | $\phi$ (degrees) | $\psi$ (degrees) | Region |
|---|---|---|---|
| Right-handed $\alpha$-helix | ~$-57°$ | ~$-47°$ | Lower left quadrant |
| Parallel $\beta$-sheet | ~$-119°$ | ~$+113°$ | Upper left quadrant |
| Antiparallel $\beta$-sheet | ~$-139°$ | ~$+135°$ | Upper left quadrant |
| $3_{10}$-helix | ~$-49°$ | ~$-26°$ | Lower left quadrant |
| Left-handed $\alpha$-helix | ~$+57°$ | ~$+47°$ | Upper right quadrant |
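As a rough illustration of how the table can be used, the sketch below assigns a ($\phi, \psi$) pair to the nearest reference conformation. This nearest-point rule is a deliberate simplification chosen for the example, not how real allowed regions are defined: those are irregular areas derived from steric calculations or observed structures. The function names are illustrative.

```python
import math

# Reference (phi, psi) values in degrees, copied from the table
REFERENCE_ANGLES = {
    "right-handed alpha-helix": (-57.0, -47.0),
    "parallel beta-sheet": (-119.0, 113.0),
    "antiparallel beta-sheet": (-139.0, 135.0),
    "3-10 helix": (-49.0, -26.0),
    "left-handed alpha-helix": (57.0, 47.0),
}

def angular_difference(a, b):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def nearest_reference(phi, psi):
    """Return (label, distance in degrees) of the closest reference point."""
    def distance(ref):
        return math.hypot(angular_difference(phi, ref[0]),
                          angular_difference(psi, ref[1]))
    label = min(REFERENCE_ANGLES, key=lambda name: distance(REFERENCE_ANGLES[name]))
    return label, distance(REFERENCE_ANGLES[label])

label, dist = nearest_reference(-60.0, -45.0)
print(label, round(dist, 1))
```

The periodic `angular_difference` matters because $-179°$ and $+179°$ are only $2°$ apart on the plot. A returned distance of many tens of degrees suggests the residue does not resemble any of the listed conformations.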
Special Cases: Glycine and Proline
Glycine ($\text{R} = \text{H}$) has the greatest conformational flexibility because its side chain is a single hydrogen atom, with no $C_\beta$. Its Ramachandran plot shows a much larger allowed region compared to other residues, including the upper-right quadrant (left-handed helix region). Glycine residues are therefore commonly found in tight turns and loops where other amino acids would experience steric clashes.
Proline is the most conformationally restricted amino acid. Its cyclic pyrrolidine ring constrains $\phi$ to approximately $-65° \pm 15°$. This severely limits the allowed ($\phi, \psi$) space and explains why proline is rarely found within regular $\alpha$-helices (it disrupts the hydrogen bonding pattern, since it lacks an amide hydrogen).
Polypeptide Chains
A polypeptide is a linear chain of amino acid residues linked by peptide bonds. By convention, the chain is written and read from the N-terminus (free $\alpha$-amino group) to the C-terminus (free $\alpha$-carboxyl group). This directionality is analogous to the 5' to 3' convention for nucleic acids.
Backbone and Side Chains
The repeating unit of the polypeptide backbone consists of the atoms $\text{N} - C_\alpha - \text{C}$ (three atoms per residue). The backbone is the structural scaffold, while the side chains (R-groups) project outward and determine the chemical properties of each position. The backbone atoms form a regular repeating pattern with a rise per residue of 1.5 Å in an $\alpha$-helix (3.6 residues per turn, 5.4 Å pitch) and about 3.5 Å per residue in a fully extended $\beta$-strand.
Molecular Weight Calculations
The average molecular weight of an amino acid residue (after peptide bond formation, i.e., minus water) is approximately 110 Da (daltons). For a polypeptide of $n$ residues, each of the $n-1$ peptide bonds releases one water molecule ($M_r = 18$ Da), so the total molecular weight is:
$$M_r = \sum_{i=1}^{n} M_i - 18(n-1)$$
where $M_i$ is the molecular weight of the $i$-th free amino acid. Using the average residue weight approximation:
$$M_r \approx 128n - 18(n-1) \approx 110n$$
Here 128 Da is the average molecular weight of a free amino acid (weighted by typical amino acid composition), and each peptide bond removes 18 Da of water. The residue weight of ~110 Da is the standard approximation used in biochemistry.
Example: Estimating protein molecular weight
Human hemoglobin $\alpha$-chain contains 141 amino acid residues. Estimate its molecular weight.
$$M_r \approx 141 \times 110 \text{ Da} = 15{,}510 \text{ Da}$$
The actual molecular weight (from the amino acid sequence) is 15,126 Da, so the approximation is within ~2.5%. For more precise calculations, the exact molecular weights of each residue must be summed.
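The two levels of calculation described here (the 110 Da shortcut and the exact residue sum) can be sketched as follows. The residue mass table holds standard average residue masses rounded to two decimals. It is supplied for illustration rather than taken from the text, and the function names are hypothetical. Post-translational modifications and disulfide bonds are ignored.

```python
WATER = 18.02  # Da, average mass of H2O

# Average residue masses in Da (free amino acid minus one water)
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}

def approx_mass(n_residues, avg_residue_mass=110.0):
    """Quick estimate: number of residues times the average residue mass."""
    return n_residues * avg_residue_mass

def exact_mass(sequence):
    """Sum of residue masses plus one water for the free N- and C-termini."""
    return sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER

# Hemoglobin alpha-chain example: 141 residues, 15,126 Da from the sequence
estimate = approx_mass(141)
percent_error = (estimate - 15126) / 15126 * 100
print(f"{estimate:.0f} Da, {percent_error:.1f}% above the sequence-derived value")
```

Summing residue masses and adding a single water is equivalent to summing free amino acid masses and subtracting one water per peptide bond; it simply avoids counting the bonds.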
Protein Sequencing
The primary structure of a protein is its amino acid sequence. Determining this sequence is fundamental to understanding protein function. Several methods have been developed, each with distinct advantages.
Edman Degradation
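Developed by Pehr Edman in 1950, this method sequentially removes and identifies the N-terminal residue. The reagent phenylisothiocyanate (PITC) reacts with the free $\alpha$-amino group under mildly alkaline conditions to give a phenylthiocarbamoyl (PTC) derivative:
$$\text{Ph-N=C=S} + \text{H}_2\text{N-CHR}_1\text{-CO-NH-}\cdots \rightarrow \text{Ph-NH-CS-NH-CHR}_1\text{-CO-NH-}\cdots$$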
Treatment with anhydrous acid cleaves the PTC-amino acid derivative as a thiazolinone, which is then converted to the more stable phenylthiohydantoin (PTH)-amino acid and identified by HPLC. The cycle is repeated on the shortened peptide. Key limitations:
- Practical limit of ~50-60 residues per run (cumulative yield losses, ~2-5% per cycle)
- If the overall yield per cycle is $Y$, after $n$ cycles the remaining signal is $Y^n$
- Blocked N-termini (e.g., acetylated) cannot be sequenced by Edman degradation
- Requires ~1-5 pmol of pure protein
For a 95% yield per cycle after 60 cycles: $0.95^{60} = 0.046$ (only 4.6% of original signal remains).
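The $Y^n$ relationship can be explored numerically. The sketch below computes the remaining signal after a given number of cycles and, for an assumed detection threshold, the number of cycles that can be run before the signal drops below it. The 5% threshold and the 98% comparison yield are hypothetical values chosen for illustration.

```python
import math

def remaining_signal(yield_per_cycle, cycles):
    """Fraction of the original signal left after the given number of cycles."""
    return yield_per_cycle ** cycles

def max_cycles(yield_per_cycle, min_signal):
    """Largest n for which yield_per_cycle**n is still >= min_signal."""
    return math.floor(math.log(min_signal) / math.log(yield_per_cycle))

for y in (0.95, 0.98):
    signal = remaining_signal(y, 60)
    limit = max_cycles(y, 0.05)  # 5% threshold is an assumed cutoff
    print(f"yield {y:.2f}: {signal:.3f} left after 60 cycles, "
          f"{limit} cycles above 5%")
```

Improving the per-cycle yield from 95% to 98% more than doubles the usable read length under this threshold, which is why small repetitive losses dominate the practical limit.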
Mass Spectrometry
Modern protein sequencing relies heavily on tandem mass spectrometry (MS/MS). The protein is digested into peptides (typically with trypsin, which cleaves after Lys and Arg), and the peptides are analyzed by:
- MALDI-TOF: Matrix-assisted laser desorption/ionization with time-of-flight detection. Measures $m/z$ ratios to determine peptide masses
- ESI-MS/MS: Electrospray ionization with tandem mass analysis. Selected peptide ions are fragmented (CID) and the fragment masses reveal the sequence
- Peptide mass fingerprinting: The pattern of peptide masses is compared to theoretical digests of known proteins in databases
The mass accuracy of modern instruments is sufficient to distinguish most amino acid residues by mass alone. The main exception is leucine and isoleucine, which are isomers with identical mass (131.17 Da each as free amino acids); distinguishing them requires additional fragmentation methods (e.g., ETD or high-energy CID to generate $d$-ions and $w$-ions).
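The trypsin digestion step that precedes MS analysis can be simulated in silico. The following minimal sketch splits a one-letter sequence after every Lys (K) or Arg (R). The optional rule that skips cleavage when the next residue is Pro is a commonly used convention that the text does not state, so it is exposed as a parameter. The example peptide is made up, and splitting on a zero-width pattern requires Python 3.7 or later.

```python
import re

def tryptic_digest(sequence, skip_before_proline=True):
    """Split a one-letter sequence after every K or R.

    skip_before_proline applies the common rule that K/R followed by P
    is not cleaved; this rule is an assumption, not stated in the text.
    """
    pattern = r"(?<=[KR])(?!P)" if skip_before_proline else r"(?<=[KR])"
    # Drop the empty string produced when the sequence ends in K or R
    return [p for p in re.split(pattern, sequence.upper()) if p]

peptides = tryptic_digest("MKWVTFRPLLKAAR")  # hypothetical sequence
print(peptides)
```

Each resulting peptide ends in K or R, except possibly the C-terminal one. Real digests also contain missed cleavages, which this sketch does not model.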
Sanger's Method (Historical)
Frederick Sanger developed the first complete protein sequencing method in the 1950s, determining the sequence of bovine insulin (51 residues, 2 chains). His approach used:
- 2,4-dinitrofluorobenzene (DNFB) to label the N-terminal residue
- Partial acid hydrolysis to generate overlapping fragments
- Paper chromatography to separate and identify DNP-labeled amino acids
- Assembly of overlapping fragment sequences to reconstruct the complete sequence
Sanger received the Nobel Prize in Chemistry in 1958 for this work. His method demonstrated that each protein has a unique, genetically determined amino acid sequence.
Amino Acid Composition vs. Sequence
It is important to distinguish between amino acid composition (the total number of each amino acid present, without regard to order) and amino acid sequence (the precise order from N- to C-terminus). For a peptide of $n$ residues, the number of possible unique sequences is:
$$N = 20^n$$
For a modest 100-residue protein, $20^{100} \approx 1.27 \times 10^{130}$ possible sequences exist -- a number far exceeding the total number of atoms in the observable universe ($\sim 10^{80}$). This astronomical diversity explains how natural selection can produce proteins with extraordinarily specific functions.
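Both counts in this section are easy to check with exact integer arithmetic. The sketch below computes the number of possible sequences of a given length and, to make the composition-versus-sequence distinction concrete, the number of distinct sequences that share one composition (the multinomial coefficient $n!/(n_1!\,n_2!\cdots)$). The function names are illustrative.

```python
import math
from collections import Counter

def sequence_count(n, alphabet_size=20):
    """Number of possible sequences of length n over the given alphabet."""
    return alphabet_size ** n

def arrangements_of_composition(sequence):
    """Distinct orderings of a fixed composition: n! / (n_1! * n_2! * ...)."""
    total = math.factorial(len(sequence))
    for count in Counter(sequence).values():
        total //= math.factorial(count)
    return total

n100 = sequence_count(100)
print(f"20^100 is about 10^{math.log10(n100):.2f}")
print(arrangements_of_composition("AAG"))  # AAG, AGA, GAA
```

A composition of two Ala and one Gly is shared by three different tripeptides, so composition alone cannot identify a sequence, and the ambiguity grows very quickly with length.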
Post-Translational Modifications
After (or during) translation, many proteins undergo covalent chemical modifications that alter their properties. Over 400 types of post-translational modifications (PTMs) have been identified. These modifications dramatically expand the functional diversity of the proteome beyond what is encoded in the genome.
Phosphorylation
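The most common regulatory PTM. The terminal ($\gamma$) phosphoryl group ($\text{PO}_3^{2-}$) of ATP is transferred to the hydroxyl group of Ser, Thr, or Tyr by protein kinases:
$$\text{Protein-OH} + \text{ATP} \rightarrow \text{Protein-O-PO}_3^{2-} + \text{ADP}$$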
Phosphorylation introduces two negative charges at physiological pH, which can cause conformational changes, alter binding interfaces, or create docking sites for phospho-binding domains (e.g., SH2, 14-3-3). The reaction is reversed by phosphatases. Approximately 30% of all eukaryotic proteins are phosphorylated at any given time.
Glycosylation
The attachment of sugar moieties to proteins. Two major types exist:
- N-linked glycosylation: Oligosaccharide attached to the amide nitrogen of Asn in the sequence $\text{Asn-X-Ser/Thr}$ (where X is any amino acid except Pro). Begins in the ER; the glycan is further processed in the Golgi apparatus.
- O-linked glycosylation: Sugars attached to the hydroxyl group of Ser or Thr. Occurs primarily in the Golgi apparatus.
Glycosylation affects protein folding, stability, cell-cell recognition, and immune evasion. More than 50% of all human proteins are glycosylated.
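The $\text{Asn-X-Ser/Thr}$ consensus for N-linked glycosylation can be searched for with a regular expression. This is a minimal sketch; the function name and the test sequence are made up. A match marks only a potential site, since whether a given sequon is actually glycosylated depends on factors a sequence scan cannot capture.

```python
import re

# Lookahead so that overlapping sequons (e.g. N-N-T-S) are all reported
SEQUON = re.compile(r"(?=(N[^P][ST]))")

def find_sequons(sequence):
    """Return (1-based Asn position, tripeptide) for each Asn-X-Ser/Thr match."""
    return [(m.start() + 1, m.group(1))
            for m in SEQUON.finditer(sequence.upper())]

sites = find_sequons("MNGSANPTQNNTS")  # hypothetical sequence
print(sites)
```

In the example, the N-P-T stretch is skipped because Pro is excluded at the X position, while the overlapping N-N-T and N-T-S are both reported.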
Other Major PTMs
| Modification | Target Residues | Function | Reversible? |
|---|---|---|---|
| Acetylation | Lys ($\varepsilon$-amino); N-terminal $\alpha$-amino | Chromatin regulation, protein stability | Yes for Lys (deacetylases) |
| Methylation | Lys, Arg | Histone regulation, signal transduction | Yes (demethylases) |
| Ubiquitination | Lys | Proteasomal degradation, signaling | Yes (DUBs) |
| SUMOylation | Lys | Nuclear transport, transcription regulation | Yes (SENPs) |
| Disulfide bonding | Cys | Structural stabilization | Yes (redox) |
| Proteolytic cleavage | Specific sites | Activation (zymogens), processing | No |
Example: Ubiquitin-mediated proteolysis
Polyubiquitination (chains of $\geq 4$ ubiquitin molecules linked via Lys48) tags proteins for degradation by the 26S proteasome. The process requires three enzymes:
- E1 (ubiquitin-activating enzyme): activates ubiquitin via a thioester bond, consuming ATP
- E2 (ubiquitin-conjugating enzyme): receives ubiquitin from E1
- E3 (ubiquitin ligase): transfers ubiquitin to the target protein's Lys residue
Key Concepts
- The peptide bond is a covalent amide linkage formed by condensation of the $\alpha$-carboxyl and $\alpha$-amino groups, releasing $\text{H}_2\text{O}$.
- Resonance gives the peptide bond ~40% double-bond character, making it planar and rigid. Six atoms lie in the peptide plane.
- The trans configuration is strongly favored ($\omega = 180°$). The cis form is significant only for X-Pro bonds (~10%).
- Backbone flexibility comes from rotation about $\phi$ (N-$C_\alpha$) and $\psi$ ($C_\alpha$-C). The Ramachandran plot maps allowed ($\phi, \psi$) combinations.
- Glycine has maximal conformational freedom; proline has minimal freedom (fixed $\phi \approx -65°$).
- Polypeptides are read N-terminus to C-terminus. Average residue weight is ~110 Da, so $M_r \approx 110n$ for $n$ residues.
- Edman degradation sequentially removes N-terminal residues using PITC. Modern sequencing uses tandem mass spectrometry.
- Post-translational modifications (phosphorylation, glycosylation, acetylation, ubiquitination) expand the functional diversity of the proteome and regulate protein activity.