15. Translation
Translation converts the nucleotide language of mRNA into the amino acid language of proteins. The ribosome, a massive ribonucleoprotein machine, reads codons at ~20 amino acids per second in bacteria, building polypeptide chains with remarkable accuracy guided by the nearly universal genetic code.
Ribosome Structure and Function
The ribosome is composed of two asymmetric subunits, each containing both rRNA and proteins. The landmark crystal structures by Venki Ramakrishnan, Thomas Steitz, and Ada Yonath (Nobel Prize 2009) revealed that the ribosome is fundamentally a ribozyme: the peptidyl transferase center is composed entirely of rRNA (23S in prokaryotes, 28S in eukaryotes), with no protein atoms within 18 angstroms of the active site.
Prokaryotic (70S)
- 30S small subunit: 16S rRNA + 21 proteins; decoding center (codon-anticodon matching)
- 50S large subunit: 23S rRNA + 5S rRNA + 34 proteins; peptidyl transferase center, exit tunnel
Eukaryotic (80S)
- 40S small subunit: 18S rRNA + 33 proteins; scanning mechanism for AUG recognition
- 60S large subunit: 28S + 5.8S + 5S rRNA + 47 proteins; larger exit tunnel with chaperone docking
Three tRNA Binding Sites
The ribosome has three tRNA binding sites that span both subunits:
- A (aminoacyl) site: accepts incoming aminoacyl-tRNA; decoding occurs here
- P (peptidyl) site: holds tRNA carrying the growing peptide chain
- E (exit) site: deacylated tRNA exits here after translocation
Derivation 1: The Genetic Code (Degeneracy and Wobble)
The genetic code was deciphered by Marshall Nirenberg, Har Gobind Khorana, and Robert Holley (Nobel Prize 1968). It is a triplet, non-overlapping, degenerate, nearly universal code mapping 64 codons to 20 amino acids and 3 stop signals.
Why Triplets?
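The information-theoretic argument: with 4 nucleotide bases, a codon of length $n$ can spell $4^n$ distinct words, so the minimum codon length that can encode 20 amino acids is:
$$4^2 = 16 < 20 \leq 4^3 = 64 \quad\Rightarrow\quad n_{\min} = 3$$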
Triplets are the minimum, but 64 codons for 20 amino acids (+ stops) means the code is degenerate: most amino acids are specified by more than one codon. This degeneracy is not random: it follows a pattern that minimizes the impact of mutations.
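The counting argument can be checked in a few lines. This is a minimal sketch; the function name `min_codon_length` is made up for illustration, and the count of 21 meanings assumes 20 amino acids plus at least one stop signal.

```python
def min_codon_length(n_bases: int, n_meanings: int) -> int:
    """Smallest word length n such that n_bases**n >= n_meanings."""
    n = 1
    while n_bases ** n < n_meanings:
        n += 1
    return n

# 20 amino acids plus at least one stop signal = 21 meanings to encode
print(min_codon_length(4, 21))  # 3
print(4 ** 2, 4 ** 3)           # 16 64
```

With doublets only 16 words are available, so triplets are the shortest option, and the 64 - 21 = 43 surplus words are what make degeneracy possible.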
The Wobble Hypothesis
Francis Crick proposed the wobble hypothesis (1966) to explain how fewer tRNAs (~45 in mammals) can decode all 61 sense codons. Non-standard base pairing is allowed between the 3rd codon position (the wobble position) and the 5' base of the anticodon:
- Anticodon G pairs with codon C or U (wobble)
- Anticodon U pairs with codon A or G (wobble)
- Anticodon I (inosine, from deamination of A) pairs with codon U, C, or A
The wobble rules explain why degeneracy is concentrated at the 3rd codon position. The first two positions follow strict Watson-Crick rules, so mutations there are more likely to change the encoded amino acid.
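The wobble rules above can be turned into a small lookup that lists which codons a given anticodon reads. This is an illustrative sketch only: the names `WOBBLE` and `codons_read` are hypothetical, the entries for anticodon C and A (strict Watson-Crick pairing) are added for completeness and are not taken from the list above, and modified bases other than inosine are ignored.

```python
# 5' anticodon base -> codon 3rd-position bases it can pair with
WOBBLE = {
    "G": "CU",   # wobble (from the rules above)
    "U": "AG",   # wobble (from the rules above)
    "I": "UCA",  # inosine (from the rules above)
    "C": "G",    # strict Watson-Crick (assumption added for completeness)
    "A": "U",    # strict Watson-Crick (assumption added for completeness)
}
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def codons_read(anticodon: str) -> list[str]:
    """Codons (5'->3') decoded by an anticodon written 5'->3'."""
    wobble_base, middle, third = anticodon
    first = COMPLEMENT[third]    # codon position 1 pairs with the anticodon 3' base
    second = COMPLEMENT[middle]  # codon position 2 pairs with the middle base
    return [first + second + b for b in WOBBLE[wobble_base]]

print(codons_read("GAA"))  # ['UUC', 'UUU'] (both phenylalanine codons)
print(codons_read("IGC"))  # ['GCU', 'GCC', 'GCA'] (three alanine codons)
```

Because only the last codon position tolerates alternative pairings, every codon returned for one anticodon shares its first two bases, which is the pattern of degeneracy described above.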
Error Minimization
The genetic code is organized to minimize the phenotypic impact of point mutations. Conservative amino acid substitutions (similar physicochemical properties) are overrepresented:
This ~2-fold enrichment suggests the code was optimized by natural selection, though the degree of optimization is debated. Most single-nucleotide mutations either change the amino acid to a similar one or are silent (synonymous), a built-in buffer against translational and replication errors.
Derivation 2: The Translation Mechanism
Initiation
Translation initiation differs fundamentally between prokaryotes and eukaryotes:
Prokaryotic Initiation
The Shine-Dalgarno sequence (AGGAGG, 5-10 nt upstream of AUG) base-pairs with the anti-Shine-Dalgarno sequence in 16S rRNA, positioning the start codon in the P site:
IF1, IF2 (a GTPase), and IF3 assemble the 30S initiation complex with fMet-tRNA$_f^{\text{Met}}$.
Eukaryotic Initiation
The scanning model: the 43S pre-initiation complex (40S + eIF1, eIF1A, eIF3, eIF2-GTP-Met-tRNA$_i^{\text{Met}}$) binds the 5' cap via the eIF4F complex, then scans 5' to 3' until it encounters the first AUG in a favorable Kozak context (consensus 5'-GCCRCCAUGG-3', where R is a purine and the A of AUG is numbered +1).
The purine at -3 and G at +4 are most critical. eIF2-GTP hydrolysis commits to the chosen AUG.
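As a rough illustration of scanning, the sketch below walks an mRNA 5' to 3' and returns the first AUG that has a purine at -3 and a G at +4. The function name `first_kozak_aug` and the `require_both` switch are hypothetical, and the all-or-nothing rule is a simplification: real start-site selection is probabilistic (leaky scanning), not a strict filter.

```python
def first_kozak_aug(mrna: str, require_both: bool = True):
    """Return the 0-based index of the first AUG in a favorable context, or None."""
    mrna = mrna.upper()
    for i in range(len(mrna) - 2):
        if mrna[i:i + 3] != "AUG":
            continue
        # The A of AUG is +1, so -3 is three bases upstream and +4 follows the G
        purine_at_minus3 = i >= 3 and mrna[i - 3] in "AG"
        g_at_plus4 = i + 3 < len(mrna) and mrna[i + 3] == "G"
        if require_both:
            favorable = purine_at_minus3 and g_at_plus4
        else:
            favorable = purine_at_minus3 or g_at_plus4
        if favorable:
            return i
    return None

# The first AUG (index 6) has U at -3 and C at +4, so it is skipped
print(first_kozak_aug("GGCUUUAUGCCCGCCACCAUGGCG"))  # 18
```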
Elongation: The Three-Step Cycle
Each amino acid addition involves three steps, consuming 2 GTP:
1. Codon Recognition (EF-Tu/eEF1A)
Aminoacyl-tRNA is delivered to the A site as a ternary complex with EF-Tu-GTP. Correct codon-anticodon matching triggers GTP hydrolysis and EF-Tu release (kinetic proofreading):
2. Peptide Bond Formation (Peptidyl Transferase)
The $\alpha$-amino group of the A-site aminoacyl-tRNA attacks the carbonyl carbon of the P-site peptidyl-tRNA. This is catalyzed by the 23S rRNA (ribozyme):
No additional energy input is required: the reaction is driven by the high free energy of the aminoacyl-tRNA ester bond.
3. Translocation (EF-G/eEF2)
EF-G-GTP binds and its GTP hydrolysis drives the ribosome to translocate one codon (3 nt) in the 5' to 3' direction:
Codon Usage Bias and Translation Efficiency
Although the genetic code is degenerate, synonymous codons are not used equally. Highly expressed genes preferentially use codons corresponding to the most abundant tRNAs, a phenomenon called codon usage bias.
The Codon Adaptation Index (CAI)
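The CAI quantifies how well a gene's codon usage matches the optimal codons of the organism. It is the geometric mean of the relative adaptiveness values of the gene's codons:
$$\text{CAI} = \left(\prod_{i=1}^{L} w_i\right)^{1/L}$$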
where $w_i$ is the relative adaptiveness of the $i$-th codon and $L$ is the gene length in codons. CAI ranges from 0 (poor adaptation) to 1 (optimal). Highly expressed genes (ribosomal proteins, glycolytic enzymes) have CAI > 0.8 in E. coli.
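A geometric mean over $w_i$ is easiest to compute in log space. The sketch below assumes the relative adaptiveness values are already known; the numbers in `w_example` are hypothetical placeholders chosen for illustration, not measured E. coli values.

```python
import math

def cai(codons, w):
    """Geometric mean of relative adaptiveness values w[codon] over a gene's codons."""
    # Summing logs avoids underflow when multiplying many values below 1
    log_sum = sum(math.log(w[c]) for c in codons)
    return math.exp(log_sum / len(codons))

# Hypothetical relative adaptiveness values (illustrative only, not real data)
w_example = {"CUG": 1.0, "CUA": 0.1, "AAA": 1.0, "AAG": 0.25}

print(cai(["CUG", "AAA"], w_example))  # only optimal codons: 1.0
print(cai(["CUA", "AAG"], w_example))  # only rare codons: about 0.16
```

Because the mean is geometric, a handful of very rare codons pulls the score down more than it would under an arithmetic average. Every $w_i$ must be greater than zero for the logarithm to be defined.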
Practical Applications
Codon optimization is critical for recombinant protein expression: when expressing a human protein in E. coli (or vice versa), the gene sequence is redesigned to use the host's preferred codons. This can increase protein yields 10-100 fold. The mRNA vaccines for COVID-19 were codon-optimized for human codon usage and used modified nucleosides (N1-methylpseudouridine) to enhance translation efficiency and reduce innate immune recognition.
Rare codons also have functional roles: they can slow translation at specific positions, allowing co-translational protein folding of individual domains before the next domain begins to emerge from the ribosome. Deliberately placed rare codons act as "translational speed bumps" that improve folding efficiency for complex multidomain proteins.
Derivation 3: Energetics of Translation
Translation is the most energetically expensive biosynthetic process in the cell, consuming up to ~75% of a cell's ATP in rapidly growing bacteria.
Cost per Amino Acid
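For each amino acid incorporated, the cell expends:
$$\underbrace{2\ \text{ATP}}_{\text{tRNA charging}} + \underbrace{1\ \text{GTP}}_{\text{decoding (EF-Tu)}} + \underbrace{1\ \text{GTP}}_{\text{translocation (EF-G)}} \approx 4\ \text{ATP equivalents}$$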
For a typical 300-residue protein, the translation cost is ~1200 ATP. Adding initiation factors, termination, and quality control increases this to ~1300-1400 ATP per protein. For perspective, a single glucose molecule yields only ~32 ATP, so synthesizing one protein requires the oxidation of ~40 glucose molecules.
Translation Speed and Fidelity
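In E. coli, the ribosome incorporates ~20 amino acids/sec. The error rate (~$10^{-4}$ per codon) is achieved through a kinetic proofreading mechanism (Hopfield, 1974), in which two sequential selection steps with discrimination factors $f_1$ and $f_2$ multiply:
$$\text{error rate} \approx f_1 \times f_2 \approx 10^{-2} \times 10^{-2} = 10^{-4}$$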
The first selection step is initial binding ($\sim 10^{-2}$ discrimination). GTP hydrolysis then provides an irreversible step that allows a second selection ($\sim 10^{-2}$), amplifying the fidelity at the cost of energy (the sacrificed GTP).
Aminoacyl-tRNA Synthetases: The Second Genetic Code
The aminoacyl-tRNA synthetases (aaRS) are responsible for correctly matching each amino acid to its cognate tRNA, a process Paul Schimmel called the "second genetic code" because the ribosome does not verify the amino acid attached to a tRNA, so charging errors that escape the synthetase are propagated into the protein.
The Aminoacylation Reaction
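Each of the 20 aaRS catalyzes a two-step reaction:
$$\text{AA} + \text{ATP} \rightarrow \text{AA-AMP} + \text{PP}_i$$
$$\text{AA-AMP} + \text{tRNA} \rightarrow \text{AA-tRNA} + \text{AMP}$$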
The PP$_i$ is immediately hydrolyzed by pyrophosphatase ($\text{PP}_i \rightarrow 2\;\text{P}_i$), making the overall reaction irreversible (equivalent to 2 ATP).
Two Classes of aaRS
Class I (10 aaRS)
Monomers or dimers; Rossmann fold; aminoacylate the 2'-OH of tRNA; approach tRNA from the minor groove side. Amino acids: Arg, Cys, Glu, Gln, Ile, Leu, Met, Trp, Tyr, Val.
Class II (10 aaRS)
Primarily dimers or tetramers; antiparallel $\beta$-sheet; aminoacylate the 3'-OH of tRNA; approach from the major groove side. Amino acids: Ala, Asn, Asp, Gly, His, Lys, Phe, Pro, Ser, Thr.
Editing (Proofreading) by aaRS
Some amino acids are structurally similar (e.g., isoleucine vs valine, threonine vs serine), and the initial discrimination by the active site is insufficient. Several aaRS have a separate editing domain that hydrolyzes mischarged AA-tRNA. Isoleucyl-tRNA synthetase (IleRS) is the classic example:
IleRS uses a "double sieve" mechanism: the synthetic site excludes amino acids larger than isoleucine (steric exclusion), while the editing site is too small to admit isoleucine but accepts and hydrolyzes the smaller valine product (valine-AMP or mischarged Val-tRNA). This achieves $\sim 1/40{,}000$ error rate, well below the ~1/3,000 rate from the synthetic site alone.
Co-Translational Protein Folding and Chaperones
Newly synthesized polypeptides do not fold in a vacuum. The cellular environment is extremely crowded (~300-400 mg/mL protein), increasing the risk of misfolding and aggregation. A sophisticated chaperone system assists folding:
- Trigger factor (bacteria) / NAC (eukaryotes): bind the ribosome exit tunnel and shield the nascent chain during co-translational folding
- Hsp70/DnaK system: ATP-dependent chaperone that binds hydrophobic stretches on unfolded proteins, preventing aggregation and allowing iterative folding attempts
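- Hsp60/GroEL-GroES (chaperonin): a barrel-shaped cavity where proteins up to ~60 kDa fold in isolation from the crowded cytoplasm; each folding cycle hydrolyzes 7 ATP (one per subunit of the active ring), and substrates that need many cycles can consume on the order of 100 ATP per folded protein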
- Hsp90: specialized chaperone for signaling proteins (kinases, steroid receptors, transcription factors); target of the anticancer drug geldanamycin
The Unfolded Protein Response (UPR)
When misfolded proteins accumulate in the endoplasmic reticulum, the unfolded protein response (UPR) is activated through three ER transmembrane sensors: IRE1, PERK, and ATF6.
If the UPR fails to resolve ER stress, the cell activates apoptosis via CHOP (C/EBP homologous protein). Chronic ER stress and UPR activation are implicated in diabetes (beta-cell failure), neurodegeneration (Parkinson's, Alzheimer's), and liver disease. Protein misfolding diseases (amyloidoses), including Alzheimer's ($\beta$-amyloid), Parkinson's ($\alpha$-synuclein), and prion diseases (PrP$^{\text{Sc}}$), result from the failure of quality control to prevent toxic aggregation.
Derivation 4: Termination and Quality Control
Termination
When a stop codon (UAA, UAG, or UGA) enters the A site, no aminoacyl-tRNA matches. Instead, release factors recognize the stop codon and catalyze peptide release:
- Prokaryotes: RF1 (recognizes UAA/UAG), RF2 (recognizes UAA/UGA), RF3 (GTPase for RF recycling)
- Eukaryotes: eRF1 (recognizes all three stop codons), eRF3 (GTPase)
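Putting initiation, elongation, and termination together, a toy translator reads from the first AUG and stops at the first in-frame stop codon. This is a minimal sketch using the standard genetic code. The names `CODON_TABLE` and `translate` are illustrative, and the function ignores everything the ribosome does beyond decoding (start-site context, factors, proofreading).

```python
from itertools import product

# Standard genetic code, codons ordered U, C, A, G at each position; '*' = stop
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(mrna: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon."""
    mrna = mrna.upper().replace("T", "U")
    start = mrna.find("AUG")
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # UAA, UAG, or UGA in the A site: release factors act here
            break
        peptide.append(aa)
    return "".join(peptide)

print(translate("GGAUGUUUAAAUGGUAAGC"))  # MFKW
```

If the reading frame runs off the 3' end without meeting a stop codon, the function simply returns what it has, whereas a real ribosome would stall at that point.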
Ribosome-Associated Quality Control
Three quality control pathways handle aberrant mRNAs:
Nonsense-Mediated Decay (NMD)
Detects premature stop codons (more than ~50 nt upstream of the last exon-exon junction). UPF1, UPF2, and UPF3 act with the SMG1 kinase to trigger mRNA degradation, preventing production of truncated, potentially toxic proteins.
Non-Stop Decay (NSD)
When mRNA lacks a stop codon (e.g., broken mRNA), the ribosome reaches the 3' end and stalls. Ski7/Dom34-Hbs1 complex targets the mRNA for exosome degradation and rescues the stalled ribosome.
No-Go Decay (NGD)
Acts on ribosomes stalled by mRNA secondary structures, rare codons, or damaged mRNA. Dom34/Hbs1 split the ribosome, and the truncated peptide is targeted for proteasomal degradation via the RQC (ribosome quality control) complex.
Polyribosomes and Translational Control
Multiple ribosomes can simultaneously translate a single mRNA, forming polyribosomes (polysomes). The spacing between ribosomes is ~80 nucleotides (~27 codons), meaning a typical 1 kb mRNA can accommodate ~12 ribosomes simultaneously, dramatically increasing the protein production rate.
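The polysome arithmetic can be written out explicitly. This is a back-of-the-envelope sketch: both function names are hypothetical, the default elongation rate of 20 amino acids/sec is the E. coli figure quoted elsewhere in this chapter, and the output rate assumes evenly spaced ribosomes at steady state with no initiation bottleneck.

```python
def ribosomes_per_mrna(mrna_length_nt: int, spacing_nt: int = 80) -> int:
    """Number of ribosomes that fit on an mRNA at the given center-to-center spacing."""
    return mrna_length_nt // spacing_nt

def proteins_per_second(elongation_rate_aa_s: float = 20.0, spacing_nt: int = 80) -> float:
    """Completion rate per mRNA: one ribosome finishes each time the train advances one spacing."""
    spacing_codons = spacing_nt / 3
    return elongation_rate_aa_s / spacing_codons

print(ribosomes_per_mrna(1000))  # 12
print(proteins_per_second())     # about 0.75 proteins per second per mRNA
```

Under these assumptions the output rate depends on spacing and elongation speed, not on mRNA length: a longer message holds more ribosomes, but each takes proportionally longer to finish.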
mRNA Circularization
Eukaryotic mRNAs adopt a closed-loop conformation during active translation, mediated by the interaction between the 5' cap-binding complex (eIF4E-eIF4G) and the 3' poly(A)-binding protein (PABP), with eIF4G bridging the two ends: 5' cap-eIF4E-eIF4G-PABP-poly(A) tail.
This circularization enhances translation efficiency by allowing ribosomes that have just terminated to be "recycled" directly to the 5' cap for re-initiation, without dissociating into the cytoplasm. It also ensures that only intact mRNAs (with both a cap and poly(A) tail) are efficiently translated, providing quality control against truncated or decapped mRNAs.
Global Translational Control
Under stress conditions, cells globally reduce translation to conserve energy. The key mechanism involves phosphorylation of eIF2$\alpha$ at Ser-51 by four stress-responsive kinases:
- GCN2: activated by uncharged tRNAs (amino acid starvation)
- PERK: activated by ER stress (unfolded protein response)
- HRI: activated by heme deficiency (erythroid cells)
- PKR: activated by double-stranded RNA (viral infection)
Paradoxically, eIF2$\alpha$ phosphorylation increases translation of specific mRNAs with upstream open reading frames (uORFs) in their 5' UTR, including ATF4 (stress-responsive transcription factor) and CHOP (pro-apoptotic). This integrated stress response (ISR) allows the cell to reprogram gene expression while reducing overall protein synthesis, a sophisticated survival strategy.
mTORC1 and Cap-Dependent Translation
The mTORC1 kinase promotes cap-dependent translation by phosphorylating two key targets: 4E-BP1 (eIF4E-binding protein 1) and S6K1 (ribosomal protein S6 kinase 1).
In the unphosphorylated state, 4E-BP1 sequesters eIF4E, blocking cap recognition. mTORC1-mediated phosphorylation releases eIF4E to join the eIF4F complex, enabling cap-dependent translation. Rapamycin (sirolimus) inhibits mTORC1, reducing translation of mRNAs with complex 5' UTRs (often encoding growth factors and oncoproteins), hence its use as an immunosuppressant and anticancer agent.
Derivation 5: Post-Translational Modifications
The proteome's diversity vastly exceeds the genome's coding capacity because proteins are extensively modified after translation. Over 400 types of post-translational modifications (PTMs) have been identified, expanding the functional repertoire from 20 amino acids to effectively thousands of distinct chemical states.
Phosphorylation
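The most common regulatory PTM, catalyzed by >500 protein kinases encoded in the human genome (the "kinome"), which transfer the $\gamma$-phosphate of ATP to a Ser, Thr, or Tyr hydroxyl group:
$$\text{Protein-OH} + \text{ATP} \rightarrow \text{Protein-O-PO}_3^{2-} + \text{ADP}$$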
Phosphorylation adds 2 negative charges at physiological pH, causing conformational changes and creating docking sites for phospho-binding domains (SH2, PTB, 14-3-3). ~30% of human proteins are phosphorylated at any given time. Phosphatases reverse the modification, creating a dynamic on/off switch.
Glycosylation
Two major types: N-linked (to Asn in the sequon Asn-X-Ser/Thr; begins in ER with dolichol-linked oligosaccharide) and O-linked (to Ser/Thr; occurs in Golgi). Glycosylation affects protein folding, stability, and cell-cell recognition. About 50% of human proteins are glycosylated. Congenital disorders of glycosylation (CDGs) cause multisystem disease.
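The N-linked sequon is simple enough to search for with a regular expression. The sketch below is illustrative: `find_sequons` is a hypothetical helper, and excluding proline at the X position is a commonly cited refinement that is not stated in the text above. A match only marks a potential site; whether it is glycosylated depends on folding and accessibility in the ER.

```python
import re

# Asn-X-Ser/Thr, with X taken as any residue except Pro (assumption noted above).
# The lookahead makes the match zero-width so overlapping sequons are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")

def find_sequons(protein: str) -> list[int]:
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]

# N2 (NGT) matches, N6 (NPS) is skipped, N9 (NNS) and N10 (NSS) overlap
print(find_sequons("MNGTANPSNNSS"))  # [2, 9, 10]
```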
Ubiquitination
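The attachment of ubiquitin (76 amino acids) to lysine residues marks proteins for proteasomal degradation. The ubiquitin-proteasome system (Aaron Ciechanover, Avram Hershko, Irwin Rose; Nobel Prize 2004) works through an enzymatic cascade: E1 (ubiquitin-activating enzyme, ATP-dependent) $\rightarrow$ E2 (ubiquitin-conjugating enzyme) $\rightarrow$ E3 (ubiquitin ligase, which selects the substrate).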
Polyubiquitin chains linked via Lys-48 of ubiquitin signal proteasomal degradation. Other chain types (Lys-63, linear) serve non-degradative functions (NF-$\kappa$B signaling, DNA repair). The human genome encodes >600 E3 ubiquitin ligases, providing remarkable substrate specificity. The N-end rule pathway links protein half-life to the identity of the N-terminal amino acid.
Protein Sorting: Directing Proteins to Their Destinations
Newly synthesized proteins must be directed to the correct subcellular compartment. In eukaryotic cells, proteins can be targeted to the ER/secretory pathway, mitochondria, nucleus, peroxisomes, or remain in the cytoplasm. Targeting depends on signal sequences, short peptide motifs recognized by specific receptors:
- ER signal peptide (~16-30 aa, hydrophobic core): recognized by SRP during translation; co-translational translocation via Sec61 translocon
- Mitochondrial targeting sequence (N-terminal, amphipathic helix): recognized by TOM/TIM complexes on mitochondrial membranes; post-translational import driven by membrane potential and mtHsp70
- Nuclear localization signal (NLS) (Lys/Arg-rich, e.g., KKKRKV): recognized by importin $\alpha/\beta$; import through nuclear pore complex (NPC) powered by Ran-GTP gradient
- Peroxisomal targeting signal (PTS1) (C-terminal SKL tripeptide): recognized by Pex5 receptor; post-translational import
ER-Associated Degradation (ERAD)
Misfolded proteins in the ER are retrotranslocated to the cytoplasm and degraded by the proteasome, a process called ERAD. The ER quality control system uses a lectin-based mechanism: the N-linked glycan on newly synthesized proteins is sequentially trimmed by glucosidases and mannosidases. If the protein fails to fold properly within the allotted time, mannose trimming marks it for ERAD.
ERAD is clinically relevant: the most common cystic fibrosis mutation ($\Delta$F508-CFTR) produces a protein that folds slowly but is functional. ERAD prematurely degrades it before it reaches the plasma membrane. Corrector drugs (lumacaftor in Orkambi; tezacaftor and elexacaftor in Trikafta) stabilize the mutant CFTR, allowing it to escape ERAD and reach the cell surface, a triumph of molecular understanding informing therapy.
Applications in Medicine and Research
Ribosome-Targeting Antibiotics
Structural differences between 70S and 80S ribosomes enable selective targeting of bacteria. Tetracyclines block A-site tRNA binding; chloramphenicol inhibits peptidyl transferase; macrolides (erythromycin) block the exit tunnel; aminoglycosides (gentamicin) cause misreading by distorting the decoding center.
Diphtheria and Ricin
Diphtheria toxin ADP-ribosylates eEF2 (diphthamide residue), blocking translocation and killing the cell. A single toxin molecule can inactivate all eEF2 in a cell. Ricin (from castor beans) is an N-glycosidase that depurinates 28S rRNA, inactivating the ribosome. Both illustrate the vulnerability of translation.
mRNA Therapeutics
The COVID-19 mRNA vaccines (Pfizer/BioNTech, Moderna) exploit the translation machinery: synthetic mRNA encoding the spike protein is delivered to ribosomes, which produce the antigen in situ. Modified nucleosides (N1-methylpseudouridine) reduce innate immune recognition and enhance translation efficiency.
Proteasome Inhibitors
Bortezomib (Velcade) inhibits the 26S proteasome, causing accumulation of pro-apoptotic proteins. FDA-approved for multiple myeloma. Cancer cells are more sensitive due to higher protein turnover rates. Thalidomide/lenalidomide redirect E3 ubiquitin ligase cereblon to degrade key oncoproteins (IKZF1/3).
Historical Context
The central dogma (DNA $\rightarrow$ RNA $\rightarrow$ protein) was articulated by Francis Crick in 1958. Nirenberg's landmark experiment (1961) used poly-U mRNA in a cell-free system to show that UUU encodes phenylalanine, the first codon assignment. By 1966, all 64 codons were deciphered. The ribosome crystal structures (2000) revealed that peptide bond formation is catalyzed by RNA, supporting the "RNA world" hypothesis that RNA preceded proteins as catalysts in early life.
The Signal Recognition Particle and Protein Targeting
Proteins destined for the secretory pathway, plasma membrane, or organelles contain signal sequences that direct them to the endoplasmic reticulum during translation. Günter Blobel proposed the signal hypothesis (Nobel Prize 1999).
The signal recognition particle (SRP) is a ribonucleoprotein (6 proteins + 7SL RNA) that binds the hydrophobic signal peptide as it emerges from the ribosome exit tunnel, pausing translation until the ribosome docks at the ER membrane. The signal peptide is cleaved by signal peptidase in the ER lumen, and the protein is co-translationally folded by ER chaperones (BiP/GRP78, calnexin, calreticulin).
Protein Degradation: The N-End Rule
Alexander Varshavsky discovered the N-end rule (1986), which relates a protein's half-life to the identity of its N-terminal amino acid. In eukaryotes, destabilizing N-terminal residues (Arg, Lys, His, Phe, Trp, Tyr, Leu, Ile) are recognized by specific E3 ubiquitin ligases (N-recognins, UBR1/2), targeting the protein for proteasomal degradation:
The N-end rule has been expanded to include the Ac/N-end rule (N-terminally acetylated residues) and the Pro/N-end rule (proline-specific). These pathways regulate the half-lives of ~30% of cellular proteins and play critical roles in chromosome segregation, cardiovascular development, and neurodegeneration.
Selenocysteine: The 21st Amino Acid
Selenocysteine (Sec) is co-translationally incorporated at UGA codons (normally a stop codon) when a specific SECIS element (selenocysteine insertion sequence) is present in the 3' UTR of the mRNA. The process requires a specialized tRNA (tRNA$^{\text{Sec}}$), a dedicated elongation factor (eEFSec), and SECIS-binding protein 2 (SBP2). The human genome encodes 25 selenoproteins, including glutathione peroxidases, thioredoxin reductases, and iodothyronine deiodinases, which function in redox homeostasis and thyroid hormone metabolism.
Python Simulations
Genetic Code Degeneracy and Translation Energetics
Analyze codon degeneracy across amino acids and calculate the ATP cost of protein synthesis.
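The page's original simulation code is not reproduced here. What follows is a minimal stand-in sketch, assuming the standard genetic code and the chapter's figure of ~4 ATP equivalents per residue; the names `degeneracy` and `translation_cost_atp` are illustrative.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered U, C, A, G at each position; '*' = stop
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Number of codons assigned to each amino acid (one-letter code) or stop
degeneracy = Counter(CODON_TABLE.values())

def translation_cost_atp(n_residues: int, per_residue: int = 4) -> int:
    """Elongation cost: 2 ATP (charging) + 1 GTP (EF-Tu) + 1 GTP (EF-G) per residue."""
    return n_residues * per_residue

# Most degenerate first, ties broken alphabetically
for aa, n in sorted(degeneracy.items(), key=lambda kv: (-kv[1], kv[0])):
    print(aa, n)

print(translation_cost_atp(300))  # 1200
```

Leu, Ser, and Arg come out sixfold degenerate, Met and Trp have a single codon each, and the 300-residue cost matches the ~1200 ATP estimate given earlier (initiation, termination, and quality control are not included).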
Ribosome Structure and Antibiotic Targets
Compare prokaryotic and eukaryotic ribosome composition and map antibiotic mechanisms of action.
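The page's original simulation code is not reproduced here. As a stand-in, this sketch tabulates the subunit compositions and antibiotic mechanisms exactly as given in this chapter; the dictionary layout and the helper `total_proteins` are illustrative choices.

```python
# Subunit composition as listed in the Ribosome Structure section
RIBOSOMES = {
    "prokaryotic 70S": {
        "small": {"name": "30S", "rRNA": ["16S"], "proteins": 21},
        "large": {"name": "50S", "rRNA": ["23S", "5S"], "proteins": 34},
    },
    "eukaryotic 80S": {
        "small": {"name": "40S", "rRNA": ["18S"], "proteins": 33},
        "large": {"name": "60S", "rRNA": ["28S", "5.8S", "5S"], "proteins": 47},
    },
}

# Mechanisms as listed in the Ribosome-Targeting Antibiotics section
ANTIBIOTICS = {
    "tetracyclines": "block A-site tRNA binding",
    "chloramphenicol": "inhibits peptidyl transferase",
    "macrolides (erythromycin)": "block the exit tunnel",
    "aminoglycosides (gentamicin)": "cause misreading by distorting the decoding center",
}

def total_proteins(ribosome: str) -> int:
    """Sum the ribosomal protein counts of both subunits."""
    return sum(sub["proteins"] for sub in RIBOSOMES[ribosome].values())

for name, subunits in RIBOSOMES.items():
    parts = ", ".join(f"{s['name']}: {'+'.join(s['rRNA'])} rRNA, {s['proteins']} proteins"
                      for s in subunits.values())
    print(f"{name} ({total_proteins(name)} proteins total): {parts}")

for drug, mechanism in ANTIBIOTICS.items():
    print(f"{drug}: {mechanism}")
```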
Protein Half-Life Distribution and PTM Landscape
Explore the distribution of protein half-lives in human cells and the frequency of major post-translational modifications.
Key Takeaways
- Ribosomes are ribozymes: peptidyl transferase activity resides in rRNA (23S/28S), not protein.
- The genetic code is degenerate (64 codons for 20 AAs + 3 stops), with wobble base-pairing allowing fewer tRNAs than codons.
- Translation initiation differs: prokaryotes use Shine-Dalgarno/16S pairing; eukaryotes use cap-dependent scanning (Kozak sequence).
- Each amino acid costs ~4 ATP equivalents (2 for charging, 1 GTP for decoding, 1 GTP for translocation).
- Kinetic proofreading (Hopfield) achieves ~$10^{-4}$ error rate by sacrificing GTP for a second selection step.
- Post-translational modifications (phosphorylation, glycosylation, ubiquitination) vastly expand proteomic diversity beyond the genetic code.
- Aminoacyl-tRNA synthetases use a "double sieve" editing mechanism to achieve ~1/40,000 error rate for similar amino acids.
- Chaperones (Hsp70, Hsp60/GroEL, Hsp90) assist co-translational folding; the UPR (IRE1, PERK, ATF6) responds to ER protein misfolding stress.
- Polyribosomes and mRNA circularization (eIF4E-eIF4G-PABP) enhance translation efficiency through ribosome recycling.
- The integrated stress response (eIF2$\alpha$ phosphorylation by GCN2/PERK/HRI/PKR) globally reduces translation while selectively upregulating stress-response genes.
- mTORC1 controls cap-dependent translation via 4E-BP1 and S6K1; rapamycin inhibition preferentially reduces translation of oncogenic mRNAs.