15. Translation

Translation converts the nucleotide language of mRNA into the amino acid language of proteins. The ribosome, a massive ribonucleoprotein machine, reads codons and adds ~20 amino acids per second in bacteria, building polypeptide chains with remarkable accuracy guided by the nearly universal genetic code.

Ribosome Structure and Function

The ribosome is composed of two asymmetric subunits, each containing both rRNA and proteins. The landmark crystal structures by Venki Ramakrishnan, Thomas Steitz, and Ada Yonath (Nobel Prize 2009) revealed that the ribosome is fundamentally a ribozyme: the peptidyl transferase center is composed entirely of rRNA (23S in prokaryotes, 28S in eukaryotes), with no protein atoms within 18 angstroms of the active site.

Prokaryotic (70S)

30S small subunit: 16S rRNA + 21 proteins; decoding center (codon-anticodon matching)
50S large subunit: 23S rRNA + 5S rRNA + 34 proteins; peptidyl transferase center, exit tunnel

Eukaryotic (80S)

40S small subunit: 18S rRNA + 33 proteins; scanning mechanism for AUG recognition
60S large subunit: 28S + 5.8S + 5S rRNA + 47 proteins; larger exit tunnel with chaperone docking

Three tRNA Binding Sites

The ribosome has three tRNA binding sites that span both subunits:

A (aminoacyl) site: accepts incoming aminoacyl-tRNA; decoding occurs here
P (peptidyl) site: holds tRNA carrying the growing peptide chain
E (exit) site: deacylated tRNA exits here after translocation

Derivation 1: The Genetic Code — Degeneracy and Wobble

The genetic code was deciphered by Marshall Nirenberg, Har Gobind Khorana, and Robert Holley (Nobel Prize 1968). It is a triplet, non-overlapping, degenerate, nearly universal code mapping 64 codons to 20 amino acids and 3 stop signals.

Why Triplets?

The information-theoretic argument: with 4 nucleotide bases, the minimum codon length to encode 20 amino acids is:

$$4^1 = 4 < 20,\quad 4^2 = 16 < 20,\quad 4^3 = 64 \geq 20$$

Triplets are the minimum, but 64 codons for 20 amino acids (+ stops) means the code is degenerate: most amino acids are specified by more than one codon. This degeneracy is not random — it follows a pattern that minimizes the impact of mutations.
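The counting argument can be checked with a short sketch. The function name `min_codon_length` is illustrative, and counting 21 meanings (20 amino acids plus at least one stop signal) is an assumption that slightly tightens the bound of 20 used in the inequality.

```python
def min_codon_length(alphabet_size, meanings):
    """Smallest n such that alphabet_size**n >= meanings."""
    if alphabet_size < 2:
        raise ValueError("need at least two distinct bases")
    n = 1
    while alphabet_size ** n < meanings:
        n += 1
    return n

# 4 bases; 20 amino acids plus one stop signal (assumed minimum of 21 meanings)
n = min_codon_length(4, 21)
print("minimum codon length:", n)
print("spare codons available for degeneracy:", 4 ** n - 21)
```

The spare codons are what make degeneracy possible: a doublet code would fall short, while a triplet code leaves many codons to share among the amino acids.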

The Wobble Hypothesis

Francis Crick proposed the wobble hypothesis (1966) to explain how fewer tRNAs (~45 in mammals) can decode all 61 sense codons. Non-standard base pairing is allowed at the 3rd codon position (wobble position):

Anticodon G pairs with codon C or U (wobble)
Anticodon U pairs with codon A or G (wobble)
Anticodon I (inosine, from deamination of A) pairs with codon U, C, or A

The wobble rules explain why degeneracy is concentrated at the 3rd codon position. The first two positions follow strict Watson-Crick rules, so mutations there are more likely to change the encoded amino acid.
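As a minimal sketch of the wobble rules, the function below lists the codons a given anticodon can read. The names `WOBBLE`, `WATSON_CRICK`, and `codons_read` are illustrative. The entries for anticodon C and A are the standard Watson-Crick pairs, added here as an assumption to complete the table; base modifications other than inosine are ignored.

```python
# Pairing options for the 5' anticodon base (position 34) against the 3rd codon base.
# G, U and I follow the wobble list above; C and A are plain Watson-Crick pairs.
WOBBLE = {"G": "CU", "U": "AG", "I": "UCA", "C": "G", "A": "U"}
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

def codons_read(anticodon):
    """Return the codons (5'->3') decoded by an anticodon written 5'->3'."""
    first, second, third = anticodon  # anticodon positions 34, 35, 36
    # Pairing is antiparallel: anticodon position 36 pairs with codon position 1.
    prefix = WATSON_CRICK[third] + WATSON_CRICK[second]
    return [prefix + base for base in WOBBLE[first]]

print(codons_read("GAA"))  # a tRNA-Phe anticodon
print(codons_read("IGC"))  # an inosine-containing tRNA-Ala anticodon
```

Because only the first anticodon base has more than one pairing option, every codon set returned differs only at the third codon position, which is the pattern the paragraph above describes.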

Error Minimization

The genetic code is organized to minimize the phenotypic impact of point mutations. Conservative amino acid substitutions (similar physicochemical properties) are overrepresented:

$$P(\text{conservative substitution}) = 0.68\;\text{(actual code)}\;\text{vs}\;0.34\;\text{(random code)}$$

This ~2-fold enrichment suggests the code was optimized by natural selection, though the degree of optimization is debated. Most single-nucleotide mutations either change the amino acid to a similar one or are silent (synonymous) — a built-in buffer against translational and replication errors.
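The silent-mutation part of this argument can be counted directly from the standard code. The sketch below tallies synonymous single-base substitutions by codon position. It does not attempt to reproduce the conservative-substitution probabilities quoted above, which depend on a chosen amino acid similarity measure. The names `CODE` and `synonymous_by_position` are illustrative.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order (first, second, third base); '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def synonymous_by_position():
    """Count synonymous single-base substitutions of sense codons, per codon position."""
    counts = {1: 0, 2: 0, 3: 0}
    for codon, aa in CODE.items():
        if aa == "*":
            continue  # only sense codons are mutated
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == aa:
                    counts[pos + 1] += 1
    return counts

counts = synonymous_by_position()
total = 61 * 9  # 61 sense codons, 9 possible single-base changes each
print(counts)
print(f"{sum(counts.values()) / total:.1%} of single-base substitutions are synonymous")
```

Nearly all synonymous changes fall at the third position, consistent with the wobble-based explanation of where degeneracy is concentrated.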

Derivation 2: The Translation Mechanism

Initiation

Translation initiation differs fundamentally between prokaryotes and eukaryotes:

Prokaryotic Initiation

The Shine-Dalgarno sequence (AGGAGG, 5-10 nt upstream of AUG) base-pairs with the anti-Shine-Dalgarno sequence in 16S rRNA, positioning the start codon in the P site:

$$\text{mRNA: 5'-...AGGAGG---AUG...-3'} \quad\leftrightarrow\quad \text{16S rRNA: 3'-...UCCUCC...-5'}$$

IF1, IF2 (GTPase), IF3 assemble the 30S initiation complex with fMet-tRNA$_f^{\text{Met}}$.
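A minimal sketch of how Shine-Dalgarno positioning could be searched for in a sequence. The function name, the exact-match requirement, and the way the 5-10 nt spacing is counted (from the last SD base to the A of AUG) are assumptions for illustration; real ribosome binding sites tolerate mismatches and variable spacing.

```python
def find_sd_start_codons(mrna, sd="AGGAGG", min_gap=5, max_gap=10):
    """Return 0-based indices of AUG codons preceded by the SD motif.

    The gap is counted (as an assumption) from the last SD base to the A of AUG.
    """
    hits = []
    start = mrna.find("AUG")
    while start != -1:
        for gap in range(min_gap, max_gap + 1):
            sd_start = start - gap - len(sd)
            if sd_start >= 0 and mrna[sd_start:sd_start + len(sd)] == sd:
                hits.append(start)
                break
        start = mrna.find("AUG", start + 1)
    return hits

# Hypothetical leader: 4 nt, the SD motif, a 7-nt spacer, then the start codon.
mrna = "GGCU" + "AGGAGG" + "UUAACCU" + "AUG" + "AAACGU"
print(find_sd_start_codons(mrna))
```

An AUG with no correctly spaced SD motif upstream is skipped, which mirrors how internal AUG codons are usually not used as bacterial start sites.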

Eukaryotic Initiation

The scanning model: the 43S pre-initiation complex (40S + eIF1, eIF1A, eIF3, eIF2-GTP-Met-tRNA$_i^{\text{Met}}$) binds the 5' cap via eIF4F complex, then scans 5' to 3' until it encounters the first AUG in a favorable Kozak context:

$$\text{Kozak sequence:}\;\text{gcc}(\text{A/G})\text{cc}\underline{\text{AUG}}\text{G}$$

The purine at -3 and G at +4 are most critical. eIF2-GTP hydrolysis commits to the chosen AUG.
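The two critical Kozak positions can be turned into a simple context check. This is a sketch under stated assumptions: the three-level labels (weak, adequate, strong) are a common convention rather than something defined in this chapter, and the function names are illustrative. Real scanning is probabilistic (leaky scanning), which this ignores.

```python
def kozak_strength(mrna, aug):
    """Classify the context of the AUG at 0-based index `aug` (A of AUG = +1)."""
    # Position -3 is three bases upstream of the A; +4 is the base right after AUG.
    minus3 = mrna[aug - 3] if aug >= 3 else ""
    plus4 = mrna[aug + 3] if aug + 3 < len(mrna) else ""
    score = (minus3 in ("A", "G")) + (plus4 == "G")
    return ["weak", "adequate", "strong"][score]

def first_favorable_aug(mrna):
    """Scan 5'->3' and return the first AUG whose context is not weak, or -1."""
    aug = mrna.find("AUG")
    while aug != -1:
        if kozak_strength(mrna, aug) != "weak":
            return aug
        aug = mrna.find("AUG", aug + 1)
    return -1

# Hypothetical 5' region: a weak-context AUG followed by a strong-context AUG.
mrna = "CCUUUUAUGUCCGCCACCAUGGCU"
print(first_favorable_aug(mrna), kozak_strength(mrna, first_favorable_aug(mrna)))
```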

Elongation: The Three-Step Cycle

Each amino acid addition involves three steps, consuming 2 GTP:

1. Codon Recognition (EF-Tu/eEF1A)

Aminoacyl-tRNA is delivered to the A site as a ternary complex with EF-Tu-GTP. Correct codon-anticodon matching triggers GTP hydrolysis and EF-Tu release (kinetic proofreading):

$$\text{EF-Tu-GTP-aa-tRNA} + \text{A site} \xrightarrow{\text{codon match}} \text{GTP hydrolysis} \rightarrow \text{EF-Tu-GDP released}$$

2. Peptide Bond Formation (Peptidyl Transferase)

The $\alpha$-amino group of the A-site aminoacyl-tRNA attacks the carbonyl carbon of the P-site peptidyl-tRNA. This is catalyzed by the 23S rRNA (ribozyme):

$$\text{Peptidyl-tRNA (P)} + \text{Aminoacyl-tRNA (A)} \xrightarrow{\text{23S rRNA}} \text{Peptidyl-tRNA (A)} + \text{tRNA (P)}$$

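No additional energy input is required at this step: the reaction is driven by the high-energy ester bond that links the peptide to the P-site tRNA, which was paid for earlier during tRNA charging.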

3. Translocation (EF-G/eEF2)

EF-G-GTP binds and its GTP hydrolysis drives the ribosome to translocate one codon (3 nt) in the 5' to 3' direction:

$$\text{A} \rightarrow \text{P},\quad \text{P} \rightarrow \text{E},\quad \text{E} \rightarrow \text{exit}$$
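The three-step cycle can be walked through in code. The sketch below is a toy model: `CODON_TABLE` is a small excerpt chosen for the demo (codons outside it raise `KeyError`), the function name `elongate` is illustrative, and the GTP count covers elongation only, so the initiator Met (placed in the P site during initiation) contributes no elongation GTP.

```python
CODON_TABLE = {  # small excerpt of the standard code, enough for the demo
    "AUG": "Met", "UUU": "Phe", "GGC": "Gly", "AAA": "Lys",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}

def elongate(mrna):
    """Translate from the first AUG; return (peptide, GTP used in elongation)."""
    start = mrna.find("AUG")
    if start == -1:
        return [], 0
    peptide = [CODON_TABLE[mrna[start:start + 3]]]  # initiator Met sits in the P site
    gtp = 0
    pos = start + 3  # codon currently in the A site
    while pos + 3 <= len(mrna):
        residue = CODON_TABLE[mrna[pos:pos + 3]]
        if residue == "Stop":
            break                # release factors take over; no tRNA is delivered
        gtp += 1                 # step 1: EF-Tu delivers aa-tRNA and hydrolyzes GTP
        peptide.append(residue)  # step 2: peptide bond formation, no GTP
        gtp += 1                 # step 3: EF-G translocates the ribosome one codon
        pos += 3
    return peptide, gtp

peptide, gtp = elongate("GG" + "AUG" + "UUU" + "GGC" + "AAA" + "UAA" + "GG")
print("-".join(peptide), "| elongation GTP:", gtp)
```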

Codon Usage Bias and Translation Efficiency

Although the genetic code is degenerate, synonymous codons are not used equally. Highly expressed genes preferentially use codons corresponding to the most abundant tRNAs, a phenomenon called codon usage bias.

The Codon Adaptation Index (CAI)

The CAI quantifies how well a gene's codon usage matches the optimal codons of the organism:

$$\text{CAI} = \left(\prod_{i=1}^{L} w_i\right)^{1/L}$$

where $w_i$ is the relative adaptiveness of the $i$-th codon and $L$ is the gene length in codons. CAI ranges from 0 (poor adaptation) to 1 (optimal). Highly expressed genes (ribosomal proteins, glycolytic enzymes) have CAI > 0.8 in E. coli.
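A minimal sketch of the CAI formula, computed in log space so that long genes do not underflow. The relative adaptiveness values in `w` are hypothetical numbers chosen for illustration; real tables are derived from a reference set of highly expressed genes, and every $w_i$ must be positive for the logarithm to be defined.

```python
import math

def cai(codons, w):
    """Geometric mean of relative adaptiveness values, computed in log space."""
    logs = [math.log(w[c]) for c in codons]
    return math.exp(sum(logs) / len(logs))

# Hypothetical relative adaptiveness values (1.0 = the preferred synonymous codon).
w = {"CUG": 1.0, "CUA": 0.1, "AAA": 1.0, "AAG": 0.25}

optimal = ["CUG", "AAA", "CUG"]   # Leu-Lys-Leu using preferred codons
rare = ["CUA", "AAG", "CUA"]      # the same peptide using rare codons
print(round(cai(optimal, w), 3), round(cai(rare, w), 3))
```

Both sequences encode the same peptide, so the difference in CAI reflects codon choice alone. This is the quantity that codon optimization tries to raise.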

Practical Applications

Codon optimization is critical for recombinant protein expression: when expressing a human protein in E. coli (or vice versa), the gene sequence is redesigned to use the host's preferred codons. This can increase protein yields 10-100 fold. The mRNA vaccines for COVID-19 were codon-optimized for human codon usage and used modified nucleosides (N1-methylpseudouridine) to enhance translation efficiency and reduce innate immune recognition.

Rare codons also have functional roles: they can slow translation at specific positions, allowing co-translational protein folding of individual domains before the next domain begins to emerge from the ribosome. Deliberately placed rare codons act as "translational speed bumps" that improve folding efficiency for complex multidomain proteins.

Derivation 3: Energetics of Translation

Translation is the most energetically expensive biosynthetic process in the cell — consuming up to ~75% of a cell's ATP in rapidly growing bacteria.

Cost per Amino Acid

For each amino acid incorporated, the cell expends:

$$\text{Aminoacyl-tRNA synthetase:}\;\text{AA} + \text{tRNA} + \text{ATP} \rightarrow \text{AA-tRNA} + \text{AMP} + \text{PP}_i\;(\equiv 2\;\text{ATP})$$

$$\text{EF-Tu:}\;1\;\text{GTP (codon recognition)}\;\equiv 1\;\text{ATP}$$

$$\text{EF-G:}\;1\;\text{GTP (translocation)}\;\equiv 1\;\text{ATP}$$

$$\text{Total:}\;4\;\text{ATP equivalents per amino acid}$$

For a typical 300-residue protein, the translation cost is ~1200 ATP. Adding initiation factors, termination, and quality control increases this to ~1300-1400 ATP per protein. For perspective, a single glucose molecule yields only ~32 ATP — so synthesizing one protein requires the oxidation of ~40 glucose molecules.
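The arithmetic above can be reproduced in a few lines. The constants come from the text; the function name and the `overhead_atp` parameter (standing in for initiation, termination, and quality control) are illustrative.

```python
ATP_PER_RESIDUE = 2 + 1 + 1   # tRNA charging + EF-Tu GTP + EF-G GTP (from the text)
ATP_PER_GLUCOSE = 32          # approximate aerobic yield quoted in the text

def translation_cost(residues, overhead_atp=0):
    """ATP equivalents needed to synthesize one polypeptide."""
    return residues * ATP_PER_RESIDUE + overhead_atp

for overhead in (0, 100, 200):
    cost = translation_cost(300, overhead)
    print(f"overhead {overhead:3d}: {cost} ATP = {cost / ATP_PER_GLUCOSE:.1f} glucose")
```

The elongation-only figure works out to 37.5 glucose molecules, and the overhead range pushes it just past 40, which is where the rounded estimate of ~40 comes from.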

Translation Speed and Fidelity

In E. coli, the ribosome incorporates ~20 amino acids/sec. The error rate (~$10^{-4}$ per codon) is achieved through a kinetic proofreading mechanism (Hopfield, 1974):

$$\text{Error rate} = \frac{k_{\text{inc}}}{k_{\text{corr}}} \times \frac{k_{\text{inc}}^*}{k_{\text{corr}}^*} \approx (10^{-2}) \times (10^{-2}) = 10^{-4}$$

The first selection step is initial binding ($\sim 10^{-2}$ discrimination). GTP hydrolysis then provides an irreversible step that allows a second selection ($\sim 10^{-2}$), amplifying the fidelity at the cost of energy (the sacrificed GTP).
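To see why the second selection step matters, the sketch below compares the fraction of error-free 300-residue proteins with one and with two selection steps. It assumes each step is independent and that every codon has the same error probability; the function names are illustrative.

```python
import math

def error_rate(step_error_fractions):
    """Multiply the error fractions of independent selection steps."""
    return math.prod(step_error_fractions)

def error_free_fraction(per_codon_error, residues):
    """Probability that every codon of a protein is read correctly."""
    return (1 - per_codon_error) ** residues

single = error_rate([1e-2])         # initial selection only
double = error_rate([1e-2, 1e-2])   # initial selection + proofreading after GTP hydrolysis

for label, rate in (("one step", single), ("two steps", double)):
    print(f"{label}: error/codon = {rate:.0e}, "
          f"error-free 300-mers = {error_free_fraction(rate, 300):.3f}")
```

With a single step only about 5% of 300-residue chains would be free of errors; with proofreading about 97% are. That is the return on the extra GTP spent.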

Aminoacyl-tRNA Synthetases: The Second Genetic Code

The aminoacyl-tRNA synthetases (aaRS) are responsible for correctly matching each amino acid to its cognate tRNA — a process Paul Schimmel called the "second genetic code" because errors here are propagated into the protein with no opportunity for correction.

The Aminoacylation Reaction

Each of the 20 aaRS catalyzes a two-step reaction:

$$\text{Step 1: AA} + \text{ATP} \xrightarrow{\text{aaRS}} \text{AA-AMP} + \text{PP}_i\quad(\text{aminoacyl adenylate})$$

$$\text{Step 2: AA-AMP} + \text{tRNA} \xrightarrow{\text{aaRS}} \text{AA-tRNA} + \text{AMP}$$

The PP$_i$ is immediately hydrolyzed by pyrophosphatase ($\text{PP}_i \rightarrow 2\;\text{P}_i$), making the overall reaction irreversible (equivalent to 2 ATP).

Two Classes of aaRS

Class I (10 aaRS)

Monomers or dimers; Rossmann fold; aminoacylate the 2'-OH of tRNA; approach tRNA from the minor groove side. Amino acids: Arg, Cys, Glu, Gln, Ile, Leu, Met, Trp, Tyr, Val.

Class II (10 aaRS)

Primarily dimers or tetramers; antiparallel $\beta$-sheet; aminoacylate the 3'-OH of tRNA; approach from the major groove side. Amino acids: Ala, Asn, Asp, Gly, His, Lys, Phe, Pro, Ser, Thr.

Editing (Proofreading) by aaRS

Some amino acids are structurally similar (e.g., isoleucine vs valine, threonine vs serine), and the initial discrimination by the active site is insufficient. Several aaRS have a separate editing domain that hydrolyzes mischarged AA-tRNA. Isoleucyl-tRNA synthetase (IleRS) is the classic example:

$$\text{Discrimination: }\frac{k_{\text{cat}}/K_m\;(\text{Ile})}{k_{\text{cat}}/K_m\;(\text{Val})} \approx 200\;(\text{synthetic site}) \times 200\;(\text{editing site}) = 40{,}000$$

IleRS uses a "double sieve" mechanism: the synthetic site excludes amino acids larger than isoleucine (steric exclusion), while the editing site is too small to admit isoleucine but accepts and hydrolyzes the smaller, misactivated valine (as Val-AMP or Val-tRNA$^{\text{Ile}}$). Together the two sieves achieve an error rate of $\sim 1/40{,}000$, well below the ~1/200 rate from the synthetic site alone.
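The double sieve can be caricatured as two size filters. This is a toy model only: side-chain carbon count is used as a crude stand-in for size, and the names `SIDE_CHAIN_CARBONS` and `double_sieve` are illustrative. It cannot distinguish isomers such as leucine and isoleucine, which real synthetases separate by shape rather than size.

```python
# Side-chain carbon counts as a crude proxy for size (toy model, not a
# structural calculation): Gly 0, Ala 1, Val 3, Ile 4, Phe 7.
SIDE_CHAIN_CARBONS = {"Gly": 0, "Ala": 1, "Val": 3, "Ile": 4, "Phe": 7}

def double_sieve(amino_acid, cognate="Ile"):
    """Return the fate of an amino acid offered to a size-based double sieve."""
    size = SIDE_CHAIN_CARBONS[amino_acid]
    limit = SIDE_CHAIN_CARBONS[cognate]
    if size > limit:
        return "rejected by synthetic site"   # coarse sieve: too large to be activated
    if size < limit:
        return "hydrolyzed by editing site"   # fine sieve: small enough to enter
    return "charged onto tRNA"                # cognate size passes both sieves

for aa in ("Phe", "Val", "Ile"):
    print(aa, "->", double_sieve(aa))
```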

Co-Translational Protein Folding and Chaperones

Newly synthesized polypeptides do not fold in a vacuum. The cellular environment is extremely crowded (~300-400 mg/mL protein), increasing the risk of misfolding and aggregation. A sophisticated chaperone system assists folding:

Trigger factor (bacteria) / NAC (eukaryotes): bind the ribosome exit tunnel and shield the nascent chain during co-translational folding
Hsp70/DnaK system: ATP-dependent chaperone that binds hydrophobic stretches on unfolded proteins, preventing aggregation and allowing iterative folding attempts
Hsp60/GroEL-GroES (chaperonin): a barrel-shaped cavity where proteins up to ~60 kDa fold in isolation from the crowded cytoplasm; each ring hydrolyzes 7 ATP per cycle, so a substrate that needs many cycles can cost ~130 ATP to fold
Hsp90: specialized chaperone for signaling proteins (kinases, steroid receptors, transcription factors); target of the anticancer drug geldanamycin

The Unfolded Protein Response (UPR)

When misfolded proteins accumulate in the endoplasmic reticulum, the unfolded protein response (UPR) is activated through three ER transmembrane sensors:

$$\text{IRE1}\alpha \rightarrow \text{XBP1 splicing} \rightarrow \text{ER chaperone upregulation}$$

$$\text{PERK} \rightarrow \text{eIF2}\alpha\text{ phosphorylation} \rightarrow \text{global translation}\downarrow\text{ + ATF4}\uparrow$$

$$\text{ATF6} \rightarrow \text{S1P/S2P cleavage} \rightarrow \text{ER chaperone gene activation}$$

If the UPR fails to resolve ER stress, the cell activates apoptosis via CHOP (C/EBP homologous protein). Chronic ER stress and UPR activation are implicated in diabetes (beta-cell failure), neurodegeneration (Parkinson's, Alzheimer's), and liver disease. Protein misfolding diseases (amyloidoses) — including Alzheimer's ($\beta$-amyloid), Parkinson's ($\alpha$-synuclein), and prion diseases (PrP$^{\text{Sc}}$) — result from the failure of quality control to prevent toxic aggregation.

Derivation 4: Termination and Quality Control

Termination

When a stop codon (UAA, UAG, or UGA) enters the A site, no aminoacyl-tRNA matches. Instead, release factors recognize the stop codon and catalyze peptide release:

Prokaryotes: RF1 (recognizes UAA/UAG), RF2 (recognizes UAA/UGA), RF3 (GTPase for RF recycling)
Eukaryotes: eRF1 (recognizes all three stop codons), eRF3 (GTPase)

$$\text{RF (GGQ motif)} \xrightarrow{\text{positions H}_2\text{O}} \text{Hydrolysis of peptidyl-tRNA ester bond} \rightarrow \text{Free polypeptide}$$

Ribosome-Associated Quality Control

Three quality control pathways handle aberrant mRNAs:

Nonsense-Mediated Decay (NMD)

Detects premature stop codons located more than ~50 nt upstream of the last exon-exon junction. UPF1, UPF2, and UPF3 act together with the SMG1 kinase (which phosphorylates UPF1) to trigger mRNA degradation, preventing production of truncated, potentially toxic proteins.

Non-Stop Decay (NSD)

When mRNA lacks a stop codon (e.g., broken mRNA), the ribosome reaches the 3' end and stalls. Ski7/Dom34-Hbs1 complex targets the mRNA for exosome degradation and rescues the stalled ribosome.

No-Go Decay (NGD)

Targets mRNAs on which ribosomes stall because of stable secondary structures, rare codons, or damaged mRNA. Dom34/Hbs1 split the stalled ribosome, and the truncated peptide is targeted for proteasomal degradation via the RQC (ribosome quality control) complex.
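The triggers for the three pathways can be summarized as a small decision function. This is a sketch under stated assumptions: positions are in nucleotides from the 5' end, the order in which the checks are applied is a simplification, and the function name and parameters are hypothetical.

```python
def surveillance_pathway(stop_pos, last_junction, stalled=False, threshold=50):
    """Pick a decay pathway for an mRNA (positions in nt from the 5' end).

    stop_pos: position of the first in-frame stop codon, or None if absent.
    last_junction: position of the last exon-exon junction, or None if unspliced.
    stalled: True if a ribosome is stalled inside the coding region.
    """
    if stop_pos is None:
        return "NSD"      # no stop codon: the ribosome runs to the 3' end
    if last_junction is not None and last_junction - stop_pos > threshold:
        return "NMD"      # stop codon lies more than 50 nt upstream of the junction
    if stalled:
        return "NGD"      # stop codon is fine, but elongation is blocked
    return "normal"

print(surveillance_pathway(stop_pos=400, last_junction=900))
print(surveillance_pathway(stop_pos=None, last_junction=900))
print(surveillance_pathway(stop_pos=950, last_junction=900, stalled=True))
```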

Polyribosomes and Translational Control

Multiple ribosomes can simultaneously translate a single mRNA, forming polyribosomes (polysomes). The spacing between ribosomes is ~80 nucleotides (~27 codons), meaning a typical 1 kb mRNA can accommodate ~12 ribosomes simultaneously, dramatically increasing the protein production rate.
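A quick estimate of what polysome loading buys in output. The sketch assumes the whole mRNA is coding sequence, that ribosomes are evenly spaced, and that the ~20 amino acids/sec elongation rate quoted for E. coli applies; the function names are illustrative.

```python
def polysome_capacity(mrna_nt, spacing_nt=80):
    """Number of ribosomes that fit on an mRNA at the given spacing."""
    return mrna_nt // spacing_nt

def proteins_per_second(mrna_nt, rate_aa_per_s=20.0, spacing_nt=80):
    """Steady-state output of a fully loaded mRNA, ignoring UTRs and initiation limits."""
    codons = mrna_nt / 3
    transit_time = codons / rate_aa_per_s   # seconds for one ribosome to finish
    return polysome_capacity(mrna_nt, spacing_nt) / transit_time

n = polysome_capacity(1000)
print(f"{n} ribosomes on a 1 kb mRNA")
print(f"{proteins_per_second(1000):.2f} proteins/s vs "
      f"{proteins_per_second(1000) / n:.2f} proteins/s for a single ribosome")
```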

mRNA Circularization

Eukaryotic mRNAs adopt a closed-loop conformation during active translation, mediated by the interaction between the 5' cap-binding complex (eIF4E-eIF4G) and the 3' poly(A)-binding protein (PABP):

$$\text{5' m}^7\text{G-eIF4E-eIF4G} \longleftrightarrow \text{PABP-AAAA...3'}\quad(\text{closed loop})$$

This circularization enhances translation efficiency by allowing ribosomes that have just terminated to be "recycled" directly to the 5' cap for re-initiation, without dissociating into the cytoplasm. It also ensures that only intact mRNAs (with both a cap and poly(A) tail) are efficiently translated — providing quality control against truncated or decapped mRNAs.

Global Translational Control

Under stress conditions, cells globally reduce translation to conserve energy. The key mechanism involves phosphorylation of eIF2$\alpha$ at Ser-51 by four stress-responsive kinases:

GCN2: activated by uncharged tRNAs (amino acid starvation)
PERK: activated by ER stress (unfolded protein response)
HRI: activated by heme deficiency (erythroid cells)
PKR: activated by double-stranded RNA (viral infection)

$$\text{eIF2}\alpha\text{-P} \rightarrow \text{sequesters eIF2B (GEF)} \rightarrow \text{[eIF2-GTP]}\downarrow \rightarrow \text{global translation}\downarrow$$

Paradoxically, eIF2$\alpha$ phosphorylation increases translation of specific mRNAs with upstream open reading frames (uORFs) in their 5' UTR, including ATF4 (stress-responsive transcription factor) and CHOP (pro-apoptotic). This integrated stress response (ISR) allows the cell to reprogram gene expression while reducing overall protein synthesis — a sophisticated survival strategy.

mTORC1 and Cap-Dependent Translation

The mTORC1 kinase promotes cap-dependent translation by phosphorylating two key targets:

$$\text{mTORC1} \xrightarrow{\text{phosphorylates}} \text{4E-BP1 (releases eIF4E)} + \text{S6K1 (activates eIF4B, rpS6)}$$

In the unphosphorylated state, 4E-BP1 sequesters eIF4E, blocking cap recognition. mTORC1-mediated phosphorylation releases eIF4E to join the eIF4F complex, enabling cap-dependent translation. Rapamycin (sirolimus) inhibits mTORC1, reducing translation of mRNAs with complex 5' UTRs (often encoding growth factors and oncoproteins) — hence its use as an immunosuppressant and anticancer agent.

Derivation 5: Post-Translational Modifications

The proteome's diversity vastly exceeds the genome's coding capacity because proteins are extensively modified after translation. Over 400 types of post-translational modifications (PTMs) have been identified, expanding the functional repertoire from 20 amino acids to effectively thousands of distinct chemical states.

Phosphorylation

The most common regulatory PTM, catalyzed by >500 protein kinases in the human genome (the "kinome"):

$$\text{Protein-OH (Ser/Thr/Tyr)} + \text{ATP} \xrightarrow{\text{Kinase}} \text{Protein-OPO}_3^{2-} + \text{ADP}$$

Phosphorylation adds 2 negative charges at physiological pH, causing conformational changes and creating docking sites for phospho-binding domains (SH2, PTB, 14-3-3). ~30% of human proteins are phosphorylated at any given time. Phosphatases reverse the modification, creating a dynamic on/off switch.

Glycosylation

Two major types: N-linked (to Asn in the sequon Asn-X-Ser/Thr; begins in ER with dolichol-linked oligosaccharide) and O-linked (to Ser/Thr; occurs in Golgi). Glycosylation affects protein folding, stability, and cell-cell recognition. About 50% of human proteins are glycosylated. Congenital disorders of glycosylation (CDGs) cause multisystem disease.
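The N-linked sequon is simple enough to scan for with a regular expression. The sketch below adds one refinement not stated above but widely used in sequon definitions: the middle residue X can be any amino acid except proline. The function name is illustrative, and a matching sequon is only a candidate site, since not every sequon is glycosylated in the cell.

```python
import re

# A lookahead keeps overlapping sequons; X is any residue except proline.
SEQUON = re.compile(r"(?=N[^P][ST])")

def n_glycosylation_sites(protein):
    """Return 1-based positions of Asn residues in Asn-X-Ser/Thr sequons."""
    return [m.start() + 1 for m in SEQUON.finditer(protein)]

# Hypothetical sequence in one-letter code: N2 and N9/N10 sit in sequons, N5 is followed by Pro.
print(n_glycosylation_sites("MNGTNPSANNSSK"))
```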

Ubiquitination

The attachment of ubiquitin (76 amino acids) to lysine residues marks proteins for proteasomal degradation. The discovery of ubiquitin-mediated protein degradation earned Aaron Ciechanover, Avram Hershko, and Irwin Rose the 2004 Nobel Prize in Chemistry. The enzymatic cascade is:

$$\text{E1 (activation)} \xrightarrow{\text{ATP}} \text{E2 (conjugation)} \xrightarrow{\text{E3 (ligase)}} \text{Ub-Protein} \xrightarrow{\text{26S proteasome}} \text{Peptides}$$

Polyubiquitin chains linked via Lys-48 of ubiquitin signal proteasomal degradation. Other chain types (Lys-63, linear) serve non-degradative functions (NF-$\kappa$B signaling, DNA repair). The human genome encodes >600 E3 ubiquitin ligases, providing remarkable substrate specificity. The N-end rule pathway links protein half-life to the identity of the N-terminal amino acid.

Protein Sorting: Directing Proteins to Their Destinations

Newly synthesized proteins must be directed to the correct subcellular compartment. In eukaryotic cells, proteins can be targeted to the ER/secretory pathway, mitochondria, nucleus, peroxisomes, or remain in the cytoplasm. Targeting depends on signal sequences — short peptide motifs recognized by specific receptors:

ER signal peptide (~16-30 aa, hydrophobic core): recognized by SRP during translation; co-translational translocation via Sec61 translocon
Mitochondrial targeting sequence (N-terminal, amphipathic helix): recognized by TOM/TIM complexes on mitochondrial membranes; post-translational import driven by membrane potential and mtHsp70
Nuclear localization signal (NLS) (Lys/Arg-rich, e.g., PKKKRKV from SV40 large T antigen): recognized by importin $\alpha/\beta$; import through nuclear pore complex (NPC) powered by Ran-GTP gradient
Peroxisomal targeting signal (PTS1) (C-terminal SKL tripeptide): recognized by Pex5 receptor; post-translational import
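The signals in this list can be turned into a toy sequence classifier. Everything here is a rough heuristic for illustration: the hydrophobic residue set, the 30-residue window, the 8-residue run threshold, and the function name are assumptions, and mitochondrial targeting is left out because detecting an amphipathic helix needs more than string matching. Real predictors use trained models.

```python
def predict_destination(protein):
    """Very rough targeting guess from the motifs listed above (toy heuristic)."""
    hydrophobic = set("AVLIFMW")
    # ER signal peptide: look for a run of >= 8 hydrophobic residues near the N-terminus.
    run = best = 0
    for residue in protein[:30]:
        run = run + 1 if residue in hydrophobic else 0
        best = max(best, run)
    if best >= 8:
        return "ER / secretory pathway"
    if protein.endswith("SKL"):
        return "peroxisome"          # C-terminal PTS1 tripeptide
    if "KKKRK" in protein:
        return "nucleus"             # core of the classic Lys/Arg-rich NLS
    return "cytoplasm"

# Hypothetical sequences in one-letter code.
for seq in ("MKWVLLLLLLLLAASAQDT", "MSDEQKRSKL", "MSPKKKRKVEDG", "MSDEQGT"):
    print(seq, "->", predict_destination(seq))
```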

ER-Associated Degradation (ERAD)

Misfolded proteins in the ER are retrotranslocated to the cytoplasm and degraded by the proteasome — a process called ERAD. The ER quality control system uses a lectin-based mechanism: the N-linked glycan on newly synthesized proteins is sequentially trimmed by glucosidases and mannosidases. If the protein fails to fold properly within the allotted time, mannose trimming marks it for ERAD:

$$\text{Misfolded protein} \xrightarrow{\text{mannose trimming}} \text{OS-9/XTP3-B recognition} \xrightarrow{\text{Hrd1 ubiquitin ligase}} \text{Retrotranslocation} \xrightarrow{\text{p97/VCP}} \text{Proteasome}$$

ERAD is clinically relevant: the most common cystic fibrosis mutation ($\Delta$F508-CFTR) produces a protein that folds slowly but is still partly functional, and ERAD degrades it before it reaches the plasma membrane. Corrector drugs (lumacaftor in Orkambi; tezacaftor and elexacaftor in Trikafta) stabilize the mutant CFTR, allowing it to escape ERAD and reach the cell surface, a triumph of molecular understanding informing therapy.

Applications in Medicine and Research

Ribosome-Targeting Antibiotics

Structural differences between 70S and 80S ribosomes enable selective targeting of bacteria. Tetracyclines block A-site tRNA binding; chloramphenicol inhibits peptidyl transferase; macrolides (erythromycin) block the exit tunnel; aminoglycosides (gentamicin) cause misreading by distorting the decoding center.

Diphtheria and Ricin

Diphtheria toxin ADP-ribosylates eEF2 (diphthamide residue), blocking translocation and killing the cell. A single toxin molecule can inactivate all eEF2 in a cell. Ricin (from castor beans) is an N-glycosidase that depurinates 28S rRNA, inactivating the ribosome. Both illustrate the vulnerability of translation.

mRNA Therapeutics

The COVID-19 mRNA vaccines (Pfizer/BioNTech, Moderna) exploit the translation machinery: synthetic mRNA encoding the spike protein is delivered to ribosomes, which produce the antigen in situ. Modified nucleosides (N1-methylpseudouridine) reduce innate immune recognition and enhance translation efficiency.

Proteasome Inhibitors

Bortezomib (Velcade) inhibits the 26S proteasome, causing accumulation of pro-apoptotic proteins. FDA-approved for multiple myeloma. Cancer cells are more sensitive due to higher protein turnover rates. Thalidomide/lenalidomide redirect E3 ubiquitin ligase cereblon to degrade key oncoproteins (IKZF1/3).

Historical Context

The central dogma (DNA $\rightarrow$ RNA $\rightarrow$ protein) was articulated by Francis Crick in 1958. Nirenberg's landmark experiment (1961) used poly-U mRNA in a cell-free system to show that UUU encodes phenylalanine — the first codon assignment. By 1966, all 64 codons were deciphered. The ribosome crystal structures (2000) revealed that peptide bond formation is catalyzed by RNA, supporting the "RNA world" hypothesis — that RNA preceded proteins as catalysts in early life.

The Signal Recognition Particle and Protein Targeting

Proteins destined for the secretory pathway, plasma membrane, or organelles contain signal sequences that direct them to the endoplasmic reticulum during translation. Günter Blobel proposed the signal hypothesis (Nobel Prize 1999):

$$\text{Signal peptide (N-term)} \xrightarrow{\text{recognized by SRP}} \text{ribosome docking at ER (SRP receptor)} \xrightarrow{\text{Sec61 translocon}} \text{co-translational insertion}$$

The signal recognition particle (SRP) is a ribonucleoprotein (6 proteins + 7SL RNA) that binds the hydrophobic signal peptide as it emerges from the ribosome exit tunnel, pausing translation until the ribosome docks at the ER membrane. The signal peptide is cleaved by signal peptidase in the ER lumen, and the protein is co-translationally folded by ER chaperones (BiP/GRP78, calnexin, calreticulin).

Protein Degradation: The N-End Rule

Alexander Varshavsky discovered the N-end rule (1986), which relates a protein's half-life to the identity of its N-terminal amino acid. In eukaryotes, destabilizing N-terminal residues (Arg, Lys, His, Phe, Trp, Tyr, Leu, Ile) are recognized by specific E3 ubiquitin ligases (N-recognins, UBR1/2), targeting the protein for proteasomal degradation:

$$t_{1/2} = \begin{cases} >20\;\text{hours} & \text{Met, Ser, Ala, Thr, Val, Gly, Pro (stabilizing)} \\ 2\text{-}30\;\text{min} & \text{Arg, Lys, Phe, Leu, Trp, Tyr, Ile, His (destabilizing)} \end{cases}$$

The N-end rule has been expanded to include the Ac/N-end rule (N-terminally acetylated residues) and the Pro/N-end rule (proline-specific). These pathways regulate the half-lives of ~30% of cellular proteins and play critical roles in chromosome segregation, cardiovascular development, and neurodegeneration.
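The half-life table above amounts to a lookup on the N-terminal residue. A minimal sketch, using one-letter codes and the two classes exactly as listed in the equation; the function name is illustrative, residues not in either list are reported as not covered, and N-terminal processing such as Met removal is ignored.

```python
DESTABILIZING = set("RKHFWYLI")   # Arg, Lys, His, Phe, Trp, Tyr, Leu, Ile
STABILIZING = set("MSATVGP")      # Met, Ser, Ala, Thr, Val, Gly, Pro

def n_end_half_life(protein):
    """Return the half-life class implied by the N-terminal residue."""
    n_term = protein[0]
    if n_term in DESTABILIZING:
        return "2-30 min"
    if n_term in STABILIZING:
        return ">20 hours"
    return "not covered by the table"

# Hypothetical sequences that differ only in their first residue.
for seq in ("MKTAYIAK", "RKTAYIAK", "DKTAYIAK"):
    print(seq[0], "->", n_end_half_life(seq))
```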

Selenocysteine: The 21st Amino Acid

Selenocysteine (Sec) is co-translationally incorporated at UGA codons (normally a stop codon) when a specific SECIS element (selenocysteine insertion sequence) is present in the 3' UTR of the mRNA. The process requires a specialized tRNA (tRNA$^{\text{Sec}}$), a dedicated elongation factor (eEFSec), and SECIS-binding protein 2 (SBP2). The human genome encodes 25 selenoproteins, including glutathione peroxidases, thioredoxin reductases, and iodothyronine deiodinases — all involved in redox homeostasis and thyroid hormone metabolism.

Python Simulations

Genetic Code Degeneracy and Translation Energetics

Analyze codon degeneracy across amino acids and calculate the ATP cost of protein synthesis.

import numpy as np
import matplotlib.pyplot as plt

# Genetic code analysis: degeneracy and codon usage
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Panel 1: Degeneracy of the genetic code
ax1 = axes[0]
codon_counts = {
    'Leu': 6, 'Ser': 6, 'Arg': 6,
    'Val': 4, 'Pro': 4, 'Thr': 4, 'Ala': 4, 'Gly': 4,
    'Ile': 3,
    'Phe': 2, 'Tyr': 2, 'His': 2, 'Gln': 2, 'Asn': 2, 'Lys': 2,
    'Asp': 2, 'Glu': 2, 'Cys': 2,
    'Met': 1, 'Trp': 1,
    'Stop': 3,
}

# Sort by codon count
sorted_aa = sorted(codon_counts.items(), key=lambda x: (-x[1], x[0]))
aa_names = [a[0] for a in sorted_aa]
counts = [a[1] for a in sorted_aa]
color_map = {6: '#34d399', 4: '#6ee7b7', 3: '#fbbf24', 2: '#38bdf8', 1: '#f87171'}
colors = [color_map.get(c, '#a78bfa') for c in counts]

bars = ax1.barh(range(len(aa_names)), counts, color=colors, alpha=0.85,
                edgecolor='white', linewidth=0.3, height=0.7)
ax1.set_yticks(range(len(aa_names)))
ax1.set_yticklabels(aa_names, fontsize=8, color='white')
ax1.set_xlabel('Number of Codons', fontsize=12, color='white')
ax1.set_title('Genetic Code Degeneracy', fontsize=14, color='white', fontweight='bold')
ax1.invert_yaxis()
ax1.set_facecolor('#0a0a1a')
ax1.tick_params(colors='white')
ax1.grid(True, alpha=0.2, color='#34d399', axis='x')
for spine in ax1.spines.values():
    spine.set_color('#34d399')

# Panel 2: Translation energetics - cost per amino acid
ax2 = axes[1]
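protein_length = 300  # residues, matching the 300-residue example in the text
initiation = 1 / protein_length  # 1 GTP at initiation, amortized over the whole chain
steps = ['tRNA Charging\n(2 ATP equiv)', 'Initiation\n(1 GTP, amortized)', 'Codon Recog.\n(1 GTP per AA)',
         'Translocation\n(1 GTP per AA)', 'Total per AA']
atp_cost = [2, initiation, 1, 1, 2 + initiation + 1 + 1]  # GTP ~ ATP equivalent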
bar_colors = ['#34d399', '#fbbf24', '#38bdf8', '#a78bfa', '#f87171']

bars2 = ax2.bar(range(len(steps)), atp_cost, color=bar_colors, alpha=0.85,
                edgecolor='white', linewidth=0.5, width=0.6)
ax2.set_xticks(range(len(steps)))
ax2.set_xticklabels(steps, fontsize=9, color='white')
ax2.set_ylabel('ATP/GTP Equivalents', fontsize=12, color='white')
ax2.set_title('Energy Cost of Translation', fontsize=14, color='white', fontweight='bold')
ax2.set_facecolor('#0a0a1a')
ax2.tick_params(colors='white')
ax2.grid(True, alpha=0.2, color='#34d399', axis='y')
for spine in ax2.spines.values():
    spine.set_color('#34d399')

for bar, val in zip(bars2, atp_cost):
    # .3g keeps small amortized values visible and prints whole numbers without decimals
    ax2.text(bar.get_x() + bar.get_width()/2, val + 0.1,
             f'{val:.3g}',
             ha='center', va='bottom', fontsize=11, color='white', fontweight='bold')

fig.patch.set_facecolor('#0a0a1a')
plt.tight_layout()
plt.savefig('output.png', dpi=150, bbox_inches='tight', facecolor='#0a0a1a')
plt.show()
print("64 codons: 61 sense + 3 stop (UAA, UAG, UGA)")
print("Most degenerate: Leu, Ser, Arg (6 codons each)")
print("Least degenerate: Met (AUG) and Trp (UGG) - 1 codon each")
print("Translation cost: ~4 ATP equivalents per amino acid added")

Ribosome Structure and Antibiotic Targets

Compare prokaryotic and eukaryotic ribosome composition and map antibiotic mechanisms of action.

import numpy as np
import matplotlib.pyplot as plt

# Ribosome structure and antibiotic targets
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Panel 1: Ribosome composition comparison (prokaryotic vs eukaryotic)
ax1 = axes[0]
categories = ['Total\nRibosome', 'Small\nSubunit', 'Large\nSubunit', 'rRNA\nComponents', 'Protein\nCount']
prokaryote = [70, 30, 50, 3, 55]
eukaryote = [80, 40, 60, 4, 80]

x = np.arange(len(categories))
width = 0.3
# The first three groups are Svedberg values; the last two are counts
bars1 = ax1.bar(x - width/2, prokaryote, width, label='Prokaryotic', color='#34d399', alpha=0.85)
bars2 = ax1.bar(x + width/2, eukaryote, width, label='Eukaryotic', color='#fbbf24', alpha=0.85)

ax1.set_xticks(x)
ax1.set_xticklabels(categories, fontsize=9, color='white')
ax1.set_ylabel('Svedberg (S) units / Count', fontsize=12, color='white')
ax1.set_title('Prokaryotic vs Eukaryotic Ribosomes', fontsize=14, color='white', fontweight='bold')
ax1.legend(fontsize=10, facecolor='#1a1a2e', edgecolor='#34d399', labelcolor='white')
ax1.set_facecolor('#0a0a1a')
ax1.tick_params(colors='white')
ax1.grid(True, alpha=0.2, color='#34d399', axis='y')
for spine in ax1.spines.values():
    spine.set_color('#34d399')

# Add labels
prok_labels = ['70S', '30S', '50S', '3 rRNAs', '~55']
euk_labels = ['80S', '40S', '60S', '4 rRNAs', '~80']
for i in range(len(categories)):
    ax1.text(x[i] - width/2, prokaryote[i] + 1, prok_labels[i],
             ha='center', fontsize=8, color='#34d399', fontweight='bold')
    ax1.text(x[i] + width/2, eukaryote[i] + 1, euk_labels[i],
             ha='center', fontsize=8, color='#fbbf24', fontweight='bold')

# Panel 2: Antibiotic targets on the ribosome
ax2 = axes[1]
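# Diphtheria toxin is a bacterial toxin, not an antibiotic; it is included as a
# contrast because it targets the eukaryotic factor eEF2.
antibiotics = ['Tetracycline', 'Streptomycin', 'Chloramphenicol', 'Erythromycin',
               'Linezolid', 'Puromycin', 'Fusidic acid', 'Diphtheria\ntoxin']
targets = ['30S: blocks\ntRNA binding', '30S: misreading\nof mRNA', '50S: blocks\npeptidyl transferase',
           '50S: blocks\nexit tunnel', '50S: blocks\nA-site', '50S: premature\nchain release',
           'EF-G: blocks\ntranslocation', 'eEF2: blocks\ntranslocation']
# Illustrative scores only (not measured data): higher = more selective for
# prokaryotic over eukaryotic translation. Diphtheria toxin acts on eukaryotes, so it scores low.
selectivity = [85, 80, 70, 88, 92, 30, 75, 5]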

bars3 = ax2.barh(range(len(antibiotics)), selectivity,
                 color=['#34d399' if s > 60 else '#f87171' for s in selectivity],
                 alpha=0.85, edgecolor='white', linewidth=0.5, height=0.6)
ax2.set_yticks(range(len(antibiotics)))
ax2.set_yticklabels(antibiotics, fontsize=9, color='white')
ax2.set_xlabel('Selectivity Index (prokaryotic/eukaryotic)', fontsize=11, color='white')
ax2.set_title('Antibiotic Targets on the Ribosome', fontsize=14, color='white', fontweight='bold')
ax2.set_facecolor('#0a0a1a')
ax2.tick_params(colors='white')
ax2.grid(True, alpha=0.2, color='#34d399', axis='x')
for spine in ax2.spines.values():
    spine.set_color('#34d399')

for i, (bar, target) in enumerate(zip(bars3, targets)):
    ax2.text(bar.get_width() + 1, i, target, va='center',
             fontsize=7, color='#6ee7b7')

fig.patch.set_facecolor('#0a0a1a')
plt.tight_layout()
plt.savefig('output.png', dpi=150, bbox_inches='tight', facecolor='#0a0a1a')
plt.show()
print("Prokaryotic ribosome: 70S (30S + 50S), 3 rRNAs (16S, 23S, 5S), ~55 proteins")
print("Eukaryotic ribosome: 80S (40S + 60S), 4 rRNAs (18S, 28S, 5.8S, 5S), ~80 proteins")
print("Peptidyl transferase is a RIBOZYME - the 23S/28S rRNA catalyzes peptide bond formation")
print("Ribosomal differences are exploited by antibiotics for selective toxicity")

Protein Half-Life Distribution and PTM Landscape

Explore the distribution of protein half-lives in human cells and the frequency of major post-translational modifications.

import numpy as np
import matplotlib.pyplot as plt

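# Protein half-life distribution and post-translational modification (PTM) landscape
# (values below are illustrative round numbers, not a measured dataset)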
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Panel 1: Protein half-life distribution in human cells
ax1 = axes[0]
half_life_bins = ['<1 min', '1-30 min', '30 min-\n2 hours', '2-8\nhours', '8-24\nhours',
                  '1-3\ndays', '3-7\ndays', '>7\ndays']
pct_proteins = [2, 8, 15, 25, 20, 15, 10, 5]
colors_hl = ['#f87171', '#f87171', '#fbbf24', '#fbbf24', '#34d399', '#34d399', '#6ee7b7', '#6ee7b7']

bars = ax1.bar(range(len(half_life_bins)), pct_proteins, color=colors_hl, alpha=0.85,
               edgecolor='white', linewidth=0.5, width=0.7)
ax1.set_xticks(range(len(half_life_bins)))
ax1.set_xticklabels(half_life_bins, fontsize=8, color='white')
ax1.set_ylabel('% of Proteome', fontsize=12, color='white')
ax1.set_title('Protein Half-Life Distribution', fontsize=14, color='white', fontweight='bold')
ax1.set_facecolor('#0a0a1a')
ax1.tick_params(colors='white')
ax1.grid(True, alpha=0.2, color='#34d399', axis='y')
for spine in ax1.spines.values():
    spine.set_color('#34d399')

# Examples
ax1.annotate('Cyclins, p53\n(rapid turnover)', xy=(1, 10), fontsize=7, color='#f87171',
            ha='center', fontstyle='italic')
ax1.annotate('Hemoglobin\n(RBC lifespan)', xy=(7, 7), fontsize=7, color='#6ee7b7',
            ha='center', fontstyle='italic')

# Panel 2: Post-translational modification frequency
ax2 = axes[1]
ptm_types = ['Phosphorylation', 'Ubiquitination', 'Acetylation', 'Glycosylation',
             'Methylation', 'SUMOylation', 'Lipidation', 'Disulfide bonds']
num_sites = [230000, 120000, 50000, 35000, 25000, 15000, 10000, 20000]

bars2 = ax2.barh(range(len(ptm_types)), [np.log10(n) for n in num_sites],
                 color=['#34d399', '#fbbf24', '#f87171', '#38bdf8', '#a78bfa', '#6ee7b7', '#f59e0b', '#ec4899'],
                 alpha=0.85, edgecolor='white', linewidth=0.5, height=0.6)
ax2.set_yticks(range(len(ptm_types)))
ax2.set_yticklabels(ptm_types, fontsize=10, color='white')
ax2.set_xlabel('Log10(Known Sites in Human Proteome)', fontsize=12, color='white')
ax2.set_title('Post-Translational Modifications', fontsize=14, color='white', fontweight='bold')
ax2.set_facecolor('#0a0a1a')
ax2.tick_params(colors='white')
ax2.grid(True, alpha=0.2, color='#34d399', axis='x')
for spine in ax2.spines.values():
    spine.set_color('#34d399')

for bar, n in zip(bars2, num_sites):
    ax2.text(bar.get_width() + 0.05, bar.get_y() + bar.get_height()/2,
             f'{n:,}', va='center', fontsize=8, color='white')

fig.patch.set_facecolor('#0a0a1a')
plt.tight_layout()
plt.savefig('output.png', dpi=150, bbox_inches='tight', facecolor='#0a0a1a')
plt.show()
print("Half-lives of 2-24 hours are the most common (~45% of proteins in this illustrative distribution); some (histones, hemoglobin) last days to months")
print("Phosphorylation: >230,000 known sites on ~13,000 human proteins")
print("The kinome (>500 kinases) phosphorylates ~30% of all proteins at any time")
print("Ubiquitin-proteasome system degrades ~80% of intracellular proteins")
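
A half-life is easier to reason about once it is converted into a decay rate. The sketch below assumes simple first-order degradation with no new synthesis, which is a simplification; the function names and the three example half-lives are hypothetical, chosen only to span the bins in the plot above.

```python
import math

def degradation_rate(half_life_hours: float) -> float:
    """First-order rate constant k (per hour): k = ln(2) / t_half."""
    return math.log(2) / half_life_hours

def fraction_remaining(half_life_hours: float, elapsed_hours: float) -> float:
    """Fraction of an initial protein pool left after elapsed_hours,
    assuming first-order decay and no new synthesis."""
    return 0.5 ** (elapsed_hours / half_life_hours)

# Hypothetical example half-lives in hours (illustrative, not measured values)
examples = {
    "short-lived (20 min)": 20 / 60,
    "mid-range (8 h)": 8.0,
    "long-lived (5 days)": 120.0,
}

for label, t_half in examples.items():
    k = degradation_rate(t_half)
    left = fraction_remaining(t_half, 24.0)
    print(f"{label}: k = {k:.3f} per hour, {left:.1%} remaining after 24 h")
```

Under these assumptions, a protein with an 8-hour half-life falls to one eighth of its starting level after 24 hours, while a 20-minute half-life clears the pool almost completely within a few hours.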

Key Takeaways

Ribosomes are ribozymes: peptidyl transferase activity resides in rRNA (23S/28S), not protein.
The genetic code is degenerate (64 codons for 20 AAs + 3 stops), with wobble base-pairing allowing fewer tRNAs than codons.
Translation initiation differs: prokaryotes use Shine-Dalgarno/16S pairing; eukaryotes use cap-dependent scanning (Kozak sequence).
Each amino acid costs ~4 ATP equivalents (2 for charging, 1 GTP for decoding, 1 GTP for translocation).
Kinetic proofreading (Hopfield) achieves ~$10^{-4}$ error rate by sacrificing GTP for a second selection step.
Post-translational modifications (phosphorylation, glycosylation, ubiquitination) vastly expand proteomic diversity beyond the genetic code.
Aminoacyl-tRNA synthetases use a "double sieve" editing mechanism to achieve ~1/40,000 error rate for similar amino acids.
Chaperones (Hsp70, Hsp60/GroEL, Hsp90) assist co- and post-translational folding; the UPR (IRE1, PERK, ATF6) responds to ER protein misfolding stress.
Polyribosomes and mRNA circularization (eIF4E-eIF4G-PABP) enhance translation efficiency through ribosome recycling.
The integrated stress response (eIF2$\alpha$ phosphorylation by GCN2/PERK/HRI/PKR) globally reduces translation while selectively upregulating stress-response genes.
mTORC1 controls cap-dependent translation via 4E-BP1 and S6K1; rapamycin inhibition preferentially reduces translation of oncogenic mRNAs.
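
Two of the takeaways above are simple arithmetic and can be checked with a short sketch. It assumes the per-residue cost listed above (2 + 1 + 1 ATP equivalents) and ignores initiation and termination. For proofreading it assumes the idealized case in which two selection steps discriminate independently, so their error fractions multiply. The single-step error of 1e-2 and the 300-residue protein are hypothetical illustration values, not figures from the text.

```python
# Per-residue elongation costs in ATP equivalents, as listed in the takeaways
ATP_EQ_CHARGING = 2        # aminoacylation: ATP -> AMP + PPi counts as 2
GTP_DECODING = 1           # aminoacyl-tRNA delivery to the A site
GTP_TRANSLOCATION = 1      # ribosome translocation

def elongation_cost(n_residues: int) -> int:
    """ATP equivalents spent on elongation only (initiation and termination ignored)."""
    per_residue = ATP_EQ_CHARGING + GTP_DECODING + GTP_TRANSLOCATION
    return n_residues * per_residue

def proofread_error(single_step_error: float, n_steps: int = 2) -> float:
    """Idealized overall error when n independent selection steps each
    pass a wrong substrate with probability single_step_error."""
    return single_step_error ** n_steps

# Hypothetical inputs for illustration
n_residues = 300
single_step_error = 1e-2

print(f"Elongation cost for {n_residues} residues: "
      f"{elongation_cost(n_residues)} ATP equivalents")
print(f"One selection step:  error ~ {proofread_error(single_step_error, 1):.0e}")
print(f"Two selection steps: error ~ {proofread_error(single_step_error, 2):.0e}")
```

With these inputs the second, GTP-driven selection step takes the error from roughly one in a hundred to roughly one in ten thousand, which matches the order of magnitude quoted for kinetic proofreading.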
