Part 6: Translation and Protein Synthesis

RNA to Protein: The Central Dogma's Final Step

Translation is the process by which the nucleotide sequence of messenger RNA (mRNA) is decoded to produce a specific polypeptide chain. It takes place on the ribosome, which reads mRNA in the 5' to 3' direction while synthesizing protein from the N-terminus to the C-terminus. The process requires transfer RNAs (tRNAs) as adaptor molecules, aminoacyl-tRNA synthetases for charging tRNAs, and numerous protein factors that orchestrate initiation, elongation, and termination.

Translation consumes approximately 4 high-energy phosphate bonds per amino acid incorporated (2 for aminoacyl-tRNA synthesis, 1 for EF-Tu GTP hydrolysis during A-site delivery, 1 for EF-G GTP hydrolysis during translocation), making it one of the most energy-intensive processes in the cell. In rapidly growing E. coli, up to 80% of cellular energy is devoted to translation.

1. The Genetic Code

Properties of the Genetic Code

Fundamental Features

Triplet: Each codon consists of 3 consecutive nucleotides, providing 4³ = 64 possible codons
Degenerate (redundant): Of the 64 codons, 61 sense codons encode only 20 amino acids and 3 act as stop signals, so most amino acids are specified by 2-6 synonymous codons
Non-overlapping: Codons are read sequentially in a single reading frame without sharing nucleotides
Comma-free: No punctuation between codons; the reading frame is set by the start codon
Unambiguous: Each codon specifies exactly one amino acid (or stop)
Nearly universal: Same code in virtually all organisms, with minor exceptions (mitochondria, Mycoplasma, some ciliates)

Degeneracy Pattern

1 codon: Met (AUG), Trp (UGG)

2 codons: Phe, Tyr, His, Gln, Asn, Lys, Asp, Glu, Cys

3 codons: Ile

4 codons: Val, Pro, Thr, Ala, Gly

6 codons: Leu, Ser, Arg

Most degeneracy occurs at the 3rd (wobble) position of the codon, where changes often do not alter the encoded amino acid.
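
The degeneracy classes listed above can be checked by counting codons per amino acid in the standard genetic code. The sketch below is illustrative only: it assumes the standard code written as a 64-letter string in UCAG order, and the names `CODON_TABLE` and `degeneracy_classes` are introduced here for the example.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order (UUU, UUC, UUA, UUG, UCU, ...); "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def degeneracy_classes(table):
    """Map number of synonymous codons -> sorted one-letter amino acid codes."""
    per_aa = Counter(aa for aa in table.values() if aa != "*")
    classes = {}
    for aa, n in per_aa.items():
        classes.setdefault(n, []).append(aa)
    return {n: sorted(aas) for n, aas in sorted(classes.items())}

classes = degeneracy_classes(CODON_TABLE)
for n, aas in classes.items():
    print(n, "codon(s):", " ".join(aas))
```

The counts add up to the 61 sense codons: 2×1 + 9×2 + 1×3 + 5×4 + 3×6 = 61.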

Start and Stop Codons

Start Codon: AUG

Encodes methionine (Met) in eukaryotes
Encodes N-formylmethionine (fMet) in prokaryotes
Sets the reading frame for the entire ORF
Internal AUGs encode regular Met residues
Rare alternative starts: GUG (~8% in E. coli), UUG (~1%)
The formyl group is removed by peptide deformylase; the N-terminal Met itself is then removed by methionine aminopeptidase (MAP) from many proteins

Stop Codons (Nonsense Codons)

UAA (Ochre): Most common in E. coli; recognized by both RF1 and RF2
UAG (Amber): Recognized by RF1; site for amber suppressor tRNAs
UGA (Opal/Umber): Recognized by RF2; can encode selenocysteine (21st amino acid) via SECIS element
Suppressor tRNAs carry anticodons that read stop codons, inserting an amino acid instead of terminating
UAG is also recoded for pyrrolysine (22nd amino acid) in some methanogenic archaea
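
How the start and stop codons delimit an open reading frame can be sketched in a few lines. This is a simplified illustration, not a gene-finding method: it assumes an uppercase RNA string, starts at the first AUG it finds, uses the standard code only, and ignores alternative start codons, suppressor tRNAs, and recoding (selenocysteine, pyrrolysine). The function name `translate` is introduced here for the example.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna):
    """Translate from the first AUG to the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing to translate
    peptide = []
    # Step through the reading frame set by the start codon, one codon at a time.
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)

print(translate("GGAUGUUUGGCUAAGC"))  # reads AUG UUU GGC, then stops at UAA
```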

Wobble Hypothesis (Crick, 1966)

Francis Crick proposed that the first two positions of the codon form standard Watson-Crick base pairs with the anticodon, but the third codon position, which pairs with the 5' base of the anticodon (position 34), allows non-standard "wobble" pairing. This explains how fewer than 61 tRNAs can decode all 61 sense codons.

Wobble Base Pairing Rules

5' Anticodon Base	3' Codon Base(s) Recognized	Notes
G	U or C	G-U wobble pair is thermodynamically stable
C	G only	Standard Watson-Crick only
A	U only	Standard Watson-Crick only
U	A or G	U-G wobble pair
I (Inosine)	U, C, or A	Inosine can pair with 3 bases; found in many tRNAs

Inosine (I) is formed by deamination of adenosine and is the most versatile wobble base. A single tRNA with I at the wobble position can decode three different codons, significantly reducing the number of tRNA species required. The minimum number of tRNAs needed to decode all 61 sense codons is 31.
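
The wobble table can be applied mechanically to list the codons a given anticodon reads. The sketch below assumes the anticodon is written 5' to 3' (positions 34, 35, 36) and that only position 34 wobbles. The names `WOBBLE` and `codons_read` are introduced here for illustration.

```python
# Wobble rules from the table above: 5' anticodon base -> 3' codon bases it can read.
WOBBLE = {"G": "UC", "C": "G", "A": "U", "U": "AG", "I": "UCA"}
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

def codons_read(anticodon):
    """Return the codons (5'->3') readable by an anticodon written 5'->3'."""
    wobble_base, middle, last = anticodon         # anticodon positions 34, 35, 36
    first_codon_base = WATSON_CRICK[last]         # position 36 pairs with codon base 1
    second_codon_base = WATSON_CRICK[middle]      # position 35 pairs with codon base 2
    return [first_codon_base + second_codon_base + b for b in WOBBLE[wobble_base]]

print(codons_read("IGC"))  # inosine at the wobble position: three alanine codons
print(codons_read("GAA"))  # G at the wobble position: both phenylalanine codons
```

Because the antiparallel pairing reverses the order, the anticodon IGC reads the GCN family rather than CGN.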

Codon Usage Bias

Although synonymous codons encode the same amino acid, organisms show strong preferences for certain codons over others. This codon usage bias correlates with the abundance of cognate tRNA species and affects translation speed and accuracy.

Biological Significance

Highly expressed genes use "optimal" codons matched to abundant tRNAs
Rare codons cause ribosome pausing/stalling
Codon bias affects co-translational folding
Selection maintains codon bias in large populations
Important for heterologous gene expression (codon optimization)

Near-Cognate Misreading

tRNAs with near-cognate anticodons can occasionally misread codons
Error rate: ~10^-3 to 10^-4 per codon
Ribosome uses kinetic proofreading to reduce errors
Two-step selection: initial selection + proofreading after GTP hydrolysis
Aminoglycoside antibiotics increase misreading by distorting the A site

Codon Adaptation Index (CAI)

The CAI quantifies how well a gene's codon usage matches the optimal codons of the organism:

$$\text{CAI} = \left( \prod_{i=1}^{L} w_{c_i} \right)^{1/L} = \exp\left( \frac{1}{L} \sum_{i=1}^{L} \ln w_{c_i} \right)$$

where $L$ is the number of codons and $w_{c_i}$ is the relative adaptiveness of codon $c_i$, defined as the ratio of that codon's observed frequency (in a reference set of highly expressed genes) to the frequency of the most common synonymous codon for the same amino acid. CAI ranges from 0 to 1, with 1 indicating maximal codon optimization.

$$w_i = \frac{f_i}{\max_j(f_j)} \quad \text{where } j \text{ ranges over synonymous codons}$$

$$\text{RSCU}_i = \frac{\text{observed count of codon } i}{(1/n) \times \sum_{j=1}^{n} \text{count}_j} = \frac{X_i \cdot n}{\sum_{j=1}^{n} X_j}$$

RSCU (Relative Synonymous Codon Usage) equals 1.0 for codons with no bias. Values above 1.0 indicate preferred codons; values below 1.0 indicate avoided codons.
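
The three formulas above translate directly into code. The following is a minimal sketch for a single synonymous family, using made-up reference counts for the two lysine codons (these are not real data). The function names are introduced here for illustration, and codons with zero counts would need separate handling because of the logarithm.

```python
import math

def rscu(counts):
    """RSCU_i = X_i * n / sum_j X_j within one synonymous family."""
    n, total = len(counts), sum(counts.values())
    return {codon: x * n / total for codon, x in counts.items()}

def relative_adaptiveness(counts):
    """w_i = f_i / max_j f_j within one synonymous family."""
    top = max(counts.values())
    return {codon: x / top for codon, x in counts.items()}

def cai(codons, w):
    """Geometric mean of w over the codons of a gene, computed in log space."""
    return math.exp(sum(math.log(w[c]) for c in codons) / len(codons))

# Hypothetical reference counts for the two lysine codons.
lys_counts = {"AAA": 30, "AAG": 10}
w = relative_adaptiveness(lys_counts)

print(rscu(lys_counts))          # AAA is preferred (above 1), AAG is avoided (below 1)
print(w)
print(cai(["AAA", "AAG"], w))    # geometric mean of 1 and 1/3
```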

Derivation: Codon Usage Bias and the tRNA Adaptation Index (tAI)

Starting from the observation that tRNA gene copy number correlates with tRNA abundance, we derive the tRNA Adaptation Index as a measure of translational efficiency.

Step 1: tRNA gene copy number as a proxy for abundance

In rapidly growing bacteria and yeast, tRNA abundance is approximately proportional to tRNA gene copy number. For each codon $c$, the available pool of cognate tRNAs includes exact-match (Watson-Crick) and wobble-pairing species. Define the absolute adaptiveness:

$$W_c = \sum_{j} (1 - s_{cj}) \cdot \text{tGCN}_j$$

where $\text{tGCN}_j$ is the gene copy number of tRNA isoacceptor $j$, and $s_{cj}$ is a penalty for wobble pairing (0 for Watson-Crick, 0.5 for G-U wobble, etc.).

Step 2: Normalize to get relative adaptiveness

For each amino acid, normalize by the maximum $W$ value among its synonymous codons:

$$w_c = \frac{W_c}{\max_{c' \in \text{syn}} W_{c'}}$$

This ensures $0 < w_c \leq 1$ for each codon. Codons with $w_c = 1$ are optimally adapted; lower values indicate slower decoding.

Step 3: Define the tAI for a gene

The tAI is the geometric mean of $w_c$ values over all $L$ codons in the coding sequence (analogous to CAI):

$$\text{tAI} = \left(\prod_{i=1}^{L} w_{c_i}\right)^{1/L} = \exp\left(\frac{1}{L}\sum_{i=1}^{L} \ln w_{c_i}\right)$$
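
Steps 1 to 3 can be followed numerically on a toy example. The sketch below assumes a single hypothetical tRNA^Phe species (anticodon GAA) with 2 gene copies and uses the illustrative penalty of 0.5 for G-U wobble from Step 1. The copy number and function names are assumptions made for this example.

```python
import math

def absolute_adaptiveness(pairings):
    """W_c = sum_j (1 - s_cj) * tGCN_j, with pairings given as [(s_cj, tGCN_j), ...]."""
    return sum((1.0 - s) * tgcn for s, tgcn in pairings)

def tai(codons, w):
    """Geometric mean of w_c over the codons of a coding sequence."""
    return math.exp(sum(math.log(w[c]) for c in codons) / len(codons))

# Hypothetical: one tRNA-Phe species (anticodon GAA) with 2 gene copies.
W = {
    "UUC": absolute_adaptiveness([(0.0, 2)]),  # G-C Watson-Crick pair at the wobble position
    "UUU": absolute_adaptiveness([(0.5, 2)]),  # G-U wobble pair, penalty 0.5
}

# Step 2: normalize by the best synonymous codon.
w_max = max(W.values())
w = {codon: value / w_max for codon, value in W.items()}

print(w)
print(tai(["UUC", "UUU", "UUU", "UUC"], w))
```

In this toy case the wobble-decoded codon UUU receives half the weight of UUC, so a sequence using both equally has a tAI equal to the square root of 0.5.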

Step 4: Relationship between tAI and translation speed

Since the elongation rate at each codon follows Michaelis-Menten kinetics with cognate tRNA as substrate, and tRNA abundance $\propto$ tGCN, the per-codon rate is approximately:

$$k_i \propto \frac{[\text{tRNA}]_{\text{cognate}}}{K_M + [\text{tRNA}]_{\text{cognate}}} \propto w_{c_i} \quad \text{(when not saturated)}$$

Step 5: tAI predicts protein abundance

The overall translation rate for a gene scales with the harmonic mean of per-codon rates (since slow codons bottleneck the ribosome). The tAI (geometric mean) approximates this and correlates strongly ($r \approx 0.6\text{--}0.7$) with measured protein abundance in E. coli and yeast.

Step 6: Comparison: tAI vs. CAI

CAI uses observed codon frequencies in highly expressed genes as a reference (empirical). tAI uses tRNA gene copy numbers and wobble rules (mechanistic). Both predict expression level, but tAI has a clearer biophysical basis. For E. coli: ribosomal protein genes have CAI $\approx 0.8\text{--}0.9$ and tAI $\approx 0.7\text{--}0.9$, while horizontally transferred genes have CAI $\approx 0.3\text{--}0.5$ and tAI $\approx 0.2\text{--}0.4$.

2. Transfer RNA (tRNA)

tRNA Structure

Transfer RNAs are small RNA molecules (76-90 nucleotides) that serve as the physical link between the nucleotide sequence of mRNA and the amino acid sequence of proteins. Each tRNA carries a specific amino acid and recognizes one or more codons through its anticodon triplet.

Cloverleaf Secondary Structure

Acceptor Stem (7 bp): 5' and 3' ends base-pair; 3' end extends as single-stranded CCA-OH, the site of amino acid attachment (ester bond to 2' or 3'-OH of terminal adenosine)
D-loop (Dihydrouridine arm): Contains dihydrouridine (D) modified bases; varies in length (8-12 nt loop); interacts with aminoacyl-tRNA synthetase
Anticodon Loop: Always 7 nucleotides; anticodon at positions 34-36; position 34 is the wobble position; flanked by modified bases (position 37 often has a hypermodified purine)
TΨC Loop (T-arm): Contains the conserved TΨC sequence (ribothymidine-pseudouridine-cytidine); interacts with the ribosome large subunit
Variable Loop: 4-21 nucleotides; short in Class 1 tRNAs, long in Class 2 (tRNA^Ser, tRNA^Leu)

L-shaped 3D Structure

The cloverleaf folds into a compact L-shaped tertiary structure (each arm ~60-70 Angstroms long and ~20 Angstroms thick)
The acceptor stem and TΨC arm stack coaxially to form one arm of the L
The D-arm and anticodon arm stack coaxially to form the other arm
The anticodon is at one end of the L; the amino acid attachment (CCA 3') is at the other end, ~75 Angstroms apart
Tertiary interactions include base triples, non-Watson-Crick pairs, and intercalation
Mg²⁺ ions stabilize the tertiary fold
The invariant G₁₉-C₅₆ pair connects the D-loop and T-loop at the corner of the L

Modified Bases in tRNA

tRNAs contain the highest density of modified nucleosides of any RNA class. Over 100 different modifications have been identified, with each tRNA containing an average of 12-14 modified bases.

Inosine (I)

Formed by deamination of adenosine by ADAT enzymes
Found at wobble position 34
Can pair with U, C, or A
Expands decoding capacity

Pseudouridine (Ψ)

C-glycoside isomer of uridine (C-C bond instead of N-C)
Found in TΨC loop (position 55)
Stabilizes RNA structure via extra H-bond donor
Most abundant tRNA modification

Dihydrouridine (D)

Saturated 5,6-double bond of uracil
Primarily in D-loop (hence the name)
Increases conformational flexibility
Cannot stack efficiently

Other important modifications include m¹A58 (stabilizes T-loop), t⁶A37 and i⁶A37 (adjacent to anticodon, prevent frameshifting), and 2'-O-methylation (protects against nucleases).

Aminoacyl-tRNA Synthetases (aaRS)

These essential enzymes catalyze the attachment of the correct amino acid to its cognate tRNA(s). There are 20 synthetases in most organisms (one per amino acid), and they must achieve extremely high fidelity since the ribosome cannot verify the amino acid identity — only the codon-anticodon match.

Class I Synthetases (10 enzymes)

Rossmann fold catalytic domain
HIGH and KMSKS signature motifs
Aminoacylate 2'-OH of terminal A
Approach tRNA from minor groove side
Generally monomeric
Amino acids: Met, Val, Ile, Leu, Cys, Arg, Glu, Gln, Tyr, Trp

Class II Synthetases (10 enzymes)

Antiparallel beta-sheet catalytic domain
Motifs 1, 2, 3 (distinct from Class I)
Aminoacylate 3'-OH of terminal A
Approach tRNA from major groove side
Generally dimeric or tetrameric
Amino acids: Gly, Ala, Pro, Ser, Thr, His, Asp, Asn, Lys, Phe

Two-Step Aminoacylation Reaction

$$\text{Step 1 (Activation):} \quad \text{AA} + \text{ATP} \rightleftharpoons \text{AA-AMP} + \text{PP}_i$$

$$\text{Step 2 (Transfer):} \quad \text{AA-AMP} + \text{tRNA} \rightarrow \text{AA-tRNA} + \text{AMP}$$

$$\text{Overall:} \quad \text{AA} + \text{tRNA} + \text{ATP} \rightarrow \text{AA-tRNA} + \text{AMP} + \text{PP}_i$$

The aminoacyl-adenylate (AA-AMP) intermediate remains enzyme-bound. Pyrophosphatase hydrolyzes PP_i to 2P_i, driving the reaction forward (cost: 2 high-energy bonds per amino acid).

Editing and Proofreading: The Double-Sieve Mechanism

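Some synthetases (e.g., IleRS, ValRS, LeuRS, ThrRS) have a separate editing domain (the CP1 domain in IleRS, ValRS, and LeuRS; a distinct N-terminal editing domain in ThrRS) that hydrolyzes misactivated or mischarged amino acids. The double-sieve model (Fersht, 1977) describes two successive filters: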

First sieve (synthetic site): Excludes amino acids larger than the cognate substrate based on steric fit. Cannot reject smaller amino acids (e.g., IleRS cannot exclude Val, which differs by one methyl group).
Second sieve (editing site): Hydrolyzes amino acids smaller than the cognate substrate. The correctly charged product is too large to enter the editing site. IleRS editing site accepts Val-tRNA^Ile and hydrolyzes it, but Ile-tRNA^Ile is sterically excluded.

Editing can occur pre-transfer (hydrolyzing AA-AMP) or post-transfer (hydrolyzing AA-tRNA). This reduces the error rate from ~10^-2 to ~10^-4 (1 in 10,000).

Wobble Pairing Energetics

The free energy contributions of wobble base pairs differ from standard Watson-Crick pairs:

$$\Delta G^\circ_{\text{G-C}} \approx -3.0 \text{ kcal/mol} \quad \text{(Watson-Crick, 3 H-bonds)}$$

$$\Delta G^\circ_{\text{A-U}} \approx -1.5 \text{ kcal/mol} \quad \text{(Watson-Crick, 2 H-bonds)}$$

$$\Delta G^\circ_{\text{G-U}} \approx -1.0 \text{ kcal/mol} \quad \text{(Wobble pair, 2 H-bonds)}$$

$$\Delta G^\circ_{\text{I-C}} \approx -2.0 \text{ kcal/mol}, \quad \Delta G^\circ_{\text{I-A}} \approx -0.5 \text{ kcal/mol}, \quad \Delta G^\circ_{\text{I-U}} \approx -0.8 \text{ kcal/mol}$$

The total codon-anticodon interaction energy is the sum of all three base pair contributions. A minimum threshold of approximately -6 kcal/mol is needed for stable ribosomal A-site binding.

3. Ribosome Structure

The ribosome is a massive ribonucleoprotein machine (2.5-4.2 MDa) that catalyzes peptide bond formation. The catalytic center is composed of RNA, making the ribosome a ribozyme. Atomic-resolution structures (Ramakrishnan, Steitz, Yonath; Nobel 2009) revealed that no protein comes within 18 Angstroms of the peptidyl transferase active site.

Prokaryotic Ribosome (70S, ~2.5 MDa)

30S Small Subunit

16S rRNA (1542 nt) — decoding center; monitors codon-anticodon complementarity; 3' end contains anti-Shine-Dalgarno sequence (CCUCCU)
21 proteins (S1-S21)
Responsible for mRNA binding and decoding fidelity
Head, body, platform, and shoulder domains

50S Large Subunit

23S rRNA (2904 nt) — contains peptidyl transferase center (PTC); domain V forms the active site; this is the ribozyme activity
5S rRNA (120 nt) — structural role; bridges to tRNA and factors
~34 proteins (L1-L36)
Contains the peptide exit tunnel (~100 Angstroms long, ~15 Angstroms wide)

Eukaryotic Ribosome (80S, ~4.2 MDa)

40S Small Subunit

18S rRNA (~1900 nt) — decoding; contains expansion segments absent in prokaryotes
~33 proteins (eS and uS nomenclature)
More complex than 30S; additional regulatory interactions

60S Large Subunit

28S rRNA (~4700 nt) — peptidyl transferase center (homologous to 23S)
5.8S rRNA (~160 nt) — homologous to prokaryotic 23S 5' end; H-bonded to 28S
5S rRNA (~120 nt) — structural role
~47 proteins (eL and uL nomenclature)

Functional Sites

A Site (Aminoacyl)

Accepts incoming aminoacyl-tRNA
Codon-anticodon recognition occurs here
30S decoding center monitors base pairing geometry
A1492, A1493 (16S rRNA) flip out to sense minor groove of codon-anticodon helix
G530 switches from syn to anti upon cognate tRNA binding

P Site (Peptidyl)

Holds peptidyl-tRNA (growing chain)
During initiation, fMet-tRNA binds directly here
CCA end positioned at PTC for peptide bond formation
P-site tRNA contacts both subunits extensively

E Site (Exit)

Deacylated tRNA exits here after translocation
Low affinity for aminoacyl-tRNA
E-site tRNA release is coupled to A-site tRNA binding (allosteric coupling)
Codon-anticodon interaction maintained in E site

Additional Structural Features

mRNA Channel: Formed between head and body of small subunit; accommodates ~30 nt of mRNA; Shine-Dalgarno helix fits in the channel exit
Peptide Exit Tunnel: ~100 Angstrom tunnel through 50S subunit; lined mostly with 23S rRNA (domain I-V); nascent chain can begin folding near the exit (SRP recognition occurs here); proteins L4 and L22 form a constriction point
Factor Binding Center: GTPase-associated center (GAC) on 50S; sarcin-ricin loop (SRL, nt 2653-2667 of 23S) is essential for stimulating GTP hydrolysis by EF-Tu and EF-G
Inter-subunit Bridges: ~12 bridges connecting 30S and 50S; mostly RNA-RNA contacts; include B2a (the largest, involving h44 of 16S and H69 of 23S)

4. Translation Initiation

Prokaryotic Initiation

Prokaryotic initiation is directed by the Shine-Dalgarno (SD) sequence, a purine-rich region (consensus: AGGAGG) located 5-10 nucleotides upstream of the AUG start codon. The SD sequence base-pairs with the complementary anti-SD sequence (3'-AUUCCUCCACUAG-5') at the 3' end of 16S rRNA.
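
The spacing rule can be turned into a simple search. The sketch below looks for an exact copy of the consensus AGGAGG ending 5-10 nucleotides before an AUG. It is deliberately naive: real SD sequences are often partial matches to the consensus, and the function name, parameters, and test sequence are hypothetical.

```python
def find_sd_starts(mrna, sd="AGGAGG", min_gap=5, max_gap=10):
    """Return (aug_index, gap) pairs where an exact SD motif ends
    min_gap..max_gap nucleotides upstream of an AUG."""
    hits = []
    aug = mrna.find("AUG")
    while aug != -1:
        for gap in range(min_gap, max_gap + 1):
            sd_start = aug - gap - len(sd)
            if sd_start >= 0 and mrna[sd_start:sd_start + len(sd)] == sd:
                hits.append((aug, gap))
        aug = mrna.find("AUG", aug + 1)
    return hits

# Made-up leader: 3 nt, the SD consensus, a 7-nt spacer, then the start codon.
mrna = "GGC" + "AGGAGG" + "UAACAUU" + "AUGGCU"
print(find_sd_starts(mrna))  # one hit: AUG at index 16 with a 7-nt spacer
```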

Step 1: 30S Pre-Initiation Complex

IF3 binds free 30S subunit, prevents premature 50S joining; also performs a proofreading role for start codon selection
IF1 binds at the A site of 30S, blocks tRNA entry; enhances IF2 and IF3 activities
mRNA binds through SD-anti-SD base pairing, positioning AUG at the P site

Step 2: 30S Initiation Complex

IF2-GTP delivers fMet-tRNA_f^Met to the P site
IF2 is a GTPase that specifically recognizes the formyl group on fMet (discriminates initiator from elongator Met-tRNA)
The initiator tRNA has unique features: 3 consecutive G-C pairs in the anticodon stem, no Watson-Crick pair at 1:72 position

Step 3: 70S Initiation Complex

50S subunit joins, triggering GTP hydrolysis by IF2
IF1, IF2-GDP, and IF3 dissociate
fMet-tRNA_f^Met is positioned in the P site, ready for elongation
The A site is now empty and ready to accept the first elongator aminoacyl-tRNA

Eukaryotic Initiation (Cap-Dependent Scanning)

Eukaryotic initiation is far more complex, involving at least 12 initiation factors (eIFs) and a scanning mechanism to locate the start codon. The process is the primary point of translational regulation.

Step 1: 43S Pre-Initiation Complex (PIC) Formation

eIF2-GTP-Met-tRNA_i^Met ternary complex forms (eIF2 is a heterotrimeric GTPase: alpha, beta, gamma subunits)
Ternary complex joins 40S subunit along with eIF1 (fidelity), eIF1A (A-site occupation, like IF1), eIF3 (13-subunit complex, anti-association), and eIF5 (GAP for eIF2)
This forms the 43S PIC in an "open" conformation capable of scanning

Step 2: mRNA Activation and 48S Complex

eIF4F complex binds the 5' m⁷G cap:
- eIF4E: Cap-binding protein (regulated by 4E-BPs)
- eIF4G: Large scaffold protein; bridges eIF4E to eIF3 (40S recruitment); also binds PABP (circularizes mRNA)
- eIF4A: DEAD-box RNA helicase; unwinds 5'-UTR secondary structure
eIF4B stimulates eIF4A helicase activity; eIF4H is a cofactor
43S PIC is recruited to the mRNA 5' end via eIF3-eIF4G interaction, forming the 48S complex

Step 3: Scanning and Start Codon Recognition

48S complex scans 5' to 3' along the 5'-UTR, powered by eIF4A helicase
Scans for the first AUG in a favorable Kozak consensus context:
5'-gcc(A/G)ccAUGG-3'
Position -3 (purine, especially A) and +4 (G) are most critical
AUG recognition triggers conformational change: eIF1 displacement, "closed" complex, eIF2 GTP hydrolysis (stimulated by eIF5)
Poor Kozak context allows "leaky scanning" to downstream AUGs
Upstream open reading frames (uORFs) in the 5'-UTR can regulate reinitiation
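
The two critical Kozak positions can be checked programmatically. In the sketch below the A of AUG is position +1, so position -3 is three nucleotides upstream and +4 is the nucleotide right after the G. The three-level labels (strong, adequate, weak) are a convention assumed for this example, and the function name is hypothetical.

```python
def kozak_strength(mrna, aug_index):
    """Classify an AUG by the two critical context positions: -3 (purine) and +4 (G)."""
    if mrna[aug_index:aug_index + 3] != "AUG":
        raise ValueError("aug_index does not point at an AUG")
    # Positions that fall outside the sequence count as non-matching.
    minus3 = mrna[aug_index - 3] if aug_index >= 3 else ""
    plus4 = mrna[aug_index + 3] if aug_index + 3 < len(mrna) else ""
    matches = (minus3 in ("A", "G")) + (plus4 == "G")
    return ("weak", "adequate", "strong")[matches]

print(kozak_strength("GCCACCAUGG", 6))  # A at -3 and G at +4
print(kozak_strength("GCCUCCAUGC", 6))  # neither position matches
```

An AUG classified as weak by this check is a candidate for leaky scanning, although actual start site usage also depends on features this sketch ignores.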

Step 4: 60S Joining (80S Formation)

eIF5B-GTP (homolog of prokaryotic IF2) promotes 60S subunit joining
eIF2-GDP, eIF1, eIF3, eIF5 dissociate
eIF5B GTP hydrolysis triggers release of eIF5B-GDP and eIF1A
eIF2B (guanine nucleotide exchange factor) recycles eIF2-GDP to eIF2-GTP (rate-limiting; regulated by phosphorylation of eIF2-alpha by kinases: HRI, PKR, PERK, GCN2 = integrated stress response)

IRES-Mediated Internal Initiation

Internal Ribosome Entry Sites (IRESes) are structured RNA elements that recruit ribosomes directly to an internal position in the mRNA, bypassing the need for a 5' cap and scanning. Originally discovered in picornavirus RNAs (poliovirus, EMCV), IRESes are also found in some cellular mRNAs.

Types of IRESes

Type I (Picornavirus): Requires most eIFs except eIF4E (e.g., poliovirus)
Type II (EMCV, FMDV): Like Type I, requires eIF4G and eIF4A but not eIF4E; initiation occurs at or close to the IRES with little or no scanning
Type III (HCV): Binds 40S directly via RNA structure; needs only eIF3 and ternary complex
Type IV (Cricket paralysis virus): Requires NO initiation factors; RNA structure mimics tRNA in the P site

Biological Significance

Allows translation during stress when cap-dependent translation is inhibited
Viral strategy: many viruses cleave eIF4G (by viral proteases) to shut off host translation while using IRES for their own mRNAs
Some cellular mRNAs use IRES during apoptosis, mitosis, hypoxia
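The HCV IRES has been explored as an antiviral drug target, although approved HCV drugs such as sofosbuvir act on viral proteins (sofosbuvir inhibits the NS5B polymerase) rather than on the IRES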

5. Translation Elongation

Elongation is a cyclic process with three major steps: aminoacyl-tRNA delivery, peptide bond formation, and translocation. Each cycle adds one amino acid to the growing polypeptide and moves the ribosome by one codon (3 nucleotides) along the mRNA.

Step 1: Aminoacyl-tRNA Delivery (Decoding)

EF-Tu-GTP-aa-tRNA ternary complex delivers aminoacyl-tRNA to the ribosomal A site. EF-Tu (in eukaryotes: eEF1A) is the most abundant protein in the cell (~5-10% of total protein in E. coli).
Initial selection: Codon-anticodon base pairing is monitored by 16S rRNA nucleotides A1492, A1493, and G530. Cognate tRNA induces a conformational change (domain closure of 30S) that activates the GTPase center on 50S.
GTP hydrolysis: Triggered by the sarcin-ricin loop (SRL) of 23S rRNA interacting with EF-Tu. GTP hydrolysis causes a conformational change in EF-Tu (switch I and II regions), releasing EF-Tu-GDP from the ribosome.
Proofreading: After GTP hydrolysis but before peptide bond formation, the aa-tRNA can still dissociate if the codon-anticodon interaction is incorrect (kinetic proofreading, Hopfield 1974). Near-cognate tRNAs are rejected at this stage.
Accommodation: Cognate aa-tRNA swings its CCA end into the PTC of the 50S A site (from the A/T state to the A/A state). This involves a ~70-Angstrom movement of the acceptor end.
EF-Ts (eEF1B in eukaryotes) serves as the guanine nucleotide exchange factor (GEF) for EF-Tu, recycling EF-Tu-GDP back to EF-Tu-GTP.

Step 2: Peptide Bond Formation

Peptide bond formation is catalyzed by the peptidyl transferase center (PTC), which is composed entirely of 23S rRNA — making the ribosome a ribozyme. The reaction is an aminolysis: the alpha-amino group of the A-site aminoacyl-tRNA attacks the carbonyl carbon of the ester bond linking the peptide to the P-site tRNA.

$$\text{Peptidyl-tRNA}_{\text{P}} + \text{AA-tRNA}_{\text{A}} \rightarrow \text{Peptidyl-AA-tRNA}_{\text{A}} + \text{tRNA}_{\text{P}}$$

Substrate-assisted catalysis: The 2'-OH of the P-site tRNA A76 ribose participates directly in catalysis, acting as a proton shuttle. Mutation to 2'-deoxy reduces rate ~10⁶-fold.
Entropy reduction: The ribosome achieves most of its catalytic power (~10⁷-fold rate enhancement) by precisely positioning the substrates, reducing the entropic cost of the reaction. The PTC provides a pre-organized environment.
Rate: Peptide bond formation itself is very fast (~50-300 s^-1), not rate-limiting in elongation. The chemical step may be preceded by rate-limiting accommodation.
Key 23S rRNA residues: A2451, U2506, U2585, A2602 form the PTC walls; A2451 was initially proposed as a general acid-base catalyst but is now thought to play a structural/positioning role.

Step 3: Translocation

Hybrid states: After peptide bond formation, tRNAs spontaneously adopt hybrid states: the deacylated tRNA moves to P/E (P-site on 30S, E-site on 50S) and the peptidyl-tRNA moves to A/P (A-site on 30S, P-site on 50S). This is driven by the thermodynamics of the CCA end interactions.
EF-G-GTP binding: EF-G (eEF2 in eukaryotes) binds to the ribosome at the A site (its domain IV mimics the shape of tRNA — "molecular mimicry"). EF-G accelerates translocation ~50-fold.
Ratchet-like motion: GTP hydrolysis by EF-G drives a ratchet-like rotation of the 30S subunit relative to 50S (~6 degree counterclockwise rotation), coupled with swiveling of the 30S head domain (~18 degrees). This moves the mRNA-tRNA complex by exactly one codon.
Post-translocation state: After translocation, deacylated tRNA is in the E site (released on next cycle), peptidyl-tRNA is in the P site, and the A site is empty and ready for the next aa-tRNA. EF-G-GDP dissociates.

Translation Speed and Energetics

Elongation Rates

E. coli (37 °C): ~15-20 amino acids/second
Eukaryotes: ~5-6 amino acids/second
Mitochondria: ~1-2 amino acids/second
Complete 300-aa protein in E. coli: ~15-20 seconds
Same protein in eukaryotes: ~50-60 seconds

Energy Cost Per Amino Acid

$$\Delta G_{\text{peptide bond}} \approx +0.5 \text{ kcal/mol (endergonic)}$$

$$\text{Energy cost: } 2 \text{ (aminoacylation, ATP} \rightarrow \text{AMP)} + 1 \text{ (EF-Tu, GTP)} + 1 \text{ (EF-G, GTP)} = 4 \text{ high-energy phosphate bonds}$$

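Each high-energy bond provides ~7.3 kcal/mol under standard conditions, so the total expenditure per amino acid is ~4 × 7.3 = 29.2 kcal/mol. This is far more than the peptide bond itself requires, which makes translation thermodynamically highly favorable and effectively irreversible.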
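
The rate and energy figures above combine into a quick per-protein estimate. The sketch below covers elongation only and assumes the per-residue costs quoted in the text. It ignores initiation, termination, proofreading losses, and amino acid biosynthesis, and the function name is introduced here for illustration.

```python
KCAL_PER_BOND = 7.3        # kcal/mol per high-energy phosphate bond (value used in the text)
BONDS_PER_AA = 2 + 1 + 1   # aminoacylation + EF-Tu + EF-G

def elongation_budget(n_residues, rate_aa_per_s):
    """Return (seconds, high-energy bonds, kcal/mol) for elongating one chain."""
    seconds = n_residues / rate_aa_per_s
    bonds = n_residues * BONDS_PER_AA
    return seconds, bonds, bonds * KCAL_PER_BOND

for label, rate in [("E. coli, 20 aa/s", 20), ("E. coli, 15 aa/s", 15), ("eukaryote, 5 aa/s", 5)]:
    seconds, bonds, kcal = elongation_budget(300, rate)
    print(f"{label}: {seconds:.0f} s, {bonds} bonds, {kcal:.0f} kcal/mol")
```

For a 300-residue protein this gives 1200 high-energy bonds regardless of speed; only the elapsed time changes with the elongation rate.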

Overall translation rate considering tRNA competition:

$$k_{\text{elong}} = \frac{k_{\text{cat}}}{1 + \frac{K_M}{[\text{aa-tRNA}_{\text{cognate}}]} \left(1 + \frac{[\text{aa-tRNA}_{\text{near-cognate}}]}{K_I}\right)}$$

This Michaelis-Menten-like expression shows that the elongation rate at each codon depends on the concentration of cognate aminoacyl-tRNA and competition from near-cognate species. Rare codons with low cognate tRNA concentrations have slower elongation, leading to ribosome pausing.

Derivation: Ribosome Elongation Rate from Michaelis-Menten tRNA Selection

Starting from the kinetic scheme for aminoacyl-tRNA selection at the ribosomal A site, we derive the codon-specific elongation rate.

Step 1: Define the ternary complex delivery scheme

EF-Tu-GTP-aa-tRNA ternary complexes sample the A site. The cognate complex binds with association rate $k_1$ and either dissociates ($k_{-1}$) or triggers GTP hydrolysis ($k_2$):

$$\text{Ribosome} + \text{TC}_{\text{cog}} \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} \text{Initial complex} \xrightarrow{k_2} \text{GTP hydrolysis} \xrightarrow{k_3} \text{Accommodation}$$

Step 2: Effective M-M rate for cognate tRNA selection

Applying the steady-state approximation to the initial recognition complex and combining the subsequent steps into an effective $k_{\text{cat}}$:

$$k_{\text{elong}} = \frac{k_{\text{cat}} \cdot [\text{TC}_{\text{cog}}]}{K_M + [\text{TC}_{\text{cog}}]}$$

where $K_M = (k_{-1} + k_2)/k_1$ and $k_{\text{cat}}$ combines GTP hydrolysis, proofreading, accommodation, peptide bond formation, and translocation.

Step 3: Include competitive inhibition by near-cognate tRNAs

Near-cognate ternary complexes compete for the A site but are mostly rejected during initial selection and proofreading. They act as competitive inhibitors with inhibition constant $K_I$:

$$k_{\text{elong}} = \frac{k_{\text{cat}} \cdot [\text{TC}_{\text{cog}}]}{K_M\left(1 + \frac{[\text{TC}_{\text{near}}]}{K_I}\right) + [\text{TC}_{\text{cog}}]}$$
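
The effect of cognate concentration and near-cognate competition on this rate is easy to explore numerically. The parameter values below are hypothetical and in arbitrary concentration units, chosen only to show the behavior of the expression; they are not measured constants.

```python
def k_elong(tc_cog, tc_near, k_cat, K_M, K_I):
    """Per-codon elongation rate with competitive inhibition by near-cognate ternary complexes."""
    return k_cat * tc_cog / (K_M * (1.0 + tc_near / K_I) + tc_cog)

# Hypothetical parameters (arbitrary units), for illustration only.
params = dict(k_cat=20.0, K_M=1.0, K_I=10.0)

fast = k_elong(tc_cog=10.0, tc_near=0.0, **params)      # abundant cognate tRNA, no competitors
slow = k_elong(tc_cog=0.1, tc_near=0.0, **params)       # rare cognate tRNA
slowest = k_elong(tc_cog=0.1, tc_near=10.0, **params)   # rare cognate tRNA plus competitors

print(round(fast, 2), round(slow, 2), round(slowest, 2))
```

With these numbers, competitors at a concentration equal to $K_I$ double the apparent $K_M$, which roughly halves the rate at a rare codon but would barely change it at a codon whose cognate tRNA is near saturation.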

Step 4: Kinetic proofreading (Hopfield, 1974)

After GTP hydrolysis, near-cognate tRNAs have a second chance to dissociate before accommodation (proofreading step). This adds an irreversible energy-consuming step that amplifies discrimination beyond thermodynamic equilibrium:

$$\text{Selectivity} = \underbrace{\frac{(k_2)_{\text{cog}}}{(k_2)_{\text{near}}}}_{\text{initial selection}} \times \underbrace{\frac{(k_3)_{\text{cog}}}{(k_3)_{\text{near}}}}_{\text{proofreading}} \approx 10^{2} \times 10^{1} = 10^{3}$$

Overall error rate per codon: ~$10^{-3}\text{--}10^{-4}$.

Step 5: Codon-specific rate variation

Since $[\text{TC}_{\text{cog}}] \propto$ tRNA abundance, which varies 10-fold between common and rare codons, the elongation rate varies accordingly. At a rare codon ($[\text{TC}] \ll K_M$):

$$k_{\text{elong}}^{\text{rare}} \approx \frac{k_{\text{cat}}}{K_M} \cdot [\text{TC}_{\text{rare}}] \quad \text{(first-order, slow)}$$

At an optimal codon ($[\text{TC}] \gg K_M$):

$$k_{\text{elong}}^{\text{optimal}} \approx k_{\text{cat}} \quad \text{(zero-order, maximal speed)}$$

Step 6: Overall translation rate for an mRNA

The total time to translate an mRNA of $L$ codons is the sum of per-codon dwell times. The overall rate is limited by the slowest codons (harmonic mean):

$$v_{\text{overall}} = \frac{L}{\sum_{i=1}^{L} 1/k_i} = L \cdot \left(\sum_{i=1}^{L} \frac{1}{k_i}\right)^{-1}$$

For E. coli: optimal codons give $k \approx 20$ aa/s, rare codons $k \approx 2$ aa/s. A cluster of rare codons can cause ribosome pausing, traffic jams, and co-translational folding pauses that may be biologically functional.
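
A short calculation shows how strongly a few slow codons pull down the overall rate. The sketch below uses the two rates quoted above (20 and 2 aa/s) on a made-up 100-codon message; the codon mix is an assumption for illustration, and ribosome queuing is ignored.

```python
def overall_rate(per_codon_rates):
    """v = L / sum(1 / k_i): average codons per second over the whole mRNA."""
    return len(per_codon_rates) / sum(1.0 / k for k in per_codon_rates)

# Hypothetical message: 90 optimal codons at 20 aa/s and 10 rare codons at 2 aa/s.
rates = [20.0] * 90 + [2.0] * 10

total_time = sum(1.0 / k for k in rates)       # 90/20 + 10/2 seconds
arithmetic_mean = sum(rates) / len(rates)

print(total_time, overall_rate(rates), arithmetic_mean)
```

Here the 10 rare codons account for more than half of the total transit time, so the overall rate falls to about 10.5 aa/s even though the arithmetic mean of the per-codon rates is 18.2 aa/s.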

6. Translation Termination

Termination occurs when a stop codon (UAA, UAG, or UGA) enters the ribosomal A site. Since no aminoacyl-tRNA has an anticodon complementary to stop codons, protein release factors recognize them instead and trigger hydrolysis of the peptidyl-tRNA bond.

Prokaryotic Release Factors

RF1: Recognizes UAA and UAG. Contains the PxT tripeptide motif in the anticodon-mimicking domain; GGQ motif catalyzes peptidyl-tRNA hydrolysis
RF2: Recognizes UAA and UGA. Contains SPF tripeptide for stop codon recognition; also has GGQ motif
RF3: GTPase that accelerates RF1/RF2 dissociation from the ribosome after peptide release. RF3-GTP binding promotes RF1/RF2 release; GTP hydrolysis releases RF3 itself.

Eukaryotic Release Factors

eRF1: Recognizes all three stop codons (single omnipotent factor). NIKS motif in domain 1 for stop codon recognition; GGQ motif in domain 2 for peptidyl-tRNA hydrolysis; domain 3 interacts with eRF3
eRF3: GTPase (translational GTPase superfamily). eRF3-GTP stimulates eRF1 activity; GTP hydrolysis triggers conformational changes for efficient peptide release. Also involved in NMD pathway.

The GGQ Motif: Catalytic Mechanism

The universally conserved Gly-Gly-Gln (GGQ) motif in both RF1/RF2 and eRF1 is positioned in the PTC to catalyze the hydrolysis of the ester bond between the peptide and the P-site tRNA. The glutamine backbone NH positions a water molecule for nucleophilic attack on the ester carbonyl. The glutamine sidechain is methylated post-translationally (by PrmC/HemK) in prokaryotes, enhancing activity. Mutation of either Gly to any other amino acid is lethal.

Ribosome Recycling

Prokaryotes: Ribosome Recycling Factor (RRF) + EF-G-GTP split the 70S ribosome into subunits. RRF mimics the shape of tRNA and binds to the A site. EF-G-mediated GTP hydrolysis drives subunit dissociation. IF3 then prevents 30S-50S reassociation. Deacylated tRNA and mRNA are released.
Eukaryotes: ABCE1 (Rli1), an ABC-type ATPase, is the primary recycling factor. ABCE1 uses ATP hydrolysis to mechanically split the 80S ribosome after eRF1-mediated peptide release. Ligatin (eIF2D) and MCT-1/DENR can also promote recycling and reinitiation.

7. Post-Translational Modifications (PTMs)

Most proteins undergo covalent modifications after (or during) translation that are essential for their function, localization, and regulation. The proteome is vastly more complex than the genome due to combinatorial PTMs.

Signal Peptide Cleavage

The signal recognition particle (SRP) recognizes hydrophobic signal peptides (typically 16-30 residues at the N-terminus) as they emerge from the ribosome exit tunnel. SRP directs the ribosome-nascent chain complex to the ER membrane (eukaryotes) or plasma membrane (prokaryotes). After translocation through the Sec61/SecYEG translocon, signal peptidase cleaves the signal peptide.

N-linked Glycosylation (ER)

Occurs co-translationally in the ER lumen
Oligosaccharyltransferase (OST) transfers a preassembled 14-sugar core glycan (Glc₃Man₉GlcNAc₂) from dolichol-PP to Asn in the sequon N-X-S/T (X is not Pro)
Glucose residues trimmed by glucosidases I and II; calnexin/calreticulin cycle ensures proper folding
Further processed in Golgi (trimming, addition of GlcNAc, Gal, sialic acid, fucose)
Critical for glycoprotein folding, stability, cell-cell recognition
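
The N-X-S/T rule above lends itself to a simple sequence scan. The following is a minimal sketch, assuming a one-letter protein string as input; the function name `find_sequons` and the demo sequence are hypothetical. A matching sequon marks a candidate site only, since not every sequon is glycosylated in the cell.

```python
import re

# N-X-S/T sequon where X is any residue except proline.
# A lookahead keeps the match zero-width so overlapping sequons are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein: str) -> list[int]:
    """Return 1-based positions of Asn residues that sit in an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


if __name__ == "__main__":
    demo = "MKNGSAANPTLLNNTS"  # hypothetical sequence, not a real protein
    print(find_sequons(demo))  # NPT is skipped because X is proline
```

The lookahead form is a design choice: a plain `N[^P][ST]` pattern would consume the matched residues and miss an Asn that starts a second sequon inside the first.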

O-linked Glycosylation (Golgi)

Occurs post-translationally in the Golgi apparatus
Sugars added one at a time to Ser or Thr hydroxyl groups
No consensus sequence (unlike N-linked); often in Ser/Thr-rich regions
Common core: GalNAc (mucin-type); also O-GlcNAc (cytoplasmic/nuclear; regulatory; competes with phosphorylation)
Important for mucins, proteoglycans, signaling

Phosphorylation

Kinases transfer gamma-phosphate from ATP to Ser, Thr, or Tyr
~518 kinases in human genome (kinome)
Reversed by phosphatases
Major regulatory switch in signaling
~30% of all proteins are phosphorylated

Ubiquitination

76-residue ubiquitin conjugated to Lys residues
E1 (activating) → E2 (conjugating) → E3 (ligase) cascade
K48-linked polyUb: proteasome degradation
K63-linked polyUb: signaling, DNA repair
MonoUb: endocytosis, histone regulation
Reversed by deubiquitinases (DUBs)

SUMOylation

Small Ubiquitin-like Modifier (~100 aa)
Conjugated to Lys in ΨKxE consensus
E1 (SAE1/SAE2) → E2 (Ubc9) → E3 ligases
Regulates nuclear transport, transcription
Often antagonistic to ubiquitination
SUMO-1, SUMO-2/3 have distinct targets

8. Translation Quality Control

Cells have evolved multiple surveillance pathways to detect and deal with aberrant mRNAs and stalled ribosomes. These quality control mechanisms prevent the accumulation of potentially toxic truncated or aberrant proteins.

Nonsense-Mediated Decay (NMD)

Trigger: Premature termination codon (PTC; distinct from the peptidyl transferase center, which shares the abbreviation) located more than ~50-55 nt upstream of the last exon-exon junction
Mechanism: UPF1 (RNA helicase) interacts with eRF3 during termination. If UPF1 encounters a downstream exon junction complex (EJC, deposited during splicing), it triggers NMD. UPF2 and UPF3 bridge UPF1 to the EJC.
Outcome: SMG1 kinase phosphorylates UPF1, recruiting SMG5/6/7 which activate mRNA decapping (Dcp1/Dcp2) and deadenylation; also endonucleolytic cleavage by SMG6
Significance: Degrades ~5-10% of all mRNAs; important for eliminating PTC-containing transcripts from nonsense mutations; also regulates normal gene expression
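
The positional trigger for NMD can be expressed as a short predicate. This is a rough sketch, assuming mRNA coordinates in nucleotides; the function name `predicts_nmd`, its parameters, and the example coordinates are hypothetical, and real NMD susceptibility depends on more than this distance rule.

```python
def predicts_nmd(stop_pos: int, junctions: list[int], min_distance: int = 50) -> bool:
    """Apply the positional rule from the text.

    stop_pos     -- mRNA coordinate (nt) of the stop codon
    junctions    -- mRNA coordinates (nt) of exon-exon junctions
    min_distance -- the stop codon must lie more than this many nt upstream
                    of a junction for NMD to be predicted
    """
    # Only junctions downstream of the stop codon can leave an EJC in place;
    # a junction upstream of the stop gives a negative distance and never qualifies.
    return any(j - stop_pos > min_distance for j in junctions)


if __name__ == "__main__":
    junctions = [150, 420]  # hypothetical two-junction transcript
    for stop in (300, 400, 900):
        print(stop, predicts_nmd(stop, junctions))
```

In this toy transcript, a stop codon at position 900 lies in the last exon and is treated as a normal termination codon, because no junction remains downstream of it.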

No-Go Decay (NGD)

Trigger: Ribosome stalling due to mRNA secondary structure, rare codons, damaged bases, or poly(A) sequences within the ORF
Mechanism: Stalled ribosomes are sensed by Dom34 (Pelota in mammals) and Hbs1 (HBS1L), which mimic eRF1 and eRF3 respectively. Dom34 lacks the GGQ motif and does not trigger peptide release.
Outcome: Endonucleolytic cleavage of mRNA near the stall site (attributed to the endonuclease Cue2 in yeast); ribosome splitting by ABCE1; fragments degraded by Xrn1 (5' to 3') and the exosome (3' to 5')

Non-Stop Decay (NSD)

Trigger: mRNAs lacking a stop codon (e.g., premature polyadenylation within ORF, or endonucleolytic cleavage)
Mechanism: Ribosome translates into the poly(A) tail, producing poly-lysine (AAA = Lys). Poly(A) in the mRNA channel triggers stalling. Ski7 (GTPase) or Dom34/Hbs1 recognize the stalled ribosome.
Outcome: mRNA degraded by the exosome (recruited by Ski complex: Ski2/Ski3/Ski8). Nascent peptide targeted for proteasomal degradation.

Ribosome-Associated Quality Control (RQC)

Trigger: Stalled 60S-peptidyl-tRNA complex remaining after ribosome splitting (by Dom34/Hbs1/ABCE1) during NGD or NSD
Key factor: Listerin (Ltn1/NEMF pathway) — an E3 ubiquitin ligase that ubiquitinates the nascent chain on the stalled 60S subunit
RQC2 (NEMF): Stabilizes tRNA in the P site of the 60S; remarkably, can add C-terminal Ala-Thr extensions (CATylation or "CAT tails") to the nascent chain without mRNA template, exposing Lys residues buried in the exit tunnel for Ltn1 ubiquitination
Vms1 (ANKZF1): Releases peptidyl-tRNA from the 60S when Ltn1 pathway is overwhelmed; acts as a backup
Outcome: Ubiquitinated nascent chain extracted by Cdc48/p97 (AAA-ATPase) and delivered to the 26S proteasome for degradation
Failure consequences: RQC defects linked to neurodegeneration; Ltn1 mutation causes protein aggregation in mice (cerebellar neurodegeneration)

Prokaryotic Rescue: tmRNA (SsrA)

In bacteria, stalled ribosomes on truncated mRNAs are rescued by the tmRNA (transfer-messenger RNA) system. tmRNA mimics both a tRNA (alanyl-tRNA at its 5' end) and an mRNA (contains a short ORF encoding a degradation tag). When a ribosome stalls at the 3' end of a truncated mRNA, SmpB protein delivers tmRNA to the A site. The ribosome switches templates from the broken mRNA to the tmRNA ORF, adding the tag sequence (AANDENYALAA in E. coli) to the C-terminus of the nascent chain. This tagged protein is then recognized and degraded by ClpXP, ClpAP, FtsH, and Tsp proteases. The ribosome terminates normally at the stop codon within the tmRNA ORF.
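
The template switch described above can be mimicked in a few lines. This is a toy sketch, assuming an uppercase RNA string that starts in frame; the function name `translate_with_rescue` is hypothetical, and the tag is simply appended as a string rather than modeling SmpB, the tRNA-like domain, or the tmRNA ORF itself.

```python
from itertools import product

SSRA_TAG = "AANDENYALAA"  # E. coli tag sequence quoted in the text

# Standard genetic code in UCAG order (first base varies slowest).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_with_rescue(mrna: str) -> tuple[str, bool]:
    """Translate from the first codon.

    Returns (peptide, rescued). If the message ends without an in-frame stop
    codon, the SsrA tag is appended as a stand-in for tmRNA rescue.
    """
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODE[mrna[i:i + 3]]
        if aa == "*":
            return "".join(peptide), False  # normal termination, no rescue
        peptide.append(aa)
    return "".join(peptide) + SSRA_TAG, True


if __name__ == "__main__":
    print(translate_with_rescue("AUGGCGAAA"))  # truncated message, no stop codon
    print(translate_with_rescue("AUGGCGUAA"))  # intact message with a stop codon
```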

Python: Translation Simulation with tRNA Competition

This simulation models a ribosome translating an mRNA codon-by-codon. The elongation rate at each codon depends on the abundance of the cognate tRNA species (based on E. coli codon usage data). Rare codons with low-abundance tRNAs cause ribosome pausing, while common codons with abundant tRNAs are decoded quickly. The stochastic waiting times follow an exponential distribution, reflecting the random arrival of ternary complexes at the A site.


#!/usr/bin/env python3
"""translation_simulation.py - Ribosome translation simulator with codon-by-codon tRNA competition"""
import numpy as np
import matplotlib.pyplot as plt

# --- Genetic code dictionary ---
genetic_code = {
    'UUU': 'F', 'UUC': 'F', 'UUA': 'L', 'UUG': 'L',
    'CUU': 'L', 'CUC': 'L', 'CUA': 'L', 'CUG': 'L',
    'AUU': 'I', 'AUC': 'I', 'AUA': 'I', 'AUG': 'M',
    'GUU': 'V', 'GUC': 'V', 'GUA': 'V', 'GUG': 'V',
    'UCU': 'S', 'UCC': 'S', 'UCA': 'S', 'UCG': 'S',
    'CCU': 'P', 'CCC': 'P', 'CCA': 'P', 'CCG': 'P',
    'ACU': 'T', 'ACC': 'T', 'ACA': 'T', 'ACG': 'T',
    'GCU': 'A', 'GCC': 'A', 'GCA': 'A', 'GCG': 'A',
    'UAU': 'Y', 'UAC': 'Y', 'UAA': '*', 'UAG': '*',
    'CAU': 'H', 'CAC': 'H', 'CAA': 'Q', 'CAG': 'Q',
    'AAU': 'N', 'AAC': 'N', 'AAA': 'K', 'AAG': 'K',
    'GAU': 'D', 'GAC': 'D', 'GAA': 'E', 'GAG': 'E',
    'UGU': 'C', 'UGC': 'C', 'UGA': '*', 'UGG': 'W',
    'CGU': 'R', 'CGC': 'R', 'CGA': 'R', 'CGG': 'R',
    'AGU': 'S', 'AGC': 'S', 'AGA': 'R', 'AGG': 'R',
    'GGU': 'G', 'GGC': 'G', 'GGA': 'G', 'GGG': 'G',
}

# E. coli codon usage frequencies (tRNA abundance proxy)
# Higher values = more abundant cognate tRNA = faster decoding
ecoli_trna_abundance = {
    'UUU': 0.58, 'UUC': 1.00, 'UUA': 0.14, 'UUG': 0.13,
    'CUU': 0.12, 'CUC': 0.10, 'CUA': 0.04, 'CUG': 1.00,
    'AUU': 0.51, 'AUC': 1.00, 'AUA': 0.07, 'AUG': 1.00,
    'GUU': 0.73, 'GUC': 0.40, 'GUA': 0.49, 'GUG': 0.37,
    'UCU': 0.59, 'UCC': 0.57, 'UCA': 0.14, 'UCG': 0.15,
    'CCU': 0.18, 'CCC': 0.13, 'CCA': 0.20, 'CCG': 1.00,
    'ACU': 0.50, 'ACC': 1.00, 'ACA': 0.14, 'ACG': 0.27,
    'GCU': 0.68, 'GCC': 0.42, 'GCA': 0.56, 'GCG': 1.00,
    'UAU': 0.59, 'UAC': 1.00,
    'CAU': 0.57, 'CAC': 1.00, 'CAA': 0.34, 'CAG': 1.00,
    'AAU': 0.49, 'AAC': 1.00, 'AAA': 1.00, 'AAG': 0.24,
    'GAU': 0.63, 'GAC': 1.00, 'GAA': 1.00, 'GAG': 0.33,
    'UGU': 0.44, 'UGC': 1.00, 'UGG': 1.00,
    'CGU': 1.00, 'CGC': 0.60, 'CGA': 0.07, 'CGG': 0.10,
    'AGU': 0.16, 'AGC': 0.46, 'AGA': 0.07, 'AGG': 0.04,
    'GGU': 1.00, 'GGC': 0.72, 'GGA': 0.13, 'GGG': 0.15,
}

# Base elongation rate (aa/s) in E. coli
BASE_RATE = 18.0  # ~15-20 aa/s

def simulate_translation(mRNA_seq, base_rate=BASE_RATE):
    """Simulate ribosome moving along mRNA codon-by-codon.
    Returns per-codon elongation times and amino acid sequence."""
    codons = [mRNA_seq[i:i+3] for i in range(0, len(mRNA_seq)-2, 3)]
    times = []
    aas = []
    cumulative_time = 0.0
    positions = []

    for codon in codons:
        aa = genetic_code.get(codon, '?')
        if aa == '*':
            break
        aas.append(aa)

        # tRNA competition: decoding time inversely proportional to tRNA abundance
        abundance = ecoli_trna_abundance.get(codon, 0.3)
        # Mean wait time = 1 / (base_rate * abundance)
        mean_time = 1.0 / (base_rate * abundance)
        # Add stochastic variation (exponential waiting time)
        wait = np.random.exponential(mean_time)
        cumulative_time += wait
        times.append(wait)
        positions.append(cumulative_time)

    return codons[:len(aas)], aas, times, positions

# --- Simulate a sample ORF (mix of common and rare codons) ---
np.random.seed(42)

# Construct an mRNA with varying codon optimality
# Use a mix: start with common codons, then rare, then common again
common_codons = ['AUG', 'GCG', 'AAC', 'GAC', 'CUG', 'ACC', 'UUC',
                 'GGU', 'CGU', 'AAA', 'GAA', 'AUC', 'CCG', 'GCC']
rare_codons   = ['AUA', 'CUA', 'AGA', 'AGG', 'CGA', 'UUA', 'CCC',
                 'GGA', 'UCA', 'ACA', 'GGG', 'CGG', 'UCG', 'CUU']

# Build an mRNA: 14 common + 14 rare + 14 common + stop (each list holds 14 codons)
mRNA = ''.join(common_codons) + ''.join(rare_codons) + ''.join(common_codons) + 'UAA'

codons, aas, times, cum_times = simulate_translation(mRNA)
n = len(aas)

# Calculate local elongation rate (aa/s) = 1/time
local_rates = [1.0/t for t in times]

print(f"mRNA length: {len(mRNA)} nt, {n} codons translated")
print(f"Protein: {''.join(aas[:20])}{'...' if n > 20 else ''}")
print(f"Total translation time: {cum_times[-1]:.3f} s")
print(f"Average rate: {n / cum_times[-1]:.1f} aa/s")
print(f"Min local rate: {min(local_rates):.1f} aa/s (codon {codons[local_rates.index(min(local_rates))]})")
print(f"Max local rate: {max(local_rates):.1f} aa/s (codon {codons[local_rates.index(max(local_rates))]})")

# --- Plotting ---
fig, axes = plt.subplots(3, 1, figsize=(12, 10))

# 1. Ribosome position vs time
axes[0].step(cum_times, range(1, n+1), where='post', color='#22d3ee', linewidth=1.5)
axes[0].set_xlabel('Time (s)', fontsize=11)
axes[0].set_ylabel('Codon position', fontsize=11)
axes[0].set_title('Ribosome Progress Along mRNA', fontsize=13, fontweight='bold')
# Shaded bands follow the actual block sizes: 14 common, 14 rare, 14 common codons
axes[0].axhspan(0, 14, alpha=0.15, color='green', label='Common codons')
axes[0].axhspan(14, 28, alpha=0.15, color='red', label='Rare codons')
axes[0].axhspan(28, 42, alpha=0.15, color='green')
axes[0].legend(fontsize=9)
axes[0].grid(True, alpha=0.3)

# 2. Per-codon elongation rate
colors = ['#22c55e' if ecoli_trna_abundance.get(c, 0.3) > 0.5 else '#ef4444' for c in codons]
axes[1].bar(range(n), local_rates, color=colors, alpha=0.8, width=0.8)
axes[1].axhline(BASE_RATE, ls='--', color='yellow', alpha=0.7, label=f'Base rate ({BASE_RATE} aa/s)')
axes[1].set_xlabel('Codon position', fontsize=11)
axes[1].set_ylabel('Local rate (aa/s)', fontsize=11)
axes[1].set_title('Elongation Rate Variation by Codon', fontsize=13, fontweight='bold')
axes[1].legend(fontsize=9)
axes[1].grid(True, alpha=0.3)

# 3. tRNA abundance along the mRNA
abundances = [ecoli_trna_abundance.get(c, 0.3) for c in codons]
axes[2].bar(range(n), abundances, color='#a78bfa', alpha=0.8, width=0.8)
axes[2].axhline(0.5, ls='--', color='yellow', alpha=0.5, label='Threshold (0.5)')
axes[2].set_xlabel('Codon position', fontsize=11)
axes[2].set_ylabel('Relative tRNA abundance', fontsize=11)
axes[2].set_title('tRNA Abundance Profile Along mRNA', fontsize=13, fontweight='bold')
axes[2].legend(fontsize=9)
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('output.png', dpi=100, bbox_inches='tight')
print("\nPlot saved. Green bars = abundant tRNA codons, Red bars = rare tRNA codons.")
print("Notice ribosome slowing (stalling) in the rare-codon region.")


Fortran: Codon Usage Statistics and CAI Calculator

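This Fortran program analyzes an input DNA sequence to compute codon frequencies, Relative Synonymous Codon Usage (RSCU), relative adaptiveness values, and a Codon Adaptation Index (CAI). The sample is a 300-nt sequence labeled as an E. coli lacZ fragment; it contains one in-frame TGA codon, which the program counts as a codon but excludes from the RSCU and CAI calculations. Note that the relative adaptiveness values are derived from the input sequence itself rather than from a reference set of highly expressed genes, so the reported CAI reflects the internal codon bias of the sequence and is not directly comparable to published CAI values.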
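
For reference, the quantities computed by the program can be written compactly. The notation is an assumption introduced here: $x_{ij}$ is the observed count of the $j$-th synonymous codon for amino acid $i$, $n_i$ is the number of synonymous codons for that amino acid, and $N$ is the number of sense codons in the sequence with nonzero $w$.

$$\text{RSCU}_{ij} = \frac{x_{ij}}{\frac{1}{n_i}\sum_{k=1}^{n_i} x_{ik}}, \qquad w_{ij} = \frac{\text{RSCU}_{ij}}{\max_k \text{RSCU}_{ik}} = \frac{x_{ij}}{\max_k x_{ik}}$$

$$\text{CAI} = \left(\prod_{c=1}^{N} w_c\right)^{1/N} = \exp\left(\frac{1}{N}\sum_{c=1}^{N} \ln w_c\right)$$

The logarithmic form on the right is the one the code evaluates, which avoids underflow when many small $w$ values are multiplied.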


program codon_usage_analysis
  ! ============================================================
  ! Codon Usage Statistics & Codon Adaptation Index Calculator
  ! Computes codon frequencies, RSCU, and CAI for input sequences
  ! ============================================================
  implicit none

  integer, parameter :: MAX_SEQ = 5000
  character(len=MAX_SEQ) :: sequence
  character(len=3) :: codon
  character(len=1) :: aa_table(64)
  character(len=3) :: codon_table(64)
  integer :: codon_counts(64)
  real(8) :: rscu(64)        ! Relative Synonymous Codon Usage
  real(8) :: w_values(64)    ! Relative adaptiveness
  real(8) :: cai, log_cai_sum
  integer :: i, j, k, pos, n_codons, seq_len
  integer :: aa_group_count
  integer :: max_count_in_group
  character(len=1) :: current_aa

  ! Define the 64 codons in standard order (UUU..GGG mapped to DNA: TTT..GGG)
  ! We use T instead of U for DNA input
  codon_table = (/ &
    'TTT', 'TTC', 'TTA', 'TTG', 'CTT', 'CTC', 'CTA', 'CTG', &
    'ATT', 'ATC', 'ATA', 'ATG', 'GTT', 'GTC', 'GTA', 'GTG', &
    'TCT', 'TCC', 'TCA', 'TCG', 'CCT', 'CCC', 'CCA', 'CCG', &
    'ACT', 'ACC', 'ACA', 'ACG', 'GCT', 'GCC', 'GCA', 'GCG', &
    'TAT', 'TAC', 'TAA', 'TAG', 'CAT', 'CAC', 'CAA', 'CAG', &
    'AAT', 'AAC', 'AAA', 'AAG', 'GAT', 'GAC', 'GAA', 'GAG', &
    'TGT', 'TGC', 'TGA', 'TGG', 'CGT', 'CGC', 'CGA', 'CGG', &
    'AGT', 'AGC', 'AGA', 'AGG', 'GGT', 'GGC', 'GGA', 'GGG'  &
  /)

  ! Amino acid encoded by each codon
  aa_table = (/ &
    'F', 'F', 'L', 'L', 'L', 'L', 'L', 'L', &
    'I', 'I', 'I', 'M', 'V', 'V', 'V', 'V', &
    'S', 'S', 'S', 'S', 'P', 'P', 'P', 'P', &
    'T', 'T', 'T', 'T', 'A', 'A', 'A', 'A', &
    'Y', 'Y', '*', '*', 'H', 'H', 'Q', 'Q', &
    'N', 'N', 'K', 'K', 'D', 'D', 'E', 'E', &
    'C', 'C', '*', 'W', 'R', 'R', 'R', 'R', &
    'S', 'S', 'R', 'R', 'G', 'G', 'G', 'G'  &
  /)

  ! --- Sample sequence: E. coli lacZ gene fragment (first 300 nt) ---
  sequence = &
    'ATGACCATGATTACGCCAAGCTTTCCCTGTAGCGATCGCTATCGTCTGTTTACTGATGCG' // &
    'AATCCACGCTTTTTAAAGCAGTTATTGGTGCCCTTAAACGCCTGGGGTAATGACTCTCTA' // &
    'GCGAAAGGTCTGGCAGAATGCAATACCAATGTCTCTCGTGCAAAACATTCACGTTTCTTC' // &
    'GGGCACTGGTGATGCATCACCAGAAAGCTTGATACCTTCAGATGTCACGTGCAGTCGTAC' // &
    'GATCGCCCTTCCCAACAGTTGCGCAGCCTGAATGGCGAATGGCGCTTTGCCTGGTTTCCG'

  seq_len = len_trim(sequence)

  ! Convert to uppercase (safety)
  do i = 1, seq_len
    if (ichar(sequence(i:i)) >= 97 .and. ichar(sequence(i:i)) <= 122) then
      sequence(i:i) = char(ichar(sequence(i:i)) - 32)
    end if
  end do

  ! --- Count codons ---
  codon_counts = 0
  n_codons = 0

  do pos = 1, seq_len - 2, 3
    codon = sequence(pos:pos+2)
    do j = 1, 64
      if (codon == codon_table(j)) then
        codon_counts(j) = codon_counts(j) + 1
        n_codons = n_codons + 1
        exit
      end if
    end do
  end do

  ! --- Calculate RSCU (Relative Synonymous Codon Usage) ---
  ! RSCU_i = (observed_i / expected_i) = (count_i * n_synonyms) / sum_synonymous
  rscu = 0.0d0

  do i = 1, 64
    if (aa_table(i) == '*') then
      rscu(i) = 0.0d0
      cycle
    end if
    current_aa = aa_table(i)
    aa_group_count = 0
    k = 0
    ! Count synonymous codons and their total
    do j = 1, 64
      if (aa_table(j) == current_aa) then
        aa_group_count = aa_group_count + 1
        k = k + codon_counts(j)
      end if
    end do
    if (k > 0) then
      rscu(i) = dble(codon_counts(i)) * dble(aa_group_count) / dble(k)
    else
      rscu(i) = 0.0d0
    end if
  end do

  ! --- Calculate relative adaptiveness w_i = RSCU_i / RSCU_max for each AA ---
  w_values = 0.0d0
  do i = 1, 64
    if (aa_table(i) == '*') cycle
    current_aa = aa_table(i)
    max_count_in_group = 0
    do j = 1, 64
      if (aa_table(j) == current_aa) then
        if (codon_counts(j) > max_count_in_group) then
          max_count_in_group = codon_counts(j)
        end if
      end if
    end do
    if (max_count_in_group > 0) then
      w_values(i) = dble(codon_counts(i)) / dble(max_count_in_group)
    end if
  end do

  ! --- Calculate CAI = geometric mean of w_i for all codons ---
  log_cai_sum = 0.0d0
  k = 0
  do pos = 1, seq_len - 2, 3
    codon = sequence(pos:pos+2)
    do j = 1, 64
      if (codon == codon_table(j)) then
        if (aa_table(j) /= '*' .and. w_values(j) > 0.0d0) then
          log_cai_sum = log_cai_sum + log(w_values(j))
          k = k + 1
        end if
        exit
      end if
    end do
  end do

  if (k > 0) then
    cai = exp(log_cai_sum / dble(k))
  else
    cai = 0.0d0
  end if

  ! --- Output Results ---
  write(*,'(A)') '========================================================'
  write(*,'(A)') '  CODON USAGE STATISTICS & CAI ANALYSIS'
  write(*,'(A)') '  Sequence: E. coli lacZ gene fragment'
  write(*,'(A)') '========================================================'
  write(*,'(A,I5,A)') ' Sequence length: ', seq_len, ' nt'
  write(*,'(A,I5)')   ' Total codons:    ', n_codons
  write(*,'(A)')      ''
  write(*,'(A)') '--- Codon Frequency Table ---'
  write(*,'(A)') ' Codon  AA   Count   Freq    RSCU    w(i)'
  write(*,'(A)') '------  --   -----   -----   -----   -----'

  do i = 1, 64
    if (codon_counts(i) > 0) then
      write(*,'(2X,A3,4X,A1,4X,I3,4X,F5.3,3X,F5.2,3X,F5.3)') &
        codon_table(i), aa_table(i), codon_counts(i), &
        dble(codon_counts(i))/dble(n_codons), rscu(i), w_values(i)
    end if
  end do

  write(*,'(A)')      ''
  write(*,'(A)') '--- Codon Adaptation Index ---'
  write(*,'(A,F6.4)') ' CAI = ', cai
  write(*,'(A)')      ''

  if (cai > 0.8d0) then
    write(*,'(A)') ' Interpretation: HIGH codon optimization (CAI > 0.8)'
    write(*,'(A)') ' -> Gene is likely highly expressed'
  else if (cai > 0.5d0) then
    write(*,'(A)') ' Interpretation: MODERATE codon optimization (0.5 < CAI < 0.8)'
    write(*,'(A)') ' -> Gene has moderate expression potential'
  else
    write(*,'(A)') ' Interpretation: LOW codon optimization (CAI < 0.5)'
    write(*,'(A)') ' -> Gene may be poorly expressed or foreign'
  end if

  write(*,'(A)')      ''
  write(*,'(A)') '--- Amino Acid Composition ---'

  ! Count amino acids
  block
    character(len=1) :: aa_list(20)
    integer :: aa_counts(20), idx
    aa_list = (/ 'A','C','D','E','F','G','H','I','K','L', &
                 'M','N','P','Q','R','S','T','V','W','Y' /)
    aa_counts = 0
    do pos = 1, seq_len - 2, 3
      codon = sequence(pos:pos+2)
      do j = 1, 64
        if (codon == codon_table(j) .and. aa_table(j) /= '*') then
          do idx = 1, 20
            if (aa_table(j) == aa_list(idx)) then
              aa_counts(idx) = aa_counts(idx) + 1
              exit
            end if
          end do
          exit
        end if
      end do
    end do

    write(*,'(A)') ' AA  Count  Fraction'
    write(*,'(A)') ' --  -----  --------'
    do idx = 1, 20
      if (aa_counts(idx) > 0) then
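        ! Divide by the number of sense codons so the fractions sum to 1
        ! (n_codons also counts stop codons)
        write(*,'(2X,A1,3X,I3,4X,F5.3)') &
          aa_list(idx), aa_counts(idx), dble(aa_counts(idx))/dble(sum(aa_counts))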
      end if
    end do
  end block

  write(*,'(A)') ''
  write(*,'(A)') '========================================================'

end program codon_usage_analysis


Summary: Key Factors in Translation

Phase	Prokaryotic Factors	Eukaryotic Factors
Initiation	IF1, IF2 (GTPase), IF3; SD sequence; fMet-tRNA_f	eIF1, 1A, 2, 2B, 3, 4A, 4B, 4E, 4G, 5, 5B; cap + scanning; Kozak; Met-tRNA_i
Elongation	EF-Tu (GTPase), EF-Ts (GEF), EF-G (GTPase)	eEF1A (GTPase), eEF1B (GEF), eEF2 (GTPase)
Termination	RF1 (UAA/UAG), RF2 (UAA/UGA), RF3 (GTPase)	eRF1 (all stops), eRF3 (GTPase)
Recycling	RRF + EF-G + IF3	ABCE1 (Rli1), Ligatin
Quality Control	tmRNA/SmpB, ArfA, ArfB	NMD (UPF1/2/3), NGD (Dom34/Hbs1), NSD, RQC (Ltn1/RQC2)

Antibiotics Targeting Translation

The structural differences between bacterial 70S and eukaryotic 80S ribosomes make translation a prime target for antibiotics. Many clinically important antibiotics exploit these differences.

30S Subunit Targets

Tetracyclines: Block A-site tRNA binding
Aminoglycosides (streptomycin, gentamicin): Bind 16S rRNA near A site; cause misreading by distorting decoding center (A1492/A1493 locked in flipped-out state)
Spectinomycin: Inhibits EF-G-driven translocation; binds 30S head (h34)
Kasugamycin: Blocks initiator tRNA binding to P site

50S Subunit Targets

Chloramphenicol: Binds PTC A-site crevice; blocks aminoacyl-tRNA accommodation
Macrolides (erythromycin, azithromycin): Bind in the peptide exit tunnel (near L4/L22 constriction); block elongation after 6-8 amino acids
Lincosamides (clindamycin): Overlap with macrolide binding site in PTC
Oxazolidinones (linezolid): Bind 50S A site; interfere with initiator tRNA positioning
Fusidic acid: Binds EF-G on the ribosome and prevents EF-G-GDP release after translocation; the trapped factor blocks the next elongation cycle

Polyribosomes and Translational Regulation

Polyribosomes (Polysomes)

Multiple ribosomes simultaneously translate a single mRNA
Inter-ribosome spacing: ~80-100 nt (one ribosome per ~30 codons)
Closest spacing typically observed in polysomes: ~1 ribosome per 80 nt of ORF (the ~30 nt ribosome footprint sets the theoretical packing limit)
Free polysomes: synthesize cytoplasmic/nuclear proteins
Membrane-bound polysomes (rough ER): synthesize secreted, membrane, and organellar proteins
In prokaryotes: coupled transcription-translation (RNA polymerase leads, ribosomes follow immediately)

Key Regulatory Mechanisms

mTOR pathway: Phosphorylates 4E-BP (releases eIF4E) and S6K (activates eIF4B, S6)
eIF2-alpha phosphorylation: ISR kinases (GCN2, PERK, HRI, PKR) inhibit global translation but activate ATF4 translation via uORFs
microRNAs: Recruit RISC/Argonaute to mRNA 3'-UTR; repress translation and promote decay
Iron response elements: IRE-IRP system; IRP binding to 5'-UTR IRE blocks scanning (ferritin); IRP binding to 3'-UTR stabilizes mRNA (TfR)
Ribosome heterogeneity: Specialized ribosomes with distinct rRNA/protein compositions may preferentially translate subsets of mRNAs

Derivation: Polysome Density from Initiation/Elongation Ratio

Starting from the rates of translation initiation and elongation, we derive the number of ribosomes simultaneously translating a single mRNA (polysome density).

Step 1: Define the ribosome loading rate

Ribosomes initiate translation at the 5′ end of the mRNA at rate $k_{\text{init}}$ (ribosomes per second). Once initiated, each ribosome moves along the ORF at elongation speed $v_{\text{elong}}$ (codons per second).

Step 2: Ribosome transit time

For an ORF of length $L$ codons, the time for a ribosome to traverse the entire mRNA is:

$$\tau_{\text{transit}} = \frac{L}{v_{\text{elong}}}$$

For a 300-codon ORF at 6 aa/s (eukaryotic): $\tau = 50$ s.

Step 3: Steady-state number of ribosomes per mRNA

By Little's law (queueing theory), the average number of ribosomes on an mRNA equals the loading rate times the transit time:

$$\langle N_{\text{rib}} \rangle = k_{\text{init}} \times \tau_{\text{transit}} = \frac{k_{\text{init}} \cdot L}{v_{\text{elong}}}$$
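
Steps 2 and 3 reduce to two one-line functions. The sketch below is illustrative only; the function names are hypothetical, and the inputs are the 300-codon ORF and 6 aa/s rate from Step 2 together with the eukaryotic initiation rate of 0.1 per second used in the numerical examples.

```python
def transit_time(orf_codons: float, v_elong: float) -> float:
    """Seconds for one ribosome to traverse the ORF (tau = L / v_elong)."""
    return orf_codons / v_elong


def mean_ribosomes(k_init: float, orf_codons: float, v_elong: float) -> float:
    """Average number of ribosomes per mRNA from Little's law (N = k_init * tau)."""
    return k_init * transit_time(orf_codons, v_elong)


if __name__ == "__main__":
    L, v, k = 300, 6.0, 0.1  # codons, aa/s, initiations per second
    print(f"transit time: {transit_time(L, v):.1f} s")
    print(f"ribosomes per mRNA: {mean_ribosomes(k, L, v):.1f}")
```

Little's law holds for any steady-state queue regardless of how arrival times are distributed, which is why no assumption about the statistics of initiation is needed here.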

Step 4: Linear density of ribosomes

The linear density (ribosomes per codon of mRNA) is:

$$\rho = \frac{k_{\text{init}}}{v_{\text{elong}}} \quad \text{(ribosomes per codon)}$$

A ribosome occupies ~30 nt (10 codons) of mRNA, so the maximum density is $\rho_{\max} = 1/10$ per codon. Initiation cannot exceed $k_{\text{init}}^{\max} = v_{\text{elong}}/10$ without causing queuing.
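
The density limit can be checked numerically. This is a minimal sketch under the same assumptions as the derivation (uniform elongation speed, a hard 10-codon footprint); the function names are hypothetical, and the model ignores the slowdown that ribosomes experience from collisions before the hard packing limit is reached.

```python
FOOTPRINT_CODONS = 10  # ~30 nt per ribosome, the figure given in the text


def linear_density(k_init: float, v_elong: float) -> float:
    """Ribosomes per codon, rho = k_init / v_elong."""
    return k_init / v_elong


def max_initiation_rate(v_elong: float, footprint: float = FOOTPRINT_CODONS) -> float:
    """Largest initiation rate (per second) before ribosomes would have to queue."""
    return v_elong / footprint


def is_saturated(k_init: float, v_elong: float, footprint: float = FOOTPRINT_CODONS) -> bool:
    """True when the requested density exceeds the close-packing limit 1/footprint."""
    return linear_density(k_init, v_elong) > 1.0 / footprint


if __name__ == "__main__":
    v = 15.0  # aa/s
    for k in (0.5, 1.0, 2.0):
        rho = linear_density(k, v)
        print(f"k_init={k:.1f}/s  rho={rho:.3f}/codon  saturated={is_saturated(k, v)}")
    print(f"k_init_max at {v} aa/s: {max_initiation_rate(v):.2f} per second")
```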

Step 5: Protein production rate per mRNA

The rate of completed proteins from one mRNA equals the initiation rate (in steady state, each initiated ribosome eventually produces one protein):

$$\frac{d[\text{protein}]}{dt}\bigg|_{\text{per mRNA}} = k_{\text{init}}$$

Total cellular protein production rate: $k_{\text{init}} \times [\text{mRNA}]$.

Step 6: Numerical examples

Highly expressed mRNA in E. coli: $k_{\text{init}} \approx 1$ s$^{-1}$, $v_{\text{elong}} = 15$ aa/s, $L = 300$ codons. Then $\langle N_{\text{rib}} \rangle = 1 \times 300/15 = 20$ ribosomes per mRNA, $\rho = 1/15 \approx 0.07$ per codon. Electron micrographs of polysomes show 10-70 ribosomes on highly expressed mRNAs, consistent with this estimate. Eukaryotic average (same $L = 300$ codons): $k_{\text{init}} \approx 0.1$ s$^{-1}$, $v_{\text{elong}} = 6$ aa/s, giving $\langle N_{\text{rib}} \rangle = 0.1 \times 300/6 = 5$ ribosomes per mRNA.

Derivation: Energy Cost of Protein Synthesis

Starting from the individual steps of translation, we derive the total number of high-energy phosphate bonds consumed per amino acid incorporated into a polypeptide.

Step 1: Aminoacyl-tRNA synthesis (charging)

Aminoacyl-tRNA synthetase activates the amino acid using ATP:

$$\text{AA} + \text{ATP} \rightarrow \text{AA-AMP} + \text{PP}_i$$

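The aminoacyl group is then transferred to the tRNA ($\text{AA-AMP} + \text{tRNA} \rightarrow \text{AA-tRNA} + \text{AMP}$), and the PP$_i$ is hydrolyzed by pyrophosphatase: $\text{PP}_i \rightarrow 2\text{P}_i$. This hydrolysis makes the overall reaction irreversible. Net cost: 2 high-energy phosphate bonds (ATP $\rightarrow$ AMP + 2P$_i$, equivalent to 2 ATP $\rightarrow$ 2 ADP).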

Step 2: EF-Tu GTP hydrolysis (A-site delivery)

The EF-Tu-GTP-aa-tRNA ternary complex delivers the aminoacyl-tRNA to the ribosomal A site. Codon recognition triggers GTP hydrolysis:

$$\text{EF-Tu-GTP} \rightarrow \text{EF-Tu-GDP} + \text{P}_i$$

Cost: 1 high-energy phosphate bond. EF-Ts then recycles EF-Tu-GDP back to EF-Tu-GTP (no additional NTP cost).

Step 3: EF-G GTP hydrolysis (translocation)

After peptide bond formation, EF-G-GTP binds the ribosome and hydrolyzes GTP to drive translocation of the mRNA-tRNA complex by one codon:

$$\text{EF-G-GTP} \rightarrow \text{EF-G-GDP} + \text{P}_i$$

Cost: 1 high-energy phosphate bond.

Step 4: Total per amino acid

Summing all steps for one elongation cycle:

$$\text{Total} = \underbrace{2}_{\text{charging}} + \underbrace{1}_{\text{EF-Tu}} + \underbrace{1}_{\text{EF-G}} = 4 \text{ high-energy phosphate bonds per amino acid}$$
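
The tally can be kept as a small table so that each contribution stays visible. This is an illustrative sketch; the dictionary and function names are hypothetical, and the costs cover elongation only (initiation, termination, and proofreading are not included, matching the per-residue count in the text).

```python
# High-energy phosphate bonds spent per residue, step by step (values from the text).
ELONGATION_COSTS = {
    "charging (ATP -> AMP + 2 Pi)": 2,
    "EF-Tu GTP hydrolysis": 1,
    "EF-G GTP hydrolysis": 1,
}


def bonds_per_residue(costs: dict[str, int] = ELONGATION_COSTS) -> int:
    """Sum the per-step costs for one elongation cycle."""
    return sum(costs.values())


def elongation_bonds(n_residues: int) -> int:
    """Total high-energy phosphate bonds to add n_residues amino acids."""
    return n_residues * bonds_per_residue()


if __name__ == "__main__":
    for step, n in ELONGATION_COSTS.items():
        print(f"{step:32s} {n}")
    print(f"{'total per amino acid':32s} {bonds_per_residue()}")
```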

Step 5: Energy in thermodynamic terms

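Hydrolysis of each high-energy phosphate bond has a standard free energy change of $\Delta G^{\circ\prime} \approx -7.3$ kcal/mol; under cellular conditions the value is more negative, roughly $-11$ to $-13$ kcal/mol. Using the standard value as a conservative estimate, the total energy invested per amino acid is: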

$$\Delta G_{\text{total}} = 4 \times 7.3 = 29.2 \text{ kcal/mol per amino acid}$$

Since peptide bond formation itself is only slightly endergonic ($\Delta G \approx +0.5$ kcal/mol), the process is driven far from equilibrium, making translation essentially irreversible.

Step 6: Cost of a complete protein and cellular energy budget

For a typical 300-amino-acid protein:

$$\text{Cost} = 300 \times 4 = 1{,}200 \text{ NTP equivalents} \approx 8{,}760 \text{ kcal/mol}$$

An E. coli cell growing with a 30-minute doubling time synthesizes ~$2 \times 10^6$ proteins per generation, consuming ~$2.4 \times 10^9$ ATP equivalents. This represents ~75% of cellular energy expenditure, explaining why translation is the dominant energy sink and why translational regulation is crucial for cellular economy.
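
The whole-cell figures follow from simple multiplication, which the sketch below reproduces. The function names are hypothetical, and the inputs (4 bonds per residue, 7.3 kcal/mol per bond, 2 million proteins of 300 residues, a 30-minute doubling time) are the values quoted in the text rather than measurements.

```python
BONDS_PER_RESIDUE = 4   # from the per-residue tally in the text
KCAL_PER_BOND = 7.3     # kcal/mol, the figure used in the text


def protein_cost(n_residues: int) -> tuple[int, float]:
    """Return (NTP equivalents, kcal/mol) to synthesize one protein."""
    bonds = n_residues * BONDS_PER_RESIDUE
    return bonds, bonds * KCAL_PER_BOND


def cell_budget(n_proteins: float, mean_length: int,
                doubling_time_s: float) -> tuple[float, float]:
    """Return (ATP equivalents per generation, ATP equivalents per second)."""
    total = n_proteins * mean_length * BONDS_PER_RESIDUE
    return total, total / doubling_time_s


if __name__ == "__main__":
    bonds, kcal = protein_cost(300)
    print(f"per protein: {bonds} NTP equivalents, {kcal:.0f} kcal/mol")
    total, rate = cell_budget(2e6, 300, 30 * 60)
    print(f"per generation: {total:.2e} ATP equivalents ({rate:.2e} per second)")
```

Dividing by the doubling time gives the sustained demand on the cell, on the order of a million ATP equivalents per second for translation under these assumptions.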
