
11. Transcription & Translation

Reading time: ~50 minutes | Key topics: Central dogma, RNA polymerase, mRNA processing, the genetic code, ribosome structure, protein synthesis, post-translational modifications

The Central Dogma

Francis Crick articulated the central dogma of molecular biology in 1958, describing the directional flow of genetic information in biological systems:

DNA → RNA → Protein

Replication (DNA → DNA) | Transcription (DNA → RNA) | Translation (RNA → Protein)

Transcription is the process by which the nucleotide sequence of one strand of DNA is used as a template to synthesize a complementary RNA molecule. Translation is the process by which the nucleotide sequence of mRNA directs the assembly of amino acids into a polypeptide chain on the ribosome.

Exceptions to the Central Dogma

  • Reverse transcription: Retroviruses (e.g., HIV) use reverse transcriptase to synthesize DNA from an RNA template (RNA → DNA)
  • RNA replication: RNA viruses (e.g., influenza, SARS-CoV-2) use RNA-dependent RNA polymerase to copy RNA directly (RNA → RNA)
  • Prions: Protein → Protein transmission of conformational information (no nucleic acid template)

The central dogma remains the foundational framework for understanding gene expression, even as exceptions continue to enrich our understanding of information flow in biology. Note that information transfer from protein back to nucleic acid (Protein → DNA or Protein → RNA) has never been observed.

Prokaryotic Transcription

In prokaryotes, a single RNA polymerase catalyzes the synthesis of all types of RNA (mRNA, rRNA, tRNA). The holoenzyme consists of a core enzyme plus a sigma factor:

RNA Polymerase Subunit Composition

| Subunit | Copies | Function |
|---|---|---|
| α | 2 | Assembly, regulatory factor binding, UP element recognition |
| β | 1 | Catalytic center (NTP binding, phosphodiester bond formation) |
| β′ | 1 | DNA template binding |
| σ | 1 | Promoter recognition and binding (released after initiation) |

Core enzyme: $\alpha_2\beta\beta'$ | Holoenzyme: $\alpha_2\beta\beta'\sigma$ (total MW ~449 kDa)

Promoter Recognition

The sigma factor ($\sigma^{70}$ is the primary sigma in E. coli) recognizes two conserved promoter elements:

  • -10 region (Pribnow box): Consensus sequence TATAAT (AT-rich, facilitates strand separation)
  • -35 region: Consensus sequence TTGACA (initial contact point for sigma)

Spacing between the -35 and -10 elements is critical: optimally 17 bp. Transcription initiates at the +1 position. The transcription bubble spans approximately 17 bp of unwound DNA.
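The promoter logic above can be sketched as a search over a DNA sequence. This is a minimal illustration, not a real promoter predictor. The function names, the mismatch-count scoring, and the 15-19 bp spacer window around the optimal 17 bp are assumptions chosen for clarity; practical tools use position weight matrices built from experimental data.

```python
def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def find_promoter(seq: str, spacers=range(15, 20)):
    """Best-scoring -35/-10 pair on the non-template strand (read 5'->3').

    Returns (score, pos_35, pos_10, spacer) or None. Hypothetical score:
    mismatches to TTGACA and TATAAT plus 1 per bp that the spacer deviates
    from 17 bp. Lower is better; positions are 0-based hexamer starts.
    """
    seq = seq.upper()
    best = None
    for i in range(len(seq) - 5):
        m35 = mismatches(seq[i:i + 6], "TTGACA")
        for gap in spacers:
            j = i + 6 + gap  # first base of the -10 hexamer
            if j + 6 > len(seq):
                break
            score = m35 + mismatches(seq[j:j + 6], "TATAAT") + abs(gap - 17)
            if best is None or score < best[0]:
                best = (score, i, j, gap)
    return best


spacer = "GC" * 8 + "G"  # 17 bp
example = "GCGC" + "TTGACA" + spacer + "TATAAT" + "GCGCGC"
print(find_promoter(example))  # (0, 4, 27, 17)
```

The example places exact consensus hexamers 17 bp apart, so it scores 0. Real promoters rarely match both consensus sequences perfectly.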

Phases of Transcription

Initiation

Holoenzyme binds promoter → closed complex → open complex (DNA melting at -10 region) → first phosphodiester bonds formed → sigma factor released after ~8-10 nt synthesized → promoter clearance.

Elongation

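The core enzyme moves along the template strand 3′→5′, synthesizing RNA 5′→3′ at 40-50 nucleotides per second. Two proofreading mechanisms correct errors. In pyrophosphorolytic editing, the polymerization reaction runs in reverse and removes a misincorporated nucleotide using PPi. In hydrolytic editing, the polymerase backtracks and cleaves off the 3′ RNA segment that contains the error.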
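Here is a short sketch of the directionality rule. If the template strand is written 3′→5′, which is the order the polymerase reads it, the RNA is simply its base-by-base complement written 5′→3′, with U in place of T. The helper names and the default rate of 45 nt/s (the midpoint of the 40-50 nt/s range above) are illustrative assumptions. A constant rate also ignores transcriptional pausing.

```python
# Template base -> RNA base (Watson-Crick pairing, with U replacing T in RNA).
COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3to5: str) -> str:
    """RNA (5'->3') made from a DNA template strand supplied 3'->5'.

    Writing the template 3'->5' lists bases in the order the polymerase
    reads them, so each template base maps to the next RNA base added.
    """
    return "".join(COMPLEMENT[b] for b in template_3to5.upper())


def elongation_time_s(n_nt: int, rate_nt_per_s: float = 45.0) -> float:
    """Seconds to synthesize n_nt nucleotides at a constant (assumed) rate."""
    return n_nt / rate_nt_per_s


print(transcribe("TACGGATTC"))   # AUGCCUAAG
print(elongation_time_s(4500))   # 100.0
```

The resulting RNA has the same sequence as the coding (non-template) strand read 5′→3′, with U in place of T.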

Termination

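Rho-independent (intrinsic) termination: a GC-rich hairpin forms in the nascent RNA just upstream of a poly-U tract. The hairpin causes the polymerase to pause, and the weak rU·dA base pairs of the poly-U segment allow the RNA-DNA hybrid to dissociate. Rho-dependent termination: the Rho helicase binds the nascent RNA and translocates along it 5′→3′ using ATP hydrolysis. It catches up to a paused polymerase and unwinds the RNA-DNA hybrid.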
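The intrinsic-termination signal can be approximated as a pattern search on the nascent RNA. The sketch below makes several simplifying assumptions, and all of its thresholds are hypothetical:

- the stem is perfectly paired (Watson-Crick only) and has a fixed length;
- the loop is 3-8 nt;
- the stem must pass a GC-fraction cutoff;
- a run of U's must follow the stem immediately.

Real terminators tolerate mismatches and G·U pairs in the stem.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def revcomp(rna: str) -> str:
    """Reverse complement of an RNA segment (Watson-Crick pairs only)."""
    return "".join(PAIR[b] for b in reversed(rna))


def find_intrinsic_terminator(rna, stem_len=6, loops=range(3, 9), min_gc=0.6, min_u=6):
    """Return (stem_start, loop_len, u_start) for the first match, else None."""
    rna = rna.upper()
    for i in range(len(rna) - stem_len + 1):
        left = rna[i:i + stem_len]
        if sum(b in "GC" for b in left) / stem_len < min_gc:
            continue  # stem not GC-rich enough
        for loop in loops:
            j = i + stem_len + loop  # start of the 3' arm of the stem
            if j + stem_len > len(rna):
                break
            if rna[j:j + stem_len] != revcomp(left):
                continue
            k = j + stem_len  # first base after the hairpin
            if rna[k:k + min_u] == "U" * min_u:
                return i, loop, k
    return None


nascent = "UUUUU" + "GCCGCC" + "GAAA" + "GGCGGC" + "UUUUUUUA"
print(find_intrinsic_terminator(nascent))  # (5, 4, 21)
```

Replacing the U-tract with A's makes the same hairpin fail the test. This reflects the requirement for both elements.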

Energetics of RNA Synthesis

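Each nucleotide addition is driven by cleavage of the incoming NTP. The 3′-OH of the growing chain attacks the NTP's α-phosphate, forming a phosphodiester bond and releasing pyrophosphate (PPi). For a chain of $n$ nucleotides:

$$(\text{NMP})_n + \text{NTP} \rightarrow (\text{NMP})_{n+1} + \text{PP}_i$$

When coupled with PPi hydrolysis, the overall change is $\Delta G^{\circ'} \approx -33.5$ kJ/mol per nucleotide added.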

The subsequent hydrolysis of pyrophosphate (PPi) by inorganic pyrophosphatase renders the reaction essentially irreversible, driving transcription forward.
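A worked consequence of this accounting follows. Each nucleotide added after the first cleaves one phosphoanhydride bond of its NTP, and the released PPi is then hydrolyzed. Two phosphoanhydride bonds are therefore spent per addition; the 5′-terminal nucleotide keeps its triphosphate. For a transcript of $n$ nucleotides, using the standard-state value quoted above:

$$\text{bonds consumed} = 2(n-1), \qquad \Delta G^{\circ'}_{\text{total}} \approx (n-1)(-33.5 \text{ kJ/mol})$$

For a 1,000-nt transcript, this gives 1,998 bonds and about $-3.35 \times 10^4$ kJ/mol. Actual in-cell free energies depend on NTP, PPi, and Pi concentrations.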

Eukaryotic Transcription

Eukaryotes use three distinct RNA polymerases, each responsible for different classes of RNA:

| Polymerase | RNA products | α-Amanitin sensitivity |
|---|---|---|
| RNA Pol I | rRNA (28S, 18S, 5.8S) | Insensitive |
| RNA Pol II | mRNA, most snRNAs, miRNAs | Very sensitive |
| RNA Pol III | tRNA, 5S rRNA, U6 snRNA | Moderately sensitive |

General Transcription Factors for RNA Pol II

Unlike prokaryotic RNA polymerase, eukaryotic RNA Pol II cannot bind promoters directly. It requires assembly of a pre-initiation complex (PIC) composed of general transcription factors (GTFs):

  • TFIID (TBP + TAFs): TBP (TATA-binding protein) recognizes and binds the TATA box (~25 bp upstream of +1); TAFs (TBP-associated factors) recognize other promoter elements
  • TFIIB: Bridges TFIID and RNA Pol II; determines start site selection
  • TFIIF: Escorts RNA Pol II to the promoter; stabilizes Pol II-TFIIB interaction
  • TFIIE: Recruits TFIIH; modulates TFIIH helicase activity
  • TFIIH: Contains helicase (XPB, XPD subunits) for promoter melting and kinase (CDK7) for phosphorylation of the Pol II C-terminal domain (CTD)

The Mediator Complex

The Mediator is a large multi-subunit complex (~30 subunits in humans). It bridges gene-specific transcription factors (activators and repressors) bound at enhancers and the PIC at the promoter. Mediator integrates regulatory signals and modulates the rate of transcription initiation.

The CTD of RNA Pol II cycles through phosphorylation states that coordinate mRNA processing events. Ser5 is phosphorylated during initiation by TFIIH/CDK7, and Ser2 is phosphorylated during elongation by P-TEFb/CDK9.

Post-Transcriptional Processing

Eukaryotic pre-mRNA undergoes three major processing events before export from the nucleus. These modifications occur co-transcriptionally, coordinated by the phosphorylated CTD of RNA Pol II:

5′ Capping

A 7-methylguanosine (m7G) cap is added to the 5′ end via an unusual 5′-5′ triphosphate bridge. This occurs after the first ~25-30 nucleotides are synthesized.

Functions: Protection from 5′ exonucleases, ribosome recognition during translation initiation (eIF4E binding), splicing of the first intron, nuclear export.

3′ Polyadenylation

The polyadenylation signal AAUAAA (in the pre-mRNA) is recognized by CPSF (cleavage and polyadenylation specificity factor). The pre-mRNA is cleaved ~10-30 nt downstream, and poly(A) polymerase (PAP) adds a tail of ~200 adenine residues (no template required).

Functions: mRNA stability (bound by PABP), nuclear export, translation efficiency. Poly(A) tail length decreases over time (deadenylation is a key step in mRNA decay).
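Here is a string-level sketch of cleavage and polyadenylation under three assumptions:

- the first AAUAAA in the transcript is used;
- cleavage happens a fixed 20 nt after the end of the signal, a value picked from the ~10-30 nt range above;
- PAP adds exactly 200 A's.

These fixed numbers and the function name are illustrative. In cells, both the cleavage site and the tail length vary.

```python
def cleave_and_polyadenylate(pre_mrna: str, offset: int = 20, tail_len: int = 200):
    """Return the processed RNA, or None if no usable AAUAAA is found.

    Assumptions: the first AAUAAA is used; cleavage occurs `offset` nt after
    the last base of the signal; PAP then adds `tail_len` A's (no template).
    """
    rna = pre_mrna.upper()
    signal = rna.find("AAUAAA")
    if signal == -1:
        return None
    cut = signal + 6 + offset
    if cut > len(rna):
        return None  # transcript too short to reach the cleavage site
    return rna[:cut] + "A" * tail_len  # downstream fragment is discarded


pre = "GCGC" + "AAUAAA" + "C" * 30
mature = cleave_and_polyadenylate(pre)
print(len(mature))  # 230: 30 nt kept + 200-nt tail
```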

RNA Splicing

Introns (intervening sequences) are removed and exons (expressed sequences) are joined by the spliceosome. The spliceosome is a large ribonucleoprotein complex built from five snRNPs: U1, U2, U4, U5, and U6.

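Splicing proceeds via two transesterification reactions:

1. The 2′-OH of the branch-point adenosine attacks the phosphate at the 5′ splice site. This releases exon 1 and forms a lariat intermediate.
2. The freed 3′-OH of exon 1 attacks the phosphate at the 3′ splice site. This joins the exons and releases the intron lariat.

Key splice site signals: 5′ splice site (GU), branch point (A), and 3′ splice site (AG). The spliceosome is a ribozyme: catalysis is performed by its RNA components (U2 and U6 snRNAs).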
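At the sequence level, the net outcome of the two transesterifications is that each intron is excised and its flanking exons are joined. The sketch below assumes the intron coordinates are already known (0-based, end-exclusive). It only checks the GU...AG boundary rule; branch-point recognition and the lariat chemistry are not modeled. `splice` is a hypothetical helper name.

```python
def splice(pre_mrna: str, introns: list) -> str:
    """Excise introns given as (start, end) pairs (0-based, end-exclusive) and join exons."""
    rna = pre_mrna.upper()
    exons, prev = [], 0
    for start, end in sorted(introns):
        intron = rna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron {start}-{end} lacks GU...AG ends")
        exons.append(rna[prev:start])  # exon upstream of this intron
        prev = end
    exons.append(rna[prev:])  # last exon
    return "".join(exons)


exon1, intron, exon2 = "AUGGCC", "GUAAGUCUAACCUAG", "GCAUAA"
pre = exon1 + intron + exon2
print(splice(pre, [(6, 6 + len(intron))]))  # AUGGCCGCAUAA
```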

Alternative splicing allows a single gene to produce multiple mRNA variants (and thus multiple protein isoforms), dramatically increasing proteome diversity. In humans, >95% of multi-exon genes undergo alternative splicing.
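One way to see how alternative splicing multiplies isoforms is to treat each cassette exon as an independent include/skip choice. Under that assumption, $n$ cassettes give up to $2^n$ mRNAs. This is an illustrative upper bound only. Real splicing is regulated, and patterns such as mutually exclusive exons or alternative splice sites are not modeled here.

```python
from itertools import product


def cassette_isoforms(first: str, cassettes: list, last: str) -> list:
    """All mRNAs from independent include/skip choices for each cassette exon.

    Assumes every combination is allowed, giving 2**len(cassettes) isoforms.
    """
    isoforms = []
    for keep in product((True, False), repeat=len(cassettes)):
        middle = [exon for exon, k in zip(cassettes, keep) if k]
        isoforms.append("-".join([first, *middle, last]))
    return isoforms


print(cassette_isoforms("E1", ["E2", "E3"], "E4"))
# ['E1-E2-E3-E4', 'E1-E2-E4', 'E1-E3-E4', 'E1-E4']
```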

Self-Splicing Introns

Group I introns: Use an external guanosine (free guanosine or a guanosine nucleotide) as the attacking nucleophile. Found, for example, in the Tetrahymena rRNA gene and in fungal mitochondrial genes.
Group II introns: Use an internal branch-point adenosine to form a lariat (mechanistically similar to spliceosomal splicing). Found in organellar genomes. Group II introns are likely the evolutionary ancestors of spliceosomal introns.

The Genetic Code

The genetic code is the set of rules by which the nucleotide sequence of mRNA is translated into the amino acid sequence of a protein. It was deciphered during the 1960s (largely between 1961 and 1966) through the work of Nirenberg, Matthaei, Khorana, and Holley.

$$4^3 = 64 \text{ codons for 20 amino acids + stop signals}$$

Properties of the Genetic Code

  • Triplet: Three nucleotides (a codon) specify one amino acid
  • Non-overlapping: Codons are read sequentially without sharing nucleotides
  • Comma-free: No gaps or punctuation between codons
  • Degenerate: Most amino acids are encoded by 2-6 different codons (61 sense codons for 20 amino acids)
  • Unambiguous: Each codon specifies only one amino acid
  • Nearly universal: The same code is used by virtually all organisms (exceptions: mitochondria, some ciliates like Tetrahymena, Mycoplasma)

Start and Stop Codons

Start codon: AUG (encodes methionine; fMet in prokaryotes). Sets the reading frame.
Stop codons: UAA (ochre), UAG (amber), UGA (opal/umber). Recognized by release factors, not tRNAs.
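The standard code can be written compactly as a 64-character string indexed in UCAG order, which is the layout NCBI uses for translation table 1. The sketch below does three things:

- builds the codon table;
- checks the degeneracy figures quoted above (61 sense codons, 3 stops, six codons each for Leu, Ser, and Arg);
- translates from the first AUG to the first in-frame stop.

It ignores the Shine-Dalgarno and Kozak context discussed below and reports fMet simply as M.

```python
from collections import Counter

BASES = "UCAG"
# Standard genetic code in UCAG order (UUU, UUC, UUA, UUG, UCU, ...); '*' = stop.
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    b1 + b2 + b3: AA[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
# Number of codons per amino acid (stops excluded).
DEGENERACY = Counter(aa for aa in CODE.values() if aa != "*")


def translate_orf(mrna: str) -> str:
    """Translate from the first AUG (which sets the frame) to the first in-frame stop."""
    rna = mrna.upper()
    start = rna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for pos in range(start, len(rna) - 2, 3):
        aa = CODE[rna[pos:pos + 3]]
        if aa == "*":  # UAA, UAG, UGA: recognized by release factors
            break
        protein.append(aa)
    return "".join(protein)


print(sum(DEGENERACY.values()), DEGENERACY["L"], DEGENERACY["W"])  # 61 6 1
print(translate_orf("GGAUGGCCAAAUGAUUU"))  # MAK
```

In the example, the AUG starting at index 10 lies out of frame and is ignored. Once the first AUG sets the frame, it fixes how every downstream codon is read.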

Wobble Hypothesis

Crick's wobble hypothesis (1966) explains how fewer than 61 tRNA species can decode all 61 sense codons. The first two codon-anticodon base pairs follow strict Watson-Crick rules, but the third position (3′ end of the codon, 5′ end of the anticodon) tolerates non-standard base pairing:

Inosine (I) at the 5′ anticodon (wobble) position can pair with U, C, or A at the 3′ codon position. G at that position can pair with U or C, and U can pair with A or G. This wobble pairing explains the pattern of code degeneracy and why the third codon position is the most variable.
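The antiparallel geometry is easy to get backwards, so here is a small sketch with both sequences written 5′→3′. The pairing table follows Crick's 1966 wobble rules. Modified nucleosides other than inosine, which can change these rules in real tRNAs, are ignored.

```python
# Wobble (5') anticodon base -> codon 3' bases it can read (Crick's 1966 rules).
WOBBLE = {"G": {"U", "C"}, "U": {"A", "G"}, "I": {"U", "C", "A"}, "C": {"G"}, "A": {"U"}}
# Strict Watson-Crick partners used at the other two positions.
WC = {"A": "U", "U": "A", "G": "C", "C": "G"}


def decodes(anticodon: str, codon: str) -> bool:
    """Can this anticodon read this codon? Both are written 5'->3'.

    Pairing is antiparallel: anticodon base 1 (the 5' wobble base) pairs with
    codon base 3, base 2 with base 2, and base 3 with codon base 1.
    """
    a1, a2, a3 = anticodon.upper()
    c1, c2, c3 = codon.upper()
    return WC.get(a3) == c1 and WC.get(a2) == c2 and c3 in WOBBLE.get(a1, set())


ala_codons = ["GCU", "GCC", "GCA", "GCG"]
print([c for c in ala_codons if decodes("IGC", c)])  # ['GCU', 'GCC', 'GCA']
```

In this model, a single inosine-containing tRNA with anticodon IGC covers three of the four alanine codons. GCG would need a second tRNA, for example one with a CGC anticodon.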

Translation: Protein Synthesis

Ribosome Structure

| Feature | Prokaryotes | Eukaryotes |
|---|---|---|
| Complete ribosome | 70S | 80S |
| Small subunit | 30S (16S rRNA + 21 proteins) | 40S (18S rRNA + 33 proteins) |
| Large subunit | 50S (23S + 5S rRNA + 31 proteins) | 60S (28S + 5.8S + 5S rRNA + 49 proteins) |
| Functional sites | A (aminoacyl), P (peptidyl), E (exit) | A (aminoacyl), P (peptidyl), E (exit) |

Aminoacyl-tRNA Synthetases

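Before translation, each amino acid must be activated and attached to its cognate tRNA by an aminoacyl-tRNA synthetase (typically one per amino acid, 20 in total). The reaction proceeds in two steps:

1. The amino acid is activated as an aminoacyl-adenylate (aminoacyl-AMP), and PPi is released.
2. The aminoacyl group is transferred to the 3′ end of the tRNA.

Because the released PPi is subsequently hydrolyzed, the overall reaction consumes 2 ATP equivalents: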

$$\text{Amino acid} + \text{tRNA} + \text{ATP} \rightarrow \text{Aminoacyl-tRNA} + \text{AMP} + \text{PP}_i$$

The energy of the aminoacyl ester bond is later used to drive peptide bond formation. Synthetases have proofreading (editing) activity to ensure fidelity, with an error rate of ~1 in 10,000.
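To see what an error rate of ~1 in 10,000 means for a whole chain, assume (as a simplification) that each residue is mischarged independently with probability $p = 10^{-4}$. For a 300-residue protein:

$$P(\text{no charging error}) = (1 - p)^{300} \approx e^{-300p} = e^{-0.03} \approx 0.97$$

So roughly 3% of such chains would carry at least one synthetase-derived substitution, before counting decoding errors on the ribosome.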

Initiation

Prokaryotes

Shine-Dalgarno sequence (5′-AGGAGG-3′) in the 5′ UTR of mRNA base-pairs with the 3′ end of 16S rRNA, positioning the AUG start codon at the P site. Initiator tRNA: $\text{fMet-tRNA}^{\text{fMet}}$. Initiation factors: IF1, IF2 (GTPase), IF3.
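Here is a sketch of how Shine-Dalgarno pairing selects the start codon: the first AUG preceded, at a short distance, by an AGGAGG-like hexamer is chosen. Two parameters are illustrative assumptions, not values from the text: the allowed gap of 4-10 nt between the hexamer's 3′ end and the A of AUG, and the one-mismatch tolerance.

```python
SD_CONSENSUS = "AGGAGG"


def sd_start_codon(mrna: str, min_gap: int = 4, max_gap: int = 10, max_mismatch: int = 1):
    """Return (aug_pos, sd_pos) for the first AUG with an SD-like hexamer upstream.

    Hypothetical rule: the hexamer differs from AGGAGG at no more than
    max_mismatch positions, and min_gap..max_gap nt separate its 3' end
    from the A of AUG. Returns None if no AUG qualifies.
    """
    rna = mrna.upper()
    aug = rna.find("AUG")
    while aug != -1:
        for gap in range(min_gap, max_gap + 1):
            sd_pos = aug - gap - len(SD_CONSENSUS)
            if sd_pos < 0:
                break  # larger gaps would start even further upstream
            window = rna[sd_pos:sd_pos + len(SD_CONSENSUS)]
            if sum(a != b for a, b in zip(window, SD_CONSENSUS)) <= max_mismatch:
                return aug, sd_pos
        aug = rna.find("AUG", aug + 1)
    return None


leader = "CCAUGCC" + "AGGAGG" + "UAAAUCA" + "AUGGCU"
print(sd_start_codon(leader))  # (20, 7): the AUG at 2 has no SD, so it is skipped
```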

Eukaryotes

Kozak sequence (5′-ACCAUGG-3′) surrounds the start codon. The 40S subunit is recruited to the 5′ cap (via eIF4E/eIF4G) and scans 5′→3′ until it finds the first AUG in a good Kozak context. Initiator: $\text{Met-tRNA}_i^{\text{Met}}$. Requires ~12 eIFs.
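The scanning model can be sketched as a walk along the mRNA that accepts the first AUG in a favorable context. Here, "favorable" is simplified to a purine (A or G) at position -3 and G at +4, the two key positions in the ACCAUGG consensus. Skipping AUGs in weaker context is a rough stand-in for leaky scanning. The function name is hypothetical, and cap recognition is not modeled.

```python
def first_good_kozak_aug(mrna: str) -> int:
    """Index of the first AUG with a purine at -3 and G at +4, or -1.

    Positions are relative to the A of AUG (+1). AUGs in weaker context are
    skipped, a rough stand-in for leaky scanning.
    """
    rna = mrna.upper()
    pos = rna.find("AUG")
    while pos != -1:
        minus3 = rna[pos - 3] if pos >= 3 else None
        plus4 = rna[pos + 3] if pos + 3 < len(rna) else None
        if minus3 in ("A", "G") and plus4 == "G":
            return pos
        pos = rna.find("AUG", pos + 1)
    return -1


utr = "GCCUAUGCCCACCAUGGCU"
print(first_good_kozak_aug(utr))  # 13: the AUG at 4 has C at -3 and is bypassed
```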

Elongation Cycle

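The elongation cycle consists of three steps that repeat for each amino acid added. Bacterial factor names are used below; the eukaryotic counterparts of EF-Tu, EF-Ts, and EF-G are eEF1A, eEF1B, and eEF2.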

1. Aminoacyl-tRNA delivery: EF-Tu•GTP delivers aminoacyl-tRNA to the A site. A correct codon-anticodon match triggers GTP hydrolysis, release of EF-Tu•GDP, and accommodation of the tRNA into the A site. EF-Ts then regenerates EF-Tu•GTP by exchanging the bound GDP for GTP.

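2. Peptide bond formation: Catalyzed by the peptidyl transferase center in the 23S rRNA of the large subunit, which makes the ribosome a ribozyme. The α-amino group of the A-site aminoacyl-tRNA attacks the carbonyl carbon of the ester linking the growing chain to the P-site tRNA. The chain is thereby transferred from the P-site tRNA onto the amino acid carried by the A-site tRNA.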

3. Translocation: EF-G•GTP drives movement of the ribosome one codon along the mRNA (5′→3′). The peptidyl-tRNA moves from the A site to the P site, the deacylated tRNA moves from the P site to the E site, and the tRNA previously in the E site is released. Costs 1 GTP.
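The bookkeeping of the three elongation steps can be captured in a toy model that tracks which tRNA occupies each site and how much GTP is spent. The model is illustrative only, and the class and method names are hypothetical. It does not model factor binding, proofreading, or rejection of near-cognate tRNAs. The GTP counter rises by 2 per cycle, matching the two GTP terms in the energy accounting for translation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Ribosome:
    """Toy bookkeeping of A/P/E occupancy during bacterial elongation."""
    p_site: Optional[str] = "tRNA-fMet"  # after initiation, fMet-tRNA sits in P
    a_site: Optional[str] = None
    e_site: Optional[str] = None
    peptide: List[str] = field(default_factory=lambda: ["fMet"])
    gtp_used: int = 0
    released: List[str] = field(default_factory=list)

    def cycle(self, aa: str) -> None:
        # 1. Delivery: EF-Tu-GTP brings the aminoacyl-tRNA to the A site (1 GTP).
        self.a_site = f"tRNA-{aa}"
        self.gtp_used += 1
        # 2. Peptidyl transfer: the chain moves onto the A-site aminoacyl-tRNA
        #    (catalyzed by rRNA; no GTP).
        self.peptide.append(aa)
        # 3. Translocation by EF-G-GTP (1 GTP): E-site tRNA leaves, P -> E, A -> P.
        if self.e_site is not None:
            self.released.append(self.e_site)
        self.e_site, self.p_site, self.a_site = self.p_site, self.a_site, None
        self.gtp_used += 1


ribo = Ribosome()
for aa in ["Ala", "Lys", "Gly"]:
    ribo.cycle(aa)
print(ribo.peptide, ribo.gtp_used)  # ['fMet', 'Ala', 'Lys', 'Gly'] 6
```

After three cycles, the P site holds tRNA-Gly carrying the tetrapeptide and the E site holds tRNA-Lys. The fMet and Ala tRNAs have already left.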

Energy Cost of Translation

Translation is energetically expensive. The total cost per amino acid incorporated:

$$\underbrace{2 \text{ ATP}}_{\text{aminoacyl-tRNA charging}} + \underbrace{1 \text{ GTP}}_{\text{EF-Tu delivery}} + \underbrace{1 \text{ GTP}}_{\text{EF-G translocation}} = 4 \text{ high-energy phosphate bonds per residue}$$

For a 300-residue protein: ~1,200 high-energy phosphate bonds. This does not include the cost of ribosome assembly, mRNA synthesis, or initiation/termination factors.

Post-Translational Modifications & Protein Targeting

Newly synthesized polypeptides must fold correctly and often undergo covalent modifications to become functional proteins. Many proteins are also targeted to specific subcellular compartments.

Signal Peptides and the Secretory Pathway

Proteins destined for the ER, Golgi, plasma membrane, lysosomes, or secretion carry an N-terminal signal peptide. It is typically ~16-30 residues long and built around a core of hydrophobic residues. The signal recognition particle (SRP) binds the signal peptide as it emerges from the ribosome and pauses translation. SRP then targets the ribosome-mRNA complex to the ER membrane via the SRP receptor. Translation resumes, and the polypeptide is threaded through the Sec61 translocon into the ER lumen.
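A crude way to spot the hydrophobic core of a signal peptide is a sliding-window hydropathy scan over the N-terminus using the Kyte-Doolittle scale. The 8-residue window, the 30-residue search region, and the 2.0 cutoff are assumptions for illustration. Dedicated predictors also consider the charged N-region and the signal peptidase cleavage site, so treat this as a sketch rather than a classifier.

```python
# Kyte-Doolittle hydropathy scale (positive = hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def max_window_hydropathy(protein: str, window: int = 8, region: int = 30) -> float:
    """Highest mean KD score over any `window`-residue stretch in the first `region` residues."""
    head = protein.upper()[:region]
    means = [sum(KD[a] for a in head[i:i + window]) / window
             for i in range(len(head) - window + 1)]
    return max(means) if means else float("-inf")


def has_hydrophobic_core(protein: str, cutoff: float = 2.0) -> bool:
    """Crude flag for an N-terminal hydrophobic stretch (cutoff is an assumption)."""
    return max_window_hydropathy(protein) > cutoff


print(has_hydrophobic_core("MKLLLLLLLLAAGSDEKRDEKR"))     # True
print(has_hydrophobic_core("MSDEKRNQDEKRSTSPDEKGNQHKR"))  # False
```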

The Secretory Pathway:

ER (folding, N-glycosylation, disulfide bonds) → ERGIC (ER-Golgi intermediate compartment) → cis-Golgi → medial-Golgi (glycan processing) → trans-Golgi (sorting) → Plasma membrane / Secretory vesicles / Lysosomes

Glycosylation

N-linked Glycosylation

Occurs in the ER. Oligosaccharyltransferase (OST) transfers a preassembled 14-sugar oligosaccharide ($\text{Glc}_3\text{Man}_9\text{GlcNAc}_2$) from its dolichol pyrophosphate carrier to the amide nitrogen of Asn. The acceptor Asn must lie in the sequon Asn-X-Ser/Thr, where X is any amino acid except Pro.
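Finding candidate N-glycosylation sites amounts to a pattern search for the sequon. The sketch uses a regular expression with a lookahead so that overlapping sequons are all reported. A sequon is necessary but not sufficient, so not every hit is glycosylated in vivo.

```python
import re

# Asn, then any residue except Pro, then Ser or Thr. The lookahead (?=...)
# consumes no characters, so overlapping sequons such as "NNST" are all found.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(protein: str) -> list:
    """0-based positions of Asn residues that start an Asn-X-Ser/Thr sequon."""
    return [m.start() for m in SEQUON.finditer(protein.upper())]


print(sequon_positions("MNGSANPTQNKTNNST"))  # [1, 9, 12, 13]
```

The Asn at index 5 is skipped because Pro follows it (N-P-T).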

O-linked Glycosylation

Occurs primarily in the Golgi. Sugars (commonly GalNAc) are added one at a time to the hydroxyl oxygen of Ser or Thr residues. No consensus sequence. Important in mucins and extracellular matrix proteins.

Proteolytic Processing

Many proteins are synthesized as inactive precursors (zymogens/proenzymes) that require proteolytic cleavage for activation. Examples: insulin is synthesized as preproinsulin (signal peptide removal → proinsulin → C-peptide excision → mature insulin with A and B chains linked by disulfide bonds); digestive enzymes (trypsinogen → trypsin; chymotrypsinogen → chymotrypsin); blood clotting factors (coagulation cascade).

Protein Degradation: The Ubiquitin-Proteasome Pathway

Damaged, misfolded, or regulatory proteins are tagged for destruction by the covalent attachment of ubiquitin (a 76-amino acid protein). The process requires E1 (activating), E2 (conjugating), and E3 (ligase) enzymes, consuming 1 ATP per ubiquitin attached. Polyubiquitinated proteins (chains of $\geq 4$ ubiquitins linked via Lys48) are recognized and degraded by the 26S proteasome (a barrel-shaped protease complex) into small peptides, with ubiquitin recycled.

Key Concepts Summary

Central Dogma: DNA → RNA → Protein. Exceptions include reverse transcription (retroviruses) and RNA replication (RNA viruses).

Prokaryotic Transcription: Single RNA polymerase ($\alpha_2\beta\beta'\sigma$) recognizes -10 (TATAAT) and -35 (TTGACA) promoter elements via sigma factor. Elongation at 40-50 nt/s. Termination: Rho-dependent or intrinsic.

Eukaryotic Transcription: Three RNA polymerases (I, II, III). Pol II requires GTFs (TFIID, B, F, E, H) and Mediator for initiation at TATA box promoters.

mRNA Processing: 5′ m7G cap, 3′ poly(A) tail (~200 A's), and splicing (spliceosome removes introns, joins exons). Alternative splicing increases proteome diversity.

Genetic Code: 64 codons (61 sense + 3 stop). Degenerate, unambiguous, nearly universal. Wobble at 3rd position reduces the number of tRNAs needed.

Translation: 70S (prokaryotes) / 80S (eukaryotes) ribosomes with A, P, E sites. Cost: 4 high-energy phosphate bonds per amino acid. Peptidyl transferase is a ribozyme (23S rRNA).

Post-Translational: Signal peptides direct proteins to ER. N- and O-linked glycosylation. Proteolytic activation (zymogens). Ubiquitin-proteasome degradation pathway.