11. Transcription & Translation
Reading time: ~50 minutes | Key topics: Central dogma, RNA polymerase, mRNA processing, the genetic code, ribosome structure, protein synthesis, post-translational modifications
The Central Dogma
Francis Crick articulated the central dogma of molecular biology in 1958, describing the directional flow of genetic information in biological systems:
DNA → RNA → Protein
Replication: DNA → DNA | Transcription: DNA → RNA | Translation: RNA → Protein
Transcription is the process by which the nucleotide sequence of one strand of DNA is used as a template to synthesize a complementary RNA molecule. Translation is the process by which the nucleotide sequence of mRNA directs the assembly of amino acids into a polypeptide chain on the ribosome.
Exceptions to the Central Dogma
- Reverse transcription: Retroviruses (e.g., HIV) use reverse transcriptase to synthesize DNA from an RNA template (RNA → DNA)
- RNA replication: RNA viruses (e.g., influenza, SARS-CoV-2) use RNA-dependent RNA polymerase to copy RNA directly (RNA → RNA)
- Prions: Protein → Protein transmission of conformational information (no nucleic acid template)
The central dogma remains the foundational framework for understanding gene expression, even as exceptions continue to enrich our understanding of information flow in biology. Note that information transfer from protein back to nucleic acid (Protein → DNA or Protein → RNA) has never been observed.
Prokaryotic Transcription
In prokaryotes, a single RNA polymerase catalyzes the synthesis of all types of RNA (mRNA, rRNA, tRNA). The holoenzyme consists of a core enzyme plus a sigma factor:
RNA Polymerase Subunit Composition
| Subunit | Copies | Function |
|---|---|---|
| α | 2 | Assembly, regulatory factor binding, UP element recognition |
| β | 1 | Catalytic center (NTP binding, phosphodiester bond formation) |
| β′ | 1 | DNA template binding |
| σ | 1 | Promoter recognition and binding (released after initiation) |
Core enzyme: $\alpha_2\beta\beta'$ | Holoenzyme: $\alpha_2\beta\beta'\sigma$ (total MW ~449 kDa)
Promoter Recognition
The sigma factor ($\sigma^{70}$ is the primary sigma in E. coli) recognizes two conserved promoter elements:
- -10 region (Pribnow box): Consensus sequence TATAAT (AT-rich, facilitates strand separation)
- -35 region: Consensus sequence TTGACA (initial contact point for sigma)
Spacing between the -35 and -10 elements is critical: optimally 17 bp. Transcription initiates at the +1 position. The transcription bubble spans approximately 17 bp of unwound DNA.
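The promoter layout above can be sketched as a simple consensus scan. This is a minimal illustration, not a real promoter predictor: the function names, the mismatch threshold, and the 16-18 bp spacer window are assumptions introduced here (the text only states that 17 bp is optimal).

```python
# Consensus elements quoted in the text for sigma-70 promoters.
MINUS_35 = "TTGACA"
MINUS_10 = "TATAAT"


def mismatches(site: str, consensus: str) -> int:
    """Count positions where a candidate site differs from the consensus."""
    return sum(a != b for a, b in zip(site, consensus))


def find_promoters(dna: str, max_mismatch: int = 2, spacers=(16, 17, 18)):
    """Return (start_of_-35, spacer_length, total_mismatches) candidates.

    The mismatch threshold and spacer window are illustrative assumptions.
    Positions are 0-based indices into `dna`.
    """
    dna = dna.upper()
    hits = []
    for i in range(len(dna) - 6 + 1):
        m35 = mismatches(dna[i:i + 6], MINUS_35)
        if m35 > max_mismatch:
            continue
        for spacer in spacers:
            j = i + 6 + spacer  # start of the candidate -10 hexamer
            if j + 6 > len(dna):
                continue
            m10 = mismatches(dna[j:j + 6], MINUS_10)
            if m10 <= max_mismatch:
                hits.append((i, spacer, m35 + m10))
    # Fewest mismatches first; ties broken by closeness to the 17 bp optimum.
    return sorted(hits, key=lambda h: (h[2], abs(h[1] - 17)))
```

Real promoters rarely match both hexamers perfectly, which is why the sketch ranks candidates by mismatch count instead of requiring exact matches.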
Phases of Transcription
Initiation
Holoenzyme binds promoter → closed complex → open complex (DNA melting at -10 region) → first phosphodiester bonds formed → sigma factor released after ~8-10 nt synthesized → promoter clearance.
Elongation
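Core enzyme moves along the template strand 3′→5′, synthesizing RNA 5′→3′ at 40-50 nucleotides per second. Errors are corrected by two proofreading mechanisms: pyrophosphorolytic editing (reversal of the most recent addition) and hydrolytic editing (backtracking followed by cleavage of the mispaired 3′ end).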
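The directionality of elongation can be illustrated with a short sketch. The function names are hypothetical, and the default rate of 45 nt/s is simply the midpoint of the 40-50 nt/s range quoted above.

```python
# DNA template base -> RNA base incorporated opposite it.
PAIRING = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (written 5'->3') made from a template written 3'->5'.

    Because the polymerase reads the template 3'->5', walking the template
    string left to right yields the transcript in its 5'->3' order.
    """
    return "".join(PAIRING[base] for base in template_3_to_5.upper())


def elongation_time_s(length_nt: int, rate_nt_per_s: float = 45.0) -> float:
    """Seconds to synthesize a transcript at a constant elongation rate."""
    return length_nt / rate_nt_per_s
```

For example, a template written 3′-TACGGT-5′ gives the transcript 5′-AUGCCA-3′, and a 1,000-nt transcript takes about 20 s at 50 nt/s.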
Termination
Rho-independent (intrinsic): GC-rich hairpin followed by poly-U tract destabilizes the RNA-DNA hybrid. Rho-dependent: Rho helicase translocates along mRNA (5′→3′), catches up to paused polymerase, and unwinds the RNA-DNA hybrid using ATP hydrolysis.
Energetics of RNA Synthesis
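Each nucleotide addition is driven by cleavage of the incoming NTP, which releases pyrophosphate. The overall reaction for incorporating one nucleotide:
$$(\text{NMP})_n + \text{NTP} \rightarrow (\text{NMP})_{n+1} + \text{PP}_i$$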
The subsequent hydrolysis of pyrophosphate (PPi) by inorganic pyrophosphatase renders the reaction essentially irreversible, driving transcription forward.
Eukaryotic Transcription
Eukaryotes use three distinct RNA polymerases, each responsible for different classes of RNA:
| Polymerase | RNA Product | Sensitivity |
|---|---|---|
| RNA Pol I | rRNA (28S, 18S, 5.8S) | Insensitive to α-amanitin |
| RNA Pol II | mRNA, most snRNAs, miRNAs | Very sensitive to α-amanitin |
| RNA Pol III | tRNA, 5S rRNA, U6 snRNA | Moderately sensitive to α-amanitin |
General Transcription Factors for RNA Pol II
Unlike prokaryotic RNA polymerase, eukaryotic RNA Pol II cannot bind promoters directly. It requires assembly of a pre-initiation complex (PIC) composed of general transcription factors (GTFs):
- TFIID (TBP + TAFs): TBP (TATA-binding protein) recognizes and binds the TATA box (~25 bp upstream of +1); TAFs (TBP-associated factors) recognize other promoter elements
- TFIIB: Bridges TFIID and RNA Pol II; determines start site selection
- TFIIF: Escorts RNA Pol II to the promoter; stabilizes Pol II-TFIIB interaction
- TFIIE: Recruits TFIIH; modulates TFIIH helicase activity
- TFIIH: Contains helicase (XPB, XPD subunits) for promoter melting and kinase (CDK7) for phosphorylation of the Pol II C-terminal domain (CTD)
The Mediator Complex
The Mediator is a large multi-subunit complex (~30 subunits in humans) that serves as a bridge between gene-specific transcription factors (activators/repressors) bound at enhancers and the PIC at the promoter. It integrates regulatory signals and modulates the rate of transcription initiation. The CTD of RNA Pol II cycles through phosphorylation states (Ser5-P during initiation by TFIIH/CDK7; Ser2-P during elongation by P-TEFb/CDK9) to coordinate mRNA processing events.
Post-Transcriptional Processing
Eukaryotic pre-mRNA undergoes three major processing events before export from the nucleus. These modifications occur co-transcriptionally, coordinated by the phosphorylated CTD of RNA Pol II:
5′ Capping
A 7-methylguanosine (m7G) cap is added to the 5′ end via an unusual 5′-5′ triphosphate bridge. This occurs after the first ~25-30 nucleotides are synthesized.
Functions: Protection from 5′ exonucleases, ribosome recognition during translation initiation (eIF4E binding), splicing of the first intron, nuclear export.
3′ Polyadenylation
The polyadenylation signal AAUAAA (in the pre-mRNA) is recognized by CPSF (cleavage and polyadenylation specificity factor). The pre-mRNA is cleaved ~10-30 nt downstream of the signal, and poly(A) polymerase (PAP) adds a tail of ~200 adenosine residues (no template required).
Functions: mRNA stability (bound by PABP), nuclear export, translation efficiency. Poly(A) tail length decreases over time (deadenylation is a key step in mRNA decay).
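As a rough sketch of how the signal and cleavage window relate, the following locates each AAUAAA hexamer and reports the region 10-30 nt downstream. The function name is hypothetical, and measuring the window from the end of the hexamer is an assumption made for illustration.

```python
def find_polya_sites(pre_mrna: str, signal: str = "AAUAAA", window=(10, 30)):
    """Return a list of (signal_start, window_start, window_end) tuples.

    Positions are 0-based; the cleavage window is measured from the end of
    the signal (an assumption) and clipped to the transcript length.
    """
    rna = pre_mrna.upper()
    results = []
    start = rna.find(signal)
    while start != -1:
        end = start + len(signal)
        lo = min(end + window[0], len(rna))
        hi = min(end + window[1], len(rna))
        results.append((start, lo, hi))
        start = rna.find(signal, start + 1)
    return results
```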
RNA Splicing
Introns (intervening sequences) are removed and exons (expressed sequences) are joined by the spliceosome, a large ribonucleoprotein complex composed of five snRNPs (U1, U2, U4, U5, and U6) plus many associated proteins.
Splicing proceeds via two transesterification reactions, producing a lariat intermediate. Key splice site signals: 5′ splice site (GU), branch point (A), and 3′ splice site (AG). The spliceosome is a ribozyme: catalysis is performed by its RNA components (the U2 and U6 snRNAs).
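The GU...AG boundary rule can be sketched as a check applied before introns are removed. This is a toy model: the function name and the coordinate convention (0-based, half-open intervals) are assumptions, and real splice site recognition depends on much more than two dinucleotides.

```python
def splice(pre_mrna: str, introns):
    """Remove introns given as (start, end) half-open, 0-based intervals.

    Raises ValueError if an intron does not begin with GU and end with AG.
    """
    rna = pre_mrna.upper()
    exons, cursor = [], 0
    for start, end in sorted(introns):
        intron = rna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron {start}-{end} lacks GU...AG boundaries")
        exons.append(rna[cursor:start])  # exon sequence before this intron
        cursor = end
    exons.append(rna[cursor:])  # final exon
    return "".join(exons)
```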
Alternative splicing allows a single gene to produce multiple mRNA variants (and thus multiple protein isoforms), dramatically increasing proteome diversity. In humans, >95% of multi-exon genes undergo alternative splicing.
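To see how quickly isoform counts grow, the sketch below enumerates the mRNAs produced when some exons are optional. It models only exon skipping (cassette exons), one of several alternative splicing modes, and the function name is hypothetical.

```python
from itertools import product


def cassette_isoforms(exons, optional):
    """Enumerate mRNAs from ordered exons; indices in `optional` may be skipped."""
    optional = sorted(set(optional))
    isoforms = []
    for choice in product([True, False], repeat=len(optional)):
        skipped = {idx for idx, keep in zip(optional, choice) if not keep}
        isoforms.append(
            "".join(e for i, e in enumerate(exons) if i not in skipped)
        )
    return isoforms
```

With n independent cassette exons this yields 2^n isoforms, so even a handful of optional exons produces many mRNA variants from one gene.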
Self-Splicing Introns
Group I introns: Use an external guanosine nucleophile as a cofactor. Found in rRNA genes of Tetrahymena, fungal mitochondria.
Group II introns: Use an internal branch-point adenosine to form a lariat (mechanistically similar to spliceosomal splicing). Found in organellar genomes. Group II introns are likely the evolutionary ancestors of spliceosomal introns.
The Genetic Code
The genetic code is the set of rules by which the nucleotide sequence of mRNA is translated into the amino acid sequence of a protein. It was deciphered in the early 1960s through the work of Nirenberg, Matthaei, Khorana, and Holley.
Properties of the Genetic Code
- Triplet: Three nucleotides (a codon) specify one amino acid
- Non-overlapping: Codons are read sequentially without sharing nucleotides
- Comma-free: No gaps or punctuation between codons
- Degenerate: Most amino acids are encoded by 2-6 different codons (61 sense codons for 20 amino acids)
- Unambiguous: Each codon specifies only one amino acid
- Nearly universal: The same code is used by virtually all organisms (exceptions: mitochondria, some ciliates like Tetrahymena, Mycoplasma)
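The degeneracy figures above can be checked by building the standard codon table and counting codons per amino acid. The compact string encoding is a common convention for the standard code; the variable and function names are choices made for this sketch.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard code with codons ordered UUU, UUC, UUA, UUG, UCU, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def degeneracy():
    """Map each amino acid (one-letter code) to its number of codons."""
    return Counter(aa for aa in CODON_TABLE.values() if aa != "*")
```

Counting this way gives 61 sense codons across 20 amino acids, with Met and Trp at one codon each and Leu, Ser, and Arg at six.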
Start and Stop Codons
Start codon: AUG (encodes methionine; fMet in prokaryotes). Sets the reading frame.
Stop codons: UAA (ochre), UAG (amber), UGA (opal/umber). Recognized by release factors, not tRNAs.
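How the start codon sets the reading frame can be shown with a minimal translation sketch. It rebuilds the standard codon table so that it runs on its own; picking the first AUG and ignoring initiation context is a deliberate simplification, and the function name is hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard code with codons ordered UUU, UUC, UUA, UUG, UCU, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon."""
    rna = mrna.upper()
    start = rna.find("AUG")
    if start == -1:
        return ""  # no start codon, no reading frame
    peptide = []
    for i in range(start, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[i:i + 3]]
        if aa == "*":  # stop codons are read by release factors, not tRNAs
            break
        peptide.append(aa)
    return "".join(peptide)
```

For example, `translate("GGAUGGCCUUUUAAGG")` reads AUG-GCC-UUU-UAA and returns `"MAF"`.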
Wobble Hypothesis
Crick's wobble hypothesis (1966) explains how fewer than 61 tRNA species can decode all 61 sense codons. The first two codon-anticodon base pairs follow strict Watson-Crick rules, but the third position (3′ end of the codon, 5′ end of the anticodon) tolerates non-standard base pairing:
| 5′ anticodon base | 3′ codon bases recognized |
|---|---|
| C | G |
| A | U |
| U | A or G |
| G | C or U |
| I (inosine) | U, C, or A |
Inosine (I) at the 5′ anticodon position can pair with U, C, or A at the 3′ codon position, and G at the 5′ anticodon position can pair with U or C. Wobble pairing lets a single tRNA read several synonymous codons, which is why the third codon position is the most variable.
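The wobble rules can be sketched as a lookup from the 5′ anticodon base to the codon third-position bases it reads. The I and G entries come from the text; the C, A, and U entries follow Crick's standard rules and are an addition made for this sketch, as is the function name.

```python
# Anticodon 5' (wobble) base -> codon 3' bases it can read.
WOBBLE = {"C": "G", "A": "U", "U": "AG", "G": "CU", "I": "UCA"}
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codons_read(anticodon: str):
    """List codons (5'->3') decoded by an anticodon written 5'->3'.

    Pairing is antiparallel: the anticodon's 3' base pairs with the codon's
    first base, and the anticodon's 5' base is the wobble position.
    """
    wobble_base, middle, last = anticodon.upper()
    first_two = WATSON_CRICK[last] + WATSON_CRICK[middle]
    return [first_two + third for third in WOBBLE[wobble_base]]
```

An anticodon 5′-IGC-3′ therefore reads GCU, GCC, and GCA, three of the four alanine codons, with a single tRNA.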
Translation: Protein Synthesis
Ribosome Structure
| Feature | Prokaryotes | Eukaryotes |
|---|---|---|
| Complete ribosome | 70S | 80S |
| Small subunit | 30S (16S rRNA + 21 proteins) | 40S (18S rRNA + 33 proteins) |
| Large subunit | 50S (23S + 5S rRNA + 31 proteins) | 60S (28S + 5.8S + 5S rRNA + 49 proteins) |
| Functional sites | A (aminoacyl), P (peptidyl), E (exit) | A (aminoacyl), P (peptidyl), E (exit) |
Aminoacyl-tRNA Synthetases
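Before translation, each amino acid must be activated and attached to its cognate tRNA by aminoacyl-tRNA synthetases (one for each amino acid, 20 in total). This two-step reaction consumes 2 ATP equivalents, because ATP is cleaved to AMP and PPi:
$$\text{amino acid} + \text{ATP} \rightarrow \text{aminoacyl-AMP} + \text{PP}_i$$
$$\text{aminoacyl-AMP} + \text{tRNA} \rightarrow \text{aminoacyl-tRNA} + \text{AMP}$$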
The energy of the aminoacyl ester bond is later used to drive peptide bond formation. Synthetases have proofreading (editing) activity to ensure fidelity, with an error rate of ~1 in 10,000.
Initiation
Prokaryotes
Shine-Dalgarno sequence (5′-AGGAGG-3′) in the 5′ UTR of mRNA base-pairs with the 3′ end of 16S rRNA, positioning the AUG start codon at the P site. Initiator tRNA: fMet-tRNAfMet. Initiation factors: IF1, IF2 (GTPase), IF3.
Eukaryotes
Kozak sequence (5′-ACCAUGG-3′) surrounds the start codon. 40S subunit binds the 5′ cap (via eIF4E/eIF4G) and scans 5′→3′ until it finds the first AUG in a good Kozak context. Initiator: Met-tRNAiMet. Requires ~12 eIFs.
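The scanning model can be sketched as a left-to-right search for the first AUG in an acceptable context. Treating "good Kozak context" as a purine at -3 plus G at +4 (both present in ACCAUGG) is a simplifying assumption, and the function name is hypothetical.

```python
def first_kozak_aug(mrna: str, strict: bool = True) -> int:
    """Return the 0-based index of the AUG chosen by a simple scanning model.

    In strict mode an AUG is accepted only with a purine at -3 and G at +4;
    otherwise the first AUG in any context is returned. Returns -1 if no
    AUG qualifies.
    """
    rna = mrna.upper()
    i = rna.find("AUG")
    while i != -1:
        purine_at_minus3 = i >= 3 and rna[i - 3] in "AG"
        g_at_plus4 = i + 3 < len(rna) and rna[i + 3] == "G"
        if not strict or (purine_at_minus3 and g_at_plus4):
            return i
        i = rna.find("AUG", i + 1)  # keep scanning 5'->3'
    return -1
```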
Elongation Cycle
The elongation cycle consists of three steps that repeat for each amino acid added:
1. Aminoacyl-tRNA delivery: EF-Tu•GTP delivers aminoacyl-tRNA to the A site. Correct codon-anticodon match triggers GTP hydrolysis, EF-Tu•GDP release, and accommodation of the tRNA. EF-Ts recycles EF-Tu.
2. Peptide bond formation: Catalyzed by the peptidyl transferase center in the large-subunit rRNA (23S rRNA in prokaryotes; 28S rRNA in eukaryotes). This makes the ribosome a ribozyme. The growing peptide chain is transferred from the P-site tRNA to the amino acid on the A-site tRNA.
3. Translocation: EF-G•GTP drives movement of the ribosome one codon along the mRNA (5′→3′). A-site tRNA moves to P site, P-site tRNA to E site, E-site tRNA is ejected. Costs 1 GTP.
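The three steps above can be traced with a small state model of the E, P, and A sites. This is only a bookkeeping sketch: tRNAs are labeled by the codon they read, the initiator is assumed to already sit in the P site, and the function name is hypothetical.

```python
def elongate(codons):
    """Trace E/P/A site occupancy across elongation cycles.

    `codons` lists the codons after the start codon. Returns
    (snapshots, gtp_used), where each snapshot is (sites, chain_length)
    recorded after translocation.
    """
    sites = {"E": None, "P": "AUG", "A": None}
    chain_length, gtp, snapshots = 1, 0, []
    for codon in codons:
        sites["A"] = codon   # 1. EF-Tu delivers the aminoacyl-tRNA
        gtp += 1
        chain_length += 1    # 2. peptide transferred onto the A-site tRNA
        # 3. EF-G translocation: A -> P, P -> E, old E-site tRNA ejected
        sites = {"E": sites["P"], "P": sites["A"], "A": None}
        gtp += 1
        snapshots.append((dict(sites), chain_length))
    return snapshots, gtp
```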
Energy Cost of Translation
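Translation is energetically expensive. The total cost per amino acid incorporated:
2 (ATP → AMP + PPi, tRNA charging) + 1 (GTP, EF-Tu delivery) + 1 (GTP, EF-G translocation) = 4 high-energy phosphate bonds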
For a 300-residue protein: ~1,200 high-energy phosphate bonds. This does not include the cost of ribosome assembly, mRNA synthesis, or initiation/termination factors.
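The arithmetic behind the 300-residue figure can be written out explicitly. The constant and function names are illustrative; the per-step costs are the ones given in this section (2 for tRNA charging, 1 for EF-Tu, 1 for EF-G).

```python
CHARGING = 2       # ATP -> AMP + PPi during aminoacyl-tRNA synthesis
DELIVERY = 1       # GTP hydrolyzed by EF-Tu
TRANSLOCATION = 1  # GTP hydrolyzed by EF-G


def translation_cost(residues: int) -> int:
    """High-energy phosphate bonds spent on charging and elongation."""
    return residues * (CHARGING + DELIVERY + TRANSLOCATION)
```

`translation_cost(300)` returns 1200, matching the estimate above.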
Post-Translational Modifications & Protein Targeting
Newly synthesized polypeptides must fold correctly and often undergo covalent modifications to become functional proteins. Many proteins are also targeted to specific subcellular compartments.
Signal Peptides and the Secretory Pathway
Proteins destined for the ER, Golgi, plasma membrane, lysosomes, or secretion contain an N-terminal signal peptide (~16-30 residues with a hydrophobic core). The signal recognition particle (SRP) binds the signal peptide as it emerges from the ribosome, pauses elongation, and targets the ribosome-nascent chain complex to the ER membrane via the SRP receptor. Translation resumes with the polypeptide threaded through the Sec61 translocon into the ER lumen.
The Secretory Pathway:
ER (folding, N-glycosylation, disulfide bonds) → ERGIC → cis-Golgi → medial-Golgi (glycan processing) → trans-Golgi (sorting) → Plasma membrane / Secretory vesicles / Lysosomes
Glycosylation
N-linked Glycosylation
Occurs in the ER. Oligosaccharyltransferase (OST) transfers a preassembled 14-sugar oligosaccharide (Glc3Man9GlcNAc2) from dolichol pyrophosphate to the amide nitrogen of Asn in the sequon Asn-X-Ser/Thr (where X is any amino acid except Pro).
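The sequon rule translates directly into a pattern search over a protein sequence. This sketch finds candidate sites only; whether a given sequon is actually glycosylated depends on factors the pattern cannot capture. The function name is hypothetical.

```python
import re

# Lookahead so that overlapping sequons (e.g. N-N-S-S) are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein: str):
    """Return 1-based positions of Asn in Asn-X-Ser/Thr sequons (X != Pro)."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]
```

For the sequence `MNGTANPSNNSS`, the Asn at position 6 is skipped because it is followed by Pro, while positions 2, 9, and 10 are reported.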
O-linked Glycosylation
Occurs primarily in the Golgi. Sugars (commonly GalNAc) are added one at a time to the hydroxyl oxygen of Ser or Thr residues. No consensus sequence. Important in mucins and extracellular matrix proteins.
Proteolytic Processing
Many proteins are synthesized as inactive precursors (zymogens/proenzymes, or prohormones) that require proteolytic cleavage for activation. Examples: insulin is synthesized as preproinsulin (signal peptide removal → proinsulin → C-peptide excision → mature insulin with A and B chains linked by disulfide bonds); digestive enzymes (trypsinogen → trypsin; chymotrypsinogen → chymotrypsin); blood clotting factors (coagulation cascade).
Protein Degradation: The Ubiquitin-Proteasome Pathway
Damaged, misfolded, or regulatory proteins are tagged for destruction by the covalent attachment of ubiquitin (a 76-amino acid protein). The process requires E1 (activating), E2 (conjugating), and E3 (ligase) enzymes, consuming 1 ATP per ubiquitin attached. Polyubiquitinated proteins (chains of $\geq 4$ ubiquitins linked via Lys48) are recognized and degraded by the 26S proteasome (a barrel-shaped protease complex) into small peptides, with ubiquitin recycled.
Key Concepts Summary
Central Dogma: DNA → RNA → Protein. Exceptions include reverse transcription (retroviruses) and RNA replication (RNA viruses).
Prokaryotic Transcription: Single RNA polymerase ($\alpha_2\beta\beta'\sigma$) recognizes -10 (TATAAT) and -35 (TTGACA) promoter elements via sigma factor. Elongation at 40-50 nt/s. Termination: Rho-dependent or intrinsic.
Eukaryotic Transcription: Three RNA polymerases (I, II, III). Pol II requires GTFs (TFIID, B, F, E, H) and Mediator for initiation at TATA box promoters.
mRNA Processing: 5′ m7G cap, 3′ poly(A) tail (~200 A's), and splicing (spliceosome removes introns, joins exons). Alternative splicing increases proteome diversity.
Genetic Code: 64 codons (61 sense + 3 stop). Degenerate, unambiguous, nearly universal. Wobble at 3rd position reduces the number of tRNAs needed.
Translation: 70S (prokaryotes) / 80S (eukaryotes) ribosomes with A, P, E sites. Cost: 4 high-energy phosphate bonds per amino acid. Peptidyl transferase is a ribozyme (23S rRNA).
Post-Translational: Signal peptides direct proteins to ER. N- and O-linked glycosylation. Proteolytic activation (zymogens). Ubiquitin-proteasome degradation pathway.