11. Transcription & Translation

Reading time: ~50 minutes | Key topics: Central dogma, RNA polymerase, mRNA processing, the genetic code, ribosome structure, protein synthesis, post-translational modifications

The Central Dogma

Francis Crick articulated the central dogma of molecular biology in 1958, describing the directional flow of genetic information in biological systems:

DNA → RNA → Protein

Replication (DNA → DNA) | Transcription (DNA → RNA) | Translation (RNA → Protein)

Transcription is the process by which the nucleotide sequence of one strand of DNA is used as a template to synthesize a complementary RNA molecule. Translation is the process by which the nucleotide sequence of mRNA directs the assembly of amino acids into a polypeptide chain on the ribosome.
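The template-to-RNA relationship described above can be sketched in a few lines of Python. This is only an illustration of base complementarity, not a model of RNA polymerase; the function name `transcribe` and the example sequence are hypothetical.

```python
# Watson-Crick pairing from a DNA template base to the RNA base it specifies.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3to5: str) -> str:
    """Return the RNA (written 5'->3') complementary to a DNA template
    strand that is written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3to5.upper())


# Hypothetical template strand, written 3'->5' so it reads in the same
# left-to-right order as the RNA product.
template = "TACGGCATT"
print(transcribe(template))  # AUGCCGUAA
```

Writing the template 3′→5′ avoids an explicit reversal step: the polymerase reads the template in that direction while extending the RNA 5′→3′.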

Exceptions to the Central Dogma

Reverse transcription: Retroviruses (e.g., HIV) use reverse transcriptase to synthesize DNA from an RNA template (RNA → DNA)
RNA replication: RNA viruses (e.g., influenza, SARS-CoV-2) use RNA-dependent RNA polymerase to copy RNA directly (RNA → RNA)
Prions: Protein → Protein transmission of conformational information (no nucleic acid template)

The central dogma remains the foundational framework for understanding gene expression, even as exceptions continue to enrich our understanding of information flow in biology. Note that information transfer from protein back to nucleic acid (Protein → DNA or Protein → RNA) has never been observed.

Prokaryotic Transcription

In prokaryotes, a single RNA polymerase catalyzes the synthesis of all types of RNA (mRNA, rRNA, tRNA). The holoenzyme consists of a core enzyme plus a sigma factor:

RNA Polymerase Subunit Composition

Subunit	Copies	Function
α	2	Assembly, regulatory factor binding, UP element recognition
β	1	Catalytic center (NTP binding, phosphodiester bond formation)
β′	1	DNA template binding
σ	1	Promoter recognition and binding (released after initiation)

Core enzyme: $\alpha_2\beta\beta'$ | Holoenzyme: $\alpha_2\beta\beta'\sigma$ (total MW ~449 kDa)

Promoter Recognition

The sigma factor ($\sigma^{70}$ is the primary sigma in E. coli) recognizes two conserved promoter elements:

-10 region (Pribnow box): Consensus sequence TATAAT (AT-rich, facilitates strand separation)
-35 region: Consensus sequence TTGACA (initial contact point for sigma)

Spacing between the -35 and -10 elements is critical: optimally 17 bp. Transcription initiates at the +1 position. The transcription bubble spans approximately 17 bp of unwound DNA.
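As a rough sketch of how the two consensus elements and their spacing constrain a promoter, the following Python scans a sequence for a -35 box followed by a -10 box at an allowed distance. The mismatch tolerance, the 16-18 bp spacer window, and the function name `find_promoters` are assumptions made for illustration; real promoter prediction uses position weight matrices rather than exact consensus matching.

```python
MINUS_35 = "TTGACA"  # -35 consensus from the text
MINUS_10 = "TATAAT"  # -10 (Pribnow box) consensus from the text


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_promoters(seq, max_mismatch=1, spacers=(16, 17, 18)):
    """Return (start_of_-35, spacer_length, start_of_-10) for each candidate.

    max_mismatch and the spacer window are illustrative assumptions.
    Coordinates are 0-based offsets into seq (coding-strand sequence, 5'->3').
    """
    hits = []
    for i in range(len(seq) - 5):
        if mismatches(seq[i:i + 6], MINUS_35) > max_mismatch:
            continue
        for spacer in spacers:
            j = i + 6 + spacer
            box10 = seq[j:j + 6]
            if len(box10) == 6 and mismatches(box10, MINUS_10) <= max_mismatch:
                hits.append((i, spacer, j))
    return hits


# Hypothetical sequence with a perfect consensus promoter and a 17 bp spacer.
demo = "GG" + MINUS_35 + "A" * 17 + MINUS_10 + "GCGCGCA"
print(find_promoters(demo, max_mismatch=0))  # [(2, 17, 25)]
```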

Phases of Transcription

Initiation

Holoenzyme binds promoter → closed complex → open complex (DNA melting at -10 region) → first phosphodiester bonds formed → sigma factor released after ~8-10 nt synthesized → promoter clearance.

Elongation

Core enzyme moves along template strand 3′→5′, synthesizing RNA 5′→3′ at 40-50 nucleotides per second. Proofreading by pyrophosphorolytic editing and hydrolytic editing.

Termination

Rho-independent (intrinsic): GC-rich hairpin followed by poly-U tract destabilizes the RNA-DNA hybrid. Rho-dependent: Rho helicase translocates along mRNA (5′→3′), catches up to paused polymerase, and unwinds the RNA-DNA hybrid using ATP hydrolysis.

Energetics of RNA Synthesis

Each nucleotide addition is driven by NTP hydrolysis. The overall reaction for incorporating one nucleotide:

$$\text{NTP} \rightarrow \text{NMP}_{\text{(RNA)}} + \text{PP}_i, \quad \Delta G^{\circ'} \approx -33.5 \text{ kJ/mol (with PP}_i \text{ hydrolysis)}$$

The subsequent hydrolysis of pyrophosphate ($\text{PP}_i$) by inorganic pyrophosphatase renders the reaction essentially irreversible, driving transcription forward.

Eukaryotic Transcription

Eukaryotes use three distinct RNA polymerases, each responsible for different classes of RNA:

Polymerase	RNA Product	Sensitivity to α-amanitin
RNA Pol I	rRNA (28S, 18S, 5.8S)	Insensitive
RNA Pol II	mRNA, most snRNAs, miRNAs	Very sensitive
RNA Pol III	tRNA, 5S rRNA, U6 snRNA	Moderately sensitive

General Transcription Factors for RNA Pol II

Unlike prokaryotic RNA polymerase, eukaryotic RNA Pol II cannot bind promoters directly. It requires assembly of a pre-initiation complex (PIC) composed of general transcription factors (GTFs):

TFIID (TBP + TAFs): TBP (TATA-binding protein) recognizes and binds the TATA box (~25 bp upstream of +1); TAFs (TBP-associated factors) recognize other promoter elements
TFIIB: Bridges TFIID and RNA Pol II; determines start site selection
TFIIF: Escorts RNA Pol II to the promoter; stabilizes Pol II-TFIIB interaction
TFIIE: Recruits TFIIH; modulates TFIIH helicase activity
TFIIH: Contains helicase (XPB, XPD subunits) for promoter melting and kinase (CDK7) for phosphorylation of the Pol II C-terminal domain (CTD)

The Mediator Complex

The Mediator is a large multi-subunit complex (~30 subunits in humans) that serves as a bridge between gene-specific transcription factors (activators/repressors) bound at enhancers and the PIC at the promoter. It integrates regulatory signals and modulates the rate of transcription initiation. The CTD of RNA Pol II cycles through phosphorylation states (Ser5-P during initiation by TFIIH/CDK7; Ser2-P during elongation by P-TEFb/CDK9) to coordinate mRNA processing events.

Post-Transcriptional Processing

Eukaryotic pre-mRNA undergoes three major processing events before export from the nucleus. These modifications occur co-transcriptionally, coordinated by the phosphorylated CTD of RNA Pol II:

5′ Capping

A 7-methylguanosine (m7G) cap is added to the 5′ end via an unusual 5′-5′ triphosphate bridge. This occurs after the first ~25-30 nucleotides are synthesized.

Functions: Protection from 5′ exonucleases, ribosome recognition during translation initiation (eIF4E binding), splicing of the first intron, nuclear export.

3′ Polyadenylation

The polyadenylation signal AAUAAA (in the pre-mRNA) is recognized by CPSF (cleavage and polyadenylation specificity factor). The pre-mRNA is cleaved ~10-30 nt downstream of this signal, and poly(A) polymerase (PAP) adds a tail of ~200 adenosine residues (no template required).

Functions: mRNA stability (bound by PABP), nuclear export, translation efficiency. Poly(A) tail length decreases over time (deadenylation is a key step in mRNA decay).
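The relationship between the AAUAAA signal and the cleavage site can be illustrated with a short Python sketch that reports where cleavage would be expected for each signal found. Measuring the 10-30 nt window from the end of the hexamer is an assumption, as is the function name `cleavage_windows`; real site choice also depends on downstream GU-rich elements that this sketch ignores.

```python
def cleavage_windows(pre_mrna, signal="AAUAAA", lo=10, hi=30):
    """Return (start, end) 0-based half-open windows, one per signal
    occurrence, covering positions lo..hi nt downstream of the signal.

    Assumption: distances are counted from the first base after the hexamer.
    Windows are clipped to the end of the transcript.
    """
    windows = []
    i = pre_mrna.find(signal)
    while i != -1:
        signal_end = i + len(signal)
        start = min(signal_end + lo, len(pre_mrna))
        end = min(signal_end + hi, len(pre_mrna))
        windows.append((start, end))
        i = pre_mrna.find(signal, i + 1)
    return windows


# Hypothetical 3' region: signal at offset 2, followed by 40 nt of filler.
demo = "GG" + "AAUAAA" + "C" * 40
print(cleavage_windows(demo))  # [(18, 38)]
```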

RNA Splicing

Introns (intervening sequences) are removed and exons (expressed sequences) are joined by the spliceosome, a large ribonucleoprotein complex composed of five snRNPs:

U1 snRNP, U2 snRNP, U4 snRNP, U5 snRNP, U6 snRNP

Splicing proceeds via two transesterification reactions, producing a lariat intermediate. Key splice site signals: 5′ splice site (GU), branch point (A), and 3′ splice site (AG). The spliceosome is a ribozyme — catalysis is performed by the RNA components (U2 and U6 snRNAs).
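The net result of splicing (introns out, exons joined) can be sketched in Python, with a check that each intron starts with GU and ends with AG as described above. The intron coordinates are supplied by the caller; the function name `splice` and the example sequences are hypothetical, and the sketch does not model branch-point recognition or the lariat intermediate.

```python
def splice(pre_mrna, introns):
    """Remove introns and join the flanking exons.

    introns: list of (start, end) 0-based half-open coordinates.
    Raises ValueError if an intron lacks the GU...AG boundary dinucleotides.
    """
    mature = []
    cursor = 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron {start}-{end} lacks GU...AG boundaries")
        mature.append(pre_mrna[cursor:start])  # exon preceding this intron
        cursor = end
    mature.append(pre_mrna[cursor:])  # final exon
    return "".join(mature)


# Hypothetical two-exon transcript.
exon1, intron, exon2 = "AUGGCC", "GUAAGUCUAACAG", "UUUUAA"
pre = exon1 + intron + exon2
coords = [(len(exon1), len(exon1) + len(intron))]
print(splice(pre, coords))  # AUGGCCUUUUAA
```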

Alternative splicing allows a single gene to produce multiple mRNA variants (and thus multiple protein isoforms), dramatically increasing proteome diversity. In humans, >95% of multi-exon genes undergo alternative splicing.

Self-Splicing Introns

Group I introns: Use an external guanosine nucleophile as a cofactor. Found in rRNA genes of Tetrahymena, fungal mitochondria.
Group II introns: Use an internal branch-point adenosine to form a lariat (mechanistically similar to spliceosomal splicing). Found in organellar genomes. Group II introns are likely the evolutionary ancestors of spliceosomal introns.

The Genetic Code

The genetic code is the set of rules by which the nucleotide sequence of mRNA is translated into the amino acid sequence of a protein. It was deciphered in the early 1960s through the work of Nirenberg, Matthaei, Khorana, and Holley.

$$4^3 = 64 \text{ codons for 20 amino acids + stop signals}$$

Properties of the Genetic Code

Triplet: Three nucleotides (a codon) specify one amino acid
Non-overlapping: Codons are read sequentially without sharing nucleotides
Comma-free: No gaps or punctuation between codons
Degenerate: Most amino acids are encoded by 2-6 different codons (61 sense codons for 20 amino acids)
Unambiguous: Each codon specifies only one amino acid
Nearly universal: The same code is used by virtually all organisms (exceptions: mitochondria, some ciliates like Tetrahymena, Mycoplasma)
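The counts quoted above (64 codons, 61 sense codons, 20 amino acids, up to 6 codons per amino acid) can be checked by building the standard codon table in Python. The compact string below lists one-letter amino acid codes in the conventional U, C, A, G ordering, with `*` marking stop codons; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, in UCAG order at each position.
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# Number of codons per amino acid (stop codons excluded).
degeneracy = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

print(len(CODON_TABLE))          # 64
print(sum(degeneracy.values()))  # 61
print(len(degeneracy))           # 20
print(degeneracy["L"], degeneracy["M"], degeneracy["W"])  # 6 1 1
```

Methionine and tryptophan are the two amino acids with a single codon, which is why the text says "most" rather than "all" amino acids are degenerate.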

Start and Stop Codons

Start codon: AUG (encodes methionine; fMet in prokaryotes). Sets the reading frame.
Stop codons: UAA (ochre), UAG (amber), UGA (opal/umber). Recognized by release factors, not tRNAs.
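How the start codon sets the reading frame and a stop codon ends it can be sketched as a minimal translator. This is a simplification that begins at the first AUG it finds and ignores initiation context (Shine-Dalgarno or Kozak sequences); the function name `translate` and the example mRNA are hypothetical.

```python
from itertools import product

# Standard genetic code in UCAG order; '*' marks stop codons.
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product("UCAG", repeat=3), AMINO)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon.

    Returns one-letter amino acid codes; returns '' if there is no AUG.
    Translation also ends if the mRNA runs out before a stop codon.
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # UAA, UAG, or UGA: no tRNA, release factor binds
            break
        peptide.append(aa)
    return "".join(peptide)


# Hypothetical mRNA: 5' leader GG, then AUG GCU UGG AAA UAA, then GC.
print(translate("GGAUGGCUUGGAAAUAAGC"))  # MAWK
```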

Wobble Hypothesis

Crick's wobble hypothesis (1966) explains how fewer than 61 tRNA species can decode all 61 sense codons. The first two codon-anticodon base pairs follow strict Watson-Crick rules, but the third position (3′ end of the codon, 5′ end of the anticodon) tolerates non-standard base pairing:

Inosine (I) at the 5′ anticodon position can pair with U, C, or A at the 3′ codon position. G at the 5′ anticodon can pair with U or C. This wobble pairing explains code degeneracy and why the third codon position is the most variable.
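The wobble rules can be made concrete by listing the codons a given anticodon is able to read. The sketch below encodes the I and G rules stated above and adds the remaining rules from Crick's hypothesis (U pairs with A or G; C pairs only with G; A pairs only with U), which the text does not spell out. The function name `codons_read` is hypothetical.

```python
# Strict Watson-Crick pairing for codon positions 1 and 2.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Allowed codon 3rd-position bases for each 5' anticodon (wobble) base.
WOBBLE = {"C": "G", "A": "U", "U": "AG", "G": "CU", "I": "UCA"}


def codons_read(anticodon: str) -> list:
    """Return the codons (5'->3') readable by an anticodon written 5'->3'.

    Pairing is antiparallel: anticodon base 1 (5') pairs with codon base 3,
    anticodon base 2 with codon base 2, anticodon base 3 with codon base 1.
    """
    wobble_base, middle, last = anticodon
    first = WATSON_CRICK[last]
    second = WATSON_CRICK[middle]
    return sorted(first + second + third for third in WOBBLE[wobble_base])


print(codons_read("IGC"))  # ['GCA', 'GCC', 'GCU']
print(codons_read("GAA"))  # ['UUC', 'UUU']
```

In the first example a single inosine-containing anticodon reads three of the four alanine codons; in the second, one anticodon reads both phenylalanine codons.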

Translation: Protein Synthesis

Ribosome Structure

Feature	Prokaryotes	Eukaryotes
Complete ribosome	70S	80S
Small subunit	30S (16S rRNA + 21 proteins)	40S (18S rRNA + 33 proteins)
Large subunit	50S (23S + 5S rRNA + 31 proteins)	60S (28S + 5.8S + 5S rRNA + 49 proteins)
Functional sites	A (aminoacyl), P (peptidyl), E (exit)	A (aminoacyl), P (peptidyl), E (exit)

Aminoacyl-tRNA Synthetases

Before translation, each amino acid must be activated and attached to its cognate tRNA by aminoacyl-tRNA synthetases (one for each amino acid, 20 in total). This two-step reaction consumes 2 ATP equivalents:

$$\text{Amino acid} + \text{tRNA} + \text{ATP} \rightarrow \text{Aminoacyl-tRNA} + \text{AMP} + \text{PP}_i$$

The energy of the aminoacyl ester bond is later used to drive peptide bond formation. Synthetases have proofreading (editing) activity to ensure fidelity, with an error rate of ~1 in 10,000.

Initiation

Prokaryotes

Shine-Dalgarno sequence (5′-AGGAGG-3′) in the 5′ UTR of mRNA base-pairs with the 3′ end of 16S rRNA, positioning the AUG start codon at the P site. Initiator tRNA: $\text{fMet-tRNA}^{\text{fMet}}$. Initiation factors: IF1, IF2 (GTPase), IF3.

Eukaryotes

Kozak sequence (5′-ACCAUGG-3′) surrounds the start codon. 40S subunit binds the 5′ cap (via eIF4E/eIF4G) and scans 5′→3′ until it finds the first AUG in a good Kozak context. Initiator: $\text{Met-tRNA}_i^{\text{Met}}$. Requires ~12 eIFs.
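The scanning model for eukaryotic initiation can be sketched as a left-to-right search for the first AUG in a favorable context. Here "good Kozak context" is taken to mean a purine at -3 and a G at +4 (as in ACCAUGG, with the A of AUG as +1); requiring both positions is a deliberately strict assumption, and the function name `scan_for_start` is hypothetical.

```python
def scan_for_start(mrna: str):
    """Return the 0-based index of the first AUG with a purine at -3 and
    G at +4, scanning 5'->3'. Returns None if no such AUG exists.

    Assumption: both flanking positions must match; weaker contexts that
    can still support initiation in cells are skipped here.
    """
    for i in range(len(mrna) - 2):
        if mrna[i:i + 3] != "AUG":
            continue
        purine_at_minus3 = i >= 3 and mrna[i - 3] in "AG"
        g_at_plus4 = i + 3 < len(mrna) and mrna[i + 3] == "G"
        if purine_at_minus3 and g_at_plus4:
            return i
    return None


# Hypothetical 5' region: the first AUG (index 5) has U at -3 and C at +4,
# so scanning continues to the AUG at index 16 inside ACCAUGG.
print(scan_for_start("GCUUUAUGCCGCCACCAUGGCG"))  # 16
```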

Elongation Cycle

The elongation cycle consists of three steps that repeat for each amino acid added:

1. Aminoacyl-tRNA delivery: EF-Tu•GTP delivers aminoacyl-tRNA to the A site. Correct codon-anticodon match triggers GTP hydrolysis, EF-Tu•GDP release, and accommodation of the tRNA. EF-Ts recycles EF-Tu.

2. Peptide bond formation: Catalyzed by the peptidyl transferase center in the 23S rRNA (large subunit). This makes the ribosome a ribozyme. The growing peptide chain is transferred from the P-site tRNA to the amino acid on the A-site tRNA.

3. Translocation: EF-G•GTP drives movement of the ribosome one codon along the mRNA (5′→3′). A-site tRNA moves to P site, P-site tRNA to E site, E-site tRNA is ejected. Costs 1 GTP.

Energy Cost of Translation

Translation is energetically expensive. The total cost per amino acid incorporated:

$$\underbrace{2 \text{ ATP}}_{\text{aminoacyl-tRNA charging}} + \underbrace{1 \text{ GTP}}_{\text{EF-Tu delivery}} + \underbrace{1 \text{ GTP}}_{\text{EF-G translocation}} = 4 \text{ high-energy phosphate bonds per residue}$$

For a 300-residue protein: ~1,200 high-energy phosphate bonds. This does not include the cost of ribosome assembly, mRNA synthesis, or initiation/termination factors.
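The per-residue bookkeeping above is simple enough to express as a small Python function, which makes it easy to repeat the estimate for proteins of other lengths. The function name and parameter names are illustrative; as in the text, the estimate covers charging and elongation only.

```python
def translation_cost(n_residues, charging=2, delivery=1, translocation=1):
    """Estimate high-energy phosphate bonds spent to polymerize n residues.

    charging:      ATP equivalents per aminoacyl-tRNA (ATP -> AMP + PPi)
    delivery:      GTP hydrolyzed by EF-Tu per residue
    translocation: GTP hydrolyzed by EF-G per residue
    Initiation, termination, and proofreading costs are not included.
    """
    return n_residues * (charging + delivery + translocation)


print(translation_cost(300))  # 1200
```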

Post-Translational Modifications & Protein Targeting

Newly synthesized polypeptides must fold correctly and often undergo covalent modifications to become functional proteins. Many proteins are also targeted to specific subcellular compartments.

Signal Peptides and the Secretory Pathway

Proteins destined for the ER, Golgi, plasma membrane, lysosomes, or secretion contain an N-terminal signal peptide (~16-30 residues with a hydrophobic core). The signal recognition particle (SRP) binds the signal peptide as it emerges from the ribosome, pauses translation, and targets the ribosome-nascent chain complex to the ER membrane via the SRP receptor. Translation then resumes, with the polypeptide threaded through the Sec61 translocon into the ER lumen.

The Secretory Pathway:

ER (folding, N-glycosylation, disulfide bonds) → ERGIC → cis-Golgi → medial-Golgi (glycan processing) → trans-Golgi (sorting) → Plasma membrane / Secretory vesicles / Lysosomes

Glycosylation

N-linked Glycosylation

Occurs in the ER. Oligosaccharyltransferase (OST) transfers a preassembled 14-sugar oligosaccharide (Glc₃Man₉GlcNAc₂) from dolichol pyrophosphate to the amide nitrogen of Asn in the sequon Asn-X-Ser/Thr (where X is any amino acid except Pro).
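The Asn-X-Ser/Thr sequon rule translates directly into a regular expression, which gives a quick way to list candidate N-glycosylation sites in a protein sequence. A matching sequon is necessary but not sufficient for glycosylation in vivo; the function name `find_sequons` and the example sequence are hypothetical.

```python
import re

# N, then any residue except P, then S or T. The lookahead lets
# overlapping sequons (e.g. N-N-T-S) each be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein: str) -> list:
    """Return 1-based positions of Asn residues in N-X-S/T sequons (X != Pro)."""
    return [match.start() + 1 for match in SEQUON.finditer(protein)]


# Hypothetical sequence: NGS at position 2 matches, NPT at 6 is excluded
# because X is proline, and NNT / NTS at positions 10 and 11 overlap.
print(find_sequons("MNGSANPTLNNTS"))  # [2, 10, 11]
```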

O-linked Glycosylation

Occurs primarily in the Golgi. Sugars (commonly GalNAc) are added one at a time to the hydroxyl oxygen of Ser or Thr residues. No consensus sequence. Important in mucins and extracellular matrix proteins.

Proteolytic Processing

Many proteins are synthesized as inactive precursors (zymogens/proenzymes) that require proteolytic cleavage for activation. Examples: insulin is synthesized as preproinsulin (signal peptide removal → proinsulin → C-peptide excision → mature insulin with A and B chains linked by disulfide bonds); digestive enzymes (trypsinogen → trypsin; chymotrypsinogen → chymotrypsin); blood clotting factors (coagulation cascade).

Protein Degradation: The Ubiquitin-Proteasome Pathway

Damaged, misfolded, or regulatory proteins are tagged for destruction by the covalent attachment of ubiquitin (a 76-amino acid protein). The process requires E1 (activating), E2 (conjugating), and E3 (ligase) enzymes, consuming 1 ATP per ubiquitin attached. Polyubiquitinated proteins (chains of $\geq 4$ ubiquitins linked via Lys48) are recognized and degraded by the 26S proteasome (a barrel-shaped protease complex) into small peptides, with ubiquitin recycled.

Key Concepts Summary

Central Dogma: DNA → RNA → Protein. Exceptions include reverse transcription (retroviruses) and RNA replication (RNA viruses).

Prokaryotic Transcription: Single RNA polymerase ($\alpha_2\beta\beta'\sigma$) recognizes -10 (TATAAT) and -35 (TTGACA) promoter elements via sigma factor. Elongation at 40-50 nt/s. Termination: Rho-dependent or intrinsic.

Eukaryotic Transcription: Three RNA polymerases (I, II, III). Pol II requires GTFs (TFIID, B, F, E, H) and Mediator for initiation at TATA box promoters.

mRNA Processing: 5′ m7G cap, 3′ poly(A) tail (~200 A's), and splicing (spliceosome removes introns, joins exons). Alternative splicing increases proteome diversity.

Genetic Code: 64 codons (61 sense + 3 stop). Degenerate, unambiguous, nearly universal. Wobble at 3rd position reduces the number of tRNAs needed.

Translation: 70S (prokaryotes) / 80S (eukaryotes) ribosomes with A, P, E sites. Cost: 4 high-energy phosphate bonds per amino acid. Peptidyl transferase is a ribozyme (23S rRNA).

Post-Translational: Signal peptides direct proteins to ER. N- and O-linked glycosylation. Proteolytic activation (zymogens). Ubiquitin-proteasome degradation pathway.
