Part 3: DNA Replication
Copying the Genome
DNA replication is the molecular mechanism by which a cell duplicates its entire genome prior to division. In E. coli, the 4.6 Mb circular chromosome is replicated in approximately 40 minutes at 37 °C, requiring the coordinated action of more than 30 different proteins at each replication fork. Human cells face an even greater challenge: replicating 6.4 billion base pairs during each S phase, accomplished by firing tens of thousands of replication origins in a precisely regulated temporal program.
The fundamental chemistry is deceptively simple: a nucleophilic attack by the 3′-OH of the growing chain on the α-phosphorus of the incoming dNTP, releasing pyrophosphate. Yet the machinery that ensures this reaction occurs with extraordinary speed, accuracy, and coordination represents one of the most sophisticated molecular systems in biology.
1. Semi-Conservative Replication
Three Competing Models (1953–1958)
After Watson and Crick proposed the double-helix structure in 1953, three models for how DNA might replicate were debated:
Conservative
The parental duplex remains intact; an entirely new duplex is synthesized. After one round, one molecule is entirely “old” and one is entirely “new.”
Semi-Conservative
Each strand serves as a template. After one round, both daughter molecules contain one old strand and one new strand. This is the correct model.
Dispersive
Old and new DNA are interspersed throughout both strands of both daughter molecules. Blocks of old and new DNA alternate.
The Meselson–Stahl Experiment (1958)
Matthew Meselson and Franklin Stahl designed what is often called “the most beautiful experiment in biology.” Their approach exploited the density difference between DNA containing the heavy nitrogen isotope 15N and the normal light isotope 14N.
Step 1: Label with 15N
E. coli cells were grown for many generations in medium containing 15NH4Cl as the sole nitrogen source. After ≥14 doublings, virtually all nitrogen in the DNA was 15N, making the DNA “heavy” (density ≈ 1.724 g/cm³ vs. 1.710 g/cm³ for 14N-DNA).
Step 2: Transfer to 14N Medium
Cells were transferred to medium with 14NH4Cl. All newly synthesized DNA would now incorporate only 14N (“light”).
Step 3: CsCl Equilibrium Density Gradient Centrifugation
DNA was extracted at each generation and centrifuged in a CsCl gradient at ~140,000 g for 20+ hours. DNA migrates to its isopycnic position, the point where the buoyant density of the CsCl solution equals the density of the DNA. UV absorption photography at 260 nm revealed distinct bands.
Results
- Generation 0: Single heavy band (HH); all DNA is 15N/15N.
- Generation 1: Single band at intermediate density (HL); each molecule has one 15N strand and one 14N strand. This rules out conservative replication (which would show one heavy + one light band).
- Generation 2: Two bands, 50% intermediate (HL) and 50% light (LL). This rules out dispersive replication (which would show a single band shifting progressively toward light density).
- Generation n (n ≥ 3): The number of HL molecules remains constant (2 out of \(2^n\) molecules), so the HL fraction falls while the LL fraction increases. The ratio of HL:LL = \(1:(2^{n-1} - 1)\).
Quantitative prediction: After n generations of growth in 14N medium (n ≥ 1), there will be \(2^n\) total DNA molecules per starting duplex. Exactly 2 molecules will be hybrid (HL), one carrying each original parental strand, and \(2^n - 2\) will be fully light (LL). The fraction of hybrid DNA = \(\frac{2}{2^n} = 2^{1-n}\), which approaches zero as n increases.
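The quantitative prediction above can be tabulated with a few lines of Python. This is a minimal sketch that assumes strictly semi-conservative replication of a single fully heavy duplex; the function name `band_fractions` is hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def band_fractions(n: int) -> dict[str, Fraction]:
    """Fractions of HH, HL and LL duplexes after n generations in 14N medium.

    Assumes one starting HH duplex and strictly semi-conservative replication.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        # Before transfer, every duplex is fully heavy.
        return {"HH": Fraction(1), "HL": Fraction(0), "LL": Fraction(0)}
    total = 2 ** n   # total duplexes descended from the starting duplex
    hybrid = 2       # the two parental strands persist, one per hybrid duplex
    return {
        "HH": Fraction(0),
        "HL": Fraction(hybrid, total),
        "LL": Fraction(total - hybrid, total),
    }


for generation in range(5):
    print(generation, band_fractions(generation))
```

Generations 1 and 2 reproduce the single hybrid band and the 50:50 HL:LL pattern described in the results, and later generations follow the \(2^{1-n}\) hybrid fraction.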
2. Replication Fork Architecture
The Asymmetry Problem
All known DNA polymerases synthesize in the 5′→3′ direction only, yet the two template strands at a replication fork run antiparallel. This creates a fundamental asymmetry: one strand (the leading strand) can be synthesized continuously in the direction of fork movement, while the other (the lagging strand) must be synthesized discontinuously as a series of short fragments in the direction opposite to fork movement.
Replication Fork Diagram:

Fork movement ---------------------------->

3'----------------------------------------5'  (leading-strand template)
5'------------------------> 3'                Leading strand: Pol III (continuous, 5'->3')
                            [Helicase] -->
     <-----  <-----  <-----  <-----           Lagging strand (discontinuous, each fragment 5'->3')
      OF1     OF2     OF3     OF4             (Okazaki fragments; OF1 oldest, OF4 newest)
5'----------------------------------------3'  (lagging-strand template)
Leading Strand
- Synthesized 5′→3′ continuously
- Same direction as helicase/fork movement
- Requires only a single RNA primer at the origin
- Synthesized by Pol III (prokaryotic) or Pol ε (eukaryotic)
- High processivity due to sliding clamp association
- Rate: ~1,000 nt/s in E. coli, ~50 nt/s in human cells
Lagging Strand
- Synthesized 5′→3′ but in short fragments
- Opposite direction to fork movement
- Requires repeated priming events (every 1–2 kb in prokaryotes)
- Synthesized by Pol III (prokaryotic) or Pol δ (eukaryotic)
- Must cycle on and off the template (“trombone model”)
- Okazaki fragments joined by DNA ligase after primer removal
Okazaki Fragments
Discovered by Reiji and Tsuneko Okazaki in 1968, these short DNA fragments are the transient intermediates of lagging-strand synthesis. Their size differs dramatically between organisms:
- Prokaryotic Okazaki fragments (E. coli): 1,000–2,000 nt
- Eukaryotic Okazaki fragments: 100–200 nt
Each Okazaki fragment begins with a ~10 nt RNA primer synthesized by primase. In E. coli, DNA Pol I removes RNA primers via its 5′→3′ exonuclease activity and fills the gap with DNA. In eukaryotes, RNase H1 and FEN1 (flap endonuclease 1) perform this role. Finally, DNA ligase seals the nick by forming a phosphodiester bond, consuming one ATP (eukaryotes) or NAD+ (prokaryotes).
3. Initiation of Replication
Prokaryotic Initiation: E. coli oriC
E. coli has a single, well-defined origin of replication called oriC, spanning 245 bp. This compact sequence contains all the information needed to direct replication initiation:
DnaA Boxes (5 copies, 9-mer consensus: TTATCCACA)
DnaA protein (52 kDa) binds cooperatively to these sequences in its ATP-bound form (DnaA-ATP). About 20–30 DnaA monomers assemble into a right-handed helical filament that wraps the origin DNA, introducing positive superhelical strain. This destabilizes the adjacent AT-rich region. DnaA-ADP is inactive; the conversion of DnaA-ATP to DnaA-ADP after initiation (RIDA: regulatory inactivation of DnaA) helps ensure once-per-cell-cycle firing.
AT-Rich Region (3 copies of 13-mer: GATCTNTTNTTTT)
The three 13-mer repeats are rich in A-T base pairs, which have only two hydrogen bonds (vs. three for G-C). DnaA-induced unwinding begins here, creating a single-stranded “open complex” or “DnaA bubble” spanning ~28 bp.
Helicase Loading
DnaC (helicase loader) delivers DnaB (hexameric helicase) to the single-stranded region. Two DnaB hexamers are loaded, one on each strand, oriented to travel in opposite directions (5′→3′ along the strand they encircle). DnaC then dissociates upon ATP hydrolysis. DnaB translocates and recruits DnaG primase, forming the primosome.
Regulation: Ensuring Once-Per-Cell-Cycle Firing
- SeqA sequestration: Newly replicated GATC sites in oriC are hemimethylated. SeqA protein binds hemimethylated GATC, blocking DnaA access for ~1/3 of the cell cycle until Dam methylase restores full methylation.
- RIDA: The Hda protein stimulates DnaA-ATP hydrolysis to DnaA-ADP on the β-clamp, inactivating DnaA after initiation.
- datA titration: The datA locus (near oriC) contains many DnaA-binding sites that sequester DnaA-ATP, reducing its effective concentration.
Eukaryotic Initiation: Origin Licensing and Firing
Eukaryotic chromosomes are much larger and linear, requiring thousands of replication origins to complete S phase in a reasonable time. Human cells fire ~30,000–50,000 origins per S phase. The system is controlled by a two-step mechanism that separates licensing (G1 phase) from firing (S phase), ensuring each origin fires at most once per cell cycle.
Step 1: Origin Licensing (G1 Phase) - Pre-RC Assembly
The Origin Recognition Complex (ORC), a six-subunit AAA+ ATPase (Orc1–6), binds replication origins throughout the cell cycle. In budding yeast, origins are defined by the 11-bp ARS Consensus Sequence (ACS); in metazoans, origin specification is more complex and may involve chromatin context rather than strict sequence specificity.
During G1, two licensing factors, Cdc6 (AAA+ ATPase) and Cdt1, are recruited to ORC-bound origins. Together, they load the MCM2-7 hexameric helicase as a head-to-head double hexamer encircling double-stranded DNA. This “pre-replicative complex” (pre-RC) licenses the origin for firing. The MCM double hexamer is catalytically inactive at this stage.
Step 2: Origin Firing (S Phase) - CMG Helicase Activation
S-phase CDK (Cyclin A/E–Cdk2) and DDK (Dbf4-dependent kinase Cdc7) phosphorylate MCM subunits and other factors. This triggers recruitment of Cdc45 and the GINS complex (Sld5, Psf1, Psf2, Psf3) to form the active CMG helicase (Cdc45–MCM–GINS).
The double hexamer splits into two single CMG helicases that travel in opposite directions, each encircling the leading-strand template (3′→5′ translocation). DNA Pol α-primase synthesizes the first RNA-DNA primer, Pol ε takes over the leading strand, and Pol δ synthesizes the lagging strand.
Preventing Re-Replication
- Geminin binds and inhibits Cdt1 during S, G2, and M phases.
- S-CDK phosphorylates Orc1, Cdc6, and Cdt1, targeting them for ubiquitin-mediated degradation (SCF(Skp2)) or nuclear export.
- CDK activity must drop (at mitotic exit) before new pre-RCs can assemble, creating a strict temporal separation between licensing and firing.
4. Elongation Enzymes
DNA Polymerases
E. coli DNA Polymerase III Holoenzyme
The Pol III holoenzyme is a 900 kDa assembly and the principal replicative polymerase in E. coli. It consists of 10 different subunits organized into three functional modules:
αεθ Core
- α subunit (dnaE): 5′→3′ polymerase activity. Belongs to the C-family of DNA polymerases. Catalyzes phosphodiester bond formation.
- ε subunit (dnaQ): 3′→5′ proofreading exonuclease. Removes misincorporated nucleotides, improving fidelity ~100-fold.
- θ subunit (holE): Stabilizes ε, enhancing its exonuclease activity ~2–3-fold.
β Sliding Clamp
- Ring-shaped homodimer (2 × 40.6 kDa)
- Encircles DNA; inner diameter ~35 Å, outer ~80 Å
- Slides freely along duplex DNA
- Tethers Pol III core to template, increasing processivity from ~10 nt to >50,000 nt
- Each monomer has 3 domains with identical topology despite no sequence similarity
γ/τ Clamp Loader
- τ3δδ′χψ complex (or γ3δδ′χψ)
- AAA+ ATPase machine that opens the β clamp ring and loads it onto primer-template junctions
- τ subunit also binds DnaB helicase, physically coupling polymerase to helicase
- ATP binding opens clamp; ATP hydrolysis closes clamp on DNA and ejects the loader
Eukaryotic Replicative Polymerases
Pol α–Primase Complex
Four-subunit complex (p180, p68, p58, p48). The p48 subunit is the primase that synthesizes an ~8–12 nt RNA primer. Pol α (p180) then extends this with ~20–30 nt of DNA, creating a ~30–40 nt RNA-DNA primer. Pol α lacks 3′→5′ proofreading exonuclease and has low processivity, so it functions only in initiation, not bulk synthesis.
Pol ε (Leading Strand)
Four-subunit B-family polymerase (Pol2, Dpb2, Dpb3, Dpb4). The catalytic subunit Pol2 (256 kDa) has both polymerase and 3′→5′ exonuclease domains. Extremely high fidelity. Travels with the CMG helicase at the front of the replisome. Processivity is enhanced by the PCNA sliding clamp.
Pol δ (Lagging Strand)
Three-subunit B-family polymerase (Pol3, Pol31, Pol32). Has 3′→5′ proofreading exonuclease. Extends Okazaki fragments after Pol α-primase. Also participates in mismatch repair, base excision repair, and nucleotide excision repair. Interacts with PCNA via a PIP-box motif.
PCNA Sliding Clamp & RFC Clamp Loader
PCNA (Proliferating Cell Nuclear Antigen) is a homotrimer forming a ring structurally analogous to the bacterial β clamp. Each monomer has two domains, so the trimer has pseudo-6-fold symmetry matching the β clamp dimer. PCNA is a platform for many DNA-processing enzymes (polymerases, ligase, FEN1, mismatch repair factors). RFC (Replication Factor C) is the pentameric AAA+ clamp loader (RFC1–5) that loads PCNA onto primer-template junctions in an ATP-dependent manner.
Replicative Helicases
DnaB (E. coli)
- Homohexameric ring (6 × 52 kDa)
- Translocates 5′→3′ on the lagging-strand template
- Encircles the lagging-strand template strand
- Powered by ATP hydrolysis (~100–1000 nt/s unwinding)
- Each subunit cycles through ATP, ADP, and empty states (“rotary” or “hand-over-hand” mechanism)
- Recruits DnaG primase to form the primosome
MCM2-7 / CMG (Eukaryotic)
- Heterohexameric ring of MCM2, 3, 4, 5, 6, 7 (each ~100 kDa)
- Translocates 3′→5′ on the leading-strand template
- Encircles the leading-strand template strand (opposite polarity to DnaB)
- Active as CMG complex: Cdc45–MCM2-7–GINS
- Powered by ATP hydrolysis through a conserved AAA+ motor
- Loaded as double hexamer (head-to-head) in G1; activated in S phase
Primase
DNA polymerases cannot initiate synthesis de novo; they require a pre-existing 3′-OH to extend. Primases are specialized RNA polymerases that synthesize short RNA primers (~10 nucleotides) to provide this 3′-OH.
DnaG Primase (E. coli)
65.6 kDa single-subunit enzyme. Synthesizes ~10–12 nt RNA primers. Recruited to the replication fork by direct interaction with DnaB helicase. Transiently active: synthesizes a primer, then dissociates to allow Pol III to begin Okazaki fragment extension.
Pol α–Primase (Eukaryotic)
Four-subunit complex integrating primase and polymerase activities. The p48 primase subunit synthesizes ~8–12 nt RNA, then hands off to the p180 Pol α subunit for ~20–30 nt of DNA extension. This RNA-DNA hybrid primer is used by both Pol ε (leading) and Pol δ (lagging). Associates with the replisome through the Ctf4/AND-1 trimer.
Single-Strand DNA Binding Proteins
Helicase unwinding produces single-stranded DNA (ssDNA) that is vulnerable to nuclease attack, secondary structure formation, and chemical damage. SSB proteins coat ssDNA to prevent these problems.
SSB (E. coli)
Homotetramer (4 × 18.8 kDa). Each subunit has an OB-fold domain that binds ssDNA. Binds cooperatively in two modes: (SSB)35 wraps 35 nt around two subunits (limited cooperativity) and (SSB)65 wraps 65 nt around all four subunits (highly cooperative, forms long filaments). Destabilizes hairpins and secondary structures.
RPA (Eukaryotic)
Heterotrimeric complex: RPA70 (70 kDa), RPA32 (32 kDa), RPA14 (14 kDa). Contains six OB-fold domains total (four in RPA70, one in RPA32, one in RPA14). Binds ssDNA with very high affinity (Kd ~\(10^{-10}\) M). Also plays essential roles in DNA repair, recombination, and checkpoint signaling. Phosphorylation of RPA32 by ATR/ATM kinases modulates its interactions during the DNA damage response.
5. Termination of Replication
Prokaryotic Termination: ter Sites and Tus
In E. coli, replication terminates in a broad region opposite oriC. This region contains 10 ter sites (TerA–TerJ), each a 23-bp sequence that binds the Tus protein (terminus utilization substance, 36 kDa monomer).
The Tus–ter complex acts as a polar trap: it allows a replication fork approaching from one direction to pass through but blocks a fork arriving from the opposite direction. This is a “mousetrap” mechanism: strand separation by the approaching helicase (DnaB) frees a critical cytosine in the ter sequence, which then locks into a pocket on Tus, creating a nearly irreversible block. The two sets of ter sites are oriented to create a “fork trap” that ensures forks converge within this region regardless of which fork arrives first.
When forks meet, the remaining gap is filled, ligated, and the interlinked (catenated) daughter chromosomes are separated by Topoisomerase IV, a type II topoisomerase that passes one duplex through another.
Eukaryotic Termination
Eukaryotes lack defined termination sequences. Instead, converging forks simply meet and merge. The process involves:
- Fork convergence: When two CMG helicases from adjacent replicons approach each other, the remaining unreplicated DNA between them shrinks until the forks meet.
- CMG unloading: Upon meeting, CMG helicases transition onto dsDNA. The SCF(Dia2) ubiquitin ligase (in yeast) or CRL2(Lrr1) (in metazoans) ubiquitylates MCM7, triggering p97/VCP/Cdc48 segregase to extract CMG from chromatin.
- Gap filling and ligation: Remaining single-strand gaps are filled by Pol δ and sealed by DNA ligase I.
- Decatenation: Topoisomerase IIα (Topo II) resolves any catenanes (interlocking daughter duplexes) by passing one double helix through a transient double-strand break in the other. This is essential before chromosome segregation in mitosis.
6. Telomere Replication and the End Problem
The End-Replication Problem
Linear chromosomes present a unique challenge: when the RNA primer at the very 5′ end of each lagging strand is removed, there is no upstream 3′-OH available for a polymerase to fill the resulting gap. Consequently, with each round of replication, the daughter chromosome produced by lagging-strand synthesis is shorter than the parent, by approximately 50–200 bp per division in human cells. This was first recognized by James Watson (1972) and independently by Alexei Olovnikov (1971).
Parent:    5'--------------------------------------3'
           3'--------------------------------------5'
After replication:
Leading:   5'--------------------------------------3'  (complete)
           3'--------------------------------------5'  (new, complete)
Lagging:       5'----------------------------------3'  (new, SHORTER)
           3'--------------------------------------5'  (template, full)
           ^^^^
           Gap from primer removal
Telomerase
Telomerase is a specialized ribonucleoprotein reverse transcriptase that extends telomeric DNA to counteract the end-replication problem. It was discovered by Carol Greider and Elizabeth Blackburn in 1985 in Tetrahymena (Nobel Prize, 2009).
TERT (Telomerase Reverse Transcriptase)
The catalytic protein subunit (~127 kDa in humans). Contains a reverse transcriptase domain that synthesizes telomeric DNA using the RNA template. Also contains an N-terminal TEN domain for DNA binding and a C-terminal extension (CTE) for processivity. Repressed in most somatic cells; reactivated in ~85–90% of human cancers.
TERC (Telomerase RNA Component)
451 nt in humans (hTR). Contains an 11-nt template region (5′-CUAACCCUAAC-3′) complementary to the human telomeric repeat (TTAGGG). After extending the 3′ overhang by one repeat, telomerase translocates to realign the template for the next addition cycle. Also contains structural domains for TERT binding, Cajal body localization (CAB box), and H/ACA motif for stability.
Mechanism: Telomerase binds the 3′ single-stranded overhang of the telomere (human: 50–300 nt G-rich overhang). Using its RNA template, it extends the G-rich strand by adding TTAGGG repeats. After extension, conventional lagging-strand machinery (Pol α-primase, Pol δ) fills in the complementary C-rich strand, followed by primer removal and ligation. The net result is telomere length maintenance.
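The relationship between the RNA template and the telomeric repeat can be checked directly. The sketch below reverse-complements the 11-nt template quoted above into DNA; it is a simplification that copies the whole template in one pass, whereas the enzyme copies only part of it per cycle and uses the remainder for realignment. The helper name `dna_copy_of_rna_template` is hypothetical.

```python
# Watson-Crick pairing from an RNA template base to the DNA base incorporated.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def dna_copy_of_rna_template(rna_5_to_3: str) -> str:
    """Return the DNA strand (written 5'->3') made by copying an RNA template.

    The template is read 3'->5', so the input is reversed before each base
    is replaced by its DNA complement.
    """
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(rna_5_to_3))


template = "CUAACCCUAAC"  # hTR template region, 5'->3', as quoted in the text
product = dna_copy_of_rna_template(template)

print(product)
print("Contains TTAGGG:", "TTAGGG" in product)
```

The copied strand contains a full TTAGGG unit, consistent with the G-rich strand being the one that telomerase extends.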
Shelterin Complex
Telomeres are protected from being recognized as DNA double-strand breaks by a six-protein complex called shelterin:
TRF1
Binds double-stranded TTAGGG repeats. Regulates telomere length (negative regulator). Homodimer.
TRF2
Binds ds-TTAGGG. Essential for t-loop formation and suppression of ATM-dependent DNA damage response. Homodimer.
POT1
Binds single-stranded G-rich overhang. Suppresses ATR signaling. Regulates telomerase access.
TPP1
Bridges POT1 to TIN2. TEL patch on TPP1 recruits telomerase to telomeres. Enhances telomerase processivity.
TIN2
Central hub connecting TRF1, TRF2, and TPP1. Stabilizes the entire shelterin complex on the telomere.
RAP1
Recruited by TRF2. Inhibits NHEJ at telomeres. Involved in transcriptional regulation at subtelomeric regions.
The 3′ G-rich overhang invades the duplex telomeric DNA to form a telomere loop (t-loop), creating a displacement loop (“D-loop”) at the invasion point. This structure, mediated by TRF2, sequesters the chromosome end and prevents activation of DNA damage signaling pathways (ATM, ATR) and inappropriate repair by NHEJ or HR.
Alternative Lengthening of Telomeres (ALT)
Approximately 10–15% of cancers maintain telomeres without telomerase, using a recombination-based mechanism called ALT. Key features:
- Highly heterogeneous telomere lengths (from <3 kb to >50 kb)
- ALT-associated PML bodies (APBs) where recombination occurs
- Depends on break-induced replication (BIR) and homologous recombination
- Often associated with mutations in the ATRX/DAXX chromatin remodeling complex
- Extrachromosomal telomeric circles (C-circles) serve as a biomarker
Hayflick Limit and Cellular Senescence
In 1961, Leonard Hayflick observed that human fibroblasts in culture can divide only ~50–70 times before entering irreversible growth arrest (senescence). This Hayflick limit is directly linked to telomere shortening:
- Somatic cells: ~50–200 bp lost per division
- Critical telomere length (~4–6 kb) triggers the p53/Rb senescence pathway
- If p53/Rb are inactivated, cells bypass senescence but enter “crisis” with massive genomic instability and cell death
- Rare survivors that activate telomerase or ALT become immortalized (cancer)
- Germ cells and stem cells express telomerase (at low levels in most adult stem cells), extending replicative lifespan
7. Replication Fidelity
The overall error rate of DNA replication is remarkably low: approximately \(10^{-10}\) errors per base pair per cell division. This extraordinary accuracy is achieved through three successive layers of quality control, each contributing multiplicatively:
Base Selection (error frequency ~\(10^{-5}\))
The DNA polymerase active site discriminates correct from incorrect dNTPs using geometric complementarity (Watson-Crick geometry), hydrogen bonding, base stacking, and an “induced fit” conformational change. The fingers domain closes around the nascent base pair; incorrect pairs cause steric clashes that slow the forward reaction.
Proofreading by the 3′→5′ Exonuclease (escape fraction ~\(10^{-2}\))
Misincorporation slows polymerization and increases the probability that the primer terminus melts into the 3′→5′ exonuclease active site (~30 Å away in Pol III). The mispaired terminal nucleotide is excised, and the correct nucleotide is then incorporated. Improves fidelity ~100-fold.
Mismatch Repair, MMR (escape fraction ~\(10^{-3}\))
Post-replicative scanning by MutS (mismatch recognition), MutL (molecular matchmaker), and MutH (strand discrimination via hemimethylation in E. coli). In eukaryotes, MSH2–MSH6 (MutSα) recognizes mismatches and small loops; MLH1–PMS2 (MutLα) coordinates excision and resynthesis. Strand discrimination uses PCNA orientation and nicks in the nascent strand. Improves fidelity ~1000-fold.
Combined Fidelity Calculation
The three mechanisms act sequentially, so their error rates multiply:
$$10^{-5} \times 10^{-2} \times 10^{-3} = 10^{-10} \text{ errors per bp per division}$$
For the human genome (\(6.4 \times 10^{9}\) bp), this predicts approximately 0.64 mutations per cell division, roughly consistent with measured somatic mutation rates of ~0.5–1.0 per cell division.
8. Mathematical Framework
Replication Fork Kinetics
Fork Progression Rate
The position of a replication fork as a function of time, assuming constant speed:
$$x(t) = x_0 + v_{\text{fork}} \cdot t$$
where \(v_{\text{fork}}\) is the fork velocity (~1000 nt/s in E. coli, ~20–50 nt/s in human cells), and \(x_0\) is the origin position.
Time to Replicate a Genome
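For a circular chromosome of length \(L\) with bidirectional replication from a single origin, each of the two forks copies half the chromosome:
$$T = \frac{L}{2\,v_{\text{fork}}}$$
For E. coli: \(L = 4.6 \times 10^{6}\) bp, \(v = 1000\) bp/s gives \(T = 2300\) s ≈ 38 min.
For eukaryotes with N origins firing simultaneously:
$$T = \frac{L}{2\,N\,v_{\text{fork}}}$$
For human cells: \(L = 6.4 \times 10^{9}\) bp, \(v = 30\) bp/s, \(N \approx 30{,}000\) gives \(T \approx 3600\) s ≈ 1 hour. Actual S phase is 6–8 hours because origins fire asynchronously in a temporal program.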
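The two replication-time estimates can be reproduced with a short calculation. This is a sketch under the same idealization as the text (constant fork speed, all origins firing at once, evenly spaced origins); the function name `replication_time_s` is an illustrative assumption.

```python
def replication_time_s(genome_bp: float, fork_speed_bp_per_s: float,
                       n_origins: int = 1) -> float:
    """Idealized replication time T = L / (2 * N * v).

    Each of the N origins launches two forks, and all origins are assumed
    to fire simultaneously and to be evenly spaced.
    """
    return genome_bp / (2 * n_origins * fork_speed_bp_per_s)


ecoli = replication_time_s(4.6e6, 1000)
human = replication_time_s(6.4e9, 30, n_origins=30_000)

print(f"E. coli: {ecoli:.0f} s ({ecoli / 60:.1f} min)")
print(f"Human (idealized): {human:.0f} s ({human / 3600:.2f} h)")
```

The human figure is a lower bound only; it ignores the staggered origin-firing program that stretches the real S phase to several hours.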
Polymerase Processivity
Processivity (P) is the average number of nucleotides incorporated per binding event:
$$P = \frac{k_{\text{pol}}}{k_{\text{off}}} = \frac{p_{\text{step}}}{1 - p_{\text{step}}}$$
where \(k_{\text{pol}}\) is the polymerization rate, \(k_{\text{off}}\) is the dissociation rate, and \(p_{\text{step}}\) is the probability of stepping forward per catalytic cycle. With the β sliding clamp, \(k_{\text{off}}\) is drastically reduced, yielding P > 50,000 nt.
Derivation: Replication Fork Speed and Processivity
Starting from the Michaelis-Menten framework for DNA polymerase catalysis, we derive the effective fork velocity and processivity.
Step 1: Define the polymerase catalytic cycle
DNA polymerase binds a dNTP substrate (S) and catalyzes incorporation. The enzyme-substrate interaction follows:
$$E + S \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} ES \xrightarrow{k_{\text{cat}}} E + \text{DNA}_{n+1} + \text{PP}_i$$
Step 2: Apply steady-state approximation to ES complex
Setting $d[\text{ES}]/dt = 0$:
$$k_1[E][S] = (k_{-1} + k_{\text{cat}})[\text{ES}]$$
The Michaelis constant is $K_M = (k_{-1} + k_{\text{cat}})/k_1$.
Step 3: Nucleotide incorporation rate
The rate of nucleotide incorporation (fork velocity in nt/s) at saturating dNTP concentration:
$$v_{\text{fork}} = \frac{k_{\text{cat}}[\text{dNTP}]}{K_M + [\text{dNTP}]} \xrightarrow{[\text{dNTP}] \gg K_M} k_{\text{cat}}$$
For E. coli Pol III: $k_{\text{cat}} \approx 1000 \text{ s}^{-1}$, $K_M \approx 10\text{--}50 \text{ }\mu\text{M}$, and intracellular [dNTP] $\approx 100\text{--}300 \text{ }\mu\text{M}$, so the enzyme operates near $V_{\max}$.
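To see how close to saturation the quoted parameter ranges put the enzyme, the rate law from Step 3 can be evaluated at the extremes. This is an illustrative sketch using only the values given above; the function name `incorporation_rate` is hypothetical.

```python
def incorporation_rate(k_cat: float, k_m_uM: float, dntp_uM: float) -> float:
    """Michaelis-Menten rate v = k_cat * [dNTP] / (K_M + [dNTP]), in nt/s."""
    return k_cat * dntp_uM / (k_m_uM + dntp_uM)


K_CAT = 1000.0  # s^-1, value quoted in the text for E. coli Pol III

# Scan the quoted ranges: K_M of 10-50 uM and [dNTP] of 100-300 uM.
for k_m in (10.0, 50.0):
    for dntp in (100.0, 300.0):
        v = incorporation_rate(K_CAT, k_m, dntp)
        print(f"K_M={k_m:>4} uM, [dNTP]={dntp:>5} uM -> "
              f"v = {v:6.1f} nt/s ({v / K_CAT:.0%} of k_cat)")
```

Across these ranges the rate lies between roughly two thirds and 97% of \(k_{\text{cat}}\), so "near \(V_{\max}\)" holds best at the low-\(K_M\), high-dNTP end.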
Step 4: Catalytic efficiency as a specificity measure
The catalytic efficiency $k_{\text{cat}}/K_M$ sets the second-order rate constant for substrate capture:
$$\frac{k_{\text{cat}}}{K_M} = \frac{k_1 \cdot k_{\text{cat}}}{k_{-1} + k_{\text{cat}}}$$
For correct dNTPs, $k_{\text{cat}}/K_M \sim 10^8 \text{ M}^{-1}\text{s}^{-1}$ (near diffusion limit). For incorrect dNTPs, $k_{\text{cat}}/K_M \sim 10^3\text{--}10^4 \text{ M}^{-1}\text{s}^{-1}$, giving a discrimination factor of $\sim 10^{4}\text{--}10^{5}$.
Step 5: Processivity from competing rates
At each step, the polymerase either extends (rate $k_{\text{pol}}$) or dissociates (rate $k_{\text{off}}$). The probability of stepping forward:
$$p_{\text{step}} = \frac{k_{\text{pol}}}{k_{\text{pol}} + k_{\text{off}}}$$
Step 6: Average processivity as a geometric series
The number of nucleotides incorporated before dissociation follows a geometric distribution, whose mean is:
$$P = \sum_{n=0}^{\infty} n \cdot p_{\text{step}}^n(1 - p_{\text{step}}) = \frac{p_{\text{step}}}{1 - p_{\text{step}}} = \frac{k_{\text{pol}}}{k_{\text{off}}}$$
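Without the sliding clamp, the core polymerase is both slower and less tightly bound: taking $k_{\text{pol}} \sim 10\text{--}50\text{ s}^{-1}$ and $k_{\text{off}} \sim 1\text{ s}^{-1}$ gives $P \sim 10\text{--}50$ nt. With the $\beta$ clamp, $k_{\text{pol}} \approx 1000\text{ s}^{-1}$ and $k_{\text{off}} \sim 10^{-2}\text{ s}^{-1}$ give $P \sim 10^{5}$ nt, consistent with $P > 50{,}000$ nt.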
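The geometric-series result can be checked numerically by simulating the extend-or-dissociate choice at each step. The rate constants below are illustrative assumptions chosen to keep the simulation fast, not measured values, and the function names are hypothetical.

```python
import random


def mean_processivity(k_pol: float, k_off: float) -> float:
    """Analytical mean run length P = k_pol / k_off."""
    return k_pol / k_off


def simulate_run_length(k_pol: float, k_off: float, rng: random.Random) -> int:
    """Count nucleotides added before the polymerase dissociates.

    At every step the enzyme extends with probability
    p_step = k_pol / (k_pol + k_off); otherwise it falls off.
    """
    p_step = k_pol / (k_pol + k_off)
    n = 0
    while rng.random() < p_step:
        n += 1
    return n


rng = random.Random(0)        # fixed seed for reproducibility
k_pol, k_off = 20.0, 1.0      # assumed illustrative rates, in s^-1
runs = [simulate_run_length(k_pol, k_off, rng) for _ in range(20_000)]
simulated = sum(runs) / len(runs)

print(f"analytical P = {mean_processivity(k_pol, k_off):.1f} nt")
print(f"simulated  P = {simulated:.1f} nt")
```

The simulated mean converges on \(k_{\text{pol}}/k_{\text{off}}\), and the wide spread of individual run lengths reflects the geometric distribution's long tail.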
Error Rate Formalism
The probability of a mutation surviving at a given position:
$$\mu = \epsilon_{\text{ins}} \times (1 - f_{\text{proof}}) \times (1 - f_{\text{MMR}})$$
where \(\epsilon_{\text{ins}} \approx 10^{-5}\) is the misincorporation frequency, \(f_{\text{proof}} \approx 0.99\) is the fraction corrected by proofreading, and \(f_{\text{MMR}} \approx 0.999\) is the fraction corrected by mismatch repair. This gives \(\mu \approx 10^{-5} \times 0.01 \times 0.001 = 10^{-10}\).
Derivation: Combined Fidelity from Three Error-Correction Layers
Starting from the principle that replication errors must escape three independent, sequential quality-control checkpoints to become permanent mutations.
Step 1: Define the misincorporation frequency
DNA polymerase selects dNTPs based on Watson-Crick geometry and induced fit. The insertion error rate is the ratio of catalytic efficiencies for wrong vs. right nucleotides:
$$\epsilon_{\text{ins}} = \frac{(k_{\text{cat}}/K_M)_{\text{wrong}}}{(k_{\text{cat}}/K_M)_{\text{right}}} \approx 10^{-4}\text{--}10^{-5}$$
Step 2: Proofreading exonuclease correction
A misincorporated nucleotide distorts the primer terminus geometry, slowing the forward polymerization rate and increasing the probability of the 3′-end melting into the exonuclease active site. Let $f_{\text{proof}}$ be the fraction of errors removed:
$$f_{\text{proof}} = \frac{k_{\text{exo}}^{\text{mismatch}}}{k_{\text{exo}}^{\text{mismatch}} + k_{\text{pol}}^{\text{mismatch}}} \approx 0.99$$
The error rate after proofreading: $\epsilon_{\text{ins}} \times (1 - f_{\text{proof}})$.
Step 3: Post-replicative mismatch repair (MMR)
MutS/MutL/MutH (prokaryotes) or MSH2-MSH6/MLH1-PMS2 (eukaryotes) scan newly replicated DNA. Errors that escaped proofreading are recognized as mismatches. Let $f_{\text{MMR}}$ be the fraction corrected by MMR:
$$f_{\text{MMR}} \approx 0.999 \quad \text{(removes 99.9\% of remaining mismatches)}$$
Step 4: Multiply escape probabilities
Since the three mechanisms act sequentially and independently, the probability that an error survives all three layers is the product of escape probabilities:
$$\mu = \epsilon_{\text{ins}} \times (1 - f_{\text{proof}}) \times (1 - f_{\text{MMR}})$$
Step 5: Substitute numerical values
$$\mu = 10^{-5} \times (1 - 0.99) \times (1 - 0.999) = 10^{-5} \times 10^{-2} \times 10^{-3} = 10^{-10} \text{ per bp per division}$$
Step 6: Predict mutations per genome per division
For the human diploid genome ($G = 6.4 \times 10^9$ bp):
$$M = \mu \times G = 10^{-10} \times 6.4 \times 10^9 \approx 0.64 \text{ mutations per cell division}$$
This is remarkably consistent with the observed somatic mutation rate of ~0.5-1.0 mutations per division measured by whole-genome sequencing of clonal lineages.
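The multiplication in Steps 4 to 6 can be written as a small function, which also makes it easy to see how sensitive the result is to each layer. This is a sketch using the nominal values from the derivation; the function name `mutation_rate` is hypothetical.

```python
def mutation_rate(eps_ins: float, f_proof: float, f_mmr: float) -> float:
    """Per-bp error rate after base selection, proofreading and MMR.

    An error must be inserted and then escape both correction layers:
    mu = eps_ins * (1 - f_proof) * (1 - f_mmr).
    """
    return eps_ins * (1 - f_proof) * (1 - f_mmr)


GENOME_BP = 6.4e9  # human diploid genome size used in the text

mu = mutation_rate(eps_ins=1e-5, f_proof=0.99, f_mmr=0.999)
per_genome = mu * GENOME_BP
print(f"mu = {mu:.2e} per bp per division")
print(f"expected mutations per division = {per_genome:.2f}")

# Losing mismatch repair alone (f_mmr = 0) raises the rate 1000-fold.
mu_no_mmr = mutation_rate(eps_ins=1e-5, f_proof=0.99, f_mmr=0.0)
print(f"without MMR: {mu_no_mmr * GENOME_BP:.0f} mutations per division")
```

Setting one correction fraction to zero removes that layer, which shows the multiplicative structure of the model directly.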
Telomere Shortening Dynamics
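Telomere length after n divisions without telomerase:
$$L(n) = L_0 - \Delta L \cdot n$$
With telomerase activity adding \(\delta\) bp per division:
$$L(n) = L_0 - (\Delta L - \delta) \cdot n$$
Senescence occurs when \(L(n) \leq L_{\text{crit}}\), giving the Hayflick limit:
$$n_{\text{Hayflick}} = \frac{L_0 - L_{\text{crit}}}{\Delta L - \delta}$$
For human somatic cells: \(L_0 \approx 10{,}000\) bp, \(L_{\text{crit}} \approx 4{,}000\) bp, \(\Delta L \approx 50\)–200 bp, \(\delta = 0\). This gives n ≈ 30–120 divisions, consistent with the Hayflick limit of ~50–70 doublings.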
Derivation: Telomere Shortening and the Hayflick Limit
Starting from the end-replication problem: conventional DNA polymerases cannot fully copy the very 3′ end of a linear template strand, because synthesis requires a primer and the gap left by the terminal primer cannot be filled.
Step 1: The end-replication problem
After each S phase, the newly synthesized lagging strand lacks the region once occupied by the terminal RNA primer at its 5′ end. Additionally, 5′→3′ exonuclease processing of the C-rich strand generates the 3′ G-rich overhang. Together, these cause a net loss of $\Delta L$ base pairs per division (typically 50-200 bp in human somatic cells).
Step 2: Linear shortening model (no telomerase)
If shortening is constant per division, telomere length after $n$ divisions is a simple arithmetic sequence:
$$L(n) = L_0 - \Delta L \cdot n$$
where $L_0$ is the initial telomere length (typically ~10,000 bp at birth in humans).
Step 3: Include telomerase activity
If telomerase adds $\delta$ bp per division (partial activity in stem cells, full in germ/cancer cells), the net shortening per division is $(\Delta L - \delta)$:
$$L(n) = L_0 - (\Delta L - \delta) \cdot n$$
When $\delta = \Delta L$, telomere length is maintained indefinitely (immortal cells). When $\delta < \Delta L$, shortening continues but at a reduced rate.
Step 4: Define the critical length for senescence
When telomere length drops below a critical threshold $L_{\text{crit}}$ (~4,000-6,000 bp), uncapped chromosome ends are recognized as DNA damage. This activates the ATM/ATR-p53-p21 DDR pathway, triggering irreversible cell cycle arrest (senescence).
Step 5: Solve for the Hayflick limit
Set $L(n_{\text{Hayflick}}) = L_{\text{crit}}$ and solve:
$$L_0 - (\Delta L - \delta) \cdot n_{\text{Hayflick}} = L_{\text{crit}}$$
$$n_{\text{Hayflick}} = \frac{L_0 - L_{\text{crit}}}{\Delta L - \delta}$$
Step 6: Numerical estimate for human somatic cells
For normal somatic cells ($\delta = 0$):
$$n_{\text{Hayflick}} = \frac{10{,}000 - 4{,}000}{100 - 0} = 60 \text{ divisions}$$
Using $\Delta L = 50$ bp: $n = 120$; using $\Delta L = 200$ bp: $n = 30$. The observed Hayflick limit of ~50-70 doublings for human fibroblasts falls within this range, consistent with telomere shortening acting as the molecular clock of replicative senescence.
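The closed-form limit from Step 5 is simple to evaluate across the quoted range of per-division losses. This is a minimal sketch of the linear model only; the function name `hayflick_limit` and the choice to return infinity when telomerase fully compensates are illustrative assumptions.

```python
def hayflick_limit(l0_bp: float, l_crit_bp: float, loss_bp: float,
                   telomerase_gain_bp: float = 0.0) -> float:
    """Divisions until L(n) reaches L_crit in the linear shortening model.

    n = (L0 - L_crit) / (dL - delta). If telomerase adds at least as much
    as is lost (delta >= dL), the telomere never reaches L_crit.
    """
    net_loss = loss_bp - telomerase_gain_bp
    if net_loss <= 0:
        return float("inf")
    return (l0_bp - l_crit_bp) / net_loss


for loss in (50, 100, 200):
    n = hayflick_limit(10_000, 4_000, loss)
    print(f"dL = {loss:>3} bp/division -> n = {n:.0f} divisions")

# Partial telomerase activity (assumed 50 bp/division) doubles the limit here.
print(hayflick_limit(10_000, 4_000, 100, telomerase_gain_bp=50))
```

The model treats every division identically; it does not capture cell-to-cell variation in telomere length or stochastic loss.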
Okazaki Fragment Frequency
The number of Okazaki fragments produced per replicon of length $L$, where $\langle \ell \rangle$ is the mean fragment length:
$$N_{\text{OF}} = \frac{L}{\langle \ell \rangle}$$
For E. coli: $L = 4.6 \times 10^6$ bp and an average Okazaki fragment of 1,500 nt give ~3,067 fragments per chromosome per replication. For a human cell with ~50 nt/s fork speed and 150 nt average Okazaki length: $6.4 \times 10^9 / 150 \approx 43$ million Okazaki fragments per S phase.
Derivation: Okazaki Fragment Length Distribution from Stochastic Priming
Starting from the stochastic nature of primase-DnaB interaction on the lagging strand template, we derive the expected distribution of Okazaki fragment lengths.
Step 1: Model priming as a Poisson process
Primase associates transiently with DnaB helicase at the replication fork. Each priming event is stochastic, with a constant probability $\lambda$ of initiating a new primer per unit length of single-stranded template exposed. This is a memoryless (Poisson) process along the DNA coordinate.
Step 2: Inter-priming distance follows an exponential distribution
The distance between consecutive priming events (which determines Okazaki fragment length) follows an exponential distribution:
$$P(\ell) = \lambda \cdot e^{-\lambda \ell}$$
where $\ell$ is the fragment length and $1/\lambda$ is the mean inter-priming distance.
Step 3: Mean and variance of fragment length
For the exponential distribution:
$$\langle \ell \rangle = \frac{1}{\lambda}, \quad \text{Var}(\ell) = \frac{1}{\lambda^2}, \quad \text{CV} = \frac{\sigma}{\mu} = 1$$
For E. coli: $\langle \ell \rangle \approx 1{,}000\text{--}2{,}000$ nt, so $\lambda \approx 5 \times 10^{-4}\text{--}10^{-3}$ per nt.
Step 4: Correction for minimum fragment length
In reality, a minimum time is required for primer synthesis (~1 s for a 10-nt RNA primer). During this time, the fork advances $v_{\text{fork}} \times t_{\text{primer}}$ nt, setting a minimum fragment length $\ell_{\min}$. The corrected distribution is a shifted exponential:
$$P(\ell) = \lambda \cdot e^{-\lambda(\ell - \ell_{\min})} \quad \text{for } \ell \geq \ell_{\min}$$
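The effect of the minimum-length correction can be seen by sampling. The sketch below uses only the Python standard library and compares a pure exponential with a shifted one that has the same mean. The values $\ell_{\min} = 1{,}000$ nt (a ~1,000 nt/s fork times ~1 s of primer synthesis) and $1/\lambda = 500$ nt are illustrative assumptions, as are the function names.

```python
import random
import statistics


def sample_fragment_lengths(lam, l_min=0.0, n=100_000, seed=0):
    """Draw n lengths from P(l) = lam * exp(-lam * (l - l_min)) for l >= l_min."""
    rng = random.Random(seed)
    return [l_min + rng.expovariate(lam) for _ in range(n)]


def mean_and_cv(lengths):
    """Return the sample mean and coefficient of variation (sigma / mean)."""
    mean = statistics.fmean(lengths)
    return mean, statistics.pstdev(lengths) / mean


# Pure exponential with mean 1,500 nt (Step 3).
pure = sample_fragment_lengths(lam=1 / 1500)
# Shifted exponential with the same mean: l_min + 1/lam = 1,000 + 500 nt (assumed split).
shifted = sample_fragment_lengths(lam=1 / 500, l_min=1000.0)

print("pure exponential:    mean=%.0f nt, CV=%.2f" % mean_and_cv(pure))
print("shifted exponential: mean=%.0f nt, CV=%.2f" % mean_and_cv(shifted))
```

For the shifted distribution the mean is $\ell_{\min} + 1/\lambda$ while the standard deviation stays $1/\lambda$, so the coefficient of variation drops below 1. This is one way the corrected model yields a narrower length distribution than the pure Poisson picture.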
Step 5: Number of fragments per replicon
For a replicon of length $L$ replicated by two forks, the lagging strand on each fork produces:
$$N_{\text{OF}} = \frac{L/2}{\langle \ell \rangle} = \frac{L \cdot \lambda}{2}$$
For E. coli: $N = 4.6 \times 10^6 / (2 \times 1{,}500) \approx 1{,}533$ fragments per fork, ~3,067 total per replication.
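As a quick check of the arithmetic in Step 5, here is a short Python sketch; the helper name `okazaki_fragments` is an assumption made for this example.

```python
def okazaki_fragments(replicon_len, mean_fragment_len, forks=2):
    """Return (fragments per fork, total fragments) for one bidirectional replicon."""
    per_fork = (replicon_len / forks) / mean_fragment_len  # N_OF = (L/2) / <l>
    return per_fork, per_fork * forks


# E. coli values from the text: L = 4.6e6 bp, mean fragment length 1,500 nt.
per_fork, total = okazaki_fragments(4.6e6, 1500)
print(f"per fork: {per_fork:.0f}, total: {total:.0f}")
```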
Step 6: Eukaryotic refinement
In eukaryotes, Okazaki fragments are shorter (~150-200 nt) due to nucleosome spacing constraints. The priming rate $\lambda$ is higher, and fragment length correlates with the nucleosome repeat length (~165-200 bp), suggesting that chromatin structure imposes a periodic modulation on the priming probability, deviating from the pure exponential model toward a more peaked (gamma-like) distribution.
Python Simulation: Replication Fork Progression
This simulation models replication fork progression with kinetic parameters from E. coli. It tracks leading strand (continuous synthesis), lagging strand (discontinuous Okazaki fragment generation), and the gap between them. The output includes fork position over time, Okazaki fragment length distribution, and the dynamic leading–lagging gap.
Model Parameters
- Fork speed (helicase): 1,000 nt/s
- Lagging strand polymerase: 800 nt/s (accounts for cycling overhead)
- Mean Okazaki fragment: 1,500 ± 300 nt
- Primer synthesis time: 1.0 s (~10 nt RNA primer)
- Total replication: 50,000 bp segment
Replication Fork Kinetics Simulator (Python): models leading/lagging strand synthesis with Okazaki fragment generation and fork gap dynamics.
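A minimal sketch of such a simulation, using the parameters listed above, might look like the following. It is a toy model under stated assumptions rather than a definitive implementation: a single lagging-strand polymerase handles fragments one after another, fragment lengths are drawn from a Gaussian (1,500 ± 300 nt) with a 100-nt floor, and primer synthesis is not overlapped with extension. The function name and the returned keys are hypothetical.

```python
import random


def simulate_fork(total_bp=50_000, v_fork=1000.0, v_lag=800.0,
                  mean_frag=1500.0, sd_frag=300.0, t_primer=1.0, seed=1):
    """Toy model: one fork, one lagging-strand polymerase working serially."""
    rng = random.Random(seed)
    fragments = []      # (length_nt, completion_time_s) per Okazaki fragment
    templated = 0.0     # lagging template (nt) already assigned to fragments
    pol_free_at = 0.0   # time at which the lagging polymerase is next available
    while templated < total_bp:
        # Gaussian lengths and the 100-nt floor are assumptions of this sketch.
        length = max(100.0, rng.gauss(mean_frag, sd_frag))
        length = min(length, total_bp - templated)  # last fragment fills the remainder
        templated += length
        exposed_at = templated / v_fork             # helicase has unwound this stretch
        start = max(exposed_at, pol_free_at)        # wait for template and polymerase
        done = start + t_primer + length / v_lag    # primer synthesis, then extension
        fragments.append((length, done))
        pol_free_at = done
    fork_time = total_bp / v_fork                   # leading strand keeps pace with the fork
    lag_done = sum(l for l, t in fragments if t <= fork_time)
    return {
        "fork_time_s": fork_time,
        "lagging_finish_s": pol_free_at,
        "n_fragments": len(fragments),
        "gap_nt_at_fork_end": total_bp - lag_done,
        "fragments": fragments,
    }


result = simulate_fork()
print("fragments:", result["n_fragments"])
print("fork reaches end at %.1f s" % result["fork_time_s"])
print("lagging strand finishes at %.1f s" % result["lagging_finish_s"])
print("gap when fork finishes: %.0f nt" % result["gap_nt_at_fork_end"])
```

Because the lagging polymerase (800 nt/s plus 1 s per primer) is slower than the helicase (1,000 nt/s) in this serial model, the leading–lagging gap grows over time. A model with overlapping primer synthesis or polymerase recycling would narrow it.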
Fortran Computation: Telomere Shortening Model
This Fortran program models telomere length dynamics over 120 cell divisions under four biological scenarios: normal somatic cells (no telomerase), partial telomerase activity (stem-like), full telomerase (cancer/germ cells), and the ALT pathway. The simulation tracks when each cell type reaches the Hayflick senescence threshold and crisis.
Scenarios Modeled
Telomere Shortening Dynamics, 4 Scenarios (Fortran): models telomere attrition, telomerase compensation, and ALT pathway over cell divisions.
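The division-by-division dynamics can also be sketched in Python using the linear model derived earlier. This is an illustrative sketch, not a port of the Fortran program: the partial-telomerase value $\delta = 50$ bp is a hypothetical choice, and the ALT pathway is left out because the text gives no parameters for it.

```python
def telomere_trajectory(l0=10_000, delta_l=100, delta=0, l_crit=4_000, n_div=120):
    """Return (lengths per division, first division with length <= l_crit or None)."""
    lengths = [l0]
    senescence_at = None
    for n in range(1, n_div + 1):
        # Linear model: net loss of (delta_l - delta) bp per division.
        length = lengths[-1] - (delta_l - delta)
        lengths.append(length)
        if senescence_at is None and length <= l_crit:
            senescence_at = n
        # The trajectory continues past l_crit only to show the trend;
        # senescent cells would stop dividing at that point.
    return lengths, senescence_at


# delta values (bp added per division); the partial value of 50 is an assumption.
scenarios = {
    "somatic (no telomerase)": 0,
    "partial telomerase (stem-like)": 50,
    "full telomerase (cancer/germ)": 100,
}

for name, delta in scenarios.items():
    lengths, hit = telomere_trajectory(delta=delta)
    status = f"senescence at division {hit}" if hit is not None else "threshold not reached"
    print(f"{name}: final length {lengths[-1]} bp, {status}")
```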
Summary: Prokaryotic vs. Eukaryotic Replication
| Feature | Prokaryotic (E. coli) | Eukaryotic |
|---|---|---|
| Origin(s) | Single (oriC, 245 bp) | Multiple (~30,000–50,000 in human) |
| Initiator | DnaA protein | ORC → Cdc6 → Cdt1 → MCM loading |
| Helicase | DnaB (5′→3′ on lagging template) | CMG/MCM2-7 (3′→5′ on leading template) |
| Primase | DnaG | Pol α–primase complex |
| Leading strand Pol | Pol III core | Pol ε |
| Lagging strand Pol | Pol III core | Pol δ |
| Sliding clamp | β clamp (homodimer) | PCNA (homotrimer) |
| Clamp loader | γ/τ complex | RFC (RFC1–5) |
| SSB | SSB (homotetramer) | RPA (heterotrimer) |
| Okazaki fragments | 1,000–2,000 nt | 100–200 nt |
| Primer removal | Pol I (5′→3′ exo) | RNase H1 + FEN1 |
| Fork speed | ~1,000 nt/s | ~20–50 nt/s |
| Termination | ter/Tus fork trap | Fork convergence + Topo II decatenation |
| Ligase cofactor | NAD+ | ATP |
Key Equations Summary
Replication time (single origin, bidirectional):
$$t_{\text{rep}} = \frac{L}{2\, v_{\text{fork}}}$$
Meselson-Stahl: hybrid fraction after n generations:
$$f_{\text{hybrid}}(n) = \frac{2}{2^n} = 2^{1-n} \quad \text{for } n \geq 1$$
Replication fidelity (three-layer model):
Hayflick limit from telomere dynamics:
$$n_{\text{Hayflick}} = \frac{L_0 - L_{\text{crit}}}{\Delta L - \delta}$$
Mutations per genome per division: