2.2 The Double Helix
The Watson-Crick Model (1953)
James Watson and Francis Crick proposed the double helix structure of DNA in their landmark one-page paper in Nature (April 25, 1953). They integrated three key lines of evidence:
- 1. Chargaff's rules: [A] = [T] and [G] = [C], implying complementary base pairing.
- 2. X-ray diffraction data: Rosalind Franklin's Photo 51 (taken with Raymond Gosling) showed the characteristic "X" pattern of a helix, revealing the 3.4 nm repeat (pitch) and 2.0 nm diameter.
- 3. Chemical knowledge: Jerry Donohue corrected Watson on the tautomeric forms of the bases (keto, not enol), enabling the correct hydrogen bonding scheme.
"It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material."
-- Watson & Crick, Nature 171, 737-738 (1953)
X-ray Diffraction Evidence: Photo 51
Rosalind Franklin's Photo 51 (May 1952) is one of the most important images in the history of science. The X-ray diffraction pattern of B-form DNA fibers revealed:
What Photo 51 Showed
- X-shaped pattern: Diagnostic of a helical structure (predicted by Cochran, Crick, and Vand in 1952)
- Layer line spacing: 3.4 nm pitch (distance per complete turn)
- Strong 10th layer line: 0.34 nm rise per residue (3.4/10 = 0.34)
- Missing 4th layer line: Indicated two strands (not one) offset by 3/8 of the pitch
- Diamond-shaped gaps: Major and minor groove widths
Bragg's Law
X-ray diffraction is governed by Bragg's law, relating the spacing of repeating structures (d) to the diffraction angle ($\theta$):
$$n\lambda = 2d\sin\theta$$
where n is the order of diffraction, $\lambda$ is the X-ray wavelength (typically Cu K$\alpha$ = 1.54 Angstrom), and d is the spacing between diffracting planes.
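As a rough illustration of Bragg's law, the sketch below computes the first-order diffraction angle for the two spacings discussed above (0.34 nm rise and 3.4 nm pitch) with Cu K$\alpha$ radiation. The function name `bragg_angle_deg` is hypothetical, and treating fiber layer lines as simple Bragg planes is a simplifying assumption.

```python
import math

def bragg_angle_deg(d_nm, wavelength_nm=0.154, n=1):
    """Solve n*lambda = 2*d*sin(theta) for theta, in degrees.

    Returns None when n*lambda > 2*d, i.e. when that order cannot diffract.
    """
    s = n * wavelength_nm / (2.0 * d_nm)
    if s > 1.0:
        return None
    return math.degrees(math.asin(s))

theta_rise = bragg_angle_deg(0.34)   # 0.34 nm rise per base pair
theta_pitch = bragg_angle_deg(3.4)   # 3.4 nm helical pitch

print(f"rise  (0.34 nm): theta = {theta_rise:.2f} deg")
print(f"pitch (3.4 nm):  theta = {theta_pitch:.2f} deg")
```

The smaller spacing diffracts to the larger angle, which is why the 0.34 nm base stacking reflection sits far from the center of the pattern while the 3.4 nm layer lines lie close to it.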
Historical note: Franklin was close to solving the structure herself. Her February 1953 manuscript (submitted before Watson and Crick's paper) described a two-chain helix with the phosphates on the outside. Watson saw Photo 51 (shown to him by Wilkins without Franklin's knowledge), which was critical for the final model. Franklin died of ovarian cancer in 1958 at age 37 and was not included in the 1962 Nobel Prize.
Structural Parameters
B-DNA: The Standard Form
- Diameter: 2.0 nm (20 Angstrom)
- Rise per bp: 0.34 nm (3.4 Angstrom)
- Bases per turn: 10.5 (in solution; 10.0 in fiber)
- Helical pitch: 3.6 nm (36 Angstrom)
- Twist angle: 34.3 degrees per bp (360/10.5)
- Major groove: 22 Angstrom wide, 8.5 Angstrom deep
- Minor groove: 12 Angstrom wide, 7.5 Angstrom deep
- Handedness: Right-handed
- Sugar pucker: C2'-endo
- Glycosidic bond: Anti conformation
B-DNA is the predominant form under physiological conditions (~92% relative humidity, low salt). The length of B-DNA can be calculated as:
$$L = N \times 0.34 \text{ nm}$$
where N is the number of base pairs. Human genome: 3.2 x 10$^9$ bp x 0.34 nm = ~1.09 m per haploid set.
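The length formula can be checked with a few lines of Python. This is a minimal sketch using the B-DNA parameters listed above; the helper names are hypothetical.

```python
RISE_NM = 0.34        # rise per base pair (B-DNA)
BP_PER_TURN = 10.5    # base pairs per helical turn in solution

def contour_length_nm(n_bp):
    """Contour length L = N x 0.34 nm."""
    return n_bp * RISE_NM

def helical_turns(n_bp):
    """Number of helical turns for N base pairs."""
    return n_bp / BP_PER_TURN

genome_m = contour_length_nm(3.2e9) * 1e-9   # nm -> m
print(f"haploid human genome: {genome_m:.2f} m")
print(f"a 147 bp fragment: {contour_length_nm(147):.1f} nm, "
      f"{helical_turns(147):.1f} turns")
```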
Antiparallel Strands
5'---ATCGATCG---3'
||||||||
3'---TAGCTAGC---5'
The two strands run in opposite directions (5' to 3' and 3' to 5'). This antiparallel arrangement is essential for Watson-Crick base pairing geometry and has profound consequences for replication: one strand (leading) is synthesized continuously, while the other (lagging) must be synthesized in discontinuous Okazaki fragments because DNA polymerases can only synthesize in the 5' to 3' direction.
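Because the strands are antiparallel, the partner of a sequence written 5' to 3' is its reverse complement, not just its complement. A minimal sketch (the function name is illustrative):

```python
# Translation table for Watson-Crick complements
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the complementary strand, written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

top = "ATCGATCG"                      # 5'-ATCGATCG-3'
bottom_5to3 = reverse_complement(top) # partner strand, read 5' to 3'
bottom_3to5 = bottom_5to3[::-1]       # same strand, as drawn under the top strand

print("top    5'-" + top + "-3'")
print("bottom 3'-" + bottom_3to5 + "-5'")
```

Reading the bottom strand 3' to 5' reproduces the diagram above; reading it 5' to 3' gives the sequence a polymerase would actually synthesize.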
Watson-Crick Base Pairing Energetics
A-T Base Pair
- 2 hydrogen bonds (N6-H...O4 and N1...H-N3)
- H-bond energy: ~7 kJ/mol (total for the pair)
- Propeller twist: ~11 degrees
- AT-rich regions are easier to melt (denature)
- Found at replication origins (easier strand separation)
- TATA box in promoters exploits lower stability
G-C Base Pair
- 3 hydrogen bonds (O6...H-N4, N1-H...N3, N2-H...O2)
- H-bond energy: ~11 kJ/mol (total for the pair)
- Propeller twist: ~12 degrees
- GC-rich regions are more thermally stable
- CpG islands near promoters in mammals
- Thermophilic organisms tend to have higher GC content
Nearest-Neighbor Model for DNA Stability
DNA duplex stability depends not just on GC content but on the specific sequence of adjacent base pairs ("nearest-neighbor" or "stacking" interactions). The free energy of duplex formation is calculated as a sum of nearest-neighbor parameters:
$$\Delta G°_{37} = \sum_{i=1}^{N-1} \Delta G°_i(\text{n.n.}) + \Delta G°_{\text{init}}$$
where each $\Delta G°_i$ is the free energy contribution of the i-th nearest-neighbor doublet, and $\Delta G°_{\text{init}}$ accounts for helix initiation. The 10 unique nearest-neighbor parameters (SantaLucia, 1998) span a range of values; the extremes are listed after the derivation below.
Derivation: Nearest-Neighbor Thermodynamics for DNA Stability
The nearest-neighbor (NN) model predicts DNA duplex stability from sequence by summing contributions of each adjacent base pair step. Here we derive the framework.
Step 1: The physical basis -- base stacking dominates
DNA stability arises primarily from stacking interactions between adjacent base pairs, not from hydrogen bonding alone. The free energy of stacking depends on the identity of both the i-th and (i+1)-th base pair, hence the "nearest-neighbor" model.
Step 2: Decompose total free energy into nearest-neighbor steps
For a duplex of N base pairs, there are N-1 nearest-neighbor steps. The total standard free energy of formation is:
$$\Delta G°_{37} = \sum_{i=1}^{N-1} \Delta G°_{i,i+1}(\text{n.n.}) + \Delta G°_{\text{init}}$$
where $\Delta G°_{\text{init}}$ accounts for the helix initiation penalty (~+2 to +4 kJ/mol, depending on terminal base pairs).
Step 3: Similarly decompose enthalpy and entropy
Since $\Delta G° = \Delta H° - T\Delta S°$, we can independently sum:
$$\Delta H°_{total} = \sum_{i=1}^{N-1} \Delta H°_{i,i+1} + \Delta H°_{\text{init}}$$
$$\Delta S°_{total} = \sum_{i=1}^{N-1} \Delta S°_{i,i+1} + \Delta S°_{\text{init}}$$
Step 4: Count the unique nearest-neighbor parameters
With 4 bases, there are $4 \times 4 = 16$ possible nearest-neighbor doublets. However, because each step can be read 5' to 3' from either strand (e.g., 5'-AG-3'/3'-TC-5' is the same step as 5'-CT-3'/3'-GA-5' read from the other strand), only 10 unique parameters are needed: the 4 self-complementary steps (AT, TA, CG, GC) count once each, and the remaining 12 doublets collapse into 6 equivalent pairs. These were measured by SantaLucia (1998) using melting experiments on synthetic oligonucleotides.
Step 5: Compute Tm from the NN parameters
Combining with the two-state Tm equation (derived in the DNA melting section below):
$$T_m = \frac{\sum \Delta H°_{nn} + \Delta H°_{init}}{\sum \Delta S°_{nn} + \Delta S°_{init} + R\ln(C_T/4)}$$
Step 6: Worked example -- 5'-ATCG-3' / 3'-TAGC-5'
Three NN steps: AT/TA, TC/AG, CG/GC (top strand 5' to 3' over bottom strand 3' to 5'). Using SantaLucia parameters (the 5'-CG-3' step is the CG/CG entry in the list below):
$$\Delta G°_{37} = \Delta G°_{\text{AT/TA}} + \Delta G°_{\text{TC/AG}} + \Delta G°_{\text{CG/GC}} + \Delta G°_{\text{init}}$$
$$= (-5.7) + (-6.6) + (-8.6) + (+3.6) = -17.3 \text{ kJ/mol}$$
More negative $\Delta G°$ = more stable duplex. This is why GC-rich sequences are more stable: the steps made up only of G-C pairs (GC, CG, GG) have the most favorable stacking energies.
Selected values from this parameter set, with each step written as the 5' to 3' sequence of both strands:
- GC/GC: $\Delta G°_{37} = -9.8$ kJ/mol (most stable)
- CG/CG: $\Delta G°_{37} = -8.6$ kJ/mol
- GG/CC: $\Delta G°_{37} = -8.0$ kJ/mol
- AA/TT: $\Delta G°_{37} = -5.7$ kJ/mol
- AT/AT: $\Delta G°_{37} = -5.7$ kJ/mol
- TA/TA: $\Delta G°_{37} = -4.7$ kJ/mol (least stable)
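The nearest-neighbor sum is easy to express in code. The sketch below is illustrative only: it uses just the six step values listed above plus the +3.6 kJ/mol initiation term from the worked example, so sequences containing other steps raise an error. The names `NN_DG37`, `canonical_step`, and `duplex_dg37` are hypothetical, and terminal or symmetry corrections are not included.

```python
# Step values from the list above (kJ/mol), keyed by the 5'->3' step on one strand.
NN_DG37 = {"GC": -9.8, "CG": -8.6, "GG": -8.0, "AA": -5.7, "AT": -5.7, "TA": -4.7}
DG_INIT = 3.6  # initiation term, as used in the worked example (assumed constant here)

_COMP = str.maketrans("ACGT", "TGCA")

def canonical_step(step):
    """Map a step to its listed key, using the fact that a step and its
    reverse complement (e.g. TT and AA) describe the same doublet."""
    if step in NN_DG37:
        return step
    rc = step.translate(_COMP)[::-1]
    if rc in NN_DG37:
        return rc
    raise KeyError(f"no parameter listed here for step {step}")

def duplex_dg37(seq):
    """Sum the N-1 nearest-neighbor terms plus the initiation term."""
    seq = seq.upper()
    steps = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return sum(NN_DG37[canonical_step(s)] for s in steps) + DG_INIT

dg_at = duplex_dg37("AAAT")   # steps AA, AA, AT
dg_gc = duplex_dg37("GGGC")   # steps GG, GG, GC
print(f"AAAT: {dg_at:.1f} kJ/mol")
print(f"GGGC: {dg_gc:.1f} kJ/mol")
```

With these values the all-GC tetramer comes out several kJ/mol more stable than the all-AT one, consistent with the ranking above.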
Major and Minor Grooves
The double helix has two grooves of different widths, arising because the glycosidic bonds of a base pair are not diametrically opposite. The grooves are critical for protein-DNA recognition:
Major Groove (22 Angstrom wide)
- More accessible to proteins
- Contains a unique pattern of H-bond donors (D) and acceptors (A) for each base pair:
- AT: A-D-A-M (M = methyl of thymine)
- TA: M-A-D-A
- GC: A-A-D-H (H = hydrogen)
- CG: H-D-A-A
- All 4 base pairs are distinguishable in the major groove
- Transcription factors, restriction enzymes, and most DNA-binding proteins read sequence here
Minor Groove (12 Angstrom wide)
- Narrower, less accessible
- AT and TA are distinguishable from GC and CG, but AT and TA are NOT distinguishable from each other (nor GC from CG)
- Contains fewer sequence-specific contacts
- Small molecules bind here: netropsin, distamycin, Hoechst 33258 (prefer AT-rich sequences)
- TATA-binding protein (TBP) bends DNA via minor groove
- Histone-DNA contacts primarily through minor groove
Key insight: The major groove contains sufficient information to uniquely identify each base pair without opening the helix. This is why most sequence-specific DNA-binding proteins (transcription factors, restriction enzymes) interact primarily through the major groove. The minor groove is used when bending or distortion of DNA is required.
DNA Melting (Denaturation)
Melting Temperature (Tm) and Hyperchromicity
When double-stranded DNA is heated, the two strands separate (denature or "melt"). This transition can be monitored by UV absorbance at 260 nm: single-stranded DNA absorbs ~37% more UV light than double-stranded DNA (hyperchromic effect), because base stacking in the duplex reduces UV absorption.
The melting temperature (Tm) is the temperature at which 50% of the DNA is denatured. For short oligonucleotides (14-20 bp):
$$T_m = 2(n_{AT}) + 4(n_{GC}) \text{ °C}$$
where $n_{AT}$ and $n_{GC}$ are the number of AT and GC base pairs (Wallace rule, rough approximation).
For longer DNA ($>$100 bp), the Marmur-Doty equation relates Tm to GC content:
$$T_m = 69.3 + 0.41 \times (\%GC) \text{ °C}$$
(in 0.15M NaCl + 0.015M Na-citrate, SSC buffer)
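Both empirical rules are one-liners. A minimal sketch, with hypothetical function names and an arbitrary 16-mer as the example sequence:

```python
def wallace_tm(seq):
    """Wallace rule: Tm = 2*(A+T) + 4*(G+C), in degrees C (short oligos only)."""
    seq = seq.upper()
    n_at = seq.count("A") + seq.count("T")
    n_gc = seq.count("G") + seq.count("C")
    return 2 * n_at + 4 * n_gc

def marmur_doty_tm(percent_gc):
    """Marmur-Doty: Tm = 69.3 + 0.41 * %GC, in degrees C (long DNA in SSC)."""
    return 69.3 + 0.41 * percent_gc

oligo_tm = wallace_tm("ATCGATCGATCGATCG")   # 8 A/T and 8 G/C
genome_tm = marmur_doty_tm(40.9)            # 40.9% GC

print(f"16-mer (Wallace): {oligo_tm} C")
print(f"40.9% GC (Marmur-Doty): {genome_tm:.1f} C")
```

The two rules are not interchangeable: the Wallace rule scales with length and is only meant for short probes, while Marmur-Doty depends only on composition and assumes long duplexes.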
Derivation: DNA Melting Temperature from Two-State Thermodynamics
We derive the relationship between Tm and thermodynamic parameters for a non-self-complementary duplex using the two-state (all-or-nothing) model.
Step 1: Define the two-state equilibrium
For duplex melting: $\text{AB (duplex)} \rightleftharpoons \text{A (ss)} + \text{B (ss)}$. The equilibrium constant is:
$$K = \frac{[\text{A}][\text{B}]}{[\text{AB}]}$$
Step 2: Express K in terms of total strand concentration
Let $C_T$ = total strand concentration. At the melting temperature, the fraction of strands in duplex form is $\alpha = 1/2$. For non-self-complementary strands with equal concentrations $C_T/2$ each:
$$[\text{A}] = [\text{B}] = \frac{C_T}{4}, \quad [\text{AB}] = \frac{C_T}{4}$$
$$K_{T_m} = \frac{(C_T/4)(C_T/4)}{C_T/4} = \frac{C_T}{4}$$
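Step 3: Apply the van't Hoff equation
An equilibrium constant relates to thermodynamic parameters through $\Delta G° = -RT\ln K_{eq}$. Here $\Delta H°$ and $\Delta S°$ are taken as the values for duplex formation (both negative, as in the nearest-neighbor model), so the relevant equilibrium constant is that of association, $1/K$:
$$\Delta G° = \Delta H° - T\Delta S° = -RT\ln\frac{1}{K} = RT\ln K$$
Step 4: At T = Tm, substitute K = CT/4
$$\Delta H° - T_m \Delta S° = RT_m \ln\frac{C_T}{4}$$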
Step 5: Solve for Tm
$$\Delta H° = T_m\left(\Delta S° + R\ln\frac{C_T}{4}\right)$$
$$\boxed{T_m = \frac{\Delta H°}{\Delta S° + R\ln(C_T/4)}}$$
Step 6: Rearrange for the 1/Tm linear plot
Taking the reciprocal and separating terms gives the experimentally useful form:
$$\frac{1}{T_m} = \frac{R}{\Delta H°}\ln\frac{C_T}{4} + \frac{\Delta S°}{\Delta H°}$$
Plotting $1/T_m$ vs. $\ln(C_T)$ yields a straight line with slope $R/\Delta H°$ and intercept $(\Delta S° - R\ln 4)/\Delta H°$. This is the standard method for extracting thermodynamic parameters from concentration-dependent melting experiments. Typical values for a 20-mer: $\Delta H° \approx -150$ kcal/mol ($\approx -630$ kJ/mol), $\Delta S° \approx -420$ cal/(mol K) ($\approx -1760$ J/(mol K)).
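The boxed result can be checked numerically. The sketch below evaluates Tm and then solves the two-state equilibrium for the duplex fraction at any temperature, confirming that it equals 1/2 at Tm. It treats $\Delta H°$ and $\Delta S°$ as duplex-formation values (negative); the numbers used are hypothetical, chosen only for illustration, and the function names are not from any library.

```python
import math

R = 8.314  # J/(mol K)

def tm_kelvin(dH, dS, c_total):
    """Tm = dH / (dS + R ln(CT/4)) for a non-self-complementary duplex."""
    return dH / (dS + R * math.log(c_total / 4.0))

def duplex_fraction(T, dH, dS, c_total):
    """Fraction of strands in duplex form (alpha) at temperature T.

    With [AB] = alpha*CT/2 and [A] = [B] = (1-alpha)*CT/2, the dissociation
    constant K = (1-alpha)^2 * CT / (2*alpha), i.e. alpha^2 - (2+q)*alpha + 1 = 0
    with q = 2K/CT. The root between 0 and 1 is returned.
    """
    dG_formation = dH - T * dS
    k_diss = math.exp(dG_formation / (R * T))   # 1 / K_association
    b = 2.0 + 2.0 * k_diss / c_total
    return 2.0 / (b + math.sqrt(b * b - 4.0))

# Hypothetical illustrative parameters (formation values)
dH, dS, c_total = -600e3, -1680.0, 1e-6   # J/mol, J/(mol K), mol/L

tm = tm_kelvin(dH, dS, c_total)
print(f"Tm = {tm:.1f} K ({tm - 273.15:.1f} C)")
for T in (tm - 10, tm, tm + 10):
    print(f"T = {T:.1f} K  duplex fraction = {duplex_fraction(T, dH, dS, c_total):.3f}")
```

Writing the root as `2 / (b + sqrt(b*b - 4))` rather than `(b - sqrt(b*b - 4)) / 2` avoids subtracting two nearly equal numbers at high temperature, where almost no duplex remains.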
Factors Affecting Tm
- GC content: Higher GC = higher Tm (3 H-bonds vs 2, and better stacking)
- Ionic strength: Higher salt stabilizes duplex (shields phosphate repulsion). Each 10-fold increase in [Na+] raises Tm by ~16.6 °C.
- Mismatches: Each 1% mismatch lowers Tm by ~1-1.5 °C
- Formamide: Each 1% formamide lowers Tm by ~0.72 °C (used in hybridization)
- DNA length: Shorter duplexes have lower Tm (end-fraying effects)
- pH: Extremes of pH (below 3 or above 11) cause denaturation by disrupting base pairing
Derivation: Hyperchromicity from Beer-Lambert Law and Electronic Coupling
The 37% increase in UV absorbance upon DNA denaturation (hyperchromic effect) provides the basis for monitoring melting transitions. Here we derive why base stacking suppresses UV absorption.
Step 1: Beer-Lambert Law for individual nucleotides
For a solution of isolated nucleotides at concentration c with path length l:
$$A = \varepsilon c l$$
where $\varepsilon$ is the molar extinction coefficient. At 260 nm, individual nucleotides have $\varepsilon \approx 8,000$--$15,000$ M$^{-1}$cm$^{-1}$.
Step 2: Electronic coupling between stacked bases
When bases are stacked in a duplex, the $\pi \to \pi^*$ transition dipoles of adjacent bases interact through exciton coupling. The interaction Hamiltonian between two stacked chromophores is:
$$H_{12} = \frac{\vec{\mu}_1 \cdot \vec{\mu}_2 - 3(\vec{\mu}_1 \cdot \hat{r})(\vec{\mu}_2 \cdot \hat{r})}{4\pi\varepsilon_0 r^3}$$
where $\vec{\mu}_1, \vec{\mu}_2$ are the transition dipole moments and r is their separation (~3.4 Angstrom).
Step 3: Exciton splitting creates two states
The coupling splits the excited state into two exciton states with energies $E_{\pm} = E_0 \pm H_{12}$. For bases stacked face to face, with their transition dipoles roughly parallel, the coupling also redistributes oscillator strength: the 260 nm band loses intensity, which is transferred to higher-energy transitions in the far UV.
Step 4: The effective extinction coefficient of duplex DNA
The observed extinction coefficient of duplex DNA is reduced (hypochromic) compared to the sum of individual bases:
$$\varepsilon_{duplex} = \varepsilon_{ss} \cdot (1 - h)$$
where h is the hypochromicity factor ($h \approx 0.27$ for B-DNA at 260 nm).
Step 5: Hyperchromicity upon denaturation
When the duplex denatures, stacking is disrupted and the full extinction coefficient is recovered. The relative increase is:
$$\text{Hyperchromicity} = \frac{A_{ss} - A_{ds}}{A_{ds}} = \frac{\varepsilon_{ss} - \varepsilon_{duplex}}{\varepsilon_{duplex}} = \frac{h}{1-h}$$
$$= \frac{0.27}{0.73} \approx 0.37 = 37\%$$
Step 6: Monitoring the melting transition
The fraction of denatured DNA at temperature T can be expressed as:
$$f_d(T) = \frac{A_{260}(T) - A_{260}^{ds}}{A_{260}^{ss} - A_{260}^{ds}}$$
At $T = T_m$, $f_d = 0.5$. The sharpness of the transition (cooperativity) depends on the enthalpy: a two-state transition with $|\Delta H°| = 400$ kJ/mol produces a transition width of only ~10 °C, making Tm measurement precise.
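Steps 5 and 6 translate directly into code. A minimal sketch, with hypothetical function names and made-up absorbance readings:

```python
def hyperchromicity(h):
    """Relative absorbance increase on melting, h / (1 - h)."""
    return h / (1.0 - h)

def fraction_denatured(a_T, a_ds, a_ss):
    """f_d(T) from A260 at temperature T and the two baseline absorbances."""
    return (a_T - a_ds) / (a_ss - a_ds)

increase = hyperchromicity(0.27)
print(f"hyperchromicity for h = 0.27: {increase:.1%}")

# Made-up readings: fully duplex A260 = 1.00, fully melted A260 = 1.37
a_ds, a_ss = 1.00, 1.37
for a in (1.00, 1.185, 1.37):
    print(f"A260 = {a:.3f} -> f_d = {fraction_denatured(a, a_ds, a_ss):.2f}")
```

In practice the duplex and single-strand baselines drift with temperature, so real analyses fit sloping baselines rather than the constant values assumed here.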
Derivation: DNA Persistence Length from the Worm-Like Chain Model
The worm-like chain (WLC) model describes DNA as a continuous flexible rod characterized by its persistence length $L_p$. We derive the mean-square end-to-end distance.
Step 1: Define the tangent-tangent correlation
For a worm-like chain, the correlation between tangent vectors $\hat{t}(s)$ at positions s and s' along the contour decays exponentially:
$$\langle \hat{t}(s) \cdot \hat{t}(s') \rangle = e^{-|s - s'|/L_p}$$
$L_p$ is the persistence length: the characteristic distance over which the chain "remembers" its direction. For B-DNA, $L_p \approx 50$ nm (~150 bp).
Step 2: Write the end-to-end vector as an integral
The end-to-end vector $\vec{R}$ for a chain of contour length L is:
$$\vec{R} = \int_0^L \hat{t}(s) \, ds$$
Step 3: Compute the mean-square end-to-end distance
$$\langle R^2 \rangle = \langle \vec{R} \cdot \vec{R} \rangle = \int_0^L \int_0^L \langle \hat{t}(s) \cdot \hat{t}(s') \rangle \, ds \, ds' = \int_0^L \int_0^L e^{-|s-s'|/L_p} \, ds \, ds'$$
Step 4: Evaluate the double integral
Let $u = s - s'$. Using the symmetry of the integrand and computing:
$$\langle R^2 \rangle = 2L_p \int_0^L \left(1 - e^{-(L-s)/L_p}\right) ds$$
Evaluating this integral yields:
$$\boxed{\langle R^2 \rangle = 2L_p L\left[1 - \frac{L_p}{L}\left(1 - e^{-L/L_p}\right)\right]}$$
Step 5: Verify limiting cases
Rigid rod ($L \ll L_p$): Taylor expanding $e^{-L/L_p} \approx 1 - L/L_p + L^2/(2L_p^2)$ gives $\langle R^2 \rangle \approx L^2$ (as expected for a straight rod).
Flexible coil ($L \gg L_p$): The exponential term vanishes, giving $\langle R^2 \rangle \approx 2L_p L$. This is the random walk result with step size (Kuhn length) $b = 2L_p \approx 100$ nm.
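The closed form and its two limits can be verified numerically. The sketch below compares the boxed expression with a brute-force midpoint-rule estimate of the double integral from Step 3; the function names and the grid size are arbitrary choices.

```python
import math

def wlc_r2(L, Lp):
    """Closed-form <R^2> = 2 Lp L [1 - (Lp/L)(1 - exp(-L/Lp))]."""
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for small x
    return 2.0 * Lp * L * (1.0 - (Lp / L) * (-math.expm1(-L / Lp)))

def wlc_r2_numeric(L, Lp, n=400):
    """Midpoint-rule estimate of the double integral of exp(-|s - s'|/Lp)."""
    h = L / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * h
        for j in range(n):
            t = (j + 0.5) * h
            total += math.exp(-abs(s - t) / Lp)
    return total * h * h

Lp = 50.0  # nm, B-DNA
exact = wlc_r2(100.0, Lp)
approx = wlc_r2_numeric(100.0, Lp)
print(f"L = 100 nm: closed form {exact:.1f} nm^2, numeric {approx:.1f} nm^2")

rod_ratio = wlc_r2(1.0, Lp) / 1.0**2                     # expect ~1 (rigid rod)
coil_ratio = wlc_r2(5000.0, Lp) / (2.0 * Lp * 5000.0)    # expect ~1 (random coil)
print(f"rod limit ratio {rod_ratio:.3f}, coil limit ratio {coil_ratio:.3f}")
```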
Step 6: Biological significance -- DNA bending and nucleosome wrapping
The bending energy for curving DNA to radius R over length L is:
$$E_{bend} = \frac{L_p \cdot k_BT}{2} \cdot \frac{L}{R^2}$$
For nucleosome wrapping (R = 4.2 nm, L = 50 nm, $L_p = 50$ nm): $E_{bend} \approx 70 \, k_BT$. This enormous bending penalty is overcome by ~14 histone-DNA contact points and electrostatic interactions, explaining why nucleosome assembly requires histone chaperones and ATP-dependent remodeling.
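The nucleosome estimate follows from plugging numbers into the bending formula. A minimal sketch (the function name is hypothetical; energies are in units of $k_BT$):

```python
def bending_energy_kT(L_nm, R_nm, Lp_nm=50.0):
    """E_bend / kBT = (Lp / 2) * L / R^2 for uniform bending to radius R."""
    return 0.5 * Lp_nm * L_nm / R_nm**2

e_nucleosome = bending_energy_kT(L_nm=50.0, R_nm=4.2)
print(f"nucleosome wrap: {e_nucleosome:.1f} kBT")

# The 1/R^2 dependence makes tight bends disproportionately costly
for radius in (4.2, 10.0, 50.0):
    print(f"R = {radius:5.1f} nm -> {bending_energy_kT(50.0, radius):6.2f} kBT")
```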
Van't Hoff Analysis
For a two-state melting transition (helix to coil), the enthalpy can be determined from the shape of the melting curve using the van't Hoff equation:
$$\Delta H_{vH} = 4RT_m^2 \left(\frac{df}{dT}\right)_{T_m} \approx \frac{4RT_m^2}{\Delta T}$$
where $(df/dT)_{T_m}$ is the maximum slope of the fraction melted vs. temperature curve and $\Delta T$, its reciprocal, is the width of the transition. The factor 4 applies to a unimolecular transition; for a bimolecular, non-self-complementary duplex it becomes 6. For a true two-state transition, $\Delta H_{vH}$ should agree with calorimetric $\Delta H_{cal}$. The ratio $\Delta H_{vH}/\Delta H_{cal} < 1$ indicates intermediates (non-two-state).
Alternatively, from concentration-dependent Tm measurements (for non-self-complementary duplexes):
$$\frac{1}{T_m} = \frac{R}{\Delta H°}\ln(C_T/4) + \frac{\Delta S°}{\Delta H°}$$
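Plotting 1/Tm vs. ln(CT/4) gives a line with slope R/$\Delta H°$ and intercept $\Delta S°/\Delta H°$. Plotted against ln(CT) instead, the slope is unchanged and the intercept becomes $(\Delta S° - R\ln 4)/\Delta H°$.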
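The concentration-dependence analysis amounts to a straight-line fit. The sketch below generates synthetic Tm values from the equation above, then recovers $\Delta H°$ and $\Delta S°$ by ordinary least squares on 1/Tm vs. ln(CT/4). The input parameters and the function name `fit_vant_hoff` are hypothetical, and real data would of course carry noise.

```python
import math

R = 8.314  # J/(mol K)

def fit_vant_hoff(c_totals, tms):
    """Least-squares line through (ln(CT/4), 1/Tm); returns (dH, dS)."""
    x = [math.log(c / 4.0) for c in c_totals]
    y = [1.0 / t for t in tms]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx              # = R / dH
    intercept = my - slope * mx    # = dS / dH
    dH = R / slope
    return dH, intercept * dH

# Synthetic "measurements" from assumed formation parameters
true_dH, true_dS = -600e3, -1680.0          # J/mol, J/(mol K)
concs = [1e-7, 1e-6, 1e-5, 1e-4]            # total strand concentration, mol/L
tms = [true_dH / (true_dS + R * math.log(c / 4.0)) for c in concs]

fit_dH, fit_dS = fit_vant_hoff(concs, tms)
print(f"recovered dH = {fit_dH / 1e3:.1f} kJ/mol, dS = {fit_dS:.1f} J/(mol K)")
```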
Renaturation Kinetics: Cot Curves
When denatured DNA is slowly cooled, complementary strands reassociate (renature). The kinetics of renaturation follow second-order kinetics:
$$\frac{C}{C_0} = \frac{1}{1 + k_2 C_0 t}$$
where C is the concentration of single-stranded DNA at time t, C₀ is the initial concentration, and k₂ is the second-order rate constant. The product C₀t (pronounced "cot") at which half the DNA has reannealed is called Cot$_{1/2}$:
$$\text{Cot}_{1/2} = \frac{1}{k_2}$$
Cot$_{1/2}$ is proportional to genome complexity (the total length of unique sequence). Plotting the fraction reassociated vs. log(Cot) gives a Cot curve. For eukaryotic genomes, multiple transitions are observed:
- Fast component (low Cot$_{1/2}$): Highly repetitive sequences (satellite DNA, Alu elements). ~10-15% of human genome.
- Intermediate component: Moderately repetitive sequences (rRNA genes, transposable elements). ~25-40%.
- Slow component (high Cot$_{1/2}$): Unique (single-copy) sequences, including most protein-coding genes. ~40-50%.
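A Cot curve for a mixed genome can be sketched as a weighted sum of second-order reassociation curves, one per kinetic component. The component fractions and rate constants below are hypothetical round numbers chosen to separate the three transitions, not measured values.

```python
def fraction_single_stranded(cot, k2):
    """C/C0 = 1 / (1 + k2 * C0t) for ideal second-order reassociation."""
    return 1.0 / (1.0 + k2 * cot)

def cot_half(k2):
    """C0t value at which half of a component has reannealed."""
    return 1.0 / k2

def fraction_reassociated(cot, components):
    """Weighted sum over (genome_fraction, k2) components."""
    return sum(f * (1.0 - fraction_single_stranded(cot, k2))
               for f, k2 in components)

# Hypothetical components: (fraction of genome, k2 in L mol^-1 s^-1)
components = [(0.15, 1e3), (0.35, 1.0), (0.50, 1e-3)]

for exponent in range(-5, 6):
    cot = 10.0 ** exponent
    print(f"log10(Cot) = {exponent:+d}  reassociated = "
          f"{fraction_reassociated(cot, components):.3f}")
```

Plotted against log(Cot), the printed values trace three sigmoidal steps centered near each component's Cot$_{1/2}$ = 1/k₂.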
Chargaff's Rules and Base Pairing
Chargaff's observation that [A] = [T] and [G] = [C] was one of the three critical clues that led to the double helix model. The rules are a direct consequence of the Watson-Crick base pairing:
$$\frac{[\text{A}] + [\text{G}]}{[\text{T}] + [\text{C}]} = 1 \quad \text{(purines = pyrimidines)}$$
Representative genomic GC contents and the corresponding Tm values (in SSC):
- Human: %GC = 40.9, Tm ~ 86 °C
- E. coli: %GC = 50.8, Tm ~ 90 °C
- M. tuberculosis: %GC = 65.6, Tm ~ 96 °C