2.2 The Double Helix

The Watson-Crick Model (1953)

James Watson and Francis Crick proposed the double helix structure of DNA in their landmark one-page paper in Nature (April 25, 1953). They integrated three key lines of evidence:

1. Chargaff's rules: [A] = [T] and [G] = [C], implying complementary base pairing.
2. X-ray diffraction data: Rosalind Franklin's Photo 51 (taken with Raymond Gosling) showed the characteristic "X" pattern of a helix, revealing the 3.4 nm repeat (pitch) and 2.0 nm diameter.
3. Chemical knowledge: Jerry Donohue corrected Watson on the tautomeric forms of the bases (keto, not enol), enabling the correct hydrogen bonding scheme.

"It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material."

-- Watson & Crick, Nature 171, 737-738 (1953)

X-ray Diffraction Evidence: Photo 51

Rosalind Franklin's Photo 51 (May 1952) is one of the most important images in the history of science. The X-ray diffraction pattern of B-form DNA fibers revealed:

What Photo 51 Showed

X-shaped pattern: Diagnostic of a helical structure (predicted by Cochran, Crick, and Vand in 1952)
Layer line spacing: 3.4 nm pitch (distance per complete turn)
Strong 10th layer line: 0.34 nm rise per residue (3.4/10 = 0.34)
Missing 4th layer line: Indicated two strands (not one) offset by 3/8 of the pitch
Diamond-shaped gaps: Major and minor groove widths

Bragg's Law

X-ray diffraction is governed by Bragg's law, relating the spacing of repeating structures (d) to the diffraction angle ($\theta$):

$$n\lambda = 2d\sin\theta$$

where n is the order of diffraction, $\lambda$ is the X-ray wavelength (typically Cu K$\alpha$ = 1.54 Angstrom), and d is the spacing between diffracting planes.
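As a quick numerical check of Bragg's law, the sketch below computes the first-order Bragg angle for the spacings mentioned in this section, assuming Cu Kα radiation. It treats each spacing as a simple lattice spacing. Fiber patterns such as Photo 51 are interpreted with helical diffraction theory, so read the angles as rough magnitudes only. The function names are our own.

```python
import math

CU_K_ALPHA_NM = 0.154  # Cu K-alpha wavelength (1.54 Angstrom), as quoted in the text

def bragg_angle_deg(d_nm, wavelength_nm=CU_K_ALPHA_NM, n=1):
    """Bragg angle theta (degrees) from n*lambda = 2*d*sin(theta)."""
    s = n * wavelength_nm / (2.0 * d_nm)
    if s > 1.0:
        raise ValueError("no reflection: n*lambda exceeds 2d")
    return math.degrees(math.asin(s))

def bragg_spacing_nm(theta_deg, wavelength_nm=CU_K_ALPHA_NM, n=1):
    """Inverse problem: spacing d = n*lambda / (2*sin(theta))."""
    return n * wavelength_nm / (2.0 * math.sin(math.radians(theta_deg)))

# Pitch, diameter and rise per base pair of B-DNA (nm)
for d in (3.4, 2.0, 0.34):
    theta = bragg_angle_deg(d)
    print(f"d = {d:5.2f} nm -> theta = {theta:6.2f} deg, 2theta = {2 * theta:6.2f} deg")
```

Large spacings diffract at small angles: the 0.34 nm base-pair rise gives θ of about 13°, while the 3.4 nm pitch gives about 1.3°.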

Historical note: Franklin was close to solving the structure herself. Her own draft manuscript from early 1953 already described a two-chain helix with the phosphates on the outside, and her paper with Gosling was published in the same issue of Nature as Watson and Crick's. Watson saw Photo 51 (shown to him by Wilkins without Franklin's knowledge), which was critical for the final model. Franklin died of ovarian cancer in 1958 at age 37 and was not included in the 1962 Nobel Prize, which is not awarded posthumously.

Structural Parameters

B-DNA: The Standard Form

Diameter: 2.0 nm (20 Angstrom)
Rise per bp: 0.34 nm (3.4 Angstrom)
Base pairs per turn: 10.5 (in solution; 10.0 in fibers)
Helical pitch: ~3.6 nm in solution (10.5 x 0.34 nm); 3.4 nm in fibers
Twist angle: 34.3 degrees per bp (360/10.5)

Major groove: 22 Angstrom wide, 8.5 Angstrom deep
Minor groove: 12 Angstrom wide, 7.5 Angstrom deep
Handedness: Right-handed
Sugar pucker: C2'-endo
Glycosidic bond: Anti conformation

B-DNA is the predominant form under physiological conditions; in fiber diffraction studies it is obtained at high relative humidity (~92%) and low salt. The contour length of B-DNA follows from the rise per base pair:

$$L = N \times 0.34 \text{ nm}$$

where N is the number of base pairs. For the human genome: $3.2 \times 10^9$ bp $\times$ 0.34 nm ≈ 1.09 m per haploid set.
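The same arithmetic extends to the other helical parameters. The sketch below is a minimal illustration assuming ideal, uniform B-DNA (0.34 nm rise, 10.5 bp per turn by default). `bdna_geometry` is a hypothetical helper name, not part of any library.

```python
RISE_NM = 0.34  # rise per base pair in B-DNA (nm)

def bdna_geometry(n_bp, bp_per_turn=10.5):
    """Contour length (nm), helical turns, pitch (nm) and twist (deg/bp) of ideal B-DNA."""
    length_nm = n_bp * RISE_NM
    turns = n_bp / bp_per_turn
    pitch_nm = bp_per_turn * RISE_NM
    twist_deg = 360.0 / bp_per_turn
    return length_nm, turns, pitch_nm, twist_deg

# Haploid human genome, solution geometry
length_nm, turns, pitch_nm, twist = bdna_geometry(3.2e9)
print(f"Length: {length_nm * 1e-9:.2f} m, turns: {turns:.2e}")
print(f"Pitch: {pitch_nm:.2f} nm, twist: {twist:.1f} deg/bp")

# Fiber geometry (10.0 bp/turn) reproduces the 3.4 nm pitch seen in Photo 51
print(f"Fiber pitch: {bdna_geometry(10, bp_per_turn=10.0)[2]:.2f} nm")
```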

Antiparallel Strands

5'---ATCGATCG---3'
     ||||||||
3'---TAGCTAGC---5'

The two strands run in opposite directions (5' to 3' and 3' to 5'). This antiparallel arrangement is essential for Watson-Crick base pairing geometry and has profound consequences for replication: one strand (leading) is synthesized continuously, while the other (lagging) must be synthesized in discontinuous Okazaki fragments because DNA polymerases can only synthesize in the 5' to 3' direction.
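Because the strands are antiparallel, the partner strand written in the conventional 5' to 3' direction is the reverse complement of the top strand. The sketch below illustrates this with the sequence from the diagram. Both function names are ours.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(seq):
    """Base-by-base complement, written 3'->5' so it lines up under the top strand."""
    return "".join(COMPLEMENT[b] for b in seq.upper())

def reverse_complement(seq):
    """Partner strand read in the conventional 5'->3' direction."""
    return complement(seq)[::-1]

top = "ATCGATCG"
print("5'-" + top + "-3'")
print("3'-" + complement(top) + "-5'")
print("partner strand, 5'->3':", reverse_complement(top))
```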

Watson-Crick Base Pairing Energetics

A-T Base Pair

2 hydrogen bonds (N6-H...O4 and N1...H-N3)
H-bond energy: ~7 kJ/mol (total for the pair)
Propeller twist: ~11 degrees
AT-rich regions are easier to melt (denature)
Found at replication origins (easier strand separation)
TATA box in promoters exploits lower stability

G-C Base Pair

3 hydrogen bonds (O6...H-N4, N1-H...N3, N2-H...O2)
H-bond energy: ~11 kJ/mol (total for the pair)
Propeller twist: ~12 degrees
GC-rich regions are more thermally stable
CpG islands near promoters in mammals
Thermophiles tend to have GC-rich rRNA and tRNA stems (genomic GC content correlates only weakly with growth temperature)

Nearest-Neighbor Model for DNA Stability

DNA duplex stability depends not just on GC content but on the specific sequence of adjacent base pairs ("nearest-neighbor" or "stacking" interactions). The free energy of duplex formation is calculated as a sum of nearest-neighbor parameters:

$$\Delta G°_{37} = \sum_{i=1}^{N-1} \Delta G°_i(\text{n.n.}) + \Delta G°_{\text{init}}$$

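where each $\Delta G°_i$ is the free energy contribution of the i-th nearest-neighbor doublet, and $\Delta G°_{\text{init}}$ accounts for helix initiation. The 10 unique nearest-neighbor parameters (SantaLucia, 1998) differ substantially in stability; representative values for the most and least stable steps are listed at the end of the derivation below.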

Derivation: Nearest-Neighbor Thermodynamics for DNA Stability

The nearest-neighbor (NN) model predicts DNA duplex stability from sequence by summing contributions of each adjacent base pair step. Here we derive the framework.

Step 1: The physical basis -- base stacking dominates

DNA stability arises primarily from stacking interactions between adjacent base pairs, not from hydrogen bonding alone. The free energy of stacking depends on the identity of both the i-th and (i+1)-th base pair, hence the "nearest-neighbor" model.

Step 2: Decompose total free energy into nearest-neighbor steps

For a duplex of N base pairs, there are N-1 nearest-neighbor steps. The total standard free energy of formation is:

$$\Delta G°_{37} = \sum_{i=1}^{N-1} \Delta G°_{i,i+1}(\text{n.n.}) + \Delta G°_{\text{init}}$$

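where $\Delta G°_{\text{init}}$ is the helix initiation penalty, one term per duplex end that depends on the terminal base pair (in the SantaLucia 1998 set, about +4.1 kJ/mol for a terminal G·C pair and +4.3 kJ/mol for a terminal A·T pair).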

Step 3: Similarly decompose enthalpy and entropy

Since $\Delta G° = \Delta H° - T\Delta S°$, we can independently sum:

$$\Delta H°_{total} = \sum_{i=1}^{N-1} \Delta H°_{i,i+1} + \Delta H°_{\text{init}}$$

$$\Delta S°_{total} = \sum_{i=1}^{N-1} \Delta S°_{i,i+1} + \Delta S°_{\text{init}}$$

Step 4: Count the unique nearest-neighbor parameters

With 4 bases, there are $4 \times 4 = 16$ possible nearest-neighbor doublets. However, each step can be read from either strand (e.g., 5'-AG-3'/3'-TC-5' is the same step as 5'-CT-3'/3'-GA-5' read from the other strand), so only 10 unique parameters are needed. SantaLucia (1998) compiled a unified set of these parameters from optical melting experiments on synthetic oligonucleotides.
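The count of 10 can be checked by brute force. The sketch below enumerates all 16 dinucleotides and maps each step to a canonical name, the alphabetically smaller of the dinucleotide and its reverse complement. That naming convention is our own choice for illustration.

```python
from itertools import product

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def canonical_step(dinuc):
    """Canonical name for a nearest-neighbor step.

    5'-XY-3'/3'-X'Y'-5' is the same physical step as the reverse
    complement of XY read from the other strand.
    """
    rc = PAIR[dinuc[1]] + PAIR[dinuc[0]]
    return min(dinuc, rc)

steps = {canonical_step(a + b) for a, b in product("ACGT", repeat=2)}
print(len(steps), sorted(steps))
```

Four steps (AT, TA, GC, CG) are their own reverse complements. The remaining 12 collapse in pairs into 6, giving 4 + 6 = 10.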

Step 5: Compute Tm from the NN parameters

Combining with the two-state Tm equation (derived in the melting-temperature derivation below):

$$T_m = \frac{\sum \Delta H°_{nn} + \Delta H°_{init}}{\sum \Delta S°_{nn} + \Delta S°_{init} + R\ln(C_T/4)}$$

Step 6: Worked example -- 5'-ATCG-3' / 3'-TAGC-5'

Three NN steps: AT/TA, TC/AG, CG/GC (TC/AG is the same step as GA/CT read from the other strand). Using the SantaLucia (1998) unified $\Delta G°_{37}$ values converted to kJ/mol, with one initiation term per terminal base pair (A·T: +4.3; G·C: +4.1 kJ/mol):

$$\Delta G°_{37} = \Delta G°_{\text{AT/TA}} + \Delta G°_{\text{TC/AG}} + \Delta G°_{\text{CG/GC}} + \Delta G°_{\text{init}}$$

$$= (-3.7) + (-5.4) + (-9.1) + (+4.3 + 4.1) = -9.8 \text{ kJ/mol}$$

A more negative $\Delta G°$ means a more stable duplex. This is why GC-rich sequences are more stable: the GC/CG and CG/GC steps have the most favorable stacking free energies.

Representative SantaLucia (1998) values ($\Delta G°_{37}$ in 1 M NaCl, converted to kJ/mol):

Most stable: GC/CG = $\Delta G°_{37} = -9.4$ kJ/mol
Next: CG/GC = $\Delta G°_{37} = -9.1$ kJ/mol
GG/CC = $\Delta G°_{37} = -7.7$ kJ/mol
Less stable: AA/TT = $\Delta G°_{37} = -4.2$ kJ/mol
AT/TA = $\Delta G°_{37} = -3.7$ kJ/mol
TA/AT = $\Delta G°_{37} = -2.4$ kJ/mol (least stable!)
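The nearest-neighbor sum can be sketched in a few lines. This version hard-codes the SantaLucia (1998) unified $\Delta G°_{37}$ values (kcal/mol, 1 M NaCl) as we recall them, so check them against the original table before relying on them. It handles only perfectly matched, non-self-complementary duplexes and omits the symmetry correction. `duplex_dg37` is a hypothetical helper name.

```python
KCAL_TO_KJ = 4.184

# SantaLucia (1998) unified Delta G37 (kcal/mol, 1 M NaCl), keyed by the
# 5'->3' top-strand dinucleotide; each step appears under both readings.
NN_DG37 = {
    "AA": -1.00, "TT": -1.00,
    "AT": -0.88, "TA": -0.58,
    "CA": -1.45, "TG": -1.45,
    "GT": -1.44, "AC": -1.44,
    "CT": -1.28, "AG": -1.28,
    "GA": -1.30, "TC": -1.30,
    "CG": -2.17, "GC": -2.24,
    "GG": -1.84, "CC": -1.84,
}
# Initiation penalty per terminal base pair (kcal/mol)
INIT_DG37 = {"G": 0.98, "C": 0.98, "A": 1.03, "T": 1.03}

def duplex_dg37(seq):
    """Delta G37 (kJ/mol) of a perfectly matched, non-self-complementary duplex."""
    seq = seq.upper()
    dg = sum(NN_DG37[seq[i:i + 2]] for i in range(len(seq) - 1))
    dg += INIT_DG37[seq[0]] + INIT_DG37[seq[-1]]  # one term per duplex end
    return dg * KCAL_TO_KJ

print(f"ATCG: {duplex_dg37('ATCG'):.2f} kJ/mol")
print(f"GCGC-rich 8-mer: {duplex_dg37('GCGGCCGC'):.2f} kJ/mol")
```

For 5'-ATCG-3' this gives about -9.8 kJ/mol. That small value means a 4-bp duplex is barely stable at 37 °C, which is why probes and primers are much longer.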

Major and Minor Grooves

The double helix has two grooves of different widths, arising because the glycosidic bonds of a base pair are not diametrically opposite. The grooves are critical for protein-DNA recognition:

Major Groove (22 Angstrom wide)

More accessible to proteins
Contains a unique pattern of H-bond donors (D) and acceptors (A) for each base pair:
AT: A-D-A-M (M = methyl of thymine)
TA: M-A-D-A
GC: A-A-D-H (H = hydrogen)
CG: H-D-A-A
All 4 base pairs are distinguishable in the major groove
Transcription factors, restriction enzymes, and most DNA-binding proteins read sequence here

Minor Groove (12 Angstrom wide)

Narrower, less accessible
AT and TA are distinguishable from GC and CG, but AT and TA are NOT distinguishable from each other (nor GC from CG)
Contains fewer sequence-specific contacts
Small molecules bind here: netropsin, distamycin, Hoechst 33258 (prefer AT-rich sequences)
TATA-binding protein (TBP) bends DNA via minor groove
Histone-DNA contacts primarily through minor groove

Key insight: The major groove contains sufficient information to uniquely identify each base pair without opening the helix. This is why most sequence-specific DNA-binding proteins (transcription factors, restriction enzymes) interact primarily through the major groove. The minor groove is used when bending or distortion of DNA is required.
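The distinguishability argument can be made concrete by grouping base pairs that present identical patterns. The major-groove patterns below are taken from the list above. The minor-groove patterns (A-H-A for A·T/T·A, A-D-A for G·C/C·G) are the simplified ones commonly drawn in textbooks and are an assumption of this sketch.

```python
# Major-groove patterns from the text (A = acceptor, D = donor,
# M = thymine methyl, H = non-polar hydrogen)
MAJOR = {"A-T": "ADAM", "T-A": "MADA", "G-C": "AADH", "C-G": "HDAA"}
# Simplified minor-groove patterns (assumed textbook version)
MINOR = {"A-T": "AHA", "T-A": "AHA", "G-C": "ADA", "C-G": "ADA"}

def distinguishable_classes(patterns):
    """Group base pairs that present the same chemical pattern to a reader."""
    classes = {}
    for pair, pattern in patterns.items():
        classes.setdefault(pattern, []).append(pair)
    return list(classes.values())

print("major groove:", distinguishable_classes(MAJOR))  # every pair is unique
print("minor groove:", distinguishable_classes(MINOR))  # only AT-type vs GC-type
```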

DNA Melting (Denaturation)

Melting Temperature (Tm) and Hyperchromicity

When double-stranded DNA is heated, the two strands separate (denature or "melt"). This transition can be monitored by UV absorbance at 260 nm: single-stranded DNA absorbs ~37% more UV light than double-stranded DNA (hyperchromic effect), because base stacking in the duplex reduces UV absorption.

The melting temperature (Tm) is the temperature at which 50% of the DNA is denatured. For short oligonucleotides (14-20 bp):

$$T_m = 2(n_{AT}) + 4(n_{GC}) \text{ °C}$$

where $n_{AT}$ and $n_{GC}$ are the number of AT and GC base pairs (Wallace rule, rough approximation).

For longer DNA (>100 bp), the Marmur-Doty equation relates Tm to GC content:

$$T_m = 69.3 + 0.41 \times (\%GC) \text{ °C}$$

(in 1x SSC: 0.15 M NaCl + 0.015 M sodium citrate)
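Both empirical rules are one-liners. The sketch below implements them exactly as written above. The example primer sequence is hypothetical, and neither rule accounts for salt, strand concentration or sequence context.

```python
def wallace_tm(seq):
    """Wallace rule for short oligonucleotides: Tm = 2*(A+T) + 4*(G+C), deg C."""
    s = seq.upper()
    n_at = s.count("A") + s.count("T")
    n_gc = s.count("G") + s.count("C")
    return 2 * n_at + 4 * n_gc

def marmur_doty_tm(gc_percent):
    """Marmur-Doty estimate for long DNA in 1x SSC: Tm = 69.3 + 0.41*(%GC), deg C."""
    return 69.3 + 0.41 * gc_percent

primer = "AGCTTGCATGCCTGCAGG"  # hypothetical 18-mer
print(f"Wallace Tm of primer: {wallace_tm(primer)} deg C")
print(f"Human genomic DNA (40.9% GC): {marmur_doty_tm(40.9):.1f} deg C")
```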

Derivation: DNA Melting Temperature from Two-State Thermodynamics

We derive the relationship between Tm and thermodynamic parameters for a non-self-complementary duplex using the two-state (all-or-nothing) model.

Step 1: Define the two-state equilibrium

For duplex formation from single strands: $\text{A (ss)} + \text{B (ss)} \rightleftharpoons \text{AB (duplex)}$, with $\Delta H°$ and $\Delta S°$ the (negative) standard enthalpy and entropy of duplex formation. The association constant is:

$$K = \frac{[\text{AB}]}{[\text{A}][\text{B}]}$$

Step 2: Express K in terms of total strand concentration

Let $C_T$ = total strand concentration. At the melting temperature, the fraction of strands in duplex form is $\alpha = 1/2$. For non-self-complementary strands with equal concentrations $C_T/2$ each:

$$[\text{A}] = [\text{B}] = \frac{C_T}{4}, \quad [\text{AB}] = \frac{C_T}{4}$$

$$K_{T_m} = \frac{C_T/4}{(C_T/4)(C_T/4)} = \frac{4}{C_T}$$

Step 3: Relate K to the standard free energy

The equilibrium constant relates to thermodynamic parameters through:

$$\Delta G° = \Delta H° - T\Delta S° = -RT\ln K$$

Step 4: At T = Tm, substitute K = 4/CT

$$\Delta H° - T_m \Delta S° = -RT_m \ln\frac{4}{C_T} = RT_m \ln\frac{C_T}{4}$$

Step 5: Solve for Tm

$$\Delta H° = T_m\left(\Delta S° + R\ln\frac{C_T}{4}\right)$$

$$\boxed{T_m = \frac{\Delta H°}{\Delta S° + R\ln(C_T/4)}}$$

Step 6: Rearrange for the 1/Tm linear plot

Taking the reciprocal and separating terms gives the experimentally useful form:

$$\frac{1}{T_m} = \frac{R}{\Delta H°}\ln\frac{C_T}{4} + \frac{\Delta S°}{\Delta H°}$$

Plotting $1/T_m$ vs. $\ln(C_T)$ yields a straight line with slope $R/\Delta H°$. This is the standard method for extracting thermodynamic parameters from concentration-dependent melting experiments. Typical values for a 20-mer: $\Delta H° \approx -150$ kcal/mol ($\approx -630$ kJ/mol) and $\Delta S° \approx -420$ cal/(mol K) ($\approx -1760$ J/(mol K)), which give $T_m \approx 60$ °C at $C_T = 1$ µM.
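The boxed Tm formula and the linear $1/T_m$ plot can be checked together. The sketch below computes Tm at several strand concentrations and fits a line to $1/T_m$ vs. $\ln(C_T/4)$ to recover the inputs. The input enthalpy and entropy are illustrative 20-mer magnitudes (-150 kcal/mol and -420 cal/(mol K), converted to SI), not measured data.

```python
import numpy as np

R = 8.314  # gas constant, J/(mol K)

def two_state_tm(dH, dS, ct):
    """Tm (K) of a non-self-complementary duplex: dH / (dS + R ln(CT/4)).

    dH (J/mol) and dS (J/(mol K)) are for duplex formation (both negative);
    ct is the total strand concentration in mol/L.
    """
    return dH / (dS + R * np.log(ct / 4.0))

dH, dS = -150 * 4184.0, -420 * 4.184          # illustrative values in SI units
cts = np.array([0.5e-6, 1e-6, 2e-6, 5e-6, 10e-6])
tms = two_state_tm(dH, dS, cts)
print("Tm (deg C):", np.round(tms - 273.15, 1))

# 1/Tm = (R/dH) ln(CT/4) + dS/dH, so slope -> dH and intercept -> dS
slope, intercept = np.polyfit(np.log(cts / 4.0), 1.0 / tms, 1)
print(f"fitted dH = {R / slope / 1000:.1f} kJ/mol, dS = {intercept * R / slope:.1f} J/(mol K)")
```

Tm rises with strand concentration, as expected for a bimolecular reaction. The fit recovers the input parameters exactly only because the synthetic data are noise-free.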

Factors Affecting Tm

GC content: Higher GC = higher Tm (3 H-bonds vs 2, and better stacking)
Ionic strength: Higher salt stabilizes the duplex (shields phosphate repulsion). Each 10-fold increase in [Na+] raises Tm by ~16.6 °C.
Mismatches: Each 1% mismatch lowers Tm by ~1-1.5 °C
Formamide: Each 1% formamide lowers Tm by ~0.72 °C (used in hybridization)
DNA length: Shorter duplexes have lower Tm (end-fraying effects)
pH: Extremes of pH (below 3 or above 11) cause denaturation by disrupting base pairing
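The salt, formamide and mismatch rules of thumb above are often combined additively to estimate hybridization conditions. The sketch below does that under the loud assumption that the corrections are independent and linear, which holds only roughly. `adjust_tm` is a hypothetical helper, and the mismatch coefficient defaults to the low end (1.0 °C per %) of the quoted range.

```python
import math

def adjust_tm(tm_ref, na_ref=0.195, na=0.195, formamide_pct=0.0,
              mismatch_pct=0.0, per_mismatch=1.0):
    """Apply the additive rules of thumb to a reference Tm (deg C).

    +16.6 deg C per 10-fold increase in [Na+], -0.72 deg C per % formamide,
    -per_mismatch deg C per % mismatch (1.0-1.5 in the text; 1.0 assumed).
    """
    tm = tm_ref + 16.6 * math.log10(na / na_ref)
    tm -= 0.72 * formamide_pct
    tm -= per_mismatch * mismatch_pct
    return tm

# Human DNA (~86 deg C in SSC) in 50% formamide with 5% mismatch
print(f"{adjust_tm(86.1, formamide_pct=50, mismatch_pct=5):.1f} deg C")
```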

Derivation: Hyperchromicity from Beer-Lambert Law and Electronic Coupling

The 37% increase in UV absorbance upon DNA denaturation (hyperchromic effect) provides the basis for monitoring melting transitions. Here we derive why base stacking suppresses UV absorption.

Step 1: Beer-Lambert Law for individual nucleotides

For a solution of isolated nucleotides at concentration c with path length l:

$$A = \varepsilon c l$$

where $\varepsilon$ is the molar extinction coefficient. At 260 nm, individual nucleotides have $\varepsilon \approx 8,000$--$15,000$ M$^{-1}$cm$^{-1}$.

Step 2: Electronic coupling between stacked bases

When bases are stacked in a duplex, the $\pi \to \pi^*$ transition dipoles of adjacent bases interact through exciton coupling. The interaction Hamiltonian between two stacked chromophores is:

$$H_{12} = \frac{\vec{\mu}_1 \cdot \vec{\mu}_2 - 3(\vec{\mu}_1 \cdot \hat{r})(\vec{\mu}_2 \cdot \hat{r})}{4\pi\varepsilon_0 r^3}$$

where $\vec{\mu}_1, \vec{\mu}_2$ are the transition dipole moments and r is their separation (~3.4 Angstrom).

Step 3: Exciton splitting creates two states

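The coupling splits the degenerate excited state into two exciton states with energies $E_{\pm} = E_0 \pm H_{12}$, and oscillator strength is redistributed rather than lost. For bases stacked face to face, Tinoco's theory of hypochromism attributes the effect to mixing of the 260 nm $\pi \to \pi^*$ transitions with higher-energy (far-UV) transitions of neighboring bases: the 260 nm band loses intensity to the far-UV bands.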

Step 4: The effective extinction coefficient of duplex DNA

The observed extinction coefficient of duplex DNA is reduced (hypochromic) compared to the sum of individual bases:

$$\varepsilon_{duplex} = \varepsilon_{ss} \cdot (1 - h)$$

where h is the hypochromicity factor ($h \approx 0.27$ for B-DNA at 260 nm).

Step 5: Hyperchromicity upon denaturation

When the duplex denatures, stacking is disrupted and the full extinction coefficient is recovered. The relative increase is:

$$\text{Hyperchromicity} = \frac{A_{ss} - A_{ds}}{A_{ds}} = \frac{\varepsilon_{ss} - \varepsilon_{duplex}}{\varepsilon_{duplex}} = \frac{h}{1-h}$$

$$= \frac{0.27}{0.73} \approx 0.37 = 37\%$$

Step 6: Monitoring the melting transition

The fraction of denatured DNA at temperature T can be expressed as:

$$f_d(T) = \frac{A_{260}(T) - A_{260}^{ds}}{A_{260}^{ss} - A_{260}^{ds}}$$

At $T = T_m$, $f_d = 0.5$. The sharpness of the transition (cooperativity) depends on the enthalpy: a two-state transition with $\Delta H° = -400$ kJ/mol produces a transition width of only ~10 C, making Tm measurement precise.
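Steps 5 and 6 translate directly into code. The sketch below computes the hyperchromicity from h, builds a synthetic A260 melting curve with a hypothetical Tm of 70 °C, converts it to $f_d(T)$, and reads Tm back off as the temperature where $f_d = 0.5$. The logistic curve shape and its 2.5 °C width are illustrative assumptions.

```python
import numpy as np

def hyperchromicity(h):
    """Relative A260 increase on melting, h / (1 - h), for hypochromicity h."""
    return h / (1.0 - h)

def fraction_denatured(a260, a_ds, a_ss):
    """f_d(T) = (A260(T) - A260_ds) / (A260_ss - A260_ds)."""
    return (a260 - a_ds) / (a_ss - a_ds)

print(f"hyperchromicity for h = 0.27: {hyperchromicity(0.27):.3f}")

# Synthetic melting curve (hypothetical Tm = 70 deg C, logistic width 2.5 deg C)
temps = np.linspace(40.0, 100.0, 601)
a_ds, a_ss = 1.0, 1.0 + hyperchromicity(0.27)
a260 = a_ds + (a_ss - a_ds) / (1.0 + np.exp(-(temps - 70.0) / 2.5))

f_d = fraction_denatured(a260, a_ds, a_ss)
tm_est = np.interp(0.5, f_d, temps)  # f_d increases monotonically with T
print(f"estimated Tm = {tm_est:.2f} deg C")
```

With real data, $A_{260}^{ds}$ and $A_{260}^{ss}$ are usually taken from linear baselines fitted below and above the transition rather than from constants.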

Derivation: DNA Persistence Length from the Worm-Like Chain Model

The worm-like chain (WLC) model describes DNA as a continuous flexible rod characterized by its persistence length $L_p$. We derive the mean-square end-to-end distance.

Step 1: Define the tangent-tangent correlation

For a worm-like chain, the correlation between tangent vectors $\hat{t}(s)$ at positions s and s' along the contour decays exponentially:

$$\langle \hat{t}(s) \cdot \hat{t}(s') \rangle = e^{-|s - s'|/L_p}$$

$L_p$ is the persistence length: the characteristic distance over which the chain "remembers" its direction. For B-DNA, $L_p \approx 50$ nm (~150 bp).

Step 2: Write the end-to-end vector as an integral

The end-to-end vector $\vec{R}$ for a chain of contour length L is:

$$\vec{R} = \int_0^L \hat{t}(s) \, ds$$

Step 3: Compute the mean-square end-to-end distance

$$\langle R^2 \rangle = \langle \vec{R} \cdot \vec{R} \rangle = \int_0^L \int_0^L \langle \hat{t}(s) \cdot \hat{t}(s') \rangle \, ds \, ds' = \int_0^L \int_0^L e^{-|s-s'|/L_p} \, ds \, ds'$$

Step 4: Evaluate the double integral

The integrand depends only on $|s - s'|$ and is symmetric in s and s', so integrate over $s' > s$ and double the result:

$$\langle R^2 \rangle = 2\int_0^L ds \int_s^L e^{-(s'-s)/L_p} \, ds' = 2L_p \int_0^L \left(1 - e^{-(L-s)/L_p}\right) ds$$

Evaluating this integral yields:

$$\boxed{\langle R^2 \rangle = 2L_p L\left[1 - \frac{L_p}{L}\left(1 - e^{-L/L_p}\right)\right]}$$

Step 5: Verify limiting cases

Rigid rod ($L \ll L_p$): Taylor expanding $e^{-L/L_p} \approx 1 - L/L_p + L^2/(2L_p^2)$ gives $\langle R^2 \rangle \approx L^2$ (as expected for a straight rod).

Flexible coil ($L \gg L_p$): The exponential term vanishes, giving $\langle R^2 \rangle \approx 2L_p L$. This is the random walk result with step size (Kuhn length) $b = 2L_p \approx 100$ nm.
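The closed-form result and both limits can be checked numerically. The sketch below assumes $L_p$ = 50 nm and 0.34 nm per base pair, as in the text. It uses `expm1` so the rigid-rod limit does not lose precision when $L \ll L_p$. The function name is ours.

```python
import numpy as np

def wlc_mean_square(L, Lp=50.0):
    """<R^2> = 2 Lp L [1 - (Lp/L)(1 - exp(-L/Lp))] for a worm-like chain (nm^2)."""
    L = np.asarray(L, dtype=float)
    # -expm1(-x) = 1 - exp(-x), accurate even when L << Lp
    return 2.0 * Lp * L - 2.0 * Lp**2 * (-np.expm1(-L / Lp))

for n_bp in (30, 150, 1500, 15000):
    L = n_bp * 0.34  # contour length (nm)
    r_rms = float(np.sqrt(wlc_mean_square(L)))
    print(f"{n_bp:>6} bp: L = {L:8.1f} nm, R_rms = {r_rms:7.1f} nm, R_rms/L = {r_rms / L:.2f}")
```

Short fragments (30 bp) behave almost like rigid rods ($R_{rms}/L$ close to 1), whereas at 15 kb the chain is a random coil whose extension is only a small fraction of its contour length.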

Step 6: Biological significance -- DNA bending and nucleosome wrapping

The bending energy for curving DNA to radius R over length L is:

$$E_{bend} = \frac{L_p \cdot k_BT}{2} \cdot \frac{L}{R^2}$$

For nucleosome wrapping (R = 4.2 nm, L = 50 nm, $L_p = 50$ nm): $E_{bend} \approx 70 \, k_BT$. This large bending penalty is paid for by ~14 histone-DNA contact points and by electrostatic attraction between the positively charged histone octamer and the DNA phosphates.
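The nucleosome estimate is a single evaluation of $E_{bend}$. The sketch below assumes a uniform circular bend, whereas real nucleosomal DNA bends in a sequence-dependent way. The per-contact figure is simple division, offered only to give a sense of scale.

```python
def bending_energy_kT(L_nm, R_nm, Lp_nm=50.0):
    """Elastic bending energy in units of kT: E = (Lp / 2) * L / R^2 (uniform bend)."""
    return 0.5 * Lp_nm * L_nm / R_nm**2

# Nucleosome: ~50 nm of DNA wrapped at radius 4.2 nm
e_nuc = bending_energy_kT(50.0, 4.2)
print(f"{e_nuc:.1f} kT total, ~{e_nuc / 14:.1f} kT per histone-DNA contact")
```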

Van't Hoff Analysis

For a two-state melting transition (helix to coil), the enthalpy can be determined from the shape of the melting curve using the van't Hoff equation:

$$\Delta H_{vH} = (2 + 2n)\,RT_m^2 \left(\frac{df}{dT}\right)_{T_m}$$

where f is the fraction melted, $(df/dT)_{T_m}$ is the slope of the melting curve at Tm (approximately its maximum slope), and n is the molecularity: the prefactor is 4 for a unimolecular transition (e.g., a hairpin) and 6 for a bimolecular, non-self-complementary duplex. For a true two-state transition, $\Delta H_{vH}$ should agree with the calorimetric $\Delta H_{cal}$; a ratio $\Delta H_{vH}/\Delta H_{cal} < 1$ indicates intermediates (non-two-state melting).

Alternatively, from concentration-dependent Tm measurements (for non-self-complementary duplexes):

$$\frac{1}{T_m} = \frac{R}{\Delta H°}\ln(C_T/4) + \frac{\Delta S°}{\Delta H°}$$

Plotting 1/Tm vs. ln(CT/4) gives a line with slope R/$\Delta H°$ and intercept $\Delta S°/\Delta H°$ (plotting against ln(CT) leaves the slope unchanged and shifts only the intercept).
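The slope method can be tested on synthetic data. The sketch below assumes a unimolecular (n = 1) two-state transition with a hypothetical melting enthalpy of +400 kJ/mol and Tm = 340 K. It generates f(T), differentiates it numerically, and applies $\Delta H_{vH} = 4RT_m^2 (df/dT)_{T_m}$, which is the n = 1 case of the $(2 + 2n)$ form as we understand it. Here the enthalpy is that of melting, so it is positive.

```python
import numpy as np

R = 8.314  # gas constant, J/(mol K)

def two_state_fraction_melted(T, dH_melt, tm):
    """Unimolecular two-state model: K = f/(1-f), ln K = -(dH/R)(1/T - 1/Tm)."""
    K = np.exp(-(dH_melt / R) * (1.0 / T - 1.0 / tm))
    return K / (1.0 + K)

def vant_hoff_enthalpy(T, f, n=1):
    """dH_vH = (2 + 2n) R Tm^2 (df/dT) evaluated at Tm; n is the molecularity."""
    dfdT = np.gradient(f, T)            # numerical derivative of the melting curve
    tm = np.interp(0.5, f, T)           # Tm where half is melted (f increases with T)
    slope_at_tm = np.interp(tm, T, dfdT)
    return (2 + 2 * n) * R * tm**2 * slope_at_tm

T = np.linspace(300.0, 380.0, 2001)               # K
f = two_state_fraction_melted(T, 400e3, 340.0)    # hypothetical hairpin
print(f"van't Hoff enthalpy: {vant_hoff_enthalpy(T, f) / 1000:.0f} kJ/mol")
```

On noisy experimental curves the derivative is usually taken after smoothing or from a fitted two-state model, because numerical differentiation amplifies noise.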

Renaturation Kinetics: Cot Curves

When denatured DNA is slowly cooled, complementary strands reassociate (renature). The kinetics of renaturation follow second-order kinetics:

$$\frac{C}{C_0} = \frac{1}{1 + k_2 C_0 t}$$

where C is the concentration of single-stranded DNA at time t, C₀ is the initial concentration, and k₂ is the second-order rate constant. The product C₀t (pronounced "cot") at which half the DNA has reannealed is called Cot$_{1/2}$:

$$\text{Cot}_{1/2} = \frac{1}{k_2}$$

Cot$_{1/2}$ is proportional to genome complexity (the total length of unique sequence). Plotting the fraction reassociated vs. log(Cot) gives a Cot curve. For eukaryotic genomes, multiple transitions are observed:

Fast component (low Cot$_{1/2}$): Highly repetitive sequences (satellite DNA, Alu elements). ~10-15% of human genome.
Intermediate component: Moderately repetitive sequences (rRNA genes, transposable elements). ~25-40%.
Slow component (high Cot$_{1/2}$): Unique (single-copy) sequences, including most protein-coding genes. ~40-50%.
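A multi-component Cot curve is a weighted sum of second-order curves, one per kinetic class. In the sketch below, the three fractions (12%, 38%, 50%) sit inside the ranges quoted above, but they and the Cot$_{1/2}$ values are illustrative assumptions, not measurements.

```python
import numpy as np

def fraction_single_stranded(cot, cot_half):
    """Second-order renaturation: C/C0 = 1 / (1 + k2*C0*t), with k2 = 1/Cot_1/2."""
    return 1.0 / (1.0 + np.asarray(cot, dtype=float) / cot_half)

# Hypothetical three-component genome: (fraction of genome, Cot_1/2 in mol*s/L)
components = [(0.12, 0.01), (0.38, 5.0), (0.50, 1000.0)]

cot = np.logspace(-3, 5, 9)
reassociated = sum(frac * (1.0 - fraction_single_stranded(cot, ch))
                   for frac, ch in components)
for c, r in zip(cot, reassociated):
    print(f"Cot = {c:9.3g}  fraction reassociated = {r:.2f}")
```

Each component contributes a sigmoid spanning roughly two decades of Cot centred on its own Cot$_{1/2}$. Components whose Cot$_{1/2}$ values are several decades apart therefore appear as separate steps on the log(Cot) axis.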

Chargaff's Rules and Base Pairing

Chargaff's observation that [A] = [T] and [G] = [C] was one of the three critical clues that led to the double helix model. The rules are a direct consequence of the Watson-Crick base pairing:

$$\frac{[\text{A}] + [\text{G}]}{[\text{T}] + [\text{C}]} = 1 \quad \text{(purines = pyrimidines)}$$
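Chargaff's equalities follow automatically once both strands of a perfectly paired duplex are counted. The sketch below demonstrates this for an arbitrary example sequence and also reports %GC, the quantity used in the Tm estimates. The function name is ours.

```python
from collections import Counter

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def duplex_composition(top):
    """Base counts over both strands of a perfectly paired duplex."""
    top = top.upper()
    bottom = "".join(PAIR[b] for b in top)
    return Counter(top + bottom)

counts = duplex_composition("ATGCGCGTTAACCG")  # arbitrary example sequence
purines = counts["A"] + counts["G"]
pyrimidines = counts["T"] + counts["C"]
gc_percent = 100.0 * (counts["G"] + counts["C"]) / sum(counts.values())

print("[A] = [T]:", counts["A"] == counts["T"], " [G] = [C]:", counts["G"] == counts["C"])
print("purines / pyrimidines =", purines / pyrimidines)
print(f"%GC = {gc_percent:.1f}")
```

A single strand on its own need not obey the rules. Here the top strand alone has equal A and T only by coincidence.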

Example genomes (Tm from the Marmur-Doty equation, in SSC):

H: %GC = 40.9, Tm ~ 86 °C
E. coli: %GC = 50.8, Tm ~ 90 °C
M. tuberculosis: %GC = 65.6, Tm ~ 96 °C

Python: DNA Melting Curve and Cot Analysis


Simulate DNA melting profiles and renaturation kinetics


#!/usr/bin/env python3
"""DNA melting curve simulation and Cot analysis."""
import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

# === DNA Melting Curve ===
print("=== DNA Melting Temperature Calculator ===")

def marmur_doty_tm(gc_percent, salt_M=0.195):
    """Marmur-Doty equation for long DNA."""
    # Base Tm in SSC (0.195 M Na+)
    tm = 69.3 + 0.41 * gc_percent
    # Salt correction (von Ahsen, 2001)
    if salt_M != 0.195:
        tm += 16.6 * np.log10(salt_M / 0.195)
    return tm

organisms = {
    'P. falciparum': 19.4,
    'S. cerevisiae': 38.3,
    'Human': 40.9,
    'E. coli': 50.8,
    'T. thermophilus': 69.4,
    'M. tuberculosis': 65.6,
}

print(f"{'Organism':<20} {'%GC':>6} {'Tm (°C)':>10} {'Tm (0.5M NaCl)':>16}")
for org, gc in sorted(organisms.items(), key=lambda x: x[1]):
    tm_ssc = marmur_doty_tm(gc)
    tm_high = marmur_doty_tm(gc, salt_M=0.5)
    print(f"{org:<20} {gc:>6.1f} {tm_ssc:>10.1f} {tm_high:>16.1f}")

# Simulate melting curves
def melting_curve(temp, tm, width=5.0):
    """Sigmoid melting curve: fraction denatured vs temperature."""
    return 1 / (1 + np.exp(-(temp - tm) / width))

temps = np.linspace(50, 110, 500)

fig, axes = plt.subplots(1, 3, figsize=(18, 5))
fig.patch.set_facecolor('#0f172a')

# Plot 1: Melting curves for different GC contents
ax1 = axes[0]
ax1.set_facecolor('#0f172a')
colors = ['#ef4444', '#f59e0b', '#22c55e', '#3b82f6', '#8b5cf6', '#ec4899']
for (org, gc), color in zip(sorted(organisms.items(), key=lambda x: x[1]),
                             colors):
    tm = marmur_doty_tm(gc)
    frac = melting_curve(temps, tm)
    # A260 increases ~37% upon denaturation
    a260 = 1.0 + 0.37 * frac
    ax1.plot(temps, a260, color=color, linewidth=2,
             label=f'{org} ({gc:.0f}%GC)')

ax1.set_xlabel('Temperature (°C)', color='white')
ax1.set_ylabel('Relative A₂₆₀', color='white')
ax1.set_title('DNA Melting Curves (Hyperchromicity)',
              color='white', fontweight='bold')
legend1 = ax1.legend(fontsize=7, facecolor='#1e293b', edgecolor='gray')
for t in legend1.get_texts():
    t.set_color('white')
ax1.tick_params(colors='white')
for spine in ax1.spines.values():
    spine.set_color('gray')

# Plot 2: Tm vs GC content
ax2 = axes[1]
ax2.set_facecolor('#0f172a')
gc_range = np.linspace(15, 75, 100)
tm_vals = [marmur_doty_tm(gc) for gc in gc_range]
ax2.plot(gc_range, tm_vals, color='#2dd4bf', linewidth=2)
for (org, gc), color in zip(sorted(organisms.items(), key=lambda x: x[1]),
                             colors):
    tm = marmur_doty_tm(gc)
    ax2.scatter(gc, tm, color=color, s=80, zorder=5, edgecolors='white')
    ax2.annotate(org, (gc, tm), textcoords="offset points",
                 xytext=(5, 8), fontsize=6, color='white')

ax2.set_xlabel('GC Content (%)', color='white')
ax2.set_ylabel('Tm (°C)', color='white')
ax2.set_title('Tm vs GC Content (Marmur-Doty)',
              color='white', fontweight='bold')
ax2.tick_params(colors='white')
for spine in ax2.spines.values():
    spine.set_color('gray')

# Plot 3: Cot curves
ax3 = axes[2]
ax3.set_facecolor('#0f172a')

def cot_curve(cot, cot_half):
    """Fraction reassociated = 1 - 1/(1 + k2*C0*t)."""
    return 1 - 1 / (1 + cot / cot_half)

cot_values = np.logspace(-3, 5, 1000)

# Eukaryotic genome components
# Highly repetitive (fast), moderate, unique (slow)
frac_fast = 0.12   # 12% highly repetitive
frac_mod = 0.30    # 30% moderately repetitive
frac_slow = 0.58   # 58% unique

cot_half_fast = 0.01
cot_half_mod = 5.0
cot_half_slow = 1000.0

# Total reassociation
total = (frac_fast * cot_curve(cot_values, cot_half_fast) +
         frac_mod * cot_curve(cot_values, cot_half_mod) +
         frac_slow * cot_curve(cot_values, cot_half_slow))

ax3.semilogx(cot_values, total, color='#f472b6', linewidth=2.5,
             label='Human genome (total)')
ax3.semilogx(cot_values, frac_fast * cot_curve(cot_values, cot_half_fast),
             color='#ef4444', linewidth=1, linestyle='--',
             label=f'Repetitive ({frac_fast*100:.0f}%)')
ax3.semilogx(cot_values, frac_fast + frac_mod * cot_curve(cot_values, cot_half_mod),
             color='#f59e0b', linewidth=1, linestyle='--', alpha=0.5)
ax3.axhline(y=0.5, color='white', linestyle=':', alpha=0.3)

# E. coli (single component)
ecoli_cot = cot_curve(cot_values, 4.0)
ax3.semilogx(cot_values, ecoli_cot, color='#22c55e', linewidth=2,
             label='E. coli (single-copy)')

ax3.set_xlabel('Cot (mol·s/L)', color='white')
ax3.set_ylabel('Fraction Reassociated', color='white')
ax3.set_title('Cot Curves: Renaturation Kinetics',
              color='white', fontweight='bold')
legend3 = ax3.legend(fontsize=7, facecolor='#1e293b', edgecolor='gray',
                     loc='upper left')
for t in legend3.get_texts():
    t.set_color('white')
ax3.tick_params(colors='white')
for spine in ax3.spines.values():
    spine.set_color('gray')
ax3.set_ylim(-0.05, 1.05)

plt.tight_layout()
plt.savefig('output.png', dpi=150, bbox_inches='tight', facecolor='#0f172a')

print("\n=== Cot Analysis Summary ===")
print(f"E. coli: Cot_1/2 = 4.0 (genome = 4.6 Mb, single component)")
print(f"Human genome components:")
print(f"  Highly repetitive: {frac_fast*100:.0f}%, Cot_1/2 = {cot_half_fast}")
print(f"  Moderately repetitive: {frac_mod*100:.0f}%, Cot_1/2 = {cot_half_mod}")
print(f"  Unique sequences: {frac_slow*100:.0f}%, Cot_1/2 = {cot_half_slow}")
print("\nPlots saved: melting curves, Tm vs GC, and Cot curves.")
