2.1 Nucleotide Structure
The Building Blocks of Nucleic Acids
Nucleotides are the monomers that make up nucleic acids. Each nucleotide consists of three components: a nitrogenous base, a five-carbon (pentose) sugar, and one or more phosphate groups. A nucleoside lacks the phosphate group (base + sugar only). Understanding nucleotide chemistry is essential for comprehending DNA/RNA structure, replication, transcription, and many pharmacological applications.
Naming convention: Nucleoside = base + sugar (e.g., adenosine, deoxyadenosine). Nucleotide = nucleoside + phosphate (e.g., adenosine 5'-monophosphate = AMP). The prefix "deoxy-" indicates deoxyribose sugar (DNA); without it, ribose sugar (RNA) is implied.
Nitrogenous Bases: Purine and Pyrimidine Chemistry
Purines (Bicyclic, Two-Ring System)
Purines consist of a six-membered pyrimidine ring fused to a five-membered imidazole ring. The parent compound purine has the molecular formula C₅H₄N₄. The two biologically important purines are:
Adenine (A) -- 6-aminopurine
- Formula: C₅H₅N₅
- MW: 135.13 g/mol
- Has an amino group (-NH₂) at position 6
- Pairs with T (DNA) or U (RNA) via 2 H-bonds
- pKa: 4.2 (N1 protonation)
- $\lambda_{\max}$ = 260 nm ($\varepsilon$ = 15,400 M$^{-1}$cm$^{-1}$)
- Found in ATP, NAD+, FAD, coenzyme A, SAM
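The molar absorptivity listed above can be turned into a concentration estimate through the Beer-Lambert law, A = εcl. The following is a minimal sketch, assuming a 1 cm path length and a hypothetical absorbance reading; the function name is illustrative and not part of any library.

```python
def concentration_from_a260(absorbance, epsilon=15400.0, path_cm=1.0):
    """Return molar concentration from A = epsilon * c * l (Beer-Lambert law).

    epsilon defaults to the 260 nm value quoted in the text (M^-1 cm^-1).
    """
    if epsilon <= 0 or path_cm <= 0:
        raise ValueError("epsilon and path length must be positive")
    return absorbance / (epsilon * path_cm)

# Hypothetical reading: A260 = 0.77 in a 1 cm cuvette
conc = concentration_from_a260(0.77)
print(f"{conc * 1e6:.1f} uM")
```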
Guanine (G) -- 2-amino-6-oxopurine
- Formula: C₅H₅N₅O
- MW: 151.13 g/mol
- Has a carbonyl (C=O) at position 6 and amino (-NH₂) at position 2
- Pairs with C via 3 H-bonds (strongest Watson-Crick pair)
- pKa: 3.3 (N7 protonation), 9.2 (N1 deprotonation)
- $\lambda_{\max}$ = 246 nm, 275 nm
- Found in GTP (signal transduction, G-proteins), cGMP
Purine numbering: Atoms in the six-membered ring are numbered 1-6 (N1, C2, N3, C4, C5, C6), and in the five-membered ring: N7, C8, N9. The glycosidic bond connects N9 to the C1' of the sugar. Purines can adopt syn or anti conformations about this bond.
Pyrimidines (Monocyclic, Single-Ring System)
Pyrimidines consist of a single six-membered ring containing two nitrogen atoms (at positions 1 and 3). The parent compound pyrimidine has the molecular formula C₄H₄N₂. Three biologically important pyrimidines:
Cytosine (C)
- 4-amino-2-oxopyrimidine
- C₄H₅N₃O, MW: 111.10
- Pairs with G (3 H-bonds)
- pKa: 4.5 (N3 protonation)
- Can be methylated to 5-methylcytosine (epigenetic mark)
- Prone to deamination → uracil (mutagenic!)
Thymine (T)
- 5-methyluracil (DNA only)
- C₅H₆N₂O₂, MW: 126.11
- Pairs with A (2 H-bonds)
- pKa: 9.9 (N3 deprotonation)
- 5-methyl group distinguishes T from U
- UV damage: thymine dimers (T=T)
Uracil (U)
- 2,4-dioxopyrimidine (RNA only)
- C₄H₄N₂O₂, MW: 112.09
- Pairs with A (2 H-bonds)
- pKa: 9.2 (N3 deprotonation)
- DNA uses T instead of U so that uracil arising from cytosine deamination can be recognized as damage
- Uracil in DNA is removed by uracil-DNA glycosylase
Pyrimidine numbering: N1, C2, N3, C4, C5, C6. The glycosidic bond connects N1 to C1' of the sugar. Pyrimidines almost exclusively adopt the anti conformation in double-helical DNA.
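The molecular weights quoted for the five bases follow directly from their formulas. As a quick consistency check, the sketch below sums standard atomic masses (rounded to three decimals) for each formula listed in this section; the helper name and the dictionary layout are illustrative choices.

```python
import re

# Standard atomic masses (g/mol), rounded to three decimals
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

# Molecular formulas as listed in this section
BASE_FORMULAS = {
    "adenine": "C5H5N5",
    "guanine": "C5H5N5O",
    "cytosine": "C4H5N3O",
    "thymine": "C5H6N2O2",
    "uracil": "C4H4N2O2",
}

def molecular_weight(formula):
    """Sum atomic masses for a simple formula such as 'C5H5N5O'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        # A missing count means a single atom of that element
        total += ATOMIC_MASS[element] * (int(count) if count else 1)
    return total

for name, formula in BASE_FORMULAS.items():
    print(f"{name:9s} {formula:9s} {molecular_weight(formula):.2f} g/mol")
```

The parser handles only flat formulas without parentheses, which is sufficient for the bases here.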
Tautomeric Forms
Bases can exist in different tautomeric forms. The dominant forms at physiological pH are the amino (not imino) and keto (not enol) forms. Rare tautomers can cause mispairing during replication, leading to mutations:
Amino/Imino Tautomerism
Adenine and cytosine: the rare imino form allows A to mispair with C, and C to mispair with A. The equilibrium constant for tautomerization is approximately:
$$K_{\text{taut}} \approx 10^{-4} \text{ to } 10^{-5}$$
Keto/Enol Tautomerism
Guanine and thymine: the rare enol form allows G to mispair with T, and T to mispair with G. This contributes to the spontaneous mutation rate of approximately:
$$\mu \approx 10^{-8} \text{ to } 10^{-10} \text{ per bp per replication}$$
Pentose Sugar and Sugar Pucker
Deoxyribose (DNA)
2-deoxy-D-ribose. H at 2' position. The absence of the 2'-OH makes DNA more chemically stable than RNA (resistant to alkaline hydrolysis).
This chemical stability is a commonly cited reason that DNA, rather than RNA, serves as the long-term genetic storage molecule.
Ribose (RNA)
D-ribose. OH at 2' position. The 2'-OH enables RNA to catalyze reactions (ribozymes) and makes RNA susceptible to alkaline hydrolysis.
Half-life of RNA in 0.1 M NaOH at 37 °C: minutes. DNA: essentially indefinite.
Sugar Pucker Conformations
The five-membered furanose ring is not planar -- one atom is displaced from the plane of the other four. This "puckering" critically affects the overall geometry of the nucleic acid helix:
C2'-endo (South, S-type)
- C2' atom displaced toward the same side as C5' and the base
- Distance between adjacent phosphates: ~7.0 Angstrom
- Predominant in B-DNA (the standard form in cells)
- Rise per bp: 3.4 Angstrom, 10.5 bp per turn
- Phase angle (P): ~140--185 degrees
C3'-endo (North, N-type)
- C3' atom displaced toward the same side as C5' and the base
- Distance between adjacent phosphates: ~5.9 Angstrom
- Predominant in A-DNA and RNA duplexes
- Rise per bp: 2.6 Angstrom, 11 bp per turn
- Phase angle (P): ~0--36 degrees
The 2'-OH of ribose sterically favors the C3'-endo conformation, which is why RNA duplexes adopt A-form geometry rather than B-form. This difference in sugar pucker is the fundamental structural reason for the distinct helical forms of DNA and RNA.
Derivation: Sugar Pucker Energetics -- Pseudorotation Analysis
The five-membered furanose ring cannot be planar due to eclipsing strain. Its puckering is described by the pseudorotation formalism of Altona and Sundaralingam (1972).
Step 1: Define the endocyclic torsion angles
The five-membered ring has five endocyclic torsion angles $\nu_0, \nu_1, \nu_2, \nu_3, \nu_4$, defined about the ring bonds O4'-C1', C1'-C2', C2'-C3', C3'-C4', and C4'-O4', respectively. For a planar ring, all $\nu_j = 0$, but this incurs maximum eclipsing strain (~40 kJ/mol).
Step 2: Parameterize puckering with pseudorotation phase angle P and amplitude
The torsion angles are expressed as a function of two parameters -- the pseudorotation phase angle P and the puckering amplitude $\nu_{max}$:
$$\nu_j = \nu_{max} \cos\left(P + \frac{4\pi (j - 2)}{5}\right), \quad j = 0, 1, 2, 3, 4 \quad \text{(so that } \nu_2 = \nu_{max}\cos P\text{)}$$
Step 3: Calculate P from the torsion angles
$$\tan P = \frac{(\nu_4 + \nu_1) - (\nu_3 + \nu_0)}{2\nu_2(\sin 36° + \sin 72°)}$$
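P ranges from 0 to 360 degrees and specifies which atom is maximally displaced from the mean plane. Because the arctangent only returns values between -90 and +90 degrees, 180 degrees is added to P when $\nu_2$ is negative.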
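Steps 2 and 3 can be checked numerically with a round trip: generate the five torsions from a chosen P and amplitude, then recover P from the tangent formula. The sketch below assumes the convention in which $\nu_2 = \nu_{max}\cos P$ (each successive torsion shifted by 144 degrees), and an amplitude of 38 degrees picked as a typical illustrative value; the function names are hypothetical.

```python
import math

# Constant from the tan P formula: sin 36 deg + sin 72 deg
S = math.sin(math.radians(36)) + math.sin(math.radians(72))

def torsions_from_pucker(P_deg, nu_max):
    """Return [nu0..nu4] in degrees, with nu_j = nu_max * cos(P + 144*(j - 2))."""
    return [nu_max * math.cos(math.radians(P_deg + 144.0 * (j - 2)))
            for j in range(5)]

def pucker_from_torsions(nu):
    """Recover (P in degrees, nu_max) from the five endocyclic torsions."""
    nu0, nu1, nu2, nu3, nu4 = nu
    sin_part = ((nu4 + nu1) - (nu3 + nu0)) / (2.0 * S)  # equals nu_max * sin(P)
    cos_part = nu2                                       # equals nu_max * cos(P)
    # atan2 picks the correct quadrant, so no manual 180 degree correction is needed
    P = math.degrees(math.atan2(sin_part, cos_part)) % 360.0
    return P, math.hypot(sin_part, cos_part)

for P_in in (18.0, 162.0):
    nu = torsions_from_pucker(P_in, 38.0)
    P_out, amp = pucker_from_torsions(nu)
    print(f"P_in = {P_in:5.1f}  P_out = {P_out:5.1f}  nu_max = {amp:4.1f}")
```

Using `atan2` on the numerator and denominator of the tangent formula avoids the quadrant ambiguity of a bare arctangent.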
Step 4: Map P to the conformational wheel
The pseudorotation wheel maps P to sugar pucker conformations. Key positions: C3'-endo (North): P = 0-36 degrees. C2'-endo (South): P = 144-180 degrees. These two minima are separated by energy barriers at the O4'-endo (East, P ~ 90 degrees) and O4'-exo (West, P ~ 270 degrees) conformations.
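The mapping from P to a region of the wheel can be written as a small lookup. This is a coarse sketch that uses only the ranges quoted above and lumps everything else into the East or West side of the wheel; finer subdivisions of the wheel are deliberately left out, and the function name is illustrative.

```python
def pucker_region(P_deg):
    """Map a pseudorotation phase angle (degrees) to the regions named in the text."""
    P = P_deg % 360.0
    if 0.0 <= P <= 36.0:
        return "C3'-endo (North)"
    if 144.0 <= P <= 180.0:
        return "C2'-endo (South)"
    if 36.0 < P < 144.0:
        return "East side (barrier near O4'-endo, P ~ 90)"
    return "West side (barrier near O4'-exo, P ~ 270)"

for P in (18, 90, 162, 270):
    print(P, pucker_region(P))
```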
Step 5: Energy as a function of pseudorotation
The potential energy surface has two minima connected by barriers of ~4-8 kJ/mol:
$$E(P) \approx E_0 + \frac{V_b}{2}\left[1 - \cos\left(\frac{2\pi(P - P_{\min})}{180°}\right)\right]$$
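For deoxyribose: $V_b \approx 4$ kJ/mol, so both N and S conformers are populated at room temperature (thermal energy $k_BT \approx 2.5$ kJ/mol when expressed per mole, i.e. $RT$). The population ratio follows the Boltzmann distribution: $K = e^{-\Delta E/k_BT}$, where $\Delta E$ is the energy difference between the two conformers.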
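The cosine model in Step 5 is easy to evaluate directly. The sketch below assumes $P_{\min} = 162$ degrees (the centre of the C2'-endo range) and $E_0 = 0$; both are illustrative choices. Note that this idealized form places the second minimum exactly 180 degrees from the first and gives both minima the same energy.

```python
import math

def pucker_energy(P_deg, P_min_deg=162.0, V_b=4.0, E0=0.0):
    """Two-fold cosine model: E = E0 + V_b/2 * (1 - cos(2*pi*(P - P_min)/180))."""
    phase = 2.0 * math.pi * (P_deg - P_min_deg) / 180.0
    return E0 + 0.5 * V_b * (1.0 - math.cos(phase))

# Sample the profile at a minimum, halfway up, at the barrier top, and at the second minimum
for P in (162.0, 207.0, 252.0, 342.0):
    print(f"P = {P:5.1f} deg  E = {pucker_energy(P):.2f} kJ/mol")
```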
Step 6: The 2'-OH gauche effect tips the balance in RNA
In ribose, the 2'-OH group creates a strong gauche effect with the 3'-substituent. The gauche interaction energy stabilizes the C3'-endo (North) conformer by ~4-6 kJ/mol relative to C2'-endo. This shifts the equilibrium: deoxyribose is ~60:40 S:N, while ribose is ~90:10 N:S. This is why RNA duplexes adopt A-form geometry (C3'-endo), with its characteristic shorter phosphate-phosphate distance (~5.9 vs ~7.0 Angstrom) and wider, shallower minor groove.
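The populations quoted in Step 6 can be related to energy differences with a two-state Boltzmann model. This is a rough sketch assuming T = 298.15 K and molar energies (so $k_BT$ is evaluated as $RT$); the function names are illustrative.

```python
import math

R = 8.314e-3  # gas constant in kJ/(mol*K)

def two_state_fractions(delta_E, T=298.15):
    """Return (fraction_low, fraction_high) for two states separated by delta_E (kJ/mol)."""
    ratio = math.exp(-delta_E / (R * T))  # population of high state / low state
    low = 1.0 / (1.0 + ratio)
    return low, 1.0 - low

def energy_gap_from_ratio(major, minor, T=298.15):
    """Energy gap (kJ/mol) implied by a major:minor population ratio."""
    return R * T * math.log(major / minor)

north, south = two_state_fractions(5.0)      # ribose, mid-range of the 4-6 kJ/mol estimate
gap_deoxy = energy_gap_from_ratio(60, 40)    # deoxyribose, ~60:40 S:N
print(f"ribose N:S = {100 * north:.0f}:{100 * south:.0f}")
print(f"deoxyribose S-N gap = {gap_deoxy:.1f} kJ/mol")
```

A gap of about 5 kJ/mol gives a North population in the high 80s of percent, in line with the ~90:10 figure, while a 60:40 split corresponds to a gap of only about 1 kJ/mol.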
The Glycosidic Bond
The N-glycosidic bond links the C1' of the sugar to N9 of purines or N1 of pyrimidines. The torsion angle about this bond ($\chi$) determines the relative orientation of the base to the sugar:
Anti Conformation
Base is oriented away from the sugar. This is the standard conformation in B-DNA and A-DNA. $\chi$ = 180 degrees +/- 90 degrees for both purines and pyrimidines.
Syn Conformation
Base is oriented over the sugar. Occurs primarily in purines (less steric hindrance). Required for Z-DNA (left-handed helix). $\chi$ = 0 degrees +/- 90 degrees. Guanine in Z-DNA adopts syn.
Pyrimidines strongly prefer anti due to steric clash between the O2 carbonyl and the sugar in the syn orientation. Purines can more readily adopt syn because their five-membered ring faces the sugar, reducing steric conflict.
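The two angular ranges translate into a simple classifier. The sketch below wraps any input angle into [-180, 180) and assigns the boundary values of exactly +/-90 degrees to anti, which is an arbitrary choice; the example angles are hypothetical.

```python
def glycosidic_conformation(chi_deg):
    """Classify chi as 'anti' (180 +/- 90 deg) or 'syn' (0 +/- 90 deg)."""
    chi = (chi_deg + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
    return "syn" if abs(chi) < 90.0 else "anti"

# Hypothetical torsion angles in degrees
for chi in (-110.0, 60.0, 250.0):
    print(chi, glycosidic_conformation(chi))
```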
Derivation: Glycosidic Bond Rotation -- Anti vs Syn Energy Barriers
The glycosidic torsion angle $\chi$ determines whether a nucleotide adopts the anti or syn conformation. Here we analyze the rotational energy profile and the physical basis for conformational preferences.
Step 1: Define the glycosidic torsion angle $\chi$
For purines: $\chi$ is defined by O4'-C1'-N9-C4. For pyrimidines: $\chi$ is defined by O4'-C1'-N1-C2. The anti conformation has $\chi \approx 180° \pm 90°$ and syn has $\chi \approx 0° \pm 90°$.
Step 2: Model the rotational energy as a Fourier series
The torsional potential about $\chi$ is approximated by:
$$V(\chi) = \frac{V_1}{2}(1 + \cos\chi) + \frac{V_2}{2}(1 - \cos 2\chi) + \frac{V_3}{2}(1 + \cos 3\chi) + V_{nb}(\chi)$$
where $V_1, V_2, V_3$ are torsional barrier heights and $V_{nb}$ captures non-bonded (steric + electrostatic) interactions between the base and sugar.
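To see how the three Fourier terms shape the profile, the torsional part of $V(\chi)$ can be scanned numerically. In this sketch the non-bonded term $V_{nb}$ is omitted and the barrier heights are hypothetical values chosen only to illustrate the functional form, not fitted parameters.

```python
import math

def torsion_potential(chi_deg, V1, V2, V3):
    """Fourier part of V(chi), in the same units as V1..V3; V_nb is omitted."""
    chi = math.radians(chi_deg)
    return (0.5 * V1 * (1.0 + math.cos(chi))
            + 0.5 * V2 * (1.0 - math.cos(2.0 * chi))
            + 0.5 * V3 * (1.0 + math.cos(3.0 * chi)))

# Hypothetical barrier heights in kJ/mol
params = dict(V1=3.0, V2=1.0, V3=0.5)

# Scan chi in 30 degree steps and locate the lowest-energy grid point
profile = {chi: torsion_potential(chi, **params) for chi in range(0, 360, 30)}
chi_min = min(profile, key=profile.get)
print(f"lowest grid point: chi = {chi_min} deg, V = {profile[chi_min]:.2f} kJ/mol")
```

With a positive $V_1$ the one-fold term penalizes $\chi = 0$ (syn) and favors $\chi = 180$ degrees (anti), which is the qualitative behavior described in the text.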
Step 3: Steric analysis for purines vs pyrimidines
In the syn conformation, the six-membered ring of pyrimidines positions the O2 carbonyl directly over the sugar ring at a distance of ~2.5 Angstrom, well within the van der Waals contact distance. The resulting steric repulsion, approximated by the repulsive term of the Lennard-Jones potential, scales as:
$$V_{steric} \propto \left(\frac{\sigma}{r}\right)^{12} \quad \text{(steep repulsion at short range)}$$
For purines in syn, the smaller five-membered ring faces the sugar, reducing the steric clash significantly.
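The steepness of the $r^{-12}$ term is easiest to appreciate as a ratio. The sketch below compares the repulsion at the 2.5 Angstrom contact quoted above with that at an assumed, purely illustrative reference distance of 3.2 Angstrom; because only the ratio is computed, no value of $\sigma$ is needed.

```python
def relative_repulsion(r, r_ref):
    """Ratio of the r**-12 repulsive term at distance r to that at r_ref."""
    return (r_ref / r) ** 12

# 2.5 Angstrom is the contact quoted in the text; 3.2 Angstrom is a hypothetical reference
ratio = relative_repulsion(2.5, 3.2)
print(f"repulsion at 2.5 A is about {ratio:.0f}x that at 3.2 A")
```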
Step 4: Energy barriers from computational studies
The barrier heights for anti $\to$ syn interconversion from molecular mechanics calculations:
Purines (deoxyadenosine): $\Delta E_{anti \to syn}^{\ddagger} \approx 8\text{--}12$ kJ/mol, $\Delta E_{anti-syn} \approx 2\text{--}4$ kJ/mol (anti slightly favored)
Pyrimidines (thymidine): $\Delta E_{anti \to syn}^{\ddagger} \approx 15\text{--}25$ kJ/mol, $\Delta E_{anti-syn} \approx 10\text{--}15$ kJ/mol (anti strongly favored)
Step 5: Population ratio from Boltzmann distribution
The ratio of syn to anti populations at equilibrium (with $\Delta E$ expressed per mole, $k_BT$ is evaluated as $RT \approx 2.49$ kJ/mol at 300 K):
$$\frac{N_{syn}}{N_{anti}} = e^{-\Delta E/k_BT}$$
For purines ($\Delta E \approx 3$ kJ/mol, T = 300 K): ratio $\approx 0.30$ (significant syn population). For pyrimidines ($\Delta E \approx 12$ kJ/mol): ratio $\approx 0.008$ (negligible syn population).
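These two ratios can be reproduced in a few lines. The sketch uses the gas constant R because the energies are molar, and also converts each ratio into the fraction of molecules in syn; the function names are illustrative.

```python
import math

R = 8.314e-3  # kJ/(mol*K); with molar energies, k_B*T becomes R*T

def syn_anti_ratio(delta_E, T=300.0):
    """N_syn / N_anti for an anti-to-syn energy difference delta_E in kJ/mol."""
    return math.exp(-delta_E / (R * T))

def syn_fraction(delta_E, T=300.0):
    """Fraction of all molecules in syn, assuming only two states."""
    ratio = syn_anti_ratio(delta_E, T)
    return ratio / (1.0 + ratio)

purine = syn_anti_ratio(3.0)
pyrimidine = syn_anti_ratio(12.0)
print(f"purine: {purine:.2f}, pyrimidine: {pyrimidine:.3f}")
# prints: purine: 0.30, pyrimidine: 0.008
```

Note that a ratio of 0.30 corresponds to a syn fraction below 0.30, since the fraction is ratio / (1 + ratio).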
Step 6: Biological consequence -- Z-DNA requires syn purines
In Z-DNA, purines adopt the syn conformation while pyrimidines remain anti, creating the alternating syn-anti pattern characteristic of the left-handed helix. This is why Z-DNA forms preferentially in alternating purine-pyrimidine sequences (especially d(CG)$_n$): only purines can readily adopt syn with a modest energy penalty. The free energy cost of Z-DNA formation is approximately:
$$\Delta G_{B \to Z} \approx +2.5 \text{ kJ/mol per bp (in NaCl) to } -1.5 \text{ kJ/mol per bp (in high salt or negative supercoiling)}$$
Phosphodiester Bond Formation
The phosphodiester bond links the 3'-OH of one nucleotide to the 5'-phosphate of the next, creating the sugar-phosphate backbone of DNA and RNA. This is a condensation reaction that releases pyrophosphate (PPi):
$$\text{NMP}_n + \text{NTP} \xrightarrow{\text{polymerase}} \text{NMP}_{n+1} + \text{PP}_i$$
$$\Delta G°' \approx -3.4 \text{ kJ/mol (polymerization)}$$
$$\text{PP}_i + \text{H}_2\text{O} \xrightarrow{\text{pyrophosphatase}} 2\text{P}_i \quad \Delta G°' = -33.5 \text{ kJ/mol}$$
The overall reaction is driven forward by the subsequent hydrolysis of pyrophosphate by inorganic pyrophosphatase, making the net reaction highly exergonic ($\Delta G°' \approx -36.9$ kJ/mol). This "two-step" energy coupling ensures that polymerization is essentially irreversible under cellular conditions.
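The effect of coupling can be made concrete by converting each standard free energy into an equilibrium constant, $K_{eq} = e^{-\Delta G°'/RT}$. This is a minimal sketch assuming T = 298.15 K; the two free energies are the values quoted above.

```python
import math

R = 8.314e-3  # gas constant in kJ/(mol*K)

dG_polymerization = -3.4    # kJ/mol, from the text
dG_ppi_hydrolysis = -33.5   # kJ/mol, from the text

# Free energies of sequential reactions add
dG_net = dG_polymerization + dG_ppi_hydrolysis

def equilibrium_constant(dG, T=298.15):
    """K_eq = exp(-dG / RT) for a standard free energy change in kJ/mol."""
    return math.exp(-dG / (R * T))

K_alone = equilibrium_constant(dG_polymerization)
K_coupled = equilibrium_constant(dG_net)
print(f"net dG = {dG_net:.1f} kJ/mol")
print(f"K_eq alone = {K_alone:.1f}, K_eq coupled = {K_coupled:.2e}")
```

Polymerization alone has an equilibrium constant of only a few, whereas the coupled reaction reaches the order of 10^6, which is the quantitative sense in which pyrophosphate hydrolysis makes the process effectively irreversible.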
Properties of the Phosphodiester Backbone
- Negative charge: Each phosphodiester carries one negative charge at physiological pH (pKa of remaining P-OH ~ 1). This gives DNA/RNA a uniform negative charge density of ~2e$^-$ per base pair.
- Directionality: The backbone has inherent 5' to 3' directionality. By convention, sequences are written 5' → 3'.
- Stability: The phosphodiester bond is kinetically stable (half-life of DNA backbone in water at pH 7: ~30 million years). However, RNA is ~100x less stable due to 2'-OH-mediated hydrolysis.
- Hydrolysis mechanism in RNA: The 2'-OH attacks the adjacent phosphodiester bond in an intramolecular nucleophilic attack, forming a 2',3'-cyclic phosphate intermediate. This is why RNA is labile under basic conditions.
Nucleotide Analogs and Therapeutics
Modified nucleotides are critical tools in molecular biology research and powerful drugs in medicine. They work by mimicking natural nucleotides but disrupting normal nucleic acid function:
Dideoxynucleotides (ddNTPs) -- Sanger Sequencing
Lack both 2'-OH and 3'-OH. When incorporated by DNA polymerase, they terminate the growing chain because no 3'-OH is available for the next phosphodiester bond. Used in Sanger sequencing with fluorescent labels (each ddNTP -- ddATP, ddCTP, ddGTP, ddTTP -- labeled with a different fluorophore for capillary electrophoresis detection).
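The logic of chain termination can be illustrated with an idealized model in which a terminated fragment is produced at every position of the newly synthesized strand. The sketch below groups fragment lengths by the terminating ddNTP and then reads the sequence back by sorting fragments by length, as capillary electrophoresis does; the sequence and function names are hypothetical.

```python
def sanger_fragments(new_strand):
    """Group chain-terminated fragment lengths by the terminating ddNTP.

    new_strand is the sequence synthesized 5'->3' by the polymerase.
    A fragment of length i ends in the dideoxy analog of new_strand[i-1].
    """
    fragments = {"A": [], "C": [], "G": [], "T": []}
    for length, base in enumerate(new_strand, start=1):
        fragments[base].append(length)
    return fragments

def read_sequence(fragments):
    """Rebuild the sequence by ordering fragments from shortest to longest."""
    by_length = {length: base
                 for base, lengths in fragments.items()
                 for length in lengths}
    return "".join(by_length[n] for n in sorted(by_length))

frags = sanger_fragments("GATTACA")  # hypothetical synthesized strand
print(frags)
print(read_sequence(frags))
```

Each fragment length maps to exactly one fluorophore color, so ordering by size recovers the base order of the synthesized strand.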
Azidothymidine (AZT / Zidovudine) -- HIV Treatment
A thymidine analog with an azido group (-N₃) replacing the 3'-OH. First FDA-approved antiretroviral drug (1987). Its triphosphate (AZT-TP) selectively inhibits HIV reverse transcriptase, which binds AZT-TP at least ~100x more tightly than human DNA polymerases do; once incorporated, it terminates the viral DNA chain. $K_i \approx 0.005 \mu$M for HIV-RT vs. $K_i \approx 2 \mu$M for human DNA pol $\alpha$.
Other Important Analogs
- Acyclovir: Guanosine analog lacking most of the sugar ring. Activated by viral thymidine kinase. Treats herpes simplex virus (HSV).
- Remdesivir: Adenosine analog (C-nucleoside). Delayed chain terminator of viral RNA-dependent RNA polymerase. Used against SARS-CoV-2.
- 5-Fluorouracil (5-FU): Uracil analog that inhibits thymidylate synthase, blocking dTMP production. Anticancer drug.
- 6-Mercaptopurine: Purine analog that inhibits de novo purine synthesis. Used in leukemia treatment.
- BrdU (5-Bromodeoxyuridine): Thymidine analog incorporated into DNA. Used to label replicating cells and study cell proliferation.
Nucleotide Metabolism
Cells synthesize nucleotides through two pathways:
De Novo Synthesis
- Purines: Built atom-by-atom on ribose-5-phosphate. Precursors: glycine, glutamine, aspartate, CO₂, N10-formyl-THF. IMP (inosine monophosphate) is the branch point to AMP and GMP. ~6 ATP equivalents per purine.
- Pyrimidines: Ring assembled first (as orotate), then attached to ribose. Precursors: carbamoyl phosphate (from CO₂ + glutamine + 2 ATP) + aspartate. UMP is the precursor to CTP and dTMP.
- Regulation: Feedback inhibition at committed steps. PRPP amidotransferase (purines) inhibited by AMP, GMP. ATCase (pyrimidines) inhibited by CTP.
Salvage Pathways
- Purpose: Recycle free bases and nucleosides from nucleic acid degradation. More energy-efficient than de novo synthesis.
- Key enzymes: HGPRT (hypoxanthine-guanine phosphoribosyltransferase) converts hypoxanthine → IMP and guanine → GMP. APRT converts adenine → AMP.
- Clinical: HGPRT deficiency causes Lesch-Nyhan syndrome (severe gout, self-mutilation, neurological dysfunction) due to purine overproduction.
- dNTP synthesis: Ribonucleotide reductase (RNR) converts NDPs → dNDPs. The rate-limiting step for DNA synthesis. Regulated allosterically to balance dNTP pools.
$$\text{NDP} \xrightarrow{\text{ribonucleotide reductase}} \text{dNDP}$$
RNR uses a radical mechanism. The enzyme has specificity and activity sites that ensure balanced production of all four dNTPs. Imbalanced dNTP pools increase mutation rates.
Chargaff's Rules
Erwin Chargaff (1950) discovered key regularities in DNA base composition:
[A] = [T]
Adenine equals Thymine
[G] = [C]
Guanine equals Cytosine
Implications of Chargaff's Rules
- Total purines = total pyrimidines: [A] + [G] = [T] + [C]
- The ratio [A+T]/[G+C] varies between species (e.g., human = 1.52, E. coli = 0.93)
- The base-pairing equalities ([A] = [T], [G] = [C]) hold exactly only for double-stranded DNA; they are not required for single-stranded DNA or RNA
- Chargaff's second parity rule: the equalities also hold approximately within each single strand of a large genome, which reflects statistical properties of long sequences rather than base pairing
- GC content is often used to characterize genomes: %GC = ([G]+[C]) / ([A]+[T]+[G]+[C]) x 100
This provided key evidence for complementary base pairing in the double helix model proposed by Watson and Crick.
Python: Nucleotide Composition Analysis
Nucleotide Composition & Chargaff's Rules
Analyze DNA base composition and verify Chargaff's rules
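A minimal sketch of such an analysis follows. It counts bases on one strand, builds the duplex composition by adding the reverse complement, and checks the equalities and GC content; the example sequence is hypothetical and the function names are illustrative.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def base_composition(seq):
    """Count A, C, G, T in a sequence (other characters are ignored)."""
    counts = Counter(seq.upper())
    return {base: counts.get(base, 0) for base in "ACGT"}

def gc_percent(counts):
    """%GC = ([G]+[C]) / ([A]+[T]+[G]+[C]) x 100."""
    total = sum(counts.values())
    return 100.0 * (counts["G"] + counts["C"]) / total if total else 0.0

def duplex_composition(strand):
    """Composition of the double helix formed by a strand and its complement."""
    top = base_composition(strand)
    bottom = base_composition(reverse_complement(strand.upper()))
    return {base: top[base] + bottom[base] for base in "ACGT"}

strand = "ATGCGCGATTAACGGCTA"  # hypothetical example sequence
single = base_composition(strand)
duplex = duplex_composition(strand)

print("single strand:", single, f"GC = {gc_percent(single):.1f}%")
print("duplex:       ", duplex, f"GC = {gc_percent(duplex):.1f}%")
print("[A] = [T]:", duplex["A"] == duplex["T"])
print("[G] = [C]:", duplex["G"] == duplex["C"])
```

In the duplex the equalities hold by construction, because every A on one strand pairs with a T on the other; on the single strand they need not hold, and for a sequence this short any near-equality is coincidental.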