2.5 Chromatin Structure
Packaging the Genome
Human DNA (~2 meters per cell, ~6.4 billion bp diploid) must fit into a nucleus (~10 $\mu$m diameter). This requires approximately 10,000-fold linear compaction, achieved through hierarchical packaging into chromatin. The basic repeating unit is the nucleosome, first visualized by electron microscopy by Olins and Olins (1974) as "beads on a string."
The packaging problem in numbers: If the DNA of a single human cell were stretched end-to-end, it would be ~2 meters long. Yet it must fit inside a nucleus with a volume of ~500 $\mu$m$^3$. The total length of DNA in all ~37 trillion cells in a human body (~7 x 10$^{13}$ m) would reach from Earth to the Sun and back roughly 250 times.
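The Earth-to-Sun comparison can be checked with a few lines of arithmetic. The sketch below uses the per-cell length and cell count quoted above; the Earth-Sun distance (1 au, about 1.496 x 10^11 m) is an added assumption. The estimate also ignores that some cells, such as mature red blood cells, carry no nuclear DNA, so it should be read as an upper bound.

```python
# Order-of-magnitude check of the "Earth to the Sun and back" comparison.
DNA_PER_CELL_M = 2.0      # from the text: ~2 m of DNA per diploid cell
CELLS_IN_BODY = 37e12     # from the text: ~37 trillion cells
EARTH_SUN_M = 1.496e11    # ASSUMPTION: mean Earth-Sun distance (1 au) in meters

total_dna_m = DNA_PER_CELL_M * CELLS_IN_BODY
round_trips = total_dna_m / (2 * EARTH_SUN_M)

print(f"Total DNA length: {total_dna_m:.2e} m")
print(f"Earth-Sun round trips: {round_trips:.0f}")
```

Under these inputs the result lands in the low hundreds of round trips, which is the scale the comparison is meant to convey.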
The Nucleosome Core Particle
The high-resolution crystal structure of the nucleosome core particle (NCP) was solved at 2.8 Angstrom by Luger et al. (1997), revealing the detailed architecture of this fundamental unit of chromatin:
DNA Component
- Length: 147 bp of DNA (the canonical value; the 1997 crystal itself used a 146 bp fragment)
- Wrapping: 1.67 left-handed superhelical turns
- Superhelical pitch: 2.39 nm
- NCP diameter: ~11 nm (disk height ~5.5 nm)
- Curvature: DNA is bent ~4.1 degrees per bp on average (1.67 turns x 360 degrees / 147 bp)
- Minor groove contacts: 14 contact points where minor groove faces the histone octamer (every ~10 bp)
- Preferred sequences: AA/TT dinucleotides where minor groove faces inward, GC where it faces outward
Histone Octamer
- Composition: 2 copies each of H2A, H2B, H3, H4
- Assembly: (H3-H4)₂ tetramer + 2 × (H2A-H2B) dimers
- Total mass: ~108 kDa (octamer) + ~100 kDa (DNA)
- Histone fold: 3 $\alpha$-helices connected by 2 loops ($\alpha1$-L1-$\alpha2$-L2-$\alpha3$)
- Handshake motif: H3-H4 and H2A-H2B form heterodimers via this motif
- Net charge: Histones are highly basic (rich in Lys, Arg; pI ~ 11). ~142 positive charges interact with ~294 negative charges on 147 bp DNA
- Histone tails: N-terminal tails (20-35 aa) extend beyond the NCP; sites for post-translational modifications
Energetics of Nucleosome Assembly
Wrapping DNA around the histone octamer involves a balance between the energetic cost of bending DNA and the favorable electrostatic and hydrogen bonding interactions:
$$\Delta G_{\text{wrap}} = \Delta G_{\text{bend}} + \Delta G_{\text{electrostatic}} + \Delta G_{\text{H-bond}}$$
The bending energy cost for wrapping 147 bp of DNA around the octamer is approximately:
$$E_{\text{bend}} = \frac{L_p \cdot k_B T}{2} \int_0^L \kappa^2(s) \, ds \approx 60\text{--}80 \, k_BT$$
where $L_p \approx 50$ nm is the DNA persistence length, $\kappa$ is the local curvature, and $L \approx 50$ nm (147 bp $\times$ 0.34 nm) is the contour length of the wrapped DNA. The net free energy of wrapping is approximately $\Delta G \approx -40 \, k_BT$ (favorable), meaning the electrostatic and contact interactions more than compensate for the bending penalty.
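The bending estimate can be reproduced by treating the wrapped DNA as a uniformly bent rod, so that $\kappa = 1/R$ and the integral reduces to $L_p L / (2R^2)$ in units of $k_BT$. The sketch below does this; the superhelix radius of about 4.18 nm is an assumption added here (it is not stated in the text), and the superhelical pitch is neglected.

```python
# Worm-like-chain bending energy for DNA wrapped on the histone octamer.
L_P_NM = 50.0                 # persistence length (from the text)
RISE_NM = 0.34                # rise per base pair of B-DNA (from the text)
CORE_BP = 147                 # wrapped DNA length (from the text)
SUPERHELIX_RADIUS_NM = 4.18   # ASSUMPTION: radius of the DNA path around the octamer

def bending_energy_kT(n_bp=CORE_BP, radius_nm=SUPERHELIX_RADIUS_NM, l_p_nm=L_P_NM):
    """E_bend / k_BT = (L_p / 2) * kappa^2 * L for constant curvature kappa = 1/R."""
    contour_nm = n_bp * RISE_NM
    curvature = 1.0 / radius_nm
    return 0.5 * l_p_nm * curvature**2 * contour_nm

e_bend = bending_energy_kT()
dg_wrap = -40.0                   # net wrapping free energy quoted in the text (k_BT)
dg_contacts = dg_wrap - e_bend    # implied electrostatic + H-bond contribution

print(f"E_bend ~ {e_bend:.1f} k_BT")
print(f"Implied contact free energy ~ {dg_contacts:.1f} k_BT")
```

With these inputs the bending cost falls inside the quoted 60 to 80 $k_BT$ window, and the favorable contacts must supply more than 100 $k_BT$ to give a net of about $-40 \, k_BT$.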
Nucleosome breathing: Nucleosomes are not static. The DNA spontaneously unwraps and rewraps from the ends on a timescale of ~10-50 ms (site exposure model). This transient accessibility is critical for allowing transcription factors and repair enzymes to access nucleosomal DNA without requiring complete nucleosome disassembly.
Linker DNA and Histone H1
Linker DNA
- Length: ~20-90 bp between nucleosome cores (species/tissue dependent)
- Nucleosome repeat length (NRL): 147 bp (core) + linker
- Yeast: NRL ~ 165 bp (short linker, ~18 bp)
- Typical human somatic cells: NRL ~ 200 bp (linker ~53 bp)
- Sea urchin sperm: NRL ~ 240 bp (long linker ~93 bp)
- More accessible to nucleases than core DNA
Linker Histone H1
- Structure: Globular domain (winged-helix fold) + long C-terminal tail
- Binding: Contacts the nucleosome dyad and ~20 bp of entering/exiting linker DNA
- Stoichiometry: ~1 H1 per nucleosome (varies: 0.5-1.0)
- Function: Stabilizes higher-order chromatin folding
- Constrains entry/exit angle of linker DNA
- 7 somatic variants (H1.1--H1.5, H1.0, H1X) plus germline-specific variants such as H1T, with distinct roles
- Stem-cell chromatin has less H1 (more open)
MNase digestion: Micrococcal nuclease (MNase) preferentially cuts linker DNA. Limited digestion produces a "nucleosome ladder" on gel electrophoresis: a mononucleosome band that is trimmed to ~147 bp on extended digestion, followed by di-, tri-, and higher oligonucleosome bands spaced by roughly one nucleosome repeat length (~160-200 bp, depending on the cell type). This regular ladder was among the original evidence for the repeating nucleosome structure (Noll, 1974).
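The relationship between repeat length, linker length, and ladder band sizes can be sketched as follows. The model is an idealization assumed here: an n-nucleosome fragment is taken to contain n cores plus the n - 1 internal linkers, with the outer linkers fully trimmed. Real digests give broader bands.

```python
# Idealized MNase ladder: n cores plus (n - 1) internal linkers per fragment.
CORE_BP = 147

def linker_length(nrl_bp):
    """Linker length implied by a nucleosome repeat length (NRL)."""
    return nrl_bp - CORE_BP

def ladder_sizes(nrl_bp, max_n=4):
    """Fragment sizes (bp) for mono- to max_n-nucleosomes with trimmed ends."""
    linker = linker_length(nrl_bp)
    return [n * CORE_BP + (n - 1) * linker for n in range(1, max_n + 1)]

# NRL values quoted in the text
for label, nrl in [("yeast", 165), ("NRL 200 bp", 200), ("sea urchin sperm", 240)]:
    print(f"{label:>18s}: linker {linker_length(nrl):3d} bp, ladder {ladder_sizes(nrl)}")
```

Successive bands differ by exactly one NRL in this model, which is why the band spacing on a gel is the usual way to measure the repeat length.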
Levels of Chromatin Compaction
- DNA double helix (2 nm): B-form DNA, 0.34 nm per bp, 10.5 bp per turn.
- 10 nm fiber ("beads on a string"): Nucleosomes connected by linker DNA. Visible in low-salt EM.
- 30 nm fiber: Higher-order folding of the nucleosome array. Existence in vivo debated.
- Chromatin loops: CTCF/cohesin-mediated loops of 10 nm fiber. TADs.
- Interphase chromosome: Chromosome territories in the interphase nucleus.
- Metaphase chromosome: Maximum compaction. Condensin-mediated. Visible by light microscopy.
Compaction ratio: The compaction ratio at each level can be calculated from the ratio of DNA contour length to the physical length of the chromatin fiber. For the nucleosome:
$$\text{Compaction} = \frac{147 \times 0.34 \text{ nm}}{11 \text{ nm}} \approx 4.5\text{-fold}$$
Including linker DNA (NRL ~ 200 bp), the compaction is $200 \times 0.34 / 11 \approx 6.2$-fold per nucleosome.
Derivation: DNA Packing Ratio -- From 2 nm Fiber to Metaphase Chromosome
We systematically derive the ~10,000-fold compaction required to fit 2 meters of DNA into a ~10 $\mu$m nucleus, calculating the contribution at each hierarchical level.
Step 1: Start with naked B-DNA contour length
The human diploid genome has $N = 6.4 \times 10^9$ bp. The contour length of B-DNA is:
$$L_{DNA} = N \times 0.34 \text{ nm} = 6.4 \times 10^9 \times 3.4 \times 10^{-10} \text{ m} \approx 2.18 \text{ m}$$
Diameter: 2 nm. This must fit into a nucleus of ~10 $\mu$m = $10^{-5}$ m diameter.
Step 2: Level 1 -- Nucleosome (beads on a string, 10 nm fiber)
Each nucleosome wraps 147 bp (contour length = $147 \times 0.34 = 50.0$ nm) into a disk of diameter ~11 nm and height ~5.5 nm. With NRL = 200 bp:
$$\text{Compaction}_1 = \frac{200 \times 0.34 \text{ nm}}{11 \text{ nm}} \approx 6.2\text{-fold}$$
Effective length after Level 1: $2.18 \text{ m} / 6.2 \approx 0.35$ m = 35 cm.
Step 3: Level 2 -- Chromatin fiber folding (~40-fold total)
Whether through a 30 nm fiber or irregular folding, the 10 nm fiber is further compacted by an additional ~6-7 fold:
$$\text{Compaction}_2 \approx 6\text{--}7\text{-fold additional}$$
$$\text{Cumulative} = 6.2 \times 7 \approx 40\text{-fold}$$
Effective length: ~5.5 cm. Fiber diameter ~30 nm.
Step 4: Level 3 -- Chromatin loops (~1,000-fold total)
CTCF/cohesin-mediated loops of 50-200 kb bring distant regions together. Each loop compresses ~100 kb of 30 nm fiber into a ~300 nm diameter domain:
$$\text{Compaction}_3 \approx 25\text{-fold additional}$$
$$\text{Cumulative} = 40 \times 25 \approx 1{,}000\text{-fold}$$
Effective length: ~2 mm. This is the interphase chromatin state.
Step 5: Level 4 -- Metaphase condensation (~10,000-fold total)
Condensin complexes further compact the loop domains into the characteristic metaphase chromosome shape. An additional ~10-fold condensation:
$$\text{Total compaction} \approx 1{,}000 \times 10 = 10{,}000\text{-fold}$$
$$L_{metaphase} = \frac{2.18 \text{ m}}{10{,}000} = 218 \, \mu\text{m}$$
Distributed across 46 chromosomes: ~$218/46 \approx 4.7$ $\mu$m average per chromosome (consistent with observed metaphase chromosome lengths of 2-10 $\mu$m).
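Steps 1 through 5 can be chained in a short script. This is a sketch using only the factors quoted above (with the upper value of 7 for fiber folding); because the text rounds the cumulative value at each step, the unrounded product comes out slightly above 10,000.

```python
# Cumulative linear compaction from naked DNA to metaphase chromosome.
GENOME_BP = 6.4e9                # diploid human genome
RISE_NM = 0.34                   # nm per bp of B-DNA
NUCLEOSOME_DIAMETER_NM = 11.0
NRL_BP = 200                     # nucleosome repeat length used in Step 2
N_CHROMOSOMES = 46

contour_m = GENOME_BP * RISE_NM * 1e-9   # Step 1: ~2.18 m

levels = [
    ("10 nm fiber", NRL_BP * RISE_NM / NUCLEOSOME_DIAMETER_NM),  # Step 2
    ("fiber folding", 7.0),                                     # Step 3 (upper value)
    ("chromatin loops", 25.0),                                  # Step 4
    ("metaphase condensation", 10.0),                           # Step 5
]

cumulative = 1.0
for name, factor in levels:
    cumulative *= factor
    length_m = contour_m / cumulative
    print(f"{name:<24s} x{factor:5.1f}  cumulative {cumulative:8.0f}  length {length_m:.2e} m")

per_chromosome_um = contour_m / cumulative / N_CHROMOSOMES * 1e6
print(f"Average metaphase chromosome length: {per_chromosome_um:.1f} um")
```

The average length per chromosome stays within the observed 2-10 $\mu$m range whether or not the intermediate values are rounded.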
Step 6: Volume verification
As a check, the volume of DNA itself is approximately:
$$V_{DNA} = \pi r^2 L = \pi (1 \text{ nm})^2 (2.18 \times 10^9 \text{ nm}) \approx 6.8 \times 10^9 \text{ nm}^3 = 6.8 \, \mu\text{m}^3$$
The nuclear volume is ~$\frac{4}{3}\pi(5)^3 \approx 524$ $\mu$m$^3$. So DNA occupies ~1.3% of the nuclear volume by itself, but with histones, the total chromatin volume is ~10-20% of the nucleus. This leaves space for the nucleoplasm, nuclear bodies, and RNA processing machinery.
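The volume check is simple enough to verify directly. The sketch below models DNA as a cylinder of radius 1 nm and the nucleus as a sphere of radius 5 $\mu$m, exactly as in the step above.

```python
import math

# Volume of the DNA cylinder versus the volume of a spherical nucleus.
GENOME_BP = 6.4e9
RISE_NM = 0.34
DNA_RADIUS_NM = 1.0          # half of the 2 nm duplex diameter
NUCLEUS_RADIUS_UM = 5.0      # 10 um diameter nucleus

contour_nm = GENOME_BP * RISE_NM
v_dna_um3 = math.pi * DNA_RADIUS_NM**2 * contour_nm / 1e9   # 1 um^3 = 1e9 nm^3
v_nucleus_um3 = 4.0 / 3.0 * math.pi * NUCLEUS_RADIUS_UM**3
fraction = v_dna_um3 / v_nucleus_um3

print(f"DNA volume:     {v_dna_um3:.2f} um^3")
print(f"Nuclear volume: {v_nucleus_um3:.0f} um^3")
print(f"DNA fraction:   {100 * fraction:.1f} %")
```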
Derivation: Linking Number Topology -- Lk = Tw + Wr
The linking number equation is one of the most important relationships in DNA biology. Here we derive it from the mathematics of topology applied to the double helix.
Step 1: Define linking number for two closed curves
Consider two closed curves C1 and C2 in 3D space (representing the two strands of a closed circular DNA). The Gauss linking integral defines their linking number:
$$Lk = \frac{1}{4\pi} \oint_{C_1} \oint_{C_2} \frac{(\vec{r}_1 - \vec{r}_2) \cdot (d\vec{r}_1 \times d\vec{r}_2)}{|\vec{r}_1 - \vec{r}_2|^3}$$
This integral counts the algebraic number of times one curve passes through the surface bounded by the other. It is always an integer for closed curves.
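The integer-valued behavior of the Gauss integral can be seen numerically. The following is a minimal sketch that discretizes each closed curve into straight segments and applies a midpoint rule; the two test geometries (a pair of interlocked circles and a pair of separated circles) and the function names are illustrative choices made here, not part of the derivation.

```python
import numpy as np

def segments(points):
    """Midpoints and segment vectors of a closed polygon given as an (n, 3) array."""
    nxt = np.roll(points, -1, axis=0)
    return 0.5 * (points + nxt), nxt - points

def gauss_linking_number(curve_1, curve_2):
    """Midpoint-rule approximation of the Gauss linking integral."""
    r1, dr1 = segments(curve_1)
    r2, dr2 = segments(curve_2)
    diff = r1[:, None, :] - r2[None, :, :]                  # r1 - r2 for every pair
    cross = np.cross(dr1[:, None, :], dr2[None, :, :])      # dr1 x dr2 for every pair
    integrand = np.einsum("ijk,ijk->ij", diff, cross) / np.linalg.norm(diff, axis=2) ** 3
    return integrand.sum() / (4.0 * np.pi)

t = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
zeros = np.zeros_like(t)
circle_a = np.column_stack([np.cos(t), np.sin(t), zeros])          # in the xy-plane
circle_b = np.column_stack([1.0 + np.cos(t), zeros, np.sin(t)])    # threads circle_a
circle_far = circle_b + np.array([5.0, 0.0, 0.0])                  # moved out of reach

lk_linked = gauss_linking_number(circle_a, circle_b)
lk_unlinked = gauss_linking_number(circle_a, circle_far)
print(f"|Lk| interlocked circles: {abs(lk_linked):.3f}")
print(f"|Lk| separated circles:   {abs(lk_unlinked):.3f}")
```

The sign of the result depends on the orientation chosen for each curve, so only the magnitude is printed. Deforming either circle without passing it through the other leaves the value unchanged, which is the topological invariance used later in the derivation.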
Step 2: Decompose into twist and writhe
White (1969) and Fuller (1971) showed that Lk can be decomposed into two geometric contributions. Twist (Tw) measures the winding of one strand around the helical axis, and writhe (Wr) measures the coiling of the helical axis itself in space.
Step 3: Define twist mathematically
Let $\vec{t}(s)$ be the tangent to the helical axis and $\vec{u}(s)$ be a unit vector pointing from the axis to strand 1. Twist is:
$$Tw = \frac{1}{2\pi} \int_0^L \left(\vec{u} \times \frac{d\vec{u}}{ds}\right) \cdot \vec{t} \, ds$$
For relaxed B-DNA: $Tw = N/10.5$ where N is the number of base pairs.
Step 4: Define writhe as a self-linking integral
Writhe is the Gauss integral of the helical axis curve with itself:
$$Wr = \frac{1}{4\pi} \oint \oint \frac{(\vec{r}(s_1) - \vec{r}(s_2)) \cdot \left(\frac{d\vec{r}}{ds_1} \times \frac{d\vec{r}}{ds_2}\right)}{|\vec{r}(s_1) - \vec{r}(s_2)|^3} \, ds_1 \, ds_2$$
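Wr = 0 for a planar curve. For solenoidal (toroidal) coiling of the axis, Wr is positive for right-handed and negative for left-handed coils. Interwound (plectonemic) supercoils have the opposite relationship: negatively supercoiled DNA forms right-handed plectonemes.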
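The writhe integral can be evaluated numerically in the same way as the linking integral, with the curve paired against itself and the self-term excluded. The sketch below is illustrative: the test curve (a coil wound around a torus so that it closes on itself) and its parameters are assumptions chosen here to give a closed solenoid of either handedness.

```python
import numpy as np

def writhe(points):
    """Midpoint-rule approximation of the writhe of a closed polygon (n, 3)."""
    nxt = np.roll(points, -1, axis=0)
    mid, seg = 0.5 * (points + nxt), nxt - points
    diff = mid[:, None, :] - mid[None, :, :]
    cross = np.cross(seg[:, None, :], seg[None, :, :])
    dist3 = np.linalg.norm(diff, axis=2) ** 3
    np.fill_diagonal(dist3, np.inf)          # drop the singular self-pair terms
    return np.sum(np.einsum("ijk,ijk->ij", diff, cross) / dist3) / (4.0 * np.pi)

def torus_coil(turns, handedness, n=600, big_r=3.0, small_r=1.0):
    """Closed solenoid wound `turns` times around a torus; handedness = +1 or -1."""
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    rho = big_r + small_r * np.cos(turns * t)
    z = -handedness * small_r * np.sin(turns * t)
    return np.column_stack([rho * np.cos(t), rho * np.sin(t), z])

wr_circle = writhe(torus_coil(turns=0, handedness=+1))   # planar circle
wr_right = writhe(torus_coil(turns=5, handedness=+1))    # right-handed solenoid
wr_left = writhe(torus_coil(turns=5, handedness=-1))     # left-handed solenoid
print(f"planar circle: {wr_circle:+.3f}")
print(f"right-handed:  {wr_right:+.3f}")
print(f"left-handed:   {wr_left:+.3f}")
```

The two coils are mirror images, so their writhes are equal in magnitude and opposite in sign, while the planar circle gives zero.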
Step 5: White's theorem -- the topological invariance
White's theorem (1969) proves that for any smooth ribbon (two nearby closed curves), the Gauss linking number decomposes exactly as:
$$\boxed{Lk = Tw + Wr}$$
The key insight: Lk is a topological invariant (cannot change without cutting a strand), while Tw and Wr are geometric quantities that can interconvert freely. This means you can change the shape of DNA (converting twist to writhe) without changing Lk.
Step 6: Application to nucleosomal DNA
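Each nucleosome wraps DNA in ~1.67 left-handed toroidal turns. The measured linking number change is nevertheless close to $-1$ per nucleosome rather than $-1.67$ (the "linking number paradox"). One approximate partition into twist and writhe is: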
$$\Delta Lk_{nuc} = \Delta Tw + \Delta Wr \approx (-0.2) + (-0.8) = -1.0$$
For the human genome (~30 million nucleosomes): total $\Delta Lk \approx -30 \times 10^6$. When nucleosomes are removed (e.g., during replication or transcription), this stored negative supercoiling is released, facilitating strand separation. This is a fundamental mechanism for regulating DNA accessibility in chromatin.
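The genome-wide figure follows from dividing the genome by the repeat length. The sketch below uses the NRL of 200 bp from the compaction derivation and $\Delta Lk = -1$ per nucleosome; comparing the total with the relaxed value $Lk_0 = N/10.5$ is an extra step added here to show the relative size of the deficit.

```python
# Linking-number deficit stored in nucleosomes across the diploid genome.
GENOME_BP = 6.4e9
NRL_BP = 200                       # nucleosome repeat length used earlier in the text
BP_PER_TURN = 10.5                 # relaxed B-DNA
DELTA_LK_PER_NUCLEOSOME = -1.0

n_nucleosomes = GENOME_BP / NRL_BP
total_delta_lk = n_nucleosomes * DELTA_LK_PER_NUCLEOSOME
lk0 = GENOME_BP / BP_PER_TURN      # relaxed linking number (Tw = N/10.5, Wr = 0)
relative_change = total_delta_lk / lk0

print(f"Nucleosomes:    {n_nucleosomes:.2e}")
print(f"Total delta Lk: {total_delta_lk:.2e}")
print(f"delta Lk / Lk0: {relative_change:.3f}")
```

The count of 3.2 x 10^7 nucleosomes is the unrounded form of the ~30 million quoted above, and the deficit amounts to about 5% of the relaxed linking number.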
The 30 nm Fiber Debate
The existence and structure of the 30 nm fiber has been one of the most debated topics in chromatin biology. Two main models were proposed:
Solenoid Model (Finch & Klug, 1976)
- Consecutive nucleosomes follow each other helically
- ~6 nucleosomes per turn of the solenoid
- Linker DNA bends between adjacent nucleosomes
- One-start helix (single continuous stack)
- Requires H1 to stabilize
- Supported by: EM of chromatin fibers compacted at moderate salt or in the presence of Mg²⁺
Zigzag/Two-Start Model (Woodcock, 1994)
- Alternating nucleosomes interdigitate in a zigzag
- N and N+2 nucleosomes are adjacent (not N and N+1)
- Linker DNA is relatively straight (crosses the fiber)
- Two-start helix (two intertwined stacks)
- Supported by: tetranucleosome crystal structure (2005)
Current View: No Regular 30 nm Fiber In Vivo?
Cryo-EM of vitreous sections (Eltsov et al., 2008) and electron tomography of chromatin in intact nuclei (Ou et al., 2017, using ChromEMT) found no evidence for regular 30 nm fibers in vivo. Instead, chromatin appears to exist as a disordered, interdigitated 10 nm fiber ("polymer melt" model). The 30 nm fiber may be an in vitro artifact of specific salt and reconstitution conditions. Current models suggest that interphase chromatin is organized as irregular chains of nucleosomes that fold into loops and domains through protein-mediated interactions, not through regular hierarchical coiling.
Chromatin Loops, TADs, and Chromosome Territories
Chromatin Loops: CTCF and Cohesin
The 3D organization of chromatin is largely determined by protein-mediated loops. The key players are:
CTCF (CCCTC-binding factor)
- 11-zinc-finger protein that binds specific DNA motifs
- ~55,000--80,000 binding sites in human genome
- Functions as an insulator protein
- Loop anchors typically have convergent CTCF motifs (→ ←)
- Blocks enhancer-promoter communication when between them
- Orientation-dependent: inverting CTCF sites disrupts loops
Cohesin Complex
- Ring-shaped SMC complex (SMC1, SMC3, RAD21, SA1/2)
- Topologically entraps DNA (can hold two DNA segments)
- Loop extrusion model: cohesin is loaded and extrudes DNA bidirectionally until blocked by convergent CTCF
- ATPase-driven motor activity (~1 kb/s extrusion rate)
- WAPL releases cohesin (turnover time ~20 min)
- Also essential for sister chromatid cohesion in mitosis
Loop Extrusion Model
The current leading model for how chromatin loops form: cohesin is loaded onto chromatin by the NIPBL loader, then actively extrudes DNA through its ring. Extrusion continues bidirectionally (~1 kb/s) until cohesin encounters convergently oriented CTCF sites, which act as barriers. This creates stable loops of ~100 kb--2 Mb. The model explains why CTCF orientation matters (only convergent sites block extrusion) and why loop domains are lost when cohesin or CTCF is depleted (demonstrated by auxin-inducible degron experiments).
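The barrier logic of the model can be illustrated with a deliberately simple one-dimensional sketch. Everything here is hypothetical: the function name, the site coordinates, the assumption that the ~1 kb/s rate is split evenly between the two legs, and the use of the ~20 min WAPL turnover time as a fixed residence time. Each leg moves outward from the loading site and stops only at a CTCF motif pointing into the loop.

```python
# Toy 1D loop extrusion: each leg stops at the first inward-pointing CTCF motif.
def extrude_loop(load_kb, ctcf_sites, leg_speed_kb_s=0.5, residence_s=1200.0):
    """Return (left_kb, right_kb, size_kb) of the loop formed by one cohesin.

    ctcf_sites: iterable of (position_kb, orientation); ">" is a motif pointing
    rightward, "<" a motif pointing leftward. Convergent anchors are "> ... <".
    """
    reach_kb = leg_speed_kb_s * residence_s           # distance a leg covers before release
    left_barriers = [p for p, o in ctcf_sites if o == ">" and p <= load_kb]
    right_barriers = [p for p, o in ctcf_sites if o == "<" and p >= load_kb]
    left = max(left_barriers + [load_kb - reach_kb])   # nearest barrier or release point
    right = min(right_barriers + [load_kb + reach_kb])
    return left, right, right - left

# Hypothetical locus: outer sites are convergent, inner sites point away from the loop
sites = [(100, ">"), (300, "<"), (700, ">"), (900, "<")]
wild_type = extrude_loop(500, sites)

# Inverting the right anchor removes the barrier on that side
inverted_sites = [(100, ">"), (300, "<"), (700, ">"), (900, ">")]
inverted = extrude_loop(500, inverted_sites)

print("wild type:", wild_type)
print("inverted right anchor:", inverted)
```

In the wild-type case both legs pass the inner, outward-pointing sites and stop at the convergent pair, giving an 800 kb loop. With the right anchor inverted, that leg runs on until the cohesin is released, which mirrors the orientation dependence described above.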
Topologically Associating Domains (TADs)
TADs are megabase-scale chromatin domains within which genomic loci interact frequently, with relatively few contacts across TAD boundaries. Discovered by Hi-C (Dixon et al., 2012):
- Size: Median ~880 kb in mammals (range: 100 kb -- 5 Mb)
- Number: ~2,000--4,000 TADs per mammalian genome
- Conservation: TAD boundaries are ~75% conserved between human and mouse
- Boundaries: Enriched in CTCF, cohesin, housekeeping genes, tRNA genes, SINE elements
- Function: Constrain enhancer-promoter interactions. Disruption of TAD boundaries can cause developmental disease (e.g., limb malformations from boundary disruptions at the WNT6/IHH/EPHA4 locus).
A/B compartments: At a larger scale (~1-10 Mb), the genome is partitioned into A compartments (gene-rich, active, euchromatic) and B compartments (gene-poor, inactive, heterochromatic). A compartments preferentially interact with other A compartments, and B with B. Compartmentalization persists even when cohesin is depleted (unlike TADs), suggesting it arises from phase separation or polymer-polymer interactions rather than loop extrusion.
Chromosome Territories
Each chromosome occupies a discrete volume within the interphase nucleus (Cremer & Cremer, 2001). Key features:
- Gene-rich chromosomes (e.g., chr19) tend to be interior
- Gene-poor chromosomes (e.g., chr18) tend to be peripheral
- Active genes often localize at territory surfaces
- Territories have limited intermingling (but some occurs at boundaries)
- Visualized by chromosome painting (FISH with chromosome-specific probes)
- Non-random: certain chromosomes are preferential translocation partners (e.g., BCR-ABL in CML involves chr9 and chr22, which are spatial neighbors)
Euchromatin vs. Heterochromatin
Euchromatin
- Open, decondensed chromatin
- Transcriptionally active (or poised)
- Gene-rich regions
- Early-replicating in S phase
- DNase I hypersensitive
- Marks: H3K4me3, H3K36me3, H3K27ac, acetylated histones
- Low DNA methylation at promoters
- Localized in nuclear interior
- A compartment (Hi-C)
Heterochromatin
- Compact, condensed chromatin
- Transcriptionally silent
- Gene-poor regions
- Late-replicating in S phase
- DNase I resistant
- Marks: H3K9me3 (constitutive), H3K27me3 (facultative)
- High DNA methylation
- Localized at nuclear periphery (lamina-associated)
- B compartment (Hi-C)
Constitutive Heterochromatin
Permanently silenced in all cell types. Found at centromeres, telomeres, and pericentromeric regions. Rich in satellite DNA repeats. Marked by H3K9me3 and HP1 (heterochromatin protein 1). HP1 binds H3K9me3 via its chromodomain and spreads heterochromatin by recruiting the H3K9 methyltransferase SUV39H1.
Facultative Heterochromatin
Silenced in a cell-type-specific manner. Can be reactivated. Marked by H3K27me3 (deposited by Polycomb Repressive Complex 2, PRC2). Examples: inactive X chromosome (Barr body), imprinted genes, tissue-specific silencing. Key for developmental gene regulation.
Histone Modifications: The Histone Code
Post-translational modifications (PTMs) of histone tails regulate chromatin state, gene expression, DNA repair, and replication. The "histone code hypothesis" (Strahl & Allis, 2000) proposes that combinations of modifications are read by effector proteins to direct downstream functions.
Activating Marks
- H3K4me3: Active promoters (written by MLL/SET1 complexes, read by TAF3/BPTF)
- H3K36me3: Active gene bodies (written by SETD2, read by MSH6/DNMT3B)
- H3K27ac: Active enhancers and promoters (written by p300/CBP)
- H3K4me1: Poised/active enhancers (written by MLL3/4)
- H4K16ac: Decondenses chromatin (written by MOF)
- Acetylation (general): Neutralizes positive charge, opens chromatin. Read by bromodomain proteins.
Repressive Marks
- H3K9me3: Constitutive heterochromatin (written by SUV39H1/2, read by HP1)
- H3K27me3: Polycomb silencing (written by PRC2/EZH2, read by PRC1)
- H4K20me3: Heterochromatin, DNA damage response (written by SUV420H1/2)
- H3K9me2: Euchromatic silencing (written by G9a/GLP)
- Deacetylation: Closes chromatin (HDACs remove acetyl groups)
- H2AK119ub1: PRC1-mediated repression (written by RING1B)
Other Important Modifications
- H2A.X-S139ph ($\gamma$H2AX): DNA double-strand break marker (written by ATM/ATR kinases). Spreads over Mb around break sites. Critical for DNA damage response.
- H3K79me2: Transcription elongation (written by DOT1L, the only known histone lysine methyltransferase that lacks a SET domain). Involved in MLL-rearranged leukemia.
- H3.3 and H2A.Z: Histone variants (not PTMs but incorporated by specialized chaperones). H3.3 marks active genes; H2A.Z marks gene regulatory regions.
Writers, readers, erasers: Each modification has enzymes that add it (writers, e.g., HATs, HMTs), remove it (erasers, e.g., HDACs, KDMs), and proteins that recognize it (readers, e.g., bromodomains read acetylation, chromodomains read methylation, Tudor domains read methylation). Disruption of these enzymes is a common feature of cancer: EZH2 mutations in lymphoma, MLL rearrangements in leukemia, p300 mutations in multiple cancers.
Python: Chromatin Compaction and Nucleosome Positioning
Chromatin Compaction Calculator
Calculate DNA compaction at each level and simulate nucleosome positioning.
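A minimal sketch of such a calculation is given below; it is not the page's original code. Nucleosome positioning is modeled as random sequential placement of 147 bp footprints with a minimum linker, which is a purely steric toy model that ignores sequence preferences, remodelers, and H1. The region size, minimum linker, attempt count, and seed are arbitrary assumptions. The mean repeat length from the simulation is then converted into a first-level compaction ratio using the 11 nm nucleosome diameter.

```python
import random

CORE_BP = 147
RISE_NM = 0.34
NUCLEOSOME_DIAMETER_NM = 11.0

def place_nucleosomes(region_bp, min_linker_bp=20, attempts=20000, seed=0):
    """Random sequential placement of non-overlapping nucleosome footprints.

    Returns sorted start coordinates. A candidate is accepted only if it keeps
    at least `min_linker_bp` of free DNA to every nucleosome already placed.
    """
    rng = random.Random(seed)
    spacing = CORE_BP + min_linker_bp
    starts = []
    for _ in range(attempts):
        candidate = rng.randrange(region_bp - CORE_BP + 1)
        if all(abs(candidate - s) >= spacing for s in starts):
            starts.append(candidate)
    return sorted(starts)

REGION_BP = 10_000   # ASSUMPTION: size of the simulated region
starts = place_nucleosomes(REGION_BP)
repeats = [b - a for a, b in zip(starts, starts[1:])]   # start-to-start distances
mean_nrl = sum(repeats) / len(repeats)
compaction = mean_nrl * RISE_NM / NUCLEOSOME_DIAMETER_NM

print(f"Nucleosomes placed: {len(starts)}")
print(f"Mean repeat length: {mean_nrl:.0f} bp")
print(f"10 nm fiber compaction: {compaction:.1f}-fold")
```

Changing `min_linker_bp` shifts the mean repeat length and therefore the compaction ratio, which is the same dependence on NRL used in the compaction derivation.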