Module 0: Origin & Endosymbiosis

Every mitochondrion carries an echo of a lost world: ~1.5–1.8 billion years ago, a free-living α-proteobacterium entered an archaeal host cell and, improbably, survived. Lynn Margulis (1967) argued against decades of orthodoxy that the organelle inside every eukaryotic cell was once a bacterium. We retrace the evidence, from 16S rRNA phylogenies through Asgard-archaea genomics to Lane & Martin’s (2010) energy-per-gene argument for why internalized bioenergetics is the prerequisite of eukaryotic complexity, and we derive the population-genetic signature of the founding event.

Video: Mitochondria — The Mysterious Cellular Parasite

A provocative framing of endosymbiotic origin: the organelle we can no longer live without arrived as something closer to a parasite than a gift. Useful preparation for the Margulis / Lane–Martin arguments that follow.

1. Margulis 1967: The Serial Endosymbiotic Theory

In 1967 Lynn Sagan (later Margulis) published “On the Origin of Mitosing Cells” in the Journal of Theoretical Biology after more than a dozen journal rejections. She proposed that three organelles in the eukaryotic cell—mitochondria, plastids, and (controversially) undulipodia (flagella)—originated as free-living prokaryotes that were engulfed by a host cell and never digested. This is the Serial Endosymbiotic Theory (SET).

The SET was an old idea in new clothes. Andreas Schimper (1883), Konstantin Mereschkowski (1905), and Ivan Wallin (1927) had all proposed bacterial ancestry for plastids or mitochondria; Wallin even claimed to have cultured mitochondria in vitro (he had not). By the mid-20th century the hypothesis had been dismissed as fringe. Margulis revived it, added rigor, and backed it with the newly available electron-microscopy record—double membranes, bacterial-sized ribosomes inside the organelles, a circular DNA molecule with a bacterial G+C content, antibiotic sensitivities that matched bacteria rather than cytoplasm.

The five classical lines of evidence for endosymbiosis:

Double membrane architecture (outer = host phagosomal; inner = bacterial plasma)
Circular DNA genome of bacterial type (reduced to ~16.5 kb in human mitochondria)
70S ribosomes (bacterial) rather than 80S (cytoplasmic)
Sensitivity to prokaryotic antibiotics (chloramphenicol, erythromycin)
Binary fission independent of the cell cycle

SET was fully vindicated by molecular phylogenetics in the 1970s and 1980s. The rise of 16S ribosomal-RNA sequencing in Carl Woese’s laboratory placed mitochondrial rRNA squarely inside the α-proteobacterial clade, with closest affinities to modern Rickettsiales and SAR11-like marine aerobes. The plastid branch landed equally clearly inside the cyanobacteria.

2. Gray 1999: Molecular Phylogenies Pin the Ancestor

Michael Gray’s review in Science (1999) consolidated the 16S rRNA, cytochrome oxidase, and ATP synthase phylogenies. The mitochondrial ancestor branched inside the α-proteobacteria: a single origin and an unambiguous topology, though the precise sister taxon (Rickettsiales? SAR11?) depended on the gene chosen and the outgroup tree. Modern genome-scale phylogenomics (Wang & Wu 2015, Martijn et al. 2018) still points to a deep-branching lineage at or near the base of the α-proteobacteria; an affinity with the free-living marine SAR11 clade of Pelagibacter has been proposed but remains contested.

The phylogenetic signal is preserved in the 13 mitochondrial-encoded proteins of the electron transport chain (cytochrome b, three subunits of Complex IV, seven subunits of Complex I, two subunits of ATP synthase). These are the genes that refused to migrate to the nucleus, for reasons that remain debated (the CoRR hypothesis—co-location for redox regulation—of John Allen 2015 is the leading model). Their bacterial origin is transparent in sequence.

\[ \text{Rickettsia prowazekii genome} \;\leftrightarrow\; \text{human mtDNA}:\; {\sim}70\%\ \text{amino-acid identity for cyt}\,b \]

Andersson et al. (1998, Nature) sequenced R. prowazekii and demonstrated direct homology to mitochondrial OxPhos proteins. Rickettsias are obligate intracellular parasites—a living window into the physiology of the ancestral endosymbiont.

A crucial point from Gray is that mitochondria share a single common ancestor. All extant eukaryotes—animals, plants, fungi, protists—descend from a lineage that already contained mitochondria. The few apparently amitochondriate protists (diplomonads, parabasalids, microsporidia, pelobionts, entamoebids) are not primitive; they are secondarily derived, harboring reduced organelles called hydrogenosomes or mitosomes.

3. The Host: Asgard Archaea and Lokiarchaeota

Margulis left the host cell ambiguous. The question of “who swallowed the bacterium?” was finally resolved in the 2010s by metagenomic sequencing of deep-sea sediment from Loki’s Castle (Spang et al. 2015, Nature; Zaremba-Niedzwiedzka et al. 2017, Nature). The Asgard archaeal superphylum (Lokiarchaeota, Thorarchaeota, Odinarchaeota, Heimdallarchaeota, Helarchaeota) encodes hundreds of eukaryotic signature proteins (ESPs): actin homologs and actin-regulating proteins (profilin, gelsolin-like proteins, Arp2/3 subunits), tubulin relatives, Ras-superfamily GTPases, ubiquitin modifier cascades, ESCRT machinery.

Roger, Muñoz-Gómez & Kamikawa (2017, Curr. Biol.) synthesized the state of the art into a coherent eukaryogenesis model: the host of the proto-mitochondrion was an Asgard archaeon, most likely a close relative of Lokiarchaeum; the nucleus, the endomembrane system, and the cytoskeleton all emerged during the integration with the bacterial symbiont, not before. In 2020 Imachi et al. reported the first cultivated Asgard archaeon, Candidatus Prometheoarchaeum syntrophicum—a small coccoid cell with long membrane protrusions that interact with partner bacteria.

The two-domain (“eocyte”) tree:

Modern phylogenomics places eukaryotes within the archaea, not as a separate domain: Eukarya branches inside Asgardarchaeota. The tree of life has two primary domains (Bacteria and Archaea), and eukaryotes are a secondary, chimeric lineage—an archaeal host that merged with a bacterial endosymbiont.

A consequence: the eukaryotic cell is a chimera. The cytoplasm, ribosomes, replication and transcription machinery trace to archaea; the mitochondria and the phospholipids of the cellular membranes (a bacterial-type ester-linked bilayer replaced the archaeal ether-linked system) trace to bacteria. This transition is possibly the single most unusual evolutionary event in 3.8 billion years of life on Earth.

4. A Single Event: Nasir 2020 and the Uniqueness of Eukaryogenesis

The mitochondrial endosymbiosis happened once, at the stem of all eukaryotes. Although eukaryotes have since diversified into tens of thousands of lineages over 1.5+ billion years, no other bacterial endosymbiosis has produced a new branch of the tree of life with comparable complexity. Why?

Nasir, Forterre & Caetano-Anollés (2020) and Martin, Tielens & Müller (2020) argue that this singularity reflects a genuinely rare evolutionary transition. The prerequisites are stringent:

A host and a symbiont metabolically interlocked (syntrophy): H₂ exchange, or a redox partnership
An engulfment event (phagocytosis or cell-cell fusion) without digestion
Survival of the symbiont through many host divisions: vertical transmission
Gene transfer to the host nucleus—hundreds of independent transfer events
Targeting machinery to return the transferred products to the organelle

Each step has non-trivial probability; their product is small. The fact that plastids arose independently several times (primary endosymbiosis in Archaeplastida; secondary/tertiary in chromalveolates, chlorarachniophytes, euglenoids) suggests that after the invention of eukaryotic cell biology—phagocytosis, endomembrane trafficking, a nucleus to receive transferred genes—endosymbiosis becomes much easier. The first event, starting from a prokaryote, seems to require a singular coincidence.
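The multiplication of step probabilities can be sketched numerically. The per-step values below are hypothetical placeholders chosen only to illustrate how five individually plausible steps compound; they are not estimates from the literature.

```python
import math

# Hypothetical per-step success probabilities (illustrative placeholders,
# NOT measured values); the step names follow the list above.
steps = {
    "syntrophic host-symbiont partnership": 1e-2,
    "engulfment without digestion":         1e-3,
    "stable vertical transmission":         1e-2,
    "gene transfer to the host genome":     1e-1,
    "protein re-targeting machinery":       1e-2,
}

# Treating the steps as independent, the joint probability is the product.
joint = math.prod(steps.values())

for name, p in steps.items():
    print(f"{name:40s} p = {p:.0e}")
print(f"joint probability per attempt: {joint:.1e}")
```

Independence of the steps is itself an assumption of this sketch; correlated steps would change the product.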

5. Lane & Martin 2010: The Energy-per-Gene Argument

Nick Lane and William Martin’s 2010 Nature paper, “The energetics of genome complexity,” asks why prokaryotes have stayed bacteria for 3+ billion years while eukaryotes evolved nuclei, meiosis, sex, mitosis, multicellularity, brains. Their answer: bioenergetic constraint.

A bacterium runs its electron-transport chain on its plasma membrane. Its power output scales with surface area $S \sim r^2$ while the genome it must maintain scales with cell volume $V \sim r^3$. The available ATP per unit of DNA therefore falls as $S/V \sim 1/r$: a giant bacterium is energy-starved per gene.

\[ P_{\text{per gene}}^{\text{prok}} \;\propto\; \frac{\sigma\,S}{\rho_g\,V} \;\sim\; \frac{1}{r} \]

$\sigma$ = ETC power density (W/m²), $\rho_g$ = gene density per unit volume.
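A short numerical check of the $S/V \sim 1/r$ scaling, as a minimal sketch that assumes a spherical cell. The two radii are illustrative choices, roughly an ordinary bacterium and a giant one.

```python
import math

def surface_to_volume(r_m: float) -> float:
    """Surface-to-volume ratio of a sphere of radius r_m (metres); equals 3/r."""
    surface = 4.0 * math.pi * r_m**2
    volume = (4.0 / 3.0) * math.pi * r_m**3
    return surface / volume

small = surface_to_volume(1e-6)     # 1 um radius (illustrative)
giant = surface_to_volume(100e-6)   # 100 um radius (illustrative)

print(f"S/V at r = 1 um   : {small:.2e} per m")
print(f"S/V at r = 100 um : {giant:.2e} per m")
print(f"membrane area per unit volume drops by {small / giant:.0f}x")
```

With $\sigma$ and $\rho_g$ held fixed, power per gene falls by the same factor as $S/V$.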

A eukaryote with $N_{\text{mito}}$ mitochondria, each of which amplifies its inner membrane surface by a factor $A_{\text{crist}} \sim 25\text{-}30$ through cristae, hosts a total bioenergetic membrane that scales linearly with cell volume. The power per gene therefore becomes size-independent, and the cell can afford orders of magnitude more genes.

\[ P_{\text{per gene}}^{\text{euk}} \;\propto\; \frac{\sigma\,A_{\text{crist}}\,N_{\text{mito}}\,S_{\text{outer}}}{\rho_g\,V} \;\sim\; \text{const} \]

Lane & Martin’s headline number: eukaryotic cells have ~200,000× more power per gene than the average prokaryote.

The empirical evidence is compelling. Truly giant bacteria do exist—Thiomargarita namibiensis (750 μm), Epulopiscium fishelsoni (700 μm)—but they pay by carrying tens of thousands of copies of their genome to keep up with the volume-scaling metabolic demand. They have not radiated. Bacteria cannot use their membrane estate to grow complex because that membrane must also run the ETC.

Mitochondria decouple information storage (the nucleus) from energy generation (many bioenergetic membranes in each cell). This is Lane’s central claim: eukaryotic complexity is a thermodynamic phenomenon, not a merely genetic one.

6. Horizontal Gene Transfer and the Rise of the Nuclear Genome

When the α-proteobacterial ancestor entered the archaeal host it carried ~3000–4000 genes. The modern human mitochondrial genome encodes only 37. Where did the others go?

The answer is endosymbiotic gene transfer (EGT): over evolutionary time, bacterial DNA leaked into the host nucleus, was integrated there, and evolved a targeting peptide that returned the protein to the organelle. Timmis, Ayliffe, Huang & Martin (2004, Nat. Rev. Genet.) estimate that over 90% of the proteobacterial ancestor’s gene complement moved to the nucleus. The nuclear genome today contains roughly 1000–1500 genes of mitochondrial origin.
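The bookkeeping behind these numbers is simple enough to check directly. This sketch uses only the figures quoted above (an ancestor with ~3000–4000 genes, 37 genes in human mtDNA); the variable names are illustrative.

```python
# Figures quoted in the text
ancestral_gene_range = (3000, 4000)   # estimated gene count of the ancestor
mtdna_genes = 37                      # genes in the human mitochondrial genome

retained = {}
for n_ancestral in ancestral_gene_range:
    frac = mtdna_genes / n_ancestral
    retained[n_ancestral] = frac
    print(f"ancestor with {n_ancestral} genes: "
          f"{frac:.2%} still encoded in mtDNA, {1 - frac:.2%} transferred or lost")
```

Whichever end of the range is used, only about one percent of the ancestral complement is still encoded in the organelle.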

Why does anything stay in mitochondria?

The 13 proteins that remain are all membrane-embedded, highly hydrophobic OxPhos subunits. John Allen’s CoRR hypothesis (co-location for redox regulation; Allen 2015, PNAS) proposes that these genes must be transcribed locally so that their expression can be regulated by the redox state of the ETC they sit in. A nucleus-encoded copy cannot respond fast enough to mitochondrial redox stress.

Genes that moved to the nucleus needed a re-entry ticket. An N-terminal mitochondrial targeting sequence (MTS), an amphipathic α-helix of ~20–50 residues rich in positive and hydrophobic residues, was added, typically by recombination with a pre-existing targeted gene. The MTS is recognized by the TOM complex (translocase of the outer membrane) at the OMM and then passed to the TIM23 complex at the inner membrane, after which it is cleaved by mitochondrial processing peptidase (MPP).
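The compositional bias of an MTS (net positive charge, many hydrophobic residues) can be illustrated with a crude composition scan. This is a toy heuristic, not a real targeting predictor. The residue groupings, the 30-residue window, and the example sequence are all assumptions made for illustration, and the sequence is not a real protein.

```python
# Residue classes used by this toy heuristic (an assumption of the sketch)
POSITIVE = set("KR")
NEGATIVE = set("DE")
HYDROPHOBIC = set("AILMFVW")

def mts_features(protein: str, window: int = 30) -> dict:
    """Summarize the composition of the N-terminal `window` residues."""
    segment = protein[:window].upper()
    n = len(segment)
    pos = sum(aa in POSITIVE for aa in segment)
    neg = sum(aa in NEGATIVE for aa in segment)
    hyd = sum(aa in HYDROPHOBIC for aa in segment)
    return {
        "length": n,
        "net_charge": pos - neg,
        "positive_fraction": pos / n if n else 0.0,
        "hydrophobic_fraction": hyd / n if n else 0.0,
    }

# Made-up N-terminus, for illustration only
toy_sequence = "MLSRAVRRLLPAARRLLSTSARAWAPAAMEDEGK"
features = mts_features(toy_sequence)
print(features)
```

A scan like this captures only composition; it says nothing about helical amphipathicity or the MPP cleavage site, which real predictors model explicitly.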

In plastids, an analogous machinery (TOC/TIC) handles import. Some eukaryotic lineages with secondary plastids (derived by engulfing an already-plastid-bearing alga) use a SELMA translocon (“symbiont-specific ERAD-like machinery”) to cross the extra membrane layers; the SELMA motif is a feature of four-membrane plastids in diatoms and related organisms. No SELMA-equivalent exists in mitochondria; the two-membrane topology is handled by TOM/TIM alone.

EGT is still occurring. NUMTs (nuclear-mitochondrial DNA segments) are fragments of mtDNA that have recently jumped to the nucleus; the human genome contains hundreds. The transfer is one-way: once in the nucleus, a gene competes under a completely different selective regime.

7. Evolution of Cristae

The modern mitochondrion is not just a bacterium in a double membrane; its inner membrane is folded into invaginations called cristae. Cristae amplify inner-membrane surface area by ~25–30× and concentrate the ETC into locally curved compartments in which pumped protons are confined behind the crista junction.

Cristae topology varies with tissue and organism: lamellar (flattened) in most vertebrate tissues, tubular in steroidogenic cells (adrenal cortex, Leydig cells, corpus luteum) to favor hydroxylase chemistry, and paracrystalline in insect flight muscle and brown adipose tissue where the ETC is packed almost as a 2D crystal for maximum ATP throughput.

Cristae shape is set by the MICOS complex (Mitofilin/MIC60, MIC19, MIC10, MIC26/27) at crista junctions (Rabl et al. 2009; Pfanner et al. 2014) and by OPA1 oligomerization at the inner membrane. A reduction of OPA1 unravels cristae, exposes cytochrome c, and initiates the intrinsic apoptotic cascade (Scorrano et al. 2002, Dev. Cell).

Cristae in non-animal lineages:

Euglenozoa (including trypanosomes): discoidal cristae
Ciliates: tubular cristae with unusual MICOS architecture
Alveolates: flat lamellar cristae with pores
Jakobid flagellates: ancestral mitochondrial gene complement, lamellar cristae

The diversity of cristae geometry across eukaryotes suggests the MICOS/OPA1 system has been remodeled many times to match each lineage’s metabolic style.

8. Hydrogenosomes and Mitosomes: Degenerate Relics

Not every eukaryote runs aerobic OxPhos. Parasitic and anaerobic lineages have reduced their mitochondria drastically, and for decades these organelles were mistaken for primitive (pre-mitochondrial) states. Molecular markers—heat-shock proteins, iron-sulfur cluster biosynthesis—eventually revealed them as degenerate mitochondria.

Hydrogenosomes (trichomonads, anaerobic ciliates, chytrid fungi): produce ATP via substrate-level phosphorylation with pyruvate:ferredoxin oxidoreductase and an Fe-hydrogenase, exhaling H₂. No ETC, usually no DNA. Reviewed by Müller et al. (2012, Microbiol. Mol. Biol. Rev.).
Mitosomes (Giardia lamblia, Entamoeba histolytica, microsporidia): minimal organelles (~200–400 nm) whose sole retained function is iron-sulfur cluster biosynthesis. No ATP production, no cristae, often no genome. Their sole evolutionary raison d’être is that Fe-S assembly must occur in an enclosed compartment for chemical reasons (Tovar et al. 2003, Nature).
Anaerobic mitochondria (Nyctotherus, intestinal parasites): retain partial ETC using alternative terminal acceptors (fumarate, nitrate).

That mitosomes exist and are universally derived from full mitochondria is, perhaps, the strongest single argument that the endosymbiotic event occurred once at the stem of eukaryotes: every extant amitochondriate lineage carries a clear molecular fingerprint of the lost organelle.

9. Rickettsia: A Living Window into the Ancestor

Rickettsia prowazekii, the agent of epidemic typhus, cannot grow outside a eukaryotic host. It is an obligate intracellular parasite with a genome of only ~1.1 Mb (Andersson et al. 1998). Its metabolism is strikingly reduced: it imports ATP directly from the host cytoplasm via an ADP/ATP translocase that performs the same exchange mitochondria use to export ATP to the host, only in the opposite direction. The two transporters are functional analogs rather than homologs.

Rickettsial biology is, in effect, a controlled experiment in endosymbiosis: the early stages of the mitochondrial integration may have resembled a parasitic intrusion that became mutualistic only when gene transfer stabilized the system. The closely related Wolbachia (a symbiont of arthropods and nematodes) sits in the ambiguous zone between parasite and mutualist—sometimes required for host reproduction, sometimes a manipulator.

The Rickettsia “time machine”:

1.1 Mb genome, ~834 protein genes (vs. ~4300 in free-living E. coli)
Depends on the host for amino acids, nucleotides, and ATP
Retains cytochrome oxidase and ATP synthase despite siphoning ATP
Closest free-living relative: SAR11 (Pelagibacter ubique) in ocean surface water
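The degree of genome reduction can be put in numbers with the figures from the list above. This is a minimal sketch, and the variable names are illustrative.

```python
# Figures quoted in the list above
rickettsia = {"genome_mb": 1.1, "protein_genes": 834}
ecoli_protein_genes = 4300

# Fraction of a free-living gene complement that Rickettsia still carries
gene_ratio = rickettsia["protein_genes"] / ecoli_protein_genes

# Protein-coding genes per megabase in the Rickettsia genome
gene_density = rickettsia["protein_genes"] / rickettsia["genome_mb"]

print(f"Rickettsia retains {gene_ratio:.1%} of the E. coli gene count")
print(f"Rickettsia gene density: {gene_density:.0f} protein genes per Mb")
```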

10. Mitonuclear Coadaptation

Human OxPhos complexes are chimeras: Complex I has 7 mtDNA-encoded and ~38 nuclear-encoded subunits; Complex IV has 3 mtDNA + 11 nuclear; ATP synthase has 2 mtDNA + ~15 nuclear. Every complex must be assembled from parts encoded in two genomes with radically different evolutionary dynamics: mtDNA mutates 10–100× faster than nDNA and is inherited uniparentally.
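The dual-genome composition can be tabulated from the subunit counts quoted above. This is a minimal sketch; the approximate nuclear counts (~38, ~15) are treated as exact for the arithmetic.

```python
# (mtDNA-encoded, nuclear-encoded) subunit counts quoted in the text
complexes = {
    "Complex I":    (7, 38),
    "Complex IV":   (3, 11),
    "ATP synthase": (2, 15),
}

mt_fraction = {}
for name, (mt, nuc) in complexes.items():
    total = mt + nuc
    mt_fraction[name] = mt / total
    print(f"{name:12s}: {mt}/{total} subunits mtDNA-encoded ({mt / total:.0%})")
```

In every complex the mtDNA-encoded subunits are a minority, yet each is required for function.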

This imposes mitonuclear coadaptation. A mtDNA variant that changes a mitochondrial Complex I subunit must be matched by a compensating nuclear variant. Mismatched pairs, introduced by crossing distant populations, cause hybrid breakdown in many species (Dowling, Friberg & Lindell 2008, Trends Ecol. Evol.; Sloan et al. 2018, Curr. Biol.).

\[ \text{ATP synthase}\;=\; F_o(\text{2 mtDNA subunits}) \;+\; F_1(\text{15 nDNA subunits}) \]

Two genomes with different modes of inheritance (biparental nuclear DNA, uniparental mtDNA) contribute to one holoenzyme. Compare to the bacterial $F_oF_1$, encoded in a single operon.

Mitonuclear incompatibility is one reason mitochondrial replacement therapy (“three-parent IVF”) is medically regulated: replacing the mitochondria of a fertilized egg can introduce a mismatch with the nuclear background. We return to this in Module 8.

11. Visual Timeline: From LUCA to Eukaryotes

Simulation 1: Lane-Martin Energy-per-Gene Scaling

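Reproduce the logic of the central Lane & Martin (2010) figure: power per gene across a tree-of-life sample, the theoretical $S/V$ scaling argument for why bacteria remain small, and how mitochondrial inner-membrane amplification ($A_{\text{crist}} N_{\text{mito}}$) decouples bioenergetics from cell geometry. The organism table is illustrative, and the printed eukaryote/prokaryote ratio is an arithmetic mean over whole-organism values, so it is dominated by the largest animals. Compare it with the ~10⁵× per-gene advantage reported by Lane & Martin rather than expecting it to reproduce that figure.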

Python

import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

# Lane & Martin (2010) energy-per-gene scaling
# --------------------------------------------
# Prokaryotes pay the full genome-maintenance cost from a single plasma
# membrane that also carries the ETC. Eukaryotes outsource ATP production
# to many internal mitochondrial inner membranes: this multiplies the
# bioenergetic membrane area available per unit of nuclear DNA by a factor
# ~ N_mito (number of mitochondria per cell), each with a cristae area
# ~ A_crist roughly 20-30x the area of the enclosing outer membrane.
#
# The paper's central figure: plot metabolic power per haploid genome
# (W / gene) against gene number across the tree of life. Bacteria are
# capped by membrane area scaling ~ L^2 whereas genome scales ~ L^3
# (where L is linear cell size). Mitochondria break this by internalizing
# the bioenergetic membrane.

# Representative organisms (name, gene count, metabolic rate W, mass kg)
# Values adapted from Lane & Martin 2010 Nature and Makarieva et al. 2008
organisms = [
    # prokaryotes
    ("E. coli",         4300,  2.1e-13, 1e-15, "prok"),
    ("B. subtilis",     4100,  1.8e-13, 1e-15, "prok"),
    ("Thiomargarita",   4500,  2.0e-11, 1e-10, "prok"),  # giant bacterium
    ("Epulopiscium",    4000,  1.5e-11, 1e-10, "prok"),
    # unicellular eukaryotes
    ("Saccharomyces",   6000,  6.0e-12, 4e-14, "euk"),
    ("Tetrahymena",    27000,  1.2e-10, 1e-12, "euk"),
    ("Paramecium",     40000,  4.0e-10, 3e-12, "euk"),
    ("Amoeba",         34000,  3.2e-10, 2e-12, "euk"),
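    # unicellular alga (Chlamydomonas) and multicellular animals
    # (animal entries are whole-organism power and mass, not per-cell values)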
    ("Chlamydomonas",  17000,  3.0e-11, 5e-14, "euk"),
    ("Drosophila",     14000,  2.5e-8,  1e-6,  "euk"),
    ("Danio rerio",    26000,  8.0e-6,  5e-4,  "euk"),
    ("Homo sapiens",   20000,  1.0e+2,  7e+1,  "euk"),
]

names  = [o[0] for o in organisms]
genes  = np.array([o[1] for o in organisms], dtype=float)
power  = np.array([o[2] for o in organisms], dtype=float)   # watts
mass   = np.array([o[3] for o in organisms], dtype=float)   # kg
kind   = [o[4] for o in organisms]

# Power per gene (key Lane-Martin figure)
wpg = power / genes

# Power per gene, per unit mass (biomass-normalized)
wpgm = wpg / mass

# Separate groups
prok = [i for i, k in enumerate(kind) if k == "prok"]
eukk = [i for i, k in enumerate(kind) if k == "euk"]

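# Arithmetic means; the eukaryote mean is dominated by the largest
# whole-organism entries (e.g. Homo sapiens), so read `fold` with care.
wpg_prok = wpg[prok].mean()
wpg_euk  = wpg[eukk].mean()
fold     = wpg_euk / wpg_prok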
print(f"Mean power per gene, prokaryotes : {wpg_prok:.3e} W/gene")
print(f"Mean power per gene, eukaryotes  : {wpg_euk:.3e} W/gene")
print(f"Eukaryote / prokaryote advantage : {fold:.0f}x")

# Theoretical membrane-area scaling
# Prokaryote: surface area S ~ 4*pi*r^2, volume V ~ (4/3)*pi*r^3.
# ATP flux through plasma membrane: P_prok ~ sigma * S, sigma ~ ETC density.
# Genome cost scales with V: P / genome ~ S/V ~ 1/r -> drops as r grows!
# Eukaryote: multiple mitos; total IMM area ~ N_mito * A_crist * S_outer ~ V.
# So P_euk / genome ~ const, independent of cell size.

r_cell = np.logspace(-7, -4, 200)   # 100 nm to 100 um
sigma  = 1e-4                       # W per m^2 of ETC membrane (arbitrary)
gene_density = 1e14                 # genes per m^3 (arbitrary)

# Surface and volume scaling
S = 4 * np.pi * r_cell**2
V = (4.0/3.0) * np.pi * r_cell**3

# Power per gene (prok): flux on outer membrane / total DNA in volume
# P_prok_pg  ~ (sigma * S) / (gene_density * V)  ~ 3 sigma / (gene_density * r)
P_pg_prok = (sigma * S) / (gene_density * V)

# Power per gene (euk): mitochondrial IMM scales with V, so P ~ const
P_pg_euk  = np.full_like(r_cell, 3 * sigma / (gene_density * 1e-6))

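# Integrated differential advantage
diff = P_pg_euk - P_pg_prok
# integrate with trapezoid rule; np.trapezoid needs NumPy >= 2.0,
# so fall back to np.trapz on older versions
trapezoid = getattr(np, "trapezoid", None) or np.trapz
area_diff = trapezoid(np.maximum(diff, 0.0), r_cell)
print(f"Integrated euk advantage over size range: {area_diff:.3e} W*m/gene")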

# ---------- Plot ----------
fig, axes = plt.subplots(2, 2, figsize=(13.5, 10))
fig.patch.set_facecolor('#0a0a1a')
for ax in axes.ravel():
    ax.set_facecolor('#0a0a1a')
    ax.tick_params(colors='#cbd5e1')
    for s in ax.spines.values(): s.set_color('#334155')
    ax.grid(True, color='#334155', alpha=0.35)

ax1, ax2, ax3, ax4 = axes.ravel()

# --- Panel 1: power per gene across organisms (bar) ---
colors = ['#f59e0b' if k == 'prok' else '#fb923c' for k in kind]
order  = np.argsort(wpg)
ax1.barh(np.array(names)[order], wpg[order], color=np.array(colors)[order],
         edgecolor='#1e293b')
ax1.set_xscale('log')
ax1.set_xlabel('power per gene (W / gene)', color='#cbd5e1')
ax1.set_title('Lane-Martin energy-per-gene across the tree of life',
              color='#fed7aa', fontweight='bold')
ax1.tick_params(axis='y', colors='#cbd5e1', labelsize=9)

# --- Panel 2: genes vs mass (log-log) ---
p_idx = np.array([i for i, k in enumerate(kind) if k == 'prok'])
e_idx = np.array([i for i, k in enumerate(kind) if k == 'euk'])
ax2.loglog(mass[p_idx], genes[p_idx], 'o', color='#fbbf24', markersize=10,
           label='prokaryotes', markeredgecolor='#1e293b')
ax2.loglog(mass[e_idx], genes[e_idx], 's', color='#fb923c', markersize=10,
           label='eukaryotes', markeredgecolor='#1e293b')
ax2.set_xlabel('organism mass (kg)', color='#cbd5e1')
ax2.set_ylabel('gene count', color='#cbd5e1')
ax2.set_title('Genome complexity vs. organism mass',
              color='#fed7aa', fontweight='bold')
ax2.legend(facecolor='#0f172a', edgecolor='#334155', labelcolor='#cbd5e1', fontsize=10)

# --- Panel 3: theoretical scaling power/gene vs cell radius ---
ax3.loglog(r_cell*1e6, P_pg_prok, '-', color='#fbbf24', linewidth=2.5,
           label=r'prokaryote: $\propto 1/r$')
ax3.loglog(r_cell*1e6, P_pg_euk,  '-', color='#fb923c', linewidth=2.5,
           label='eukaryote (mitos): constant')
ax3.axvspan(0.3, 3.0, color='#fbbf24', alpha=0.08)
ax3.axvspan(3.0, 100.0, color='#fb923c', alpha=0.08)
ax3.set_xlabel('cell radius (um)', color='#cbd5e1')
ax3.set_ylabel('power per gene (W/gene, arb.)', color='#cbd5e1')
ax3.set_title('Why bacteria cannot grow big: S/V limit',
              color='#fed7aa', fontweight='bold')
ax3.legend(facecolor='#0f172a', edgecolor='#334155', labelcolor='#cbd5e1', fontsize=10)

# --- Panel 4: number of mitochondria needed to match required power ---
N_mito = np.logspace(0, 5, 200)
A_crist = 30.0   # fold amplification by cristae
A_outer = 1.0
total_etc_area = N_mito * A_crist * A_outer
ax4.loglog(N_mito, total_etc_area, '-', color='#fb923c', linewidth=2.5,
           label='IMM area (arb. units)')
ax4.axhline(1.0, color='#fbbf24', linestyle='--', linewidth=2,
            label='bacterial PM area')
ax4.set_xlabel('mitochondria per cell', color='#cbd5e1')
ax4.set_ylabel('total bioenergetic membrane area', color='#cbd5e1')
ax4.set_title('Internalizing energy: N mitos x 30x cristae amplification',
              color='#fed7aa', fontweight='bold')
ax4.legend(facecolor='#0f172a', edgecolor='#334155', labelcolor='#cbd5e1', fontsize=10)

plt.tight_layout()
plt.savefig('output.png', dpi=120, bbox_inches='tight', facecolor='#0a0a1a')

Simulation 2: Coalescent Model of the Endosymbiosis Bottleneck

Kingman coalescent simulation of the α-proteobacterial endosymbiont as it passes from a free-living population ($N_e \sim 10^9$) through a severe acquisition bottleneck ($N_b \sim 10$) into a stably vertically transmitted endosymbiotic meta-population. We track the compression of $T_{\text{MRCA}}$ and the three-orders-of-magnitude collapse of nucleotide diversity $\pi = 2\mu N_e$ that leaves its signature in the reduced mtDNA polymorphism observed today.
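For a population of constant size the expected $T_{\text{MRCA}}$ has a closed form, which gives a sanity check for any simulated value. This sketch assumes the pairwise coalescence rate $k(k-1)/(2N)$ per generation, so that $E[T_{\text{MRCA}}] = \sum_{k=2}^{n} 2N/(k(k-1)) = 2N(1 - 1/n)$. The function name is illustrative.

```python
def expected_tmrca(n_samples: int, N: float) -> float:
    """Expected time to the MRCA (generations) for constant size N.

    While k lineages remain, the waiting time to the next coalescence is
    exponential with rate k(k-1)/(2N), i.e. mean 2N / (k(k-1)).
    """
    return sum(2.0 * N / (k * (k - 1)) for k in range(2, n_samples + 1))

free_living = expected_tmrca(10, 1e9)   # free-living population size
bottleneck = expected_tmrca(10, 10.0)   # acquisition bottleneck size

print(f"E[T_MRCA], N = 1e9 : {free_living:.3e} generations")
print(f"E[T_MRCA], N = 10  : {bottleneck:.1f} generations")
```

Ten lineages that enter a population of effective size 10 are expected to coalesce within a few tens of generations. This is why a brief bottleneck is enough to reset the genealogy.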

Python

import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

# Coalescent model of an endosymbiont population going through an
# acquisition bottleneck
# ----------------------------------------------------------------
# Scenario: a free-living alpha-proteobacterial population with effective
# size N0 enters a proto-eukaryotic host cell. Inside the host, the
# endosymbiont population is drastically reduced (vertical transmission,
# cell division bottleneck) to N_b mitochondria per cell per generation.
# Over t_bott generations the effective population size drops to N_b;
# subsequently it recovers to a quasi-stable N1 = N_mito * N_cells.
#
# Expected coalescence time in a time-varying N(t) follows Hein, Schierup
# & Wiuf (2005). For a sample of n lineages, the mean time to the most
# recent common ancestor is
#     E[T_MRCA] = sum_{k=2..n}  integral_0^inf  P(k lineages survive) dt
# which we estimate by Monte-Carlo coalescent simulation.
#
# We compare pre-bottleneck (free-living alpha-proteobacterium) with
# post-bottleneck (endosymbiont lineages inside eukaryote host).

rng = np.random.default_rng(11)

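def simulate_coalescent(n_samples, N_t):
    """
    Coalescent with piecewise-constant N(t), run backward in time.
    N_t is indexed forward in time (index 0 = oldest generation), so it is
    reversed here: backward-time step 0 is the present. Older than the
    modelled history, the ancestral size N_t[0] is assumed to persist.
    Returns T_MRCA in generations before the present.
    """
    N_back = N_t[::-1]
    k = n_samples
    t = 0.0
    while k > 1:
        step = int(t)
        if step >= len(N_back):
            # Beyond the modelled history: constant ancestral size
            rate = k * (k - 1) / (2.0 * N_back[-1])
            t += rng.exponential(1.0 / rate)
            k -= 1
            continue
        rate = k * (k - 1) / (2.0 * N_back[step])
        dtau = rng.exponential(1.0 / rate)
        if t + dtau < step + 1:
            t += dtau             # coalescence inside this generation step
            k -= 1
        else:
            t = float(step + 1)   # no event; memoryless restart in next step
    return t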

# Demographic history (generations): pre-symbiosis N=1e9, bottleneck to 10,
# then recovery
T_total = 3000
N_t = np.zeros(T_total)
N_t[0:1000]    = 1e9    # free-living alpha-proteobacterium
N_t[1000:1200] = 10.0   # severe bottleneck: few cells captured
N_t[1200:1800] = np.linspace(10, 1e6, 600)  # recovery as host proliferates
N_t[1800:]     = 1e6    # stable endosymbiont meta-population

# Monte-Carlo coalescent over many replicate samples of 10 lineages
n_rep = 400
n_samples = 10
tmrca_pre  = np.array([simulate_coalescent(n_samples, np.full(T_total, 1e9))
                       for _ in range(n_rep)])
tmrca_post = np.array([simulate_coalescent(n_samples, N_t) for _ in range(n_rep)])

print(f"Mean T_MRCA free-living (pre-symbiosis): {tmrca_pre.mean():.1f} generations")
print(f"Mean T_MRCA post-bottleneck endosymbiont: {tmrca_post.mean():.1f} generations")
print(f"Compression factor (pre/post): {tmrca_pre.mean()/max(tmrca_post.mean(),1):.2f}x")

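# Genetic diversity pi ~ 2 * mu * N_e (infinite-sites expectation; values
# near or above 1 per site are not attainable and only indicate saturation)
mu = 1e-8  # per-bp mutation rate per generation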
pi_pre  = 2 * mu * 1e9
pi_post = 2 * mu * 1e6
print(f"Expected pi free-living   : {pi_pre:.3e}")
print(f"Expected pi endosymbiont  : {pi_post:.3e}   (1000x loss of diversity)")

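# Integrated deficit in expected diversity
# np.trapezoid needs NumPy >= 2.0; fall back to np.trapz on older versions
trapezoid = getattr(np, "trapezoid", None) or np.trapz
deficit = trapezoid(np.log10(N_t[0])*np.ones_like(N_t) - np.log10(N_t+1),
                    np.arange(T_total))
print(f"Integrated log10-N deficit: {deficit:.1f} log-gen units")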

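# ---------- Plot ----------
fig, axes = plt.subplots(2, 2, figsize=(13.5, 10))
fig.patch.set_facecolor('#0a0a1a')
for ax in axes.ravel():
    ax.set_facecolor('#0a0a1a')
    ax.tick_params(colors='#cbd5e1')
    for s in ax.spines.values(): s.set_color('#334155')
    ax.grid(True, color='#334155', alpha=0.35)

ax1, ax2, ax3, ax4 = axes.ravel()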

# --- Panel 1: demographic history N(t) ---
ax1.semilogy(np.arange(T_total), N_t, '-', color='#fb923c', linewidth=2.4)
ax1.axvspan(1000, 1200, color='#ef4444', alpha=0.2, label='acquisition bottleneck')
ax1.axvspan(1200, 1800, color='#fbbf24', alpha=0.15, label='recovery')
ax1.set_xlabel('generations', color='#cbd5e1')
ax1.set_ylabel('effective population size N_e', color='#cbd5e1')
ax1.set_title('Endosymbiont demographic history',
              color='#fed7aa', fontweight='bold')
ax1.legend(facecolor='#0f172a', edgecolor='#334155', labelcolor='#cbd5e1', fontsize=10)

# --- Panel 2: T_MRCA histograms, pre vs post ---
ax2.hist(tmrca_pre, bins=30, alpha=0.6, color='#fbbf24',
         edgecolor='#78350f', label='pre-symbiosis')
ax2.hist(tmrca_post, bins=30, alpha=0.6, color='#fb923c',
         edgecolor='#78350f', label='post-bottleneck')
ax2.set_xlabel('T_MRCA (generations)', color='#cbd5e1')
ax2.set_ylabel('replicates', color='#cbd5e1')
ax2.set_title('Coalescent TMRCA distribution',
              color='#fed7aa', fontweight='bold')
ax2.legend(facecolor='#0f172a', edgecolor='#334155', labelcolor='#cbd5e1', fontsize=10)

# --- Panel 3: diversity pi vs time ---
pi_hist = 2 * mu * N_t
ax3.semilogy(np.arange(T_total), pi_hist, '-', color='#fbbf24',
             linewidth=2.4, label=r'$\pi(t) = 2\,\mu\,N_e(t)$')
ax3.axhline(pi_pre,  linestyle='--', color='#fb923c', label='free-living pi')
ax3.axhline(pi_post, linestyle='--', color='#ef4444', label='endosymbiont pi')
ax3.set_xlabel('generations', color='#cbd5e1')
ax3.set_ylabel('nucleotide diversity pi', color='#cbd5e1')
ax3.set_title('Loss of diversity through the bottleneck',
              color='#fed7aa', fontweight='bold')
ax3.legend(facecolor='#0f172a', edgecolor='#334155', labelcolor='#cbd5e1', fontsize=10)

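# --- Panel 4: surviving lineages k(t), traced backward from the present ---
# Kingman rate k(k-1)/(2N) per generation, with N read from the reversed
# history (backward-time step 0 = present).
N_back = N_t[::-1]
k_traj = []
for _ in range(60):
    k = 10
    t = 0.0
    traj = [(t, k)]
    while k > 1 and t < T_total:
        step = int(t)
        rate = k*(k-1) / (2.0 * N_back[step])
        dtau = rng.exponential(1.0 / rate)
        if t + dtau < step + 1:
            t += dtau
            k -= 1
            traj.append((t, k))
        else:
            t = float(step + 1)   # no coalescence in this generation step
    k_traj.append(traj)

for traj in k_traj[:20]:
    ts = [p[0] for p in traj]
    ks = [p[1] for p in traj]
    ax4.step(ts, ks, where='post', color='#fb923c', alpha=0.4)
# Backward in time the bottleneck is reached 1800 generations before present
ax4.axvline(T_total - 1200, color='#ef4444', linestyle='--',
            label='bottleneck reached (backward time)')
ax4.set_xlabel('generations before present', color='#cbd5e1')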
ax4.set_ylabel('surviving lineages k(t)', color='#cbd5e1')
ax4.set_title('Kingman coalescent trajectories (20 replicates)',
              color='#fed7aa', fontweight='bold')
ax4.legend(facecolor='#0f172a', edgecolor='#334155', labelcolor='#cbd5e1', fontsize=10)

plt.tight_layout()
plt.savefig('output.png', dpi=120, bbox_inches='tight', facecolor='#0a0a1a')

Key References

• Sagan (Margulis), L. (1967). “On the origin of mitosing cells.” J. Theor. Biol., 14, 225–274.

• Margulis, L. (1981). Symbiosis in Cell Evolution. W. H. Freeman.

• Gray, M.W., Burger, G., & Lang, B.F. (1999). “Mitochondrial evolution.” Science, 283, 1476–1481.

• Andersson, S.G.E. et al. (1998). “The genome sequence of Rickettsia prowazekii and the origin of mitochondria.” Nature, 396, 133–140.

• Lane, N. & Martin, W. (2010). “The energetics of genome complexity.” Nature, 467, 929–934.

• Lane, N. (2015). The Vital Question: Energy, Evolution, and the Origins of Complex Life. W. W. Norton.

• Spang, A. et al. (2015). “Complex archaea that bridge the gap between prokaryotes and eukaryotes.” Nature, 521, 173–179.

• Zaremba-Niedzwiedzka, K. et al. (2017). “Asgard archaea illuminate the origin of eukaryotic cellular complexity.” Nature, 541, 353–358.

• Roger, A.J., Muñoz-Gómez, S.A., & Kamikawa, R. (2017). “The origin and diversification of mitochondria.” Curr. Biol., 27, R1177–R1192.

• Imachi, H. et al. (2020). “Isolation of an archaeon at the prokaryote-eukaryote interface.” Nature, 577, 519–525.

• Timmis, J.N., Ayliffe, M.A., Huang, C.Y., & Martin, W. (2004). “Endosymbiotic gene transfer: organelle genomes forge eukaryotic chromosomes.” Nat. Rev. Genet., 5, 123–135.

• Martin, W., Tielens, A.G.M., & Müller, M. (2020). Biology of the Prokaryotes, Wiley.

• Nasir, A., Forterre, P., & Caetano-Anollés, G. (2020). “A phylogenomic data-driven exploration of viral origins and evolution.” Sci. Adv., 6, eaau9408.

• Allen, J.F. (2015). “Why chloroplasts and mitochondria retain their own genomes and genetic systems.” PNAS, 112, 10231–10238.

• Dowling, D.K., Friberg, U., & Lindell, J. (2008). “Evolutionary implications of non-neutral mitochondrial genetic variation.” Trends Ecol. Evol., 23, 546–554.

• Rabl, R. et al. (2009). “Formation of cristae and crista junctions in mitochondria depends on antagonism between Fcj1 and Su e/g.” J. Cell Biol., 185, 1047–1063.

• Tovar, J. et al. (2003). “Mitochondrial remnant organelles of Giardia function in iron-sulphur protein maturation.” Nature, 426, 172–176.

• Scorrano, L. et al. (2002). “A distinct pathway remodels mitochondrial cristae and mobilizes cytochrome c during apoptosis.” Dev. Cell, 2, 55–67.

• Martijn, J., Vosseberg, J., Guy, L., Offre, P., & Ettema, T.J.G. (2018). “Deep mitochondrial origin outside the sampled alphaproteobacteria.” Nature, 557, 101–105.
