Part IV: Synthesis & Strategy | Chapter 16

Retrosynthetic Analysis

Working backwards from target molecule to available starting materials

1. Introduction: Corey's Retrosynthetic Analysis

Retrosynthetic analysis is a technique for planning organic syntheses by working backwards from the target molecule (TM) to progressively simpler precursors, ultimately arriving at commercially available starting materials. The approach was formalized by E. J. Corey at Harvard University in the 1960s, and earned him the 1990 Nobel Prize in Chemistry.

The fundamental idea is deceptively simple: instead of asking "What can I make from these starting materials?" (the synthetic direction), we ask "What precursors could give me my target molecule?" (the retrosynthetic direction). This reversal of logic transforms synthesis planning from a combinatorial nightmare into a structured, logical process.

The Retrosynthetic Arrow

The open-headed double arrow $\Longrightarrow$ (the "retrosynthetic arrow" or "transform arrow") indicates a retrosynthetic step. It means: "can be made from."

$\text{Target Molecule (TM)} \Longrightarrow \text{Precursor(s)}$

This arrow is never interchangeable with the forward reaction arrow ($\rightarrow$). The retrosynthetic arrow signifies a mental operation (a transform), not a chemical reaction.

A complete retrosynthetic analysis produces a retrosynthetic tree: a branching diagram in which the target sits at the root and the leaves are available starting materials. Each node in the tree is connected to its children by retrosynthetic arrows. The tree may be linear (a single chain of transforms) or convergent (independent branches that are joined late in the synthesis). Convergent strategies are generally preferred because they usually give a higher overall yield.
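A retrosynthetic tree maps naturally onto a recursive data structure. The sketch below is a minimal illustration, not a planning tool. The `Node` class and the `leaves` and `show` helpers are hypothetical names, and the example tree reuses the polyketide analysis from Example D, with structures written as plain strings.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One structure in a retrosynthetic tree (hypothetical minimal model)."""
    name: str
    transform: str = ""  # transform applied to reach the children, if any
    children: list["Node"] = field(default_factory=list)


def leaves(node: Node) -> list[str]:
    """Collect the starting materials: nodes with no further disconnection."""
    if not node.children:
        return [node.name]
    found: list[str] = []
    for child in node.children:
        found.extend(leaves(child))
    return found


def show(node: Node, depth: int = 0) -> None:
    """Print the tree with the target at the root and one indent per transform."""
    arrow = f"  =[{node.transform}]=>" if node.children else ""
    print("    " * depth + node.name + arrow)
    for child in node.children:
        show(child, depth + 1)


# Example D as a tree: aldol disconnection, then Claisen on the keto ester.
example = Node("R1CH(OH)CH2COCH2COOR2", "aldol", [
    Node("R1CHO"),
    Node("CH3COCH2COOR2", "Claisen", [Node("CH3COOR"), Node("CH3COOR2")]),
])
show(example)
print("Starting materials:", leaves(example))
```

The leaves of the tree are exactly the compounds that must be bought or otherwise prepared. Walking the tree recursively lists them without any bookkeeping.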

Linear vs Convergent Synthesis

For a linear synthesis of $n$ steps, each with yield $y$, the overall yield is:

$\text{Overall yield (linear)} = y^n$

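For a convergent synthesis that joins two branches of $n/2$ steps each, the branches are carried out in parallel. The overall yield is therefore set by the longest linear sequence (one branch plus the final coupling step):

$\text{Overall yield (convergent)} = y^{n/2} \times y = y^{n/2 + 1}$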

For $n = 10$ steps at $y = 80\%$ each: linear gives $0.8^{10} = 10.7\%$, while convergent gives $0.8^6 = 26.2\%$ — nearly 2.5 times better.
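The two formulas are easy to tabulate. This is a minimal sketch that assumes every step, including the final coupling, has the same yield $y$, and that $n$ is even so each branch has exactly $n/2$ steps. The names `linear_yield` and `convergent_yield` are illustrative.

```python
def linear_yield(y: float, n: int) -> float:
    """Overall yield of an n-step linear sequence, each step with yield y."""
    return y ** n


def convergent_yield(y: float, n: int) -> float:
    """Two parallel branches of n/2 steps joined by one coupling step.

    The branches run in parallel, so only the longest linear sequence
    (n/2 steps plus the coupling) determines the overall yield.
    """
    if n % 2:
        raise ValueError("n must be even so each branch has n/2 steps")
    return y ** (n // 2 + 1)


for n in (6, 10, 20):
    lin, conv = linear_yield(0.8, n), convergent_yield(0.8, n)
    print(f"n={n:2d}: linear {lin:6.1%}  convergent {conv:6.1%}  ratio {conv / lin:4.1f}x")
```

The gap widens as $n$ grows, which is why convergence matters most for long syntheses.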

2. Key Concepts: Disconnections and Synthons

The retrosynthetic approach relies on a precise vocabulary for describing transforms and the fragments they produce. Mastering these terms is essential before tackling complex synthetic problems.

Core Terminology

Transform ($\Longrightarrow$): A retrosynthetic operation that converts a target molecule into potential precursor structures. It is the exact reverse of a synthetic reaction, applied as a mental operation.

Disconnection: The imaginary cleavage of a bond in the target molecule to reveal potential precursor fragments. Disconnections focus on C–C bonds (for building the carbon skeleton) and C–X bonds (for introducing functional groups).

Synthon: An idealized ionic fragment resulting from a disconnection. Synthons represent the reactivity pattern needed, not actual reagents. For example, disconnecting a methyl ketone $\text{RCH}_2\text{COCH}_3$ at the bond between R and the $\alpha$-carbon gives a ${}^{-}\text{CH}_2\text{COCH}_3$ carbanion synthon and an $\text{R}^+$ cation synthon.

Synthetic Equivalent: A real reagent, usually commercially available or easily prepared, that provides the reactivity of a given synthon. For the carbanion synthon ${}^{-}\text{CH}_2\text{COCH}_3$, the synthetic equivalent is the enolate of acetone (generated with LDA or NaH). For the $\text{R}^+$ synthon, it is an alkyl halide $\text{RX}$.

Donor and Acceptor Synthons

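Every heterolytic disconnection produces two synthons with complementary polarity. In Seebach's notation, the superscript counts from the heteroatom of the functional group (position 0): the carbon bearing it is 1, the $\alpha$-carbon is 2, and the $\beta$-carbon is 3. Even-numbered donors and odd-numbered acceptors match the natural polarity of carbonyl compounds; the others require umpolung (polarity reversal).

Nucleophilic (Donor, $d^n$)

$d^0$: Heteroatom nucleophiles ($\text{RO}^-$, $\text{RS}^-$, $\text{R}_2\text{N}^-$)
$d^1$: Acyl anion equivalents (cyanide, lithiated 1,3-dithianes)
$d^2$: Enolates, enamines
$d^3$: Homoenolate equivalents

Electrophilic (Acceptor, $a^n$)

$a^0$: Heteroatom electrophiles (e.g., $\text{RS}^+$ from disulfides or sulfenyl halides)
$a^1$: Carbonyl carbons (aldehydes, ketones, esters, iminium ions)
$a^2$: $\alpha$-Halo carbonyls, epoxides
$a^3$: $\alpha,\beta$-Unsaturated carbonyls (at the $\beta$-carbon)

Simple alkyl nucleophiles and electrophiles ($\text{R}^-$ from Grignard or organolithium reagents, $\text{R}^+$ from alkyl halides) carry no heteroatom and fall outside this numbering.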

Functional Group Interconversion (FGI)

FGI is a retrosynthetic operation that replaces one functional group with another to enable a disconnection that was not previously possible. No C–C bonds are broken or formed; only the functional group changes. Common FGIs include:

Alcohol $\Longrightarrow$ Alkene (dehydration/hydration)
Alkene $\Longrightarrow$ Alkyne (partial reduction)
Ketone $\Longrightarrow$ Alcohol (oxidation/reduction)
Amine $\Longrightarrow$ Nitro compound (reduction)
Carboxylic acid $\Longrightarrow$ Ester (hydrolysis)
1,2-Diol $\Longrightarrow$ Alkene (dihydroxylation)

FGI is particularly useful when the target's functional groups do not suggest an obvious disconnection. By converting one group to another, new retrosynthetic pathways open up.

3. C–C Bond Forming Reactions

The formation of carbon–carbon bonds is the central challenge of organic synthesis. Each C–C bond in the target molecule represents a potential disconnection site. The following catalog presents the most important C–C bond forming reactions, showing both the retrosynthetic disconnection and the forward synthetic reaction.

3.1 Grignard Reaction

The Grignard reaction adds an organomagnesium halide ($\text{RMgX}$) to a carbonyl compound, forming a new C–C bond. It is one of the most versatile tools for building carbon skeletons.

Retrosynthetic disconnection:

$\text{R}_1\text{R}_2\text{CHOH} \Longrightarrow \text{R}_1\text{MgBr} + \text{R}_2\text{CHO}$

Forward reaction:

$\text{R}_1\text{MgBr} + \text{R}_2\text{CHO} \xrightarrow{1.\,\text{Et}_2\text{O}} \xrightarrow{2.\,\text{H}_3\text{O}^+} \text{R}_1\text{R}_2\text{CHOH}$

Scope: With formaldehyde $\rightarrow$ primary alcohol; with other aldehydes $\rightarrow$ secondary alcohol; with ketones $\rightarrow$ tertiary alcohol; with CO$_2$ $\rightarrow$ carboxylic acid; with esters $\rightarrow$ tertiary alcohol bearing two identical R groups (2 equiv RMgX).

3.2 Wittig Reaction

The Wittig reaction converts a carbonyl compound to an alkene using a phosphorus ylide. It is the method of choice for placing a C=C double bond at a specific position.

Retrosynthetic disconnection:

$\text{R}_1\text{CH}=\text{CHR}_2 \Longrightarrow \text{R}_1\text{CHO} + \text{Ph}_3\text{P}=\text{CHR}_2$

Forward reaction:

$\text{R}_1\text{CHO} + \text{Ph}_3\text{P}=\text{CHR}_2 \rightarrow \text{R}_1\text{CH}=\text{CHR}_2 + \text{Ph}_3\text{P}=\text{O}$

Stereochemistry: Unstabilized ylides give predominantly Z-alkenes, while stabilized ylides (bearing an EWG) give mainly E-alkenes. The Horner–Wadsworth–Emmons (HWE) modification uses phosphonate carbanions and gives E-selectivity. It also simplifies purification, because its phosphate by-product is water-soluble.

3.3 Aldol Condensation

The aldol reaction forms a $\beta$-hydroxy carbonyl by coupling an enolizable carbonyl with an aldehyde or ketone electrophile. It is the premier method for constructing 1,3-difunctionalized systems.

Retrosynthetic disconnection:

$\text{R}_1\text{COCH}_2\text{CH(OH)R}_2 \Longrightarrow \text{R}_1\text{COCH}_3 + \text{R}_2\text{CHO}$

Forward reaction:

$\text{R}_1\text{COCH}_3 \xrightarrow{\text{LDA, } -78°\text{C}} \text{enolate} \xrightarrow{\text{R}_2\text{CHO}} \beta\text{-hydroxy ketone}$

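Key feature: The directed aldol uses LDA to preform a single enolate, which avoids the product mixtures of classical crossed-aldol conditions. Stereochemical control comes from variants such as the Evans oxazolidinone aldol and the Mukaiyama aldol (silyl enol ether + Lewis acid). For metal enolates, the stereochemical outcome is rationalized by the Zimmerman–Traxler chair transition state.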

3.4 Claisen Condensation

The Claisen condensation couples two esters to produce a $\beta$-keto ester; crossed variants with a ketone enolate give 1,3-diketones. It is the ester analog of the aldol reaction.

Retrosynthetic disconnection:

$\text{R}_1\text{COCH}_2\text{COOR}_2 \Longrightarrow \text{R}_1\text{COOR} + \text{CH}_3\text{COOR}_2$

Forward reaction:

$2\,\text{RCH}_2\text{COOR}' \xrightarrow{\text{NaOR}'} \text{RCH}_2\text{COCH(R)COOR}' + \text{R}'\text{OH}$

3.5 Diels–Alder Reaction

The Diels–Alder [4+2] cycloaddition constructs a six-membered ring with up to four new stereocenters in a single step. It is perhaps the most powerful ring-forming reaction in organic chemistry.

Retrosynthetic disconnection:

$\text{cyclohexene} \Longrightarrow \text{1,3-diene} + \text{dienophile}$

Forward reaction:

$\text{diene} + \text{dienophile} \xrightarrow{\Delta \text{ or Lewis acid}} \text{cyclohexene}$

Rules: The reaction is stereospecific and suprafacial on both components, so substituents that are cis on the dienophile remain cis in the product. Endo products are kinetically favored (Alder endo rule). The fastest reactions pair an electron-rich diene with an electron-poor dienophile (normal electron demand), or the reverse (inverse electron demand).
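The bond bookkeeping of the retro-Diels–Alder transform can be made explicit. In this sketch, a cyclohexene is a list of six ring atoms in ring order, with the ring C=C between the first two entries. That convention is chosen here and is not a standard format, and `retro_diels_alder` is a hypothetical helper. The two $\sigma$ bonds formed in the forward reaction are the ones joining each allylic carbon to its non-allylic neighbor.

```python
def retro_diels_alder(ring: list[str]) -> dict:
    """Split a cyclohexene into the diene and dienophile it came from.

    Assumed convention: atoms are listed in ring order and ring[0]=ring[1]
    is the ring C=C.
    """
    if len(ring) != 6:
        raise ValueError("a Diels-Alder adduct is a six-membered ring")
    # ring[2] and ring[5] are allylic; their bonds to ring[3] and ring[4]
    # are the two sigma bonds made in the forward reaction.
    broken = [(ring[2], ring[3]), (ring[4], ring[5])]
    diene = [ring[5], ring[0], ring[1], ring[2]]  # C=C-C=C after the retro step
    dienophile = [ring[3], ring[4]]               # becomes the dienophile C=C
    return {"broken_bonds": broken, "diene": diene, "dienophile": dienophile}


# Methyl cyclohex-3-ene-1-carboxylate, listed starting from the ring C3=C4.
adduct = ["C3", "C4", "C5", "C6", "C1(CO2Me)", "C2"]
parts = retro_diels_alder(adduct)
print("bonds broken:", parts["broken_bonds"])
print("diene (buta-1,3-diene):", parts["diene"])
print("dienophile (methyl acrylate):", parts["dienophile"])
```

For this adduct the sketch recovers buta-1,3-diene (C2 to C5) and methyl acrylate (C6=C1).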

3.6 Transition-Metal Catalyzed Cross-Coupling

Palladium-catalyzed cross-coupling reactions have revolutionized C–C bond formation, particularly for $\text{sp}^2$–$\text{sp}^2$ and $\text{sp}^2$–$\text{sp}$ couplings. The 2010 Nobel Prize was awarded to Heck, Negishi, and Suzuki for this work.

Suzuki Coupling

$\text{Ar--X} + \text{Ar'--B(OH)}_2 \xrightarrow{\text{Pd(0), base}} \text{Ar--Ar'}$

Retrosynthetically: $\text{Ar--Ar'} \Longrightarrow \text{ArX} + \text{Ar'B(OH)}_2$

Heck Reaction

$\text{Ar--X} + \text{CH}_2\text{=CHR} \xrightarrow{\text{Pd(0), base}} \text{Ar--CH=CHR}$

Retrosynthetically: $\text{ArCH=CHR} \Longrightarrow \text{ArX} + \text{CH}_2\text{=CHR}$

Sonogashira Coupling

$\text{Ar--X} + \text{HC} \equiv \text{CR} \xrightarrow{\text{Pd(0), CuI, amine}} \text{Ar--C} \equiv \text{CR}$

Retrosynthetically: $\text{Ar--C} \equiv \text{CR} \Longrightarrow \text{ArX} + \text{HC} \equiv \text{CR}$

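General catalytic cycle (Suzuki, Sonogashira): oxidative addition of Ar–X to Pd(0) $\rightarrow$ transmetalation with the organometallic partner (the boronate, or a copper acetylide in the Sonogashira) $\rightarrow$ reductive elimination to form the new C–C bond and regenerate Pd(0). The Heck reaction shares the oxidative addition step but then proceeds by migratory insertion of the alkene and $\beta$-hydride elimination. The base then regenerates Pd(0).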
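Tracking the palladium oxidation state shows why the cycle is catalytic. The sketch below follows Pd through one turnover of the transmetalation-type cycle (Suzuki, Sonogashira). It is a toy model with hypothetical names. It does not cover the Heck reaction, where migratory insertion and $\beta$-hydride elimination replace transmetalation and base regenerates Pd(0).

```python
# One turnover of a Pd(0)/Pd(II) cross-coupling cycle (toy model).
# Each entry: (elementary step, change in Pd oxidation state).
CYCLE = [
    ("oxidative addition of Ar-X", +2),       # Pd(0) -> Ar-Pd(II)-X
    ("transmetalation with Ar'-[M]", 0),      # Ar-Pd(II)-X -> Ar-Pd(II)-Ar'
    ("reductive elimination of Ar-Ar'", -2),  # Ar-Pd(II)-Ar' -> Pd(0) + Ar-Ar'
]

ROMAN = {0: "0", 2: "II"}


def run_cycle(start: int = 0) -> list[tuple[str, int]]:
    """Apply each step in order and record the Pd oxidation state after it."""
    state = start
    trace = []
    for step, delta in CYCLE:
        state += delta
        trace.append((step, state))
    return trace


trace = run_cycle()
for step, ox in trace:
    print(f"{step:<34} -> Pd({ROMAN[ox]})")
# The catalyst is regenerated only if the state returns to where it started.
print("catalytic:", trace[-1][1] == 0)
```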

4. Strategy: Two-Group Disconnections

When two functional groups (FGs) are present in the target molecule, their relative position determines which C–C bond forming reaction is most appropriate. This is the 1,n-difunctionalized strategy, and it is the most systematic approach to retrosynthetic planning. The key insight is that the spacing between the functional groups dictates the polarity pattern, which in turn specifies the reaction type.

4.1 1,2-Difunctionalized Compounds

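Two functional groups on adjacent carbons. The polarity pattern requires $d^1 + a^1$ (an acyl anion equivalent adding to a carbonyl), which is dissonant. Alternatively, the 1,2-relationship can be carried in intact from an epoxide.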

Typical target: 1,2-diol, $\beta$-amino alcohol, $\alpha$-hydroxy ketone

Key disconnection:

$\text{R}_1\text{CH(OH)--CH(OH)R}_2 \Longrightarrow {}^{-}\text{CH(OH)R}_1 \;(d^1) + \text{R}_2\text{CHO} \;(a^1)$

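Synthetic equivalents: The $d^1$ synthon is supplied by an acyl anion equivalent (cyanide, a lithiated 1,3-dithiane), with FGI afterwards to adjust the oxidation level. Epoxides carry a ready-made 1,2-relationship. Opening by heteroatom nucleophiles (amines, azide) gives $\beta$-amino alcohols and related products, with anti stereochemistry from backside $\text{S}_\text{N}2$ attack. Carbon nucleophiles (Grignard reagents with catalytic CuI, organocuprates) attack the less hindered carbon, so the new group ends up on the carbon adjacent to the carbinol carbon:

$\text{R}_1\text{MgBr} + \overset{\text{O}}{\overbrace{\text{R}_2\text{CH--CH}_2}} \xrightarrow{\text{CuI (cat.)}} \text{R}_1\text{CH}_2\text{CH(OH)R}_2$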

4.2 1,3-Difunctionalized Compounds

Two functional groups separated by one carbon. The polarity pattern is $d^2 + a^1$, which is the natural reactivity of carbonyl compounds. This is the most common pattern in retrosynthesis.

Typical target: $\beta$-hydroxy carbonyl, $\beta$-keto ester, 1,3-diol

Key disconnection (aldol):

$\text{RCOCH}_2\text{CH(OH)R'} \Longrightarrow \text{RCOCH}_3 \;(d^2) + \text{R'CHO} \;(a^1)$

Reactions: Aldol reaction, Claisen condensation, Knoevenagel condensation, malonic ester synthesis. The 1,3-relationship is sometimes called the "aldol pattern" because the aldol reaction naturally produces it.

4.3 1,4-Difunctionalized Compounds

Two functional groups separated by two carbons. The polarity pattern requires $d^2 + a^2$ or $d^1 + a^3$. The pattern is dissonant: the natural polarities imposed by the two carbonyl groups conflict, so one partner must supply unnatural reactivity ($a^2$ or $d^1$).

Typical target: 1,4-dicarbonyl, $\gamma$-butyrolactone

Key disconnection (enolate alkylation):

$\text{RCOCH}_2\text{CH}_2\text{COR'} \Longrightarrow \text{RCOCH}_3 \;(d^2) + \text{BrCH}_2\text{COR'} \;(a^2)$

Reactions: Alkylation of an enolate or enamine (Stork) with an $\alpha$-halo carbonyl ($d^2 + a^2$); Stetter reaction (umpolung: an acyl anion equivalent, $d^1$, adds to an $\alpha,\beta$-unsaturated carbonyl, $a^3$). Conjugate addition of an ordinary enolate to an enone does not give this pattern. It gives the 1,5-dicarbonyl of Section 4.4.

4.4 1,5-Difunctionalized Compounds

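Two functional groups separated by three carbons. The polarity pattern is $d^2 + a^3$, which is exactly the Michael addition: an enolate ($d^2$) adds to the $\beta$-carbon of an $\alpha,\beta$-unsaturated carbonyl ($a^3$).

$\text{RCOCH}_2\text{CH}_2\text{CH}_2\text{COR'} \Longrightarrow \text{RCOCH}_3 \;(d^2) + \text{CH}_2\text{=CHCOR'} \;(a^3)$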

Typical target: 1,5-dicarbonyl, cyclohexenone (Robinson annulation product)

Key disconnection (Robinson annulation):

$\text{cyclohexenone} \Longrightarrow \text{ketone} \;(d^2) + \text{methyl vinyl ketone} \;(a^3)$

Reactions: The Robinson annulation is a Michael addition followed by an intramolecular aldol condensation that builds a fused six-membered ring. The Michael step creates a 1,5-dicarbonyl, and the aldol step (a 1,3-relationship) closes it into the cyclohexenone. This sequence was famously used in steroid synthesis.

Summary Table: Two-Group Disconnection Logic

Pattern	Synthon Pairing	Key Reaction	Consonance
1,2-difunctional	$d^1 + a^1$	Acyl anion addition; epoxide opening	Dissonant
1,3-difunctional	$d^2 + a^1$	Aldol / Claisen	Consonant
1,4-difunctional	$d^2 + a^2$ or $d^1 + a^3$	Enolate + $\alpha$-halo carbonyl; Stetter	Dissonant
1,5-difunctional	$d^2 + a^3$	Michael addition (Robinson annulation)	Consonant
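The consonance column follows from a simple parity argument. This minimal sketch assumes both groups are oxygen functions (C=O or OH), whose carbon is a natural acceptor, and that polarity alternates along the chain from there. The names `polarity` and `is_consonant` are illustrative. A 1,n pattern is consonant when the alternation starting from each end gives every carbon the same label.

```python
def polarity(k: int) -> str:
    """Natural polarity that an O-functional group on C1 imposes on carbon k.

    Carbon 1 (bearing the oxygen) is an acceptor 'a'; labels then alternate.
    """
    return "a" if k % 2 == 1 else "d"


def is_consonant(n: int) -> bool:
    """Check a 1,n-difunctional chain: group A on C1, group B on Cn.

    Carbon k is position k counted from A and position n + 1 - k from B;
    the pattern is consonant only if both counts give every carbon the same label.
    """
    return all(polarity(k) == polarity(n + 1 - k) for k in range(1, n + 1))


for n in range(2, 7):
    print(f"1,{n}: {'consonant' if is_consonant(n) else 'dissonant'}")
```

Odd spacings (1,3 and 1,5) come out consonant and even spacings (1,2 and 1,4) dissonant, matching the table.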

5. Worked Examples

Example A: 2-Phenyl-2-butanol via Grignard

Target: 2-phenyl-2-butanol, $\text{C}_6\text{H}_5\text{C(CH}_3\text{)(OH)CH}_2\text{CH}_3$

Analysis: The target is a tertiary alcohol. Any of the three C–C bonds to the carbinol carbon can be disconnected, giving three possible Grignard disconnections:

Disconnection 1: break C–Ph bond

$\text{PhC(CH}_3\text{)(OH)Et} \Longrightarrow \text{PhMgBr} + \text{CH}_3\text{COEt (methyl ethyl ketone)}$

Disconnection 2: break C–Me bond

$\text{PhC(CH}_3\text{)(OH)Et} \Longrightarrow \text{CH}_3\text{MgBr} + \text{PhCOEt (propiophenone)}$

Disconnection 3: break C–Et bond

$\text{PhC(CH}_3\text{)(OH)Et} \Longrightarrow \text{EtMgBr} + \text{PhCOCH}_3 \text{ (acetophenone)}$

Best choice: All three routes are chemically sound and use inexpensive materials. Disconnection 3 is a sensible choice because acetophenone and ethylmagnesium bromide are both cheap and commercially available. This exemplifies the principle: choose the disconnection that leads to the simplest, most readily available starting materials.
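The enumeration in Example A is mechanical: each of the three groups on the carbinol carbon can serve as the Grignard partner, and the other two form the ketone. The sketch below uses the hypothetical helper `grignard_disconnections` and writes structures as condensed-formula strings. It ignores the ester route (2 equiv RMgX), which applies only when two groups are identical.

```python
def grignard_disconnections(groups: list[str]) -> list[tuple[str, str]]:
    """Enumerate RMgBr + ketone pairs for a tertiary alcohol R1R2R3C-OH."""
    if len(groups) != 3:
        raise ValueError("a tertiary alcohol has three carbon substituents")
    options = []
    for i, r in enumerate(groups):
        rest = groups[:i] + groups[i + 1:]  # the two groups left on the C=O
        options.append((f"{r}MgBr", f"{rest[0]}CO{rest[1]}"))
    return options


for reagent, ketone in grignard_disconnections(["Ph", "CH3", "Et"]):
    print(f"PhC(CH3)(OH)Et  =>  {reagent} + {ketone}")
```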

Example B: 4-Methylcyclohex-2-enone via Diels–Alder

Target: 4-methylcyclohex-2-en-1-one

Step 1: Identify the Diels–Alder retron

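A cyclohexene is the retron for the Diels–Alder transform. The transform breaks the two ring $\sigma$ bonds that join each allylic carbon to its non-allylic neighbor. The ring C=C becomes the central single bond of the diene, and the bond opposite it becomes the C=C of the dienophile.

Step 2: FGI if necessary

The target is a conjugated enone rather than a simple cyclohexene, so FGI comes first. A conjugated cyclohexenone can be made by isomerizing its $\beta,\gamma$-unsaturated isomer: 4-methylcyclohex-2-en-1-one $\Longrightarrow$ 4-methylcyclohex-3-en-1-one. The ketone carbon is then traced to a ketene equivalent such as 2-chloroacrylonitrile, whose $\text{C(Cl)CN}$ carbon is hydrolyzed to C=O after the cycloaddition. What remains is a cyclohexene whose double bond carries the methyl group.

Step 3: Disconnection

$\text{1-chloro-4-methylcyclohex-3-ene-1-carbonitrile} \Longrightarrow \text{isoprene (2-methylbuta-1,3-diene)} + \text{2-chloroacrylonitrile (CH}_2\text{=C(Cl)CN)}$

Forward synthesis: Isoprene + 2-chloroacrylonitrile $\xrightarrow{\Delta}$ Diels–Alder adduct $\xrightarrow{\text{KOH, aq. DMSO}}$ 4-methylcyclohex-3-en-1-one $\xrightarrow{\text{H}^+ \text{ or base}}$ enone target. The "para" regioselectivity of a 2-substituted diene places the methyl group 1,4 to the future carbonyl carbon. The target has a single stereocenter and is formed racemic, so the endo rule does not affect the outcome.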

Example C: 4-Bromonitrobenzene via EAS Sequence

Target: para-bromonitrobenzene

Retrosynthetic analysis:

The key question: which substituent is introduced first? The order of EAS steps matters because existing substituents direct incoming groups.

$\text{4-BrC}_6\text{H}_4\text{NO}_2 \Longrightarrow \text{bromobenzene} + \text{HNO}_3/\text{H}_2\text{SO}_4$

$\text{4-BrC}_6\text{H}_4\text{NO}_2 \Longrightarrow \text{nitrobenzene} + \text{Br}_2/\text{FeBr}_3$

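Analysis: Bromine is an ortho/para director: its lone pairs donate into the ring by resonance, although its electronegativity makes the ring less reactive overall. Nitro is a meta director (strong EWG). To get the para relationship, we must introduce Br first and then nitrate the bromobenzene. Nitration gives mainly the para isomer, which is separated from the minor ortho isomer: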

$\text{C}_6\text{H}_6 \xrightarrow{\text{Br}_2,\,\text{FeBr}_3} \text{C}_6\text{H}_5\text{Br} \xrightarrow{\text{HNO}_3,\,\text{H}_2\text{SO}_4} \text{4-BrC}_6\text{H}_4\text{NO}_2$

If we nitrated first, subsequent bromination of nitrobenzene would give the meta-bromo product, not the desired para isomer. This illustrates the critical role of directing effects in aromatic retrosynthesis.
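The ordering argument can be checked against a lookup of directing effects. This is a toy sketch with hypothetical names. It records only which ring positions, relative to the substituent at C1, each group directs the next electrophile to. It ignores isomer ratios, sterics, and ring deactivation.

```python
# Positions (relative to the existing substituent at C1) that the next
# electrophile enters, for the two substituents in Example C.
DIRECTS_TO = {
    "Br": {2, 4, 6},  # ortho/para director (deactivating)
    "NO2": {3, 5},    # meta director (strongly deactivating)
}


def second_substitution(first: str) -> set[int]:
    """Ring positions open to the second electrophile when `first` is at C1."""
    return DIRECTS_TO[first]


def gives_para(first: str) -> bool:
    """True if installing `first` before the other group can give the 1,4 isomer."""
    return 4 in second_substitution(first)


for first in ("Br", "NO2"):
    print(f"{first} first -> positions {sorted(second_substitution(first))}, "
          f"para possible: {gives_para(first)}")
```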

Example D: A Polyketide Fragment via Convergent Strategy

Target: A $\delta$-hydroxy-$\beta$-keto ester fragment, common in polyketide natural products like erythromycin:

$\text{R}_1\text{CH(OH)CH}_2\text{COCH}_2\text{COOR}_2$

Step 1: Identify functional group relationships

The OH and ketone are in a 1,3-relationship ($\beta$-hydroxy ketone = aldol pattern). The ketone and ester are also in a 1,3-relationship (= Claisen pattern). This suggests a two-stage disconnection.

Disconnection 1: Aldol (1,3-OH/C=O)

$\text{target} \Longrightarrow \text{R}_1\text{CHO} + \text{CH}_3\text{COCH}_2\text{COOR}_2$

Disconnection 2: Claisen (1,3-C=O/C=O)

$\text{CH}_3\text{COCH}_2\text{COOR}_2 \Longrightarrow \text{CH}_3\text{COOR} + \text{CH}_3\text{COOR}_2$

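Forward synthesis: (1) Claisen condensation of two acetate esters gives the $\beta$-keto ester. (2) An aldol reaction with aldehyde $\text{R}_1\text{CHO}$ at the terminal methyl group installs the hydroxyl. A simple enolate of the $\beta$-keto ester reacts at the more acidic central CH$_2$. The dianion (NaH, then n-BuLi) is therefore used to direct reaction to the terminal carbon. Controlling the new carbinol stereocenter requires an asymmetric variant of this aldol. This short, convergent sequence assembles the fragment in just two C–C bond-forming steps.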

6. Protecting Groups

In complex molecules with multiple functional groups, a reagent may react with the wrong group. A protecting group is temporarily installed to mask a reactive functional group, allowing selective transformation elsewhere. The ideal protecting group is introduced in high yield, stable to subsequent reaction conditions, and removed cleanly at the end.

When to Protect

A nucleophilic group (OH, NH$_2$) would interfere with an electrophilic reaction
An acidic proton (OH, NH) would quench a strong base or organometallic reagent
A reducible group (C=O) would be reduced by LiAlH$_4$ or NaBH$_4$
An oxidizable group (OH, aldehyde) would be destroyed by an oxidant

Common Protecting Groups by Functional Group

Alcohol Protection (–OH)

Protecting Group	Install	Remove	Stable to
TBS (tert-butyldimethylsilyl)	TBSCl, imidazole	TBAF or HF	Base, mild acid, Grignard
THP (tetrahydropyranyl)	DHP, PPTS	Mild acid (PPTS, MeOH)	Base, Grignard, LiAlH$_4$
Benzyl (Bn)	BnBr, NaH	H$_2$/Pd-C	Acid, base, oxidation
Acetyl (Ac)	Ac$_2$O, pyridine	K$_2$CO$_3$/MeOH	Mild acid, oxidation (cleaved by base and strong nucleophiles)

Amine Protection (–NH$_2$)

Protecting Group	Install	Remove
Boc (tert-butoxycarbonyl)	Boc$_2$O, base	TFA or HCl/dioxane
Cbz (benzyloxycarbonyl)	CbzCl, base	H$_2$/Pd-C
Fmoc (fluorenylmethyloxycarbonyl)	FmocCl, base	Piperidine (base)

Carbonyl Protection (C=O)

Acetals and ketals: Treatment with ethylene glycol and acid catalyst converts a ketone or aldehyde to a 1,3-dioxolane. Removal uses aqueous acid. Acetals are stable to base, nucleophiles, reducing agents (LiAlH$_4$), and Grignard reagents — making them one of the most useful protecting groups in synthesis.

$\text{RCOR'} + \text{HOCH}_2\text{CH}_2\text{OH} \overset{\text{H}^+}{\rightleftharpoons} \text{cyclic acetal (1,3-dioxolane)} + \text{H}_2\text{O}$

Carboxylic Acid Protection (–COOH)

Methyl/ethyl esters: Formed by Fischer esterification or, for methyl esters, with diazomethane. Removed by saponification (NaOH/H$_2$O). tert-Butyl esters: Formed with isobutylene/acid. Removed with TFA (same conditions as Boc).

Orthogonal Protection Strategy

When a molecule contains multiple identical functional groups (e.g., two OH groups), we need protecting groups that can be removed independently under different conditions. This is called orthogonal protection.

Classic example: Protecting two hydroxyl groups with TBS and Bn:

TBS is removed by fluoride (TBAF) — Bn survives
Bn is removed by hydrogenolysis (H$_2$/Pd) — TBS survives

This allows selective unmasking of either OH group at any point in the synthesis.

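In peptide synthesis, the Fmoc/tBu strategy is fully orthogonal: Fmoc is base-labile (piperidine), while tBu side-chain groups are acid-labile (TFA). The older Boc/Bn strategy is only quasi-orthogonal and relies on graded acid lability. Boc is removed with TFA, while benzyl-based side-chain groups require much stronger acid (HF) at the end. Both schemes allow iterative deprotection/coupling cycles.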
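Orthogonality can be phrased as a set test: two groups are orthogonal if each has a removal condition that leaves the other intact. This simplified sketch assumes each group is cleaved only by the single condition listed in the tables above. Real lability is graded (for example, TBS ethers are not fully stable to strong acid). The dictionary and function names are illustrative.

```python
# Deprotection condition for each group, simplified from the tables above.
REMOVED_BY = {
    "TBS": {"TBAF"},
    "Bn": {"H2/Pd-C"},
    "Cbz": {"H2/Pd-C"},
    "Boc": {"TFA"},
    "tBu ester": {"TFA"},
    "Fmoc": {"piperidine"},
}


def orthogonal(pg1: str, pg2: str) -> bool:
    """True if each group can be removed by a condition the other survives."""
    only_first = REMOVED_BY[pg1] - REMOVED_BY[pg2]
    only_second = REMOVED_BY[pg2] - REMOVED_BY[pg1]
    return bool(only_first) and bool(only_second)


for pair in [("TBS", "Bn"), ("Fmoc", "tBu ester"), ("Boc", "tBu ester"), ("Bn", "Cbz")]:
    print(pair, "orthogonal" if orthogonal(*pair) else "not orthogonal")
```

The Boc/tBu pair fails the test because both groups fall to TFA. Bn and Cbz fail for the same reason under hydrogenolysis.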

7. Applications

Total Synthesis of Natural Products

Retrosynthetic analysis is indispensable in planning the total synthesis of complex natural products. Landmark examples include:

Strychnine (Woodward, 1954): A 28-step linear synthesis of this notoriously complex alkaloid, later improved to 12 steps by Vanderwal (2011) using retrosynthetic strategy.
Vitamin B$_{12}$ (Woodward & Eschenmoser, 1973): A 100+ researcher collaboration requiring nearly 100 steps, demonstrating both the power and limitations of linear synthesis.
Taxol (paclitaxel) (Holton, 1994; Nicolaou, 1994; Danishefsky, 1996): Multiple groups completed total syntheses of this anticancer agent, each with a different retrosynthetic strategy, highlighting the non-uniqueness of retrosynthetic solutions.
Palytoxin (Kishi, 1994): One of the most complex molecules ever synthesized, with 64 stereocenters. Across the 115 steps, a convergent strategy was essential.

Pharmaceutical Process Chemistry

In the pharmaceutical industry, retrosynthetic analysis is used to design practical, scalable routes to drug candidates. Process chemists prioritize:

Atom economy: Maximizing incorporation of reactant atoms into the product (Trost, 1991). Defined as $\text{AE} = \frac{M_{\text{product}}}{\sum M_{\text{reactants}}} \times 100\%$
Step economy: Minimizing the total number of synthetic steps (Wender, 2006)
Scalability: Avoiding reactions that are hazardous, expensive, or difficult to scale (cryogenic conditions, toxic metals, chromatographic purification)
Stereoselectivity: Using catalytic asymmetric methods rather than resolution to set stereocenters
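The atom-economy formula is simple to evaluate. This sketch assumes 1:1 stoichiometry and formulas written without brackets. The helpers `molar_mass` and `atom_economy` are illustrative. The Wittig methylenation of benzaldehyde is chosen here only as an example of a reaction with a heavy stoichiometric by-product ($\text{Ph}_3\text{P=O}$).

```python
import re

# Standard atomic masses (g/mol) for the elements used below.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999, "P": 30.974}


def molar_mass(formula: str) -> float:
    """Molar mass of a simple formula such as 'C7H6O' (no brackets)."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_MASS[element] * (int(count) if count else 1)
    return total


def atom_economy(product: str, reactants: list[str]) -> float:
    """AE = M(product) / sum of M(reactants) x 100%, assuming 1:1 stoichiometry."""
    return 100 * molar_mass(product) / sum(molar_mass(r) for r in reactants)


# Wittig: PhCHO (C7H6O) + Ph3P=CH2 (C19H17P) -> styrene (C8H8) + Ph3P=O
ae = atom_economy("C8H8", ["C7H6O", "C19H17P"])
print(f"Wittig methylenation of benzaldehyde: AE = {ae:.1f}%")
```

Most of the reactant mass leaves as triphenylphosphine oxide. This is the kind of waste that atom-economy analysis is meant to expose.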

Green Chemistry and Sustainability

Modern retrosynthetic planning increasingly incorporates the 12 Principles of Green Chemistry (Anastas & Warner, 1998). Key considerations:

Catalytic reactions over stoichiometric reagents (e.g., Pd-catalyzed coupling vs. stoichiometric organocuprate)
Biocatalysis: Enzyme-catalyzed transformations (lipases, transaminases, ketoreductases) offer exquisite selectivity under mild conditions
Cascade/tandem reactions: Performing multiple bond-forming events in one pot reduces waste, solvent use, and purification steps
Renewable feedstocks: Designing syntheses from biomass-derived starting materials rather than petrochemicals
Electrochemistry and photochemistry: Electron or photon as the "reagent," minimizing chemical waste

8. Historical Context

E. J. Corey and the Logic of Chemical Synthesis

Elias James Corey (b. 1928) received the 1990 Nobel Prize in Chemistry "for his development of the theory and methodology of organic synthesis." His key contributions include:

Formalization of retrosynthetic analysis (1960s): Corey introduced the systematic vocabulary of transforms, disconnections, synthons, and retrons that we use today. His 1967 paper and 1989 book The Logic of Chemical Synthesis are foundational texts.
LHASA (Logic and Heuristics Applied to Synthetic Analysis): One of the first computer programs for retrosynthetic planning, developed at Harvard starting in 1969. It anticipated modern AI-driven synthesis planning by decades.
Reagent development: Corey developed numerous reagents and reactions, including the Corey–Bakshi–Shibata (CBS) reduction, Corey–Chaykovsky reaction, Corey–Winter olefin synthesis, and Corey–Fuchs reaction.

R. B. Woodward: The Art Before the Science

Robert Burns Woodward (1917–1979) won the 1965 Nobel Prize for his achievements in the art of organic synthesis. Before Corey's formalization, Woodward relied on deep chemical intuition and pattern recognition. His total syntheses include:

Quinine (1944), cholesterol (1951), cortisone (1951), strychnine (1954)
Reserpine (1956), chlorophyll (1960), vitamin B$_{12}$ (1973, with Eschenmoser)

Woodward's work demonstrated that any molecule, no matter how complex, could in principle be synthesized. His approach was more artistic than algorithmic, relying on visual pattern matching and an encyclopedic knowledge of reactions.

Convergent vs Linear Synthesis: Historical Evolution

Early total syntheses (1940s–1960s) were predominantly linear: each intermediate was prepared sequentially. This approach is conceptually simple but suffers from exponential yield loss. The shift toward convergent strategies — in which fragments are built independently and joined late in the synthesis — was driven by:

Recognition that convergent routes dramatically improve overall yield
The availability of powerful C–C bond forming reactions (cross-coupling, olefin metathesis)
Computer-aided retrosynthetic analysis that could explore convergent disconnections systematically
Economic pressures in pharmaceutical manufacturing that demand efficiency

Modern total syntheses, like those from the groups of Baran, MacMillan, and Shenvi, emphasize step economy and ideality (minimizing non-strategic steps like protecting group manipulations and oxidation state changes).

9. Interactive Retrosynthetic Tree Builder

The following Python simulation builds a retrosynthetic analysis tree for a target molecule described by its functional groups. Given a set of functional groups and their positions, the program identifies possible disconnections, ranks them by efficiency criteria (convergence, availability of starting materials, number of steps), and displays the full retrosynthetic tree with scoring.

The algorithm encodes the two-group disconnection logic from Section 4 and the C–C bond forming reactions from Section 3. It uses numpy for numerical scoring and ranking.

Retrosynthetic Analysis: Disconnection Scoring & Yield Comparison


import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

print("=" * 65)
print("  RETROSYNTHETIC ANALYSIS TREE BUILDER")
print("  Automated Disconnection & Ranking Engine")
print("=" * 65)

# ===================================================================
# Define functional groups and their properties
# ===================================================================
FG_TYPES = {
    'OH':    {'name': 'Hydroxyl',          'polarity': 'donor',    'pKa': 16.0},
    'C=O':   {'name': 'Carbonyl (ketone)', 'polarity': 'acceptor', 'pKa': 20.0},
    'CHO':   {'name': 'Aldehyde',          'polarity': 'acceptor', 'pKa': 17.0},
    'COOH':  {'name': 'Carboxylic acid',   'polarity': 'acceptor', 'pKa': 4.75},
    'COOR':  {'name': 'Ester',             'polarity': 'acceptor', 'pKa': 25.0},
    'NH2':   {'name': 'Primary amine',     'polarity': 'donor',    'pKa': 38.0},
    'C=C':   {'name': 'Alkene',            'polarity': 'neutral',  'pKa': 44.0},
    'ArH':   {'name': 'Aromatic ring',     'polarity': 'neutral',  'pKa': 43.0},
    'X':     {'name': 'Halide (Br/Cl)',    'polarity': 'acceptor', 'pKa': 50.0},
    'C#C':   {'name': 'Alkyne',            'polarity': 'donor',    'pKa': 25.0},
}

# ===================================================================
# Disconnection database: keyed by (fg1, fg2, distance)
# Each entry: (reaction_name, synthon_a, synthon_b, score_base, reliability)
# score_base: 1-10, higher = better disconnection
# reliability: probability of success (0-1)
# ===================================================================
DISCONNECTIONS = {
    # 1,2-difunctionalized
    ('OH', 'OH', 2):   ('Epoxide opening (dihydroxylation)',
                         'Epoxide', 'H2O/OsO4', 7, 0.85),
    ('OH', 'NH2', 2):  ('Epoxide opening by amine',
                         'Epoxide', 'R-NH2', 8, 0.80),
    ('OH', 'C=O', 2):  ('Alpha-hydroxy ketone: acyloin',
                         'Two esters (acyloin cond.)', 'Na/xylene', 5, 0.60),

    # 1,3-difunctionalized (consonant - best disconnections)
    ('OH', 'C=O', 3):  ('Aldol reaction',
                         'Enolate (d2)', 'Aldehyde (a1)', 9, 0.90),
    ('C=O', 'C=O', 3): ('Claisen condensation',
                         'Ester enolate (d2)', 'Ester (a1)', 8, 0.85),
    ('C=O', 'COOR', 3):('Claisen condensation',
                         'Ester enolate (d2)', 'Ester (a1)', 8, 0.85),
    ('OH', 'COOR', 3): ('Reformatsky reaction',
                         'Zn enolate (d2)', 'Aldehyde (a1)', 7, 0.75),
    ('OH', 'OH', 3):   ('Aldol + reduction',
                         'Enolate (d2)', 'Aldehyde (a1)', 7, 0.80),

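    # 1,4-difunctionalized (dissonant): enolate (d2) + alpha-halo carbonyl (a2)
    ('C=O', 'C=O', 4): ('Enolate alkylation (alpha-halo ketone)',
                         'Enolate/enamine (d2)', 'Alpha-halo ketone (a2)', 8, 0.85),
    ('C=O', 'COOR', 4):('Enolate alkylation (alpha-halo ester)',
                         'Enolate/enamine (d2)', 'Alpha-halo ester (a2)', 7, 0.80),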
    ('OH', 'C=O', 4):  ('Conjugate addition + reduction',
                         'Cuprate (d1)', 'Enone (a2)', 6, 0.70),

    # 1,5-difunctionalized (consonant): enolate (d2) + enone beta-carbon (a3)
    ('C=O', 'C=O', 5): ('Robinson annulation (Michael+aldol)',
                         'Ketone enolate (d2)', 'MVK (a3)', 9, 0.80),
    ('OH', 'C=O', 5):  ('Michael + aldol cascade',
                         'Enolate (d2)', 'Enone (a3)', 7, 0.70),

    # Single-group disconnections (Grignard, Wittig, etc.)
    ('OH', 'X', 0):    ('Grignard reaction',
                         'RMgX (d1)', 'R\'CHO (a1)', 9, 0.90),
    ('C=C', 'X', 0):   ('Wittig reaction',
                         'Ylide (d1)', 'Aldehyde (a1)', 8, 0.85),
    ('ArH', 'ArH', 0): ('Suzuki coupling',
                         'ArB(OH)2', 'ArX', 9, 0.90),
    ('ArH', 'C=C', 0): ('Heck reaction',
                         'ArX', 'Alkene', 8, 0.85),
    ('ArH', 'C#C', 0): ('Sonogashira coupling',
                         'ArX', 'Terminal alkyne', 8, 0.85),
}

# ===================================================================
# FGI (Functional Group Interconversion) database
# ===================================================================
FGI_TRANSFORMS = {
    'OH':   [('C=O', 'Oxidation (PCC, Swern, DMP)', 0.95),
             ('C=C', 'Dehydration (H2SO4, POCl3)', 0.80)],
    'C=O':  [('OH', 'Reduction (NaBH4, LiAlH4)', 0.95),
             ('C=C', 'Wittig / HWE', 0.85)],
    'NH2':  [('NO2', 'Reduction (H2/Pd, Fe/HCl)', 0.90)],
    'COOH': [('COOR', 'Esterification (Fischer)', 0.95),
             ('CHO', 'Reduction (DIBAL-H)', 0.80)],
    'C=C':  [('OH', 'Hydroboration-oxidation', 0.90),
             ('C=O', 'Wacker oxidation', 0.75)],
}

# ===================================================================
# Define target molecules for analysis
# ===================================================================
TARGETS = [
    {
        'name': '4-Hydroxy-2-pentanone (beta-hydroxy ketone)',
        'groups': [('OH', 4), ('C=O', 2)],
        'skeleton_length': 5,
        'description': 'A 1,3-difunctionalized target: classic aldol product'
    },
    {
        'name': '2,5-Hexanedione (1,4-diketone)',
        'groups': [('C=O', 2), ('C=O', 5)],
        'skeleton_length': 6,
        'description': 'A 1,4-difunctionalized target: Michael addition product'
    },
    {
        'name': 'Cyclohex-2-enone (Diels-Alder + oxidation)',
        'groups': [('C=O', 1), ('C=C', 2)],
        'skeleton_length': 6,
        'description': 'A six-membered ring with enone: Diels-Alder retron'
    },
    {
        'name': '4-Biphenylacetic acid fragment',
        'groups': [('ArH', 1), ('ArH', 2), ('COOH', 3)],
        'skeleton_length': 3,
        'description': 'Biaryl + acid: Suzuki coupling + malonate alkylation'
    },
    {
        'name': 'Polyketide fragment (1,3,5-triol)',
        'groups': [('OH', 1), ('OH', 3), ('OH', 5), ('C=O', 7)],
        'skeleton_length': 8,
        'description': 'Complex fragment: iterative aldol disconnections'
    },
]

def find_disconnections(target):
    """Find all applicable disconnections for a target."""
    groups = target['groups']
    results = []

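    # Check pairwise functional group relationships
    for i in range(len(groups)):
        for j in range(i + 1, len(groups)):
            fg1, pos1 = groups[i]
            fg2, pos2 = groups[j]
            # A 1,n relationship spans n carbons, so n = |pos2 - pos1| + 1
            # (e.g. C=O at C2 and OH at C4 is the 1,3 aldol pattern).
            dist = abs(pos2 - pos1) + 1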

            # Check direct match (either ordering of the two groups)
            key = (fg1, fg2, dist)
            key_rev = (fg2, fg1, dist)

            if key in DISCONNECTIONS:
                rxn_name, synth_a, synth_b, score, rel = DISCONNECTIONS[key]
                results.append({
                    'reaction': rxn_name,
                    'synthon_a': synth_a,
                    'synthon_b': synth_b,
                    'base_score': score,
                    'reliability': rel,
                    'distance': dist,
                    'fg_pair': (fg1, fg2),
                })
            elif key_rev in DISCONNECTIONS:
                rxn_name, synth_a, synth_b, score, rel = DISCONNECTIONS[key_rev]
                results.append({
                    'reaction': rxn_name,
                    'synthon_a': synth_a,
                    'synthon_b': synth_b,
                    'base_score': score,
                    'reliability': rel,
                    'distance': dist,
                    'fg_pair': (fg2, fg1),
                })

    # Check single-group disconnections (distance-independent)
    fg_set = set(fg for fg, _ in groups)
    for key, val in DISCONNECTIONS.items():
        if key[2] == 0:  # distance-independent
            fg_a, fg_b, _ = key
            if fg_a in fg_set and fg_b in fg_set:
                rxn_name, synth_a, synth_b, score, rel = val
                results.append({
                    'reaction': rxn_name,
                    'synthon_a': synth_a,
                    'synthon_b': synth_b,
                    'base_score': score,
                    'reliability': rel,
                    'distance': 0,
                    'fg_pair': (fg_a, fg_b),
                })

    return results

def find_fgi_options(target):
    """Find applicable FGI transforms."""
    results = []
    for fg, pos in target['groups']:
        if fg in FGI_TRANSFORMS:
            for new_fg, method, success in FGI_TRANSFORMS[fg]:
                results.append({
                    'original': fg,
                    'position': pos,
                    'new_fg': new_fg,
                    'method': method,
                    'success_rate': success,
                })
    return results

def score_disconnections(disconnections, target):
    """Score and rank disconnections using multiple criteria."""
    if not disconnections:
        return []

    n = len(disconnections)
    scores = np.zeros((n, 5))  # 5 criteria

    for i, d in enumerate(disconnections):
        # Criterion 1: Base reaction quality (0-10)
        scores[i, 0] = d['base_score']

        # Criterion 2: Reliability (0-10)
        scores[i, 1] = d['reliability'] * 10

        # Criterion 3: Convergence bonus (0-10)
        # Disconnections that split the molecule more evenly score higher
        skel = target['skeleton_length']
        if d['distance'] > 0:
            split_ratio = d['distance'] / skel
            # Best split is 0.5 (even), penalize uneven splits
            scores[i, 2] = 10 * (1 - 2 * abs(split_ratio - 0.5))
        else:
            scores[i, 2] = 5.0  # neutral for distance-independent

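        # Criterion 4: Consonance bonus (0-10)
        # 'distance' is the difference in carbon positions, so a 1,3-pattern
        # has distance 2 and a 1,5-pattern distance 4 (consonant, natural
        # polarity); 1,2- and 1,4-patterns (distance 1 and 3) are dissonant
        if d['distance'] in (2, 4):
            scores[i, 3] = 9.0  # consonant
        elif d['distance'] in (1, 3):
            scores[i, 3] = 5.0  # dissonant
        else:
            scores[i, 3] = 6.0  # neutral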

        # Criterion 5: Simplicity of starting materials (0-10)
        # Heuristic: common reactions score higher
        common_rxns = ['Aldol', 'Grignard', 'Suzuki', 'Diels', 'Michael', 'Claisen']
        if any(r in d['reaction'] for r in common_rxns):
            scores[i, 4] = 9.0
        else:
            scores[i, 4] = 6.0

    # Weighted combination
    weights = np.array([0.30, 0.20, 0.15, 0.20, 0.15])
    total_scores = scores @ weights

    # Normalize to 0-100
    total_scores = total_scores / 10.0 * 100

    # Rank
    ranking = np.argsort(-total_scores)

    return [(disconnections[i], total_scores[i], scores[i]) for i in ranking]

# ===================================================================
# Run analysis on all targets
# ===================================================================
print()

criteria_names = ['Reaction Quality', 'Reliability', 'Convergence',
                  'Consonance', 'SM Availability']

for t_idx, target in enumerate(TARGETS):
    print(f"{'='*65}")
    print(f"  TARGET {t_idx + 1}: {target['name']}")
    print(f"  {target['description']}")
    print(f"{'='*65}")
    print(f"  Functional groups: ", end="")
    for fg, pos in target['groups']:
        print(f"{FG_TYPES[fg]['name']}(C{pos}) ", end="")
    print(f"\n  Skeleton length: {target['skeleton_length']} carbons")

    # Find disconnections
    disconnections = find_disconnections(target)

    if not disconnections:
        print("  No direct disconnections found. Trying FGI...")
        fgis = find_fgi_options(target)
        if fgis:
            print(f"  Found {len(fgis)} FGI options:")
            for f in fgis[:3]:
                print(f"    {f['original']}(C{f['position']}) => "
                      f"{f['new_fg']} via {f['method']} "
                      f"(success: {f['success_rate']:.0%})")
        print()
        continue

    # Score and rank
    ranked = score_disconnections(disconnections, target)

    print(f"\n  Found {len(ranked)} possible disconnections:\n")

    for rank, (d, total, sub_scores) in enumerate(ranked):
        marker = " <<<< RECOMMENDED" if rank == 0 else ""
        print(f"  Rank {rank + 1}: {d['reaction']}{marker}")
        print(f"    FG pair: {d['fg_pair'][0]} + {d['fg_pair'][1]}"
              f" (distance: {d['distance']})")
        print(f"    Synthon A: {d['synthon_a']}")
        print(f"    Synthon B: {d['synthon_b']}")
        print(f"    Overall Score: {total:.1f}/100")
        print(f"    Subscores: ", end="")
        for name, sc in zip(criteria_names, sub_scores):
            print(f"{name}={sc:.1f} ", end="")
        print()
        print()

    # Show FGI options
    fgis = find_fgi_options(target)
    if fgis:
        print(f"  Available FGI transforms:")
        for f in fgis[:4]:
            print(f"    {f['original']}(C{f['position']}) => {f['new_fg']}"
                  f" via {f['method']} ({f['success_rate']:.0%})")
        print()

# ===================================================================
# Comparative analysis: linear vs convergent yield
# ===================================================================
print("=" * 65)
print("  CONVERGENT vs LINEAR YIELD ANALYSIS")
print("=" * 65)

steps = np.arange(2, 21)
yields_per_step = np.array([0.70, 0.80, 0.90, 0.95])

print(f"\n  {'Steps':>5s}", end="")
for y in yields_per_step:
    print(f"  {'Lin@'+str(int(y*100))+'%':>10s}", end="")
    print(f"  {'Conv@'+str(int(y*100))+'%':>10s}", end="")
print()
print("  " + "-" * 93)

for n in steps:
    print(f"  {n:5d}", end="")
    for y in yields_per_step:
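        linear = y ** n * 100
        # Convergent: two branches of n//2 steps run in parallel, then one
        # coupling step. The yield follows the longest linear sequence
        # (one branch + coupling), so the branch yields are not multiplied.
        n_branch = n // 2
        conv = y ** (n_branch + 1) * 100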
        print(f"  {linear:10.1f}", end="")
        print(f"  {conv:10.1f}", end="")
    print()

# ===================================================================
# MATPLOTLIB VISUALIZATIONS
# ===================================================================

fig, axes = plt.subplots(1, 3, figsize=(16, 5))
fig.patch.set_facecolor('#0a0a0a')
for ax in axes:
    ax.set_facecolor('#111111')
    ax.tick_params(colors='#aaaaaa', labelsize=8)
    for spine in ax.spines.values():
        spine.set_color('#333333')

# --- Panel 1: Linear vs Convergent Yield ---
ax = axes[0]
steps_arr = np.arange(2, 16)
# Convergent yield follows the longest linear sequence: one branch of
# n//2 steps plus the coupling step, i.e. y**(n//2 + 1)
y90_lin = 0.90 ** steps_arr * 100
y90_conv = 0.90 ** (steps_arr // 2 + 1) * 100
y80_lin = 0.80 ** steps_arr * 100
y80_conv = 0.80 ** (steps_arr // 2 + 1) * 100

ax.plot(steps_arr, y90_lin, 'o-', color='#ff6b6b', linewidth=2, markersize=4, label='Linear (90%/step)')
ax.plot(steps_arr, y90_conv, 's--', color='#51cf66', linewidth=2, markersize=4, label='Convergent (90%/step)')
ax.plot(steps_arr, y80_lin, 'o-', color='#ff922b', linewidth=2, markersize=4, label='Linear (80%/step)')
ax.plot(steps_arr, y80_conv, 's--', color='#22b8cf', linewidth=2, markersize=4, label='Convergent (80%/step)')
ax.set_xlabel('Total Steps', color='#cccccc', fontsize=9)
ax.set_ylabel('Overall Yield (%)', color='#cccccc', fontsize=9)
ax.set_title('Linear vs Convergent Synthesis', color='white', fontsize=11, fontweight='bold')
ax.legend(fontsize=7, facecolor='#1a1a1a', edgecolor='#333333', labelcolor='#cccccc')
ax.set_ylim(0, 105)
ax.grid(True, alpha=0.15)

# --- Panel 2: Disconnection Scoring (Bar Chart) ---
ax = axes[1]
target = TARGETS[0]
disconnections = find_disconnections(target)
ranked = score_disconnections(disconnections, target)

if ranked:
    names = [d['reaction'][:18] for d, _, _ in ranked[:5]]
    totals = [t for _, t, _ in ranked[:5]]
    colors = ['#51cf66', '#22b8cf', '#cc5de8', '#ff922b', '#ff6b6b'][:len(names)]
    y_pos = np.arange(len(names))
    bars = ax.barh(y_pos, totals, color=colors, height=0.6, edgecolor='none')
    ax.set_yticks(y_pos)
    ax.set_yticklabels(names, color='#cccccc', fontsize=8)
    ax.set_xlabel('Score (0-100)', color='#cccccc', fontsize=9)
    ax.set_title('Disconnection Ranking\n(beta-Hydroxy Ketone)', color='white', fontsize=11, fontweight='bold')
    ax.set_xlim(0, 105)
    for bar, val in zip(bars, totals):
        ax.text(bar.get_width() + 1, bar.get_y() + bar.get_height()/2,
                f'{val:.0f}', va='center', color='#aaaaaa', fontsize=8)
    ax.grid(True, axis='x', alpha=0.15)

# --- Panel 3: Criteria Breakdown (Grouped Bar) ---
ax = axes[2]
if ranked and len(ranked) >= 2:
    top3 = ranked[:min(3, len(ranked))]
    n_rxns = len(top3)
    criteria = ['Quality', 'Reliability', 'Convergence', 'Consonance', 'Availability']
    x = np.arange(len(criteria))
    width = 0.25
    rxn_colors = ['#51cf66', '#22b8cf', '#cc5de8']

    for i, (d, _, sub) in enumerate(top3):
        offset = (i - (n_rxns - 1) / 2) * width
        label = d['reaction'][:15]
        ax.bar(x + offset, sub, width, color=rxn_colors[i], label=label, alpha=0.85)

    ax.set_xticks(x)
    ax.set_xticklabels(criteria, color='#cccccc', fontsize=7, rotation=30, ha='right')
    ax.set_ylabel('Score (0-10)', color='#cccccc', fontsize=9)
    ax.set_title('Criteria Breakdown\n(Top Disconnections)', color='white', fontsize=11, fontweight='bold')
    ax.legend(fontsize=7, facecolor='#1a1a1a', edgecolor='#333333', labelcolor='#cccccc', loc='upper right')
    ax.set_ylim(0, 11)
    ax.grid(True, axis='y', alpha=0.15)

plt.tight_layout()
plt.savefig('output.png', dpi=150, bbox_inches='tight', facecolor='#0a0a0a')
plt.close()

print(f"\n{'='*65}")
print("  Analysis complete. Plots saved.")
print("  Left: Linear vs convergent yield comparison")
print("  Center: Disconnection ranking for beta-hydroxy ketone")
print("  Right: Criteria breakdown for top disconnections")
print("=" * 65)
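
The yield comparison above rests on one arithmetic point worth making explicit: in a convergent route the branches run in parallel, so the overall yield is conventionally reported along the longest linear sequence (LLS) rather than by multiplying every step in the route. The sketch below walks a synthesis tree in the forward direction to find the LLS and its yield. The `Step` class and the `chain`, `total_steps` and `longest_linear_sequence` helpers are hypothetical illustrations, not part of the analyzer above, and every step is assumed to have a known fractional yield.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Step:
    """A forward reaction with a fractional yield. `precursors` are the
    intermediates it consumes; an empty list means purchasable inputs."""
    name: str
    step_yield: float
    precursors: list[Step] = field(default_factory=list)


def chain(prefix: str, n: int, y: float) -> Step:
    """Build a purely linear sequence of n >= 1 steps, each with yield y."""
    if n < 1:
        raise ValueError("a chain needs at least one step")
    node = Step(f"{prefix}1", y)
    for k in range(2, n + 1):
        node = Step(f"{prefix}{k}", y, [node])
    return node


def total_steps(step: Step) -> int:
    """Count every reaction in the tree, including all branches."""
    return 1 + sum(total_steps(p) for p in step.precursors)


def longest_linear_sequence(step: Step) -> tuple[int, float]:
    """Return (steps, overall yield) along the longest branch (the LLS).

    Parallel branches are not multiplied together; the yield is the product
    of step yields along the single longest path to a purchasable input.
    """
    if not step.precursors:
        return 1, step.step_yield
    # Longest branch first; on a tie, take the lower-yielding branch
    length, branch_yield = max(
        (longest_linear_sequence(p) for p in step.precursors),
        key=lambda t: (t[0], -t[1]),
    )
    return length + 1, branch_yield * step.step_yield


if __name__ == "__main__":
    linear = chain("L", 8, 0.90)
    convergent = Step("couple", 0.90, [chain("A", 4, 0.90), chain("B", 4, 0.90)])
    for label, route in [("linear", linear), ("convergent", convergent)]:
        n, y = longest_linear_sequence(route)
        print(f"{label}: {total_steps(route)} steps, LLS {n}, yield {y:.1%}")
```

With 90% per step, the 8-step linear route keeps about 43% ($0.9^8$), while the convergent route, despite 9 reactions in total, keeps about 59% ($0.9^5$) because its LLS is only 5 steps long. When two branches have the same length, the sketch follows the lower-yielding one so that the reported figure is conservative.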

