Part I: Structure & Stereochemistry | Chapter 2

IUPAC Nomenclature

Systematic naming rules for organic compounds — alkanes, alkenes, alkynes, aromatics, and functional groups with priority rules and worked examples

1. Introduction — Why Systematic Naming Matters

Before the International Union of Pure and Applied Chemistry (IUPAC) introduced its systematic naming conventions, organic chemistry was awash in a sea of conflicting trivial names. A single compound might bear half a dozen names depending on who discovered it, where it was found, or what property first caught a chemist's attention. Acetic acid, for instance, derives from the Latin acetum (vinegar), while its systematic name — ethanoic acid — immediately reveals its two-carbon backbone and carboxylic acid functional group.

The IUPAC system, first proposed in 1892 at an international conference in Geneva and refined through subsequent editions (most recently the 2013 Recommendations), provides a unique, unambiguous name for every organic compound based on its molecular structure. This universality is indispensable: a chemist in Tokyo, a pharmacologist in Berlin, and a patent attorney in New York can all identify the same molecule from its IUPAC name without recourse to structural drawings.

The naming system rests on a simple principle: decompose the molecule into a parent chain (the longest continuous carbon chain), identify substituents (branches), specify functional groups, and assign locants (numerical positions) to describe where each feature is attached. The goal is to reconstruct the full connectivity of the molecule from the name alone.

Historical Note

The 1892 Geneva Congress brought together 34 chemists from nine countries. Their work built upon earlier proposals by August Wilhelm von Hofmann (1866) and a French commission (1889). The suffix-based system for functional groups — -ol for alcohols, -al for aldehydes, -one for ketones — has remained essentially unchanged for over a century, a testament to the elegance of the original design.

2. Alkane Nomenclature — The Foundation

Alkanes ($\text{C}_n\text{H}_{2n+2}$) are saturated hydrocarbons containing only single bonds. Their nomenclature forms the backbone of the entire IUPAC system. Every organic name, regardless of complexity, begins with identifying the parent alkane chain.

2.1 Root Names (Prefixes for Chain Length)

The first four alkane names are historical: methane, ethane, propane, butane. From five carbons onward, the names are built from numerical prefixes, mostly Greek (non- comes from Latin):

Carbons	Prefix
1	meth-
2	eth-
3	prop-
4	but-
5	pent-
6	hex-
7	hept-
8	oct-
9	non-
10	dec-

2.2 The Four-Step Naming Algorithm

Find the longest continuous chain. This chain determines the parent name. If two chains of equal length exist, choose the one with more substituents.
Number the chain. Begin numbering from the end that gives the lowest set of locants to the substituents. Compare the two candidate locant sets term by term; the set with the lower number at the first point of difference wins (the "first point of difference" rule).
Name each substituent. Alkyl groups are named by dropping the -ane suffix and adding -yl: methyl, ethyl, propyl, etc. Complex substituents are named as substituted alkyl groups in parentheses.
Assemble the name. List substituents in alphabetical order (ignoring multiplicative prefixes such as di-, tri-, tetra-), use those prefixes for repeated substituents, separate locants from each other with commas and from words with hyphens, and finish with the parent name ending in -ane.
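The numbering and assembly steps above can be sketched in a few lines of Python. This is a minimal illustration, not a general naming engine: it assumes the parent chain has already been chosen and that every substituent is a simple alkyl group given by name. The helpers `choose_direction` and `name_alkane` are hypothetical, and a tie between the two numberings is simply resolved toward the left end rather than by the alphabetical tie-breaking rule.

```python
from collections import defaultdict

# Root names from Section 2.1 (1-10 carbons).
ROOTS = {1: "meth", 2: "eth", 3: "prop", 4: "but", 5: "pent",
         6: "hex", 7: "hept", 8: "oct", 9: "non", 10: "dec"}
MULTIPLIERS = {1: "", 2: "di", 3: "tri", 4: "tetra"}


def choose_direction(chain_length, positions):
    """Return (locants, flipped) for the numbering that wins at the first
    point of difference; positions are counted from the left end."""
    from_left = sorted(positions)
    from_right = sorted(chain_length + 1 - p for p in positions)
    # Python compares lists element by element, which is exactly the
    # first-point-of-difference comparison for equal-length locant sets.
    if from_left <= from_right:
        return from_left, False
    return from_right, True


def name_alkane(chain_length, substituents):
    """substituents: list of (position_from_left, alkyl_name) pairs."""
    _, flipped = choose_direction(chain_length, [p for p, _ in substituents])
    locants_by_group = defaultdict(list)
    for position, alkyl in substituents:
        locant = chain_length + 1 - position if flipped else position
        locants_by_group[alkyl].append(locant)
    parts = []
    # Alphabetize on the alkyl name itself, so di-, tri- are ignored.
    for alkyl in sorted(locants_by_group):
        locants = sorted(locants_by_group[alkyl])
        parts.append(",".join(map(str, locants)) + "-"
                     + MULTIPLIERS[len(locants)] + alkyl)
    return "-".join(parts) + ROOTS[chain_length] + "ane"


print(name_alkane(5, [(2, "methyl"), (3, "methyl")]))  # 2,3-dimethylpentane
print(name_alkane(6, [(5, "methyl"), (4, "ethyl")]))   # 3-ethyl-2-methylhexane
```

The second call places the substituents as seen from the "wrong" end; `choose_direction` flips the numbering because {2, 3} beats {4, 5} at the first term.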

2.3 Worked Example: 2,3-Dimethylpentane

Consider the structure:

$$\text{CH}_3\text{CH(CH}_3\text{)CH(CH}_3\text{)CH}_2\text{CH}_3$$

Step 1: The longest chain has 5 carbons → pentane.
Step 2: Numbering from the left gives substituents at positions 2 and 3; numbering from the right gives 3 and 4. At the first point of difference, 2 is lower than 3, so the chain is numbered from the left.
Step 3: Two methyl groups at positions 2 and 3.
Step 4: Name = 2,3-dimethylpentane.

2.4 Common vs. IUPAC Names

Common Name	IUPAC Name	Structure
Isobutane	2-Methylpropane	$\text{(CH}_3\text{)}_3\text{CH}$
Neopentane	2,2-Dimethylpropane	$\text{C(CH}_3\text{)}_4$
Isopentane	2-Methylbutane	$\text{(CH}_3\text{)}_2\text{CHCH}_2\text{CH}_3$
Isohexane	2-Methylpentane	$\text{(CH}_3\text{)}_2\text{CH(CH}_2\text{)}_2\text{CH}_3$

While common names persist in everyday usage (especially for simple, well-known molecules), systematic IUPAC names are expected in scientific publications and patent filings, where ambiguity is costly. Because the name encodes the full connectivity, any chemist worldwide can reconstruct the exact molecular structure from it.

3. Alkene and Alkyne Nomenclature

3.1 Alkenes ($\text{C}_n\text{H}_{2n}$)

Alkenes contain at least one carbon-carbon double bond ($\text{C=C}$). The naming procedure follows the alkane rules with modifications:

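Identify the longest chain containing the double bond. Under the traditional rules taught in most textbooks, this chain determines the parent name even if a longer saturated chain exists elsewhere in the molecule. (The 2013 Recommendations reverse this preference: chain length is considered first and unsaturation second.)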
Replace -ane with -ene. The locant of the double bond is the lower-numbered carbon of the pair: but-1-ene, but-2-ene.
Number to give the double bond the lowest locant. In cases of conflict between a substituent and the double bond, the double bond takes priority.
Specify E/Z geometry if applicable. When each carbon of the double bond bears two different groups, geometric isomerism arises. Use the Cahn-Ingold-Prelog (CIP) priority rules to assign E (higher-priority groups on opposite sides) or Z (same side).

$$\text{E/Z assignment:} \quad \text{Z (zusammen = together)} \quad \text{E (entgegen = opposite)}$$
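The E/Z decision can be sketched in Python. This is a simplified illustration, not a full CIP implementation: it assumes each group is described only by its first two spheres of atoms (the atom bonded to the alkene carbon, plus that atom's other neighbours), and it does not duplicate atoms for multiple bonds or handle isotopes. All names (`cip_key`, `assign_ez`, the group constants) are hypothetical.

```python
# Atomic numbers used below
H, C, CL, BR = 1, 6, 17, 35

# A group is (Z of the atom bonded to the alkene carbon,
#             [Z of that atom's other neighbours]).
HYDROGEN = (H, [])
METHYL = (C, [H, H, H])
ETHYL = (C, [C, H, H])
CHLORO = (CL, [])
BROMO = (BR, [])


def cip_key(group):
    """Comparison key covering only the first two CIP spheres."""
    first_z, neighbour_zs = group
    # Sphere 1: the attached atom; sphere 2: its neighbours, highest first.
    return (first_z, sorted(neighbour_zs, reverse=True))


def upper_group_wins(up_group, down_group):
    """True if the group drawn 'up' outranks the one drawn 'down'."""
    k_up, k_down = cip_key(up_group), cip_key(down_group)
    if k_up == k_down:
        raise ValueError("tie after two spheres: no E/Z isomer, "
                         "or deeper exploration needed")
    return k_up > k_down


def assign_ez(carbon_1, carbon_2):
    """Each argument is (up_group, down_group) for one alkene carbon."""
    same_side = upper_group_wins(*carbon_1) == upper_group_wins(*carbon_2)
    return "Z" if same_side else "E"


print(assign_ez((METHYL, HYDROGEN), (METHYL, HYDROGEN)))  # Z: (Z)-but-2-ene
print(assign_ez((METHYL, HYDROGEN), (ETHYL, METHYL)))     # Z: (Z)-3-methylpent-2-ene
print(assign_ez((CHLORO, HYDROGEN), (METHYL, BROMO)))     # E: (E)-2-bromo-1-chloroprop-1-ene
```

The second example needs the second sphere: ethyl and methyl both attach through carbon, but ethyl's neighbour set (C, H, H) outranks methyl's (H, H, H).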

3.2 Alkynes ($\text{C}_n\text{H}_{2n-2}$)

Alkynes contain a carbon-carbon triple bond ($\text{C} \equiv \text{C}$). The rules parallel those for alkenes:

Replace -ane with -yne: ethyne, propyne, but-1-yne, but-2-yne.
The parent chain must contain the triple bond.
Number to give the triple bond the lowest locant.
Terminal alkynes ($\text{R-C} \equiv \text{CH}$) have the triple bond at position 1.

3.3 Enynes — Compounds with Both Double and Triple Bonds

When both double and triple bonds are present, the suffix becomes -en-...-yne. The chain is numbered to give the lowest set of locants to the multiple bonds collectively. If there is a tie, the double bond receives the lower number (a long-standing IUPAC rule retained in the 2013 Recommendations).

$$\text{CH}_2\text{=CHCH}_2\text{C} \equiv \text{CH} \quad \longrightarrow \quad \text{pent-1-en-4-yne}$$
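The two numbering rules for enynes can be sketched as a small Python function. It is a minimal illustration, assuming an unbranched chain with exactly one double and one triple bond, each given by the lower-numbered carbon counted from the left; `number_multiple_bonds` and `enyne_name` are hypothetical helpers, and names with several double bonds (which need forms like "hexa-1,3-dien-5-yne") are out of scope.

```python
ROOTS = {3: "prop", 4: "but", 5: "pent", 6: "hex", 7: "hept", 8: "oct"}


def number_multiple_bonds(chain_length, enes, ynes):
    """Return (ene_locants, yne_locants) for the winning numbering."""
    def flip(locants):
        # A bond C(i)-C(i+1) counted from the left has lower locant
        # chain_length - i when counted from the right.
        return sorted(chain_length - i for i in locants)

    left = (sorted(enes), sorted(ynes))
    right = (flip(enes), flip(ynes))
    # Rule 1: lowest locant set for all multiple bonds taken together.
    all_left = sorted(left[0] + left[1])
    all_right = sorted(right[0] + right[1])
    if all_left != all_right:
        return left if all_left < all_right else right
    # Rule 2 (tie): the double bonds get the lower locants.
    return left if left[0] <= right[0] else right


def enyne_name(chain_length, ene, yne):
    (e,), (y,) = number_multiple_bonds(chain_length, [ene], [yne])
    return f"{ROOTS[chain_length]}-{e}-en-{y}-yne"


print(enyne_name(5, 1, 4))  # CH2=CHCH2C≡CH: tie {1,4}, ene wins
print(enyne_name(7, 5, 1))  # HC≡CCH2CH2CH=CHCH3: {1,5} beats {2,6}
```

The second call shows that rule 1 comes first: the triple bond ends up with locant 1 because the combined set {1, 5} is lower than {2, 6}, giving hept-5-en-1-yne.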

3.4 Degree of Unsaturation

The degree of unsaturation (or index of hydrogen deficiency, IHD) is the total number of rings plus $\pi$ bonds in a molecule. For a molecule $\text{C}_c\text{H}_h\text{N}_n\text{O}_o\text{X}_x$ (where X = halogen):

$$\text{IHD} = \frac{2c + 2 - h - x + n}{2}$$

Each double bond contributes 1 degree, each triple bond 2 degrees, and each ring 1 degree. Oxygen does not affect the count, because a divalent oxygen can be inserted into a C-C or C-H bond without changing the number of hydrogens. Each halogen takes the place of one hydrogen, hence the $-x$ term, and each trivalent nitrogen brings one extra hydrogen, hence the $+n$ term.

Quick Check: Benzene $\text{C}_6\text{H}_6$

IHD = $\frac{2(6) + 2 - 6}{2} = \frac{8}{2} = 4$. Benzene has 3 double bonds + 1 ring = 4 degrees of unsaturation. This matches perfectly.
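The same formula handles nitrogen and halogens. Pyridine and chlorobenzene, like benzene, each contain one aromatic ring (three double bonds plus one ring), and the formula gives four degrees for both:

$$\text{Pyridine, C}_5\text{H}_5\text{N:} \quad \text{IHD} = \frac{2(5) + 2 - 5 - 0 + 1}{2} = \frac{8}{2} = 4$$

$$\text{Chlorobenzene, C}_6\text{H}_5\text{Cl:} \quad \text{IHD} = \frac{2(6) + 2 - 5 - 1 + 0}{2} = \frac{8}{2} = 4$$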

4. Functional Group Nomenclature and Priority Rules

Functional groups are the reactive sites of organic molecules. When multiple functional groups are present, a priority hierarchy determines which group is named as the principal characteristic group (suffix) and which groups are named as prefixes.

4.1 The Functional Group Priority Table

The following table lists common functional groups in decreasing order of priority. The highest-ranking group present in a molecule is named as the suffix (principal characteristic group); all lower-ranking groups are named as prefixes:

Priority	Functional Group	Suffix	Prefix
1	Carboxylic acid ($\text{-COOH}$)	-oic acid	carboxy-
2	Ester ($\text{-COOR}$)	-oate	alkoxycarbonyl-
3	Amide ($\text{-CONH}_2$)	-amide	carbamoyl-
4	Aldehyde ($\text{-CHO}$)	-al	formyl- / oxo-
5	Ketone ($\text{C=O}$)	-one	oxo-
6	Alcohol ($\text{-OH}$)	-ol	hydroxy-
7	Amine ($\text{-NH}_2$)	-amine	amino-
8	Alkene / Alkyne	-ene / -yne	—

4.2 Naming Polyfunctional Molecules

When a molecule contains multiple functional groups, the naming strategy is:

Identify the highest-priority group — it becomes the suffix that determines the parent chain name.
Choose the parent chain so that it contains the highest-priority group; among such chains, prefer the one bearing the most other functional groups, then the one with the most carbon atoms.
Number the chain to give the principal characteristic group (suffix group) the lowest locant.
Express remaining groups as prefixes in alphabetical order, each with its locant.

Worked Example: 4-Amino-3-hydroxypentanoic acid

Consider a five-carbon chain with $\text{-COOH}$ at C-1, $\text{-OH}$ at C-3, and $\text{-NH}_2$ at C-4.

Highest priority: carboxylic acid → suffix = -oic acid → pentanoic acid
$\text{-OH}$ at C-3 → prefix: 3-hydroxy
$\text{-NH}_2$ at C-4 → prefix: 4-amino
Alphabetical order: amino before hydroxy
Final name: 4-amino-3-hydroxypentanoic acid
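The strategy above can be sketched in Python for the simplest case. This is a minimal illustration, assuming an unbranched, saturated chain that is already numbered so the principal group has the lowest locant, with each group occurring once; the `GROUPS` table covers only a subset of the priority table, and all names here are hypothetical.

```python
# (rank, suffix, prefix) for a subset of the priority table in Section 4.1.
# Bromo is prefix-only (Section 4.3), so it has no suffix and a low rank.
GROUPS = {
    "COOH": (1, "oic acid", "carboxy"),
    "CHO": (4, "al", "formyl"),
    "C=O": (5, "one", "oxo"),
    "OH": (6, "ol", "hydroxy"),
    "NH2": (7, "amine", "amino"),
    "Br": (99, None, "bromo"),
}
ROOTS = {1: "meth", 2: "eth", 3: "prop", 4: "but", 5: "pent", 6: "hex"}
CHAIN_END_GROUPS = {"COOH", "CHO"}  # always C-1, so no locant is written


def name_polyfunctional(n_carbons, groups):
    """groups: list of (locant, group_key) pairs."""
    # Principal characteristic group = lowest rank number present.
    p_loc, p_key = min(groups, key=lambda g: GROUPS[g[1]][0])
    _, suffix, _ = GROUPS[p_key]
    if suffix is None:
        raise ValueError("no group that can be cited as a suffix")
    if sum(1 for _, key in groups if key == p_key) > 1:
        raise ValueError("repeated principal groups (e.g. diols) not handled")
    root = ROOTS[n_carbons] + "an"  # final 'e' of -ane dropped before a vowel
    if p_key in CHAIN_END_GROUPS:
        parent = root + suffix
    else:
        parent = f"{root}-{p_loc}-{suffix}"
    # Every other group becomes a prefix, cited in alphabetical order.
    prefixes = sorted((GROUPS[key][2], loc) for loc, key in groups if key != p_key)
    return "-".join(f"{loc}-{name}" for name, loc in prefixes) + parent


print(name_polyfunctional(5, [(1, "COOH"), (3, "OH"), (4, "NH2")]))
print(name_polyfunctional(4, [(1, "CHO"), (3, "Br")]))
```

The first call reproduces the worked example, 4-amino-3-hydroxypentanoic acid; the second gives 3-bromobutanal, the halogen example in Section 4.3.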

4.3 Halogens and Nitro Groups as Prefixes

Halogens and the nitro group are always named as prefixes — they never serve as the principal characteristic group:

Fluoro-, chloro-, bromo-, iodo- for halogens
Nitro- for the $\text{-NO}_2$ group

Example: $\text{CH}_3\text{CHBrCH}_2\text{CHO}$ = 3-bromobutanal (the aldehyde is the suffix; bromo is the prefix at position 3).

5. Aromatic Compound Nomenclature

Aromatic compounds present a unique naming challenge because benzene-derived names have deep historical roots. IUPAC retains many traditional names — toluene, phenol, aniline, benzaldehyde — alongside systematic alternatives.

5.1 Monosubstituted Benzenes

Simple monosubstituted benzenes are named as derivatives of benzene: chlorobenzene, nitrobenzene, ethylbenzene. Several retain historical names:

Toluene = methylbenzene
Phenol = hydroxybenzene
Aniline = aminobenzene
Anisole = methoxybenzene
Styrene = vinylbenzene (ethenylbenzene)

5.2 Disubstituted Benzenes

For disubstituted benzenes, three positional isomers exist. They can be designated by locants (1,2- / 1,3- / 1,4-) or by the classical prefixes:

$$\text{ortho (o-)} = 1,2\text{-} \quad \text{meta (m-)} = 1,3\text{-} \quad \text{para (p-)} = 1,4\text{-}$$

IUPAC recommends numerical locants for unambiguous naming, but o-, m-, p- remain widely used in speech and informal writing.
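The correspondence between locants and the classical prefixes depends only on how far apart two substituents sit around the ring. The sketch below (the name `ring_relation` is hypothetical) takes the shorter way round a six-membered ring, so that, for example, 1,5- is recognized as another way of writing meta.

```python
def ring_relation(a, b):
    """Classify two locants on a benzene ring as ortho, meta, or para."""
    steps = abs(a - b) % 6
    distance = min(steps, 6 - steps)  # shorter way round the ring
    if distance == 0:
        raise ValueError("both locants refer to the same ring carbon")
    return {1: "ortho", 2: "meta", 3: "para"}[distance]


print(ring_relation(1, 4))  # para
print(ring_relation(2, 6))  # meta: 2 and 6 are two bonds apart via C-1
```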

5.3 Polysubstituted Benzenes

When three or more substituents are present, numerical locants are essential. The ring is numbered to give the lowest set of locants. If one substituent defines a retained name (e.g., toluene, phenol), it is assigned position 1.

Example: 2,4,6-trinitrotoluene (TNT) — the methyl group (toluene) is at C-1, and three nitro groups are at positions 2, 4, and 6.
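Ring numbering can be sketched as a brute-force search: try every substituted atom as C-1 in both directions and keep the lowest locant set. This is an illustration under stated assumptions: ring atoms are given hypothetical indices 0 to 5 in drawing order, `fixed` marks the atom that a retained name forces to position 1, and the alphabetical tie-breaking rule for equal locant sets is not implemented.

```python
def number_ring(subs, fixed=None, ring_size=6):
    """subs maps a ring-atom index (0..ring_size-1, in drawing order)
    to a substituent name. Returns {atom_index: locant}."""
    starts = [fixed] if fixed is not None else list(subs)
    candidates = []
    for start in starts:
        for direction in (1, -1):  # clockwise and counterclockwise
            candidates.append({i: ((i - start) * direction) % ring_size + 1
                               for i in subs})
    # Lowest locant set, compared at the first point of difference.
    return min(candidates, key=lambda locs: sorted(locs.values()))


def locants_of(subs, numbering, name):
    return sorted(numbering[i] for i, s in subs.items() if s == name)


# TNT: the methyl-bearing atom (index 0) is C-1 because of the name toluene.
tnt = {0: "methyl", 1: "nitro", 3: "nitro", 5: "nitro"}
print(locants_of(tnt, number_ring(tnt, fixed=0), "nitro"))  # [2, 4, 6]
```

Without `fixed`, the same search numbers a plain substituted benzene or cycloalkane; for methyl groups on atoms 0, 1, and 3 it returns the set {1, 2, 4}, as in 1,2,4-trimethylbenzene.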

5.4 Benzene as a Substituent: Phenyl vs. Benzyl

When the benzene ring is a substituent rather than the parent, two common group names arise:

Phenyl ($\text{C}_6\text{H}_5\text{-}$, abbreviated Ph): the ring directly attached to the parent chain. Example: 2-phenylhexane.
Benzyl ($\text{C}_6\text{H}_5\text{CH}_2\text{-}$, abbreviated Bn): a phenyl group with a $\text{-CH}_2\text{-}$ linker. Example: benzyl chloride.

6. Cycloalkane and Bicyclic Nomenclature

Cyclic saturated hydrocarbons are named by adding the prefix cyclo- to the name of the corresponding alkane: cyclopropane, cyclobutane, cyclopentane, cyclohexane. The general formula of a monocyclic alkane is $\text{C}_n\text{H}_{2n}$, the same as for alkenes (both have one degree of unsaturation).

6.1 Substituted Cycloalkanes

When a cycloalkane bears substituents, the ring is the parent if it has more carbons than any chain substituent. Otherwise, the chain is the parent and the ring is a cycloalkyl substituent (e.g., cyclopentyl).

Number the ring starting with a substituted carbon, and choose the numbering that gives the lowest locant set. When a single substituent is present, it is understood to be at position 1 (no locant needed).

6.2 Bicyclic Systems

Bicyclic alkanes contain two fused or bridged rings. The naming system uses the format:

$$\text{bicyclo}[a.b.c]\text{alkane}$$

where $a \geq b \geq c$ are the numbers of carbons in each bridge connecting the two bridgehead carbons, listed in decreasing order. The total carbon count equals $a + b + c + 2$ (the +2 accounts for the two bridgehead carbons).

Example: bicyclo[2.2.1]heptane (norbornane) has bridges of 2, 2, and 1 carbons, with $2 + 2 + 1 + 2 = 7$ total carbons.
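The bracket notation and carbon count follow mechanically from the three bridge sizes. A minimal sketch (the function name `bicyclo_name` is hypothetical) that sorts the bridges into decreasing order and adds the two bridgeheads:

```python
ALKANES = {4: "butane", 5: "pentane", 6: "hexane", 7: "heptane",
           8: "octane", 9: "nonane", 10: "decane"}


def bicyclo_name(bridges):
    """bridges: carbon counts of the three bridges, in any order."""
    a, b, c = sorted(bridges, reverse=True)  # a >= b >= c
    total = a + b + c + 2  # +2 for the two bridgehead carbons
    return f"bicyclo[{a}.{b}.{c}]{ALKANES[total]}"


print(bicyclo_name([1, 2, 2]))  # bicyclo[2.2.1]heptane (norbornane)
print(bicyclo_name([0, 4, 4]))  # bicyclo[4.4.0]decane (decalin)
```

A zero-carbon bridge, as in the second call, is how fused bicyclic rings are written in this notation.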

7. Advanced Naming Topics

7.1 Stereodescriptors in Nomenclature

Complete IUPAC names for stereoisomers include stereodescriptors as prefixes:

R/S for chiral centers (Cahn-Ingold-Prelog system)
E/Z for double bond geometry
cis/trans for the relative configuration of ring substituents (a common shorthand in simple cases; unlike R/S, it does not specify absolute configuration)

Example: (2R,3S)-2-bromo-3-methylpentane unambiguously specifies the configuration at both stereocenters. The stereodescriptors are enclosed in parentheses and placed before the name.

7.2 Substitutive vs. Replacement Nomenclature

The standard IUPAC system is substitutive nomenclature, where the parent hydride (alkane) is modified by substituent prefixes and functional group suffixes. An alternative, replacement ("a") nomenclature, instead replaces carbon atoms of the parent hydride with heteroatoms, indicated by the prefixes below. Small heterocycles are more often named by the separate Hantzsch-Widman system, which combines similar heteroatom prefixes with ring-size stems (e.g., oxolane, oxirane).

$$\text{oxa- (O)}, \quad \text{aza- (N)}, \quad \text{thia- (S)}, \quad \text{phospha- (P)}$$

Example: oxacyclopentane is a replacement name for tetrahydrofuran (THF): oxygen replaces one $\text{CH}_2$ of cyclopentane, and because the heteroatom takes position 1, no locant is needed. Its Hantzsch-Widman name is oxolane.

7.3 Naming Complex Substituents

When a substituent itself is branched, it is named as a substituted alkyl group and enclosed in parentheses. The substituent is numbered starting from the carbon attached to the parent chain:

Example: 5-(1,2-dimethylpropyl)nonane. The substituent at C-5 of nonane is a propyl chain, numbered from the carbon bonded to C-5, that carries methyl groups at its positions 1 and 2.

Multiplicative prefixes for identical complex substituents use bis-, tris-, tetrakis- (instead of di-, tri-, tetra-) to avoid ambiguity.

7.4 Naming Ethers, Epoxides, and Thiols

Ethers ($\text{R-O-R'}$) are named substitutively by treating the smaller group plus its oxygen as an alkoxy- prefix on the larger parent chain: $\text{CH}_3\text{OCH}_2\text{CH}_2\text{CH}_3$ is 1-methoxypropane (common name: methyl propyl ether). Epoxides are named as epoxyalkanes or as oxiranes. Thiols ($\text{-SH}$) use the suffix -thiol: ethanethiol (common name: ethyl mercaptan).

8. Derivation: From Molecular Formula to Name

A key skill in nomenclature is deducing the IUPAC name from the molecular formula plus structural information. Let us work through a systematic procedure:

Step-by-Step for $\text{C}_7\text{H}_{14}\text{O}_2$

Step 1: Degree of unsaturation.

$$\text{IHD} = \frac{2(7) + 2 - 14}{2} = \frac{2}{2} = 1$$

One degree of unsaturation. This could be one double bond or one ring. Oxygen does not affect the calculation.

Step 2: Identify functional groups from the formula.

Two oxygen atoms with IHD = 1. Possible functional groups include a carboxylic acid ($\text{-COOH}$, one C=O), an ester ($\text{-COOR}$, one C=O), or combinations such as a hydroxy ketone, or two hydroxyl groups plus one C=C or ring. The formula alone cannot distinguish these; a carboxylic acid or ester is a natural first hypothesis to test.

Step 3: Suppose it is heptanoic acid.

Heptanoic acid = $\text{CH}_3\text{(CH}_2\text{)}_5\text{COOH}$ = $\text{C}_7\text{H}_{14}\text{O}_2$. The formula matches perfectly. If spectroscopic data confirm a straight chain with a terminal carboxylic acid, the name is simply heptanoic acid.

Step 4: Alternative isomers.

The same formula also fits methylhexanoic acid isomers (2-methylhexanoic acid, 3-methylhexanoic acid, etc.) and esters such as methyl hexanoate, ethyl pentanoate, and propyl butanoate. Many different compounds share this molecular formula, each with its own name, which is why structural data (NMR, IR, MS) are needed before a name can be assigned.
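Even the unbranched candidates alone are numerous. The sketch below (the function name is hypothetical) lists only the straight-chain carboxylic acid and the straight-chain esters with formula $\text{C}_n\text{H}_{2n}\text{O}_2$; it deliberately omits branched acids such as 2-methylhexanoic acid, branched esters, and the other functional-group combinations from Step 2.

```python
ROOTS = {1: "meth", 2: "eth", 3: "prop", 4: "but", 5: "pent",
         6: "hex", 7: "hept", 8: "oct", 9: "non", 10: "dec"}


def straight_chain_acids_and_esters(total_c):
    """Unbranched acid and ester names for the formula C(n)H(2n)O2."""
    names = [f"{ROOTS[total_c]}anoic acid"]
    for acid_c in range(1, total_c):  # carbons on the C=O side, incl. C=O
        alkyl_c = total_c - acid_c     # carbons on the -O- side
        names.append(f"{ROOTS[alkyl_c]}yl {ROOTS[acid_c]}anoate")
    return names


for name in straight_chain_acids_and_esters(7):
    print(name)
```

For seven carbons this prints heptanoic acid followed by six esters, from hexyl methanoate to methyl hexanoate, all sharing the formula $\text{C}_7\text{H}_{14}\text{O}_2$.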

9. Real-World Applications of Nomenclature

9.1 Pharmaceutical Naming

Drug molecules often have three names: a systematic IUPAC name (which can be extremely long for complex molecules), a generic name (International Nonproprietary Name, INN), and a brand name. For example, ibuprofen's IUPAC name is (RS)-2-[4-(2-methylpropyl)phenyl]propanoic acid. Clinicians do not use this name in practice, but it precisely specifies the molecular structure and is essential for patent claims, regulatory filings, and chemical databases.

9.2 Chemical Databases and Informatics

Modern chemical databases (CAS Registry, PubChem, ChemSpider) rely on systematic naming together with line-notation systems such as SMILES and InChI. Name-to-structure and structure-to-name software can convert a systematic name into a connection table and back, enabling computer-based structure searching. The CAS Registry Number system assigns a unique numerical identifier to each registered substance, and each registry entry also records a systematic name (the CA Index Name, which follows CAS's own conventions).

9.3 Environmental and Safety Regulations

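Regulatory frameworks such as those of the U.S. EPA, the EU's REACH regulation, and the UN's Globally Harmonized System (GHS) require clear chemical identifiers, such as systematic names and CAS numbers, on Safety Data Sheets (SDS). Correct nomenclature ensures that emergency responders and workers can identify hazardous substances unambiguously; an incorrect name on an SDS could have serious safety consequences.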

9.4 Materials Science and Polymers

Polymers are named using source-based or structure-based nomenclature. Source-based names use the prefix poly + monomer name: poly(ethylene), poly(vinyl chloride). Structure-based names describe the repeating unit: poly(methylene) for polyethylene. IUPAC's Polymer Division maintains specialized rules for this vast class of materials.

10. Python Simulation — Nomenclature Analysis Tools

The following Python script illustrates four computational aspects of nomenclature: calculating the index of hydrogen deficiency, relating chain length to boiling point for straight-chain alkanes, tabulating how quickly the number of constitutional isomers grows with carbon count, and listing candidate functional groups for a molecular formula.

```python
#!/usr/bin/env python3
"""
nomenclature_tools.py
1) Index of Hydrogen Deficiency (IHD) calculator
2) Boiling point analysis for straight-chain alkanes
3) Constitutional isomer counts for alkanes
4) Functional group possibilities from a molecular formula
Uses numpy only (no scipy).
"""
import numpy as np

# ================================================================
# PART 1: Index of Hydrogen Deficiency (IHD) Calculator
# ================================================================
print("=" * 65)
print("PART 1: Index of Hydrogen Deficiency (IHD) Calculator")
print("=" * 65)
print()

def calc_ihd(c, h, n=0, o=0, x=0):
    """Calculate IHD from molecular formula CcHhNnOoXx.
    Oxygen (o) is accepted for convenience but does not enter the formula."""
    return (2 * c + 2 - h - x + n) / 2.0

# Test cases: (name, C, H, N, O, X)
test_molecules = [
    ("Methane (CH4)",          1,  4, 0, 0, 0),
    ("Ethylene (C2H4)",        2,  4, 0, 0, 0),
    ("Acetylene (C2H2)",       2,  2, 0, 0, 0),
    ("Benzene (C6H6)",         6,  6, 0, 0, 0),
    ("Cyclohexane (C6H12)",    6, 12, 0, 0, 0),
    ("Naphthalene (C10H8)",   10,  8, 0, 0, 0),
    ("Acetic acid (C2H4O2)",   2,  4, 0, 2, 0),
    ("Aniline (C6H7N)",        6,  7, 1, 0, 0),
    ("Chloroform (CHCl3)",     1,  1, 0, 0, 3),
    ("Aspirin (C9H8O4)",       9,  8, 0, 4, 0),
]

print(f"{'Molecule':<28s} {'Formula':<14s} {'IHD':>5s}  Interpretation")
print("-" * 75)

for name, c, h, n, o, x in test_molecules:
    ihd = calc_ihd(c, h, n, o, x)
    parts = f"C{c}H{h}"
    if n > 0: parts += f"N{n}"
    if o > 0: parts += f"O{o}"
    if x > 0: parts += f"X{x}"

    # Interpret the IHD value
    if ihd == 0:
        interp = "Saturated, no rings"
    elif ihd == 1:
        interp = "1 double bond or 1 ring"
    elif ihd == 4 and c == 6:
        interp = "Benzene ring (3 C=C + 1 ring)"
    elif ihd == 7 and c == 10:
        interp = "Fused bicyclic aromatic"
    else:
        interp = f"{int(ihd)} degrees of unsaturation"

    # Print inside the loop so every molecule is reported
    print(f"{name:<28s} {parts:<14s} {ihd:>5.1f}  {interp}")

print()

# ================================================================
# PART 2: Straight-chain Alkane Properties
# ================================================================
print("=" * 65)
print("PART 2: Straight-Chain Alkane Boiling Point Analysis")
print("=" * 65)
print()

# Known boiling points (deg C) for straight-chain alkanes C1-C12
n_carbons = np.arange(1, 13)
bp_data = np.array([
    -161.5,  # methane
    -88.6,   # ethane
    -42.1,   # propane
    -0.5,    # butane
    36.1,    # pentane
    68.7,    # hexane
    98.4,    # heptane
    125.7,   # octane
    150.8,   # nonane
    174.1,   # decane
    195.9,   # undecane
    216.3,   # dodecane
])

alkane_names = [
    "Methane", "Ethane", "Propane", "Butane", "Pentane", "Hexane",
    "Heptane", "Octane", "Nonane", "Decane", "Undecane", "Dodecane"
]

# Molecular formulas
formulas = [f"C{n}H{2*n+2}" for n in n_carbons]
mol_weights = 12.011 * n_carbons + 1.008 * (2 * n_carbons + 2)

print(f"{'n':>3s}  {'Name':<12s} {'Formula':<8s} {'MW':>8s} {'BP (C)':>8s}")
print("-" * 48)
for i, n in enumerate(n_carbons):
    print(f"{n:>3d}  {alkane_names[i]:<12s} {formulas[i]:<8s} {mol_weights[i]:>8.2f} {bp_data[i]:>8.1f}")

print()

# Fit boiling point against chain length.
# A quadratic in n fits C1-C12 closely, but its negative curvature makes it
# turn downward beyond roughly C15, so it cannot be extrapolated. Here we
# assume the simple empirical form BP = a*sqrt(n) + b, which keeps rising.
sqrt_n = np.sqrt(n_carbons)
coeffs = np.polyfit(sqrt_n, bp_data, 1)
bp_fit = np.polyval(coeffs, sqrt_n)
residuals = bp_data - bp_fit
rmse = np.sqrt(np.mean(residuals**2))

print(f"Empirical fit: BP = {coeffs[0]:.3f}*sqrt(n) {coeffs[1]:+.3f}")
print(f"RMSE = {rmse:.2f} deg C")
print()

# Extrapolate to C13-C20; treat these as rough estimates
n_pred = np.arange(13, 21)
bp_pred = np.polyval(coeffs, np.sqrt(n_pred))
pred_names = [
    "Tridecane", "Tetradecane", "Pentadecane", "Hexadecane",
    "Heptadecane", "Octadecane", "Nonadecane", "Icosane"
]

print("Extrapolated boiling points for C13-C20 (rough estimates):")
print(f"{'n':>3s}  {'Name':<14s} {'Predicted BP':>14s}")
print("-" * 36)
for i, n in enumerate(n_pred):
    print(f"{n:>3d}  {pred_names[i]:<14s} {bp_pred[i]:>10.1f} deg C")

print()

# ================================================================
# PART 3: Constitutional Isomer Count vs Carbon Number
# ================================================================
print("=" * 65)
print("PART 3: Constitutional Isomers of Alkanes")
print("=" * 65)
print()

# Known number of constitutional isomers for alkanes CnH(2n+2)
# These are exact counts from graph theory enumeration
n_vals = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20])
isomer_counts = np.array([
    1, 1, 1, 2, 3, 5, 9, 18, 35, 75, 355, 4347, 366319
])

print(f"{'Carbons':>8s}  {'Formula':<10s} {'Isomers':>12s}  {'log10(N)':>10s}")
print("-" * 48)
for i in range(len(n_vals)):
    n = n_vals[i]
    formula = f"C{n}H{2*n+2}"
    count = isomer_counts[i]
    log_count = np.log10(count) if count > 0 else 0
    print(f"{n:>8d}  {formula:<10s} {count:>12d}  {log_count:>10.3f}")

print()
print("The number of constitutional isomers grows approximately")
print("exponentially with carbon number, making systematic naming")
print("essential for distinguishing structures unambiguously.")
print()

# Fit exponential: log10(N) = a*n + b (skip the trivial counts of 1)
mask = isomer_counts > 1
log_isomers = np.log10(isomer_counts[mask].astype(float))
n_fit = n_vals[mask]
exp_coeffs = np.polyfit(n_fit, log_isomers, 1)
print(f"Exponential fit: log10(N) ~ {exp_coeffs[0]:.4f}*n + {exp_coeffs[1]:.4f}")
print(f"This implies N ~ 10^({exp_coeffs[0]:.4f}*n {exp_coeffs[1]:+.4f})")
print(f"Growth factor per carbon: ~{10**exp_coeffs[0]:.2f}x")
print()

# ================================================================
# PART 4: Functional Group Identification from Formula
# ================================================================
print("=" * 65)
print("PART 4: Functional Group Possibilities from Molecular Formula")
print("=" * 65)
print()

def analyze_formula(c, h, n_atoms=0, o=0, x=0):
    """Given a molecular formula, suggest possible functional groups."""
    ihd = calc_ihd(c, h, n_atoms, o, x)
    possibilities = []

    if ihd == 0 and o == 0 and x == 0 and n_atoms == 0:
        possibilities.append("Alkane (saturated hydrocarbon)")
    if ihd == 0 and o == 1:
        possibilities.append("Alcohol or ether")
    if ihd == 0 and o == 0 and n_atoms == 1:
        possibilities.append("Amine")
    if ihd == 1 and o == 0 and n_atoms == 0:
        possibilities.append("Alkene or cycloalkane")
    if ihd == 1 and o == 0 and n_atoms == 1:
        possibilities.append("Cyclic amine, imine, or unsaturated amine")
    if ihd == 1 and o == 1:
        possibilities.append("Aldehyde, ketone, enol, or cyclic ether")
    if ihd == 1 and o == 2:
        possibilities.append("Carboxylic acid or ester")
    if ihd == 2 and o == 0:
        possibilities.append("Alkyne, diene, or two rings/double bonds")
    if ihd >= 4 and c >= 6:
        possibilities.append(f"Possibly aromatic (IHD={ihd:.0f})")

    return ihd, possibilities

test_formulas = [
    ("C6H14",      6, 14, 0, 0, 0),
    ("C6H12",      6, 12, 0, 0, 0),
    ("C6H6",       6,  6, 0, 0, 0),
    ("C3H6O",      3,  6, 0, 1, 0),
    ("C2H4O2",     2,  4, 0, 2, 0),
    ("C4H9N",      4,  9, 1, 0, 0),
    ("C8H8",       8,  8, 0, 0, 0),
    ("C7H14O2",    7, 14, 0, 2, 0),
]

for label, c, h, n_at, o, x in test_formulas:
    ihd, poss = analyze_formula(c, h, n_at, o, x)
    print(f"{label:>10s}  IHD = {ihd:.1f}")
    for p in poss:
        print(f"{'':>12s}  -> {p}")
    print()
```
