Conditional Probability
Bayes' theorem, independence, and the law of total probability
Historical Context
The Reverend Thomas Bayes (1701–1761) was an English Presbyterian minister and mathematician who formulated the first version of what we now call Bayes' theorem. His work, published posthumously in 1763 as “An Essay towards solving a Problem in the Doctrine of Chances,” addressed the inverse probability problem: given observed outcomes, what can we infer about the underlying probability? Pierre-Simon Laplace independently discovered and significantly generalized Bayes' result in 1774, applying it to problems in astronomy, population statistics, and jurisprudence. Laplace's formulation is closer to the modern statement of Bayes' theorem.
The concept of conditional probability was formalized within Kolmogorov's axiomatization in 1933. Kolmogorov defined conditional probability as a ratio of probabilities when the conditioning event has positive probability, and later extended the concept to conditioning on events of probability zero using the theory of conditional expectations and the Radon-Nikodym theorem. This extension is essential for continuous random variables and modern probability theory.
Today, Bayes' theorem is the cornerstone of Bayesian statistics, machine learning, medical diagnostics, spam filtering, and countless other applications. The “Bayesian revolution” in statistics, enabled by computational advances in MCMC methods since the 1990s, has made Bayesian inference one of the dominant paradigms in data science.
2.1 Definition of Conditional Probability
If we learn that event $B$ has occurred, how should we update the probability of event $A$? The answer is given by the conditional probability.
Definition: Conditional Probability
For events $A, B$ with $P(B) > 0$, the conditional probability of $A$ given $B$ is:
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$
Intuitively, given that $B$ has occurred, the new sample space is restricted to $B$. We rescale by $1/P(B)$ to maintain normalization. One can verify that for fixed $B$ with $P(B) > 0$, the function $A \mapsto P(A \mid B)$ is itself a probability measure on $(\Omega, \mathcal{F})$.
Derivation 1: Conditional Probability is a Probability Measure
We verify Kolmogorov's three axioms for $Q(A) := P(A \mid B)$:
(1) Non-negativity: $Q(A) = P(A \cap B)/P(B) \geq 0$ since the numerator is non-negative and the denominator is positive.
(2) Normalization: $Q(\Omega) = P(\Omega \cap B)/P(B) = P(B)/P(B) = 1$.
(3) Countable additivity: If $A_1, A_2, \ldots$ are pairwise disjoint:
$$Q\left(\bigcup_{n=1}^{\infty} A_n\right) = \frac{P\left(\left(\bigcup_{n} A_n\right) \cap B\right)}{P(B)} = \frac{P\left(\bigcup_{n} (A_n \cap B)\right)}{P(B)} = \sum_{n=1}^{\infty} \frac{P(A_n \cap B)}{P(B)} = \sum_{n=1}^{\infty} Q(A_n),$$
where we used the fact that the $A_n \cap B$ are also pairwise disjoint. Therefore $P(\cdot \mid B)$ satisfies all axioms and is a valid probability measure.
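As a quick numerical check (a minimal sketch using a fair die; the specific events are illustrative, not from the text), the three axioms can be verified with exact rational arithmetic:

```python
from fractions import Fraction

# Sample space: one roll of a fair die.
omega = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event under the uniform measure on omega."""
    return Fraction(len(event & omega), len(omega))

def cond_prob(a, b):
    """P(A | B) = P(A ∩ B) / P(B), requiring P(B) > 0."""
    return prob(a & b) / prob(b)

B = {2, 4, 6}                         # conditioning event: "roll is even"
partition = [{1, 2}, {3, 4}, {5, 6}]  # a partition of omega

# Axiom checks for Q(A) = P(A | B):
assert cond_prob(omega, B) == 1                        # normalization
assert all(cond_prob(A, B) >= 0 for A in partition)    # non-negativity
assert sum(cond_prob(A, B) for A in partition) == 1    # additivity over a partition
```

Using `Fraction` avoids floating-point round-off, so the axioms hold exactly rather than approximately.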
The Multiplication Rule
Rearranging the definition gives the multiplication rule:
$$P(A \cap B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A).$$
This extends to multiple events by the chain rule:
$$P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1 \cap A_2) \cdots P(A_n \mid A_1 \cap \cdots \cap A_{n-1}).$$
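The chain rule can be illustrated with a small worked computation (a hypothetical example, not from the text): the probability of drawing three aces in a row from a standard 52-card deck without replacement.

```python
from fractions import Fraction

# Chain rule: P(A1 ∩ A2 ∩ A3) = P(A1) P(A2 | A1) P(A3 | A1 ∩ A2).
# After each ace is drawn, both the aces remaining and the deck shrink by one.
p = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50)
assert p == Fraction(1, 5525)
```

Each factor is a conditional probability given the previous draws, which is exactly what the chain rule prescribes.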
2.2 The Law of Total Probability
The law of total probability allows us to compute the probability of an event by conditioning on a partition of the sample space.
Theorem: Law of Total Probability
If $B_1, B_2, \ldots, B_n$ form a partition of $\Omega$ (i.e., they are pairwise disjoint and $\bigcup_{i=1}^n B_i = \Omega$) with $P(B_i) > 0$ for all $i$, then for any event $A$:
$$P(A) = \sum_{i=1}^{n} P(A \mid B_i)\,P(B_i).$$
Derivation 2: Proof of the Law of Total Probability
Since $\{B_i\}$ is a partition:
$$A = A \cap \Omega = A \cap \left(\bigcup_{i} B_i\right) = \bigcup_{i} (A \cap B_i).$$
The sets $A \cap B_i$ are pairwise disjoint (since the $B_i$ are), so by countable additivity:
$$P(A) = \sum_{i} P(A \cap B_i) = \sum_{i} P(A \mid B_i)\,P(B_i).$$
The last step uses the multiplication rule $P(A \cap B_i) = P(A \mid B_i) P(B_i)$.
Example: Medical Testing
A disease affects 1% of the population. A test has sensitivity 95% (true positive rate) and specificity 98% (true negative rate). What is the probability that a randomly selected person tests positive?
Let $D$ = has disease, $T$ = tests positive. The partition is $\{D, D^c\}$.
$$P(T) = P(T \mid D)\,P(D) + P(T \mid D^c)\,P(D^c) = 0.95 \times 0.01 + 0.02 \times 0.99 = 0.0095 + 0.0198 = 0.0293.$$
About 2.93% of people test positive. This is crucial for the Bayesian calculation that follows.
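The total-probability computation above can be reproduced in a few lines (variable names are our own):

```python
p_d = 0.01    # prevalence P(D)
sens = 0.95   # sensitivity P(T | D)
spec = 0.98   # specificity P(T^c | D^c); false-positive rate is 1 - spec

# Law of total probability over the partition {D, D^c}:
p_pos = sens * p_d + (1 - spec) * (1 - p_d)
assert abs(p_pos - 0.0293) < 1e-9
```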
2.3 Bayes' Theorem
Derivation 3: Full Derivation of Bayes' Theorem
Step 1: Start from the definition of conditional probability:
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}.$$
Step 2: Apply the multiplication rule to the numerator:
$$P(A \cap B) = P(A \mid B)\,P(B).$$
Step 3: Apply the law of total probability to the denominator:
$$P(A) = P(A \mid B)\,P(B) + P(A \mid B^c)\,P(B^c).$$
Step 4: Combine to obtain Bayes' theorem:
$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A \mid B)\,P(B) + P(A \mid B^c)\,P(B^c)}.$$
Bayes' Theorem (Simple Form)
For two events with $P(A) > 0$ and $P(B) > 0$:
$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)}.$$
Here $P(B)$ is the prior, $P(A \mid B)$ is the likelihood, $P(B \mid A)$ is the posterior, and $P(A)$ is the evidence (marginal likelihood).
Example: Medical Test (Continued)
Given a positive test result, what is the probability the person actually has the disease?
$$P(D \mid T) = \frac{P(T \mid D)\,P(D)}{P(T)} = \frac{0.95 \times 0.01}{0.0293} = \frac{0.0095}{0.0293} \approx 0.324.$$
Despite a 95% sensitive test, only about 32.4% of positive results are true positives. This counter-intuitive result—the base rate fallacy—arises because the disease is rare. The 2% false positive rate applied to 99% of the healthy population generates more false positives than the 95% detection rate applied to the 1% diseased population.
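Continuing the snippet from the previous example, the posterior follows directly from Bayes' theorem:

```python
p_d, sens, fpr = 0.01, 0.95, 0.02   # prevalence, sensitivity, false-positive rate

# Evidence P(T) by the law of total probability, then Bayes' theorem:
p_pos = sens * p_d + fpr * (1 - p_d)
posterior = sens * p_d / p_pos
assert 0.32 < posterior < 0.33       # about 32.4%, matching the text
```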
2.4 Independence
Definition: Independence
Events $A$ and $B$ are independent if:
$$P(A \cap B) = P(A)\,P(B).$$
Equivalently, $P(A \mid B) = P(A)$ when $P(B) > 0$. Knowing $B$ has occurred does not change the probability of $A$.
Mutual vs. Pairwise Independence
Events $A_1, \ldots, A_n$ are mutually independent if for every subset $S \subseteq \{1, \ldots, n\}$ with $|S| \geq 2$:
$$P\left(\bigcap_{i \in S} A_i\right) = \prod_{i \in S} P(A_i).$$
This requires $2^n - n - 1$ conditions to hold, not just the $\binom{n}{2}$ pairwise conditions. Pairwise independence does not imply mutual independence.
Counterexample: Pairwise but not Mutually Independent
Toss two fair coins. Let $A$ = first coin heads, $B$ = second coin heads, $C$ = both coins show the same face. Then:
$P(A) = P(B) = P(C) = 1/2$
$P(A \cap B) = 1/4 = P(A)P(B)$ (independent)
$P(A \cap C) = 1/4 = P(A)P(C)$ (independent)
$P(B \cap C) = 1/4 = P(B)P(C)$ (independent)
But $P(A \cap B \cap C) = 1/4 \neq 1/8 = P(A)P(B)P(C)$. All three pairwise conditions hold but the triple condition fails.
Derivation 4: Independence Implies Complement Independence
If $A$ and $B$ are independent, then so are $A$ and $B^c$:
$$P(A \cap B^c) = P(A) - P(A \cap B) = P(A) - P(A)\,P(B) = P(A)\,(1 - P(B)) = P(A)\,P(B^c).$$
By the same argument, $A^c$ and $B$ are independent, and $A^c$ and $B^c$ are independent.
2.5 Bayesian Updating
Bayesian updating is the process of sequentially applying Bayes' theorem as new data arrives. After observing data $D_1$, the posterior becomes the new prior for updating with $D_2$, and so on.
Derivation 5: Sequential Bayesian Updating
Let $H$ be a hypothesis and $D_1, D_2, \ldots, D_n$ be sequentially observed data points. After observing $D_1$:
$$P(H \mid D_1) = \frac{P(D_1 \mid H)\,P(H)}{P(D_1)}.$$
After observing $D_2$, using $P(H \mid D_1)$ as the new prior:
$$P(H \mid D_1, D_2) = \frac{P(D_2 \mid H, D_1)\,P(H \mid D_1)}{P(D_2 \mid D_1)}.$$
If the data are conditionally independent given $H$ (i.e., $P(D_2 \mid H, D_1) = P(D_2 \mid H)$), then after $n$ observations:
$$P(H \mid D_1, \ldots, D_n) \propto P(H) \prod_{i=1}^{n} P(D_i \mid H).$$
The posterior is proportional to the prior times the product of likelihoods. The normalizing constant is determined by the requirement that probabilities sum (or integrate) to one over all hypotheses.
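The update loop can be sketched in a few lines (the hypothesis names and likelihood values are illustrative, not from the text): multiply each hypothesis's prior by its likelihood, then renormalize.

```python
def update(prior, likelihoods):
    """One Bayes step over a discrete set of hypotheses.
    prior: dict hypothesis -> probability; likelihoods: dict hypothesis -> P(D | H)."""
    unnorm = {h: prior[h] * likelihoods[h] for h in prior}
    z = sum(unnorm.values())                  # normalizing constant P(D)
    return {h: p / z for h, p in unnorm.items()}

# Illustrative run: two hypotheses about a coin's heads probability,
# updated flip by flip; each posterior becomes the next prior.
belief = {"fair": 0.5, "biased": 0.5}
for flip in ["H", "H", "T", "H"]:
    lik = {"fair": 0.5, "biased": 0.7 if flip == "H" else 0.3}
    belief = update(belief, lik)

# Three heads out of four flips should favor the biased hypothesis.
assert belief["biased"] > 0.5
```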
The Odds Form of Bayes' Theorem
An elegant alternative formulation uses odds. The posterior odds equal the prior odds times the likelihood ratio (Bayes factor):
$$\frac{P(H_1 \mid D)}{P(H_0 \mid D)} = \frac{P(H_1)}{P(H_0)} \times \frac{P(D \mid H_1)}{P(D \mid H_0)}.$$
This form is particularly useful because the normalizing constant $P(D)$ cancels out, and it clearly separates the contribution of the data (likelihood ratio) from our prior beliefs.
Example: Updating Coin Bias
Suppose we suspect a coin may be biased. We consider two hypotheses: $H_0$: the coin is fair ($p = 0.5$) with prior probability 0.8, and $H_1$: the coin is biased ($p = 0.7$) with prior probability 0.2. We flip the coin 10 times and observe 8 heads.
The likelihood ratio is (the binomial coefficients cancel):
$$\frac{P(D \mid H_1)}{P(D \mid H_0)} = \frac{\binom{10}{8}(0.7)^8(0.3)^2}{\binom{10}{8}(0.5)^{10}} = \frac{0.00519}{0.000977} \approx 5.31.$$
The posterior odds are:
$$\frac{P(H_1 \mid D)}{P(H_0 \mid D)} = \frac{0.2}{0.8} \times 5.31 \approx 1.33.$$
Converting back: $P(H_1 \mid D) = 1.33 / (1 + 1.33) \approx 0.570$. The data have shifted our belief from 20% to about 57% that the coin is biased.
2.6 Applications
Application 1: Spam Filtering (Naive Bayes)
The Naive Bayes classifier uses Bayes' theorem with a conditional independence assumption to classify emails. Given words $w_1, \ldots, w_n$ in an email:
$$P(\text{spam} \mid w_1, \ldots, w_n) \propto P(\text{spam}) \prod_{i=1}^{n} P(w_i \mid \text{spam}).$$
Despite the strong (and often violated) independence assumption, Naive Bayes classifiers perform remarkably well in practice, a phenomenon studied by Domingos and Pazzani (1997).
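A toy version of the classifier can be sketched as follows (the prior and word probabilities are made-up illustrative numbers, not a trained model); working in log space avoids numerical underflow when many words are multiplied:

```python
from math import exp, log

p_spam = 0.3   # hypothetical prior P(spam)
# Hypothetical per-class word likelihoods P(w | class):
p_word_given_spam = {"free": 0.30, "winner": 0.20, "meeting": 0.01}
p_word_given_ham = {"free": 0.02, "winner": 0.01, "meeting": 0.20}

def spam_posterior(words):
    """Posterior P(spam | words) under the naive independence assumption."""
    log_spam = log(p_spam) + sum(log(p_word_given_spam[w]) for w in words)
    log_ham = log(1 - p_spam) + sum(log(p_word_given_ham[w]) for w in words)
    odds = exp(log_spam - log_ham)           # posterior odds spam : ham
    return odds / (1 + odds)

assert spam_posterior(["free", "winner"]) > 0.9   # spammy words dominate
assert spam_posterior(["meeting"]) < 0.1          # ham-like word dominates
```

A real implementation would estimate the word probabilities from labeled data and smooth them to avoid zero likelihoods.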
Application 2: The Monty Hall Problem
In this famous problem, a prize is behind one of three doors. You pick door 1. The host, who knows where the prize is, opens door 3 (which has no prize). Should you switch to door 2?
Let $C_i$ = prize behind door $i$, $H_3$ = host opens door 3. By Bayes' theorem:
$$P(C_2 \mid H_3) = \frac{P(H_3 \mid C_2)\,P(C_2)}{P(H_3)} = \frac{1 \times (1/3)}{1/2} = \frac{2}{3}.$$
Switching doubles your chance of winning from $1/3$ to $2/3$. The key is that the host's action provides information: $P(H_3 \mid C_2) = 1$ because the host must open door 3 if the prize is behind door 2.
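The full posterior over all three doors can be computed exactly with the law of total probability and Bayes' theorem:

```python
from fractions import Fraction

third = Fraction(1, 3)                             # uniform prior P(C_i)
# Likelihoods P(H3 | C_i), given that you picked door 1:
lik = {1: Fraction(1, 2), 2: Fraction(1), 3: Fraction(0)}

p_h3 = sum(lik[c] * third for c in lik)            # law of total probability
posterior = {c: lik[c] * third / p_h3 for c in lik}

assert p_h3 == Fraction(1, 2)
assert posterior[1] == Fraction(1, 3)
assert posterior[2] == Fraction(2, 3)              # switching wins 2/3 of the time
```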
Application 3: Prosecutor's Fallacy
In forensic science, confusing $P(\text{evidence} \mid \text{innocent})$ with $P(\text{innocent} \mid \text{evidence})$ is called the prosecutor's fallacy. If a DNA match occurs with probability $10^{-6}$ for an innocent person, this does not mean there is a one-in-a-million chance the defendant is innocent. By Bayes' theorem, the posterior depends critically on the prior probability of guilt, which depends on other evidence.
Application 4: Machine Learning and Classification
In Bayesian classification, we assign a new observation $\mathbf{x}$ to the class $c^*$ that maximizes the posterior:
$$c^* = \arg\max_{c} P(C = c \mid \mathbf{x}) = \arg\max_{c} P(\mathbf{x} \mid C = c)\,P(C = c).$$
This framework encompasses Naive Bayes, linear discriminant analysis, quadratic discriminant analysis, and Gaussian process classifiers. The choice of likelihood model $P(\mathbf{x} \mid C = c)$ determines the classifier's decision boundaries.
2.7 Python Simulation
This simulation demonstrates Bayes' theorem through the medical testing example, the Monty Hall problem, and sequential Bayesian updating.
Conditional Probability: Bayes Theorem and Bayesian Updating
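A sketch of such a simulation is below (our own code, written to match the examples in this chapter): Monte Carlo estimates of the medical-test posterior and the Monty Hall switching strategy, to be compared against the analytic answers 0.324 and 2/3.

```python
import random

random.seed(42)
N = 100_000

# --- Medical test: estimate P(D | T+) by simulation ---
true_pos = false_pos = 0
for _ in range(N):
    diseased = random.random() < 0.01        # 1% prevalence
    if diseased:
        if random.random() < 0.95:           # sensitivity
            true_pos += 1
    elif random.random() < 0.02:             # false-positive rate (1 - specificity)
        false_pos += 1
p_d_given_pos = true_pos / (true_pos + false_pos)

# --- Monty Hall: estimate the win rate when always switching ---
switch_wins = 0
for _ in range(N):
    car = random.randrange(3)
    pick = random.randrange(3)
    # Host opens a door that is neither the pick nor the car:
    opened = random.choice([d for d in range(3) if d != pick and d != car])
    switched = next(d for d in range(3) if d != pick and d != opened)
    switch_wins += (switched == car)
p_switch = switch_wins / N

print(f"P(D | T+)       ~ {p_d_given_pos:.3f}  (theory: 0.324)")
print(f"P(win | switch) ~ {p_switch:.3f}  (theory: 0.667)")
```

Sequential Bayesian updating can be layered on top by re-running the medical-test posterior after each simulated test result, with the previous posterior as the new prior.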
2.8 Summary and Key Takeaways
Conditional Probability
$P(A \mid B) = P(A \cap B) / P(B)$ defines a new probability measure on $(\Omega, \mathcal{F})$. It represents our updated belief about $A$ after learning $B$.
Bayes' Theorem
The posterior is proportional to the likelihood times the prior. This is the foundation of all Bayesian reasoning and statistical inference.
Independence
Events are independent when $P(A \cap B) = P(A)P(B)$. Mutual independence is stronger than pairwise independence and requires $2^n - n - 1$ conditions.
Bayesian Updating
Sequential application of Bayes' theorem allows beliefs to be continuously updated as new data arrives, with the posterior becoming the prior for the next update.
Common Pitfalls
The base rate fallacy and prosecutor's fallacy arise from confusing $P(A \mid B)$ with $P(B \mid A)$. Always apply Bayes' theorem explicitly.
Practice Problems
Problem 1: A medical test for a disease has sensitivity 95% (true positive rate) and specificity 98% (true negative rate). If the disease prevalence is 0.1%, what is the probability a person who tests positive actually has the disease?
Solution:
1. Let $D$ = disease, $T^+$ = positive test. Given: $P(T^+ \mid D) = 0.95$, $P(T^- \mid D^c) = 0.98$, $P(D) = 0.001$.
2. Apply Bayes' theorem: $P(D \mid T^+) = \frac{P(T^+ \mid D)P(D)}{P(T^+)}$.
3. Total probability of positive test: $P(T^+) = P(T^+ \mid D)P(D) + P(T^+ \mid D^c)P(D^c)$.
4. $P(T^+) = 0.95 \times 0.001 + 0.02 \times 0.999 = 0.00095 + 0.01998 = 0.02093$.
5. $P(D \mid T^+) = \frac{0.00095}{0.02093} = 0.0454$ or about 4.5%.
6. Despite the excellent test (95%/98%), only 4.5% of positives are true positives! This is the base rate fallacy: the low prevalence means false positives vastly outnumber true positives. This is why confirmatory testing is essential.
Problem 2: In the Monty Hall problem, you choose door 1, the host opens door 3 (showing a goat). Should you switch to door 2? Prove the answer using Bayes' theorem.
Solution:
1. Prior probabilities: $P(C_i) = 1/3$ for each door $i = 1, 2, 3$ (car behind door $i$).
2. Let $H_3$ = host opens door 3. Likelihoods: $P(H_3 \mid C_1) = 1/2$ (host chooses randomly between doors 2 and 3); $P(H_3 \mid C_2) = 1$ (host must open door 3); $P(H_3 \mid C_3) = 0$ (host never reveals car).
3. $P(H_3) = P(H_3 \mid C_1)(1/3) + P(H_3 \mid C_2)(1/3) + P(H_3 \mid C_3)(1/3) = 1/6 + 1/3 + 0 = 1/2$.
4. Posterior: $P(C_1 \mid H_3) = \frac{(1/2)(1/3)}{1/2} = 1/3$.
5. $P(C_2 \mid H_3) = \frac{(1)(1/3)}{1/2} = 2/3$.
6. Switching to door 2 gives probability $2/3$ of winning vs. $1/3$ for staying. You should always switch. The key insight: the host's action provides information that concentrates the probability of the other two doors onto the remaining one.
Problem 3: In a room of $n$ people, what is the probability that at least two share a birthday? Find the smallest $n$ for which this exceeds 50%.
Solution:
1. It is easier to compute the complement: $P(\text{all different}) = \frac{365}{365} \cdot \frac{364}{365} \cdot \frac{363}{365} \cdots \frac{365-n+1}{365}$.
2. $P(\text{all different}) = \prod_{k=0}^{n-1}\left(1 - \frac{k}{365}\right)$.
3. $P(\text{at least one match}) = 1 - P(\text{all different})$.
4. Using the approximation $\ln(1-x) \approx -x$: $\ln P(\text{all diff}) \approx -\sum_{k=0}^{n-1}\frac{k}{365} = -\frac{n(n-1)}{730}$.
5. Setting $P(\text{match}) = 0.5$: $\frac{n(n-1)}{730} = \ln 2$, so $n(n-1) \approx 730 \times 0.693 \approx 505.9$, giving $n \approx 23$.
6. The exact answer is $n = 23$: $P(\text{match}) = 50.7\%$. This counterintuitive result occurs because there are $\binom{23}{2} = 253$ pairs to check, and even small pairwise collision probabilities accumulate rapidly.
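The exact computation confirms the approximation, and the threshold at $n = 23$:

```python
def p_shared_birthday(n):
    """P(at least two of n people share a birthday), 365 equally likely days."""
    p_all_diff = 1.0
    for k in range(n):                       # multiply the complement's factors
        p_all_diff *= (365 - k) / 365
    return 1 - p_all_diff

assert p_shared_birthday(22) < 0.5 < p_shared_birthday(23)   # threshold is n = 23
assert abs(p_shared_birthday(23) - 0.507) < 0.001            # about 50.7%
```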
Problem 4: Events $A$ and $B$ satisfy $P(A) = 0.4$, $P(B) = 0.5$, $P(A \cap B) = 0.2$. Are $A$ and $B$ independent? Find $P(A \mid B)$ and $P(A \cup B)$.
Solution:
1. Test independence: $P(A)P(B) = 0.4 \times 0.5 = 0.2 = P(A \cap B)$.
2. Since $P(A \cap B) = P(A)P(B)$, $A$ and $B$ are independent.
3. $P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{0.2}{0.5} = 0.4 = P(A)$. Consistent with independence: knowing $B$ occurred doesn't change the probability of $A$.
4. $P(A \cup B) = P(A) + P(B) - P(A \cap B) = 0.4 + 0.5 - 0.2 = 0.7$.
5. Also verify: $P(B \mid A) = P(A \cap B)/P(A) = 0.2/0.4 = 0.5 = P(B)$. Symmetric consistency check.
6. Note: independence ($P(A \cap B) = P(A)P(B)$) is different from mutual exclusivity ($P(A \cap B) = 0$). Independent events with positive probabilities are never mutually exclusive, and vice versa.
Problem 5: A factory has three machines producing 30%, 45%, and 25% of output with defect rates 2%, 3%, and 5% respectively. A randomly chosen item is defective. What is the probability it came from machine 3? (Law of total probability + Bayes' theorem)
Solution:
1. Let $M_i$ = item from machine $i$, $D$ = defective. Given: $P(M_1) = 0.30$, $P(M_2) = 0.45$, $P(M_3) = 0.25$.
2. Defect rates: $P(D \mid M_1) = 0.02$, $P(D \mid M_2) = 0.03$, $P(D \mid M_3) = 0.05$.
3. Total probability: $P(D) = \sum_i P(D \mid M_i)P(M_i) = 0.02(0.30) + 0.03(0.45) + 0.05(0.25)$.
4. $P(D) = 0.006 + 0.0135 + 0.0125 = 0.032$ (3.2% overall defect rate).
5. By Bayes' theorem: $P(M_3 \mid D) = \frac{P(D \mid M_3)P(M_3)}{P(D)} = \frac{0.05 \times 0.25}{0.032} = \frac{0.0125}{0.032} = 0.391$.
6. Machine 3 produces only 25% of items but is responsible for 39.1% of defects due to its higher defect rate. Similarly, $P(M_1 \mid D) = 0.188$ and $P(M_2 \mid D) = 0.422$. Verify: $0.188 + 0.422 + 0.391 = 1.001 \approx 1$.
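The whole solution, including the consistency check that the posteriors sum to 1, fits in a few lines:

```python
priors = [0.30, 0.45, 0.25]          # P(M_i): each machine's share of output
defect_rates = [0.02, 0.03, 0.05]    # P(D | M_i)

# Law of total probability, then Bayes' theorem for each machine:
p_d = sum(p * r for p, r in zip(priors, defect_rates))
posteriors = [p * r / p_d for p, r in zip(priors, defect_rates)]

assert abs(p_d - 0.032) < 1e-12              # 3.2% overall defect rate
assert abs(posteriors[2] - 0.391) < 0.001    # machine 3's share of defects
assert abs(sum(posteriors) - 1) < 1e-12      # posteriors form a distribution
```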