Potential Outcomes, Association vs Causation – A Minimal Demo in Python

Abstract

This analysis builds intuition for one of the most fundamental ideas in causal inference: why association is not causation.

Using a simple synthetic example, we illustrate how naive comparisons between treated and untreated groups can be misleading due to selection bias, and how the potential outcomes framework clarifies what we can and cannot learn from data.

Beyond the mechanics, the focus is on understanding:

What causal effects actually mean
Why they are not directly observable
How assumptions like randomization enable valid estimation

The goal is to answer a practical question:

When we observe differences in outcomes, when can we interpret them as causal effects?

Introduction

In many real-world problems, we observe correlations and are tempted to interpret them as causal relationships. However, without careful reasoning, these conclusions are often incorrect.

The core challenge is that for any individual unit, we can never observe both:

the outcome with treatment, and
the outcome without treatment

This missing data problem is at the heart of causal inference.

The potential outcomes framework provides a clean way to reason about this:

It defines causal effects at the unit level
Separates what is observable from what is not
Makes explicit the assumptions required for valid inference

In practice, mistakes arise not from computing metrics, but from misinterpreting comparisons:

Confusing correlation with causation
Ignoring selection bias
Failing to recognize required assumptions

What this analysis demonstrates

We build a minimal synthetic example to illustrate the key ideas:

Potential outcomes: each unit has two outcomes –

Y0 → outcome if NOT treated (control)
Y1 → outcome if treated

Realized outcome as a switch:
If Tᵢ = 0 → Yᵢ = Y₀ᵢ (control outcome)
If Tᵢ = 1 → Yᵢ = Y₁ᵢ (treated outcome)

General form (switching equation): Yᵢ = (1 − Tᵢ) × Y₀ᵢ + Tᵢ × Y₁ᵢ

Selection bias: when treated and untreated units differ for reasons beyond treatment,
E[Y₀ | T = 1] ≠ E[Y₀ | T = 0],
so simple comparisons are biased.

Randomization / ignorability: when treatment is independent of potential outcomes,
(Y₀, Y₁) independent of T,
the difference in observed group means identifies the true causal effect:
E[Y₁ − Y₀] = E[Y | T = 1] − E[Y | T = 0].

SUTVA (Stable Unit Treatment Value Assumption):
- One unit’s treatment does not affect another’s outcome.
- There is only one “version” of treatment.

Assumptions are essential: causal inference always relies on assumptions to connect causal quantities to estimators.
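The selection-bias and ignorability statements above can be tied together by a standard identity: the naive difference in group means equals the average effect among the treated (ATT) plus the baseline gap E[Y₀ | T = 1] − E[Y₀ | T = 0]. The sketch below checks this numerically in a hypothetical toy world that is not part of the original analysis: the constant effect of 0.10, the seed, and all variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
n = 10_000

# Hypothetical toy world: baseline outcome y0 and a constant effect of 0.10.
y0 = rng.normal(0.50, 0.08, size=n)
y1 = y0 + 0.10

# Biased assignment: units with an above-median baseline get treated.
t = (y0 > np.median(y0)).astype(int)

# Observed outcome via the switching equation.
y = (1 - t) * y0 + t * y1

naive = y[t == 1].mean() - y[t == 0].mean()
att = (y1 - y0)[t == 1].mean()                            # effect among the treated
selection_bias = y0[t == 1].mean() - y0[t == 0].mean()    # baseline gap in Y0

print(f"naive difference: {naive:.4f}")
print(f"ATT:              {att:.4f}")
print(f"selection bias:   {selection_bias:.4f}")
```

In this toy world the naive difference is larger than the 0.10 effect purely because treated units started from a higher baseline; the identity naive = ATT + selection bias holds exactly in the sample.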

Reference

This analysis builds on standard concepts in causal inference, particularly the potential outcomes framework popularized in:

Matheus Facure — Causal Inference for the Brave and True

2. Simulating potential outcomes

We start by simulating potential outcomes for each unit.

Think of the two variables as:

Y0 → conversion probability if the customer does not receive an email
Y1 → conversion probability if the customer does receive an email

Only one of these outcomes is observed for each customer — depending on whether they were treated.

True Average Treatment Effect (ATE)

The true causal effect is the average difference between treated and untreated potential outcomes:

ATE = mean(Y1 − Y0)

In words:

The ATE tells us how much conversion would increase on average if all customers received the email.

#!pip install pandas

The value of true_ate computed in the simulation code below (0.1459 in this run) is the ground-truth causal effect in this simulated world.

In real data, we never observe both Y0 and Y1 for the same individual. One of them always remains unobserved (the counterfactual).
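To make the missing-data view concrete, here is a minimal sketch of what an analyst would actually see. The four-customer table and its numbers are made up for illustration and do not come from the simulation in this analysis.

```python
import numpy as np
import pandas as pd

# Hypothetical four-customer table; the values are invented for illustration.
toy = pd.DataFrame({
    "Y0": [0.40, 0.55, 0.50, 0.62],
    "Y1": [0.60, 0.58, 0.70, 0.65],
    "T":  [1, 0, 1, 0],
})

# What is observable: the potential outcome that was not realized is missing.
observed = toy.copy()
observed["Y0"] = toy["Y0"].where(toy["T"] == 0)  # NaN for treated units
observed["Y1"] = toy["Y1"].where(toy["T"] == 1)  # NaN for control units
observed["Y"] = np.where(toy["T"] == 1, toy["Y1"], toy["Y0"])

print(observed)
```

Every row has exactly one of Y0 and Y1 filled in, so the unit-level effect Y1 − Y0 cannot be computed for any single customer from observed data alone.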

3. Biased treatment assignment (selection bias)

Now let’s simulate a biased treatment assignment rule.

Suppose the marketing team tends to send the email to customers who are more likely to convert when they receive it. In our toy world, we let treatment depend on Y1:

Customers with Y1 above the median are treated; the rest are not.

This creates a dependence between treatment and potential outcomes, so
(Y0, Y1) are NOT independent of T.

import numpy as np
import pandas as pd

np.random.seed(42)

pd.set_option("display.precision", 4)

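The potential outcomes described in Section 2 are simulated as follows; the last expression returns the true ATE:

# number of units (e.g., customers)
N = 2000

# potential outcomes: Y0 = no treatment, Y1 = treatment
# here: think of them as probabilities of conversion
Y0 = np.random.normal(loc=0.50, scale=0.08, size=N)  # baseline
Y1 = np.random.normal(loc=0.65, scale=0.08, size=N)  # better with treatment

df = pd.DataFrame({"Y0": Y0, "Y1": Y1})

true_ate = (df["Y1"] - df["Y0"]).mean()
true_ate

np.float64(0.14587786814791084)

The realized outcome is then built by the line df["Y"] = (1 - df["T"]) * df["Y0"] + df["T"] * df["Y1"] (in the code of the next section), which implements exactly the switch function:

Yᵢ = (1 − Tᵢ) × Y₀ᵢ + Tᵢ × Y₁ᵢ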

4. Naive association vs true causal effect

Let’s compare:

The naive difference in observed outcomes between treated and untreated:
average(Y | T = 1) − average(Y | T = 0)
The true ATE that we know from the simulated potential outcomes:
average(Y1 − Y0)

# biased treatment: higher Y1 -> more likely to be treated
T = (df["Y1"] > np.quantile(df["Y1"], 0.5)).astype(int)
df["T"] = T

# realized outcome using the switch function
df["Y"] = (1 - df["T"]) * df["Y0"] + df["T"] * df["Y1"]

df.head()

	Y0	Y1	Y
0	0.5397	0.5960	0.5397
1	0.4889	0.6384	0.4889
2	0.5518	0.5866	0.5518
3	0.6218	0.6254	0.6218
4	0.4813	0.4985	0.4813

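Because treatment was not randomized, treated customers have different potential outcomes than controls. In this simulation, assignment depends on Y1, so average(Y1 | T = 1) != average(Y1 | T = 0): the treated group is selected for high Y1. Y0 is drawn independently of Y1 here, so average(Y0 | T = 1) and average(Y0 | T = 0) differ only by sampling noise; in general, selection can also operate through Y0, giving average(Y0 | T = 1) != average(Y0 | T = 0).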

So the naive comparison average(Y | T = 1) − average(Y | T = 0) is a biased estimator of the true causal effect.
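To see where the gap between the naive difference and the ATE comes from, one option is to split the naive difference into the effect among treated units (ATT) plus the baseline gap in Y0. The sketch below rebuilds the simulation with the same seed and the same sequence of draws; the names sim, att, and y0_gap are introduced here for illustration and are not part of the original code.

```python
import numpy as np
import pandas as pd

# Rebuild the simulated world (same seed and draw order as the analysis).
np.random.seed(42)
N = 2000
Y0 = np.random.normal(loc=0.50, scale=0.08, size=N)
Y1 = np.random.normal(loc=0.65, scale=0.08, size=N)
sim = pd.DataFrame({"Y0": Y0, "Y1": Y1})

# Biased assignment and realized outcome via the switch function.
sim["T"] = (sim["Y1"] > np.quantile(sim["Y1"], 0.5)).astype(int)
sim["Y"] = (1 - sim["T"]) * sim["Y0"] + sim["T"] * sim["Y1"]

treated = sim[sim["T"] == 1]
control = sim[sim["T"] == 0]

ate = (sim["Y1"] - sim["Y0"]).mean()
att = (treated["Y1"] - treated["Y0"]).mean()            # effect among the treated
y0_gap = treated["Y0"].mean() - control["Y0"].mean()    # baseline difference
naive = treated["Y"].mean() - control["Y"].mean()

print(f"ATE:    {ate:.4f}")
print(f"ATT:    {att:.4f}")
print(f"Y0 gap: {y0_gap:.4f}")
print(f"naive:  {naive:.4f}  (equals ATT + Y0 gap)")
```

In this particular setup Y0 is drawn independently of Y1 and assignment depends only on Y1, so the Y0 gap should be close to zero and most of the discrepancy should show up as the ATT exceeding the ATE.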

5. Randomized experiment (association ~= causation)

Now we simulate a proper randomized experiment.

We keep the same potential outcomes (Y0, Y1), but assign treatment at random:

Random treatment assignment: T_rand ~ Bernoulli(p = 0.5)

This makes treatment independent of potential outcomes:

(Y0, Y1) are independent of T_rand

Under this condition, the treated and control groups are comparable, and

average(Y1 − Y0) = average(Y | T_rand = 1) − average(Y | T_rand = 0)

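For comparison, first compute the naive difference under the biased assignment T defined above:

treated_mean = df.loc[df["T"] == 1, "Y"].mean()
control_mean = df.loc[df["T"] == 0, "Y"].mean()
naive_diff = treated_mean - control_mean

print(f"True ATE:              {true_ate: .4f}")
print(f"Naive treated - control: {naive_diff: .4f}")

True ATE:               0.1459
Naive treated - control:  0.2097

Under the biased assignment, the naive difference (0.2097) overstates the true ATE (0.1459). Under the randomized assignment T_rand, whose code and output appear at the end of this document, the difference in average outcomes between treated and control units is 0.1443, very close to the true ATE. Randomization makes the groups comparable, so in this setup association matches causation up to sampling noise.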
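A single randomized draw only lands near the ATE; the stronger claim is that the difference in means is centered on the ATE across repeated randomizations. The following sketch illustrates this under assumptions of its own: it uses NumPy's Generator API with an arbitrary seed, so its numbers will not match the ones printed in this analysis, and the helper diff_in_means is a name introduced here.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, fixed for reproducibility
N = 2000
Y0 = rng.normal(0.50, 0.08, size=N)
Y1 = rng.normal(0.65, 0.08, size=N)
true_ate = (Y1 - Y0).mean()

def diff_in_means(t):
    """Difference in observed means for a 0/1 assignment vector t."""
    y = (1 - t) * Y0 + t * Y1          # switch function
    return y[t == 1].mean() - y[t == 0].mean()

# Re-run the coin-flip assignment many times on the same potential outcomes.
estimates = np.array([
    diff_in_means(rng.binomial(1, 0.5, size=N)) for _ in range(1000)
])

print(f"true ATE:          {true_ate:.4f}")
print(f"mean of estimates: {estimates.mean():.4f}")
print(f"std of estimates:  {estimates.std():.4f}")
```

The mean of the estimates should sit very close to the true ATE, while the standard deviation shows how far any single experiment of this size can drift from it.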

6. SUTVA and other key assumptions

To make sense of this framework, we quietly relied on some important assumptions:

SUTVA (Stable Unit Treatment Value Assumption)
- No interference: one unit’s treatment does not affect another unit’s outcome.
  E.g., sending an email to Customer A does not change what happens to Customer B.
- No hidden versions of treatment: “treatment” is well-defined and consistent.
Ignorability / Independence
- In the randomized case, treatment is independent of potential outcomes:
  (Y0, Y1) are independent of T_rand.
  This is what makes simple differences in averages unbiased.
Consistency
- If a unit receives treatment level t, the observed outcome equals the corresponding potential outcome:
  If Tᵢ = t, then Yᵢ = Yᵢ(t).

These assumptions are what allow us to go from the causal quantity we care about
(e.g., average(Y1 − Y0)) to a statistical estimator based on observed data
(e.g., the average difference between treated and control).
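To illustrate why the no-interference part of SUTVA matters, here is a rough sketch in which it fails. The spillover mechanism (untreated customers gain 0.10 times the share of customers treated, for instance because emails get forwarded) is entirely hypothetical, as are the seed and variable names.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed, fixed for reproducibility
N = 2000
Y0 = rng.normal(0.50, 0.08, size=N)   # outcome if nobody is emailed
Y1 = rng.normal(0.65, 0.08, size=N)   # outcome if the unit itself is emailed
t = rng.binomial(1, 0.5, size=N)      # randomized assignment

# Hypothetical interference: untreated customers benefit from others' emails.
spillover = 0.10 * t.mean()
y = np.where(t == 1, Y1, Y0 + spillover)

all_vs_none = (Y1 - Y0).mean()                       # emailing everyone vs no one
observed_diff = y[t == 1].mean() - y[t == 0].mean()  # what the experiment reports

print(f"effect of emailing everyone vs no one: {all_vs_none:.4f}")
print(f"difference in means with spillover:    {observed_diff:.4f}")
```

Even though assignment is randomized, the control group is contaminated by the treatment of others, so the difference in means understates the effect of emailing everyone versus no one.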

7. Recap

In this small synthetic example, we:

Defined potential outcomes Y0 and Y1 for each unit
Used the switch function Y_i = (1 − T_i) × Y0_i + T_i × Y1_i to build observed outcomes
Saw how biased treatment assignment (non-random) leads to a biased estimate of the treatment effect
Showed that with a randomized experiment, the simple difference in means recovers the true ATE
Made explicit the role of assumptions (SUTVA, independence, consistency) in causal inference

This example captures the core intuition behind causal inference:
we are trying to estimate quantities that are fundamentally unobservable at the unit level, and can only be recovered under the right assumptions.

In practice, this distinction is critical in applied settings such as marketing and product analytics.
Observed differences in outcomes (e.g., conversion rates) are often driven by selection effects, not true causal impact.

Without careful experimental design or valid assumptions, models can confidently learn patterns that are fundamentally non-causal.

This is why randomized experiments—and more generally, causal inference frameworks—are essential for making reliable decisions.

At its core, causal inference is not about better models—it is about asking the right questions under the right assumptions.

Code and output for the randomized experiment in Section 5:

# randomized treatment
df["T_rand"] = np.random.binomial(1, 0.5, size=N)

# realized outcome under randomized assignment
df["Y_rand"] = (1 - df["T_rand"]) * df["Y0"] + df["T_rand"] * df["Y1"]

treated_mean_rand = df.loc[df["T_rand"] == 1, "Y_rand"].mean()

control_mean_rand = df.loc[df["T_rand"] == 0, "Y_rand"].mean()

rand_diff = treated_mean_rand - control_mean_rand

print(f"True ATE:                    {true_ate: .4f}")
print(f"Randomized treated - control: {rand_diff: .4f}")

True ATE:                     0.1459
Randomized treated - control:  0.1443

Written on November 23, 2025