Bank Marketing — Causal Linear Regression

Estimating Treatment Effects with Linear Regression

Abstract

Estimating causal effects from observational data is challenging due to non-random treatment assignment.
In this post, we use a real-world bank marketing dataset to illustrate how linear regression can be used for causal inference under key assumptions such as conditional ignorability and overlap.

We contrast predictive vs causal modeling, highlight common pitfalls (e.g., selection bias and post-treatment controls), and provide practical diagnostics to assess whether regression estimates can be interpreted causally.
The goal is not just estimation, but building intuition for when such estimates are trustworthy.

Bank Marketing Case Study

This analysis applies causal inference concepts to the Bank Marketing dataset to estimate treatment effects in a practical setting.

Causal question

Does contacting customers by cellular instead of telephone increase the probability of subscription?

Treatment (T): contact = cellular (1) vs telephone (0)
Outcome (Y): subscription = yes (1) vs no (0)

Dataset (links + context)

We use the Bank Marketing dataset (Portuguese bank campaigns).

Official source (UCI):
https://archive.ics.uci.edu/dataset/222/bank+marketing

Each row = customer

Features = demographics, history, campaign behavior
Treatment = contact channel
Outcome = subscription

📚 Reference

Facure, Matheus. Causal Inference in Python: Applying Causal Inference in the Tech Industry.

Interpretation Layer

What problem are we solving?

We are estimating the causal effect of a treatment variable on an outcome using regression under causal assumptions.

Predictive vs Causal Regression

Predictive regression asks:

If X changes, how does Y move in historical data?

Causal regression asks:

If we intervene and change Treatment, how does Y change?

This distinction is critical.

Required Assumptions

To interpret regression causally:

No unobserved confounding (Conditional Ignorability)
Correct functional form (or reasonable approximation)
No post-treatment leakage
Sufficient overlap between treated and control populations

Business Interpretation

In marketing / fintech:

Treatment coefficient ≈ incremental lift
Positive → treatment helps
Negative → treatment hurts
Near zero → no incremental value

Key Causal Assumptions Being Used

1. Conditional Ignorability

After controlling for X:

Treatment ⟂ Potential Outcomes

If violated → biased estimates

2. Overlap (Positivity)

Every user must have a non-zero probability of both treatment and control.

If violated:

Extrapolation risk
Unstable estimates

3. No Post-Treatment Controls

Do NOT include variables affected by treatment.

This creates:

Collider bias
Underestimation of treatment effect

Diagnostics — Causal Meaning

Residual Diagnostics

Random residuals → model specification reasonable
Patterned residuals → possible nonlinearity or missing confounders

Coefficient Stability

Large swings across specs → weak identification or collinearity

Overlap Checks

If treated/control covariates differ heavily:

Model extrapolates → causal estimate becomes fragile

Linear regression can estimate causal effects—but only when assumptions hold.
The real skill is not fitting the model, but validating those assumptions.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf
import os

np.random.seed(42)
FIG_DIR = "figures_ch4_bank_marketing"
os.makedirs(FIG_DIR, exist_ok=True)

!pip install ucimlrepo

Requirement already satisfied: ucimlrepo in c:\users\revan\minicondanew\lib\site-packages (0.0.7)
Requirement already satisfied: pandas>=1.0.0 in c:\users\revan\minicondanew\lib\site-packages (from ucimlrepo) (2.3.3)
Requirement already satisfied: certifi>=2020.12.5 in c:\users\revan\minicondanew\lib\site-packages (from ucimlrepo) (2025.11.12)
Requirement already satisfied: numpy>=1.26.0 in c:\users\revan\minicondanew\lib\site-packages (from pandas>=1.0.0->ucimlrepo) (2.3.5)
Requirement already satisfied: python-dateutil>=2.8.2 in c:\users\revan\minicondanew\lib\site-packages (from pandas>=1.0.0->ucimlrepo) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in c:\users\revan\minicondanew\lib\site-packages (from pandas>=1.0.0->ucimlrepo) (2025.2)
Requirement already satisfied: tzdata>=2022.7 in c:\users\revan\minicondanew\lib\site-packages (from pandas>=1.0.0->ucimlrepo) (2025.2)
Requirement already satisfied: six>=1.5 in c:\users\revan\minicondanew\lib\site-packages (from python-dateutil>=2.8.2->pandas>=1.0.0->ucimlrepo) (1.17.0)

Load public Bank Marketing data (UCI)

from ucimlrepo import fetch_ucirepo

bank = fetch_ucirepo(id=222)
df = pd.concat([bank.data.features, bank.data.targets], axis=1)

df["Y"] = (df["y"].astype(str).str.lower() == "yes").astype(int)
df["contact"] = df["contact"].astype(str).str.lower()
df = df[df["contact"].isin(["cellular","telephone"])].copy()
df["T"] = (df["contact"]=="cellular").astype(int)

df.head()

	age	job	marital	education	default	balance	housing	loan	contact	day_of_week	month	duration	campaign	pdays	poutcome	y	T
12657	27	management	single	secondary	no	35	no	no	cellular	4	jul	255	1	-1	NaN	no	1
12658	54	blue-collar	married	primary	no	466	no	no	cellular	4	jul	297	1	-1	NaN	no	1
12659	43	blue-collar	married	secondary	no	105	no	yes	cellular	4	jul	668	2	-1	NaN	no	1
12660	31	technician	single	secondary	no	19	no	no	telephone	4	jul	65	2	-1	NaN	no	0
12661	27	technician	single	secondary	no	126	yes	yes	cellular	4	jul	436	4	-1	NaN	no	1

Naive regression (difference in means)

📊 How to Interpret the Treatment Coefficient

Treatment Coefficient ≈ Average Treatment Effect (ATE) if assumptions hold

Example: Coefficient = 0.12

Interpretation: If treatment is applied, outcome increases by ~0.12 units on average, holding confounders constant.

Business Translation: Expected incremental lift per treated user ≈ coefficient value

m_naive = smf.ols("Y ~ T", data=df).fit()
m_naive.summary().tables[1]

	coef	std err	t	P>\|t\|	[0.025	0.975]
Intercept	0.1342	0.007	20.384	0.000	0.121	0.147
T	0.0150	0.007	2.171	0.030	0.001	0.029

Adjusted regression with month fixed effects

📊 How to Interpret the Treatment Coefficient

Treatment Coefficient ≈ Average Treatment Effect (ATE) if assumptions hold

Example: Coefficient = 0.12

Interpretation: If treatment is applied, outcome increases by ~0.12 units on average, holding confounders constant.

Business Translation: Expected incremental lift per treated user ≈ coefficient value

num_controls = ["age","balance","campaign","pdays","previous","day"]
num_controls = [c for c in num_controls if c in df.columns]

cat_controls = ["job","marital","education","housing","loan","month","poutcome"]
cat_controls = [c for c in cat_controls if c in df.columns]

formula = "Y ~ T"
for c in num_controls:
    formula += f" + {c}"
for c in cat_controls:
    formula += f" + C({c})"

m_adj = smf.ols(formula, data=df).fit()
m_adj.summary().tables[1]

	coef	std err	t	P>\|t\|	[0.025	0.975]
Intercept	0.0799	0.040	2.003	0.045	0.002	0.158
C(job)[T.blue-collar]	-0.0197	0.015	-1.313	0.189	-0.049	0.010
C(job)[T.entrepreneur]	-0.0317	0.027	-1.169	0.243	-0.085	0.022
C(job)[T.housemaid]	-0.0374	0.032	-1.178	0.239	-0.100	0.025
C(job)[T.management]	0.0140	0.016	0.865	0.387	-0.018	0.046
C(job)[T.retired]	0.0218	0.024	0.928	0.354	-0.024	0.068
C(job)[T.self-employed]	0.0060	0.025	0.239	0.811	-0.043	0.055
C(job)[T.services]	0.0029	0.017	0.167	0.867	-0.031	0.037
C(job)[T.student]	0.0855	0.027	3.213	0.001	0.033	0.138
C(job)[T.technician]	-0.0068	0.015	-0.458	0.647	-0.036	0.022
C(job)[T.unemployed]	0.0746	0.027	2.773	0.006	0.022	0.127
C(marital)[T.married]	0.0178	0.013	1.355	0.175	-0.008	0.043
C(marital)[T.single]	0.0261	0.015	1.744	0.081	-0.003	0.055
C(education)[T.secondary]	0.0108	0.014	0.798	0.425	-0.016	0.037
C(education)[T.tertiary]	0.0287	0.017	1.727	0.084	-0.004	0.061
C(housing)[T.yes]	-0.1011	0.010	-10.009	0.000	-0.121	-0.081
C(loan)[T.yes]	-0.0382	0.012	-3.237	0.001	-0.061	-0.015
C(month)[T.aug]	0.1050	0.020	5.240	0.000	0.066	0.144
C(month)[T.dec]	0.1243	0.036	3.495	0.000	0.055	0.194
C(month)[T.feb]	-0.0059	0.016	-0.359	0.720	-0.038	0.026
C(month)[T.jan]	-0.0727	0.020	-3.674	0.000	-0.112	-0.034
C(month)[T.jul]	0.1828	0.027	6.877	0.000	0.131	0.235
C(month)[T.jun]	0.1547	0.024	6.543	0.000	0.108	0.201
C(month)[T.mar]	0.2091	0.030	6.869	0.000	0.149	0.269
C(month)[T.may]	-0.0257	0.013	-1.969	0.049	-0.051	-0.000
C(month)[T.nov]	-0.0323	0.016	-2.044	0.041	-0.063	-0.001
C(month)[T.oct]	0.1450	0.023	6.182	0.000	0.099	0.191
C(month)[T.sep]	0.2023	0.024	8.306	0.000	0.155	0.250
C(poutcome)[T.other]	0.0334	0.010	3.318	0.001	0.014	0.053
C(poutcome)[T.success]	0.4063	0.012	34.528	0.000	0.383	0.429
T	0.0414	0.016	2.639	0.008	0.011	0.072
age	0.0008	0.000	1.569	0.117	-0.000	0.002
balance	3.072e-06	1.32e-06	2.320	0.020	4.76e-07	5.67e-06
campaign	-0.0144	0.003	-5.478	0.000	-0.020	-0.009
pdays	0.0001	4.29e-05	3.052	0.002	4.69e-05	0.000
previous	0.0018	0.001	2.059	0.040	8.62e-05	0.004

Frisch–Waugh–Lovell (FWL) theorem

📊 How to Interpret the Treatment Coefficient

Treatment Coefficient ≈ Average Treatment Effect (ATE) if assumptions hold

Example: Coefficient = 0.12

Interpretation: If treatment is applied, outcome increases by ~0.12 units on average, holding confounders constant.

Business Translation: Expected incremental lift per treated user ≈ coefficient value

🧪 Diagnostics — Causal Meaning

Residual Diagnostics:

Random residuals → Model specification reasonable
Patterned residuals → Possible nonlinearity or missing confounder

Coefficient Stability:

Large swings across specs → weak identification or collinearity

Overlap Checks: If treated and control covariate distributions differ heavily: → Model extrapolates → causal estimate fragile

f_T = "T ~ age + balance + C(month)"
f_Y = "Y ~ age + balance + C(month)"

mT = smf.ols(f_T, data=df).fit()
mY = smf.ols(f_Y, data=df).fit()

df["T_res"] = mT.resid
df["Y_res"] = mY.resid

m_fwl = smf.ols("Y_res ~ T_res", data=df).fit()
m_fwl.summary().tables[1]

	coef	std err	t	P>\|t\|	[0.025	0.975]
Intercept	7.752e-15	0.002	4.09e-12	1.000	-0.004	0.004
T_res	0.0348	0.007	5.130	0.000	0.022	0.048

Heterogeneous effects (interaction with month)

📊 How to Interpret the Treatment Coefficient

Treatment Coefficient ≈ Average Treatment Effect (ATE) if assumptions hold

Example: Coefficient = 0.12

Interpretation: If treatment is applied, outcome increases by ~0.12 units on average, holding confounders constant.

Business Translation: Expected incremental lift per treated user ≈ coefficient value

m_inter = smf.ols("Y ~ T*C(month) + age + balance", data=df).fit()
m_inter.summary().tables[1]

	coef	std err	t	P>\|t\|	[0.025	0.975]
Intercept	0.1503	0.026	5.840	0.000	0.100	0.201
C(month)[T.aug]	-0.0609	0.032	-1.880	0.060	-0.124	0.003
C(month)[T.dec]	0.1768	0.061	2.907	0.004	0.058	0.296
C(month)[T.feb]	-0.0500	0.032	-1.579	0.114	-0.112	0.012
C(month)[T.jan]	-0.0935	0.038	-2.433	0.015	-0.169	-0.018
C(month)[T.jul]	-0.1160	0.027	-4.333	0.000	-0.168	-0.064
C(month)[T.jun]	0.0530	0.045	1.179	0.239	-0.035	0.141
C(month)[T.mar]	0.2151	0.053	4.096	0.000	0.112	0.318
C(month)[T.may]	-0.1171	0.029	-4.057	0.000	-0.174	-0.061
C(month)[T.nov]	-0.0911	0.030	-3.063	0.002	-0.149	-0.033
C(month)[T.oct]	0.2386	0.038	6.242	0.000	0.164	0.314
C(month)[T.sep]	0.1794	0.048	3.713	0.000	0.085	0.274
T	0.0127	0.025	0.508	0.612	-0.036	0.062
T:C(month)[T.aug]	-0.0275	0.033	-0.826	0.409	-0.093	0.038
T:C(month)[T.dec]	0.1171	0.066	1.764	0.078	-0.013	0.247
T:C(month)[T.feb]	0.0232	0.033	0.703	0.482	-0.042	0.088
T:C(month)[T.jan]	0.0015	0.040	0.036	0.971	-0.077	0.080
T:C(month)[T.jul]	0.0183	0.028	0.656	0.512	-0.036	0.073
T:C(month)[T.jun]	0.1884	0.047	3.995	0.000	0.096	0.281
T:C(month)[T.mar]	0.1182	0.055	2.130	0.033	0.009	0.227
T:C(month)[T.may]	0.0429	0.030	1.431	0.152	-0.016	0.102
T:C(month)[T.nov]	-0.0110	0.031	-0.355	0.723	-0.072	0.050
T:C(month)[T.oct]	0.0058	0.041	0.141	0.888	-0.075	0.087
T:C(month)[T.sep]	0.1375	0.051	2.684	0.007	0.037	0.238
age	0.0007	0.000	3.828	0.000	0.000	0.001
balance	4.007e-06	6.04e-07	6.631	0.000	2.82e-06	5.19e-06

🧠 What the Results Actually Show

Across specifications, the estimated treatment effect remains positive but modest:

Naive regression: ~0.015
Fully adjusted model: ~0.04
FWL residual regression: ~0.035–0.04
Interaction model: heterogeneous but directionally consistent

👉 This stability is important.

It suggests that:

The initial bias in naive estimates is not extreme
Adding controls and fixed effects refines rather than overturns the result
The treatment effect is real but relatively small

Additionally, the increase from naive to adjusted estimates suggests the presence of negative selection bias in treatment assignment.

🔑 Key Takeaways

Regression = adjusted comparison, not causality by default
Controls matter → effect size increases after adjustment (bias correction)
Month fixed effects matter → seasonality was confounding the estimate
FWL confirms equivalence → partialling out yields consistent estimates
Heterogeneity exists → treatment effectiveness varies across months

📊 Business Interpretation

Estimated lift ≈ 3–4 percentage points
Effect is statistically significant but economically modest
Not a “silver bullet” channel — but directionally beneficial

👉 Translation:
Cellular contact improves conversion, but its impact depends on timing and context.

⚠️ When Should You Trust This?

Linear regression works well here because:

✅ Large sample size
✅ Reasonable overlap between treated and control groups
✅ Rich covariates capturing key confounders
✅ Effects approximately linear

❌ When This Would Break

Be cautious if:

Hidden confounders exist (e.g., targeting based on unobserved intent)
Strong nonlinear effects dominate
Treatment assignment is highly selective
Post-treatment variables are included as controls

🔗 Where This Fits in the Bigger Picture

Linear regression is the starting point for causal inference:

If assumptions hold → simple, interpretable, and fast
If assumptions weaken → move to:
- Inverse Propensity Weighting (IPW)
- Doubly Robust methods
- Meta-learners (S / T / X learners)
- Causal forests

🎯 Final Takeaway

Linear regression can recover meaningful causal effects in observational marketing data —
but its reliability comes not from the model itself, but from:

the validity of assumptions and the consistency of estimates across specifications.

Written on December 22, 2025