Bank Marketing — Causal Linear Regression
Estimating Treatment Effects with Linear Regression
Abstract
Estimating causal effects from observational data is challenging due to non-random treatment assignment.
In this post, we use a real-world bank marketing dataset to illustrate how linear regression can be used for causal inference under key assumptions such as conditional ignorability and overlap.
We contrast predictive vs causal modeling, highlight common pitfalls (e.g., selection bias and post-treatment controls), and provide practical diagnostics to assess whether regression estimates can be interpreted causally.
The goal is not just estimation, but building intuition for when such estimates are trustworthy.
Bank Marketing Case Study
This analysis applies causal inference concepts to the Bank Marketing dataset to estimate treatment effects in a practical setting.
Causal question
Does contacting customers by cellular instead of telephone increase the probability of subscription?
- Treatment (T): contact = cellular (1) vs telephone (0)
- Outcome (Y): subscription = yes (1) vs no (0)
Dataset (links + context)
We use the Bank Marketing dataset (Portuguese bank campaigns).
- Official source (UCI):
https://archive.ics.uci.edu/dataset/222/bank+marketing
Each row = customer
- Features = demographics, history, campaign behavior
- Treatment = contact channel
- Outcome = subscription
📚 Reference
Facure, Matheus. Causal Inference in Python: Applying Causal Inference in the Tech Industry.
Interpretation Layer
What problem are we solving?
We are estimating the causal effect of a treatment variable on an outcome using regression under causal assumptions.
Predictive vs Causal Regression
Predictive regression asks:
If X changes, how does Y move in historical data?
Causal regression asks:
If we intervene and change Treatment, how does Y change?
This distinction is critical.
Required Assumptions
To interpret regression causally:
- No unobserved confounding (Conditional Ignorability)
- Correct functional form (or reasonable approximation)
- No post-treatment leakage
- Sufficient overlap between treated and control populations
Business Interpretation
In marketing / fintech:
- Treatment coefficient ≈ incremental lift
- Positive → treatment helps
- Negative → treatment hurts
- Near zero → no incremental value
Key Causal Assumptions Being Used
1. Conditional Ignorability
After controlling for X:
Treatment ⟂ Potential Outcomes
If violated → biased estimates
2. Overlap (Positivity)
Every user must have a non-zero probability of both treatment and control.
If violated:
- Extrapolation risk
- Unstable estimates
3. No Post-Treatment Controls
Do NOT include variables affected by treatment.
This creates:
- Collider bias
- Underestimation of treatment effect
Diagnostics — Causal Meaning
Residual Diagnostics
- Random residuals → model specification reasonable
- Patterned residuals → possible nonlinearity or missing confounders
Coefficient Stability
- Large swings across specs → weak identification or collinearity
Overlap Checks
If treated/control covariates differ heavily:
Model extrapolates → causal estimate becomes fragile
Linear regression can estimate causal effects—but only when assumptions hold.
The real skill is not fitting the model, but validating those assumptions.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf
import os
np.random.seed(42)
FIG_DIR = "figures_ch4_bank_marketing"
os.makedirs(FIG_DIR, exist_ok=True)
!pip install ucimlrepo
Requirement already satisfied: ucimlrepo in c:\users\revan\minicondanew\lib\site-packages (0.0.7)
Requirement already satisfied: pandas>=1.0.0 in c:\users\revan\minicondanew\lib\site-packages (from ucimlrepo) (2.3.3)
Requirement already satisfied: certifi>=2020.12.5 in c:\users\revan\minicondanew\lib\site-packages (from ucimlrepo) (2025.11.12)
Requirement already satisfied: numpy>=1.26.0 in c:\users\revan\minicondanew\lib\site-packages (from pandas>=1.0.0->ucimlrepo) (2.3.5)
Requirement already satisfied: python-dateutil>=2.8.2 in c:\users\revan\minicondanew\lib\site-packages (from pandas>=1.0.0->ucimlrepo) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in c:\users\revan\minicondanew\lib\site-packages (from pandas>=1.0.0->ucimlrepo) (2025.2)
Requirement already satisfied: tzdata>=2022.7 in c:\users\revan\minicondanew\lib\site-packages (from pandas>=1.0.0->ucimlrepo) (2025.2)
Requirement already satisfied: six>=1.5 in c:\users\revan\minicondanew\lib\site-packages (from python-dateutil>=2.8.2->pandas>=1.0.0->ucimlrepo) (1.17.0)
Load public Bank Marketing data (UCI)
from ucimlrepo import fetch_ucirepo
bank = fetch_ucirepo(id=222)
df = pd.concat([bank.data.features, bank.data.targets], axis=1)
df["Y"] = (df["y"].astype(str).str.lower() == "yes").astype(int)
df["contact"] = df["contact"].astype(str).str.lower()
df = df[df["contact"].isin(["cellular","telephone"])].copy()
df["T"] = (df["contact"]=="cellular").astype(int)
df.head()
| age | job | marital | education | default | balance | housing | loan | contact | day_of_week | month | duration | campaign | pdays | previous | poutcome | y | Y | T | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12657 | 27 | management | single | secondary | no | 35 | no | no | cellular | 4 | jul | 255 | 1 | -1 | 0 | NaN | no | 0 | 1 |
| 12658 | 54 | blue-collar | married | primary | no | 466 | no | no | cellular | 4 | jul | 297 | 1 | -1 | 0 | NaN | no | 0 | 1 |
| 12659 | 43 | blue-collar | married | secondary | no | 105 | no | yes | cellular | 4 | jul | 668 | 2 | -1 | 0 | NaN | no | 0 | 1 |
| 12660 | 31 | technician | single | secondary | no | 19 | no | no | telephone | 4 | jul | 65 | 2 | -1 | 0 | NaN | no | 0 | 0 |
| 12661 | 27 | technician | single | secondary | no | 126 | yes | yes | cellular | 4 | jul | 436 | 4 | -1 | 0 | NaN | no | 0 | 1 |
Naive regression (difference in means)
📊 How to Interpret the Treatment Coefficient
Treatment Coefficient ≈ Average Treatment Effect (ATE) if assumptions hold
Example: Coefficient = 0.12
Interpretation: If treatment is applied, outcome increases by ~0.12 units on average, holding confounders constant.
Business Translation: Expected incremental lift per treated user ≈ coefficient value
m_naive = smf.ols("Y ~ T", data=df).fit()
m_naive.summary().tables[1]
| coef | std err | t | P>|t| | [0.025 | 0.975] | |
|---|---|---|---|---|---|---|
| Intercept | 0.1342 | 0.007 | 20.384 | 0.000 | 0.121 | 0.147 |
| T | 0.0150 | 0.007 | 2.171 | 0.030 | 0.001 | 0.029 |
Adjusted regression with month fixed effects
📊 How to Interpret the Treatment Coefficient
Treatment Coefficient ≈ Average Treatment Effect (ATE) if assumptions hold
Example: Coefficient = 0.12
Interpretation: If treatment is applied, outcome increases by ~0.12 units on average, holding confounders constant.
Business Translation: Expected incremental lift per treated user ≈ coefficient value
num_controls = ["age","balance","campaign","pdays","previous","day"]
num_controls = [c for c in num_controls if c in df.columns]
cat_controls = ["job","marital","education","housing","loan","month","poutcome"]
cat_controls = [c for c in cat_controls if c in df.columns]
formula = "Y ~ T"
for c in num_controls:
formula += f" + {c}"
for c in cat_controls:
formula += f" + C({c})"
m_adj = smf.ols(formula, data=df).fit()
m_adj.summary().tables[1]
| coef | std err | t | P>|t| | [0.025 | 0.975] | |
|---|---|---|---|---|---|---|
| Intercept | 0.0799 | 0.040 | 2.003 | 0.045 | 0.002 | 0.158 |
| C(job)[T.blue-collar] | -0.0197 | 0.015 | -1.313 | 0.189 | -0.049 | 0.010 |
| C(job)[T.entrepreneur] | -0.0317 | 0.027 | -1.169 | 0.243 | -0.085 | 0.022 |
| C(job)[T.housemaid] | -0.0374 | 0.032 | -1.178 | 0.239 | -0.100 | 0.025 |
| C(job)[T.management] | 0.0140 | 0.016 | 0.865 | 0.387 | -0.018 | 0.046 |
| C(job)[T.retired] | 0.0218 | 0.024 | 0.928 | 0.354 | -0.024 | 0.068 |
| C(job)[T.self-employed] | 0.0060 | 0.025 | 0.239 | 0.811 | -0.043 | 0.055 |
| C(job)[T.services] | 0.0029 | 0.017 | 0.167 | 0.867 | -0.031 | 0.037 |
| C(job)[T.student] | 0.0855 | 0.027 | 3.213 | 0.001 | 0.033 | 0.138 |
| C(job)[T.technician] | -0.0068 | 0.015 | -0.458 | 0.647 | -0.036 | 0.022 |
| C(job)[T.unemployed] | 0.0746 | 0.027 | 2.773 | 0.006 | 0.022 | 0.127 |
| C(marital)[T.married] | 0.0178 | 0.013 | 1.355 | 0.175 | -0.008 | 0.043 |
| C(marital)[T.single] | 0.0261 | 0.015 | 1.744 | 0.081 | -0.003 | 0.055 |
| C(education)[T.secondary] | 0.0108 | 0.014 | 0.798 | 0.425 | -0.016 | 0.037 |
| C(education)[T.tertiary] | 0.0287 | 0.017 | 1.727 | 0.084 | -0.004 | 0.061 |
| C(housing)[T.yes] | -0.1011 | 0.010 | -10.009 | 0.000 | -0.121 | -0.081 |
| C(loan)[T.yes] | -0.0382 | 0.012 | -3.237 | 0.001 | -0.061 | -0.015 |
| C(month)[T.aug] | 0.1050 | 0.020 | 5.240 | 0.000 | 0.066 | 0.144 |
| C(month)[T.dec] | 0.1243 | 0.036 | 3.495 | 0.000 | 0.055 | 0.194 |
| C(month)[T.feb] | -0.0059 | 0.016 | -0.359 | 0.720 | -0.038 | 0.026 |
| C(month)[T.jan] | -0.0727 | 0.020 | -3.674 | 0.000 | -0.112 | -0.034 |
| C(month)[T.jul] | 0.1828 | 0.027 | 6.877 | 0.000 | 0.131 | 0.235 |
| C(month)[T.jun] | 0.1547 | 0.024 | 6.543 | 0.000 | 0.108 | 0.201 |
| C(month)[T.mar] | 0.2091 | 0.030 | 6.869 | 0.000 | 0.149 | 0.269 |
| C(month)[T.may] | -0.0257 | 0.013 | -1.969 | 0.049 | -0.051 | -0.000 |
| C(month)[T.nov] | -0.0323 | 0.016 | -2.044 | 0.041 | -0.063 | -0.001 |
| C(month)[T.oct] | 0.1450 | 0.023 | 6.182 | 0.000 | 0.099 | 0.191 |
| C(month)[T.sep] | 0.2023 | 0.024 | 8.306 | 0.000 | 0.155 | 0.250 |
| C(poutcome)[T.other] | 0.0334 | 0.010 | 3.318 | 0.001 | 0.014 | 0.053 |
| C(poutcome)[T.success] | 0.4063 | 0.012 | 34.528 | 0.000 | 0.383 | 0.429 |
| T | 0.0414 | 0.016 | 2.639 | 0.008 | 0.011 | 0.072 |
| age | 0.0008 | 0.000 | 1.569 | 0.117 | -0.000 | 0.002 |
| balance | 3.072e-06 | 1.32e-06 | 2.320 | 0.020 | 4.76e-07 | 5.67e-06 |
| campaign | -0.0144 | 0.003 | -5.478 | 0.000 | -0.020 | -0.009 |
| pdays | 0.0001 | 4.29e-05 | 3.052 | 0.002 | 4.69e-05 | 0.000 |
| previous | 0.0018 | 0.001 | 2.059 | 0.040 | 8.62e-05 | 0.004 |
Frisch–Waugh–Lovell (FWL) theorem
📊 How to Interpret the Treatment Coefficient
Treatment Coefficient ≈ Average Treatment Effect (ATE) if assumptions hold
Example: Coefficient = 0.12
Interpretation: If treatment is applied, outcome increases by ~0.12 units on average, holding confounders constant.
Business Translation: Expected incremental lift per treated user ≈ coefficient value
🧪 Diagnostics — Causal Meaning
Residual Diagnostics:
- Random residuals → Model specification reasonable
- Patterned residuals → Possible nonlinearity or missing confounder
Coefficient Stability:
- Large swings across specs → weak identification or collinearity
Overlap Checks: If treated and control covariate distributions differ heavily: → Model extrapolates → causal estimate fragile
f_T = "T ~ age + balance + C(month)"
f_Y = "Y ~ age + balance + C(month)"
mT = smf.ols(f_T, data=df).fit()
mY = smf.ols(f_Y, data=df).fit()
df["T_res"] = mT.resid
df["Y_res"] = mY.resid
m_fwl = smf.ols("Y_res ~ T_res", data=df).fit()
m_fwl.summary().tables[1]
| coef | std err | t | P>|t| | [0.025 | 0.975] | |
|---|---|---|---|---|---|---|
| Intercept | 7.752e-15 | 0.002 | 4.09e-12 | 1.000 | -0.004 | 0.004 |
| T_res | 0.0348 | 0.007 | 5.130 | 0.000 | 0.022 | 0.048 |
Heterogeneous effects (interaction with month)
📊 How to Interpret the Treatment Coefficient
Treatment Coefficient ≈ Average Treatment Effect (ATE) if assumptions hold
Example: Coefficient = 0.12
Interpretation: If treatment is applied, outcome increases by ~0.12 units on average, holding confounders constant.
Business Translation: Expected incremental lift per treated user ≈ coefficient value
m_inter = smf.ols("Y ~ T*C(month) + age + balance", data=df).fit()
m_inter.summary().tables[1]
| coef | std err | t | P>|t| | [0.025 | 0.975] | |
|---|---|---|---|---|---|---|
| Intercept | 0.1503 | 0.026 | 5.840 | 0.000 | 0.100 | 0.201 |
| C(month)[T.aug] | -0.0609 | 0.032 | -1.880 | 0.060 | -0.124 | 0.003 |
| C(month)[T.dec] | 0.1768 | 0.061 | 2.907 | 0.004 | 0.058 | 0.296 |
| C(month)[T.feb] | -0.0500 | 0.032 | -1.579 | 0.114 | -0.112 | 0.012 |
| C(month)[T.jan] | -0.0935 | 0.038 | -2.433 | 0.015 | -0.169 | -0.018 |
| C(month)[T.jul] | -0.1160 | 0.027 | -4.333 | 0.000 | -0.168 | -0.064 |
| C(month)[T.jun] | 0.0530 | 0.045 | 1.179 | 0.239 | -0.035 | 0.141 |
| C(month)[T.mar] | 0.2151 | 0.053 | 4.096 | 0.000 | 0.112 | 0.318 |
| C(month)[T.may] | -0.1171 | 0.029 | -4.057 | 0.000 | -0.174 | -0.061 |
| C(month)[T.nov] | -0.0911 | 0.030 | -3.063 | 0.002 | -0.149 | -0.033 |
| C(month)[T.oct] | 0.2386 | 0.038 | 6.242 | 0.000 | 0.164 | 0.314 |
| C(month)[T.sep] | 0.1794 | 0.048 | 3.713 | 0.000 | 0.085 | 0.274 |
| T | 0.0127 | 0.025 | 0.508 | 0.612 | -0.036 | 0.062 |
| T:C(month)[T.aug] | -0.0275 | 0.033 | -0.826 | 0.409 | -0.093 | 0.038 |
| T:C(month)[T.dec] | 0.1171 | 0.066 | 1.764 | 0.078 | -0.013 | 0.247 |
| T:C(month)[T.feb] | 0.0232 | 0.033 | 0.703 | 0.482 | -0.042 | 0.088 |
| T:C(month)[T.jan] | 0.0015 | 0.040 | 0.036 | 0.971 | -0.077 | 0.080 |
| T:C(month)[T.jul] | 0.0183 | 0.028 | 0.656 | 0.512 | -0.036 | 0.073 |
| T:C(month)[T.jun] | 0.1884 | 0.047 | 3.995 | 0.000 | 0.096 | 0.281 |
| T:C(month)[T.mar] | 0.1182 | 0.055 | 2.130 | 0.033 | 0.009 | 0.227 |
| T:C(month)[T.may] | 0.0429 | 0.030 | 1.431 | 0.152 | -0.016 | 0.102 |
| T:C(month)[T.nov] | -0.0110 | 0.031 | -0.355 | 0.723 | -0.072 | 0.050 |
| T:C(month)[T.oct] | 0.0058 | 0.041 | 0.141 | 0.888 | -0.075 | 0.087 |
| T:C(month)[T.sep] | 0.1375 | 0.051 | 2.684 | 0.007 | 0.037 | 0.238 |
| age | 0.0007 | 0.000 | 3.828 | 0.000 | 0.000 | 0.001 |
| balance | 4.007e-06 | 6.04e-07 | 6.631 | 0.000 | 2.82e-06 | 5.19e-06 |
🧠 What the Results Actually Show
Across specifications, the estimated treatment effect remains positive but modest:
- Naive regression: ~0.015
- Fully adjusted model: ~0.04
- FWL residual regression: ~0.035–0.04
- Interaction model: heterogeneous but directionally consistent
👉 This stability is important.
It suggests that:
- The initial bias in naive estimates is not extreme
- Adding controls and fixed effects refines rather than overturns the result
- The treatment effect is real but relatively small
Additionally, the increase from naive to adjusted estimates suggests the presence of negative selection bias in treatment assignment.
🔑 Key Takeaways
- Regression = adjusted comparison, not causality by default
- Controls matter → effect size increases after adjustment (bias correction)
- Month fixed effects matter → seasonality was confounding the estimate
- FWL confirms equivalence → partialling out yields consistent estimates
- Heterogeneity exists → treatment effectiveness varies across months
📊 Business Interpretation
- Estimated lift ≈ 3–4 percentage points
- Effect is statistically significant but economically modest
- Not a “silver bullet” channel — but directionally beneficial
👉 Translation:
Cellular contact improves conversion, but its impact depends on timing and context.
⚠️ When Should You Trust This?
Linear regression works well here because:
- ✅ Large sample size
- ✅ Reasonable overlap between treated and control groups
- ✅ Rich covariates capturing key confounders
- ✅ Effects approximately linear
❌ When This Would Break
Be cautious if:
- Hidden confounders exist (e.g., targeting based on unobserved intent)
- Strong nonlinear effects dominate
- Treatment assignment is highly selective
- Post-treatment variables are included as controls
🔗 Where This Fits in the Bigger Picture
Linear regression is the starting point for causal inference:
- If assumptions hold → simple, interpretable, and fast
- If assumptions weaken → move to:
- Inverse Propensity Weighting (IPW)
- Doubly Robust methods
- Meta-learners (S / T / X learners)
- Causal forests
🎯 Final Takeaway
Linear regression can recover meaningful causal effects in observational marketing data —
but its reliability comes not from the model itself, but from:
the validity of assumptions and the consistency of estimates across specifications.