Bank_marketing_causal_regression
Chapter 4 — Linear Regression for Causal Inference
Bank Marketing Case Study
This notebook illustrates Chapter 4 of Matheus Facure’s Causal Inference in Python using the Bank Marketing dataset.
Causal question:
Does contacting customers by cellular instead of telephone increase the probability
of subscribing to a term deposit?
- Treatment T: contact = cellular (1) vs telephone (0)
- Outcome Y: subscription = yes (1) vs no (0)
Dataset (links + download options)
We use the Bank Marketing dataset (Portuguese bank direct marketing campaigns).
Official source (UCI Machine Learning Repository)
- Dataset page: https://archive.ics.uci.edu/dataset/222/bank+marketing
(Direct downloadable files are linked on that page.)
Kaggle mirror (CSV download; requires Kaggle login)
- https://www.kaggle.com/datasets/janiobachmann/bank-marketing-dataset
Which file should you use?
- Kaggle typically provides bank-full.csv / bank.csv
- The UCI dataset provides multiple formats; this notebook loads from UCI via ucimlrepo
📘 Causal Linear Regression — Blog Interpretation Layer
What problem are we solving?
We are trying to estimate the causal effect of a treatment variable on an outcome variable using linear regression under causal assumptions.
Predictive vs Causal Regression
Predictive regression answers:
If X changes, how does Y move in historical data?
Causal regression answers:
If we intervene and change Treatment, how does Y change?
Required assumptions:
- No unobserved confounding (Conditional Ignorability)
- Correct functional form (or good approximation)
- No post-treatment leakage
- Sufficient overlap between treated and control populations
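Concretely, the adjusted regressions below target a model of the form (notation mine, not the book's):

$$Y_i = \beta_0 + \tau\, T_i + \beta^\top X_i + e_i$$

where $\tau$ is read causally as the average effect of cellular contact only when the assumptions above hold.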
Business Interpretation
In marketing / fintech:
- Treatment coefficient ≈ Incremental lift
- Positive → treatment helps
- Negative → treatment hurts
- Near zero → no incremental value
⚠️ Key Causal Assumptions Being Used Here
1️⃣ Conditional Ignorability
After controlling for covariates X: Treatment ⟂ Potential Outcomes
If violated → biased effect estimate
2️⃣ Overlap (Positivity)
Every customer has a nonzero probability of being contacted by either channel (cellular or telephone), given covariates X
If violated:
- Extrapolation risk
- Unstable coefficients
3️⃣ No Post-Treatment Controls
Do NOT include variables influenced by the treatment. Doing so creates collider bias or blocks part of the treatment effect. (In this dataset, call duration is determined by the contact itself, so it should not be used as a control.)
🧪 Diagnostics — Causal Meaning
Residual Diagnostics:
- Random residuals → Model specification reasonable
- Patterned residuals → Possible nonlinearity or missing confounder
Coefficient Stability:
- Large swings across specs → weak identification or collinearity
Overlap Checks: if the treated and control covariate distributions differ heavily, the model extrapolates and the causal estimate becomes fragile.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf
import os
np.random.seed(42)
FIG_DIR = "figures_ch4_bank_marketing"
os.makedirs(FIG_DIR, exist_ok=True)
!pip install ucimlrepo
Load public Bank Marketing data (UCI)
from ucimlrepo import fetch_ucirepo

# Fetch the Bank Marketing dataset (UCI id=222) and combine features + target.
bank = fetch_ucirepo(id=222)
df = pd.concat([bank.data.features, bank.data.targets], axis=1)

# Outcome: 1 if the customer subscribed to a term deposit.
df["Y"] = (df["y"].astype(str).str.lower() == "yes").astype(int)

# Treatment: cellular (1) vs telephone (0); rows with unknown contact type are dropped.
df["contact"] = df["contact"].astype(str).str.lower()
df = df[df["contact"].isin(["cellular", "telephone"])].copy()
df["T"] = (df["contact"] == "cellular").astype(int)
df.head()
| | age | job | marital | education | default | balance | housing | loan | contact | day_of_week | month | duration | campaign | pdays | previous | poutcome | y | Y | T |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12657 | 27 | management | single | secondary | no | 35 | no | no | cellular | 4 | jul | 255 | 1 | -1 | 0 | NaN | no | 0 | 1 |
| 12658 | 54 | blue-collar | married | primary | no | 466 | no | no | cellular | 4 | jul | 297 | 1 | -1 | 0 | NaN | no | 0 | 1 |
| 12659 | 43 | blue-collar | married | secondary | no | 105 | no | yes | cellular | 4 | jul | 668 | 2 | -1 | 0 | NaN | no | 0 | 1 |
| 12660 | 31 | technician | single | secondary | no | 19 | no | no | telephone | 4 | jul | 65 | 2 | -1 | 0 | NaN | no | 0 | 0 |
| 12661 | 27 | technician | single | secondary | no | 126 | yes | yes | cellular | 4 | jul | 436 | 4 | -1 | 0 | NaN | no | 0 | 1 |
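Before running regressions, a quick overlap check is useful (this sketch is an addition, not part of the original notebook): if some months are contacted almost exclusively by one channel, the month-adjusted comparison in those months rests on extrapolation.

# Treatment share by month: months dominated by one contact type have weak overlap.
month_overlap = pd.crosstab(df["month"], df["T"], normalize="index")
month_overlap.columns = ["telephone_share", "cellular_share"]
print(month_overlap.round(2))

# Numeric covariates: quick comparison of means by treatment arm.
print(df.groupby("T")[["age", "balance", "campaign"]].mean().round(2))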
Naive regression (difference in means)
📊 How to Interpret the Treatment Coefficient
Treatment Coefficient ≈ Average Treatment Effect (ATE), if the assumptions above hold.
Example: a coefficient of 0.12 on T would mean that, holding confounders fixed, cellular contact raises the subscription probability by about 12 percentage points on average.
Business Translation: expected incremental lift per treated customer ≈ the coefficient value.
m_naive = smf.ols("Y ~ T", data=df).fit()
m_naive.summary().tables[1]
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 0.1342 | 0.007 | 20.384 | 0.000 | 0.121 | 0.147 |
| T | 0.0150 | 0.007 | 2.171 | 0.030 | 0.001 | 0.029 |
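Because the naive model has a single binary regressor, the slope on T is just the raw difference in subscription rates between the two channels. A quick check (added sketch):

# The OLS slope on T should equal the difference in mean subscription rates (≈ 0.015 here).
rates = df.groupby("T")["Y"].mean()
print(rates.round(4))
print("difference:", round(rates.loc[1] - rates.loc[0], 4))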
Adjusted regression with customer covariates and month fixed effects
num_controls = ["age","balance","campaign","pdays","previous","day"]
num_controls = [c for c in num_controls if c in df.columns]
cat_controls = ["job","marital","education","housing","loan","month","poutcome"]
cat_controls = [c for c in cat_controls if c in df.columns]
formula = "Y ~ T"
for c in num_controls:
formula += f" + {c}"
for c in cat_controls:
formula += f" + C({c})"
m_adj = smf.ols(formula, data=df).fit()
m_adj.summary().tables[1]
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 0.0799 | 0.040 | 2.003 | 0.045 | 0.002 | 0.158 |
| C(job)[T.blue-collar] | -0.0197 | 0.015 | -1.313 | 0.189 | -0.049 | 0.010 |
| C(job)[T.entrepreneur] | -0.0317 | 0.027 | -1.169 | 0.243 | -0.085 | 0.022 |
| C(job)[T.housemaid] | -0.0374 | 0.032 | -1.178 | 0.239 | -0.100 | 0.025 |
| C(job)[T.management] | 0.0140 | 0.016 | 0.865 | 0.387 | -0.018 | 0.046 |
| C(job)[T.retired] | 0.0218 | 0.024 | 0.928 | 0.354 | -0.024 | 0.068 |
| C(job)[T.self-employed] | 0.0060 | 0.025 | 0.239 | 0.811 | -0.043 | 0.055 |
| C(job)[T.services] | 0.0029 | 0.017 | 0.167 | 0.867 | -0.031 | 0.037 |
| C(job)[T.student] | 0.0855 | 0.027 | 3.213 | 0.001 | 0.033 | 0.138 |
| C(job)[T.technician] | -0.0068 | 0.015 | -0.458 | 0.647 | -0.036 | 0.022 |
| C(job)[T.unemployed] | 0.0746 | 0.027 | 2.773 | 0.006 | 0.022 | 0.127 |
| C(marital)[T.married] | 0.0178 | 0.013 | 1.355 | 0.175 | -0.008 | 0.043 |
| C(marital)[T.single] | 0.0261 | 0.015 | 1.744 | 0.081 | -0.003 | 0.055 |
| C(education)[T.secondary] | 0.0108 | 0.014 | 0.798 | 0.425 | -0.016 | 0.037 |
| C(education)[T.tertiary] | 0.0287 | 0.017 | 1.727 | 0.084 | -0.004 | 0.061 |
| C(housing)[T.yes] | -0.1011 | 0.010 | -10.009 | 0.000 | -0.121 | -0.081 |
| C(loan)[T.yes] | -0.0382 | 0.012 | -3.237 | 0.001 | -0.061 | -0.015 |
| C(month)[T.aug] | 0.1050 | 0.020 | 5.240 | 0.000 | 0.066 | 0.144 |
| C(month)[T.dec] | 0.1243 | 0.036 | 3.495 | 0.000 | 0.055 | 0.194 |
| C(month)[T.feb] | -0.0059 | 0.016 | -0.359 | 0.720 | -0.038 | 0.026 |
| C(month)[T.jan] | -0.0727 | 0.020 | -3.674 | 0.000 | -0.112 | -0.034 |
| C(month)[T.jul] | 0.1828 | 0.027 | 6.877 | 0.000 | 0.131 | 0.235 |
| C(month)[T.jun] | 0.1547 | 0.024 | 6.543 | 0.000 | 0.108 | 0.201 |
| C(month)[T.mar] | 0.2091 | 0.030 | 6.869 | 0.000 | 0.149 | 0.269 |
| C(month)[T.may] | -0.0257 | 0.013 | -1.969 | 0.049 | -0.051 | -0.000 |
| C(month)[T.nov] | -0.0323 | 0.016 | -2.044 | 0.041 | -0.063 | -0.001 |
| C(month)[T.oct] | 0.1450 | 0.023 | 6.182 | 0.000 | 0.099 | 0.191 |
| C(month)[T.sep] | 0.2023 | 0.024 | 8.306 | 0.000 | 0.155 | 0.250 |
| C(poutcome)[T.other] | 0.0334 | 0.010 | 3.318 | 0.001 | 0.014 | 0.053 |
| C(poutcome)[T.success] | 0.4063 | 0.012 | 34.528 | 0.000 | 0.383 | 0.429 |
| T | 0.0414 | 0.016 | 2.639 | 0.008 | 0.011 | 0.072 |
| age | 0.0008 | 0.000 | 1.569 | 0.117 | -0.000 | 0.002 |
| balance | 3.072e-06 | 1.32e-06 | 2.320 | 0.020 | 4.76e-07 | 5.67e-06 |
| campaign | -0.0144 | 0.003 | -5.478 | 0.000 | -0.020 | -0.009 |
| pdays | 0.0001 | 4.29e-05 | 3.052 | 0.002 | 4.69e-05 | 0.000 |
| previous | 0.0018 | 0.001 | 2.059 | 0.040 | 8.62e-05 | 0.004 |
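A simple coefficient-stability check (added sketch, tying back to the diagnostics above): compare the estimated lift of cellular contact across the naive and adjusted specifications.

# Estimated effect of cellular contact under the two specifications.
stability = pd.DataFrame(
    {
        "naive": [m_naive.params["T"], *m_naive.conf_int().loc["T"]],
        "adjusted": [m_adj.params["T"], *m_adj.conf_int().loc["T"]],
    },
    index=["coef", "ci_low", "ci_high"],
).T
print(stability.round(3))

Here the adjusted estimate (~0.041) is larger than the naive one (~0.015), which suggests the naive comparison was biased downward by seasonality and customer mix rather than upward.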
Frisch–Waugh–Lovell (FWL) theorem
The FWL theorem says that in the regression Y ~ T + X, the coefficient on T can be recovered in three steps:
1. Regress T on X and keep the residuals (the part of treatment the controls cannot predict).
2. Regress Y on X and keep the residuals.
3. Regress the Y-residuals on the T-residuals; the slope equals the T coefficient from the full regression.
This makes explicit what "controlling for X" does: the effect is identified from treatment variation that is unexplained by the controls. Here X = age, balance, and month dummies.
f_T = "T ~ age + balance + C(month)"
f_Y = "Y ~ age + balance + C(month)"
mT = smf.ols(f_T, data=df).fit()
mY = smf.ols(f_Y, data=df).fit()
df["T_res"] = mT.resid
df["Y_res"] = mY.resid
m_fwl = smf.ols("Y_res ~ T_res", data=df).fit()
m_fwl.summary().tables[1]
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 7.752e-15 | 0.002 | 4.09e-12 | 1.000 | -0.004 | 0.004 |
| T_res | 0.0348 | 0.007 | 5.130 | 0.000 | 0.022 | 0.048 |
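As a consistency check (added sketch), the slope on T_res should match the T coefficient from the full regression that uses the same controls directly:

# By FWL, these two point estimates are identical (≈ 0.0348 above).
m_full = smf.ols("Y ~ T + age + balance + C(month)", data=df).fit()
print(round(m_full.params["T"], 4), round(m_fwl.params["T_res"], 4))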
Heterogeneous effects (interaction with month)
# Interact treatment with month so the effect of cellular contact can vary by month.
m_inter = smf.ols("Y ~ T*C(month) + age + balance", data=df).fit()
m_inter.summary().tables[1]
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 0.1503 | 0.026 | 5.840 | 0.000 | 0.100 | 0.201 |
| C(month)[T.aug] | -0.0609 | 0.032 | -1.880 | 0.060 | -0.124 | 0.003 |
| C(month)[T.dec] | 0.1768 | 0.061 | 2.907 | 0.004 | 0.058 | 0.296 |
| C(month)[T.feb] | -0.0500 | 0.032 | -1.579 | 0.114 | -0.112 | 0.012 |
| C(month)[T.jan] | -0.0935 | 0.038 | -2.433 | 0.015 | -0.169 | -0.018 |
| C(month)[T.jul] | -0.1160 | 0.027 | -4.333 | 0.000 | -0.168 | -0.064 |
| C(month)[T.jun] | 0.0530 | 0.045 | 1.179 | 0.239 | -0.035 | 0.141 |
| C(month)[T.mar] | 0.2151 | 0.053 | 4.096 | 0.000 | 0.112 | 0.318 |
| C(month)[T.may] | -0.1171 | 0.029 | -4.057 | 0.000 | -0.174 | -0.061 |
| C(month)[T.nov] | -0.0911 | 0.030 | -3.063 | 0.002 | -0.149 | -0.033 |
| C(month)[T.oct] | 0.2386 | 0.038 | 6.242 | 0.000 | 0.164 | 0.314 |
| C(month)[T.sep] | 0.1794 | 0.048 | 3.713 | 0.000 | 0.085 | 0.274 |
| T | 0.0127 | 0.025 | 0.508 | 0.612 | -0.036 | 0.062 |
| T:C(month)[T.aug] | -0.0275 | 0.033 | -0.826 | 0.409 | -0.093 | 0.038 |
| T:C(month)[T.dec] | 0.1171 | 0.066 | 1.764 | 0.078 | -0.013 | 0.247 |
| T:C(month)[T.feb] | 0.0232 | 0.033 | 0.703 | 0.482 | -0.042 | 0.088 |
| T:C(month)[T.jan] | 0.0015 | 0.040 | 0.036 | 0.971 | -0.077 | 0.080 |
| T:C(month)[T.jul] | 0.0183 | 0.028 | 0.656 | 0.512 | -0.036 | 0.073 |
| T:C(month)[T.jun] | 0.1884 | 0.047 | 3.995 | 0.000 | 0.096 | 0.281 |
| T:C(month)[T.mar] | 0.1182 | 0.055 | 2.130 | 0.033 | 0.009 | 0.227 |
| T:C(month)[T.may] | 0.0429 | 0.030 | 1.431 | 0.152 | -0.016 | 0.102 |
| T:C(month)[T.nov] | -0.0110 | 0.031 | -0.355 | 0.723 | -0.072 | 0.050 |
| T:C(month)[T.oct] | 0.0058 | 0.041 | 0.141 | 0.888 | -0.075 | 0.087 |
| T:C(month)[T.sep] | 0.1375 | 0.051 | 2.684 | 0.007 | 0.037 | 0.238 |
| age | 0.0007 | 0.000 | 3.828 | 0.000 | 0.000 | 0.001 |
| balance | 4.007e-06 | 6.04e-07 | 6.631 | 0.000 | 2.82e-06 | 5.19e-06 |
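To read the interactions, the month-specific lift of cellular contact is the base T coefficient plus the relevant interaction term (the omitted reference month is apr in this run). A small sketch (added, not in the original notebook):

# Month-specific effect of cellular contact: base coefficient + interaction term.
base = m_inter.params["T"]
month_effects = {
    m: base + m_inter.params.get(f"T:C(month)[T.{m}]", 0.0)
    for m in sorted(df["month"].unique())
}
print(pd.Series(month_effects).round(3).sort_values())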
Key takeaways
- Regression = adjusted comparison
- Month fixed effects remove seasonality bias
- FWL explains why controls work
- Interactions show when marketing works better
🧠 When Linear Regression Works Well for Causal Inference
✅ Large sample size
✅ Good overlap
✅ Strong confounder coverage
✅ Approximately linear effect
🚫 When It Struggles
❌ Strong nonlinear HTE
❌ Hidden confounders
❌ Extreme treatment imbalance
❌ Post-treatment variable leakage
🔄 Bridge to Meta-Learners and Forests
If linear model struggles:
→ S-Learner / T-Learner
→ X-Learner
→ Causal Forests
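For orientation, a minimal T-learner sketch using only the tools already imported here (illustrative addition; it adjusts only for age and balance and is not a substitute for the meta-learner chapters):

# T-learner: fit one outcome model per treatment arm, then contrast predictions.
f_out = "Y ~ age + balance"
m1 = smf.ols(f_out, data=df[df["T"] == 1]).fit()
m0 = smf.ols(f_out, data=df[df["T"] == 0]).fit()
cate = m1.predict(df) - m0.predict(df)  # per-customer effect estimates
print(cate.describe().round(3))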