Bank Marketing — Causal Linear Regression

Estimating Treatment Effects with Linear Regression

Abstract

Estimating causal effects from observational data is challenging due to non-random treatment assignment.
In this post, we use a real-world bank marketing dataset to illustrate how linear regression can be used for causal inference under key assumptions such as conditional ignorability and overlap.

We contrast predictive vs causal modeling, highlight common pitfalls (e.g., selection bias and post-treatment controls), and provide practical diagnostics to assess whether regression estimates can be interpreted causally.
The goal is not just estimation, but building intuition for when such estimates are trustworthy.


Bank Marketing Case Study

This analysis applies causal inference concepts to the Bank Marketing dataset to estimate treatment effects in a practical setting.

Causal question

Does contacting customers by cellular instead of telephone increase the probability of subscription?

  • Treatment (T): contact = cellular (1) vs telephone (0)
  • Outcome (Y): subscription = yes (1) vs no (0)

We use the Bank Marketing dataset (Portuguese bank campaigns).

  • Official source (UCI):
    https://archive.ics.uci.edu/dataset/222/bank+marketing

Each row = customer

  • Features = demographics, history, campaign behavior
  • Treatment = contact channel
  • Outcome = subscription

📚 Reference

Facure, Matheus. Causal Inference in Python: Applying Causal Inference in the Tech Industry. O'Reilly Media, 2023.

Interpretation Layer

What problem are we solving?

We are estimating the causal effect of a treatment variable on an outcome using regression under causal assumptions.


Predictive vs Causal Regression

Predictive regression asks:

If X changes, how does Y move in historical data?

Causal regression asks:

If we intervene and change Treatment, how does Y change?

This distinction is critical.


Required Assumptions

To interpret regression causally:

  • No unobserved confounding (Conditional Ignorability)
  • Correct functional form (or reasonable approximation)
  • No post-treatment leakage
  • Sufficient overlap between treated and control populations

Business Interpretation

In marketing / fintech:

  • Treatment coefficient ≈ incremental lift
  • Positive → treatment helps
  • Negative → treatment hurts
  • Near zero → no incremental value

Key Causal Assumptions Being Used

1. Conditional Ignorability

After controlling for X:

Treatment ⟂ Potential Outcomes | X

If violated → biased estimates


2. Overlap (Positivity)

Every user must have a non-zero probability of both treatment and control.

If violated:

  • Extrapolation risk
  • Unstable estimates
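
One practical way to probe positivity is to estimate propensity scores and check that neither group's scores pile up near 0 or 1. Below is a minimal sketch on synthetic data (a single confounder, age, is assumed purely for illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 5000
age = rng.normal(40, 10, n)

# confounded assignment: older customers are more likely to be treated
p_true = 1 / (1 + np.exp(-(age - 40) / 10))
d = pd.DataFrame({"age": age, "T": rng.binomial(1, p_true)})

# estimate propensity scores with a logistic regression
ps = smf.logit("T ~ age", data=d).fit(disp=0).predict(d)

# positivity check: estimated scores should stay away from 0 and 1 in both groups
for t in (0, 1):
    grp = ps[d["T"] == t]
    print(f"T={t}: min={grp.min():.3f}, max={grp.max():.3f}")
```

If one group's scores cluster at the extremes, the regression is extrapolating into regions with no comparable units.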

3. No Post-Treatment Controls

Do NOT include variables affected by treatment. In this dataset, duration (call length) is the classic example: it is only realized after the contact happens, so it is excluded from every model here.

Conditioning on post-treatment variables creates:

  • Mediator bias: the regression absorbs part of the effect you want to measure
  • Collider bias: conditioning opens spurious paths between treatment and outcome

Either way, the total treatment effect is typically underestimated or otherwise distorted.
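
The underestimation mechanism is easy to demonstrate by simulation: when a control variable is a mediator on the path from treatment to outcome, adding it to the regression absorbs part of the effect. A synthetic sketch (coefficients chosen for illustration; the true total effect is 1.0 + 0.5 × 0.8 = 1.4):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 20000
T = rng.binomial(1, 0.5, n)                   # randomized treatment
M = 0.8 * T + rng.normal(size=n)              # mediator: caused by T
Y = 1.0 * T + 0.5 * M + rng.normal(size=n)    # total effect of T on Y = 1.4
d = pd.DataFrame({"T": T, "M": M, "Y": Y})

good = smf.ols("Y ~ T", data=d).fit().params["T"]     # recovers the total effect
bad = smf.ols("Y ~ T + M", data=d).fit().params["T"]  # controls away the mediated part
print(f"without mediator: {good:.2f}, with mediator: {bad:.2f}")
```

Adding the mediator shrinks the estimate from roughly 1.4 toward 1.0, even though treatment here is perfectly randomized.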

Diagnostics — Causal Meaning

Residual Diagnostics

  • Random residuals → model specification reasonable
  • Patterned residuals → possible nonlinearity or missing confounders

Coefficient Stability

  • Large swings across specs → weak identification or collinearity

Overlap Checks

If treated/control covariates differ heavily:

Model extrapolates → causal estimate becomes fragile


Linear regression can estimate causal effects—but only when assumptions hold.
The real skill is not fitting the model, but validating those assumptions.


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf
import os

np.random.seed(42)
FIG_DIR = "figures_ch4_bank_marketing"
os.makedirs(FIG_DIR, exist_ok=True)

!pip install ucimlrepo

Load public Bank Marketing data (UCI)


from ucimlrepo import fetch_ucirepo

bank = fetch_ucirepo(id=222)
df = pd.concat([bank.data.features, bank.data.targets], axis=1)

df["Y"] = (df["y"].astype(str).str.lower() == "yes").astype(int)
df["contact"] = df["contact"].astype(str).str.lower()
df = df[df["contact"].isin(["cellular","telephone"])].copy()
df["T"] = (df["contact"]=="cellular").astype(int)

df.head()

age job marital education default balance housing loan contact day_of_week month duration campaign pdays previous poutcome y Y T
12657 27 management single secondary no 35 no no cellular 4 jul 255 1 -1 0 NaN no 0 1
12658 54 blue-collar married primary no 466 no no cellular 4 jul 297 1 -1 0 NaN no 0 1
12659 43 blue-collar married secondary no 105 no yes cellular 4 jul 668 2 -1 0 NaN no 0 1
12660 31 technician single secondary no 19 no no telephone 4 jul 65 2 -1 0 NaN no 0 0
12661 27 technician single secondary no 126 yes yes cellular 4 jul 436 4 -1 0 NaN no 0 1

Naive regression (difference in means)

📊 How to Interpret the Treatment Coefficient

Treatment Coefficient ≈ Average Treatment Effect (ATE) if assumptions hold

Example: Coefficient = 0.12

Interpretation: If treatment is applied, outcome increases by ~0.12 units on average, holding confounders constant.

Business Translation: Expected incremental lift per treated user ≈ coefficient value


m_naive = smf.ols("Y ~ T", data=df).fit()
m_naive.summary().tables[1]

coef std err t P>|t| [0.025 0.975]
Intercept 0.1342 0.007 20.384 0.000 0.121 0.147
T 0.0150 0.007 2.171 0.030 0.001 0.029
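
With no controls, the OLS slope on T is numerically identical to the raw difference in subscription rates between the two contact channels. A quick synthetic check of that equivalence (the 0.13 and 0.15 rates below are illustrative, not the dataset's):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
d = pd.DataFrame({"T": rng.binomial(1, 0.5, 2000)})
d["Y"] = rng.binomial(1, np.where(d["T"] == 1, 0.15, 0.13))

# slope of Y on a lone binary regressor...
coef = smf.ols("Y ~ T", data=d).fit().params["T"]
# ...equals the difference in group means
diff = d.loc[d["T"] == 1, "Y"].mean() - d.loc[d["T"] == 0, "Y"].mean()
print(np.isclose(coef, diff))  # True: identical up to floating point
```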

Adjusted regression with month fixed effects



num_controls = ["age","balance","campaign","pdays","previous","day"]
num_controls = [c for c in num_controls if c in df.columns]

cat_controls = ["job","marital","education","housing","loan","month","poutcome"]
cat_controls = [c for c in cat_controls if c in df.columns]

formula = "Y ~ T"
for c in num_controls:
    formula += f" + {c}"
for c in cat_controls:
    formula += f" + C({c})"

m_adj = smf.ols(formula, data=df).fit()
m_adj.summary().tables[1]

coef std err t P>|t| [0.025 0.975]
Intercept 0.0799 0.040 2.003 0.045 0.002 0.158
C(job)[T.blue-collar] -0.0197 0.015 -1.313 0.189 -0.049 0.010
C(job)[T.entrepreneur] -0.0317 0.027 -1.169 0.243 -0.085 0.022
C(job)[T.housemaid] -0.0374 0.032 -1.178 0.239 -0.100 0.025
C(job)[T.management] 0.0140 0.016 0.865 0.387 -0.018 0.046
C(job)[T.retired] 0.0218 0.024 0.928 0.354 -0.024 0.068
C(job)[T.self-employed] 0.0060 0.025 0.239 0.811 -0.043 0.055
C(job)[T.services] 0.0029 0.017 0.167 0.867 -0.031 0.037
C(job)[T.student] 0.0855 0.027 3.213 0.001 0.033 0.138
C(job)[T.technician] -0.0068 0.015 -0.458 0.647 -0.036 0.022
C(job)[T.unemployed] 0.0746 0.027 2.773 0.006 0.022 0.127
C(marital)[T.married] 0.0178 0.013 1.355 0.175 -0.008 0.043
C(marital)[T.single] 0.0261 0.015 1.744 0.081 -0.003 0.055
C(education)[T.secondary] 0.0108 0.014 0.798 0.425 -0.016 0.037
C(education)[T.tertiary] 0.0287 0.017 1.727 0.084 -0.004 0.061
C(housing)[T.yes] -0.1011 0.010 -10.009 0.000 -0.121 -0.081
C(loan)[T.yes] -0.0382 0.012 -3.237 0.001 -0.061 -0.015
C(month)[T.aug] 0.1050 0.020 5.240 0.000 0.066 0.144
C(month)[T.dec] 0.1243 0.036 3.495 0.000 0.055 0.194
C(month)[T.feb] -0.0059 0.016 -0.359 0.720 -0.038 0.026
C(month)[T.jan] -0.0727 0.020 -3.674 0.000 -0.112 -0.034
C(month)[T.jul] 0.1828 0.027 6.877 0.000 0.131 0.235
C(month)[T.jun] 0.1547 0.024 6.543 0.000 0.108 0.201
C(month)[T.mar] 0.2091 0.030 6.869 0.000 0.149 0.269
C(month)[T.may] -0.0257 0.013 -1.969 0.049 -0.051 -0.000
C(month)[T.nov] -0.0323 0.016 -2.044 0.041 -0.063 -0.001
C(month)[T.oct] 0.1450 0.023 6.182 0.000 0.099 0.191
C(month)[T.sep] 0.2023 0.024 8.306 0.000 0.155 0.250
C(poutcome)[T.other] 0.0334 0.010 3.318 0.001 0.014 0.053
C(poutcome)[T.success] 0.4063 0.012 34.528 0.000 0.383 0.429
T 0.0414 0.016 2.639 0.008 0.011 0.072
age 0.0008 0.000 1.569 0.117 -0.000 0.002
balance 3.072e-06 1.32e-06 2.320 0.020 4.76e-07 5.67e-06
campaign -0.0144 0.003 -5.478 0.000 -0.020 -0.009
pdays 0.0001 4.29e-05 3.052 0.002 4.69e-05 0.000
previous 0.0018 0.001 2.059 0.040 8.62e-05 0.004
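
Because Y is binary, each of these OLS fits is a linear probability model, so the errors are heteroskedastic by construction and robust standard errors are a sensible default. A minimal sketch on synthetic data (the 0.13 and 0.17 rates are illustrative choices, not the dataset's):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(9)
d = pd.DataFrame({"T": rng.binomial(1, 0.5, 5000)})
d["Y"] = rng.binomial(1, np.where(d["T"] == 1, 0.17, 0.13))

classic = smf.ols("Y ~ T", data=d).fit()
robust = smf.ols("Y ~ T", data=d).fit(cov_type="HC1")  # heteroskedasticity-robust

# same point estimate; only the standard errors change
print(f"coef: {classic.params['T']:.4f}")
print(f"classic SE: {classic.bse['T']:.4f}, robust SE: {robust.bse['T']:.4f}")
```

In this dataset the difference is small, but it costs nothing to report robust standard errors for a linear probability model.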

Frisch–Waugh–Lovell (FWL) theorem



f_T = "T ~ age + balance + C(month)"
f_Y = "Y ~ age + balance + C(month)"

mT = smf.ols(f_T, data=df).fit()
mY = smf.ols(f_Y, data=df).fit()

df["T_res"] = mT.resid
df["Y_res"] = mY.resid

m_fwl = smf.ols("Y_res ~ T_res", data=df).fit()
m_fwl.summary().tables[1]

coef std err t P>|t| [0.025 0.975]
Intercept 7.752e-15 0.002 4.09e-12 1.000 -0.004 0.004
T_res 0.0348 0.007 5.130 0.000 0.022 0.048
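
Note that the FWL cell above partials out only age, balance, and month, a smaller control set than the fully adjusted model, which is why its estimate (0.0348) differs slightly from 0.0414. With identical control sets the theorem guarantees an exact match, as this synthetic check illustrates:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 1000
x = rng.normal(size=n)
T = 0.5 * x + rng.normal(size=n)            # treatment correlated with confounder x
Y = 2.0 * T + 1.0 * x + rng.normal(size=n)  # true effect of T is 2
d = pd.DataFrame({"x": x, "T": T, "Y": Y})

# one-shot regression with the control included
full = smf.ols("Y ~ T + x", data=d).fit().params["T"]

# FWL: residualize both T and Y on the same control, then regress residuals
res = pd.DataFrame({
    "T_res": smf.ols("T ~ x", data=d).fit().resid,
    "Y_res": smf.ols("Y ~ x", data=d).fit().resid,
})
fwl = smf.ols("Y_res ~ T_res", data=res).fit().params["T_res"]

print(f"full: {full:.6f}, FWL: {fwl:.6f}")  # identical by the FWL theorem
```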

Heterogeneous effects (interaction with month)



m_inter = smf.ols("Y ~ T*C(month) + age + balance", data=df).fit()
m_inter.summary().tables[1]

coef std err t P>|t| [0.025 0.975]
Intercept 0.1503 0.026 5.840 0.000 0.100 0.201
C(month)[T.aug] -0.0609 0.032 -1.880 0.060 -0.124 0.003
C(month)[T.dec] 0.1768 0.061 2.907 0.004 0.058 0.296
C(month)[T.feb] -0.0500 0.032 -1.579 0.114 -0.112 0.012
C(month)[T.jan] -0.0935 0.038 -2.433 0.015 -0.169 -0.018
C(month)[T.jul] -0.1160 0.027 -4.333 0.000 -0.168 -0.064
C(month)[T.jun] 0.0530 0.045 1.179 0.239 -0.035 0.141
C(month)[T.mar] 0.2151 0.053 4.096 0.000 0.112 0.318
C(month)[T.may] -0.1171 0.029 -4.057 0.000 -0.174 -0.061
C(month)[T.nov] -0.0911 0.030 -3.063 0.002 -0.149 -0.033
C(month)[T.oct] 0.2386 0.038 6.242 0.000 0.164 0.314
C(month)[T.sep] 0.1794 0.048 3.713 0.000 0.085 0.274
T 0.0127 0.025 0.508 0.612 -0.036 0.062
T:C(month)[T.aug] -0.0275 0.033 -0.826 0.409 -0.093 0.038
T:C(month)[T.dec] 0.1171 0.066 1.764 0.078 -0.013 0.247
T:C(month)[T.feb] 0.0232 0.033 0.703 0.482 -0.042 0.088
T:C(month)[T.jan] 0.0015 0.040 0.036 0.971 -0.077 0.080
T:C(month)[T.jul] 0.0183 0.028 0.656 0.512 -0.036 0.073
T:C(month)[T.jun] 0.1884 0.047 3.995 0.000 0.096 0.281
T:C(month)[T.mar] 0.1182 0.055 2.130 0.033 0.009 0.227
T:C(month)[T.may] 0.0429 0.030 1.431 0.152 -0.016 0.102
T:C(month)[T.nov] -0.0110 0.031 -0.355 0.723 -0.072 0.050
T:C(month)[T.oct] 0.0058 0.041 0.141 0.888 -0.075 0.087
T:C(month)[T.sep] 0.1375 0.051 2.684 0.007 0.037 0.238
age 0.0007 0.000 3.828 0.000 0.000 0.001
balance 4.007e-06 6.04e-07 6.631 0.000 2.82e-06 5.19e-06
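
When reading this table, remember that the month-specific effect is the baseline T coefficient plus that month's interaction term, not the interaction term alone. A synthetic sketch of extracting per-month effects (months and effect sizes below are invented for illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 6000
month = rng.choice(["jan", "mar", "sep"], n)
T = rng.binomial(1, 0.5, n)
effect = np.where(month == "mar", 0.3, 0.1)   # effect is larger in March
Y = 0.1 + effect * T + rng.normal(scale=0.3, size=n)
d = pd.DataFrame({"month": month, "T": T, "Y": Y})

m = smf.ols("Y ~ T * C(month)", data=d).fit()

# per-month effect = baseline T coefficient + that month's interaction term
base = m.params["T"]                          # effect in the reference month (jan)
for mo in ("mar", "sep"):
    print(mo, round(base + m.params[f"T:C(month)[T.{mo}]"], 2))
```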

🧠 What the Results Actually Show

Across specifications, the estimated treatment effect remains positive but modest:

  • Naive regression: ~0.015
  • Fully adjusted model: ~0.041
  • FWL residual regression: ~0.035 (partials out a smaller control set: age, balance, month)
  • Interaction model: heterogeneous across months but directionally consistent

👉 This stability is important.

It suggests that:

  • The initial bias in naive estimates is not extreme
  • Adding controls and fixed effects refines rather than overturns the result
  • The treatment effect is real but relatively small

Additionally, the increase from naive to adjusted estimates suggests the presence of negative selection bias in treatment assignment.


🔑 Key Takeaways

  • Regression = adjusted comparison, not causality by default
  • Controls matter → effect size increases after adjustment (bias correction)
  • Month fixed effects matter → seasonality was confounding the estimate
  • FWL illustrates the mechanics → with the same controls, partialling out reproduces the regression coefficient exactly
  • Heterogeneity exists → treatment effectiveness varies across months

📊 Business Interpretation

  • Estimated lift ≈ 3–4 percentage points
  • Effect is statistically significant but economically modest
  • Not a “silver bullet” channel — but directionally beneficial

👉 Translation:
Cellular contact improves conversion, but its impact depends on timing and context.


⚠️ When Should You Trust This?

Linear regression works well here because:

  • ✅ Large sample size
  • ✅ Reasonable overlap between treated and control groups
  • ✅ Rich covariates capturing key confounders
  • ✅ Effects approximately linear

❌ When This Would Break

Be cautious if:

  • Hidden confounders exist (e.g., targeting based on unobserved intent)
  • Strong nonlinear effects dominate
  • Treatment assignment is highly selective
  • Post-treatment variables are included as controls

🔗 Where This Fits in the Bigger Picture

Linear regression is the starting point for causal inference:

  • If assumptions hold → simple, interpretable, and fast
  • If assumptions weaken → move to:
    • Inverse Propensity Weighting (IPW)
    • Doubly Robust methods
    • Meta-learners (S / T / X learners)
    • Causal forests
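
As one example of the next step, inverse propensity weighting reweights units by the inverse of their estimated treatment probability instead of relying on a linear outcome model. A minimal IPW sketch on synthetic data (single confounder x, true ATE set to 2 by construction):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 10000
x = rng.normal(size=n)
p = 1 / (1 + np.exp(-x))                      # confounded assignment
T = rng.binomial(1, p)
Y = 1.0 + 2.0 * T + x + rng.normal(size=n)    # true ATE = 2
d = pd.DataFrame({"x": x, "T": T, "Y": Y})

# naive difference in means is biased upward by the confounder x
naive = d.loc[d["T"] == 1, "Y"].mean() - d.loc[d["T"] == 0, "Y"].mean()

# IPW: weight treated units by 1/ps and control units by 1/(1-ps)
ps = smf.logit("T ~ x", data=d).fit(disp=0).predict(d)
ate = np.mean(d["Y"] * d["T"] / ps) - np.mean(d["Y"] * (1 - d["T"]) / (1 - ps))
print(f"naive: {naive:.2f}, IPW: {ate:.2f}")  # IPW lands near the true ATE of 2
```

IPW trades the linear outcome model for a treatment-assignment model; doubly robust estimators combine both so that either one being correct suffices.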

🎯 Final Takeaway

Linear regression can recover meaningful causal effects in observational marketing data —
but its reliability comes not from the model itself, but from:

the validity of assumptions and the consistency of estimates across specifications.


Written on December 22, 2025