Experiments with EconML and Causal Learning: Customer Segmentation

Health economics

Author

Drew Day

Published

January 31, 2025

Introduction

This document works through practical examples of causal machine learning applied to business strategy. It draws heavily on the “Customer Scenarios” included in the EconML Python package documentation, and then expands on them.

Case Study 1: Customer Segmentation

When developing a strategy to

Getting started: Importing and cleaning data

# Key imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.ensemble import GradientBoostingRegressor
from econml.dml import LinearDML, CausalForestDML
from econml.cate_interpreter import SingleTreeCateInterpreter, SingleTreePolicyInterpreter

# Load data
file_url = "https://msalicedatapublic.z5.web.core.windows.net/datasets/Pricing/pricing_sample.csv"
train_data = pd.read_csv(file_url)

train_data.head()

   account_age  age  avg_hours  ...    income  price     demand
0            3   53   1.834234  ...  0.960863    1.0   3.917117
1            5   54   7.171411  ...  0.732487    1.0  11.585706
2            3   33   5.351920  ...  1.130937    1.0  24.675960
3            2   34   6.723551  ...  0.929197    1.0   6.361776
4            4   30   2.448247  ...  0.533527    0.8  12.624123

[5 rows x 11 columns]
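
The preview above truncates the middle columns, so it cannot show whether any of the 11 columns need cleaning. A minimal sketch of a per-column check follows. The helper `summarize_columns` and the `demo` frame are hypothetical illustrations, not part of EconML or of the original analysis, and the `demo` values are made up (including one deliberately missing income).

```python
import numpy as np
import pandas as pd


def summarize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing-value count, distinct-value count."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "n_missing": df.isna().sum(),
            "n_unique": df.nunique(),  # NaN is not counted as a distinct value
        }
    )


# Hypothetical stand-in rows that reuse a few of the column names shown above.
demo = pd.DataFrame(
    {
        "account_age": [3, 5, 3, 2],
        "income": [0.96, 0.73, np.nan, 0.93],  # one missing value on purpose
        "price": [1.0, 1.0, 1.0, 0.8],
        "demand": [3.92, 11.59, 24.68, 6.36],
    }
)

summary = summarize_columns(demo)
print(summary)
```

On the real data the same call would be `summarize_columns(train_data)`. To print every column of `train_data.head()` instead of the truncated view, `pd.set_option("display.max_columns", None)` can be set beforehand.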

# Define estimator inputs (EconML naming: Y, T, X, W)
Y = train_data["demand"]  # outcome
T = train_data["price"]  # treatment
X = train_data[["income"]]  # features along which the treatment effect may vary
W = train_data.drop(columns=["demand", "price", "income"])  # controls

# Test data: an evenly spaced grid of 100 income values from 0 to 5
X_test = np.linspace(0, 5, 100).reshape(-1, 1)
X_test_data = pd.DataFrame(X_test, columns=["income"])
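
To make the roles of Y, T, X, and W concrete, here is a simplified NumPy sketch of the partialling-out idea that double machine learning builds on. It is not EconML's implementation: it uses plain least squares for both nuisance models, skips cross-fitting, and runs on synthetic data whose coefficients (a true effect of -2.0 + 0.3 * X) are assumptions chosen purely for illustration. The helper `residualize` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Synthetic stand-ins (assumed values, not the pricing data)
W = rng.normal(size=(n, 2))              # controls
X = rng.uniform(0, 5, size=(n, 1))       # effect modifier, playing the role of income
T = 0.5 * W[:, 0] + rng.normal(size=n)   # treatment is confounded by a control
true_effect = -2.0 + 0.3 * X[:, 0]       # heterogeneous effect of T on Y
Y = true_effect * T + W[:, 1] + 0.5 * W[:, 0] + rng.normal(size=n)


def residualize(target, design):
    """Subtract the least-squares fit of target on design (with intercept)."""
    A = np.column_stack([np.ones(len(design)), design])
    beta, *_ = np.linalg.lstsq(A, target, rcond=None)
    return target - A @ beta


# First stage: remove the part of Y and T that X and W explain.
XW = np.column_stack([X, W])
Y_res = residualize(Y, XW)
T_res = residualize(T, XW)

# Final stage: fit Y_res ~ (a + b * X) * T_res, so the effect is linear in X.
Z = np.column_stack([T_res, T_res * X[:, 0]])
coef, *_ = np.linalg.lstsq(Z, Y_res, rcond=None)

# Evaluate the estimated effect on the same kind of grid as X_test above.
X_test = np.linspace(0, 5, 100).reshape(-1, 1)
effect_grid = coef[0] + coef[1] * X_test[:, 0]
print("estimated (a, b):", coef)
```

The two fitted coefficients should land near the assumed -2.0 and 0.3. The design point is that W only enters the first stage, where it absorbs confounding, while X also enters the final stage, where it determines how the effect varies.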