An end-to-end churn prediction workflow built for a subscription business — identifying customers most at risk of leaving, understanding the key drivers, and translating model outputs into targeted retention actions by segment.
Python, pandas, scikit-learn, XGBoost, scipy, matplotlib
Client work is confidential. This case study uses the publicly available IBM Telco Customer Churn dataset to demonstrate the analytical approach.
The Challenge
The business problem
Subscription businesses lose revenue when customers leave — and the cost of acquiring a new customer far exceeds the cost of retaining an existing one. The challenge is identifying churn risk early enough to act, and understanding which customers to prioritise.
Which customers are most likely to churn?
What attributes are most strongly linked to leaving?
Which segments should be prioritised for retention?
How can model outputs be turned into targeted business action?
Approach
How the work was structured
01
Data Cleaning
Converted TotalCharges to numeric, removed the 11 incomplete records, and validated the resulting 7,032-row dataset.
02
Exploratory Analysis
Examined churn rates across contract type, tenure, monthly charges, internet service, payment method, and demographics.
03
Feature Engineering
Created tenure bands, service count, contract risk flag, support flag, and high-charge indicator to aid model interpretability.
04
Predictive Modelling
Trained and compared Logistic Regression, Random Forest, and XGBoost, using class balancing to handle the 74/26 imbalance between retained and churned customers.
05
Retention Strategy
Mapped model outputs and EDA findings to segment-level retention actions with an illustrative revenue impact estimate.
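The cleaning and feature engineering steps above can be sketched in pandas roughly as follows. This is a minimal illustration, not the project's actual code: the column names follow the public IBM Telco dataset, but the toy rows, the tenure band edges, the three service columns counted, and the exact definitions of each flag are assumptions made for the example.

```python
import pandas as pd

# Toy rows using IBM Telco column names; the values are made up for illustration.
raw = pd.DataFrame({
    "tenure": [0, 5, 30, 70],
    "Contract": ["Two year", "Month-to-month", "One year", "Month-to-month"],
    "MonthlyCharges": [52.55, 95.10, 60.00, 105.30],
    "TotalCharges": [" ", "475.5", "1800.0", "7371.0"],  # blank string = missing
    "OnlineSecurity": ["No", "No", "Yes", "No"],
    "TechSupport": ["No", "No", "Yes", "Yes"],
    "StreamingTV": ["No", "Yes", "No", "Yes"],
})


def clean(df):
    """Convert TotalCharges to numeric and drop rows where it is missing."""
    out = df.copy()
    # Blank strings cannot be parsed, so errors="coerce" turns them into NaN.
    out["TotalCharges"] = pd.to_numeric(out["TotalCharges"], errors="coerce")
    return out.dropna(subset=["TotalCharges"]).reset_index(drop=True)


def add_features(df):
    """Add interpretable features; all thresholds here are assumed, not sourced."""
    out = df.copy()
    # Tenure bands in months (assumed edges).
    out["tenure_band"] = pd.cut(
        out["tenure"],
        bins=[0, 12, 24, 48, 72],
        labels=["0-12", "13-24", "25-48", "49-72"],
        include_lowest=True,
    )
    # Number of add-on services the customer has (assumed subset of columns).
    service_cols = ["OnlineSecurity", "TechSupport", "StreamingTV"]
    out["service_count"] = (out[service_cols] == "Yes").sum(axis=1)
    # Contract risk flag: month-to-month contracts.
    out["contract_risk"] = (out["Contract"] == "Month-to-month").astype(int)
    # Support flag: customer has neither online security nor tech support.
    out["no_support"] = (
        (out["OnlineSecurity"] != "Yes") & (out["TechSupport"] != "Yes")
    ).astype(int)
    # High-charge indicator: above the median monthly charge.
    median_charge = out["MonthlyCharges"].median()
    out["high_charge"] = (out["MonthlyCharges"] > median_charge).astype(int)
    return out


features = add_features(clean(raw))
print(features)
```

Keeping each engineered feature as a simple band or 0/1 flag is what makes the later model coefficients and importances easy to explain to a non-technical audience.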
Exploratory Analysis
What the data revealed
Predictive Modelling
Model performance
The three models were compared on an 80/20 stratified train/test split, with class balancing to handle the 74/26 imbalance between retained and churned customers. For churn prediction, recall and ROC-AUC are the primary metrics: identifying at-risk customers matters more than raw accuracy.
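To see why accuracy alone can mislead at this class ratio, consider a worked example on a hypothetical test set of 1,000 customers with the same 74/26 split. A "model" that predicts that nobody churns scores well on accuracy while finding no churners at all. The counts below are illustrative, not taken from the project.

```python
def accuracy_and_recall(tp, fp, tn, fn):
    """Return (accuracy, recall) from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Recall: share of actual churners the model catches.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


# Hypothetical: 740 retained, 260 churned, and every customer predicted "stays".
baseline_acc, baseline_rec = accuracy_and_recall(tp=0, fp=0, tn=740, fn=260)
print(f"accuracy={baseline_acc:.2f}, recall={baseline_rec:.2f}")
```

This trivial baseline reaches 74% accuracy with zero recall, which is why a model with lower accuracy but much higher recall can still be the more useful one for retention work.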
| Model | Accuracy | Precision | Recall | F1 | ROC-AUC |
| --- | --- | --- | --- | --- | --- |
| Logistic Regression | 0.726 | 0.491 | 0.794 | 0.607 | 0.834 |
| Random Forest | 0.784 | 0.622 | 0.476 | 0.539 | 0.816 |
| XGBoost | 0.737 | 0.504 | 0.607 | 0.551 | 0.794 |
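A comparison of this kind could be set up in scikit-learn along the following lines. This is a sketch under stated assumptions: it uses synthetic data from `make_classification` rather than the Telco features, so its scores will not match the reported figures, and it assumes that class balancing means `class_weight="balanced"`, which the write-up does not specify. XGBoost is left out to keep the example dependency-light; its usual counterpart for class weighting is the `scale_pos_weight` parameter of `XGBClassifier`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared feature matrix (roughly 74/26 class mix).
X, y = make_classification(n_samples=2000, n_features=12, n_informative=6,
                           weights=[0.74, 0.26], random_state=42)

# 80/20 split, stratified so both sets keep the same churn ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# class_weight="balanced" reweights classes inversely to their frequency.
models = {
    "Logistic Regression": LogisticRegression(
        class_weight="balanced", max_iter=1000),
    "Random Forest": RandomForestClassifier(
        n_estimators=200, class_weight="balanced", random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)               # hard labels at the 0.5 threshold
    proba = model.predict_proba(X_test)[:, 1]  # churn probability for ROC-AUC
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred),
        "f1": f1_score(y_test, pred),
        "roc_auc": roc_auc_score(y_test, proba),
    }

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

ROC-AUC is computed from predicted probabilities rather than hard labels, so it reflects how well each model ranks customers by risk regardless of the decision threshold.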
Business Insights
What this means in plain English
Contract type is the strongest churn predictor
Month-to-month customers churn at significantly higher rates than those on annual or two-year contracts. Flexible contracts carry less commitment and are associated with much higher exit risk.
The first 12 months are the highest-risk window
Short-tenure customers show the highest churn rates. Customers who reach 12 months are substantially more likely to stay — early experience and onboarding are critical.
Higher charges increase risk among newer customers
Customers paying above-median monthly charges — particularly those with shorter tenure — show elevated churn. They may not yet perceive enough value to justify the cost.
Absence of support services increases churn risk
Customers without tech support or online security are more likely to leave, especially fibre subscribers. These customers encounter friction with no safety net.
Electronic check payers churn more
This payment method is associated with higher churn than automatic payment options, potentially reflecting lower service engagement or commitment.
Senior citizens show elevated churn risk
Senior citizens churn at a higher rate than non-senior customers, suggesting potential accessibility gaps or unmet needs in this segment.
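Segment comparisons like those above come down to a churn rate per group. A minimal pandas sketch is shown below; the helper name `churn_rate_by` and the eight toy rows are hypothetical, chosen only to show the mechanics, and the resulting rates are not findings from the dataset.

```python
import pandas as pd

# Toy data using IBM Telco column names; values are illustrative only.
toy = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "Month-to-month",
                 "Month-to-month", "One year", "One year",
                 "Two year", "Two year"],
    "Churn": ["Yes", "Yes", "No", "Yes", "No", "Yes", "No", "No"],
})


def churn_rate_by(df, column):
    """Customer count and churn rate per segment, highest-risk segment first."""
    churned = df["Churn"].eq("Yes")  # boolean: True where the customer left
    summary = churned.groupby(df[column]).agg(
        customers="size",    # segment size
        churn_rate="mean",   # mean of booleans = share who churned
    )
    return summary.sort_values("churn_rate", ascending=False)


by_contract = churn_rate_by(toy, "Contract")
print(by_contract)
```

The same helper can be pointed at other columns such as PaymentMethod or SeniorCitizen, and reporting segment size next to the rate helps avoid prioritising a segment that is high-risk but too small to matter.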