A churn prediction model tells you which customers are likely to leave before they do. In theory, straightforward. In practice, most organizations build the wrong model — or no model at all — because they start with the algorithm instead of the data.
The algorithm choice matters less than most people think. Logistic regression and gradient boosting both produce useful churn predictions if the input data is right. The same models produce useless predictions if the input data is wrong. The work of building a churn model that actually drives retention outcomes is mostly about data preparation, feature engineering, and outcome definition — not algorithm selection.
This guide covers the data you need, how to define the problem correctly, and how to connect predictions to actual retention actions.
Define Churn Before Building Anything
This sounds obvious. Most teams skip it, then wonder why their model doesn't work.
Churn has different definitions depending on your business model, and the definition you choose determines both the model structure and which customers it helps you keep:
- Subscription cancellation: Customer explicitly ends their contract. Clean signal. Works well for SaaS, subscription boxes, recurring service contracts.
- Non-renewal: Customer doesn't renew at contract end. Distinguishes between active and passive exits — important for annual contract businesses.
- Behavioral churn: Customer stops engaging (last login 90 days ago, purchase frequency drops below threshold). No explicit signal — requires defining the threshold.
- Revenue churn: Customer reduces spend below a defined threshold. Important for variable contract or usage-based pricing models.
Pick one. Build your model around it. Multi-definition models are harder to interpret and harder to act on.
Once you've defined churn, define the prediction horizon. "Will this customer churn in the next 30 days?" is a different model than "Will this customer churn in the next 6 months?" — the training labels differ, and so does the action each prediction drives. A 30-day prediction drives immediate outreach. A 6-month prediction drives long-term engagement programs. Be clear about which one you need before you build.
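As a concrete illustration, here is a minimal labeling sketch in Python for a subscription-cancellation definition with a 30-day horizon. The column names and the merge-based approach are assumptions for the example, not a prescribed schema:

```python
import pandas as pd

HORIZON_DAYS = 30  # prediction horizon: "will this customer churn in the next 30 days?"

def label_churn(snapshots: pd.DataFrame, churn_dates: pd.DataFrame) -> pd.Series:
    """Label each (customer_id, snapshot_date) row 1 if the customer's churn
    date falls within HORIZON_DAYS after the snapshot, else 0.
    Customers with no churn date are labeled 0."""
    df = snapshots.merge(churn_dates, on="customer_id", how="left")
    days_out = (df["churn_date"] - df["snapshot_date"]).dt.days
    return ((days_out >= 0) & (days_out <= HORIZON_DAYS)).astype(int)

snapshots = pd.DataFrame({
    "customer_id": ["a", "b", "c"],
    "snapshot_date": pd.to_datetime(["2024-01-01"] * 3),
})
churn_dates = pd.DataFrame({
    "customer_id": ["a", "b"],  # "c" never churned
    "churn_date": pd.to_datetime(["2024-01-15", "2024-03-01"]),
})
labels = label_churn(snapshots, churn_dates)
# a churns on day 14 (inside the horizon) → 1
# b churns on day 60 (outside the horizon) → 0; c never churns → 0
```

Changing HORIZON_DAYS to 180 produces the 6-month model from the same underlying data — which is exactly why the horizon decision has to come first.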
The Data Inputs That Actually Predict Churn
Not all customer data predicts churn equally. These feature categories consistently show up in models with high predictive accuracy:
Engagement signals
How frequently the customer interacts with your product or service is the strongest predictor in nearly every churn model. For software: login frequency, feature usage breadth, sessions per week. For services: meeting attendance, document review completion, task response times. A customer who used to meet weekly and hasn't responded in three weeks is signaling something specific — your model should capture that signal precisely.
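To make this concrete, here is a hedged sketch of deriving recency and frequency features from a raw event log. The schema (`customer_id`, `ts`) and the 28-day window are illustrative assumptions:

```python
import pandas as pd

def engagement_features(events: pd.DataFrame, as_of: str) -> pd.DataFrame:
    """Derive recency and frequency features from an event log with
    columns (customer_id, ts), as of a given snapshot date."""
    as_of = pd.Timestamp(as_of)
    past = events[events["ts"] <= as_of]
    last_seen = past.groupby("customer_id")["ts"].max()
    recent = past[past["ts"] > as_of - pd.Timedelta(days=28)]
    feats = pd.DataFrame({
        "days_since_last_activity": (as_of - last_seen).dt.days,
        "sessions_last_28d": recent.groupby("customer_id").size(),
    })
    feats["sessions_last_28d"] = feats["sessions_last_28d"].fillna(0).astype(int)
    return feats

events = pd.DataFrame({
    "customer_id": ["a", "a", "b"],
    "ts": pd.to_datetime(["2024-01-22", "2023-12-23", "2023-11-01"]),
})
feats = engagement_features(events, "2024-02-01")
# a: active 10 days ago, 1 session in the last 28 days
# b: silent for 92 days, 0 recent sessions — the churn signal described above
```

The point of computing features relative to an `as_of` date rather than "now" is that the same function generates historical training rows and current scoring rows without leaking future data.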
Bain & Company's 2023 customer retention research found that a 5% increase in customer retention produces a 25–95% increase in profits. The wide range reflects industry variability, but the directional relationship is consistent: retention has asymmetric value. The precision of your engagement signal tracking directly determines how early you can intervene.
Support and complaint history
Volume and type of support interactions correlate with churn more than most organizations realize. A customer who filed two support tickets last month is not the same as a customer who filed two tickets about billing specifically. Model these separately: general ticket volume, unresolved ticket count, complaint classification, and resolution time all carry distinct predictive value.
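The "model these separately" point can be sketched as follows — the column names (`resolved`, `category`) are assumed for illustration:

```python
import pandas as pd

def support_features(tickets: pd.DataFrame) -> pd.DataFrame:
    """Split support history into separate predictors instead of one
    raw ticket count."""
    feats = pd.DataFrame({
        "ticket_count": tickets.groupby("customer_id").size(),
        "unresolved_count": tickets[~tickets["resolved"]]
            .groupby("customer_id").size(),
        "billing_ticket_count": tickets[tickets["category"] == "billing"]
            .groupby("customer_id").size(),
    })
    return feats.fillna(0).astype(int)

tickets = pd.DataFrame({
    "customer_id": ["a", "a", "b"],
    "resolved": [True, False, True],
    "category": ["billing", "billing", "how-to"],
})
feats = support_features(tickets)
# a: 2 tickets, 1 unresolved, both about billing — a very different risk
# profile than b's single resolved how-to question
```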
Product adoption breadth
Customers who use only one feature of a multi-feature product are significantly more likely to churn than customers who use three or more. Forrester's 2024 SaaS Customer Success report found that customers using 3+ core product features have a 67% lower churn rate than single-feature users. This "feature breadth" signal is consistently one of the top predictors in software churn models.
Relationship signals
Who your primary contact is, whether they've changed recently, and whether you're connected to a champion at the account. Contact turnover at a customer account is a leading indicator of churn — when the person who bought your product leaves, the replacement often re-evaluates the purchase. CRM data on contact role changes and account coverage belongs in the feature set. See our guide to predictive churn modeling for how to structure this data collection in your CRM.
Payment behavior
Late payments, failed payment attempts, and requests for billing plan changes are strong churn predictors — not just immediately (they're about to leave) but as early signals. Billing friction often precedes a cancellation decision by 60–90 days. If your billing system tracks payment attempt outcomes, include this in your model.
What Doesn't Predict Churn
Some features that appear useful turn out not to be:
NPS scores alone. Net Promoter Score is weakly correlated with individual churn — it's designed to measure population-level sentiment, not individual customer risk. A customer who gave you a 9 two months ago can still churn. Include it in your model if you have it, but don't weight it heavily.
Demographics and firmographics at onboarding. Company size, industry, and location at contract signing have minimal predictive power for churn once a customer is live. They matter for acquisition targeting, not retention prediction.
Self-reported satisfaction from check-in calls. Qualitative check-in calls tend to produce positive-biased responses. Customers who are about to churn often say things are fine because they haven't made the final decision yet. Don't over-weight this data — it's a lagging indicator, not a leading one.
Choosing the Right Model
For most B2B churn prediction use cases, you don't need a complex model:
Logistic regression is the right starting point. It's interpretable (you can explain exactly why the model assigned a given customer a high risk score), fast to train, and performs well when features are well-engineered. If your team has a data analyst but not a machine learning engineer, logistic regression is the practical choice.
Gradient boosting (XGBoost, LightGBM) performs better on complex non-linear relationships, and is particularly useful when you have a large number of features with significant interaction effects. It's harder to interpret, but SHAP values can explain individual predictions. Appropriate when logistic regression performance plateaus.
Random forests sit between the two: better performance than logistic regression on complex relationships, more interpretable than gradient boosting. A reasonable middle ground if you need more performance than logistic regression provides but want explainability.
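A minimal logistic-regression sketch with scikit-learn, using synthetic data in place of the real features discussed earlier (the feature names and the toy relationship between inactivity and churn are assumptions for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500
# Synthetic features: days_since_last_login, sessions_per_week, unresolved_tickets
X = np.column_stack([
    rng.integers(0, 90, n),
    rng.poisson(3, n),
    rng.poisson(0.5, n),
])
# Toy labels: churn risk rises with inactivity, plus noise
y = (X[:, 0] + rng.normal(0, 15, n) > 60).astype(int)

model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)

# Interpretability in practice: standardized coefficients show each feature's
# direction and relative weight — the property that lets you explain a score.
coefs = model.named_steps["logisticregression"].coef_[0]
risk_scores = model.predict_proba(X)[:, 1]  # churn probability per customer
```

Scaling before fitting matters here: it makes the coefficients comparable across features, which is what turns "the model says 0.84" into "the model says 0.84 mostly because of inactivity."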
MIT's 2023 research on customer analytics implementation found that organizations using simpler, more interpretable models consistently drove better retention outcomes than those using more complex models — because front-line teams (sales, customer success) could understand and act on the predictions. The best model is the one your team will actually use.
Connecting Predictions to Actions
A churn model that lives in a spreadsheet or a data team dashboard doesn't prevent churn. The prediction has to connect to the system where retention actions happen.
The most practical implementation: feed churn risk scores directly into your CRM as a custom property, updated weekly. In HubSpot or Salesforce, create automated workflows that trigger specific actions when a customer's churn risk score crosses defined thresholds:
- Elevated risk (score 60–79%): Automated check-in task assigned to account owner. Flag in weekly account review.
- High risk (score 80–89%): Executive business review scheduled automatically. Escalation to customer success leadership.
- Critical risk (90%+): Immediate human outreach required. Saved-customer playbook activated.
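The threshold logic above reduces to a simple mapping — something like this sketch, where the cutoffs are the assumed tiers and the input is the model's churn probability:

```python
def risk_tier(score: float) -> str:
    """Map a churn probability (0-1) to an action tier."""
    if score >= 0.90:
        return "critical"   # immediate human outreach, save playbook
    if score >= 0.80:
        return "high"       # executive business review, CS escalation
    if score >= 0.60:
        return "elevated"   # automated check-in task for account owner
    return "monitor"        # routine account review only

tiers = [risk_tier(s) for s in (0.45, 0.65, 0.85, 0.95)]
# → ["monitor", "elevated", "high", "critical"]
```

In practice this function would run in the weekly scoring job, writing the tier to the CRM property that the workflow automations key off.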
The connection between CRM and model output is where most churn prediction projects fail. The model gets built, a dashboard gets created, and the customer success team continues working from the same manual list they've always used. For practical guidance on CRM integrations, our HubSpot implementation practice covers account health scoring and automated workflows in detail.
The customer lifecycle context also matters. Churn risk looks different at month 3 post-onboarding than at month 18. See our customer lifecycle stages guide for the framework that determines how to interpret churn signals at each stage.
Gartner's 2025 Customer Success Technology report found that organizations with automated churn early-warning systems reduce annual churn rates by an average of 3.1 percentage points compared to those using manual retention monitoring. At a $1M ARR book of business, 3.1 percentage points is roughly $31,000/year — from a model that, once built, runs automatically.

