A churn prediction model tells you which customers are likely to leave before they do. In theory, straightforward. In practice, most organizations build the wrong model — or no model at all — because they start with the algorithm instead of the data.
The algorithm choice matters less than most people think. Logistic regression and gradient boosting both produce useful churn predictions if the input data is right. The same models produce useless predictions if the input data is wrong. The work of building a churn model that actually drives retention outcomes is mostly about data preparation, feature engineering, and outcome definition — not algorithm selection.
This guide covers the data you need, how to define the problem correctly, and how to connect predictions to actual retention actions.
Define Churn Before Building Anything
This sounds obvious. Most teams skip it, then wonder why their model doesn't work.
Churn has different definitions depending on your business model, and the definition you choose determines both the model structure and which customers it helps you keep:
- Subscription cancellation: Customer explicitly ends their contract. Clean signal. Works well for SaaS, subscription boxes, recurring service contracts.
- Non-renewal: Customer doesn't renew at contract end. Distinguishes between active and passive exits — important for annual contract businesses.
- Behavioral churn: Customer stops engaging (last login 90 days ago, purchase frequency drops below threshold). No explicit signal — requires defining the threshold.
- Revenue churn: Customer reduces spend below a defined threshold. Important for variable contract or usage-based pricing models.
Pick one. Build your model around it. Multi-definition models are harder to interpret and harder to act on.
Once you've defined churn, define the prediction horizon. "Will this customer churn in the next 30 days?" is a different model than "Will this customer churn in the next 6 months?" — the training labels differ, and so does the action each prediction drives. A 30-day prediction drives immediate outreach. A 6-month prediction drives long-term engagement programs. Be clear about which one you need before you build.
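As a concrete illustration, here is a minimal labeling sketch in Python for a subscription-cancellation definition with a 30-day horizon. The column names and the merge-based approach are assumptions for the example, not a prescribed schema:

```python
import pandas as pd

HORIZON_DAYS = 30  # prediction horizon: "will this customer churn in the next 30 days?"

def label_churn(snapshots: pd.DataFrame, churn_dates: pd.DataFrame) -> pd.Series:
    """Label each (customer_id, snapshot_date) row 1 if the customer's churn
    date falls within HORIZON_DAYS after the snapshot, else 0.
    Customers with no churn date are labeled 0."""
    df = snapshots.merge(churn_dates, on="customer_id", how="left")
    days_out = (df["churn_date"] - df["snapshot_date"]).dt.days
    return ((days_out >= 0) & (days_out <= HORIZON_DAYS)).astype(int)

snapshots = pd.DataFrame({
    "customer_id": ["a", "b", "c"],
    "snapshot_date": pd.to_datetime(["2024-01-01"] * 3),
})
churn_dates = pd.DataFrame({
    "customer_id": ["a", "b"],  # "c" never churned
    "churn_date": pd.to_datetime(["2024-01-15", "2024-03-01"]),
})
labels = label_churn(snapshots, churn_dates)
# a churns on day 14 (inside the horizon) → 1
# b churns on day 60 (outside the horizon) → 0; c never churns → 0
```

Changing HORIZON_DAYS to 180 produces the 6-month model from the same underlying data — which is exactly why the horizon decision has to come first.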
The Data Inputs That Actually Predict Churn
Not all customer data predicts churn equally. These feature categories consistently show up in models with high predictive accuracy:
Engagement signals
How frequently the customer interacts with your product or service is the strongest predictor in nearly every churn model. For software: login frequency, feature usage breadth, sessions per week. For services: meeting attendance, document review completion, task response times. A customer who used to meet weekly and hasn't responded in three weeks is signaling something specific — your model should capture that signal precisely.
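To make this concrete, here is a hedged sketch of deriving recency and frequency features from a raw event log. The schema (`customer_id`, `ts`) and the 28-day window are illustrative assumptions:

```python
import pandas as pd

def engagement_features(events: pd.DataFrame, as_of: str) -> pd.DataFrame:
    """Derive recency and frequency features from an event log with
    columns (customer_id, ts), as of a given snapshot date."""
    as_of = pd.Timestamp(as_of)
    past = events[events["ts"] <= as_of]
    last_seen = past.groupby("customer_id")["ts"].max()
    recent = past[past["ts"] > as_of - pd.Timedelta(days=28)]
    feats = pd.DataFrame({
        "days_since_last_activity": (as_of - last_seen).dt.days,
        "sessions_last_28d": recent.groupby("customer_id").size(),
    })
    feats["sessions_last_28d"] = feats["sessions_last_28d"].fillna(0).astype(int)
    return feats

events = pd.DataFrame({
    "customer_id": ["a", "a", "b"],
    "ts": pd.to_datetime(["2024-01-22", "2023-12-23", "2023-11-01"]),
})
feats = engagement_features(events, "2024-02-01")
# a: active 10 days ago, 1 session in the last 28 days
# b: silent for 92 days, 0 recent sessions — the churn signal described above
```

The point of computing features relative to an `as_of` date rather than "now" is that the same function generates historical training rows and current scoring rows without leaking future data.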
Bain & Company's 2023 customer retention research found that a 5% increase in customer retention produces a 25–95% increase in profits. The wide range reflects industry variability, but the directional relationship is consistent: retention has asymmetric value. The precision of your engagement signal tracking directly determines how early you can intervene.
Support and complaint history
Volume and type of support interactions correlate with churn more than most organizations realize. A customer who filed two support tickets last month is not the same as a customer who filed two tickets about billing specifically. Model these separately: general ticket volume, unresolved ticket count, complaint classification, and resolution time all carry distinct predictive value.
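The "model these separately" point can be sketched as follows — the column names (`resolved`, `category`) are assumed for illustration:

```python
import pandas as pd

def support_features(tickets: pd.DataFrame) -> pd.DataFrame:
    """Split support history into separate predictors instead of one
    raw ticket count."""
    feats = pd.DataFrame({
        "ticket_count": tickets.groupby("customer_id").size(),
        "unresolved_count": tickets[~tickets["resolved"]]
            .groupby("customer_id").size(),
        "billing_ticket_count": tickets[tickets["category"] == "billing"]
            .groupby("customer_id").size(),
    })
    return feats.fillna(0).astype(int)

tickets = pd.DataFrame({
    "customer_id": ["a", "a", "b"],
    "resolved": [True, False, True],
    "category": ["billing", "billing", "how-to"],
})
feats = support_features(tickets)
# a: 2 tickets, 1 unresolved, both about billing — a very different risk
# profile than b's single resolved how-to question
```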
Product adoption breadth
Customers who use only one feature of a multi-feature product are significantly more likely to churn than customers who use three or more. Forrester's 2024 SaaS Customer Success report found that customers using 3+ core product features have a 67% lower churn rate than single-feature users. This "feature breadth" signal is consistently one of the top predictors in software churn models.
Relationship signals
Who your primary contact is, whether they've changed recently, and whether you're connected to a champion at the account. Contact turnover at a customer account is a leading indicator of churn — when the person who bought your product leaves, the replacement often re-evaluates the purchase. CRM data on contact role changes and account coverage belongs in the feature set. See our guide to predictive churn modeling for how to structure this data collection in your CRM.
Payment behavior
Late payments, failed payment attempts, and requests for billing plan changes are strong churn predictors — not just immediately (they're about to leave) but as early signals. Billing friction often precedes a cancellation decision by 60–90 days. If your billing system tracks payment attempt outcomes, include this in your model.
What Doesn't Predict Churn
Some features that appear useful turn out not to be:
NPS scores alone. Net Promoter Score is weakly correlated with individual churn — it's designed to measure population-level sentiment, not individual customer risk. A customer who gave you a 9 two months ago can still churn. Include it in your model if you have it, but don't weight it heavily.
Demographics and firmographics at onboarding. Company size, industry, and location at contract signing have minimal predictive power for churn once a customer is live. They matter for acquisition targeting, not retention prediction.
Self-reported satisfaction from check-in calls. Qualitative check-in calls tend to produce positive-biased responses. Customers who are about to churn often say things are fine because they haven't made the final decision yet. Don't over-weight this data — it's a lagging indicator, not a leading one.
Choosing the Right Model
For most B2B churn prediction use cases, you don't need a complex model:
Logistic regression is the right starting point. It's interpretable (you can explain exactly why the model assigned a given customer a high risk score), fast to train, and performs well when features are well-engineered. If your team has a data analyst but not a machine learning engineer, logistic regression is the practical choice.
Gradient boosting (XGBoost, LightGBM) performs better on complex non-linear relationships, and is particularly useful when you have a large number of features with significant interaction effects. It's harder to interpret, but SHAP values can explain individual predictions. Appropriate when logistic regression performance plateaus.
Random forests sit between the two: better performance than logistic regression on complex relationships, more interpretable than gradient boosting. A reasonable middle ground if you need more performance than logistic regression provides but want explainability.
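A minimal logistic-regression sketch with scikit-learn, using synthetic data in place of the real features discussed earlier (the feature names and the toy relationship between inactivity and churn are assumptions for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500
# Synthetic features: days_since_last_login, sessions_per_week, unresolved_tickets
X = np.column_stack([
    rng.integers(0, 90, n),
    rng.poisson(3, n),
    rng.poisson(0.5, n),
])
# Toy labels: churn risk rises with inactivity, plus noise
y = (X[:, 0] + rng.normal(0, 15, n) > 60).astype(int)

model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)

# Interpretability in practice: standardized coefficients show each feature's
# direction and relative weight — the property that lets you explain a score.
coefs = model.named_steps["logisticregression"].coef_[0]
risk_scores = model.predict_proba(X)[:, 1]  # churn probability per customer
```

Scaling before fitting matters here: it makes the coefficients comparable across features, which is what turns "the model says 0.84" into "the model says 0.84 mostly because of inactivity."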
MIT's 2023 research on customer analytics implementation found that organizations using simpler, more interpretable models consistently drove better retention outcomes than those using more complex models — because front-line teams (sales, customer success) could understand and act on the predictions. The best model is the one your team will actually use.
Connecting Predictions to Actions
A churn model that lives in a spreadsheet or a data team dashboard doesn't prevent churn. The prediction has to connect to the system where retention actions happen.
The most practical implementation: feed churn risk scores directly into your CRM as a custom property, updated weekly. In HubSpot or Salesforce, create automated workflows that trigger specific actions when a customer's churn risk score crosses defined thresholds:
- Elevated risk (score 60–79%): Automated check-in task assigned to account owner. Flag in weekly account review.
- High risk (score 80–89%): Executive business review scheduled automatically. Escalation to customer success leadership.
- Critical risk (90%+): Immediate human outreach required. Saved-customer playbook activated.
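The threshold logic above reduces to a simple mapping — something like this sketch, where the cutoffs are the assumed tiers and the input is the model's churn probability:

```python
def risk_tier(score: float) -> str:
    """Map a churn probability (0-1) to an action tier."""
    if score >= 0.90:
        return "critical"   # immediate human outreach, save playbook
    if score >= 0.80:
        return "high"       # executive business review, CS escalation
    if score >= 0.60:
        return "elevated"   # automated check-in task for account owner
    return "monitor"        # routine account review only

tiers = [risk_tier(s) for s in (0.45, 0.65, 0.85, 0.95)]
# → ["monitor", "elevated", "high", "critical"]
```

In practice this function would run in the weekly scoring job, writing the tier to the CRM property that the workflow automations key off.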
The connection between CRM and model output is where most churn prediction projects fail. The model gets built, a dashboard gets created, and the customer success team continues working from the same manual list they've always used. For practical guidance on CRM integrations, our HubSpot implementation practice covers account health scoring and automated workflows in detail.
The customer lifecycle context also matters. Churn risk looks different at month 3 post-onboarding than at month 18. See our customer lifecycle stages guide for the framework that determines how to interpret churn signals at each stage.
Gartner's 2025 Customer Success Technology report found that organizations with automated churn early-warning systems reduce annual churn rates by an average of 3.1 percentage points compared to those using manual retention monitoring. At a $1M ARR book of business, 3.1 percentage points is roughly $31,000/year — from a model that, once built, runs automatically.

