Ai churn prediction for Shopify stores: 90-day playbook

ai churn prediction for Shopify stores: a warehouse-first 90-day playbook

This playbook gets you a working ai churn prediction for Shopify stores into Shopify and Klaviyo inside 90 days, with an experiment to measure real lift before committing to a custom model. It is deliverables-first: CSV and spreadsheet templates, SQL snippets (including temporal splits), Zapier payloads, an experiment sample-size worksheet, monitoring thresholds, and a vendor DPA clause.

Contents

ai churn prediction for Shopify stores: a warehouse-first 90-day playbook
Deliverables and prerequisites
Operator decision map
90-day MVP playbook: ship a rule-based baseline and an experiment
Reproducible artifact bundle
Labeling and validation best practices
Experiment design and measurement worksheet
Production monitoring and maintenance playbook
Privacy and compliance checklist plus vendor DPA clause
Scale path templates and hiring checklist
Appendix: artifacts and quick reference

Deliverables and prerequisites

What you will ship in 90 days

Customer and Orders CSV schema plus two annotated example rows.
Spreadsheet template with formulas to compute recency, frequency, monetary, and a simple rule-based risk score.
SQL snippets and a notebook skeleton for feature and label generation with temporal holdout.
Zapier webhook payload and Shopify metafield naming conventions to sync scores to profiles.
Experiment worksheet with a sample-size calculator and analysis checklist.
Monitoring dashboard spec and alert thresholds to detect pipeline failures and drift.
Privacy checklist and a short vendor DPA clause plus a deletion workflow example.

Prerequisites checklist

Data horizon: at least 6 months of customer and order history; 12 months preferred.
Customer count: the playbook is realistic for stores with roughly 5k to 50k unique customers. For under 1k, use rule-based tactics and measure before investing in modeling.
Stack: Shopify store + one export path (warehouse or CSV) + Klaviyo (or equivalent ESP) + Zapier or an automation layer for pushing scores.
Owner: growth lead or a small data engineer/contractor who can run SQL and schedule jobs.

Operator decision map

Quick questions to choose a path.

Do you need the score in ads or onsite personalization? If yes, choose warehouse-first ML (Path B).
Do you want the fastest time-to-value and no dev work? If yes, use a no-code vendor or built-in Klaviyo predictive segments (Path A).
Do you have a data owner and want control and cross-channel reuse? If yes, choose Path B and plan for Path C (bespoke modeling) only after proving lift.

Tradeoffs in brief

Path A: fastest launch, higher vendor lock and less transparency, low ops cost.
Path B: medium launch time, reusable scores across channels, requires a warehouse and small ops budget.
Path C: bespoke modeling, long build time, higher accuracy potential, needs ongoing ML ops and a clear ROI signal.

90-day MVP playbook: ship a rule-based baseline and an experiment

Phase 0: Week 0 – setup and exports

Export Customers and Orders CSV from Shopify for the last 12 months. Include customer_id, email, created_at, last_order_at, total_spent, orders_count, tags, and default_address_postal_code.
If you have subscriptions, also export subscription events: subscription_id, customer_id, status, canceled_at.
Create a weekly job to export and push these CSVs to your warehouse or shared drive.

Phase 1: Week 1-3 – spreadsheet sanity-check and rule score

Use the included spreadsheet template to compute simple features and a 0-10 rule score. Ship the rule-based segment to Klaviyo as a profile property or segment.

Phase 2: Week 3-6 – SQL features and label generation

Run the SQL snippets to compute the same features in your warehouse. Generate labels with a strict cutoff date and create a temporal holdout period for test/validation.

Phase 3: Week 6-10 – push scores to Shopify and Klaviyo

Map score to Shopify customer metafield name: metafield.namespace = lifecycle, metafield.key = churn_score_v1, store scores as integers 0 to 100.
Push via Zapier or a small script to Klaviyo profile property: profile.churn_score_v1.
Create two segments in Klaviyo: treatment (top 20% risk) and holdout (20% of those high-risk customers withheld for experiment).

Phase 4: Week 10-12 – run experiment and analyze

Deploy a win-back email/SMS cadence to treatment cohort only; keep holdout untouched for measurement.
Collect metrics for 30 days post-send; perform pre-specified analysis.
Decide: if uplift exceeds your threshold after adjusting for costs, move to Path B or C for a model. Otherwise, iterate on the rule.

Reproducible artifact bundle

CSV schema and annotated sample rows

Customers.csv header (UTF-8, comma separated)

customer_id, email, created_at, last_order_at, total_spent, orders_count, tags, default_postal_code

Products.csv header (if you need product-mix features)

order_id, product_id, sku, product_title, quantity, price, discount_amount

Example rows (annotated)

CUST_0001, alice@example.com, 2021-03-10, 2026-04-02, 420.50, 5, "first_time_buyer", 02139
CUST_0002, bob@example.com, 2022-11-18, 2026-02-10, 89.00, 1, "", 94105

Spreadsheet formulas (Google Sheets)

days_since_last_order: =DATEDIF(last_order_at_cell, TODAY(), “D”)
orders_last_90d: =COUNTIFS(order_dates_range, “>” & TODAY()-90, customer_ids_range, customer_id_cell)
rule_score (0-10) example: =MIN(10, FLOOR( (IF(orders_last_90d>2,3,IF(orders_last_90d=1,1,0)) + IF(days_since_last_order<30,3,IF(days_since_last_order<90,2,0)) + IF(total_spent>200,4,0) ) ,1) )

Ready-to-run SQL snippets

Canonical feature query (Postgres flavor)

WITH orders AS (
  SELECT id AS order_id, customer_id, created_at::date AS order_date, total_price::numeric AS order_value
  FROM raw.orders
  WHERE created_at >= (CURRENT_DATE - INTERVAL '18 months')
),
customers AS (
  SELECT id AS customer_id, email, created_at::date AS created_date
  FROM raw.customers
)
SELECT
  c.customer_id,
  c.email,
  MAX(o.order_date) AS last_order_date,
  EXTRACT(DAY FROM (CURRENT_DATE - MAX(o.order_date)))::int AS days_since_last_order,
  COUNT(o.order_id)::int AS orders_count,
  SUM(o.order_value)::numeric AS total_spent,
  SUM(CASE WHEN o.order_date >= CURRENT_DATE - INTERVAL '90 days' THEN 1 ELSE 0 END)::int AS orders_90d
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.email;

Label generation with temporal cutoff and holdout split

-- Suppose label cutoff is 2026-03-01; label window is 90 days after cutoff
WITH base AS (
  SELECT customer_id, MAX(order_date) AS last_order_date
  FROM raw.orders
  WHERE order_date < '2026-03-01'::date
  GROUP BY customer_id
),
label_window AS (
  SELECT c.customer_id,
    CASE WHEN EXISTS (
      SELECT 1 FROM raw.orders o
      WHERE o.customer_id = c.customer_id
        AND o.order_date >= '2026-03-01'::date
        AND o.order_date < '2026-03-01'::date + INTERVAL '90 days'
    ) THEN 0 ELSE 1 END AS churn_label -- 1 means churned (no repeat)
  FROM base c
)
SELECT *, NTILE(5) OVER (ORDER BY last_order_date) AS recency_bin
FROM label_window;

Temporal holdout policy: use data with cutoff dates separated by at least label window length. For example, train on cutoffs through 2025-12-01, validate on cutoff 2026-01-01, test on cutoff 2026-03-01.

Zapier webhook payload to push score to Klaviyo or Shopify

POST /hooks HTTP/1.1
Content-Type: application/json

{
  "customer_id": "CUST_0001",
  "email": "alice@example.com",
  "churn_score_v1": 78,
  "score_bucket": "80-100",
  "timestamp": "2026-06-01T12:00:00Z"
}

Shopify metafield naming convention

namespace: lifecycle
key: churn_score_v1
value_type: integer

Labeling and validation best practices

Anti-leakage guardrails

Enforce a hard label cutoff date. Build features only from events strictly before that date.
Do not use derived fields that rely on post-cutoff events, for example do not include "days_to_next_order".
Make feature pipelines immutable for a given cutoff; record pipeline version in metadata.

Temporal holdout plan

Split by cutoff date, not random customer ID. Example timeline approach

Train: cutoffs ending up to 6 months prior to the most recent period.
Validation: a non-overlapping month after training cutoffs.
Test: the most recent cutoff period to approximate production timing.

Validation checks

Pre-period balance: compare average orders and recency between treatment and holdout before sending messages.
Label quality audit: sample 100 records and check raw order history to verify churn label correctness.
Feature leakage scan: flag any feature with correlation >0.95 to the label in the pre-cutoff data as suspicious.

Experiment design and measurement worksheet

Design

Unit: customer profile (use email hash or Shopify customer_id consistently).
Treatment: win-back campaign (email + optional SMS) triggered for high-risk segment.
Holdout: randomly withhold a fixed percentage of the high-risk segment (example: 20%) for measurement only.
Observation window: at least equal to typical reorder window or 30 days minimum for quick tests. Use longer windows for products with longer cadence.

KPI formulas

Repeat rate = customers_with_repeat_order_in_window / customers_in_group
Revenue per recipient = revenue_from_group / customers_in_group
Incremental revenue = revenue_treatment - revenue_holdout

Sample-size calculator (proportion test)

Use the standard two-proportion formula. For practical use, here is the formula and a worked hypothetical example.

Formula (per group sample size)

n = ( (Z_alpha/2 + Z_beta)^2 * (p1*(1-p1) + p2*(1-p2)) ) / (p1 - p2)^2

Parameters

Z_alpha/2 = 1.96 for 95% confidence.
Z_beta = 0.84 for 80% power.
p1 = baseline repeat rate, p2 = expected repeat rate under treatment.

Hypothetical example. Suppose baseline repeat rate p1 = 0.12. You aim for a 20 percent relative uplift, so p2 = 0.144. Plugging numbers gives an approximate n per group. Use a spreadsheet to compute exact values. If you cannot reach this n, either increase the effect you aim to detect or run a longer experiment.

Analysis checklist

Pre-registered analysis window and KPIs.
Confirm randomization integrity and no cross-contamination.
Compute confidence intervals and p-values; report absolute and relative uplift and cost per incremental order.
Check intermediate metrics like open and click rates to debug null results.

Production monitoring and maintenance playbook

Monitoring dashboard spec (minimum)

Data completeness: percent of profiles with non-null customer_id; alert if < 99%.
Join key health: percent null on customer_id or email after ETL; alert if > 10%.
Feature null rates: alert if any key feature null rate exceeds 10%.
Score distribution: daily histogram and percentiles; alert if median shifts by more than 0.5 standard deviations vs baseline.
Label drift: proportion of labeled churn events per window; alert if shift > 15 percent relative.
Sync failures: number of failed pushes to Klaviyo or Shopify; alert on any non-zero failures for more than one job cycle.

Remediation playbook

Missing join keys: run mapping job to backfill with email as secondary key; pause write to downstream systems until mapping is fixed.
Schema change detection: fail the job and notify owner; include previous and new schema sample in the alert.
Drift detected: freeze rollout of new model or score and revert to previous stable version; run diagnostics to identify feature changes.

Retrain cadence and rollback policy

Start with monthly retrain windows for the first three months, then move to quarterly if drift is low. Version every model and keep a rollback flag that can reassign the last stable score to profiles if production issues occur.

Privacy and compliance checklist plus vendor DPA clause

Minimum privacy steps

Minimize PII sent to vendors. Prefer hashed customer_id/email when possible.
Document data flows and retention in an internal register.
Maintain an auditable deletion workflow: map customer_id to vendor identifier and confirm deletion within agreed SLA.
For SMS, ensure opt-in consent and maintain suppression lists synced to your ESP before campaigns.

Sample DPA clause (short form)

Vendor will process Personal Data only on Controller's documented instructions. Vendor will not combine Controller data with other customers' data. Vendor will implement reasonable technical and organizational measures to protect data. Vendor will, within 30 days of Controller request, return or delete Personal Data and confirm completion. Vendor will notify Controller within 72 hours of any confirmed data breach affecting Personal Data.

Note: adjust timeframes to match your legal policy. The clause above is a starting point for negotiation.

Sample deletion workflow

Owner receives deletion request for customer_id X.
Owner runs query to collect vendor identifiers and export to secure channel.
Owner posts deletion request via vendor API and records request ID.
Vendor confirms deletion; owner rechecks vendor dataset and finalizes audit log within SLA.

Scale path templates and hiring checklist

When to scale beyond the 90-day test

Choose Path B or C only if: the experiment shows positive incremental lift after costs, you have a data owner for ongoing ops, and you can support retraining and monitoring. If any of those are missing, iterate on rules instead.

Vendor selection cheat-sheet

Vendor A (no-code): fastest launch, limited transparency, minimal engineering required.
Warehouse-first provider: medium build, scores reusable, requires SQL and sync engineering.
Bespoke agency: high cost, longer delivery, best when product complexity demands it.

Appendix: artifacts and quick reference

Notebook skeleton (Python pseudocode)

# connect to warehouse
# extract features using SQL snippets
# apply versioned feature transformations
# load labels with cutoff date
# split by cutoff for temporal validation
# train simple logistic regression as baseline
# evaluate and export scores to CSV

Common failure modes and quick fixes

Shopify merged customers: detect duplicate emails and map to canonical customer_id weekly.
Field rename in app export: keep a mapping layer and schema checksum job to catch changes within one job cycle.
Unexpected nulls after a promo: check discount-related features and cap extreme values before modeling.

Final practical decision

Run the 90-day rule-based test if speed and low overhead matter. Move to warehouse-first ML only if you prove incremental lift and you have a data owner to operate the pipeline. Avoid bespoke models until the experiment shows payback that justifies ongoing ML ops cost.