ai churn prediction for Shopify stores: a warehouse-first 90-day playbook
This playbook gets you a working ai churn prediction for Shopify stores into Shopify and Klaviyo inside 90 days, with an experiment to measure real lift before committing to a custom model. It is deliverables-first: CSV and spreadsheet templates, SQL snippets (including temporal splits), Zapier payloads, an experiment sample-size worksheet, monitoring thresholds, and a vendor DPA clause.
- ai churn prediction for Shopify stores: a warehouse-first 90-day playbook
- Deliverables and prerequisites
- Operator decision map
- 90-day MVP playbook: ship a rule-based baseline and an experiment
- Reproducible artifact bundle
- Labeling and validation best practices
- Experiment design and measurement worksheet
- Production monitoring and maintenance playbook
- Privacy and compliance checklist plus vendor DPA clause
- Scale path templates and hiring checklist
- Appendix: artifacts and quick reference
Deliverables and prerequisites
What you will ship in 90 days
- Customer and Orders CSV schema plus two annotated example rows.
- Spreadsheet template with formulas to compute recency, frequency, monetary, and a simple rule-based risk score.
- SQL snippets and a notebook skeleton for feature and label generation with temporal holdout.
- Zapier webhook payload and Shopify metafield naming conventions to sync scores to profiles.
- Experiment worksheet with a sample-size calculator and analysis checklist.
- Monitoring dashboard spec and alert thresholds to detect pipeline failures and drift.
- Privacy checklist and a short vendor DPA clause plus a deletion workflow example.
Prerequisites checklist
- Data horizon: at least 6 months of customer and order history; 12 months preferred.
- Customer count: the playbook is realistic for stores with roughly 5k to 50k unique customers. For under 1k, use rule-based tactics and measure before investing in modeling.
- Stack: Shopify store + one export path (warehouse or CSV) + Klaviyo (or equivalent ESP) + Zapier or an automation layer for pushing scores.
- Owner: growth lead or a small data engineer/contractor who can run SQL and schedule jobs.
Operator decision map
Quick questions to choose a path.
- Do you need the score in ads or onsite personalization? If yes, choose warehouse-first ML (Path B).
- Do you want the fastest time-to-value and no dev work? If yes, use a no-code vendor or built-in Klaviyo predictive segments (Path A).
- Do you have a data owner and want control and cross-channel reuse? If yes, choose Path B and plan for Path C (bespoke modeling) only after proving lift.
Tradeoffs in brief
- Path A: fastest launch, higher vendor lock and less transparency, low ops cost.
- Path B: medium launch time, reusable scores across channels, requires a warehouse and small ops budget.
- Path C: bespoke modeling, long build time, higher accuracy potential, needs ongoing ML ops and a clear ROI signal.
90-day MVP playbook: ship a rule-based baseline and an experiment
Phase 0: Week 0 – setup and exports
- Export Customers and Orders CSV from Shopify for the last 12 months. Include customer_id, email, created_at, last_order_at, total_spent, orders_count, tags, and default_address_postal_code.
- If you have subscriptions, also export subscription events: subscription_id, customer_id, status, canceled_at.
- Create a weekly job to export and push these CSVs to your warehouse or shared drive.
Phase 1: Week 1-3 – spreadsheet sanity-check and rule score
Use the included spreadsheet template to compute simple features and a 0-10 rule score. Ship the rule-based segment to Klaviyo as a profile property or segment.
Phase 2: Week 3-6 – SQL features and label generation
Run the SQL snippets to compute the same features in your warehouse. Generate labels with a strict cutoff date and create a temporal holdout period for test/validation.
Phase 3: Week 6-10 – push scores to Shopify and Klaviyo
- Map score to Shopify customer metafield name: metafield.namespace = lifecycle, metafield.key = churn_score_v1, store scores as integers 0 to 100.
- Push via Zapier or a small script to Klaviyo profile property: profile.churn_score_v1.
- Create two segments in Klaviyo: treatment (top 20% risk) and holdout (20% of those high-risk customers withheld for experiment).
Phase 4: Week 10-12 – run experiment and analyze
- Deploy a win-back email/SMS cadence to treatment cohort only; keep holdout untouched for measurement.
- Collect metrics for 30 days post-send; perform pre-specified analysis.
- Decide: if uplift exceeds your threshold after adjusting for costs, move to Path B or C for a model. Otherwise, iterate on the rule.
Reproducible artifact bundle
CSV schema and annotated sample rows
Customers.csv header (UTF-8, comma separated)
customer_id, email, created_at, last_order_at, total_spent, orders_count, tags, default_postal_code
Products.csv header (if you need product-mix features)
order_id, product_id, sku, product_title, quantity, price, discount_amount
Example rows (annotated)
CUST_0001, alice@example.com, 2021-03-10, 2026-04-02, 420.50, 5, "first_time_buyer", 02139 CUST_0002, bob@example.com, 2022-11-18, 2026-02-10, 89.00, 1, "", 94105
Spreadsheet formulas (Google Sheets)
- days_since_last_order: =DATEDIF(last_order_at_cell, TODAY(), “D”)
- orders_last_90d: =COUNTIFS(order_dates_range, “>” & TODAY()-90, customer_ids_range, customer_id_cell)
- rule_score (0-10) example: =MIN(10, FLOOR( (IF(orders_last_90d>2,3,IF(orders_last_90d=1,1,0)) + IF(days_since_last_order<30,3,IF(days_since_last_order<90,2,0)) + IF(total_spent>200,4,0) ) ,1) )
Ready-to-run SQL snippets
Canonical feature query (Postgres flavor)
WITH orders AS ( SELECT id AS order_id, customer_id, created_at::date AS order_date, total_price::numeric AS order_value FROM raw.orders WHERE created_at >= (CURRENT_DATE - INTERVAL '18 months') ), customers AS ( SELECT id AS customer_id, email, created_at::date AS created_date FROM raw.customers ) SELECT c.customer_id, c.email, MAX(o.order_date) AS last_order_date, EXTRACT(DAY FROM (CURRENT_DATE - MAX(o.order_date)))::int AS days_since_last_order, COUNT(o.order_id)::int AS orders_count, SUM(o.order_value)::numeric AS total_spent, SUM(CASE WHEN o.order_date >= CURRENT_DATE - INTERVAL '90 days' THEN 1 ELSE 0 END)::int AS orders_90d FROM customers c LEFT JOIN orders o ON o.customer_id = c.customer_id GROUP BY c.customer_id, c.email;
Label generation with temporal cutoff and holdout split
-- Suppose label cutoff is 2026-03-01; label window is 90 days after cutoff
WITH base AS (
SELECT customer_id, MAX(order_date) AS last_order_date
FROM raw.orders
WHERE order_date < '2026-03-01'::date
GROUP BY customer_id
),
label_window AS (
SELECT c.customer_id,
CASE WHEN EXISTS (
SELECT 1 FROM raw.orders o
WHERE o.customer_id = c.customer_id
AND o.order_date >= '2026-03-01'::date
AND o.order_date < '2026-03-01'::date + INTERVAL '90 days'
) THEN 0 ELSE 1 END AS churn_label -- 1 means churned (no repeat)
FROM base c
)
SELECT *, NTILE(5) OVER (ORDER BY last_order_date) AS recency_bin
FROM label_window;
Temporal holdout policy: use data with cutoff dates separated by at least label window length. For example, train on cutoffs through 2025-12-01, validate on cutoff 2026-01-01, test on cutoff 2026-03-01.
Zapier webhook payload to push score to Klaviyo or Shopify
POST /hooks HTTP/1.1
Content-Type: application/json
{
"customer_id": "CUST_0001",
"email": "alice@example.com",
"churn_score_v1": 78,
"score_bucket": "80-100",
"timestamp": "2026-06-01T12:00:00Z"
}
Shopify metafield naming convention
- namespace: lifecycle
- key: churn_score_v1
- value_type: integer
Labeling and validation best practices
Anti-leakage guardrails
- Enforce a hard label cutoff date. Build features only from events strictly before that date.
- Do not use derived fields that rely on post-cutoff events, for example do not include "days_to_next_order".
- Make feature pipelines immutable for a given cutoff; record pipeline version in metadata.
Temporal holdout plan
Split by cutoff date, not random customer ID. Example timeline approach
- Train: cutoffs ending up to 6 months prior to the most recent period.
- Validation: a non-overlapping month after training cutoffs.
- Test: the most recent cutoff period to approximate production timing.
Validation checks
- Pre-period balance: compare average orders and recency between treatment and holdout before sending messages.
- Label quality audit: sample 100 records and check raw order history to verify churn label correctness.
- Feature leakage scan: flag any feature with correlation >0.95 to the label in the pre-cutoff data as suspicious.
Experiment design and measurement worksheet
Design
- Unit: customer profile (use email hash or Shopify customer_id consistently).
- Treatment: win-back campaign (email + optional SMS) triggered for high-risk segment.
- Holdout: randomly withhold a fixed percentage of the high-risk segment (example: 20%) for measurement only.
- Observation window: at least equal to typical reorder window or 30 days minimum for quick tests. Use longer windows for products with longer cadence.
KPI formulas
- Repeat rate = customers_with_repeat_order_in_window / customers_in_group
- Revenue per recipient = revenue_from_group / customers_in_group
- Incremental revenue = revenue_treatment - revenue_holdout
Sample-size calculator (proportion test)
Use the standard two-proportion formula. For practical use, here is the formula and a worked hypothetical example.
Formula (per group sample size)
n = ( (Z_alpha/2 + Z_beta)^2 * (p1*(1-p1) + p2*(1-p2)) ) / (p1 - p2)^2
Parameters
- Z_alpha/2 = 1.96 for 95% confidence.
- Z_beta = 0.84 for 80% power.
- p1 = baseline repeat rate, p2 = expected repeat rate under treatment.
Hypothetical example. Suppose baseline repeat rate p1 = 0.12. You aim for a 20 percent relative uplift, so p2 = 0.144. Plugging numbers gives an approximate n per group. Use a spreadsheet to compute exact values. If you cannot reach this n, either increase the effect you aim to detect or run a longer experiment.
Analysis checklist
- Pre-registered analysis window and KPIs.
- Confirm randomization integrity and no cross-contamination.
- Compute confidence intervals and p-values; report absolute and relative uplift and cost per incremental order.
- Check intermediate metrics like open and click rates to debug null results.
Production monitoring and maintenance playbook
Monitoring dashboard spec (minimum)
- Data completeness: percent of profiles with non-null customer_id; alert if < 99%.
- Join key health: percent null on customer_id or email after ETL; alert if > 10%.
- Feature null rates: alert if any key feature null rate exceeds 10%.
- Score distribution: daily histogram and percentiles; alert if median shifts by more than 0.5 standard deviations vs baseline.
- Label drift: proportion of labeled churn events per window; alert if shift > 15 percent relative.
- Sync failures: number of failed pushes to Klaviyo or Shopify; alert on any non-zero failures for more than one job cycle.
Remediation playbook
- Missing join keys: run mapping job to backfill with email as secondary key; pause write to downstream systems until mapping is fixed.
- Schema change detection: fail the job and notify owner; include previous and new schema sample in the alert.
- Drift detected: freeze rollout of new model or score and revert to previous stable version; run diagnostics to identify feature changes.
Retrain cadence and rollback policy
Start with monthly retrain windows for the first three months, then move to quarterly if drift is low. Version every model and keep a rollback flag that can reassign the last stable score to profiles if production issues occur.
Privacy and compliance checklist plus vendor DPA clause
Minimum privacy steps
- Minimize PII sent to vendors. Prefer hashed customer_id/email when possible.
- Document data flows and retention in an internal register.
- Maintain an auditable deletion workflow: map customer_id to vendor identifier and confirm deletion within agreed SLA.
- For SMS, ensure opt-in consent and maintain suppression lists synced to your ESP before campaigns.
Sample DPA clause (short form)
Vendor will process Personal Data only on Controller's documented instructions. Vendor will not combine Controller data with other customers' data. Vendor will implement reasonable technical and organizational measures to protect data. Vendor will, within 30 days of Controller request, return or delete Personal Data and confirm completion. Vendor will notify Controller within 72 hours of any confirmed data breach affecting Personal Data.
Note: adjust timeframes to match your legal policy. The clause above is a starting point for negotiation.
Sample deletion workflow
- Owner receives deletion request for customer_id X.
- Owner runs query to collect vendor identifiers and export to secure channel.
- Owner posts deletion request via vendor API and records request ID.
- Vendor confirms deletion; owner rechecks vendor dataset and finalizes audit log within SLA.
Scale path templates and hiring checklist
When to scale beyond the 90-day test
Choose Path B or C only if: the experiment shows positive incremental lift after costs, you have a data owner for ongoing ops, and you can support retraining and monitoring. If any of those are missing, iterate on rules instead.
Vendor selection cheat-sheet
- Vendor A (no-code): fastest launch, limited transparency, minimal engineering required.
- Warehouse-first provider: medium build, scores reusable, requires SQL and sync engineering.
- Bespoke agency: high cost, longer delivery, best when product complexity demands it.
Appendix: artifacts and quick reference
Notebook skeleton (Python pseudocode)
# connect to warehouse # extract features using SQL snippets # apply versioned feature transformations # load labels with cutoff date # split by cutoff for temporal validation # train simple logistic regression as baseline # evaluate and export scores to CSV
Common failure modes and quick fixes
- Shopify merged customers: detect duplicate emails and map to canonical customer_id weekly.
- Field rename in app export: keep a mapping layer and schema checksum job to catch changes within one job cycle.
- Unexpected nulls after a promo: check discount-related features and cap extreme values before modeling.
Final practical decision
Run the 90-day rule-based test if speed and low overhead matter. Move to warehouse-first ML only if you prove incremental lift and you have a data owner to operate the pipeline. Avoid bespoke models until the experiment shows payback that justifies ongoing ML ops cost.
