Production-ready AI cart abandonment email sequence for Shopify

Most teams trying to build an AI cart abandonment email sequence for Shopify do not have a creativity problem. They have a production safety problem.

Contents

Minimal runnable quickstart: ship a 1% canary in under a day
Webhook handlers: safe Shopify verification in Node and Python
Normalization and PII redaction: keep the model blind to identities
Decision service: provider specific model calls behind a thin abstraction
Schema enforcement and guardrails: overwrite the model when needed
ESP adapters: Klaviyo and Shopify Email
Idempotency and discount orchestration
Observability and alerts: what to watch and how to react
Privacy, DPA language, and DSAR operational scripts
A/B test design for revenue per recipient (RPR)
Operational rollout and rollback checklist
CI contract tests for model schema and normalization invariants
Suggested internal testbench and ownership
Practical decision: how to ship this safely

The failure pattern is predictable: someone glues a model call into a webhook, forwards full customer data including email, and starts issuing discounts with no idempotency. It works for a demo. It is unmaintainable in production.

This guide takes the opposite approach. Treat the model as a small, constrained decision engine behind a hard shell of schemas, rules, and monitoring. You get:

a minimal, runnable cart abandonment flow you can ship in hours or days
copy paste webhook verification in Node and Python, with replay protection
a normalization layer that strips PII before any model call
idempotent discount orchestration with a clear database schema
observability queries, alert ideas, and a rollback playbook
privacy language for DPAs and DSAR handling scripts

Prerequisites before you start:

Shopify admin access and API credentials
an ESP account such as Klaviyo or Shopify Email
access to at least one model provider (for example OpenAI or Anthropic)
a place to run a small service (serverless or container) and a CI pipeline

Minimal runnable quickstart: ship a 1% canary in under a day

The fastest credible way to ship is a thin vertical slice. One play. One email. One experiment flag at 1 percent of eligible traffic.

Step 1: clone the reference repo

If you do not already have a reference project, create a small, focused one. For illustration, imagine a repository with this layout:

shopify-ai-cart-abandon/
  README.md
  package.json
  src/
    index.ts
    webhook/
      shopifyCheckout.ts
    decision/
      normalize.ts
      modelClient.ts
      schema.ts
      guardrails.ts
    discounts/
      orchestrator.ts
      db.ts
    esp/
      klaviyo.ts
      shopifyEmail.ts
  python/
    webhook_fastapi.py
    verify_shopify.py
  migrations/
    001_create_idempotency_table.sql
  test/
    normalize.fixtures.json
    normalize.test.ts
    webhookVerify.test.ts
    contractModelSchema.test.ts
  .github/
    workflows/ci.yml

You can adapt this structure directly. The rest of the guide matches these filenames so you can copy paste with minimal edits.

Step 2: configure environment

Define configuration using environment variables, not hardcoded secrets:

SHOPIFY_API_KEY=<server side key>
SHOPIFY_API_SECRET=<shared secret for webhook HMAC>
OPENAI_API_KEY=<or ANTHROPIC_API_KEY etc>
ESP_PROVIDER=klaviyo
KLAVIYO_API_KEY=<optional>
DB_URL=postgres://...
ABANDONMENT_ENABLE_CANARY=true
ABANDONMENT_CANARY_PERCENT=1

Expose a single public HTTP endpoint for Shopify to call, for example /webhooks/shopify/checkout.

Step 3: wire the Shopify webhook

In Shopify admin:

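Create a webhook on the event that best matches your flow. Common options:
- checkouts/create and checkouts/update
- carts/update
- orders/create, used to detect that a checkout converted so you can suppress the sequence
Point it at your endpoint URL over HTTPS
Use JSON, and store the webhook signing secret as SHOPIFY_API_SECRET (for webhooks created in the admin, Shopify shows the signing secret in the notification settings; for app created webhooks it is the app's client secret)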

During the canary phase, keep the flow simple: trigger only when there is an email on the checkout and the checkout has been idle for at least a threshold, for example 30 minutes. The service enforces these rules; the webhook just delivers signals.
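The eligibility rule above can be sketched as a pure function. This is a minimal sketch, not a definitive implementation: the field names email, updated_at, and completed_at are assumptions about the checkout payload you receive, and isEligibleForAbandonmentFlow is a hypothetical helper name.

```typescript
// Sketch only: field names are assumptions about the payload you receive.
type CheckoutSignal = {
  email?: string | null;
  updated_at: string; // ISO 8601 timestamp of the last checkout change
  completed_at?: string | null; // set once the checkout becomes an order
};

const ABANDON_THRESHOLD_MINUTES = 30; // example threshold from the text

function isEligibleForAbandonmentFlow(
  checkout: CheckoutSignal,
  now: Date = new Date(),
  thresholdMinutes: number = ABANDON_THRESHOLD_MINUTES
): boolean {
  if (!checkout.email) return false; // no address, nothing to send
  if (checkout.completed_at) return false; // purchase finished, not abandoned
  const lastTouched = Date.parse(checkout.updated_at);
  if (Number.isNaN(lastTouched)) return false; // unparseable timestamp: fail closed
  const idleMinutes = (now.getTime() - lastTouched) / 60000;
  return idleMinutes >= thresholdMinutes;
}
```

Because the webhook fires when the checkout changes, not 30 minutes later, a check like this would typically run from a delayed job that re-reads the latest checkout state.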

Step 4: run tests, then deploy a small slice

Before you send real emails:

run the unit tests locally: npm test or pnpm test
use ngrok or similar to expose your local endpoint and test Shopify webhook delivery
confirm that invalid HMACs are rejected with 401 and valid ones are accepted
verify that raw email addresses never appear in model requests or in logs

Deploy to your target environment only after these pass. Start with ABANDONMENT_CANARY_PERCENT=1 for live traffic. The rollout section later explains how to graduate this to 10, 50, then 100 percent.
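One way to apply the canary percentage is deterministic bucketing, so the same checkout always lands in the same arm across retries. The sketch below assumes the checkout token is a stable string; canaryBucket and isInCanary are hypothetical helper names, while the two environment variables come from the configuration in Step 2.

```typescript
import { createHash } from "crypto";

// Maps a checkout token to a stable bucket in [0, 100).
function canaryBucket(checkoutToken: string): number {
  const digest = createHash("sha256").update(checkoutToken).digest();
  // First 4 bytes as an unsigned 32-bit integer, reduced to 0..99
  return digest.readUInt32BE(0) % 100;
}

function isInCanary(
  checkoutToken: string,
  enabled: boolean = process.env.ABANDONMENT_ENABLE_CANARY === "true",
  percent: number = Number(process.env.ABANDONMENT_CANARY_PERCENT ?? "0")
): boolean {
  if (!enabled || !Number.isFinite(percent)) return false;
  const clamped = Math.min(100, Math.max(0, percent));
  return canaryBucket(checkoutToken) < clamped;
}
```

Hashing instead of calling a random number generator means raising the percentage from 1 to 10 keeps the original 1 percent inside the larger group.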

Webhook handlers: safe Shopify verification in Node and Python

Incorrect HMAC handling is where many cart abandonment hooks go wrong. Signature verification must operate on the raw request body bytes, not a parsed object. It also needs timing safe comparison and some basic replay protection.

Node example with Express and raw body capture

Set up Express to store the raw body before any JSON parsing.

// src/index.ts
import express from "express";
import crypto from "crypto";
import { handleShopifyCheckout } from "./webhook/shopifyCheckout";

const app = express();

// raw body saver
app.use(
  express.json({
    verify: (req: any, res, buf) => {
      req.rawBody = buf;
    },
  })
);

function timingSafeEqual(a: Buffer, b: Buffer): boolean {
  if (a.length !== b.length) return false;
  return crypto.timingSafeEqual(a, b);
}

function verifyShopifyHmac(
  rawBody: Buffer,
  headerHmac: string | undefined,
  secret: string
): boolean {
  if (!headerHmac) return false;
  const digest = crypto
    .createHmac("sha256", secret)
    .update(rawBody)
    .digest("base64");

  const expected = Buffer.from(digest, "utf8");
  const provided = Buffer.from(headerHmac, "utf8");
  return timingSafeEqual(expected, provided);
}

app.post("/webhooks/shopify/checkout", async (req: any, res) => {
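  const hmacHeader = req.get("X-Shopify-Hmac-Sha256");
  const secret = process.env.SHOPIFY_API_SECRET;

  // Fail closed: an unset secret must never verify anything
  if (!secret) {
    console.error("SHOPIFY_API_SECRET is not configured");
    return res.status(500).send("Internal error");
  }

  // rawBody is missing when the request was not parsed as JSON
  if (!req.rawBody || !verifyShopifyHmac(req.rawBody, hmacHeader, secret)) {
    return res.status(401).send("Invalid signature");
  }

  // Shopify sends an id header with each webhook.
  // For replay protection, persist these ids and skip duplicates.
  const webhookId = req.get("X-Shopify-Webhook-Id");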

  try {
    await handleShopifyCheckout(req.body);
    return res.status(200).send("ok");
  } catch (err) {
    console.error("Webhook error", err);
    return res.status(500).send("Internal error");
  }
});

app.listen(3000, () => {
  console.log("Server on :3000");
});

export { verifyShopifyHmac };

Key points:

req.rawBody is the exact byte sequence Shopify signed
use base64 digest with HMAC SHA256, as Shopify expects
use constant time comparison to avoid timing side channels
log only high level webhook results, never the full payload

Python example with FastAPI

# python/verify_shopify.py
import base64
import hashlib
import hmac
from typing import Optional

def verify_shopify_hmac(raw_body: bytes,
                        header_hmac: Optional[str],
                        secret: str) -> bool:
    if not header_hmac:
        return False
    digest = hmac.new(
        key=secret.encode("utf-8"),
        msg=raw_body,
        digestmod=hashlib.sha256,
    ).digest()
    expected_b64 = base64.b64encode(digest)
    provided_b64 = header_hmac.encode("utf-8")
    if len(expected_b64) != len(provided_b64):
        return False
    return hmac.compare_digest(expected_b64, provided_b64)

# python/webhook_fastapi.py
from fastapi import FastAPI, Request, HTTPException
from verify_shopify import verify_shopify_hmac
import os
import json

app = FastAPI()

@app.post("/webhooks/shopify/checkout")
async def shopify_checkout(request: Request):
    raw_body = await request.body()
    header_hmac = request.headers.get("X-Shopify-Hmac-Sha256")
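    secret = os.environ.get("SHOPIFY_API_SECRET")
    if not secret:
        # Fail closed: an unset secret must never verify anything
        raise HTTPException(status_code=500, detail="Server misconfigured")

    if not verify_shopify_hmac(raw_body, header_hmac, secret):
        raise HTTPException(status_code=401, detail="Invalid signature")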

    payload = json.loads(raw_body)

    # Call into shared business logic (could reuse Node service patterns)
    # handle_shopify_checkout(payload)

    return {"status": "ok"}

For replay protection, you can:

store webhook ids from X-Shopify-Webhook-Id in a small table
reject repeats within a retention window, for example 24 hours
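The two replay protection steps above can be sketched as a small guard. This version keeps ids in memory purely for illustration; WebhookReplayGuard is a hypothetical class, and a deployment with more than one instance would need a shared table or cache instead.

```typescript
// In-memory sketch; use a shared store once several instances handle webhooks.
class WebhookReplayGuard {
  private seen = new Map<string, number>(); // webhook id -> first seen (ms)
  private retentionMs: number;

  constructor(retentionMs: number = 24 * 60 * 60 * 1000) {
    this.retentionMs = retentionMs;
  }

  // Returns true the first time an id is seen within the retention window.
  checkAndRecord(webhookId: string | undefined, nowMs: number = Date.now()): boolean {
    if (!webhookId) return false; // missing id: do not process
    this.evictExpired(nowMs);
    if (this.seen.has(webhookId)) return false;
    this.seen.set(webhookId, nowMs);
    return true;
  }

  private evictExpired(nowMs: number): void {
    this.seen.forEach((firstSeen, id) => {
      if (nowMs - firstSeen > this.retentionMs) this.seen.delete(id);
    });
  }
}
```

When checkAndRecord returns false for a repeat, a common choice is to acknowledge the request without reprocessing it, so the sender stops retrying.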

Normalization and PII redaction: keep the model blind to identities

The safest pattern is a three layer envelope:

minimize inputs and strip PII before inference
enforce JSON schema at prompt time and at runtime
override the model with business rules on output

The normalization layer sits between the webhook and the model. It creates derived features like cart value buckets and product category summaries, and it redacts or hashes any identifiers.

Normalization design

A practical normalized input shape to feed to the model might look like this. The later snippets assume this type is exported from src/decision/normalize.ts:

export type NormalizedCheckout = {
  hashed_customer_id: string | null;
  is_returning_customer: boolean;
  cart_item_count: number;
  cart_value_bucket: "low" | "medium" | "high";
  cart_currency: string;
  product_category_counts: Record<string, number>;
  time_since_last_order_hours: number | null;
  historical_discount_usage_rate_bucket: "none" | "low" | "medium" | "high";
  prior_complaint_flag: boolean;
  in_sale_segment: boolean;
  is_high_risk_segment: boolean;
};

normalize implementation with PII stripping

// src/decision/normalize.ts
import crypto from "crypto";

type ShopifyCheckoutPayload = {
  id: number;
  customer?: {
    id?: number;
    email?: string;
    tags?: string[];
    orders_count?: number;
  };
  line_items: {
    product_id: number;
    title: string;
    product_type: string;
    quantity: number;
    price: string;
    total_discount: string;
  }[];
  currency: string;
  subtotal_price: string;
  customer_locale?: string;
  // ... other fields not needed for the model
};

export function hashIdentifier(value: string | number | undefined | null) {
  if (value === undefined || value === null) return null;
  const str = String(value);
  return crypto.createHash("sha256").update(str).digest("hex");
}

export function normalizeCheckout(
  payload: ShopifyCheckoutPayload
): NormalizedCheckout {
  const subtotal = parseFloat(payload.subtotal_price || "0");
  const cart_value_bucket =
    subtotal < 50 ? "low" : subtotal < 200 ? "medium" : "high";

  const product_category_counts: Record<string, number> = {};
  let itemCount = 0;
  for (const item of payload.line_items) {
    const category = item.product_type || "unknown";
    product_category_counts[category] =
      (product_category_counts[category] || 0) + item.quantity;
    itemCount += item.quantity;
  }

  const customer = payload.customer;
  const is_returning_customer = (customer?.orders_count ?? 0) > 0;

  // In a real system these would come from your own data store
  const historical_discount_usage_rate_bucket = "low";
  const prior_complaint_flag = false;
  const in_sale_segment = false;
  const is_high_risk_segment = false;

  return {
    hashed_customer_id: hashIdentifier(customer?.id || customer?.email),
    is_returning_customer,
    cart_item_count: itemCount,
    cart_value_bucket,
    cart_currency: payload.currency,
    product_category_counts,
    time_since_last_order_hours: null,
    historical_discount_usage_rate_bucket,
    prior_complaint_flag,
    in_sale_segment,
    is_high_risk_segment,
  };
}

The model never sees any of:

email
name
address
full checkout id

The service that writes to your ESP keeps that mapping. That service does not use the model provider as a processor for PII. This separation is important for risk and for vendor contracts.
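One caveat with a plain SHA-256 of a customer id: numeric ids come from a small space, so anyone holding the hash can recover the id by enumerating candidates. A keyed hash is one option to consider. The sketch below is an assumption, not part of the original design: PSEUDONYM_KEY is a hypothetical secret name and pseudonymizeId a hypothetical replacement for hashIdentifier.

```typescript
import { createHmac } from "crypto";

// PSEUDONYM_KEY is a hypothetical secret name, not part of the original config.
function pseudonymizeId(
  value: string | number | null | undefined,
  key: string = process.env.PSEUDONYM_KEY ?? ""
): string | null {
  if (value === undefined || value === null) return null;
  if (key.length === 0) {
    // Refuse to fall back to an unkeyed hash
    throw new Error("Pseudonymization key is not configured");
  }
  return createHmac("sha256", key).update(String(value)).digest("hex");
}
```

The output stays deterministic for a given key, so lookups by hashed id keep working, but the mapping cannot be rebuilt without the key, which stays on your side of the boundary.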

PII redaction tests and fixtures

Create fixtures that include clear PII, then assert the normalized outputs do not carry it.

// test/normalize.fixtures.json
{
  "simple_checkout": {
    "id": 123,
    "customer": {
      "id": 999,
      "email": "alice@example.com",
      "orders_count": 3
    },
    "line_items": [
      {
        "product_id": 1,
        "title": "T shirt red",
        "product_type": "apparel",
        "quantity": 2,
        "price": "20.00",
        "total_discount": "0.00"
      }
    ],
    "currency": "USD",
    "subtotal_price": "40.00"
  }
}

// test/normalize.test.ts
import { normalizeCheckout, hashIdentifier } from "../src/decision/normalize";
import fixtures from "./normalize.fixtures.json";

describe("normalizeCheckout", () => {
  it("hashes customer identifiers and strips PII", () => {
    const payload: any = (fixtures as any).simple_checkout;
    const normalized = normalizeCheckout(payload);

    expect(normalized.hashed_customer_id).toBeDefined();
    expect(typeof normalized.hashed_customer_id).toBe("string");
    // hash must not match raw email or id
    expect(normalized.hashed_customer_id).not.toContain("alice");
    expect(normalized.hashed_customer_id).not.toBe("999");
  });

  it("creates stable buckets", () => {
    const payload: any = (fixtures as any).simple_checkout;
    const normalized = normalizeCheckout(payload);
    expect(normalized.cart_value_bucket).toBe("low");
    expect(normalized.cart_item_count).toBe(2);
    expect(normalized.product_category_counts["apparel"]).toBe(2);
  });
});

describe("hashIdentifier", () => {
  it("is deterministic", () => {
    const a = hashIdentifier("alice@example.com");
    const b = hashIdentifier("alice@example.com");
    expect(a).toBe(b);
  });
});
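The tests above check a single field. A broader check, sketched here under the assumption that you can list the sensitive raw values from each fixture, scans the whole serialized model input. findLeakedValues and containsEmailLikeText are hypothetical helper names, and the email pattern is deliberately loose.

```typescript
// Returns the sensitive values that appear verbatim in the serialized model input.
function findLeakedValues(modelInput: unknown, sensitiveValues: string[]): string[] {
  const serialized = JSON.stringify(modelInput).toLowerCase();
  return sensitiveValues.filter(
    (value) => value.length > 0 && serialized.includes(value.toLowerCase())
  );
}

// Loose pattern: something@something.something with no spaces or quotes
const EMAIL_PATTERN = /[^\s"@]+@[^\s"@]+\.[^\s"@]+/;

function containsEmailLikeText(modelInput: unknown): boolean {
  return EMAIL_PATTERN.test(JSON.stringify(modelInput));
}
```

Very short values such as a three digit id can match by chance inside a hex digest, so this kind of scan works best with emails, names, and address lines.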

Decision service: provider specific model calls behind a thin abstraction

Do not spread provider SDK calls all over your codebase. Keep a small abstraction that:

accepts a NormalizedCheckout
returns a strongly typed decision object
hides provider specific request shapes and error handling

Decision schema

// src/decision/schema.ts
export type PlayType = "remind_only" | "small_discount" | "large_discount";

export type Decision = {
  version: string;
  play: PlayType;
  discount_percentage: number | null;
  reason_code: string;
};

You enforce that shape through JSON schema validation and guardrails.

OpenAI example

// src/decision/modelClient.ts
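import OpenAI from "openai";
import { z } from "zod";
import { Decision } from "./schema";
// Assumes NormalizedCheckout is exported from normalize.ts
import { NormalizedCheckout } from "./normalize";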

const decisionSchema = z.object({
  version: z.string(),
  play: z.enum(["remind_only", "small_discount", "large_discount"]),
  discount_percentage: z.number().nullable(),
  reason_code: z.string(),
});

const openaiClient = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

export type Provider = "openai" | "anthropic";

export async function getDecision(
  normalized: NormalizedCheckout,
  provider: Provider = "openai"
): Promise<Decision> {
  if (provider === "openai") {
    return await getDecisionOpenAI(normalized);
  }
  if (provider === "anthropic") {
    return await getDecisionAnthropic(normalized);
  }
  throw new Error("Unsupported provider");
}

async function getDecisionOpenAI(
  normalized: NormalizedCheckout
): Promise<Decision> {
  const prompt = buildPrompt(normalized);

  // In the Responses API, structured output is configured under text.format
  const response = await openaiClient.responses.create({
    model: "gpt-4.1-mini",
    input: prompt,
    text: {
      format: {
        type: "json_schema",
        name: "cart_abandon_decision",
        strict: true,
        schema: {
          type: "object",
          additionalProperties: false,
          properties: {
            version: { type: "string" },
            play: {
              type: "string",
              enum: ["remind_only", "small_discount", "large_discount"],
            },
            discount_percentage: {
              anyOf: [{ type: "number" }, { type: "null" }],
            },
            reason_code: { type: "string" },
          },
          required: ["version", "play", "discount_percentage", "reason_code"],
        },
      },
    },
    temperature: 0.1,
  });

  // output_text is the SDK convenience accessor for the concatenated text output
  const outputText = response.output_text;

  let parsed: unknown;
  try {
    parsed = JSON.parse(outputText);
  } catch (err) {
    throw new Error("Model did not return valid JSON");
  }

  const result = decisionSchema.safeParse(parsed);
  if (!result.success) {
    throw new Error("Decision schema validation failed");
  }
  return result.data;
}

function buildPrompt(normalized: NormalizedCheckout): string {
  return `
You are a decision engine for cart abandonment emails.

Input is a JSON object with derived, non PII signals.

Decide:
- play: one of "remind_only", "small_discount", "large_discount"
- discount_percentage: null or a number between 5 and 25
- version: a static string "v1"
- reason_code: short code like "new_low_value", "loyal_high_value", "high_risk"

Business constraints:
- if in_sale_segment is true, play must be "remind_only" and discount_percentage must be null
- if is_high_risk_segment is true, play must be "remind_only"
- if historical_discount_usage_rate_bucket is "high", discount_percentage must be <= 10
- prefer "remind_only" for "low" cart_value_bucket and new customers

Return only JSON.

Input:
${JSON.stringify(normalized)}
`;
}

You keep temperature low to reduce run to run variation; it does not make the output fully deterministic. You also constrain the shape via the structured output format and zod validation before doing anything with the result.

Anthropic example

A similar wrapper for Anthropic, placed in the same modelClient.ts so it can reuse buildPrompt and decisionSchema, keeps the rest of your service unchanged.

import Anthropic from "@anthropic-ai/sdk";

const anthropicClient = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

async function getDecisionAnthropic(
  normalized: NormalizedCheckout
): Promise<Decision> {
  const prompt = buildPrompt(normalized);

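  // The Anthropic SDK exposes the Messages API: messages.create with max_tokens
  const msg = await anthropicClient.messages.create({
    model: "claude-3-5-sonnet-latest",
    max_tokens: 512,
    messages: [{ role: "user", content: prompt }],
  });

  // The response carries a list of content blocks; take the first text block
  const textBlock = msg.content.find((block) => block.type === "text");
  const outputText =
    textBlock && textBlock.type === "text" ? textBlock.text : "";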

  let parsed: unknown;
  try {
    parsed = JSON.parse(outputText);
  } catch {
    throw new Error("Anthropic response is not valid JSON");
  }

  const result = decisionSchema.safeParse(parsed);
  if (!result.success) {
    throw new Error("Decision schema validation failed");
  }
  return result.data;
}

You can switch providers at deployment time without touching business logic.
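Both wrappers throw on provider errors and schema failures, so the caller needs a safe default. A minimal sketch of one option follows: wrap the call with a timeout and return a static remind only decision on any failure. decideWithFallback, the 5 second default, and the fallback_model_unavailable reason code are assumptions for illustration.

```typescript
// Mirrors the Decision type from src/decision/schema.ts so the sketch is self-contained.
type Decision = {
  version: string;
  play: "remind_only" | "small_discount" | "large_discount";
  discount_percentage: number | null;
  reason_code: string;
};

const STATIC_FALLBACK: Decision = {
  version: "v1",
  play: "remind_only",
  discount_percentage: null,
  reason_code: "fallback_model_unavailable", // hypothetical reason code
};

async function decideWithFallback(
  callModel: () => Promise<Decision>,
  timeoutMs: number = 5000
): Promise<Decision> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("model call timed out")), timeoutMs);
  });
  try {
    // Whichever settles first wins: the model call or the timeout
    return await Promise.race([callModel(), timeout]);
  } catch {
    // Provider error, invalid JSON, schema failure, or timeout
    return { ...STATIC_FALLBACK };
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

A caller could use it as decideWithFallback(() => getDecision(normalized, provider)). The fallback never issues a discount, so a provider outage degrades to plain reminders.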

Schema enforcement and guardrails: overwrite the model when needed

Even with strict schemas, you should treat the model as a suggestion engine. The guardrail layer is where policy lives.

Guardrail implementation

// src/decision/guardrails.ts
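import { Decision } from "./schema";
// Assumes NormalizedCheckout is exported from normalize.ts
import { NormalizedCheckout } from "./normalize";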

type GuardrailContext = {
  normalized: NormalizedCheckout;
};

export function applyGuardrails(
  decision: Decision,
  ctx: GuardrailContext
): Decision {
  let result = { ...decision };

  // Hard business rules
  if (ctx.normalized.in_sale_segment) {
    result.play = "remind_only";
    result.discount_percentage = null;
    result.reason_code = "override_sale_segment";
  }

  if (ctx.normalized.is_high_risk_segment) {
    result.play = "remind_only";
    result.discount_percentage = null;
    result.reason_code = "override_high_risk";
  }

  // Range checks
  if (result.discount_percentage !== null) {
    if (result.discount_percentage < 0 || result.discount_percentage > 40) {
      result.play = "remind_only";
      result.discount_percentage = null;
      result.reason_code = "override_invalid_percentage";
    }
  }

  // Complaint protection
  if (ctx.normalized.prior_complaint_flag) {
    result.play = "remind_only";
    result.discount_percentage = null;
    result.reason_code = "override_prior_complaint";
  }

  return result;
}
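The prompt also states rules that the guardrails above do not re-check, such as the 10 percent cap for heavy discount users, and the schema allows combinations like remind_only with a percentage. A possible extra pass is sketched below; applyConsistencyGuardrails and its reason codes are hypothetical and would run alongside applyGuardrails.

```typescript
type Play = "remind_only" | "small_discount" | "large_discount";
type Decision = {
  version: string;
  play: Play;
  discount_percentage: number | null;
  reason_code: string;
};
type UsageBucket = "none" | "low" | "medium" | "high";

function applyConsistencyGuardrails(
  decision: Decision,
  usageBucket: UsageBucket
): Decision {
  const result: Decision = { ...decision };

  // A discount play without a percentage cannot be fulfilled: downgrade it
  if (result.play !== "remind_only" && result.discount_percentage === null) {
    result.play = "remind_only";
    result.reason_code = "override_missing_percentage";
  }

  // remind_only must never carry a percentage
  if (result.play === "remind_only" && result.discount_percentage !== null) {
    result.discount_percentage = null;
    result.reason_code = "override_remind_only_no_discount";
  }

  // Prompt rule enforced in code: heavy discount users are capped at 10 percent
  if (
    usageBucket === "high" &&
    result.discount_percentage !== null &&
    result.discount_percentage > 10
  ) {
    result.discount_percentage = 10;
    result.reason_code = "override_high_usage_cap";
  }

  return result;
}
```

Enforcing prompt rules in code means a prompt edit or a model change cannot silently relax them.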

Your decision pipeline now looks like:

normalize checkout
call model
validate JSON schema
apply guardrails and overrides
record final decision
orchestrate discount and send email
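The steps above can be sketched as one function with injected dependencies, which keeps the ordering explicit and easy to test. This is an illustrative shape only; PipelineDeps and runDecisionPipeline are hypothetical names, and each dependency stands in for the corresponding module in this guide.

```typescript
type PipelineDeps<Input, Normalized, Decision> = {
  normalize: (input: Input) => Normalized;
  callModel: (normalized: Normalized) => Promise<unknown>;
  validate: (raw: unknown) => Decision; // throws on schema mismatch
  applyGuardrails: (decision: Decision, normalized: Normalized) => Decision;
  record: (decision: Decision) => Promise<void>;
  fulfil: (decision: Decision) => Promise<void>; // discount plus email
};

async function runDecisionPipeline<I, N, D>(
  input: I,
  deps: PipelineDeps<I, N, D>
): Promise<D> {
  const normalized = deps.normalize(input);
  const raw = await deps.callModel(normalized);
  const validated = deps.validate(raw);
  const finalDecision = deps.applyGuardrails(validated, normalized);
  // Record before side effects so a crash mid-send still leaves an audit trail
  await deps.record(finalDecision);
  await deps.fulfil(finalDecision);
  return finalDecision;
}
```

Only the normalized input reaches callModel here; the raw payload stays with the caller, which matches the PII boundary described earlier.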

ESP adapters: Klaviyo and Shopify Email

Once you have a safe decision, you emit a single instruction to your ESP. A simple strategy is to map plays to templates or flows.

Klaviyo server side event example

// src/esp/klaviyo.ts
import fetch from "node-fetch";

type KlaviyoConfig = {
  apiKey: string;
};

type EspEvent = {
  profileEmail: string;
  event: string;
  properties: Record<string, any>;
};

export async function sendKlaviyoEvent(
  cfg: KlaviyoConfig,
  event: EspEvent
): Promise<void> {
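  const res = await fetch("https://a.klaviyo.com/api/events", {
    method: "POST",
    headers: {
      Authorization: `Klaviyo-API-Key ${cfg.apiKey}`,
      "Content-Type": "application/json",
      Accept: "application/json",
      // Klaviyo's current API expects a pinned revision date;
      // use the revision you tested against.
      revision: "2024-10-15",
    },
    body: JSON.stringify({
      data: {
        type: "event",
        attributes: {
          // metric and profile are nested JSON:API resources
          metric: {
            data: { type: "metric", attributes: { name: event.event } },
          },
          properties: event.properties,
          profile: {
            data: {
              type: "profile",
              attributes: { email: event.profileEmail },
            },
          },
        },
      },
    }),
  });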

  if (!res.ok) {
    const body = await res.text();
    throw new Error(`Klaviyo error ${res.status}: ${body}`);
  }
}

You would call this only from your trusted server side code, and only with PII there. The decision service gives you the play and discount; the ESP adapter uses your own customer id to look up the email in your database rather than forwarding email into model calls.

Shopify Email pattern

For Shopify Email, a practical pattern is to write metafields or tags on the customer or checkout, such as:

ai_abandonment_play: remind_only | small_discount | large_discount
ai_abandonment_decision_id: <uuid>

Your email flow can use those metafields as triggers or segmentation criteria.

Idempotency and discount orchestration

Duplicate discounts are one of the highest risk failure modes. Retries from Shopify, from your queue, or from manual replays can all issue multiple codes if you do not store decisions and discount mappings.

Database schema

Use a small idempotency table keyed by a deterministic hash.

-- migrations/001_create_idempotency_table.sql
CREATE TABLE IF NOT EXISTS discount_idempotency (
  idempotency_key VARCHAR(128) PRIMARY KEY,
  shopify_shop_domain VARCHAR(255) NOT NULL,
  checkout_token VARCHAR(255) NOT NULL,
  decision_hash VARCHAR(64) NOT NULL,
  price_rule_id VARCHAR(64) NOT NULL,
  discount_code VARCHAR(64) NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX IF NOT EXISTS idx_discount_idempotency_checkout
  ON discount_idempotency (shopify_shop_domain, checkout_token);

Key generation pattern

A simple key idea: sha256(shop_domain + ":" + checkout_token + ":" + decision_hash). The code below derives the decision hash from the final decision only; you can also fold in the normalized input if you want the key to change whenever the inputs change.

// src/discounts/db.ts
import crypto from "crypto";
import { Pool } from "pg";
import { Decision } from "../decision/schema";

const pool = new Pool({ connectionString: process.env.DB_URL });

export function buildDecisionHash(decision: Decision): string {
  const json = JSON.stringify(decision);
  return crypto.createHash("sha256").update(json).digest("hex");
}

export function buildIdempotencyKey(
  shopDomain: string,
  checkoutToken: string,
  decision: Decision
): string {
  const decisionHash = buildDecisionHash(decision);
  return crypto
    .createHash("sha256")
    .update(`${shopDomain}:${checkoutToken}:${decisionHash}`)
    .digest("hex");
}

export async function getExistingDiscount(
  idempotencyKey: string
): Promise<{ price_rule_id: string; discount_code: string } | null> {
  const res = await pool.query(
    "SELECT price_rule_id, discount_code FROM discount_idempotency WHERE idempotency_key = $1",
    [idempotencyKey]
  );
  if (res.rowCount === 0) return null;
  return res.rows[0];
}

export async function saveDiscount(
  idempotencyKey: string,
  shopDomain: string,
  checkoutToken: string,
  decisionHash: string,
  priceRuleId: string,
  discountCode: string
): Promise<void> {
  await pool.query(
    `INSERT INTO discount_idempotency
     (idempotency_key, shopify_shop_domain, checkout_token,
      decision_hash, price_rule_id, discount_code)
     VALUES ($1, $2, $3, $4, $5, $6)
     ON CONFLICT (idempotency_key) DO NOTHING`,
    [
      idempotencyKey,
      shopDomain,
      checkoutToken,
      decisionHash,
      priceRuleId,
      discountCode,
    ]
  );
}

Orchestration with 409 handling

If your Shopify API call to create a price rule or discount hits a conflict (for example you used a duplicate code), you fetch the mapping instead of retrying blindly.

// src/discounts/orchestrator.ts
import { Decision } from "../decision/schema";
import {
  buildDecisionHash,
  buildIdempotencyKey,
  getExistingDiscount,
  saveDiscount,
} from "./db";

type DiscountResult = {
  priceRuleId: string | null;
  discountCode: string | null;
};

export async function ensureDiscountForDecision(
  shopDomain: string,
  checkoutToken: string,
  decision: Decision
): Promise<DiscountResult> {
  if (decision.play === "remind_only" || decision.discount_percentage === null) {
    return { priceRuleId: null, discountCode: null };
  }

  const decisionHash = buildDecisionHash(decision);
  const key = buildIdempotencyKey(shopDomain, checkoutToken, decision);

  const existing = await getExistingDiscount(key);
  if (existing) {
    return {
      priceRuleId: existing.price_rule_id,
      discountCode: existing.discount_code,
    };
  }

  // Create price rule + discount code via Shopify Admin API
  // This is a simplified placeholder; in real code use the official SDK.
  const { priceRuleId, discountCode } =
    await createShopifyDiscount(shopDomain, decision.discount_percentage);

  // Persist idempotent mapping
  await saveDiscount(
    key,
    shopDomain,
    checkoutToken,
    decisionHash,
    priceRuleId,
    discountCode
  );

  return { priceRuleId, discountCode };
}

async function createShopifyDiscount(
  shopDomain: string,
  percentage: number
): Promise<{ priceRuleId: string; discountCode: string }> {
  // Pseudo code: use fetch or Shopify SDK
  // Handle 409 conflicts by reading existing rule if applicable.
  const priceRuleId = "pr_123";
  const discountCode = "SAVE10";
  return { priceRuleId, discountCode };
}

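This pattern makes sequential retries safe. If your worker processes the same checkout five times and the final decision is identical each time, the customer sees only one code. Two caveats remain: the key includes the decision hash, so a retry that yields a different decision produces a new key and a new code, and two workers that run concurrently can both miss the lookup and each create a discount. Keying on shop and checkout alone, and returning whichever row the database actually stored, closes both gaps.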
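One way to handle concurrent workers is to treat the insert as the source of truth and always return the stored row. The sketch below uses an in-memory store to stay self-contained; IdempotencyStore, InMemoryStore, and ensureDiscountOnce are hypothetical names, and a Postgres version would rely on the same insert with ON CONFLICT DO NOTHING followed by a read.

```typescript
type DiscountRow = { priceRuleId: string; discountCode: string };

interface IdempotencyStore {
  get(key: string): Promise<DiscountRow | null>;
  // Inserts only when the key is absent; always returns the row that is stored
  insertIfAbsent(key: string, row: DiscountRow): Promise<DiscountRow>;
}

class InMemoryStore implements IdempotencyStore {
  private rows = new Map<string, DiscountRow>();

  async get(key: string): Promise<DiscountRow | null> {
    return this.rows.get(key) ?? null;
  }

  async insertIfAbsent(key: string, row: DiscountRow): Promise<DiscountRow> {
    const existing = this.rows.get(key);
    if (existing) return existing;
    this.rows.set(key, row);
    return row;
  }
}

async function ensureDiscountOnce(
  key: string,
  create: () => Promise<DiscountRow>,
  store: IdempotencyStore
): Promise<DiscountRow> {
  const existing = await store.get(key);
  if (existing) return existing;
  const created = await create();
  // If another worker won the race, hand back its row so every caller
  // agrees on a single code for this key.
  return store.insertIfAbsent(key, created);
}
```

The losing worker's discount still exists in Shopify in this sketch, so a cleanup step or a reserve-before-create variant would be needed to avoid orphaned codes.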

Observability and alerts: what to watch and how to react

If you cannot see it, you cannot safely run it. You need metrics for errors, discount spend, play distribution, and deliverability.

Core metrics

webhook_verification_failure_rate: ratio of 401 responses on the webhook endpoint
model_call_error_rate: parse failures, schema rejects, provider errors
decision_play_distribution: percentage of plays per type per hour
discount_issue_rate: discounts created per minute
discount_spend_velocity: approximate gross discount amount per hour (using historical averages if you cannot compute exact numbers at first)
email_complaint_rate: complaints per thousand sends
unsubscribe_rate: unsubscribes per thousand sends

Example Prometheus style metrics

# increments per webhook request
counter shopify_webhook_requests_total{status="ok"}
counter shopify_webhook_requests_total{status="invalid_signature"}

# model
counter cart_ai_model_calls_total{provider="openai", outcome="success"}
counter cart_ai_model_calls_total{provider="openai", outcome="error"}
counter cart_ai_schema_failures_total

# decisions
counter cart_ai_decisions_total{play="remind_only"}
counter cart_ai_decisions_total{play="small_discount"}
counter cart_ai_decisions_total{play="large_discount"}

# discount orchestration
counter cart_ai_discounts_created_total
counter cart_ai_discounts_reused_total
counter cart_ai_discount_errors_total{type="shopify_409"}

# email outcomes (fed from ESP webhooks)
counter cart_ai_email_sent_total
counter cart_ai_email_complaint_total
counter cart_ai_email_unsub_total

Sample Prometheus alert ideas

High schema failure
Query example:
sum(rate(cart_ai_schema_failures_total[15m])) / sum(rate(cart_ai_model_calls_total[15m])) > 0.005
Effect: page or alert operators, automatically flip traffic to a static fallback if you can.
Discount spike
Query example:
sum(rate(cart_ai_discounts_created_total[5m])) > some_threshold
Effect: set the threshold from historical manual discount volumes. If triggered, freeze discount issuance and fall back to remind only.
Play distribution drift
Compare each play type's share of decisions over a recent window, for example the last hour, to a baseline. A simple heuristic is to alert if any play type more than doubles its share. This catches model drift or unintended prompt changes.
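The share comparison for play distribution drift can be sketched in a few lines. This assumes you can read per play decision counts for a baseline window and a recent window; toShares and driftedPlays are hypothetical helper names, and the factor of 2 mirrors the doubling heuristic.

```typescript
type PlayCounts = Record<string, number>;

// Converts raw counts into each play's share of the total.
function toShares(counts: PlayCounts): PlayCounts {
  const plays = Object.keys(counts);
  const total = plays.reduce((sum, play) => sum + counts[play], 0);
  const shares: PlayCounts = {};
  for (const play of plays) {
    shares[play] = total === 0 ? 0 : counts[play] / total;
  }
  return shares;
}

// Returns the plays whose recent share exceeds factor times the baseline share.
function driftedPlays(
  baseline: PlayCounts,
  recent: PlayCounts,
  factor: number = 2
): string[] {
  const base = toShares(baseline);
  const now = toShares(recent);
  return Object.keys(now).filter((play) => {
    const baseShare = base[play] ?? 0;
    // A play absent from the baseline counts as drift as soon as it appears
    return baseShare === 0 ? now[play] > 0 : now[play] > factor * baseShare;
  });
}
```

With a baseline of 80, 15, and 5 decisions and a recent window of 60, 15, and 25, only the third play is flagged, because its share moved from 5 percent to 25 percent.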

Datadog query examples

Webhook verification failure rate
sum:shopify_webhook_requests_total{status:invalid_signature}.rollup(sum, 300) / sum:shopify_webhook_requests_total{*}.rollup(sum, 300)
Complaint rate
1000 * sum:cart_ai_email_complaint_total.rollup(sum, 3600) / sum:cart_ai_email_sent_total.rollup(sum, 3600)

Attach a simple runbook to each alert that answers: what might cause this and what is the safe immediate action. The safe action is almost always to reduce traffic, freeze discount issuance, or route to a known good policy.

Privacy, DPA language, and DSAR operational scripts

You can do a lot to reduce risk with the design above, but you still need language in your vendor contracts and a plan for data subject requests.

Vendor and model provider DPA clauses

The goal is to keep model providers firmly as processors for non PII signals, not full customer records. You can adapt language such as:

Purpose limitation

Customer will send to Provider only pseudonymous identifiers and derived behavioral signals that do not directly identify an individual data subject. Provider will process such data solely to generate decision outputs for Customer’s cart abandonment workflows and for no other purpose.

Training and retention

Provider will not use Customer data to train or fine tune general purpose models. Provider will retain Customer data for no longer than is necessary to perform the contracted services and in any case no longer than [N] days, after which data will be deleted or irreversibly anonymized.

Subprocessing

Provider will not engage additional subprocessors that have access to Customer data without prior written notice and an opportunity for Customer to object where reasonable.

Internal data retention choices

For your own logs, a defensible pattern is:

do not log raw Shopify payloads in normal operation
log only hashed ids, decision summaries, and error codes
keep detailed decision logs for a short window, for example 30 days, to debug issues
aggregate metrics for longer, for example 12 months

DSAR handling script

Support teams need a clear script for data subject access and deletion requests that touch this system.

Access request

Confirm identity using your normal account verification process
Look up internal customer id from email
Query decision logs for records with the hashed identifier that matches this customer
Export:
- dates of cart abandonment decisions
- decision type (play)
- whether a discount was issued
Provide this summary to the customer in clear language

Deletion request

Confirm identity
Delete or anonymize decision log records that reference this customer hash, and discount_idempotency rows for this customer's checkout tokens (that table is keyed by checkout token, not by customer hash)
Ensure forwarding of deletion to ESP, for example remove profile from cart abandonment list in Klaviyo
Do not attempt to modify aggregate metrics that do not identify the person

A/B test design for revenue per recipient (RPR)

You should not roll AI driven sequences to everyone without measuring whether they help. A simple primary metric is revenue per recipient for the cart abandonment series.

Sample size example

Imagine:

baseline RPR from your existing non AI sequence is 1.20 in your currency
you hope for a lift of 0.15 (about 12.5 percent)
your historical standard deviation in RPR is about 3.50
you want a significance level of 0.05 and power of 0.8

A rough two sample t approximation for sample size per arm is:

n_per_arm ≈ 2 * (Z_0.975 + Z_0.8)^2 * sigma^2 / delta^2

Using Z values of about 1.96 and 0.84, sigma of 3.50, and delta of 0.15, you get 2 * 2.80^2 * 3.50^2 / 0.15^2 ≈ 8,537, so roughly 8,500 recipients per arm. Recompute this with your own baseline and variance numbers, but the example shows that you likely need thousands, not hundreds, of abandoned checkouts per arm to get a clean read.
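
To rerun the calculation with your own numbers, a small helper is enough. This is a sketch of the approximation above, not a replacement for a proper power analysis; the function name sampleSizePerArm and its default Z values are assumptions for illustration.

```typescript
// Approximate recipients needed per arm for a two sample comparison of means.
// Defaults correspond to a two sided significance level of 0.05 (z ≈ 1.96)
// and power of 0.8 (z ≈ 0.84).
function sampleSizePerArm(
  sigma: number, // standard deviation of revenue per recipient
  delta: number, // absolute lift you want to detect
  zAlpha = 1.96,
  zPower = 0.84,
): number {
  if (sigma <= 0 || delta <= 0) {
    throw new Error("sigma and delta must be positive");
  }
  const n = (2 * (zAlpha + zPower) ** 2 * sigma ** 2) / delta ** 2;
  return Math.ceil(n);
}

// The worked example from the text: sigma 3.50, delta 0.15.
console.log(sampleSizePerArm(3.5, 0.15)); // 8537
```

Because n scales with 1 / delta^2, halving the lift you want to detect roughly quadruples the required sample.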

Guardrail metrics for the test

Alongside RPR, track:

complaint rate compared to baseline
unsubscribe rate
discount spend per recipient

A possible decision rule:

promote the AI sequence if RPR lift is positive and significant, and guardrails are stable
reject if complaints or unsubscribes increase beyond a tolerable relative band, even if RPR improves
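
One way to make that rule explicit is to encode it as a function that your analytics owner can review. This is a minimal sketch: the TestReadout shape, the 10 percent relative band, and the keep_testing outcome are assumptions, and the p value is expected to come from your own stats tooling.

```typescript
interface ArmMetrics {
  complaintRate: number;
  unsubscribeRate: number;
}

interface TestReadout {
  rprLift: number; // treatment RPR minus control RPR
  pValue: number; // computed elsewhere by your stats tooling
  control: ArmMetrics;
  treatment: ArmMetrics;
}

type Verdict = "promote" | "reject" | "keep_testing";

// True when the treatment rate exceeds the control rate by more than the band.
function exceedsBand(treatment: number, control: number, band: number): boolean {
  if (control === 0) return treatment > 0;
  return (treatment - control) / control > band;
}

function evaluateTest(
  readout: TestReadout,
  alpha = 0.05,
  maxRelativeIncrease = 0.1, // assumed tolerable band, tune for your list
): Verdict {
  const guardrailBreach =
    exceedsBand(readout.treatment.complaintRate, readout.control.complaintRate, maxRelativeIncrease) ||
    exceedsBand(readout.treatment.unsubscribeRate, readout.control.unsubscribeRate, maxRelativeIncrease);

  // Guardrails win even if RPR improves.
  if (guardrailBreach) return "reject";
  if (readout.rprLift > 0 && readout.pValue < alpha) return "promote";
  return "keep_testing";
}
```

Checking guardrails first means a revenue win can never mask a rise in complaints.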

Operational rollout and rollback checklist

A clean rollout saves you from chasing ghosts later. Treat this like any other production change.

Pre launch

CI green:
- normalization tests pass
- webhook verification tests pass
- schema contract tests pass for each provider
secrets set in your deployment environment and not in code
Shopify webhook delivers to a staging or test endpoint without errors
ESP test profile receives expected template with manual triggers

Canary rollout steps

1 percent traffic
- limit to remind only play or a very small discount
- monitor errors, discount volume, and complaints daily
10 percent traffic
- enable full set of plays
- start A/B test against your current sequence
50 percent traffic
- only after you see stable metrics across at least one full weekday and weekend
100 percent traffic
- only after your A/B test shows a clear benefit or is neutral with no meaningful downside on guardrails
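
The traffic percentages above need a stable assignment, so the same checkout does not flip between the AI flow and the baseline on retries. A common approach is to hash a key into a fixed bucket. This is a sketch only: canaryBucket and inCanary are hypothetical names, and the key is assumed to be an already hashed customer or checkout identifier, not a raw email.

```typescript
import { createHash } from "node:crypto";

// Map a key to a stable bucket in the range 0..9999.
// The modulo bias on a 32 bit value is negligible at this resolution.
function canaryBucket(checkoutKey: string): number {
  const digest = createHash("sha256").update(checkoutKey).digest();
  return digest.readUInt32BE(0) % 10000;
}

// rolloutPercent is 0..100. Buckets below the cutoff go to the AI flow.
function inCanary(checkoutKey: string, rolloutPercent: number): boolean {
  return canaryBucket(checkoutKey) < rolloutPercent * 100;
}

// Example: route roughly 1 percent of traffic to the AI flow.
const useAiFlow = inCanary("hashed-checkout-id-123", 1);
console.log(useAiFlow ? "ai_flow" : "baseline_flow");
```

Because the cutoff only moves upward, anyone in the 1 percent canary stays in the AI arm at 10, 50, and 100 percent, which keeps the experience consistent across the rollout.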

Rollback criteria

Define explicit triggers for rollback before launch, for example:

schema failure rate exceeds a threshold for 15 minutes
discount creation rate spikes to more than a defined multiplier of historical manual rates
complaint or unsubscribe rate increases by more than a defined percentage for two consecutive days
critical webhook or discount errors stay elevated for more than a time window, for example 30 minutes

When any of these hit, your runbook steps might be:

flip feature flag to route all traffic to remind only with no discount
if issues persist, disable the AI flow entirely and revert to baseline campaign
capture logs and metrics around the incident window for later analysis
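
The triggers and the first two runbook steps can be written down as code so nobody has to interpret them during an incident. This is a minimal sketch: the HealthSnapshot and RollbackThresholds shapes, the mode names, and the example numbers in the comments are all assumptions to replace with your own.

```typescript
type Mode = "full_ai" | "remind_only" | "baseline";

interface HealthSnapshot {
  schemaFailureRate: number; // fraction of decisions failing validation over the last 15 minutes
  discountRateMultiplier: number; // current discount creation rate / historical manual rate
  complaintBreachDays: number; // consecutive days complaints or unsubscribes exceeded the band
  criticalErrorMinutes: number; // minutes that critical webhook or discount errors stayed elevated
}

interface RollbackThresholds {
  maxSchemaFailureRate: number; // for example 0.05
  maxDiscountRateMultiplier: number; // for example 3
  complaintBreachDaysLimit: number; // 2 in the text above
  maxCriticalErrorMinutes: number; // 30 in the text above
}

// Return the names of every trigger that has fired.
function trippedTriggers(s: HealthSnapshot, t: RollbackThresholds): string[] {
  const tripped: string[] = [];
  if (s.schemaFailureRate > t.maxSchemaFailureRate) tripped.push("schema_failure_rate");
  if (s.discountRateMultiplier > t.maxDiscountRateMultiplier) tripped.push("discount_spike");
  if (s.complaintBreachDays >= t.complaintBreachDaysLimit) tripped.push("complaints_or_unsubscribes");
  if (s.criticalErrorMinutes > t.maxCriticalErrorMinutes) tripped.push("critical_errors");
  return tripped;
}

// Step down one level per evaluation: full AI, then remind only, then baseline.
function nextMode(current: Mode, tripped: string[]): Mode {
  if (tripped.length === 0) return current;
  return current === "full_ai" ? "remind_only" : "baseline";
}
```

Stepping down one level at a time mirrors the runbook: remind only first, and the baseline campaign only if issues persist on the next evaluation.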

CI contract tests for model schema and normalization invariants

Every deployment should re verify that your model outputs and normalization still match the contract you expect.

Schema contract tests

Create synthetic normalized fixtures and feed them through a mocked model that returns sample responses for each provider. Assert that validation and guardrails accept or override each response as expected.

// test/contractModelSchema.test.ts
import { applyGuardrails } from "../src/decision/guardrails";
import { decisionSchema } from "../src/decision/modelClient";

describe("model decision contract", () => {
  it("accepts a valid decision", () => {
    const raw = {
      version: "v1",
      play: "small_discount",
      discount_percentage: 10,
      reason_code: "new_low_value",
    };
    const parsed = decisionSchema.parse(raw);
    expect(parsed.play).toBe("small_discount");
  });

  it("rejects extra fields", () => {
    const raw: any = {
      version: "v1",
      play: "small_discount",
      discount_percentage: 10,
      reason_code: "new_low_value",
      extra: "not_allowed",
    };
    expect(() => decisionSchema.parse(raw)).toThrow();
  });

  it("guardrails override invalid percentage", () => {
    const decision = {
      version: "v1",
      play: "small_discount" as const,
      discount_percentage: 90,
      reason_code: "test",
    };
    const normalized: any = {
      in_sale_segment: false,
      is_high_risk_segment: false,
      prior_complaint_flag: false,
    };
    const result = applyGuardrails(decision, { normalized });
    expect(result.play).toBe("remind_only");
    expect(result.discount_percentage).toBeNull();
  });
});

Normalization invariants

Add tests that fail if:

new PII fields leak into the normalized structure
bucket boundaries are changed without an explicit test update
hashIdentifier behavior changes unexpectedly
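
The three invariants above can be checked with a single helper that returns a list of failures. Everything in this sketch is a stand-in: the allowed key list, the bucket boundaries of 50 and 200, and this HMAC based hashIdentifier are assumptions for illustration, and in a real suite you would import your own normalizer and hashing code instead.

```typescript
import { createHmac } from "node:crypto";

// Hypothetical allowlist of fields the model may see. Any other key is a potential PII leak.
const ALLOWED_KEYS = [
  "customer_hash",
  "cart_value_bucket",
  "in_sale_segment",
  "is_high_risk_segment",
  "prior_complaint_flag",
];

// Hypothetical bucket function with boundaries at 50 and 200.
function cartValueBucket(value: number): "low" | "mid" | "high" {
  if (value < 50) return "low";
  if (value < 200) return "mid";
  return "high";
}

// Hypothetical stand-in for the real hashIdentifier.
function hashIdentifier(id: string, salt: string): string {
  return createHmac("sha256", salt).update(id).digest("hex");
}

function checkInvariants(normalized: Record<string, unknown>): string[] {
  const failures: string[] = [];

  // 1. No new fields may appear without an explicit allowlist change.
  const extra = Object.keys(normalized).filter((k) => !ALLOWED_KEYS.includes(k));
  if (extra.length > 0) failures.push(`unexpected keys: ${extra.join(", ")}`);

  // 2. Bucket boundaries are pinned, so any change must also update this table.
  const pinned: Array<[number, string]> = [[49.99, "low"], [50, "mid"], [199.99, "mid"], [200, "high"]];
  for (const [value, expected] of pinned) {
    if (cartValueBucket(value) !== expected) failures.push(`bucket changed for ${value}`);
  }

  // 3. Hash stays deterministic, salted, and hex encoded.
  const h = hashIdentifier("customer-1", "test-salt");
  const stable = h === hashIdentifier("customer-1", "test-salt");
  const salted = h !== hashIdentifier("customer-1", "other-salt");
  if (!/^[0-9a-f]{64}$/.test(h) || !stable || !salted) failures.push("hashIdentifier behavior changed");

  return failures;
}
```

In CI you would assert that checkInvariants returns an empty list for every fixture. For a stricter hash check, pin a golden value generated once in your own environment.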

Suggested internal testbench and ownership

Before this system sees real traffic, agree on who owns it and how you will test it on your own data.

Ownership

service owner: responsible for code, on call, and incident response
data or analytics owner: interprets A/B tests and drift metrics
privacy or legal owner: approves vendor DPAs and retention choices

Testbench ideas

A useful internal testbench can:

replay a sample of historical abandoned checkouts (with PII removed) through the decision service
inspect the distribution of plays
flag any decisions that would have violated current discount policies
let operators simulate guardrail changes and recompute outcomes

You can implement this as a separate script that reads from a file or staging database and calls the same decision code with logging enabled.
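
A replay script along those lines can be quite small. This is a sketch under stated assumptions: the NormalizedCheckout and Decision shapes are simplified, replay is a hypothetical name, and the decide function is passed in so you can plug in your real decision code or a stub.

```typescript
type Play = "remind_only" | "small_discount";

interface Decision {
  play: Play;
  discount_percentage: number | null;
}

// Simplified normalized fixture. PII is assumed to be removed already.
interface NormalizedCheckout {
  cart_value_bucket: string;
  prior_complaint_flag: boolean;
}

type DecideFn = (checkout: NormalizedCheckout) => Decision;

interface ReplayReport {
  playCounts: Record<string, number>;
  violations: Array<{ index: number; reason: string }>;
}

// Run every fixture through the decision function, tally plays,
// and flag decisions that exceed the current discount policy.
function replay(
  fixtures: NormalizedCheckout[],
  decide: DecideFn,
  maxDiscountPct: number,
): ReplayReport {
  const report: ReplayReport = { playCounts: {}, violations: [] };
  fixtures.forEach((fixture, index) => {
    const decision = decide(fixture);
    report.playCounts[decision.play] = (report.playCounts[decision.play] ?? 0) + 1;
    const pct = decision.discount_percentage;
    if (pct !== null && pct > maxDiscountPct) {
      report.violations.push({ index, reason: `discount ${pct} exceeds cap ${maxDiscountPct}` });
    }
  });
  return report;
}
```

To simulate a guardrail change, call replay again with a different maxDiscountPct or a modified decide function and compare the two reports.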

Practical decision: how to ship this safely

Choose a simple path:

If you need to move quickly and do not yet have deep ML experience, keep the AI decision set small, run synchronous calls with tight timeouts, and constrain plays to reminder vs small discount only. Get the plumbing right before you expand complexity.
If you expect high volume or tighter control requirements, invest in the asynchronous pattern with a queue and worker, add formal drift monitoring, and give your operators a control panel to adjust guardrails without changing code.
If you cannot keep someone on the hook to watch metrics and handle alerts, prefer a static rules based cart abandonment flow and revisit AI decisions later. An unmaintained AI sequence is worse than a well run manual one.

The winning pattern is not the fanciest model. It is the one that treats AI as a small component within a hardened system you can test, observe, and change on purpose.