Most teams trying “AI onboarding email sequences for SaaS” are stuck on one thing: they do not have a code-complete, reviewable slice they can actually ship.
- TL;DR: who this is for and what ships in a week
- What success looks like: KPIs and decision owners
- Repo quickstart: fork, run, and see a mock send
- Event contract and schema gates
- Prompt spec and versioning as code
- Generation layer: modern API usage with retries and token accounting
- Validator suite: layered checks with tests
- ESP integrations and safe sending
- Canary rollout and experimentation
- Observability: metrics, dashboards, and logs
- Deliverability warm up and DMARC / ESP guidance
- Privacy, legal, and compliance checklist
- Cost control and model economics
- Operator runbooks and escalation paths
- Tests, CI, and release flow for prompt changes
- Appendix: artifacts, author, and trust signals
- Next steps: choose your first slice and ship
This guide fixes that by walking you to a narrow, production-safe pipeline you can stand up in about a week: one onboarding email, one ESP, 1 percent canary, tested and observable.
TL;DR: who this is for and what ships in a week
Personas
- Growth PM or lifecycle marketer who owns activation and trial conversion
- Backend / infra engineer who owns events, APIs, and CI
- Deliverability specialist or email ops who owns domain reputation
- Privacy / legal reviewer who needs clear artifacts, not slides
Scope for week one
- 1 AI generated onboarding email template (for example, “Day 1: Welcome & first action”)
- Single ESP integration (SendGrid or Amazon SES)
- JSON event contract with tests
- Generation service with retry, token accounting, and idempotent ESP send
- Validator stack wired in: syntax, banned phrases, PII, semantic checks
- 1 percent canary rollout with automated decision rules
- Prometheus metrics, Grafana dashboard exports, alert rules
- Legal & deliverability artifacts: DPIA outline, DMARC example, footer templates
- CI pipeline for prompt changes with fixtures
Timeline (hypothetical, assuming one engineer + one marketer)
- Day 1–2 fork repo, hook up ESP sandbox, wire events and schema tests
- Day 3–4 prompts, validators, metrics, first end to end run to mock ESP
- Day 5–7 canary flag, dashboards, legal/deliverability review, limited live traffic
Everything below is written to support that thin slice. You can widen later.
What success looks like: KPIs and decision owners
You are not just “using a model.” You are changing a production communication channel. Treat it like a feature rollout.
Core KPIs
- Activation rate for the targeted onboarding step (for example, accounts that reach “Aha” event in 7 days)
- Complaint rate delta (ESP reported spam complaints vs baseline template)
- Bounce rate delta (hard/soft bounce rate vs baseline)
- Generation error rate (fraction of sends that fall back due to validation or model error)
- Token cost per email (prompt tokens and completion tokens, each multiplied by its own per-token price for your chosen model, since vendors typically price input and output separately)
All metrics should be sliced by cohort = control | ai_canary.
Decision owners
| Decision | Metric trigger | Primary owner | Consulted |
|---|---|---|---|
| Flip canary on/off | Complaint_rate_delta, bounce_rate_delta, generation_error_rate | Growth PM | Deliverability, infra |
| Revert prompt version | Activation drop or quality alerts | Growth PM | Infra |
| Pause all AI sends | PII incident or reputation risk | Deliverability / Security | Legal, PM |
| Approve new attributes for personalization | Data catalog / DPIA review | Privacy / Legal | PM, data |
Sample SLA and rollback ownership
- If complaint_rate_delta exceeds a configurable threshold (for example, two times baseline level) over a rolling window, deliverability can disable AI sends without waiting for PM approval.
- If generation_error_rate exceeds a threshold (for example, 5 percent of attempts) in a short window, infra switches traffic to deterministic templates and opens an incident.
- Prompt changes ship behind a feature flag; if activation does not improve within a predefined test window, PM reverts to the previous prompt version.
Repo quickstart: fork, run, and see a mock send
You want your growth PM to see a real generated email, with logs and metrics, within an hour. The structure below is designed for a public Git repo, a downloadable zip, and a Docker or Colab experience.
Suggested repo layout
ai-onboarding-pipeline/
  README.md
  docker/
    Dockerfile
    docker-compose.yml
  notebooks/
    power_check.ipynb
  src/
    config.py
    events/schema.py
    events/samples/
    prompts/
      onboarding_day1.yaml
    generation/
      client.py
      models.py
    validation/
      syntax.py
      banned_phrases.py
      pii.py
      semantic.py
    esp/
      sendgrid_client.py
      ses_client.py
      mock_esp.py
    rollout/
      cohorts.py
      canary_policy.py
    observability/
      metrics.py
      logging_config.py
  dashboards/
    grafana_onboarding.json
    prometheus_rules.yml
  legal_deliverability/
    dpia_outline.md
    dmarc_example.txt
    footers/
      base_footer_en.html
  .github/workflows/
    ci.yml
  tests/
    test_schema.py
    test_prompts.py
    test_validation.py
    test_esp.py
    test_rollout.py
Docker / docker compose quickstart
# docker/Dockerfile
FROM python:3.12-slim
WORKDIR /app
COPY . /app
RUN pip install --no-cache-dir -r requirements.txt
ENV PYTHONUNBUFFERED=1
CMD ["python", "-m", "src.esp.mock_esp"]
# docker/docker-compose.yml
version: "3.9"
services:
  pipeline:
    build:
      context: ..
      dockerfile: docker/Dockerfile
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - ESP_MODE=mock
    ports:
      - "8000:8000"
Colab / Jupyter demo flow
Structure your notebook to do this in order:
- Load a sample event payload from events/samples/trial_signup.json
- Run schema validation
- Call the generation module to produce subject + body
- Run validators and inspect failures
- Send to mock ESP endpoint and show sample response
- Display generated metrics from a short run
Event contract and schema gates
The event schema is the backbone. Most failures later in the pipeline start with messy or drifting events.
JSON Schema for onboarding event
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "OnboardingEmailContext",
  "type": "object",
  "required": ["user_id", "email", "plan", "signup_ts", "product_usage"],
  "properties": {
    "user_id": { "type": "string", "minLength": 1 },
    "email": { "type": "string", "format": "email" },
    "locale": { "type": "string", "pattern": "^[a-z]{2}(-[A-Z]{2})?$" },
    "plan": { "type": "string", "enum": ["free", "trial", "pro", "enterprise"] },
    "signup_ts": { "type": "string", "format": "date-time" },
    "product_usage": {
      "type": "object",
      "required": ["has_completed_tutorial", "events_last_24h"],
      "properties": {
        "has_completed_tutorial": { "type": "boolean" },
        "events_last_24h": { "type": "integer", "minimum": 0 }
      }
    },
    "consents": {
      "type": "object",
      "properties": {
        "email_marketing": { "type": "boolean" }
      }
    }
  },
  "additionalProperties": false
}
Python schema gate and test harness
# src/events/schema.py
from jsonschema import Draft202012Validator
import json
from pathlib import Path

SCHEMA_PATH = Path(__file__).with_name("onboarding_schema.json")
SCHEMA = json.loads(SCHEMA_PATH.read_text())
VALIDATOR = Draft202012Validator(SCHEMA)


def validate_event(payload: dict) -> None:
    errors = sorted(VALIDATOR.iter_errors(payload), key=lambda e: e.path)
    if errors:
        messages = [f"{'/'.join(map(str, e.path))}: {e.message}" for e in errors]
        raise ValueError(f"Event schema validation failed: {messages}")

# tests/test_schema.py
import json
from pathlib import Path

import pytest

from src.events.schema import validate_event


def load_sample(name: str) -> dict:
    return json.loads((Path(__file__).parents[1] / "src/events/samples" / name).read_text())


def test_valid_event_passes():
    payload = load_sample("trial_signup.json")
    validate_event(payload)


def test_missing_email_fails():
    payload = load_sample("trial_signup.json")
    payload.pop("email", None)
    with pytest.raises(ValueError):
        validate_event(payload)
CI job to block schema drift
In .github/workflows/ci.yml add a job that runs pytest tests/test_schema.py for any change under src/events/. Require this job for merge. Any incompatible change fails the pull request before it reaches production.
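The fixture tests catch payloads that stop validating, but they do not say why a schema edit is incompatible. As a rough complement, the sketch below compares two versions of the schema and lists changes that would reject previously valid events. The function name `find_breaking_changes` is hypothetical, and the check only inspects top-level `required`, `properties`, `enum`, and `type`; nested objects would need recursion.

```python
from typing import Any, Dict, List


def find_breaking_changes(old: Dict[str, Any], new: Dict[str, Any]) -> List[str]:
    """List reasons why `new` would reject payloads that `old` accepted (top level only)."""
    problems: List[str] = []
    old_props = old.get("properties", {})
    new_props = new.get("properties", {})

    # A newly required field breaks every producer that does not send it yet.
    for field in sorted(set(new.get("required", [])) - set(old.get("required", []))):
        problems.append(f"new required field: {field}")

    # With additionalProperties=false, dropping a property rejects old payloads.
    if new.get("additionalProperties") is False:
        for field in sorted(set(old_props) - set(new_props)):
            problems.append(f"removed property: {field}")

    for field in sorted(set(old_props) & set(new_props)):
        old_def, new_def = old_props[field], new_props[field]
        if old_def.get("type") != new_def.get("type"):
            problems.append(f"type changed for {field}")
        if "enum" in new_def:
            new_enum = set(new_def["enum"])
            if "enum" not in old_def:
                problems.append(f"enum added for {field}")
            elif not set(old_def["enum"]) <= new_enum:
                dropped = sorted(set(old_def["enum"]) - new_enum)
                problems.append(f"enum narrowed for {field}: {dropped}")
    return problems


# Example: one new required field and one removed plan value.
old_schema = {
    "required": ["user_id"],
    "properties": {"plan": {"type": "string", "enum": ["free", "trial"]}},
}
new_schema = {
    "required": ["user_id", "email"],
    "properties": {"plan": {"type": "string", "enum": ["free"]}},
}
changes = find_breaking_changes(old_schema, new_schema)
```

A CI step could load the schema from the base branch and the pull request branch, call this function, and fail when the list is non-empty unless the change is explicitly approved.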
Prompt spec and versioning as code
Prompts are code. Treat them like code.
Prompt spec structure
# src/prompts/onboarding_day1.yaml
version: "1.1.0"
status: "canary"  # canary | stable | archived
owner: "growth@example.com"
model_hint: "gpt-4.1-mini"
locale: "en-US"
input_contract:
  schema_ref: "events/onboarding_schema.json"
  fixture_inputs:
    - "events/samples/trial_signup.json"
style_guidelines:
  tone: "concise, practical, friendly, no hype"
  banned_phrases:
    - "limited time offer"
    - "act now"
  required_elements:
    - "one clear CTA link"
    - "short summary of feature value"
    - "preheader text"
system_prompt: |
  You write onboarding emails for a SaaS product.
  Constraints:
  - Do not make guarantees about uptime or security beyond what is given.
  - Respect locale and plan.
  - Avoid urgency or false scarcity.
user_template: |
  Write a welcome email for the following user context (JSON):
  {{ event_json }}
  Return JSON with keys: subject, preheader, html_body.
expected_output_checks:
  subject:
    max_length: 80
  html_body:
    must_include:
      - "<a "
      - "Get started"
Prompt fixtures and CI
# tests/test_prompts.py
import json
from pathlib import Path

import yaml

from src.generation.client import generate_email

SRC_DIR = Path("src")
PROMPTS_DIR = SRC_DIR / "prompts"


def test_prompt_fixtures_generate_valid_shape(monkeypatch):
    # Use a cheap stub model in CI (the generation layer must honor this flag)
    monkeypatch.setenv("MODEL_PROVIDER_MODE", "stub")
    for prompt_file in PROMPTS_DIR.glob("*.yaml"):
        spec = yaml.safe_load(prompt_file.read_text())
        for fixture in spec["input_contract"]["fixture_inputs"]:
            # Fixture paths in the spec are relative to src/ (for example
            # "events/samples/trial_signup.json"), so resolve them from there
            payload = json.loads((SRC_DIR / fixture).read_text())
            result = generate_email(spec, payload)
            assert set(result.keys()) == {"subject", "preheader", "html_body"}
            assert len(result["subject"]) <= spec["expected_output_checks"]["subject"]["max_length"]
Pull requests that change a prompt must pass these fixture tests before merging.
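The fixture test above only enforces the subject length, while the spec also declares `must_include` strings for `html_body`. One way to apply every rule under `expected_output_checks` is a small generic checker like the sketch below. The helper name `check_expected_output` is hypothetical, and it assumes the spec only uses the two rule types shown in the YAML (`max_length` and `must_include`).

```python
from typing import Any, Dict, List


def check_expected_output(spec: Dict[str, Any], result: Dict[str, str]) -> List[str]:
    """Return one message per violated rule in spec['expected_output_checks']."""
    failures: List[str] = []
    for field, rules in spec.get("expected_output_checks", {}).items():
        value = result.get(field, "")
        max_length = rules.get("max_length")
        if max_length is not None and len(value) > max_length:
            failures.append(f"{field}: longer than {max_length} characters")
        for needle in rules.get("must_include", []):
            if needle not in value:
                failures.append(f"{field}: missing {needle!r}")
    return failures


# Example using the same checks as the onboarding_day1 spec.
spec = {
    "expected_output_checks": {
        "subject": {"max_length": 80},
        "html_body": {"must_include": ["<a ", "Get started"]},
    }
}
bad = {"subject": "Welcome", "html_body": "<p>Welcome</p>"}
good = {"subject": "Welcome", "html_body": '<p><a href="#">Get started</a></p>'}
bad_failures = check_expected_output(spec, bad)
good_failures = check_expected_output(spec, good)
```

In the fixture test, asserting that the returned list is empty gives a readable failure message listing every missing element at once.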
Generation layer: modern API usage with retries and token accounting
This is how you interface with a model provider in 2026 without surprising costs or brittle behavior.
Model client abstraction
# src/generation/models.py
from dataclasses import dataclass
import logging
import os
import time

from openai import OpenAI  # official SDK

log = logging.getLogger(__name__)


@dataclass
class ModelResponse:
    content: str
    prompt_tokens: int
    completion_tokens: int
    model: str
    latency_ms: float


class ModelClient:
    def __init__(self, model: str):
        self.model = model
        self.client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

    def generate(self, system_prompt: str, user_prompt: str) -> ModelResponse:
        # perf_counter is monotonic, so latency is not skewed by clock changes
        start = time.perf_counter()
        response = self.client.responses.create(
            model=self.model,
            input=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt},
            ],
            max_output_tokens=800,
            temperature=0.4,
        )
        latency_ms = (time.perf_counter() - start) * 1000
        # output_text aggregates the text parts of the response; indexing
        # output[0] breaks when the first output item is not a message
        out = response.output_text
        usage = response.usage
        log.info(
            "model.generate",
            extra={
                "model": self.model,
                "prompt_tokens": usage.input_tokens,
                "completion_tokens": usage.output_tokens,
                "latency_ms": latency_ms,
            },
        )
        return ModelResponse(
            content=out,
            prompt_tokens=usage.input_tokens,
            completion_tokens=usage.output_tokens,
            model=self.model,
            latency_ms=latency_ms,
        )
Generation with retry and idempotency hook
# src/generation/client.py
import json
import logging
import pathlib
from typing import Any, Dict

import openai
import yaml
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

from .models import ModelClient
from src.validation.pipeline import validate_generated_email

log = logging.getLogger(__name__)

# Retry only provider hiccups and malformed JSON output.
# Validation failures are not transient, so they propagate immediately.
RETRYABLE = (
    openai.APIConnectionError,
    openai.APITimeoutError,
    openai.RateLimitError,
    openai.InternalServerError,
    json.JSONDecodeError,
)


def load_prompt_spec(path: str) -> Dict[str, Any]:
    return yaml.safe_load(pathlib.Path(path).read_text())


def make_idempotency_key(user_id: str, template_id: str) -> str:
    return f"{template_id}:{user_id}"


@retry(wait=wait_exponential(multiplier=0.5, min=1, max=8),
       stop=stop_after_attempt(3),
       retry=retry_if_exception_type(RETRYABLE),
       reraise=True)  # surface the original exception instead of RetryError
def generate_email(prompt_spec: Dict[str, Any], event: Dict[str, Any]) -> Dict[str, str]:
    system_prompt = prompt_spec["system_prompt"]
    user_template = prompt_spec["user_template"]
    user_prompt = user_template.replace("{{ event_json }}", json.dumps(event, sort_keys=True))
    client = ModelClient(prompt_spec["model_hint"])
    resp = client.generate(system_prompt, user_prompt)
    try:
        parsed = json.loads(resp.content)
    except json.JSONDecodeError as e:
        log.warning(
            "generation.invalid_json",
            extra={"error": str(e), "raw": resp.content[:300]},
        )
        raise
    email = {
        "subject": parsed.get("subject", "").strip(),
        "preheader": parsed.get("preheader", "").strip(),
        "html_body": parsed.get("html_body", ""),
    }
    validate_generated_email(email, event, resp)
    return email
The retry decorator covers transient provider errors and malformed JSON output. Validation failures raise immediately so the caller can fall back to a deterministic template. Idempotency is enforced at the ESP layer, keyed on a stable combination of template id and user id.
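When generation still fails after retries, or validation rejects the output, the send should fall back to a deterministic template rather than be dropped. A minimal sketch of that wrapper follows. `generate_or_fallback`, `FALLBACK_EMAIL`, and the `cta_url` parameter are hypothetical names for illustration; the returned source label is what you would count to compute the generation error rate KPI.

```python
import logging
from typing import Any, Callable, Dict, Tuple

log = logging.getLogger(__name__)

# Hypothetical deterministic template used when the AI path fails.
FALLBACK_EMAIL = {
    "subject": "Welcome aboard",
    "preheader": "Your first step takes about two minutes.",
    "html_body": '<p>Welcome!</p><p><a href="{cta_url}">Get started</a></p>',
}


def generate_or_fallback(
    generate: Callable[[Dict[str, Any]], Dict[str, str]],
    event: Dict[str, Any],
    cta_url: str,
) -> Tuple[Dict[str, str], str]:
    """Return (email, source) where source is 'ai' or 'fallback'."""
    try:
        return generate(event), "ai"
    except Exception as exc:
        # Any failure on the AI path (provider, JSON, validation) ends here.
        log.warning("generation.fallback", extra={"error": str(exc)})
        email = dict(FALLBACK_EMAIL)
        email["html_body"] = email["html_body"].format(cta_url=cta_url)
        return email, "fallback"


# Example: a generator that always fails triggers the fallback path.
def always_fails(event: Dict[str, Any]) -> Dict[str, str]:
    raise ValueError("Subject too long")


email, source = generate_or_fallback(
    always_fails, {"user_id": "u_123"}, "https://example.com/start"
)
```

In the real pipeline, `generate` would be a closure over `generate_email` and the loaded prompt spec, and the `source` value would be attached to the send log and metrics.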
Token accounting
Store measured prompt_tokens and completion_tokens as Prometheus histograms and in per-send logs. Vendors typically price input and output tokens separately, so cost per email is:
cost_per_email = avg_prompt_tokens * input_price_per_token + avg_completion_tokens * output_price_per_token
Use hypothetical ranges while planning capacity. For instance, suppose you see 300 prompt tokens and 250 completion tokens per email. Apply your vendor's per-1k-token prices to each figure, then multiply by expected monthly email volume.
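The arithmetic can be sketched in a few lines. The prices below are placeholders chosen only to make the numbers easy to follow, not any vendor's actual rates, and the function names are hypothetical.

```python
def cost_per_email(
    prompt_tokens: float,
    completion_tokens: float,
    input_price_per_1k: float,
    output_price_per_1k: float,
) -> float:
    """Cost of one generated email, with input and output priced separately."""
    return (prompt_tokens / 1000) * input_price_per_1k + (
        completion_tokens / 1000
    ) * output_price_per_1k


def monthly_cost(emails_per_month: int, per_email: float) -> float:
    return emails_per_month * per_email


# Placeholder prices (NOT real vendor rates): 0.001 per 1k input tokens,
# 0.002 per 1k output tokens, with the token counts from the text.
per_email = cost_per_email(300, 250, input_price_per_1k=0.001, output_price_per_1k=0.002)
per_month = monthly_cost(100_000, per_email)
# per_email is about 0.0008 and per_month about 80, in the vendor's currency
```

Retries multiply this figure, so it is worth feeding the measured number of model calls per send into the same formula rather than assuming one call per email.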
Validator suite: layered checks with tests
Validation is where you keep AI from hurting your reputation. Use a layered “validation pyramid” and emit metrics at each layer.
Syntactic validators
# src/validation/syntax.py
from typing import Dict


def check_lengths(email: Dict[str, str]) -> None:
    if len(email["subject"]) > 80:
        raise ValueError("Subject too long")
    if len(email["html_body"]) > 12000:
        raise ValueError("HTML body too long")


def check_basic_html(email: Dict[str, str]) -> None:
    body = email["html_body"]
    if "<script" in body.lower():
        raise ValueError("Script tags not allowed")
Banned phrase engine
# src/validation/banned_phrases.py
from typing import Dict, List

DEFAULT_BANNED = [
    "100% guaranteed",
    "act now",
    "risk-free",
]


def check_banned_phrases(email: Dict[str, str],
                         extra_banned: List[str] | None = None) -> None:
    phrases = set(DEFAULT_BANNED + (extra_banned or []))
    haystack = (email["subject"] + " " + email["html_body"]).lower()
    hits = [p for p in phrases if p.lower() in haystack]
    if hits:
        raise ValueError(f"Banned phrases detected: {hits}")
Keep this list small and opinionated. Let marketers tune it per template instead of global hardcoding.
PII detection: regex + NER
# src/validation/pii.py
import re
from typing import Dict

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s\-]{7,}\d")


def detect_pii(text: str) -> dict:
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
    }


def check_pii(email: Dict[str, str], user: Dict[str, str]) -> None:
    body = email["html_body"]
    pii = detect_pii(body)
    # Compare case-insensitively: the model may change the casing of an address
    user_email = (user.get("email") or "").lower()
    # Allow the user's own email in body if that is consistent with your template style
    found_emails = [e for e in pii["emails"] if e.lower() != user_email]
    if found_emails:
        raise ValueError("Unexpected email addresses in output")
For higher accuracy, add a small local NER model or hosted classifier and treat it as an extra signal. Regex covers many obvious incidents with low overhead. For logs, use a redact before log pattern so PII does not land in plain text.
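The redact-before-log pattern can be sketched with the standard library alone: a `logging.Filter` that rewrites the message and string arguments before any handler sees them. The class name `RedactingFilter` and the placeholder tokens are hypothetical; the two regexes are the same ones used in the PII validator.

```python
import logging
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s\-]{7,}\d")


def redact(text: str) -> str:
    """Replace email addresses and phone-like digit runs with placeholders."""
    text = EMAIL_RE.sub("[email]", text)
    return PHONE_RE.sub("[phone]", text)


class RedactingFilter(logging.Filter):
    """Redact the log message and any string positional args in place."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(str(record.msg))
        if isinstance(record.args, tuple):
            record.args = tuple(
                redact(arg) if isinstance(arg, str) else arg for arg in record.args
            )
        return True  # never drop the record, only rewrite it


# Example: attach the filter to a handler so every record passes through it.
handler = logging.StreamHandler()
handler.addFilter(RedactingFilter())
logger = logging.getLogger("onboarding.redaction_demo")
logger.addHandler(handler)
logger.warning("validation failed for %s", "jane.doe@example.com")

sample = redact("Contact jane.doe@example.com or +1 415 555 0100")
```

This sketch does not touch fields passed through `extra`, so structured fields need the same treatment in your formatter. The phone pattern is broad and also matches date-like strings such as 2026-02-01, so apply it to free text rather than to structured timestamps.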
Semantic checks
# src/validation/semantic.py
from typing import Dict
from dataclasses import dataclass


@dataclass
class SemanticResult:
    toxicity_score: float
    off_policy: bool


def semantic_guardrails(email: Dict[str, str]) -> SemanticResult:
    # Placeholder: call your safety classifier here.
    # In tests, stub this so CI is deterministic.
    return SemanticResult(toxicity_score=0.0, off_policy=False)


def check_semantic(email: Dict[str, str]) -> None:
    result = semantic_guardrails(email)
    if result.off_policy:
        raise ValueError("Semantic safety violation")
Validation pipeline and metrics
# src/validation/pipeline.py
from typing import Dict
import logging

from .syntax import check_lengths, check_basic_html
from .banned_phrases import check_banned_phrases
from .pii import check_pii
from .semantic import check_semantic
from src.observability.metrics import VALIDATION_COUNTER

log = logging.getLogger(__name__)


def validate_generated_email(email: Dict[str, str],
                             event: Dict[str, str],
                             model_resp) -> None:
    layers = [
        ("syntax", lambda: (check_lengths(email), check_basic_html(email))),
        ("banned_phrases", lambda: check_banned_phrases(email)),
        ("pii", lambda: check_pii(email, event)),
        ("semantic", lambda: check_semantic(email)),
    ]
    for name, fn in layers:
        try:
            fn()
            VALIDATION_COUNTER.labels(layer=name, status="pass").inc()
        except Exception as e:
            VALIDATION_COUNTER.labels(layer=name, status="fail").inc()
            log.warning("validation.failed", extra={"layer": name, "error": str(e)})
            raise
Validator tests
# tests/test_validation.py
import pytest

from src.validation.syntax import check_lengths
from src.validation.banned_phrases import check_banned_phrases


def test_subject_length_violation():
    email = {"subject": "x" * 200, "html_body": "<p>hi</p>"}
    with pytest.raises(ValueError):
        check_lengths(email)


def test_banned_phrase_detected():
    email = {"subject": "Act now", "html_body": "<p>hi</p>"}
    with pytest.raises(ValueError):
        check_banned_phrases(email)
Use fixtures for borderline cases and involve marketing to tune false positives over time.
ESP integrations and safe sending
Your model is only half the story. ESP idempotency and error handling prevent double sends and broken campaigns.
SendGrid integration with idempotency
# src/esp/sendgrid_client.py
import os
import logging
import requests
from typing import Dict

log = logging.getLogger(__name__)
SENDGRID_API_URL = "https://api.sendgrid.com/v3/mail/send"


class SendGridClient:
    def __init__(self):
        self.api_key = os.environ["SENDGRID_API_KEY"]

    def send_email(self, email: Dict[str, str],
                   to_email: str,
                   idempotency_key: str) -> Dict:
        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json",
            # Assumption: confirm in your ESP's documentation that this header
            # is honored. If it is not, it only aids tracing, and duplicates
            # must be prevented on your side before this method is called.
            "Idempotency-Key": idempotency_key,
        }
        payload = {
            "personalizations": [{"to": [{"email": to_email}]}],
            # Fail loudly on a missing sender instead of posting a null address
            "from": {"email": os.environ["FROM_EMAIL"]},
            "subject": email["subject"],
            "content": [{"type": "text/html", "value": email["html_body"]}],
        }
        resp = requests.post(SENDGRID_API_URL, json=payload, headers=headers, timeout=10)
        if resp.status_code not in (200, 202):
            log.error(
                "sendgrid.send_failed",
                extra={"status": resp.status_code, "body": resp.text[:300]},
            )
            raise RuntimeError("SendGrid send failed")
        log.info(
            "sendgrid.send_success",
            extra={"idempotency_key": idempotency_key, "status": resp.status_code},
        )
        return {"status": resp.status_code}
SES example
# src/esp/ses_client.py
import os
import logging
from typing import Dict

import boto3

log = logging.getLogger(__name__)


class SESClient:
    def __init__(self):
        self.client = boto3.client("ses", region_name=os.environ.get("AWS_REGION"))

    def send_email(self, email: Dict[str, str],
                   to_email: str,
                   idempotency_key: str) -> Dict:
        params = {
            "Source": os.environ["FROM_EMAIL"],
            "Destination": {"ToAddresses": [to_email]},
            "Message": {
                "Subject": {"Data": email["subject"]},
                "Body": {"Html": {"Data": email["html_body"]}},
            },
        }
        # boto3 rejects None for string parameters, so pass the configuration
        # set only when one is configured
        config_set = os.environ.get("SES_CONFIG_SET")
        if config_set:
            params["ConfigurationSetName"] = config_set
        resp = self.client.send_email(**params)
        message_id = resp.get("MessageId")
        log.info(
            "ses.send_success",
            extra={
                "idempotency_key": idempotency_key,
                "message_id": message_id,
            },
        )
        return {"message_id": message_id}
ESP tests
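# tests/test_esp.py
import pytest

from src.esp.sendgrid_client import SendGridClient


def test_sendgrid_handles_non_2xx(mocker, monkeypatch):
    # The client reads these from the environment; `mocker` needs pytest-mock
    monkeypatch.setenv("SENDGRID_API_KEY", "test-key")
    monkeypatch.setenv("FROM_EMAIL", "noreply@example.com")
    client = SendGridClient()
    mock_post = mocker.patch("src.esp.sendgrid_client.requests.post")
    mock_post.return_value.status_code = 500
    mock_post.return_value.text = "error"
    email = {"subject": "Hi", "html_body": "<p>hi</p>"}
    # pytest.raises fails the test when no exception is raised; a bare
    # try/except would pass silently in that case
    with pytest.raises(RuntimeError):
        client.send_email(email, "user@example.com", "key123")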
Wire idempotency keys through your end to end span and store them alongside ESP message ids for later reconciliation.
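Because not every ESP deduplicates on a request header, it helps to enforce the key locally as well. The sketch below uses SQLite as a stand-in ledger: a key can be claimed once, and the ESP message id is stored next to it for reconciliation. `SendLedger` and its method names are hypothetical, and a production system would more likely use the primary database or a shared store.

```python
import sqlite3
from typing import Optional


class SendLedger:
    """Tracks which idempotency keys have been sent and their ESP message ids."""

    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS sends ("
            "idempotency_key TEXT PRIMARY KEY, esp_message_id TEXT)"
        )

    def claim(self, key: str) -> bool:
        """Return True only for the first caller that claims this key."""
        with self.conn:  # commits on success, rolls back on error
            cur = self.conn.execute(
                "INSERT OR IGNORE INTO sends (idempotency_key) VALUES (?)", (key,)
            )
        return cur.rowcount == 1

    def record_message_id(self, key: str, message_id: str) -> None:
        with self.conn:
            self.conn.execute(
                "UPDATE sends SET esp_message_id = ? WHERE idempotency_key = ?",
                (message_id, key),
            )

    def message_id(self, key: str) -> Optional[str]:
        row = self.conn.execute(
            "SELECT esp_message_id FROM sends WHERE idempotency_key = ?", (key,)
        ).fetchone()
        return row[0] if row else None


# Example: the second claim for the same key is refused, so no double send.
ledger = SendLedger()
first = ledger.claim("onboarding_day1:u_123")
second = ledger.claim("onboarding_day1:u_123")
ledger.record_message_id("onboarding_day1:u_123", "msg-001")
```

Claiming before the send gives at-most-once delivery: if the process crashes between the claim and the ESP call, the email is never sent. Whether that or an occasional duplicate is the better failure mode is a product decision.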
Canary rollout and experimentation
You do not flip all onboarding traffic to AI in one go. You start with a small, stable canary.
Cohort assignment with stable hashing
# src/rollout/cohorts.py
import hashlib


def assign_cohort(user_id: str, experiment_name: str, canary_percent: float) -> str:
    """Stable assignment; canary_percent is a percentage, so 1.0 means 1 percent."""
    key = f"{experiment_name}:{user_id}"
    h = hashlib.sha256(key.encode("utf-8")).hexdigest()
    # First 32 bits of the hash, scaled to a bucket in [0, 100)
    bucket = int(h[:8], 16) / 2**32 * 100
    return "ai_canary" if bucket < canary_percent else "control"
Automated canary decision rules
Implement a daily job that reads metrics and applies a simple rule set such as the following matrix, using hypothetical thresholds:
- If complaint_rate_delta > threshold for 2 consecutive days, set canary_percent to 0 and revert prompt version.
- If bounce_rate_delta > threshold, restrict to known engaged users or pause.
- If activation_delta is positive and safety metrics are stable, gradually raise canary_percent.
Map each outcome to a runbook step so operators know which toggle to flip.
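The rule set above can be expressed as a small pure function, which keeps the daily job easy to test. This is a sketch under stated assumptions: the dataclass fields, the action labels, the doubling step, and the 50 percent cap are all hypothetical, and `canary_percent` is expressed as a percentage (1.0 means 1 percent).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DailyMetrics:
    complaint_rate_delta: float
    bounce_rate_delta: float
    activation_delta: float


@dataclass
class Decision:
    canary_percent: float
    action: str


def decide(
    history: List[DailyMetrics],
    current_percent: float,
    complaint_threshold: float,
    bounce_threshold: float,
    step: float = 2.0,
    max_percent: float = 50.0,
) -> Decision:
    """Apply the canary rules to the most recent days (last item is today)."""
    if not history:
        return Decision(current_percent, "hold")
    today = history[-1]
    last_two = history[-2:]
    # Rule 1: complaints above threshold for 2 consecutive days.
    if len(last_two) == 2 and all(
        d.complaint_rate_delta > complaint_threshold for d in last_two
    ):
        return Decision(0.0, "disable_and_revert_prompt")
    # Rule 2: bounce delta above threshold.
    if today.bounce_rate_delta > bounce_threshold:
        return Decision(current_percent, "restrict_or_pause")
    # Rule 3: activation up and complaints within bounds, so widen gradually.
    if today.activation_delta > 0 and today.complaint_rate_delta <= complaint_threshold:
        return Decision(min(current_percent * step, max_percent), "raise")
    return Decision(current_percent, "hold")


# Example: two bad complaint days in a row turn the canary off.
bad_days = [DailyMetrics(0.3, 0.0, 0.01), DailyMetrics(0.4, 0.0, 0.01)]
decision = decide(bad_days, 1.0, complaint_threshold=0.2, bounce_threshold=0.5)
```

Each `action` label can then map one-to-one to a runbook step and a feature flag change.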
Power check notebook
In notebooks/power_check.ipynb, parameterize:
- Baseline activation rate
- Desired relative lift
- Significance level (alpha)
- Power (1 minus beta)
Use a standard two proportion test formula to estimate required sample sizes. The aim is not perfect statistics but to avoid tests with so little traffic that you draw false comfort from noise.
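One common form of the two proportion sample size calculation uses the normal approximation with a two-sided test. The sketch below implements that form with the standard library; the function name is hypothetical, and a dedicated statistics package may use slightly different corrections.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(
    baseline_rate: float,
    relative_lift: float,
    alpha: float = 0.05,
    power: float = 0.8,
) -> int:
    """Users needed in EACH arm to detect the lift (two-sided z-test)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("rates must be in (0, 1) and the lift must be non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


# Example: 30 percent baseline activation, 10 percent relative lift.
n = sample_size_per_arm(0.30, 0.10)
```

With these example inputs the result is a little under 4,000 users per arm. Because the canary arm receives only 1 percent of traffic, that implies hundreds of thousands of signups before the canary arm alone is large enough, which is a useful reality check when setting the test window.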
Observability: metrics, dashboards, and logs
Without observability, you will only discover problems when a big customer complains.
Prometheus instrumentation
# src/observability/metrics.py
from prometheus_client import Counter, Histogram

GEN_LATENCY = Histogram(
    "ai_onboarding_generation_latency_ms",
    "Model generation latency",
    ["model"],
    buckets=(50, 100, 200, 400, 800, 1600, 3200),
)
TOKENS = Histogram(
    "ai_onboarding_tokens",
    "Tokens per email",
    ["type", "model"],  # type = prompt | completion
    buckets=(50, 100, 200, 400, 800, 1600),
)
VALIDATION_COUNTER = Counter(
    "ai_onboarding_validation_events_total",
    "Validation events by layer and status",
    ["layer", "status"],
)
ESP_SENDS = Counter(
    "ai_onboarding_esp_sends_total",
    "ESP send outcomes",
    ["provider", "status"],  # status = success | failure
)
Hook these into the generation and ESP layers. Export metrics via an HTTP endpoint for Prometheus scraping.
Grafana dashboard exports
Include JSON definitions for panels such as:
- Generation latency by model over time
- Validation failures by layer (stacked bar)
- ESP sends success vs failure rate
- Complaint and bounce rates by cohort
Operators should be able to import a JSON file and see a working dashboard in minutes.
Prometheus alert rules
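# dashboards/prometheus_rules.yml
groups:
  - name: ai-onboarding
    rules:
      - alert: HighGenerationErrors
        # Share of validation-layer checks that failed over 5 minutes
        # (0.05 = 5 percent); tune to your own threshold
        expr: |
          sum(rate(ai_onboarding_validation_events_total{status="fail"}[5m]))
            / sum(rate(ai_onboarding_validation_events_total[5m])) > 0.05
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "AI onboarding validation failures increased"
      - alert: ESPFailures
        # rate() is per second, so this fires above one failed send per second
        expr: rate(ai_onboarding_esp_sends_total{status="failure"}[5m]) > 1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "ESP send failures for AI onboarding"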
Sample log schema
{
  "ts": "2026-02-01T10:15:20Z",
  "event": "onboarding_email_sent",
  "user_id": "u_123",
  "template_id": "onboarding_day1",
  "cohort": "ai_canary",
  "idempotency_key": "onboarding_day1:u_123",
  "model": "gpt-4.1-mini",
  "prompt_tokens": 320,
  "completion_tokens": 260,
  "validation_status": "pass",
  "esp_provider": "sendgrid",
  "esp_status": "success"
}
Apply redaction to any field that might include PII beyond the fields you consciously log.
Deliverability warm up and DMARC / ESP guidance
AI content does not give you a pass on deliverability basics. If anything, you should be more conservative.
IP / domain warm up
- Start AI traffic on the same authenticated domain and IP pool as your existing onboarding if possible.
- Keep initial canary volume small relative to your daily onboarding traffic so patterns do not trigger filters.
- Use seed addresses across major ISPs (Gmail, Outlook, Yahoo) to monitor placement.
DMARC, SPF, DKIM, BIMI basics
- Ensure SPF includes your ESP.
- DKIM signing should be enabled on the sending domain.
- DMARC should be configured with a policy aligned with your maturity (for example, start with a monitoring policy while you instrument reporting).
Include a DMARC example for your DNS administrator:
_dmarc.example.com. IN TXT "v=DMARC1; p=none; rua=mailto:dmarc-reports@example.com"
DMARC report parsing script
# legal_deliverability/dmarc_parser.py
import glob
import xml.etree.ElementTree as ET


def parse_aggregate_reports(path_pattern: str):
    for file in glob.glob(path_pattern):
        tree = ET.parse(file)
        root = tree.getroot()
        for record in root.findall("record"):
            source_ip = record.find("row/source_ip").text
            disposition = record.find("row/policy_evaluated/disposition").text
            yield {"source_ip": source_ip, "disposition": disposition}
Use this to spot unexpected sending sources and alignment issues.
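To turn parsed records into something actionable, one option is to total message counts per source IP where both DKIM and SPF failed DMARC evaluation. The sketch below works on an inline sample in the standard aggregate report layout; the sample data and the function name `unaligned_sources` are made up for illustration, and the IPs come from documentation ranges.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up sample in the DMARC aggregate report layout (two records).
SAMPLE_REPORT = """<feedback>
  <record>
    <row>
      <source_ip>192.0.2.10</source_ip>
      <count>120</count>
      <policy_evaluated>
        <disposition>none</disposition><dkim>pass</dkim><spf>pass</spf>
      </policy_evaluated>
    </row>
  </record>
  <record>
    <row>
      <source_ip>198.51.100.7</source_ip>
      <count>3</count>
      <policy_evaluated>
        <disposition>none</disposition><dkim>fail</dkim><spf>fail</spf>
      </policy_evaluated>
    </row>
  </record>
</feedback>"""


def unaligned_sources(xml_text: str) -> Counter:
    """Total messages per source IP where neither DKIM nor SPF passed."""
    totals: Counter = Counter()
    root = ET.fromstring(xml_text)
    for row in root.iter("row"):
        evaluated = row.find("policy_evaluated")
        if evaluated is None:
            continue
        dkim = evaluated.findtext("dkim", default="")
        spf = evaluated.findtext("spf", default="")
        if dkim != "pass" and spf != "pass":
            source_ip = row.findtext("source_ip", default="unknown")
            totals[source_ip] += int(row.findtext("count", default="1"))
    return totals


suspicious = unaligned_sources(SAMPLE_REPORT)
```

Any IP that shows up here and is not one of your ESP's sending addresses is worth investigating before tightening the DMARC policy beyond monitoring.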
Privacy, legal, and compliance checklist
Legal reviewers do not want a marketing deck. They want a list of decisions and artifacts.
Privacy checklist for personalization
- Document each attribute used in the prompt (plan, locale, usage flags) and its source system.
- Ensure consent for marketing email exists and is enforced at query time.
- Define retention for raw model outputs and logs; avoid storing full content longer than needed.
- Limit PII passed into prompts. Prefer segment tags over free form descriptive text that includes identifiers.
Footer templates
<!-- legal_deliverability/footers/base_footer_en.html -->
<table role="presentation" width="100%" cellpadding="0" cellspacing="0">
<tr>
<td align="center" style="font-size:12px;color:#888;padding:16px">
You are receiving this email because you signed up for {{product_name}} with
{{user_email}}.
<br/>
<a href="{{manage_preferences_url}}">Manage preferences</a> |
<a href="{{unsubscribe_url}}">Unsubscribe</a>
<br/>
{{company_name}}, {{company_address}}
</td>
</tr>
</table>
DPIA outline
- Description of processing: AI generation of onboarding emails using limited behavioral and account data.
- Purpose: increase activation while respecting consent and privacy.
- Data categories: identifiers (email, user id), product usage metrics, plan details.
- Risks: PII leakage in content or logs, unintended profiling, cross border transfers.
- Safeguards: validation pipeline, PII redaction, strict access controls on logs, model provider data handling review.
- Residual risk and approval: signoff from data protection lead.
Cost control and model economics
AI onboarding is cheap on a per email basis at small scale, but it can surprise you when traffic grows.
Token accounting pattern
- Record prompt and completion tokens per send as metrics and logs.
- Compute rolling averages and store them as reference values.
- Simulate monthly cost with simple formulas using your vendor price list and projected volume.
Personalization depth vs cost vs brittleness
- Heavy personalization that uses many user attributes tends to increase prompt size and maintenance cost.
- Segment based personalization (for example, by plan + activity cluster) can be generated once and cached per segment.
- Use caching for high volume cohorts: you can generate a template per segment and insert only basic identifiers at send time.
A practical pattern is to generate copy at the cohort level and reserve direct per user model calls for key lifecycle moments or high value users.
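A cohort-level cache can be sketched as follows: copy is generated once per (plan, activity cluster) segment, and only a basic identifier is filled in at send time. The class and function names, the `{{first_name}}` placeholder, and the activity cutoff of 5 events are all hypothetical choices for illustration.

```python
import html
from typing import Callable, Dict, Tuple

SegmentKey = Tuple[str, str]  # (plan, activity_cluster)


def activity_cluster(events_last_24h: int) -> str:
    # Hypothetical cutoff; tune it against your own usage distribution.
    return "active" if events_last_24h >= 5 else "quiet"


class SegmentTemplateCache:
    """Generate copy once per segment and reuse it for every user in it."""

    def __init__(self, generate_for_segment: Callable[[SegmentKey], Dict[str, str]]):
        self._generate = generate_for_segment
        self._cache: Dict[SegmentKey, Dict[str, str]] = {}
        self.model_calls = 0

    def get(self, plan: str, events_last_24h: int) -> Dict[str, str]:
        key = (plan, activity_cluster(events_last_24h))
        if key not in self._cache:
            self._cache[key] = self._generate(key)
            self.model_calls += 1
        return self._cache[key]


def personalize(template: Dict[str, str], first_name: str) -> Dict[str, str]:
    """Fill the per-user placeholder, escaping the value for HTML."""
    email = dict(template)
    email["html_body"] = email["html_body"].replace(
        "{{first_name}}", html.escape(first_name)
    )
    return email


# Example with a stand-in generator instead of a real model call.
def fake_generate(segment: SegmentKey) -> Dict[str, str]:
    plan, cluster = segment
    return {
        "subject": f"Welcome to your {plan} plan",
        "html_body": "<p>Hi {{first_name}}</p>",
    }


cache = SegmentTemplateCache(fake_generate)
cache.get("trial", 0)
cache.get("trial", 2)  # same segment, served from the cache
email = personalize(cache.get("trial", 9), "Ana & Co")  # new segment, second call
```

In practice the cache key should also include the prompt version and locale so that a prompt change or a new language does not serve stale copy.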
Operator runbooks and escalation paths
Incidents will happen. The value is in how quickly you can detect and unwind them.
Runbook: high complaints spike
- Alert fires from complaint_rate_delta.
- Deliverability confirms in ESP dashboard.
- Immediate step: set feature flag to route all traffic to control template.
- Revert prompt version to last stable, keep AI off until a new experiment plan is written.
- Review content with marketing and legal, check banned phrases and promises.
Runbook: model error spike
- Alert fires from validation failures or model errors.
- Infra checks model provider status and internal changes.
- Immediate step: increase logging detail, route traffic to deterministic templates.
- If problem is provider outage, keep AI off and schedule a postmortem to consider multi model fallback.
Runbook: suspected PII leak
- Security receives report or sees validation failures due to PII detection.
- Pause AI sends and freeze relevant logs.
- Engage legal and data protection; follow your incident response playbook.
- Audit prompts for data passed into the model and logs for stored content.
- Document changes: stricter PII filters, log redaction updates, reduced attribute set.
Escalation matrix
- Severity 1: PII incident or major deliverability hit. Security or deliverability leads the incident, with legal, PM, and infra.
- Severity 2: Significant performance regression without user harm. Infra and PM lead with deliverability consulting.
- Severity 3: Quality issues or minor anomalies. PM and marketing iterate on prompts and validators.
Tests, CI, and release flow for prompt changes
Prompt changes should feel like code changes: small, reviewable, and tested.
CI pipeline outline
# .github/workflows/ci.yml
name: CI
on:
  pull_request:
    branches: [ main ]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      - run: pytest tests -q
Include:
- Schema validation tests
- Prompt fixture tests with stub model
- Validator tests
- ESP mock tests
Prompt release checklist
- Open PR with prompt spec changes and rationale in description.
- Include example outputs for one or two representative events.
- Ensure CI passes all prompt and validation tests.
- Have growth PM and deliverability sign off in comments.
- Tag prompt version as canary and ship behind canary flag.
- Promote to stable once metrics look acceptable.
Appendix: artifacts, author, and trust signals
Exported artifacts checklist
- Grafana dashboard JSON for onboarding metrics
- Prometheus alert rule files
- Sanitized sample logs and example ESP responses
- DPIA outline document and footer HTML templates
- Notebook for power checks and example parameter sets
Author credentials and change log
Signed by a senior operator who has shipped and supported email and messaging pipelines in production. Change log should live in CHANGELOG.md with entries like:
- v1.0.0 initial repo, single template, SendGrid sandbox support
- v1.1.0 added SES integration, semantic validator stub, power check notebook
- v1.2.0 DMARC parser example, updated footers for legal feedback
Next steps: choose your first slice and ship
You have three practical options:
- Choose a single onboarding template, single ESP, and 1 percent canary if you want a working pipeline in a week and can iterate later.
- Choose multi template rollout only if you already have strong observability and deliverability processes and can extend them.
- Defer AI entirely if you cannot support incident response, DMARC monitoring, or schema validation. In that case, invest in those basics first.
Pick the narrowest slice that still tests AI personalization where it matters for your SaaS. Wire it with contracts, validators, metrics, and a kill switch. Then you can scale with confidence rather than hope.
