AI Programmatic Local Landing Pages for SMBs Playbook

TL;DR and one-click demo

This playbook helps small agencies and in-house SMB teams pilot AI-generated programmatic local landing pages safely and measurably, with compliance built into the pipeline. The repo includes a runnable ZIP with a single-command demo, and the README walks through a 10-minute try-out that produces a 3-page sample. Included: generator and verifier prompts, embedding + vector DB config, WordPress ACF import, Apps Script + Node scripts, GitHub Actions publish workflow, BigQuery queries and Looker Studio template, Search Console test scripts, and legal templates for image and call tracking consent.

Contents

TL;DR and one-click demo
What this playbook solves and who should use it
Quick pilot plan – 10 to 25 pages
Data model and CMS mapping (runnable files)
RAG and retrieval architecture
Prompt patterns and verifier pipeline
Automation recipes and CI/CD
Schema, Search Console and indexing automation
Measurement, attribution and dashboards
Compliance and legal playbooks
Pre-publish QA SOP and reviewer templates
Scaling, cost models and operations
Appendix – downloadables and code snippets
Decision framework
What can go wrong, how to spot it, and recovery steps
Practical next step

Files you will find in the repo (exact list):

/README.md
/examples/page-examples.json
/cms-imports/wordpress-acf-export.json
/cms-imports/webflow-collection.csv
/scripts/generate_drafts.js
/scripts/generate_drafts.gs
/ci/publish.yml
/dashboards/looker_studio_template.json
/bigquery/queries/metrics.sql
/tests/richresults_test.sh
/legal/image_release_template.docx
/legal/phone_consent_snippet.html
/qa/pre_publish_checklist.pdf

What this playbook solves and who should use it

Problem: teams try to scale many local pages and end up with hallucinated facts, image provenance gaps, duplicate content, and no reliable measurement. This playbook solves the operational steps you need to ship programmatic local pages that are grounded, auditable, and measurable.

Use this if

You are a small agency ops lead managing 1-3 clients and need low-friction deployable assets.
You are an in-house digital lead at an SMB with a developer plus a part-time reviewer.
You are an SEO or product manager evaluating programmatic pages for multi-location businesses and need an acceptance framework to approve budget.

Do not use this if

You cannot commit a reviewer to verify local facts within the SLA below.
You have no plan to surface image release forms or phone consent records.

Quick pilot plan – 10 to 25 pages

Selection rubric

Revenue potential score: estimate the monthly local lead value per page and start with the top 25%.
Search intent similarity: group pages with identical intent so they can share a template.
Verification coverage: choose pages where one reviewer can realistically verify 10-15 pages within the pilot window.
Risk gating: exclude franchises and regulated verticals unless legal review is available.

Numerical acceptance criteria to publish

No remaining [[assumption]] tokens in generated content.
At least 2 independent local evidence items per page, each with source_url and snippet hash.
JSON-LD (schema.org) validation passes with no errors.
Conversion tracking events mapped and test events show up in GA4 debug stream.
Reviewer signoff logged in CMS custom fields: reviewer email, timestamp, audit signature.
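
The acceptance criteria above can be expressed as a single publish gate. The following is a minimal sketch, not the repo's implementation: the page object shape (body, proofs, jsonLdErrors, ga4TestEventSeen, review) is hypothetical, and "independent" is interpreted here as "from different hostnames", which is an assumption.

```javascript
// Sketch of a publish gate for the five acceptance criteria.
// The page shape is hypothetical; source_url values are assumed to be
// valid absolute URLs.
function checkAcceptance(page) {
  const failures = [];

  // 1. No unresolved [[assumption]] tokens in the generated copy.
  if (page.body.includes('[[assumption]]')) {
    failures.push('assumption tokens remain');
  }

  // 2. At least two proofs with source_url and snippet_hash,
  //    coming from different hostnames (assumed meaning of "independent").
  const complete = page.proofs.filter((p) => p.source_url && p.snippet_hash);
  const hosts = new Set(complete.map((p) => new URL(p.source_url).hostname));
  if (hosts.size < 2) {
    failures.push('fewer than 2 independent proofs');
  }

  // 3. JSON-LD validation reported no errors.
  if (page.jsonLdErrors.length > 0) {
    failures.push('JSON-LD errors');
  }

  // 4. A test conversion event was seen in the GA4 debug stream.
  if (!page.ga4TestEventSeen) {
    failures.push('GA4 test event not seen');
  }

  // 5. Reviewer sign-off is fully recorded.
  const review = page.review || {};
  if (!review.email || !review.timestamp || !review.audit_hash) {
    failures.push('reviewer sign-off incomplete');
  }

  return { publishable: failures.length === 0, failures };
}
```

Returning the full list of failures, rather than stopping at the first one, lets the reviewer see everything that blocks publication in one pass.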

Staffing and rough cost tiers (hypothetical)

Suppose a 100-page monthly cadence. Estimate: 0.5 FTE Content Ops, 0.2 FTE Local Expert reviewer, 0.05 FTE Legal on-call. Tooling: vector DB $200 to $1,000 monthly, embeddings + LLM API $200 to $1,500 monthly, call tracking $30 to $150 per number. Use these ranges to fill in the small/medium/scale budget in the README.

Pilot timeline (example)

Week 0: pick 10-25 pages using rubric.
Week 1: ingest sources, configure vector DB, run one-click demo.
Week 2: generate drafts and start reviewer verification, with a 48-hour SLA per page.
Week 3: publish first set (10 pages) and start measurement window (30 days).
End of pilot: assess using acceptance criteria and BigQuery analysis.

Data model and CMS mapping (runnable files)

Canonical CSV columns for imports. Use this as the single source of truth per location. The repo includes working import files for WordPress ACF and Webflow.

location_id,location_name,street,city,state,postal_code,phone,phone_swap_number,lat,lon,hours_html,services,primary_category,secondary_categories,hero_image_url,image_release_id,owner_contact,timezone,canonical_url,template_variant,local_proofs_json

WordPress ACF notes

Import file: /cms-imports/wordpress-acf-export.json follows ACF field keys used in the generator templates.
Custom fields to store audit: verification_reviewer, verification_timestamp, verification_audit_hash.

Webflow notes

/cms-imports/webflow-collection.csv is provided with example rows. Import it through Webflow's CSV import, then run a post-import validation script from /scripts.

Import commands (examples in README)

node ./scripts/generate_drafts.js --input ./cms-imports/webflow-collection.csv --out ./examples/page-examples.json
# For the Apps Script demo (Google Workspace):
# deploy generate_drafts.gs as a web app and use the sample spreadsheet provided in the repo

RAG and retrieval architecture

Design goals: retrieval must provide verifiable snippets and metadata to the generator, and embeddings must be stored with metadata that allows proof tracing.

Chunking and embeddings

Chunk size: 500 tokens, overlap: 50 tokens.
Embedding model: use a current 1536-dimension embedding model from your provider; the repo uses text-embedding-3-small as a placeholder and shows how to swap provider calls.
Store metadata per vector: source_url, fetch_date, domain_trust_score, snippet_hash, location_id.
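
A minimal sketch of the chunking step with the stated size and overlap. It approximates tokens with whitespace-separated words, which is an assumption; a real tokenizer for your embedding model will produce different boundaries. The function name chunkText mirrors the call in the ingestion snippet, but this body is illustrative rather than the repo's code.

```javascript
// Sliding-window chunker: `size` units per chunk, `overlap` units shared
// between consecutive chunks. Units are whitespace-separated words here,
// used as a rough stand-in for model tokens.
function chunkText(text, size = 500, overlap = 50) {
  if (overlap >= size) {
    throw new Error('overlap must be smaller than size');
  }
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  const step = size - overlap; // how far the window advances each time

  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + size).join(' '));
    // Stop once the current window has reached the end of the text,
    // otherwise the loop would emit a redundant tail chunk.
    if (start + size >= words.length) break;
  }
  return chunks;
}
```

With the defaults, each window advances by 450 words, so the last 50 words of one chunk reappear at the start of the next. That overlap keeps a fact that straddles a boundary retrievable from at least one chunk.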

Vector DB selection and config

Options: Pinecone, Weaviate, Milvus. Compare by situation:

Choose Pinecone if you want managed service and simple scaling.
Choose Weaviate if you need hybrid search and richer metadata joins.
Choose Milvus if you prefer self-hosting for cost control.

Retrieval flow

Fetch authoritative sources for a location: business site, chamber listing, Google Business Profile, local news, public records.
Normalize and chunk text into 500 token pieces, compute embedding, attach metadata.
Index into the vector DB using cosine or L2 distance, depending on the engine.
At generation time, retrieve the top K=6 chunks and include each chunk's source_url and snippet extract in the prompt context.
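
The last step of the flow can be illustrated without any vector DB. The sketch below ranks in-memory records by cosine similarity and turns the top hits into the sources array the generator prompt expects. The record shape ({ vector, text, metadata }) is an assumption for illustration; a managed engine would perform the ranking server-side.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosine(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k most similar records for one location.
// Filtering on location_id keeps proofs from other locations out of the prompt.
function retrieveTopK(queryVec, records, locationId, k = 6) {
  return records
    .filter((r) => r.metadata.location_id === locationId)
    .map((r) => Object.assign({}, r, { score: cosine(queryVec, r.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Shape the hits as the sources array used in the generator prompt,
// so every citation token maps back to a URL and a snippet.
function toPromptSources(hits) {
  return hits.map((h, i) => ({
    id: 'SRC' + (i + 1),
    url: h.metadata.source_url,
    snippet: h.text
  }));
}
```

Assigning the SRC ids at retrieval time means the verifier and the reviewer see the same id-to-URL mapping that the generator saw.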

Example ingestion snippet (Node.js)

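const crypto = require('crypto');
// embed and indexToPinecone come from the repo's lib. chunkText and scoreDomain
// are assumed to be exported by the same module; adjust the path if they live elsewhere.
const { embed, indexToPinecone, chunkText, scoreDomain } = require('./lib/embeddings');

// Content hash that lets a published claim be traced back to its snippet.
// SHA-256 is an assumption; any stable hash works.
const hash = (text) => crypto.createHash('sha256').update(text).digest('hex');

async function ingestDocument(url, text, location_id) {
  const chunks = chunkText(text, 500, 50); // 500-token chunks, 50-token overlap
  const fetch_date = new Date().toISOString(); // one fetch timestamp per document
  for (const chunk of chunks) {
    const vec = await embed(chunk);
    await indexToPinecone({
      vector: vec,
      metadata: {
        source_url: url,
        fetch_date,
        domain_trust_score: scoreDomain(url),
        snippet_hash: hash(chunk),
        location_id
      }
    });
  }
}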

Prompt patterns and verifier pipeline

Overview: the generator creates draft copy with inline citation tokens such as [SRC1] and [SRC2]. A low-temperature verifier pass then cross-checks each claim against the retrieved snippets and flags any claim that lacks a matching snippet or contains an [[assumption]] marker.

Generator prompt (chat API style)

System: You are a copywriter who must only assert facts backed by provided sources. Do not invent details. Use citations like [SRC1], [SRC2]. 
User: 
{
  "location": { "name":"ACME Plumbing - Downtown", "phone":"REDACTED" },
  "sources": [
    {"id":"SRC1", "url":"https://acme.example/locations/downtown", "snippet":"ACME Plumbing has served downtown since 1999..."},
    {"id":"SRC2", "url":"https://localnews.example/article", "snippet":"ACME responded to burst pipe at 12 Main St."}
  ],
  "tone":"helpful, local",
  "template":"service-area"
}
Instruction: Produce a draft page with headings, 2 advantage bullets, 1 local example paragraph citing sources in-line with [SRCx]. Mark any unknown claim as [[assumption]].

Verifier prompt (low temperature)

System: Verify that every factual claim in the draft has supporting text in one of the provided snippets. Only mark claims as verified when you find phrase-level or paraphrase-level support. Output a JSON array of {claim, verified:true|false, supporting_sources:[ids], confidence_score}.
User: Provide the draft and the same sources array.

Failure modes and enforcement

If the verifier returns any claim with verified:false, or the draft still contains [[assumption]] tokens, the automation marks the draft as “needs local review” and notifies the reviewer via Slack with a link to the draft and the 48-hour SLA.
Pages in high-risk verticals are automatically flagged for legal review and locked from publishing until cleared; legal has a 72-hour SLA.
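
The enforcement rules above could be sketched as a small triage function that consumes the verifier's JSON array. The status names, the isHighRiskVertical flag and the return shape are hypothetical; the 48-hour and 72-hour values come from the rules above.

```javascript
// Decide what happens to a draft after the verifier pass.
// verifierClaims is the verifier's output: [{ claim, verified, ... }].
function triageDraft(draftText, verifierClaims, isHighRiskVertical) {
  // Anything not explicitly verified:true counts as unverified.
  const unverified = verifierClaims.filter((c) => c.verified !== true);
  const hasAssumption = draftText.includes('[[assumption]]');
  const needsLocalReview = unverified.length > 0 || hasAssumption;

  return {
    status: needsLocalReview ? 'needs_local_review' : 'verifier_passed',
    reviewerSlaHours: needsLocalReview ? 48 : null,
    // High-risk verticals are locked regardless of the verifier result.
    legalHold: isHighRiskVertical,
    legalSlaHours: isHighRiskVertical ? 72 : null,
    unverifiedClaims: unverified.map((c) => c.claim)
  };
}
```

Treating a missing or malformed verified field as a failure is a deliberate choice in this sketch: if the verifier output is incomplete, the draft goes to a human rather than through the gate.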

Automation recipes and CI/CD

Deliverables in repo: Node script and Apps Script to generate drafts, GitHub Actions workflow to publish, and secret examples for integration with a secret manager.

Key flows

Ingest source documents into vector DB (scripts/ingest.js).
Generate drafts via chat API and store in CMS staging using CMS API.
Run the verifier pass. If it passes, move the draft to staging for reviewer sign-off and then to the publish queue. If it fails, assign it to the reviewer with an audit record.
The publish job runs via GitHub Actions: an idempotent publish.yml calls the CMS publish API and updates the sitemap index.

Secrets and CI example

# GitHub Actions (ci/publish.yml) snippet
jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Set up Node
        uses: actions/setup-node@v4
        with:
          node-version: 20
      - name: Install dependencies
        run: npm ci
      - name: Run publish script
        env:
          CMS_API_KEY: ${{ secrets.CMS_API_KEY }}
          VECTOR_API_KEY: ${{ secrets.VECTOR_API_KEY }}
        run: node ./ci/publish_pages.js

Secrets: store API keys in Google Secret Manager or HashiCorp Vault and inject them into CI at runtime. Do not commit keys to repo.

Error handling and idempotency

Publish steps must be idempotent: check for existing page by location_id and canonical_url before creating.
On failure, record error code, last attempt timestamp and retry with exponential backoff. After N retries mark for manual triage.
Rate limit: batch embed calls and use backoff strategy; log consumption to avoid surprise billing.
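
The first two rules above can be sketched as follows. This is an illustration under assumptions, not the repo's publish script: the cms client and its findPage, createPage and updatePage methods are hypothetical, and the retry count and base delay are placeholder values.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run `task`, retrying with exponential backoff (base, 2x base, 4x base, ...).
// After maxRetries failed retries, throw an error that signals manual triage.
async function withRetry(task, { maxRetries = 4, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err;
      // Record error code, attempt number and timestamp for the audit trail.
      console.error(JSON.stringify({
        code: err.code || 'UNKNOWN',
        attempt,
        lastAttemptAt: new Date().toISOString()
      }));
      if (attempt < maxRetries) {
        await sleep(baseDelayMs * 2 ** attempt);
      }
    }
  }
  const triage = new Error('manual triage required: ' + lastError.message);
  triage.cause = lastError;
  throw triage;
}

// Idempotent publish: look the page up by location_id and canonical_url
// first, then update it if it exists or create it if it does not.
// `cms` is a hypothetical client object.
async function publishOnce(cms, page) {
  const existing = await cms.findPage(page.location_id, page.canonical_url);
  if (existing) {
    return cms.updatePage(existing.id, page);
  }
  return cms.createPage(page);
}
```

A caller would combine the two as withRetry(() => publishOnce(cms, page)). Because publishOnce checks for an existing page first, a retry after a partial failure updates the page instead of creating a duplicate.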

Schema, Search Console and indexing automation

Each page publishes JSON-LD with LocalBusiness, Service and Review snippets as applicable. The repo includes templates and automated validation scripts.

JSON-LD requirements and example

{
  "@context":"https://schema.org",
  "@type":"Plumber",
  "name":"ACME Plumbing - Downtown",
  "address":{ "@type":"PostalAddress", "streetAddress":"12 Main St", "addressLocality":"Downtown", "addressRegion":"CA", "postalCode":"90001" },
  "telephone":"+1-555-0100",
  "openingHours":"Mo-Fr 08:00-18:00",
  "url":"https://example.com/locations/downtown"
}
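
One way to produce this JSON-LD from a row of the canonical CSV is sketched below. The function names and the required-field list are assumptions for illustration; opening hours are left out because the CSV stores them as hours_html, which would need its own parsing.

```javascript
// Build LocalBusiness-style JSON-LD from one row of the canonical CSV.
// schemaType lets callers pass a more specific type such as 'Plumber'.
function buildJsonLd(row, schemaType = 'LocalBusiness') {
  const required = ['location_name', 'street', 'city', 'state', 'postal_code', 'phone', 'canonical_url'];
  const missing = required.filter((key) => !row[key]);
  if (missing.length > 0) {
    throw new Error('missing fields: ' + missing.join(', '));
  }

  const doc = {
    '@context': 'https://schema.org',
    '@type': schemaType,
    name: row.location_name,
    address: {
      '@type': 'PostalAddress',
      streetAddress: row.street,
      addressLocality: row.city,
      addressRegion: row.state,
      postalCode: row.postal_code
    },
    telephone: row.phone,
    url: row.canonical_url
  };

  // Coordinates are optional; CSV values arrive as strings.
  if (row.lat && row.lon) {
    doc.geo = {
      '@type': 'GeoCoordinates',
      latitude: Number(row.lat),
      longitude: Number(row.lon)
    };
  }
  return doc;
}

// Serialize for embedding in HTML. Escaping "<" prevents a value such as
// "</script>" in the data from closing the tag early.
function toScriptTag(doc) {
  const json = JSON.stringify(doc).replace(/</g, '\\u003c');
  return '<script type="application/ld+json">' + json + '</script>';
}
```

Failing loudly on missing fields keeps an incomplete row from reaching the structured data validation step with a half-empty address.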

Validation steps (tests/richresults_test.sh)

Run JSON-LD validator against page.
Run structured data testing using Search Console API or the CLI test script included in /tests.
Ensure sitemap entry exists and lastmod is set to publish timestamp.

Search Console automation examples (curl)

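# Request URL inspection (example; needs an OAuth 2.0 access token for a verified property)
curl -X POST "https://searchconsole.googleapis.com/v1/urlInspection/index:inspect" \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"inspectionUrl":"https://example.com/locations/downtown","siteUrl":"https://example.com/"}'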

Measurement, attribution and dashboards

Goal: map page-level events to BigQuery for analytical rigor. The repo includes GA4 event mapping, sample BigQuery SQL and a Looker Studio template.

GA4 event mapping

page_view – standard.
contact_call_click – event when user clicks phone number. Include parameter phone_swap_number, location_id, page_variant.
contact_form_submit – include form_id, location_id, lead_value_estimate.
call_inbound – server-side event if using call tracking webhook. Include call_id, phone_swap_number, duration_seconds, location_id.
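
The mapping above lends itself to a pre-send check, so that events with missing parameters never reach GA4 or BigQuery. This is a minimal sketch; the table restates the parameters listed above, while the function name and return shape are hypothetical.

```javascript
// Required parameters per event, taken from the mapping above.
const REQUIRED_PARAMS = {
  page_view: [],
  contact_call_click: ['phone_swap_number', 'location_id', 'page_variant'],
  contact_form_submit: ['form_id', 'location_id', 'lead_value_estimate'],
  call_inbound: ['call_id', 'phone_swap_number', 'duration_seconds', 'location_id']
};

// Return the list of required parameters that are absent or empty.
// Unknown event names are rejected so typos do not create stray events.
function missingParams(eventName, params) {
  const required = REQUIRED_PARAMS[eventName];
  if (!required) {
    throw new Error('unknown event: ' + eventName);
  }
  return required.filter((key) => {
    const value = params[key];
    return value === undefined || value === null || value === '';
  });
}
```

A zero such as duration_seconds: 0 is treated as present, since a call that never connected is still a valid event.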

Call tracking mapping

Examples: CallRail or Twilio webhooks should post to a server endpoint that writes call events to BigQuery. Map each call tracking phone_swap_number to its location_id in the import CSV. A sample call webhook flow is included in /examples.

BigQuery sample SQL (metrics.sql)

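-- Page-level conversion rate per landing page (inputs for a two-variant z-test).
-- Assumes the standard GA4 BigQuery export, where the page URL is stored in
-- event_params under the key 'page_location' and daily tables are sharded by date.
WITH events AS (
  SELECT
    event_name,
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_url
  FROM `project.dataset.events_*`
  WHERE _TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
),
page_metrics AS (
  SELECT
    page_url,
    COUNTIF(event_name IN ('contact_form_submit','contact_call_click')) AS conversions,
    COUNTIF(event_name = 'page_view') AS views
  FROM events
  GROUP BY page_url
)
SELECT *,
  SAFE_DIVIDE(conversions, views) AS conv_rate
FROM page_metrics;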
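
The query returns conversions and views per page; comparing two variants then takes a two-proportion z-test on those counts. A minimal sketch follows, assuming a pooled standard error and a two-sided test. The normal CDF uses the Abramowitz and Stegun 7.1.26 approximation of erf, which is accurate to about 1.5e-7 and enough for this purpose.

```javascript
// Abramowitz & Stegun 7.1.26 approximation of the error function.
function erf(x) {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
    - 0.284496736) * t + 0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Standard normal cumulative distribution function.
function normalCdf(z) {
  return 0.5 * (1 + erf(z / Math.SQRT2));
}

// Two-sided, pooled two-proportion z-test.
// Inputs are conversion and view counts for variants A and B.
function twoProportionZTest(convA, viewsA, convB, viewsB) {
  const rateA = convA / viewsA;
  const rateB = convB / viewsB;
  const pooled = (convA + convB) / (viewsA + viewsB);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / viewsA + 1 / viewsB));
  if (se === 0) {
    // No variation at all (for example, zero conversions in both variants).
    return { rateA, rateB, z: 0, pValue: 1 };
  }
  const z = (rateB - rateA) / se;
  const pValue = 2 * (1 - normalCdf(Math.abs(z)));
  return { rateA, rateB, z, pValue };
}
```

For example, twoProportionZTest(300, 10000, 360, 10000) compares a 3.0% variant with a 3.6% variant. The normal approximation is only reasonable when each variant has a meaningful number of conversions, so pool low-traffic pages by template before testing.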

Sample A/B sizing note

Suppose baseline conversion is 3%. To detect a 20% relative lift (to 3.6%) with alpha 0.05 (two-sided) and power 0.8, a normal-approximation estimate calls for roughly 14,000 page views per variant. For low-traffic pages, pool data over longer windows or aggregate pages by template so that the test can reach that volume.
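
The sizing figure can be reproduced with the usual normal-approximation formula for two proportions: n per variant = (z_alpha/2 + z_beta)^2 * (p1(1 - p1) + p2(1 - p2)) / (p2 - p1)^2. The sketch below hardcodes the z-values for alpha 0.05 (two-sided) and power 0.8; other settings need other constants. It is a planning estimate, not an exact power calculation.

```javascript
// z-values for alpha = 0.05 (two-sided) and power = 0.80.
const Z_ALPHA_HALF = 1.959964;
const Z_BETA = 0.841621;

// Page views needed per variant to detect a relative lift over a baseline
// conversion rate, using the unpooled normal approximation.
function sampleSizePerVariant(baselineRate, relativeLift, zAlphaHalf = Z_ALPHA_HALF, zBeta = Z_BETA) {
  const p1 = baselineRate;
  const p2 = baselineRate * (1 + relativeLift);
  const variance = p1 * (1 - p1) + p2 * (1 - p2);
  const delta = p2 - p1;
  if (delta === 0) {
    throw new Error('relativeLift must be non-zero');
  }
  return Math.ceil(((zAlphaHalf + zBeta) ** 2 * variance) / (delta * delta));
}

// 3% baseline, 20% relative lift: roughly 13,900 views per variant.
const viewsNeeded = sampleSizePerVariant(0.03, 0.2);
console.log(viewsNeeded);
```

Halving the detectable lift roughly quadruples the required traffic, because the difference enters the formula squared. That is why aggregating pages by template is usually the only practical route for low-traffic locations.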

Compliance and legal playbooks

Top concerns: image provenance, phone consent, data retention, and regional privacy laws.

Image provenance workflow

Collect signed copies of image_release_template.docx and store each one under the image_release_id referenced in the CMS.
Do not publish images that lack an image_release_id when faces or private property are identifiable.
Retain release forms for a period defined in data retention rules (set per jurisdiction in README).

Phone consent and call tracking

Include a short consent snippet on pages where call tracking will swap numbers. Repo includes phone_consent_snippet.html for insertion. Store consent logs server-side and link to location_id for retention and deletion requests.

Retention policy

Provide configurable retention defaults in the repo: proofs and release forms retained for X years where X is a legal decision. The repo supplies a template retention policy and deletion webhook example keyed by location_id.

Pre-publish QA SOP and reviewer templates

Short SOP enforced by automation.

Content Ops triggers generation and the verifier pass. If the verifier passes, the page goes to staging and the reviewer is notified by a Slack message containing the page link, the checklist, and the 48-hour SLA.
Reviewer tasks: confirm two independent proofs, verify image_release_id if images are used, verify phone mapping and consent, run the structured data test, and sign the audit field in the CMS with email and timestamp.
Legal review applies when the vertical is flagged. Legal has a 72-hour SLA to clear or escalate.

Reviewer checklist items are provided in /qa/pre_publish_checklist.pdf and include required evidence links and a pass/fail field for each acceptance criterion.

Scaling, cost models and operations

Throughput per reviewer: suppose a trained reviewer can verify 8 to 15 pages per week depending on complexity. If you need higher throughput you can hire junior verifiers but retain one senior local expert for escalation and training.

SLA matrix (example)

Generator -> Verifier assignment: immediate
Verifier turnaround: 48 hours
Legal review: 72 hours
Publish window after signoff: same day automated
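
The timed rows of the matrix can be turned into due dates that the automation checks before sending reminders. A minimal sketch: the stage keys and function name are hypothetical, and the hours are the example values from the matrix above.

```javascript
// SLA hours per stage, from the example matrix.
const SLA_HOURS = {
  verifier_review: 48,
  legal_review: 72
};

// Given the stage and the time the page entered it, return the due time
// and whether the SLA has already been missed.
function slaStatus(stage, enteredAtIso, now = new Date()) {
  const hours = SLA_HOURS[stage];
  if (hours === undefined) {
    throw new Error('no SLA defined for stage: ' + stage);
  }
  const enteredAt = new Date(enteredAtIso);
  const dueAt = new Date(enteredAt.getTime() + hours * 60 * 60 * 1000);
  return { dueAt: dueAt.toISOString(), overdue: now.getTime() > dueAt.getTime() };
}
```

This sketch counts wall-clock hours. If reviewers only work on business days, the SLA clock needs a calendar-aware calculation instead.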

Escalation playbook for manual actions

If Search Console reports a manual action or a spike in removals, immediately unpublish the affected pages and notify legal and the SEO lead.
Collect the audit trail: stored proofs, verifier signatures, and publish timestamps.
File a reconsideration request in Search Console using the collected audit artifacts and remediation notes.

Appendix – downloadables and code snippets

The ZIP includes all files listed at the top. Quick install steps in README:

Clone repo and install Node dependencies.
Create secrets in your secret manager and expose them to the workflow as GitHub Actions secrets.
Run the one-click demo with npm run demo, which ingests sample sources and generates 3 sample drafts.
Run ./tests/richresults_test.sh https://your.staging/page to validate structured data.

Decision framework

Choose this playbook if speed to a grounded, auditable pilot is your constraint and you can assign a 0.2 FTE reviewer. Choose a simpler single-location page approach if you cannot maintain review SLAs or cannot store release/consent records. Avoid programmatic publishing if you cannot map call tracking or cannot guarantee image provenance; the hidden risk is liability and costly removals.

What can go wrong, how to spot it, and recovery steps

Hallucinated local facts

Spot: the verifier flags many unverified claims or [[assumption]] tokens, or the reviewer reports claims unsupported by sources. Recover: re-run retrieval with broader sources, require a manual proof link, or remove the claim until proof is obtained.

Image provenance failure

Spot: takedown notice or unverified owner claim. Recover: unpublish page, surface release form, and replace image with a stock or client-supplied verified image. Keep audit trail for reconsideration.

Duplicate content / crawl budget waste

Spot: organic traffic plateaus and Search Console shows low index rates. Recover: ensure canonical logic is correct, consolidate near-duplicate pages, and use robots directives for low-value auto pages until improved.

Data leakage from secrets

Spot: secrets appear in logs or are committed to the repo. Recover: rotate the affected keys first, then purge the commits containing secrets from history and enable stricter CI checks such as secret scanning.

Practical next step

If you can commit one part-time reviewer and a developer, run the one-click demo in the repo. If you cannot, pause and allocate those resources before trying to scale.