PawStyle by MV Clinic.
An AI diagnostic recommendation engine, plus a SOAP-note-to-follow-up agent, deployed at an independent veterinary practice to defend revenue per visit. Four production surfaces, one shared data spine, ~21-month payback.
Revenue per visit was leaking. And nobody could point to where.
MV Clinic, an independent veterinary practice, was watching revenue per visit fall against a backdrop of declining national visit volumes (AVMA/Vetsource, 2024). The natural instinct was to raise prices and drive more visits. That would have made the wrong things worse. The real failure was upstream: at the recommendation moment in every exam, preventive care recommendations were being declined and forgotten. Industry data suggested 81% of recommended preventive care goes unaccepted (AAHA); capture of declined items hovered around 16% (dvm360).
That reframing narrows the design brief enormously. The solution doesn't have to be a new pricing scheme or a new patient acquisition strategy. It has to improve the recommendation moment and close the loop after the visit.
How the SimSo workflow ran on this engagement.
Problem framing
Diagnosed as "the recommendation moment is leaking." Not "revenue is in decline." This unlocked the rest of the work.
Evidence gathering
Five named-source citations on the business case: AVMA/Vetsource on visit decline, AAHA on recommendation acceptance, dvm360 on capture rate, AVMA practitioner survey, Fortune/Morgan Stanley on market size.
Option generation
Three real plays evaluated: AI recommendation engine, clinic-branded online pharmacy (Banfield strategy), preventive care membership plans (Chewy moat play).
Selection & justification
Four explicit filters: defensible, asset-leveraging, compounding, executable. Options 2 and 3 disqualified; option 1 passed all four. Filters surfaced on the page so the board could rerun the logic.
Economic modeling
$76K one-time + $46K annual investment. +$62K Y1 / +$165K Y3 revenue lift. Math shown plainly: 4,200 visits × 40% lift on flagged cases × $95 avg yield ≈ $160K. Payback ~21 months.
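For reviewers who want to poke at the arithmetic, the same figures restated as a minimal sketch. Every number comes from the line above; the only added observation is that an under-water Year 1 pushes break-even into Year 2, consistent with ~21 months:

```typescript
// Deck figures restated; all numbers come from the case text above.
const annualVisits = 4_200;
const liftOnFlagged = 0.40;          // +40% capture on flagged cases
const avgYieldPerCapture = 95;       // $ average yield per accepted recommendation

// Steady-state annual revenue lift (the "≈ $160K" line above)
const steadyStateLift = annualVisits * liftOnFlagged * avgYieldPerCapture;
console.log(steadyStateLift);        // 159_600, i.e. ≈ $160K

// Year-1 cash: $62K lift against $46K running cost, with $76K one-time up front.
const y1Net = 62_000 - 46_000;       // $16K net in Year 1
console.log(76_000 - y1Net);         // ~$60K still unrecovered after month 12,
                                     // so payback lands mid-Year-2, in line with ~21 months.
```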
Rollout design
Four phases over 12 months. Named owners per milestone (IT, Ops, HR, Finance, Clinical). Two GO/NO-GO board checkpoints: a month-8 KPI gate and a month-11 ROI gate. Revocability designed in.
Demonstration
Where most engagements stop, this one continued. Four production surfaces shipped, each proving a different claim in the deck. A reviewer can click through every one of them, today.
Four AI roles, one revenue loop.
The system isn't a single model. It's four well-understood roles working together, each with a different risk profile, validation requirement, and cost.
Clinical reasoner
When it runs: in the exam room, after signalment and history.
What it does: reads the patient profile, returns ranked diagnostic recommendations with reasoning, AAHA-alignment flags, projected revenue, and a confidence signal. JSON-schema-enforced output.
Why an LLM: the alternative, a rule engine with hard-coded thresholds, breaks on combinatorial cases. A nine-year-old Boxer with a nocturnal cough and a 14-month visit gap needs four interacting rules. LLMs are good at that synthesis.
What keeps it safe: the system prompt constrains it to clinician-facing decision support. The vet decides; the model proposes.
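A minimal sketch of the shape the engine is held to, assuming illustrative field names; the production schema and prompt wording are not reproduced here:

```typescript
// Illustrative only: field names are assumptions, not the production schema.
interface DiagnosticRecommendation {
  test: string;                      // e.g. "thoracic radiographs"
  reasoning: string;                 // why this test, given signalment and history
  priority: "high" | "moderate" | "low";
  aahaAligned: boolean;              // flags alignment with AAHA preventive-care guidance
  projectedRevenue: number;          // an estimate, never a billing decision
  confidence: number;                // 0 to 1, surfaced to the clinician
}

// System-prompt sketch of the safety posture: decision support, not diagnosis.
const SYSTEM = `You are clinician-facing decision support for a veterinary exam.
Return ranked diagnostic recommendations as JSON matching the provided schema.
The veterinarian decides; you only propose. Never state or imply a diagnosis.`;
```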
Pattern finder
When it runs: at portfolio level, weekly and monthly.
What it does: k-means (k=4) clustering and decision-tree classification across 30+ patient records, surfacing breed × age-band capture gaps, decline-reason clusters, and worst-performing cells.
Why classical ML: at cohort scale, we're not reasoning case-by-case. We're counting and grouping. Faster, cheaper, more interpretable, no API call required. You can audit a decision tree by reading its splits.
What it proves: the compounding data advantage. Every new visit is a row that retrains the model.
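A minimal in-browser k-means sketch (k = 4). The feature columns are illustrative rather than the real cohort-data.json fields, and centroid seeding is naive because the cohort is only ~30 rows:

```typescript
// Minimal k-means over cohort rows. Feature choices are illustrative.
type Row = { ageYears: number; weightKg: number; captureRate: number };

function kmeans(rows: Row[], k = 4, iters = 50): number[] {
  const X = rows.map(r => [r.ageYears, r.weightKg, r.captureRate]);
  let centroids = X.slice(0, k).map(v => [...v]);   // naive seed: first k rows
  let labels = new Array<number>(X.length).fill(0);

  for (let it = 0; it < iters; it++) {
    // Assignment step: nearest centroid by squared Euclidean distance.
    labels = X.map(x => {
      let best = 0, bestDist = Infinity;
      centroids.forEach((c, j) => {
        const d = c.reduce((sum, cj, f) => sum + (x[f] - cj) ** 2, 0);
        if (d < bestDist) { bestDist = d; best = j; }
      });
      return best;
    });
    // Update step: move each centroid to the mean of its assigned rows.
    centroids = centroids.map((c, j) => {
      const members = X.filter((_, i) => labels[i] === j);
      if (members.length === 0) return c;
      return c.map((_, f) => members.reduce((s, m) => s + m[f], 0) / members.length);
    });
  }
  return labels;   // cluster id per patient row, feeding the breed × age-band gap views
}
```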
Risk & routing predictor
When it runs: before any LLM call, on every incoming case.
What it does: predicts whether a case is high, moderate, or low-risk from signalment alone. Instant, offline, no API call.
Why it matters architecturally: running Claude on every visit is fine economically. Running it on every phone inquiry, walk-in triage, and online pre-screen is not. The router gives you a free pre-filter that bounds inference cost as volume scales 10–20×.
What keeps it safe: the tree is a router, not a decider. It chooses whether to spend more compute, not what to do.
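What the router boils down to at inference time, as a sketch. The split points below are invented for illustration; the production tree is learned from cohort data and retrained by the pattern finder:

```typescript
// Router sketch: decides whether a case earns an LLM call at all.
// Thresholds are invented for illustration, not the learned splits.
type Signalment = { ageYears: number; monthsSinceLastVisit: number; chronicCondition: boolean };
type Risk = "high" | "moderate" | "low";

function routeCase(s: Signalment): Risk {
  if (s.chronicCondition) return "high";                               // always escalate
  if (s.ageYears >= 8) return s.monthsSinceLastVisit >= 12 ? "high" : "moderate";
  if (s.monthsSinceLastVisit >= 18) return "moderate";
  return "low";                                                        // templated workflow, no API call
}

// "high" / "moderate" go to the Clinical Reasoner; "low" gets the routine template.
```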
Workflow coordinator
When it runs: after the visit, when the doctor reviews the queue.
What it does: four-stage pipeline. Parse SOAP note → plan outbound actions → draft each communication → queue for human approval. Each draft cites source lines from the original note. Nothing actually sends.
Why a linear pipeline (not a tool-calling agent): a deliberate choice. Clinical workflows benefit from predictability. Linear pipelines beat agents on stability; agents win on flexibility. We chose the right tool for the room.
What keeps it safe: source-line provenance, confidence scoring, "no diagnosis in drafts" prompt rule, mandatory human approval before send.
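The pipeline's shape, sketched with placeholder stubs. In production each stage wraps a bounded Claude call; none of the identifiers below are the real code:

```typescript
type Parsed = { summary: string; sourceLines: number[] };
type Action = { kind: "recheck" | "reminder" | "lab-callback"; dueInDays: number };
type Draft  = { action: Action; body: string; citedLines: number[]; approved: boolean };

// Placeholder stubs so the four-stage shape is visible end to end.
const parseSoap = async (note: string): Promise<Parsed> =>
  ({ summary: note.slice(0, 80), sourceLines: [1] });
const planActions = async (p: Parsed): Promise<Action[]> =>
  [{ kind: "reminder", dueInDays: 14 }];
const draftMessages = async (plan: Action[], p: Parsed): Promise<Draft[]> =>
  plan.map(a => ({ action: a, body: `Draft re: ${p.summary}`, citedLines: p.sourceLines, approved: false }));
const queueForApproval = (drafts: Draft[]): Draft[] => drafts;   // nothing sends from here

async function followUp(soapNote: string): Promise<Draft[]> {
  const parsed = await parseSoap(soapNote);           // 1. parse: fields + source-line spans
  const plan   = await planActions(parsed);           // 2. plan: which follow-ups, for whom, when
  const drafts = await draftMessages(plan, parsed);   // 3. draft: each cites its source lines
  return queueForApproval(drafts);                    // 4. queue: mandatory human approval before send
}
```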
How the roles compose into a loop
┌─────────────────────────────────┐
│ INCOMING PATIENT VISIT │
└────────────────┬────────────────┘
│
▼
┌─────────────────────────────────┐
│ ROLE 3. Risk Predictor │
│ local decision tree, <1ms │
│ "Is this worth an engine call?"│
└────────────────┬────────────────┘
│
┌───────────┴───────────┐
│ high / moderate │ low
▼ ▼
┌─────────────────────────┐ ┌──────────────────────┐
│ ROLE 1. Clinical │ │ Routine template │
│ Reasoner (Claude) │ │ (no AI call) │
│ "What should we run?" │ └──────────┬───────────┘
└────────────┬────────────┘ │
│ │
▼ │
┌─────────────────────────┐ │
│ Clinician reviews, │ │
│ accepts / declines │◀──────────────┘
└────────────┬────────────┘
│
▼
┌─────────────────────────┐
│ SOAP note written │
│ by doctor │
└────────────┬────────────┘
│
▼
┌─────────────────────────┐
│ ROLE 4. Coordinator │
│ Claude, 4-stage pipe │
│ "What happens next?" │
└────────────┬────────────┘
│
▼
┌─────────────────────────┐
│ Doctor approves │
│ drafts → sent log │
└────────────┬────────────┘
│
└────┐
▼
┌─────────────────────────────────────────┐
│ ROLE 2. Pattern Finder │
│ k-means + tree, cohort-scale │
│ Runs weekly on accumulated visits, │
│ surfaces gaps, retrains Role 3's tree │
└─────────────────────────────────────────┘
│
└──────▶ ↺ feedback loop
What the AI is. And is not. Doing.
This is the question that always gets asked at the board table. Being precise about it is what separates a system the board can approve from one that needs FDA / USDA review and a different liability posture.
The AI is not:
- Practicing medicine. Vets review every recommendation
- Ordering diagnostics. Vets order, the model proposes
- Sending communications. Every draft requires human approval
- Making billing decisions. Revenue figures are estimates
- Diagnosing. Drafts cannot state or imply a diagnosis
The AI is:
- Surfacing the next-best action with reasoning
- Translating SOAP notes into structured follow-up
- Drafting outbound messages with source-line provenance
- Routing low-stakes cases to templated workflows
- Learning from the cohort over time
Click through each one yourself.
Three production surfaces, each one proving a different claim. They share cohort-data.json as a single source of truth and call the Anthropic Messages API directly from the browser for inference.
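For orientation, a sketch of that direct-from-browser call against the public Messages API. The model id is illustrative, and the key is whatever the visitor pastes in (demo posture only):

```typescript
// Direct browser inference call, demo posture only.
async function runEngine(apiKey: string, patientProfile: object) {
  const res = await fetch("https://api.anthropic.com/v1/messages", {
    method: "POST",
    headers: {
      "content-type": "application/json",
      "x-api-key": apiKey,                                  // visitor-supplied key
      "anthropic-version": "2023-06-01",
      "anthropic-dangerous-direct-browser-access": "true",  // opt-in CORS header for browser calls
    },
    body: JSON.stringify({
      model: "claude-sonnet-4-5",                           // illustrative model id
      max_tokens: 1024,
      system: "Clinician-facing decision support. Propose ranked diagnostics as JSON; never diagnose.",
      messages: [{ role: "user", content: JSON.stringify(patientProfile) }],
    }),
  });
  return (await res.json()).content;                        // the schema-enforced recommendation JSON
}
```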
Recommendation engine
Pick a patient, hit Run. Returns ranked diagnostic recommendations with reasoning, priority, AAHA-alignment, and projected revenue. Proves the engine works.
→ /demos/engine
Cohort dashboard
30 cases, real k-means + decision tree in the browser. Six sections of aggregated insights, plus a revenue waterfall tying back to the deck math. Proves the engine learns.
→ /demos/cohort
Follow-up agent
Pick or paste a SOAP note. Watch the four-stage pipeline parse, plan, draft, and queue. Approve or reject each draft. Proves the engine closes the loop.
→ /demos/agent
What this is not.
The architecture is mature but the demos are demos:
- Not production-grade. Direct browser API calls with visitor-supplied keys for the demo. A Cloudflare Worker proxy is the next step (sketched at the end of this list).
- Not HIPAA-compliant. Demo posture only. A real deployment requires a BAA with the model provider or self-hosted inference.
- The agent is a pipeline, not a true tool-calling agent. Linear was the right call for clinical workflows. The upgrade path is clean if a future use case earns it.
- The decision tree hits 100% training accuracy because synthetic labels are deterministic on age thresholds. A real cohort would land around 75–85%.
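A sketch of that Worker proxy, with assumed route and binding names: the browser keeps the same request shape, but the API key moves to a server-side secret.

```typescript
// Cloudflare Worker proxy sketch; the binding name is an assumption.
export default {
  async fetch(request: Request, env: { ANTHROPIC_API_KEY: string }): Promise<Response> {
    if (request.method !== "POST") return new Response("POST only", { status: 405 });

    // Forward the demo's request body, attaching the key server-side
    // so it never ships to the browser.
    const upstream = await fetch("https://api.anthropic.com/v1/messages", {
      method: "POST",
      headers: {
        "content-type": "application/json",
        "x-api-key": env.ANTHROPIC_API_KEY,
        "anthropic-version": "2023-06-01",
      },
      body: await request.text(),
    });
    return new Response(upstream.body, { status: upstream.status });
  },
};
```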
Want this pattern adapted to your workflow?
The four-role pattern isn't veterinary-specific. The same architecture has been adapted to professional services, SMB operations, and life sciences contexts. Same method, different domain experts in the loop.