Clinical Trial Protocol

REGAIN-ADVOCATE-TA3-RCT

Scalable Agentic AI for Heart Failure & Post-MI Management: A Pragmatic Non-Inferiority Randomized Controlled Trial

(The ADVOCATE Scalability Study)

Protocol Author

Alexey Revtovich

Scientific Lead, Regain Inc. | formerly Rice University

h-index 12 | 720 citations | 19 publications

1. Project Summary

REGAIN-ADVOCATE-TA3-RCT is a multi-site pragmatic randomized controlled trial designed to evaluate a dual-agent "Clinical AI" system for cardiovascular disease management in the United States, explicitly structured as an ADVOCATE Scalability Study to generate evidence for:

Clinical non-inferiority
Operational efficiency
Technical robustness across EHR vendors/workflows
Payer-facing economic endpoints

Investigational System

TA1 (Clinical Agent / SaMD): proposes guideline-concordant medication optimization and monitoring plans for heart failure (HFrEF/HFmrEF/HFpEF) and post-myocardial infarction (post-MI) patients, generating actionable orders and follow-up plans.

TA2 (Supervisory Agent / Safety Control): independently monitors TA1 outputs in real time, blocks or escalates unsafe recommendations, and enforces a fail-safe state when safety or system performance degrades.

Primary Objective

The primary objective is to demonstrate non-inferiority of the investigational system to usual care on GDMT adherence (operationalized as the GCTS - Guideline-Concordant Therapy Score) at Month 12, with key supportive endpoints including:

Rehospitalization or all-cause mortality
CV death / HF hospitalization
Patient-reported quality of life (KCCQ)
Operational efficiency (clinician time per patient)
Technical integration reliability (read/write success and uptime)
Economic outcomes (total cost of care per patient per month)
Adjudicated safety outcomes (agent decision-related SAEs and unsafe recommendation rate)

Shadow Mode Clarification (Phase 1B)

Shadow Mode: run the system prospectively in a non-interventional manner to generate adjudicated safety labels and operational readiness evidence (no hidden intervention; research dashboard only).

2. Specific Aims

Aim 1 Clinical Effectiveness (Non-Inferiority)

Demonstrate non-inferiority of the investigational system versus usual care in GDMT adherence (operationalized as GCTS - Guideline-Concordant Therapy Score) at Month 12.

Regulatory intent: Generate evidence aligned with FDA SaMD clinical evaluation expectations for effectiveness in the intended use population.

Aim 2 Safety Control Performance (TA2 Focus)

Validate TA2's ability to detect, block, and escalate unsafe TA1 outputs and system hazards in a clinically meaningful taxonomy of error classes.

Regulatory intent: Generate evidence consistent with MDDT-style analytical validation as a parallel evidence stream, while treating TA2 as an internal safety control within the investigational system during Phase 2.

Aim 3 Scalability & Operational Efficiency

Quantify the change in clinician workload and workflow efficiency (e.g., minutes/patient/month; order review burden; escalation rates) while maintaining non-inferior clinical outcomes.

Key Metrics

Clinician time per patient-month
Order review burden
Escalation rates
Specialist Extension Factor (SEF)

Aim 4 Safety

Quantify adjudicated device-related serious harms and unsafe recommendation rate, and evaluate the impact of the safety control layer and clinician sign-off on risk mitigation.

Safety Metrics

Device-related serious adverse events (adjudicated)
Unsafe recommendation rate
TA2 critical miss rate
Agent decision-related SAE rate (<3% target)

3. Research Strategy

3a. Significance

Heart failure and post-MI care require longitudinal, guideline-concordant optimization of medications and monitoring, yet specialty capacity is constrained and outcomes remain heterogeneous across settings. A scalable, auditable, safety-controlled agentic AI system could extend specialist-quality management across diverse US health systems, including resource-limited and rural settings, while maintaining patient safety and regulatory-grade traceability.

3b. Innovation

Dual-agent architecture with independent safety control (TA2) monitoring TA1 in real time
Auditability ("glass box"): complete trace logs of inputs, model versions, outputs, TA2 decisions, and clinician actions
Pragmatic EHR-integrated workflow with a "pending order → clinician sign" mechanism supporting deployment realism while preserving clinician accountability

3c. Approach (High-Level Design)

Phase 1A: retrospective data access + pre-production sandbox integration for read/write validation (orders, In-Basket drafts, note drafts) and IV&V Study 1 support.
Phase 1B: IRB approval with FWA; beta patients for UI/UX; prospective non-interventional Shadow Mode evidence; IDE activities; IV&V Study 2 support; go/no-go readiness.
Phase 2 (Live Pragmatic RCT): randomized comparison of investigational system-enabled care versus usual care with blinded endpoint and safety adjudication.
Safety governance: DSMB + Medical Monitor, pre-specified stopping rules, and a fail-safe state when TA2 or data quality degrades.

4. Study Overview

4.1 Investigational System Definition

The investigational device is the combined TA1+TA2 system integrated into the clinical workflow and EHR. TA2 is treated as an internal safety control. TA2's potential MDDT qualification evidence is developed in parallel, but Phase 2 evaluates the combined system's safety and effectiveness in situ.

4.2 ADVOCATE Schedule (39 Months)

Phase 1A: Discovery & Foundation

Months 0-12

Retrospective de-identified longitudinal EHR data access + pre-production sandbox integration to validate read/write workflows (pending orders, In-Basket drafts, encounter note drafts) and participate in IV&V Study 1.

Phase 1B: Preparation & Regulatory

Months 12-24

IRB approval with FWA; beta patients for UI/UX testing; prospective Shadow Mode readiness evidence; IDE activities; IV&V Study 2 support; go/no-go thresholds finalized.

Phase 2: Scalability Study

Months 24-39

Live pragmatic RCT where eligible patients are randomized to investigational system–enabled care versus usual care. Clinical workflow uses "pending order → clinician sign" mechanism.

4.3 Change Control / Model Freeze

To preserve interpretability of the RCT and maintain a stable investigational device definition:

Model Freeze Before Phase 2

TA1 and TA2 are frozen prior to first Phase 2 enrollment. Freeze scope includes: model weights, TA2 safety rules/detectors, prompting/policy logic, knowledge sources, and drug knowledge/database versions.

PCCP-Style Controlled Updates (IDE-Governed)

Update Type	Description	Requirements
Non-clinical updates	UI, logging, performance, and reliability improvements that do not change clinical behavior	Versioning and validation
Safety-driven rule updates	Deterministic TA2 safety rules/data-guardrails	CAPA with DSMB notification and IDE amendment/notification
Clinical behavior update ("Version 2")	Only under pre-specified bridge process	(1) shadow-to-live bridge evaluation, (2) adjudicated safety challenge set, (3) IDE/IRB approvals, (4) SAP version-strata handling

No Online Learning During Phase 2

Phase 2 interaction data (including clinician rationales) are not used to update the deployed model during the trial; they may be used for post-trial analysis and subsequent releases under IDE-controlled change processes.

Versioning and traceability: Every recommendation and TA2 decision is tied to a unique system version identifier in the audit log.

5. Study Design (Phase 2)

5.1 Trial Type

Multi-site pragmatic RCT, open-label at point of care, with blinded endpoint and safety adjudication.

Regulatory framing (ADVOCATE TA3 requirement): This trial is conducted as an IDE study supporting FDA SaMD authorization for the investigational TA1+TA2 system.

Technical robustness requirement (TA3): The site network will include at least two major EHR vendors (e.g., Epic and Cerner) and demonstrate stable operation across vendor-specific workflows; vendor- and site-level integration metrics are reported.

5.2 Randomization

Individual patient randomization (1:1) within each site with centralized allocation concealment.

Justification: Preserves patient-level causal inference while enabling pragmatic deployment; contamination risk is mitigated via access controls and audit logs.

Contingency: If individual randomization proves operationally infeasible or contamination is excessive, a cluster design at the clinician/team level may be adopted, with corresponding ICC-driven sample size adjustments and a cluster-appropriate analysis.

Stratification Variables

Site
HF phenotype (HFrEF/HFmrEF/HFpEF) vs post-MI cohort
Baseline GCTS (guideline-concordant therapy score)
Age (>65 vs ≤65)
Rural/urban indicator (based on ZIP-code RUCA classification)

5.3 Blinding & Adjudication

Open-label care is expected due to workflow integration.
Blinded adjudication is required for:
- Primary endpoint scoring (where subjective elements exist)
- Device-related serious harms attribution
- Classification of "unsafe recommendations" and "critical misses"

5.4 Contamination Control & Spillover Analysis

To prevent "spillover" learning from intervention to control:

Restrict TA1/TA2 UI access to intervention participants (role-based access; EHR flags)
Segregate order queues and dashboards
Maintain audit logs of UI access and recommendation viewing
Train staff on separation and documentation requirements

Contamination Exposure Index (Pre-Specified)

Derive an exposure index per clinician/team and per patient from audit logs (e.g., number of AI-case views, number of intervention pending orders reviewed, time spent in the AI UI). Use it for (1) monitoring separation fidelity and (2) sensitivity analyses.

Operational Separation (Recommended Default)

Use a dedicated intervention review pool (NP/PA/MD adjudicator queue) where feasible
Keep intervention dashboards separate from control workflows by role and patient flags

5.5 IDE Sponsor Accountability, Reporting, and Change Control

This study is conducted under an IDE for SaMD. The IDE sponsor (Regain, Inc.) holds device accountability and is responsible for FDA communications, regulatory reporting, and software release control.

Key Principles

Software change control: the investigational system is frozen per Section 4.3. Any permitted safety-driven changes follow an IDE amendment process and documented CAPA.
Safety reporting: sites report events rapidly to the IDE sponsor; sponsor performs required FDA/IRB reporting and maintains the Device Master Record and audit trail.
Regulatory reporting and monitoring: the IDE sponsor fulfills applicable IDE obligations (including 21 CFR 812 reporting expectations).

RACI Matrix

Activity	IDE Sponsor (Regain)	Coordinating Center	Site PI/Team	DSMB/Medical Monitor
FDA IDE submission/maintenance	R/A	C	I	I
Software release control, CAPA	R/A	C	C	I
Device accountability and audit-log custody	R/A	R	C	I
Site IRB submissions/consent	C	I	R/A	I
AE/SAE identification and initial reporting	I	I	R/A	I
UADE / device-related serious harm adjudication	R	C	C	R/A
DSMB reviews and pause/resume recommendations	I	C	C	R/A
Data monitoring, quality checks, database lock	C	R/A	R	I

R = Responsible, A = Accountable, C = Consulted, I = Informed

6. Study Population

6.1 Inclusion Criteria (Pragmatic)

Age ≥18 years receiving longitudinal care at participating US health systems
Heart failure diagnosis (HFrEF, HFmrEF, or HFpEF), NYHA class II–IV, OR Post-MI within the prior 12 months
Ability to provide informed consent
English or Spanish literacy
Access to smartphone/tablet OR caregiver-assisted support
Ability to use required wearable/RPM devices (digital scale, BP cuff, SpO2)
EHR data availability sufficient to support safe medication management (problem list, meds, allergies, labs, vitals)

HF Phenotype Definitions

HFrEF: typically LVEF ≤40%
HFmrEF: typically LVEF 41–49%
HFpEF: typically LVEF ≥50%

LVEF is used when available; the diagnosis may also be confirmed by the clinician problem list or encounter documentation.

6.2 Exclusion Criteria (Minimal, Safety-Focused)

Exclude only if there is no safe path to participate even with device provisioning/training and caregiver assistance, or if participation would be clinically inappropriate:

Exclusion	Rationale
Inability to provide informed consent with available supports	Ethical requirement
Severe cognitive impairment preventing interaction with the AI (despite available supports)	Safety
Inability to use required wearable/RPM devices due to physical limitations and no safe caregiver-assisted alternative	Data collection requirement
Life expectancy <12 months from non-cardiovascular disease	Endpoint interpretability
Enrollment in another interventional study that materially conflicts with GDMT management	Confounding
Any site-defined condition where medication management cannot be safely supported due to missing essential data	Safety

6.3 Recruitment & Enrollment Sources (TA3 Requirement)

Participants may be enrolled through:

Outpatient clinics: cardiology, primary care, HF programs
Inpatient settings: prior to discharge after HF hospitalization or acute MI, with longitudinal follow-up arranged within the TA3 health system

6.4 Demographic Representation Targets

Category	Minimum Target
Older adults (65+)	≥40%
Black/African American	≥13%
Hispanic/Latino	≥18%
Rural/Underserved	≥25%

6.5 Equity & Representation Operational Plan

To make recruitment targets achievable under real constraints, TA3 execution uses a monitored, adaptive approach:

Site Selection Criteria

Include at least one safety-net/underserved urban site
Include at least one rural-serving site
Prioritize systems with demonstrated Black/African American and Hispanic/Latino HF/Post-MI volume

Adaptive Recruitment Triggers

If any target stratum falls >5 percentage points below plan for two consecutive months:

Open additional clinics/sites
Add community outreach
Increase device provisioning and engagement staffing
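
The trigger above can be checked mechanically from monthly enrollment reports. The following is a minimal sketch, not a protocol-defined procedure: the function name, the input layout (a list of enrollment shares per stratum, oldest month first), and the reading of "plan" as the minimum targets in Section 6.4 are all assumptions made for illustration.

```python
# Minimum targets from Section 6.4, as fractions (assumed here to be "the plan").
TARGETS = {
    "older_adults": 0.40,
    "black_african_american": 0.13,
    "hispanic_latino": 0.18,
    "rural_underserved": 0.25,
}


def strata_needing_action(monthly_share, targets=TARGETS, gap=0.05, months=2):
    """Return strata that were more than `gap` below target in each of the
    last `months` monthly reports.

    monthly_share: dict mapping stratum name -> list of enrollment shares
    (0..1), oldest first. This input format is hypothetical.
    """
    flagged = []
    for stratum, target in targets.items():
        recent = monthly_share.get(stratum, [])[-months:]
        # Require a full run of consecutive months before triggering.
        if len(recent) == months and all(target - share > gap for share in recent):
            flagged.append(stratum)
    return flagged
```

A stratum returned by this check would then prompt the actions listed above (additional clinics/sites, community outreach, more provisioning and engagement staffing).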

Resourcing Tied to Equity KPIs

Budget and staffing explicitly support translation (EN/ES)
Device setup/training resources
Caregiver onboarding
Navigation support to prevent technology access from becoming an exclusion

7. Interventions & Clinical Workflow

7.1 Control Arm: Usual Care

Standard clinician-led management consistent with local workflows and current AHA/ACC guidelines. Data collection is passive via EHR + PROs.

To support interpretability:

Document site-level baseline practice patterns
Pre-specify minimum documentation for GDMT status (med list, doses, contraindications/intolerance)
Define a minimum measurement-only chart abstraction standard for GDMT eligibility/contraindications at each assessment window (baseline, Month 3, Month 12) to reduce differential misclassification and documentation bias across arms

7.2 Intervention Arm: Investigational System

Core workflow:

TA1 ingests EHR data and produces a guideline-concordant optimization plan
TA2 independently evaluates TA1 outputs against safety constraints in real time
Approved actions are converted into pending orders or structured recommendations
A licensed clinician reviews and signs (or rejects/modifies) the pending orders

No Autonomous Execution

The investigational system does not execute medication orders without clinician sign-off. Clinical decision responsibility remains with the licensed clinician.

Rationale Capture (Designed to be Scalable)

Disposition	Required Action	Notes
Accept as-is	One-click disposition with default reason code "Accepted as recommended"	No free-text required
Reject or modify	Required structured reason codes + optional free text	Reason codes: contraindication, patient preference, plan already in progress, data incorrect, safety concern, out-of-scope

Order Review SLAs (Protocol Defaults)

Order Type	Review Requirement	Expiration
Routine pending orders	Reviewed within 3 business days	Auto-expire after 7 days if unsigned
Safety-critical escalations (TA2 high-severity)	Immediate clinician notification; review within 24 hours	Documented disposition required
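
As an illustration of the routine-order defaults in the table above, the sketch below derives a review-due date and an auto-expiry date from an order's creation date. It assumes that "business days" means Monday to Friday with holidays ignored and that the 7-day expiry is counted in calendar days; neither is stated in the protocol, and the function names are hypothetical.

```python
from datetime import date, timedelta


def add_business_days(start: date, n: int) -> date:
    """Advance n business days (Mon-Fri). Holidays are ignored in this sketch."""
    current = start
    while n > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0 = Monday ... 4 = Friday
            n -= 1
    return current


def routine_order_deadlines(created: date) -> dict:
    """Deadlines for a routine pending order under the protocol defaults."""
    return {
        "review_due": add_business_days(created, 3),   # 3 business days
        "auto_expire": created + timedelta(days=7),    # assumed calendar days
    }
```

For an order created on a Thursday, the review deadline falls on the following Tuesday because the weekend is skipped.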

7.3 Training and Credentialing

Mandatory Clinician Training Before Phase 2 Start

How to review pending orders
When to override
Documentation requirements
Escalation pathways and fail-safe behavior

Ongoing: Periodic refreshers and change-control notifications (without changing frozen model behavior)

7.4 Order Classes Matrix

Category	Examples	Allowed?	Review
GREEN	Routine GDMT titration, standard labs, refills	Yes	Single sign-off
YELLOW	Diuretic escalation, initiation at borderline SBP, complex diuretic combinations	Conditional	Double-sign or specialist consult
RED	Anticoagulant initiation/change, dual antiplatelet decisions, antiarrhythmics	No	Escalate only

7.5 Fail-Safe Behavior (System-Level)

If TA2 is unavailable, degraded, or outside performance thresholds, the system must enter a fail-safe state:

Trigger	System Behavior
TA2 unavailable	TA1 cannot generate or submit pending orders
TA2 degraded performance	Clinicians revert to usual care
Data quality guardrails fail	All events logged and reported per incident workflow

Fail-safe exit: System remains in fail-safe until TA2 availability and performance thresholds are restored and verified.

7.6 Staged Autonomy Pathway (Phase 2)

ADVOCATE's goal is to demonstrate safe autonomy at scale, not merely decision support. This protocol pre-specifies a staged autonomy ladder during Phase 2.

Scope Guardrail (Non-Negotiable)

Medication orders are never executed without clinician signature; autonomy staging applies to GREEN non-order actions (messaging, scheduling, patient education) and workflow automation.

Stage A — Run-in (First 4 Weeks)

Requirement	Purpose
Clinician review required for all AI-generated outputs	Stabilize workflow
TA2 hard-stops and escalation active	Calibrate adjudication
Full audit logging and response-time capture	Validate systems

Stage B — Review-Exception for GREEN Non-Order Actions

Action Type	Behavior
GREEN non-order actions	Auto-executed (e.g., sending templated patient education, scheduling requests, routing low-risk FYI In-Basket messages)
Medication/lab orders	Remain pended for sign-off, but GREEN orders routed for batched review (daily queue)
Exceptions	Clinicians review only TA2 escalations, YELLOW/RED, or sampled audits

Stage C — Optional Limited Pilot (Site- and IDE-Approved)

Expand review-exception coverage
Allow limited protocolized actions under explicit standing protocols
Post-hoc clinician audit sampling and DSMB oversight

Advancement Criteria (Evaluated Per Site Monthly)

Criterion	Threshold
Post-TA2 high-severity unsafe actions	0
High-severity TA2 critical misses	0
Agent decision-related SAE rate	Below TA3 target trajectory
Pause triggers	None
Integration reliability	Thresholds met (read/write success, uptime/latency)

Scalability KPIs (Reported Monthly)

KPI	Definition
AAR (Autonomous Action Rate)	% of GREEN non-order actions executed without synchronous clinician review
BRR (Batched Review Rate)	% of GREEN pending orders handled via batched review sessions (vs interruptive review)
Clinician minutes per patient-month	Median and p90, with burden drivers
TA2 hard-stop rate	Per 1,000 recommendations
Escalation rate	Per 100 patient-months
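
The rate-based KPIs in the table above reduce to simple ratios over monthly counts. The sketch below shows one way to compute them; the count field names are hypothetical, and the clinician-minutes KPI is omitted because it depends on how time capture is implemented at each site.

```python
def scalability_kpis(counts: dict) -> dict:
    """Compute monthly rate KPIs from raw counts (field names are illustrative)."""

    def rate(numerator, denominator, scale):
        # Return None rather than dividing by zero when there is no activity.
        return scale * numerator / denominator if denominator else None

    return {
        # % of GREEN non-order actions executed without synchronous review
        "AAR_pct": rate(counts["green_actions_auto"], counts["green_actions_total"], 100),
        # % of GREEN pending orders handled in batched review sessions
        "BRR_pct": rate(counts["green_orders_batched"], counts["green_orders_total"], 100),
        "hard_stops_per_1000_recs": rate(counts["ta2_hard_stops"], counts["recommendations"], 1000),
        "escalations_per_100_pt_months": rate(counts["escalations"], counts["patient_months"], 100),
    }
```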

8. Outcomes & Endpoints

8.1 Primary Endpoint (Non-Inferiority)

GDMT adherence, operationalized as the Guideline-Concordant Therapy Score (GCTS; 0-4 points) at Month 12.

Primary analysis population: HFrEF/HFmrEF and post-MI participants (HFpEF included in the trial but analyzed as a pre-specified supportive subgroup due to less uniformly defined medication optimization targets).

HFrEF GCTS (4 Pillars)

RAASi/ARNI (ACEi/ARB/ARNI)
Evidence-based β-blocker
MRA
SGLT2i

GCTS Scoring Framework

Score	Criteria
1.0	On guideline-recommended agent at ≥50% target dose or documented maximally tolerated dose
0.5	On agent but <50% target dose (titration in progress) with no contraindication to further titration documented
0.0	Not on agent despite eligibility and no documented contraindication/intolerance

Eligibility adjustment: Contraindicated/intolerant pillars are excluded from the denominator.

Post-MI GCTS (4 Elements)

High-intensity statin (or maximally tolerated)
Antiplatelet therapy appropriate to time-from-MI and bleeding risk
β-blocker if indicated
ACEi/ARB/ARNI if indicated

HFpEF Supportive Therapy Score (HFpEF-STS; 0–2 Points)

Element	Scoring
SGLT2i element (0–1)	1.0 / 0.5 / 0.0 scoring analogous to other cohorts (eligibility-adjusted)
Congestion management element (0–1)	Objective evidence of active loop/thiazide diuretic plan when congestion is documented plus monitoring plan (weight + labs) → 1.0; partial plan → 0.5; absent plan when eligible → 0.0

Final Score Calculation

GCTS = 4 × (Σ element scores / # eligible elements)
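
A minimal sketch of this calculation, including the eligibility adjustment, is shown below. Representing a contraindicated or not-tolerated element as `None` and returning `None` when no element is eligible are assumptions of the sketch, not protocol rules.

```python
from typing import Optional, Sequence


def gcts(element_scores: Sequence[Optional[float]]) -> Optional[float]:
    """GCTS = 4 x (sum of element scores / number of eligible elements).

    Each entry is 1.0, 0.5, or 0.0 per the scoring framework, or None when
    the element is contraindicated/not tolerated and therefore excluded
    from the denominator.
    """
    eligible = [s for s in element_scores if s is not None]
    if not eligible:
        return None  # no eligible elements; handling of this case is assumed
    if any(s not in (0.0, 0.5, 1.0) for s in eligible):
        raise ValueError("element scores must be 0.0, 0.5, or 1.0")
    return 4.0 * sum(eligible) / len(eligible)
```

For example, an HFrEF participant scoring 1.0, 0.5, 1.0, and 0.0 on the four pillars has a GCTS of 2.5. If the third pillar were contraindicated instead, the remaining scores of 1.0, 0.5, and 0.0 would be rescaled over three eligible elements, giving 4 × 1.5 / 3 = 2.0.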

GCTS Ascertainment & Documentation-Bias Mitigation

Because the intervention arm may improve documentation quality (not just prescribing), this protocol explicitly separates pragmatic documentation from adjudicated "best-available truth" to prevent biased non-inferiority conclusions.

Dataset	Description
Observed GCTS (Pragmatic)	Computed from routine EHR documentation as it exists in care delivery (what a health system "sees" in real time)
Adjudicated GCTS (Credibility Anchor)	Computed using centralized chart abstraction and blinded adjudication applying the evidence hierarchy, for both intervention and control arms (measurement-only; no care changes)

8.2 Key Supportive Endpoints

Re-hospitalization or all-cause mortality through Month 15
CV death / HF hospitalization through Month 15
Time-to-optimization (time to GCTS ≥3.5)
Early optimization rate (proportion with GCTS ≥3.0 by Month 3 and Month 6)
GCTS AUC (0–6 months): area under the GCTS-versus-time curve, capturing both speed of optimization and maintenance
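
To make the GCTS AUC endpoint concrete, the sketch below integrates a participant's GCTS trajectory with the trapezoidal rule. The choice of the trapezoidal rule, the units (GCTS point-months), and the requirement that observations exist at both ends of the window are assumptions; the protocol does not specify the integration method.

```python
def gcts_auc(observations, start=0.0, end=6.0):
    """Trapezoidal area under the GCTS curve, in GCTS point-months.

    observations: iterable of (month, gcts) pairs. Points outside the
    [start, end] window are ignored; points at both `start` and `end`
    are required because this sketch does not extrapolate.
    """
    pts = sorted((m, s) for m, s in observations if start <= m <= end)
    if len(pts) < 2 or pts[0][0] != start or pts[-1][0] != end:
        raise ValueError("need observations at both start and end of the window")
    return sum(
        (m1 - m0) * (s0 + s1) / 2.0
        for (m0, s0), (m1, s1) in zip(pts, pts[1:])
    )
```

Two participants who both reach a score of 4.0 by Month 6 can differ on this measure: one who reaches 4.0 by Month 3 accumulates more area than one who improves linearly, which is how the endpoint rewards speed.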

8.3 Patient-Reported Outcomes

KCCQ (quality of life) at baseline and follow-up time points (Months 3, 6, 12, 15)

8.4 Operational/Scalability Endpoints

Clinician time per patient-month (median and p90)
Specialist Extension Factor (SEF): target ≥5 by Month 3, ≥10 by Month 9
Response time from red-flag event to clinical action
Total cost of care (PMPM)

Autonomy-at-Scale KPIs

AAR (Autonomous Action Rate): % of GREEN non-order actions auto-executed
BRR (Batched Review Rate): % of GREEN pending orders handled via batched review
TA2 hard-stop rate per 1,000 recommendations
Escalation rate per 100 patient-months

Interruptiveness Metrics (Burnout-Relevant)

Interruptive alerts/pages per 100 patient-months
Non-interruptive queue items per 100 patient-months
Median time-to-disposition for each class

Cost / Reimbursement Evidence (TA3 Requirement)

Total cost of care per patient per month (PMPM)
Using claims feeds where available OR standardized cost weights derived from utilization
At least one participating site will provide claims linkage (e.g., Medicare FFS/MA, ACO)

Budget Impact Analysis (Payer-Grade)

Gross savings from reduced admissions/ED utilization
Incremental program costs (devices/data plans, integration, adjudication time)
Net PMPM
Breakeven month

8.5 Patient Medication-Taking Adherence (Secondary/Mediator)

Because the TA3 "GDMT adherence" effectiveness endpoint is operationalized as guideline-concordant prescribing/optimization (GCTS), patient medication-taking adherence is treated separately as a secondary/mediator endpoint and measured via:

Pharmacy claims/fills (when available)
EHR medication reconciliation
Optional ePill devices and/or validated self-report

8.6 Safety Endpoints

Device-related serious adverse events (adjudicated)
Unsafe recommendation rate (adjudicated)
Agent decision-related SAEs: <3% target
TA2 performance: critical miss rate, false positive block rate

Hallucination/Invalid-Reasoning Metrics (Reportable)

Metric	Definition
TA2 "caught hallucinations" per 1,000 recommendations	By taxonomy
Residual hallucinations that reached clinician review	Count and rate
Any hallucinations that became accepted actions	With adjudicated outcomes

8.7 Sample Size & Statistical Analysis

Final planned sample size: N = 800 total participants (400 per arm)

Non-Inferiority Margin

Δ = -0.20 points on the 0-4 GCTS scale. The investigational system is declared non-inferior if the lower bound of the one-sided 97.5% CI for the between-arm difference in Month 12 GCTS (Investigational minus Usual Care) is greater than -0.20.

Base NI Calculation

Endpoint SD (planning): σ = 1.0 GCTS points (conservative; refined using Phase 1B data)
NI margin (magnitude): |Δ| = 0.20
One-sided α = 0.025; power = 90%

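$$n_{\text{per arm}} = \left(\frac{(Z_{1-\alpha} + Z_{1-\beta})\,\sigma}{|\Delta|}\right)^{2} \approx \left(\frac{(1.96 + 1.28)\times 1.0}{0.20}\right)^{2} \approx 263$$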

Inflations Applied

Factor	Value
Attrition / incomplete endpoint ascertainment	15%
Design inflation (site heterogeneity, clustering/contamination, implementation variability)	1.15

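Applying both inflations to the base figure of 263 gives roughly 350 to 360 per arm, depending on how attrition is applied (for example, 263 / 0.85 × 1.15 ≈ 356). Rounding up to 400 per arm provides margin for heterogeneity and improves precision for key supportive event endpoints and subgroup analyses.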
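
The planning arithmetic can be reproduced with the short sketch below. It implements the formula exactly as written in this section and does not include the factor of 2 found in the textbook formula for an unadjusted two-arm comparison of means, so the output is the protocol's planning figure rather than an independent check. Applying attrition as division by (1 − 0.15) before multiplying by the design factor is an assumption; the function names are illustrative.

```python
from math import ceil
from statistics import NormalDist


def ni_sample_size(sigma=1.0, margin=0.20, alpha=0.025, power=0.90):
    """Per-arm n from the planning formula ((z_alpha + z_beta) * sigma / margin)^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # about 1.96 for one-sided 0.025
    z_beta = NormalDist().inv_cdf(power)       # about 1.28 for 90% power
    return ceil(((z_alpha + z_beta) * sigma / margin) ** 2)


def inflate(n, attrition=0.15, design_factor=1.15):
    """Inflate n for attrition (n / (1 - attrition)), then for design effects.

    The order and form of the two inflations are assumptions of this sketch.
    """
    return ceil(n / (1 - attrition) * design_factor)


base = ni_sample_size()   # base per-arm requirement
planned = inflate(base)   # per-arm requirement after both inflations
```

Under these assumptions `base` is 263 and `planned` is 356, both below the 400 per arm adopted in the protocol.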

Enrollment Balance Targets

Cohort	Target
HF overall	≥60%
HFrEF minimum	≥35%
Post-MI	≥30%
HFpEF	Supportive subgroup (no minimum quota)

Clinical Rationale

A 0.20-point difference represents 5% of the full 0-4 scale. Across 100 participants it amounts to 20 GCTS points in total, which is equivalent to approximately 20 patients missing one full guideline element (1.0 point each) OR 40 patients being one half-step (0.5 points) below target.

9. Schedule of Assessments (Phase 2)

Assessment Windows

Timepoint	Window	Key Assessments
Baseline (Day 0)	−30 to 0 days for EHR data	Demographics, comorbidities, cohort classification, NYHA class, LVEF, medication list + doses, allergy list, contraindications/intolerance, key vitals and labs, baseline Observed and Adjudicated GCTS, KCCQ, onboarding completion
Month 1	±14 days	Updated meds/doses and key labs/vitals (EHR), safety events, operational metrics, patient-reported out-of-network utilization, RPM data completeness
Month 3	±21 days	Meds/doses, labs/vitals, Adjudicated GCTS, KCCQ, events, SEF calculation, autonomy-stage progress evaluation
Month 6	±30 days	Meds/doses, labs/vitals, events, operational metrics, out-of-network utilization prompt
Month 12	±30 days	Primary endpoint (Adjudicated GCTS) and pragmatic Observed GCTS, labs/vitals, KCCQ, events, operational metrics, HFpEF-STS for HFpEF subgroup
Month 15	±45 days	TA3-required composite endpoint (re-hospitalization or all-cause mortality), supportive CV endpoints, KCCQ (optional), final safety review

Continuous Event Capture

Source	Method
In-network events	EHR + ADT feeds
Out-of-network events	Monthly patient prompts + record requests, HIE queries (TEFCA-enabled) where feasible, claims linkage at capable sites
Mortality	Health-system feeds + external sources (state death registry, NDI queries)

All suspected endpoint events are adjudicated.

10. Statistical Analysis Plan (SAP) Summary

10.1 Estimands and Analysis Sets

Estimand	Description
Primary (Credibility Anchor)	Difference (Investigational – Usual Care) in Adjudicated GCTS at Month 12 under a treatment-policy strategy
Key Supportive (Pragmatic)	Difference (Investigational – Usual Care) in Observed GCTS at Month 12

Analysis Sets: Both ITT and Per-Protocol non-inferiority analyses are required, with expectation of consistent conclusions.

10.2 Primary Analysis Model

Mixed effects regression (or GEE) appropriate to endpoint scale, with:

Fixed effects: arm, baseline Adjudicated GCTS, cohort (HF vs post-MI), site, stratification variables
Random effects: clinician/team if needed and/or site-level random intercepts
Robust standard errors

10.3 Multiplicity and Hierarchy

Confirmatory Family (Gatekept)

Order	Endpoint	Test
1	Primary: Non-inferiority on Adjudicated GCTS at Month 12	One-sided α=0.025
2	Time-to-optimization (superiority)	Two-sided α=0.05, gatekept
3	Clinician burden / SEF (superiority)	Two-sided α=0.05, gatekept
4	Response time (superiority)	Two-sided α=0.05, gatekept
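
The gatekeeping logic in the table above is a fixed-sequence procedure: each endpoint is tested only if every endpoint before it met its threshold. The sketch below illustrates that flow. The p-values passed in are hypothetical inputs, and the sketch assumes the primary test supplies a one-sided p-value against the non-inferiority margin while the later tests supply two-sided p-values.

```python
def fixed_sequence(results):
    """Walk an ordered list of (endpoint, p_value, alpha) tuples.

    Returns a dict mapping each endpoint to "success", "failed", or
    "not tested". Testing stops being confirmatory after the first failure.
    """
    outcome = {}
    gate_open = True
    for name, p_value, alpha in results:
        if not gate_open:
            outcome[name] = "not tested"
            continue
        passed = p_value < alpha
        outcome[name] = "success" if passed else "failed"
        gate_open = passed  # a failure closes the gate for all later endpoints
    return outcome
```

With illustrative p-values of 0.010, 0.030, 0.200, and 0.001 for the four endpoints in order, the first two succeed, the third fails, and the fourth is not formally tested despite its small p-value.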

10.4 Missing Data

Primary analysis: mixed models with maximum likelihood under MAR assumptions, supported by multiple imputation with auxiliary variables
Sensitivity: pattern-mixture (delta-adjustment) and worst-case bounds for differential missingness

10.5 Subgroup Analyses (Pre-Specified)

Age >65 vs ≤65
Sex
Race/ethnicity
Rural/urban
HF phenotype (HFrEF vs HFmrEF vs HFpEF) vs post-MI
CKD strata

11. Data & Safety Monitoring / Stopping Rules

11.1 Governance

Body	Role
DSMB	Oversees safety monitoring and interim reviews; meets at least quarterly during Phase 2 (ad hoc within 7 days of any pause trigger)
Medical Monitor	Provides rapid review of serious events; reviews any probable/definite device-related serious harm within 24 hours
Blinded Adjudication Committees	Classify device-related serious harms, medication-related serious harms, unsafe recommendations, TA2 critical misses

11.2 Definitions

Term	Definition
Unsafe recommendation	A TA1 recommendation that, if implemented as-is without clinician modification, would likely result in serious harm
Critical miss	TA2 fails to block or escalate an unsafe TA1 recommendation (false negative) in a high-severity class
Agent decision-related SAE	An SAE for which blinded adjudication determines probable/definite causal contribution from an accepted TA1 recommendation
Hallucination / invalid reasoning	A TA1 output that asserts or relies on non-existent or incorrect patient-specific facts or produces guideline-inconsistent reasoning without factual support

Hallucination Taxonomy

Fabricated data claims (labs, vitals, medications) not present in EHR/RPM feed
Wrong-patient-context inference
Guideline mismatch / non-concordant recommendation given available facts
Missing-data hazard (proceeds as if required safety data exist)

11.3 Phase 1B Go/No-Go Thresholds

Category	Threshold
Evidence volume (recommendations)	≥10,000 TA1 recommendations in Phase 1B
Evidence volume (challenge scenarios)	≥2,000 adjudicated challenge scenarios
High-severity TA2 critical misses	0
Post-TA2 high-severity unsafe recommendations	0
Overall post-TA2 unsafe recommendation rate	≤0.2%, no upward trend
TA2 false-positive blocking rate	≤15% overall; ≤3% for high-severity
Pending-order creation + audit logging success	≥99%
TA2 availability	≥99.9% over final 30 days
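
The numeric thresholds above lend themselves to an automated readiness check. The sketch below encodes them as a lookup table; the metric names are hypothetical, and the "no upward trend" condition on the unsafe recommendation rate is not evaluated here because the protocol does not define how trend is assessed.

```python
# metric name (illustrative) -> (comparison, threshold) from the table above
GO_NO_GO = {
    "ta1_recommendations": (">=", 10_000),
    "challenge_scenarios": (">=", 2_000),
    "high_severity_critical_misses": ("<=", 0),
    "post_ta2_high_severity_unsafe": ("<=", 0),
    "post_ta2_unsafe_rate_pct": ("<=", 0.2),
    "ta2_false_positive_block_pct": ("<=", 15.0),
    "ta2_false_positive_block_high_sev_pct": ("<=", 3.0),
    "pending_order_and_audit_success_pct": (">=", 99.0),
    "ta2_availability_final_30d_pct": (">=", 99.9),
}


def go_no_go(metrics: dict):
    """Return (go, failures), where failures lists metrics missing their threshold."""
    failures = []
    for name, (op, limit) in GO_NO_GO.items():
        value = metrics[name]  # a missing metric raises KeyError by design
        ok = value >= limit if op == ">=" else value <= limit
        if not ok:
            failures.append(name)
    return (not failures, failures)
```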

11.4 Phase 2 Stopping Rules

Patient-Level

Remove from autonomous mode if:

≥2 high-severity TA2 blocks within 30 days, OR
≥1 confirmed critical miss, OR
Any probable/definite device-related serious harm

Trial-Level

Trigger	Action
First probable/definite device-related serious harm	Immediate DSMB review
≥2 such events	Pause enrollment pending DSMB review
Post-TA2 unsafe recommendation rate >0.2% (30-day rolling)	DSMB review
Post-TA2 unsafe recommendation rate >0.5% (30-day rolling)	Pause enrollment
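
The trial-level triggers can be read as a mapping from two monitored quantities to the most severe required action. The sketch below is illustrative only: the function name and action labels are hypothetical, and it collapses "immediate DSMB review" and "DSMB review" into a single label.

```python
def trial_level_action(device_related_serious_harms: int,
                       unsafe_rate_pct_30d: float) -> str:
    """Return the most severe trial-level action implied by the triggers.

    device_related_serious_harms: count of adjudicated probable/definite
        device-related serious harms to date.
    unsafe_rate_pct_30d: post-TA2 unsafe recommendation rate (%), 30-day rolling.
    """
    if device_related_serious_harms >= 2 or unsafe_rate_pct_30d > 0.5:
        return "pause_enrollment"
    if device_related_serious_harms >= 1 or unsafe_rate_pct_30d > 0.2:
        return "dsmb_review"
    return "continue"
```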

System-Level (Fail-Safe)

Automatic fail-safe if:

TA2 unreachable for >5 seconds, OR
TA2 p99 latency >250ms sustained for >5 minutes, OR
Required data-quality guardrails fail

Recurrent fail-safe events (>3 in 24 hours) trigger incident review and DSMB notification.
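
A minimal sketch of how the system-level triggers might be evaluated from a health snapshot is given below. The `Ta2Health` structure, its field names, and the reason labels are hypothetical; how the underlying durations are measured (heartbeat interval, latency window) is left to the implementation and is not specified by the protocol.

```python
from dataclasses import dataclass


@dataclass
class Ta2Health:
    seconds_unreachable: float      # time since last successful TA2 response
    seconds_p99_over_250ms: float   # continuous time with p99 latency > 250 ms
    guardrails_ok: bool             # required data-quality guardrails passing


def failsafe_reasons(health: Ta2Health) -> list:
    """Return the triggers that require fail-safe; empty means normal operation."""
    reasons = []
    if health.seconds_unreachable > 5:
        reasons.append("ta2_unreachable")
    if health.seconds_p99_over_250ms > 5 * 60:
        reasons.append("ta2_latency")
    if not health.guardrails_ok:
        reasons.append("data_quality")
    return reasons


def needs_incident_review(failsafe_events_last_24h: int) -> bool:
    """More than 3 fail-safe events in 24 hours triggers incident review."""
    return failsafe_events_last_24h > 3
```

Any non-empty result from `failsafe_reasons` would place the system in the fail-safe state described in Section 7.5.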

11.5 Escalation Protocols (24/7 Coverage)

Red Flag Triggers (Minimum Set)

Rapid weight gain (≥2–3 kg in 72 hours) with HF symptoms
New/worsening hypoxia (SpO2 below threshold) or severe dyspnea
Hypotension below threshold with symptoms
ADT feed indicating ED visit/admission for HF-related complaints

Required Behavior: TA1 drafts In-Basket message and/or pages on-call clinician immediately. TA2 validates escalation urgency and blocks inappropriate autonomous action. Sites provide 24/7 coverage via existing on-call systems.

11.6 Event Classification & Reporting Workflow

Event Type	Site → IDE Sponsor	IDE Sponsor Actions
Suspected UADE or probable/definite device-related serious harm	Within 24 hours	Medical Monitor review within 24 hours
Any SAE	Within 48 hours	Triage and classification
TA2 critical miss (high severity)	Within 24 hours	DSMB notification for pause triggers within 24 hours
Near-miss summaries	Within 5 business days	Aggregated review

12. Technology & Integration Requirements

12.1 EHR Integration (FHIR R4)

Read Access (Real-Time)

Labs (chemistry, hematology, BNP and troponin)
Vitals (BP, HR, weight, O2/SpO2)
Medications (current active list)
Clinical notes (cardiology, primary care)
ADT feeds (admission, discharge, transfer)
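
Illustrative sketch of how several of the read feeds above could map onto FHIR R4 search requests. The base URL is a placeholder, the helper names are hypothetical, and the resource choices (for example, DocumentReference for clinical notes) are assumptions that depend on each vendor's implementation. ADT feeds are omitted because they are typically delivered over HL7 v2 rather than FHIR search.

```python
from urllib.parse import urlencode


def fhir_search_url(base_url: str, resource_type: str, **params: str) -> str:
    """Build a FHIR search URL from a base URL, resource type, and search parameters."""
    return f"{base_url.rstrip('/')}/{resource_type}?{urlencode(params)}"


def read_queries(base_url: str, patient_id: str) -> dict:
    """Return one search URL per read feed for a single patient."""
    return {
        "labs": fhir_search_url(base_url, "Observation",
                                patient=patient_id, category="laboratory"),
        "vitals": fhir_search_url(base_url, "Observation",
                                  patient=patient_id, category="vital-signs"),
        "medications": fhir_search_url(base_url, "MedicationRequest",
                                       patient=patient_id, status="active"),
        "notes": fhir_search_url(base_url, "DocumentReference",
                                 patient=patient_id),
    }
```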

Write Access (Real-Time)

Draft In-Basket / inbox messages to clinical team (required)
Draft scheduling requests / follow-up tasks
Create pending orders (meds/labs) for clinician sign-off
Draft documentation / encounter notes for clinician review/signature (required)
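
Illustrative sketch of a pending medication order expressed as a FHIR R4 MedicationRequest. Representing "pending, awaiting clinician sign-off" as status "draft" with intent "order" is an assumption; EHR vendors differ in how pended orders are exposed, and some sites may prefer intent "proposal". The function name and free-text fields are hypothetical.

```python
import json


def draft_medication_request(patient_id: str,
                             medication_text: str,
                             dosage_text: str) -> dict:
    """Build a minimal draft MedicationRequest for clinician review."""
    return {
        "resourceType": "MedicationRequest",
        "status": "draft",   # not active until a clinician signs
        "intent": "order",
        "medicationCodeableConcept": {"text": medication_text},
        "subject": {"reference": f"Patient/{patient_id}"},
        "dosageInstruction": [{"text": dosage_text}],
    }


def to_request_body(resource: dict) -> str:
    """Serialize the resource as the JSON body of a FHIR create (POST) request."""
    return json.dumps(resource)
```

A production payload would normally carry coded medications and structured dosing rather than free text.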

Audit Logs (Required)

All TA1/TA2 inputs/outputs, model versions, gating decisions, timestamps, clinician actions, override reasons, downstream order execution status.
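
Illustrative sketch of a completeness check over audit records. The field names are hypothetical stand-ins for the elements listed above; the check tests only that each field is present, since some values (such as an override reason) may legitimately be empty.

```python
from typing import Dict, List, Tuple

# Hypothetical field names corresponding to the required audit elements.
REQUIRED_FIELDS = (
    "agent", "inputs", "outputs", "model_version", "gating_decision",
    "timestamp", "clinician_action", "override_reason", "order_execution_status",
)


def incomplete_records(records: List[Dict]) -> List[Tuple[int, List[str]]]:
    """Return (index, missing field names) for every record lacking a required field."""
    problems = []
    for index, record in enumerate(records):
        missing = [field for field in REQUIRED_FIELDS if field not in record]
        if missing:
            problems.append((index, missing))
    return problems
```

An empty result corresponds to 100% completeness for the batch checked.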

12.2 Standards, Interoperability, and Auth

Standard	Requirement
FHIR	HL7 FHIR R4 preferred for read/write
TEFCA/USCDI	Data elements aligned with USCDI and TEFCA expectations
Legacy support	HL7 v2 ADT/ORM/ORU interfaces as fallback
Authentication	SMART on FHIR with OIDC for secure context launching

12.3 Phase 1A Data Access

Retrospective data: de-identified longitudinal EHR data for HF/Post-MI cohorts
Connected wearable/RPM platforms: de-identified historical feeds and pre-production access
Pre-production/sandbox: validate API writes without patient risk
Pre-Phase-2 qualification: each site must pass Integration Qualification Checklist (Appendix B)

12.4 Fusion Protocol Test (TA2 Gating Verification)

Pre-Phase 2 verification:

Test harness injects known unsafe scenarios across error classes
Confirm TA2 blocks/escalates per spec
Confirm system enters fail-safe when TA2 unavailable or outside constraints
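
Illustrative sketch of a harness loop for the first two checks above. The TA2 gate is passed in as a callable so that the harness makes no assumption about the real TA2 interface; the Scenario structure and action labels are hypothetical.

```python
from typing import Callable, List, NamedTuple, Tuple


class Scenario(NamedTuple):
    """One injected unsafe scenario and the action TA2 is expected to take."""
    name: str
    recommendation: dict
    expected: str  # e.g. "block" or "escalate"


def run_fusion_checks(gate: Callable[[dict], str],
                      scenarios: List[Scenario]) -> List[Tuple[str, str, str]]:
    """Run every scenario through the gate and return (name, expected, actual) mismatches."""
    failures = []
    for scenario in scenarios:
        try:
            actual = gate(scenario.recommendation)
        except Exception as exc:  # a crash counts as a failed check
            actual = f"error: {exc}"
        if actual != scenario.expected:
            failures.append((scenario.name, scenario.expected, actual))
    return failures
```

An empty list means every injected scenario was handled as specified.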

12.5 Performance Requirements

Metric	Target
TA2 gating latency	p99 < 100ms
TA2 availability	≥99.9% per 30-day period
Data-quality guardrails	Minimum required data elements must be present
Pending-order creation success	≥99%
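
Illustrative sketch of how the latency and availability targets could be computed from raw measurements. The nearest-rank percentile method is an assumption (other percentile definitions give slightly different values), and the function names are hypothetical. For reference, 99.9% availability over 30 days allows at most 43.2 minutes (2,592 seconds) of downtime.

```python
from typing import Sequence


def p99_nearest_rank(latencies_ms: Sequence[float]) -> float:
    """Return the 99th percentile using the nearest-rank method."""
    if not latencies_ms:
        raise ValueError("at least one latency sample is required")
    ordered = sorted(latencies_ms)
    rank = (99 * len(ordered) + 99) // 100  # integer ceiling of 0.99 * n
    return ordered[rank - 1]


def availability_meets_target(downtime_seconds: int, period_days: int = 30) -> bool:
    """Return True if availability over the period is at least 99.9%."""
    period_seconds = period_days * 24 * 60 * 60
    # downtime / period <= 0.1%, written with integers to avoid rounding issues
    return downtime_seconds * 1000 <= period_seconds
```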

12.6 Downtime / Failover SOP

Required procedures for:

EHR downtime
TA2 downtime
Missing/degraded data quality
Cybersecurity incidents

13. Ethics, Consent, and Privacy

IRB approval at each site
Informed consent includes:
- Description of investigational system and clinician sign-off workflow
- Data use, audit logs, and privacy protections
- Explicit disclosure that Phase 1B shadow mode does not change care
Data handling: HIPAA-aligned; role-based access; audit logs retained per protocol
eConsent/e-sign: implemented with integrity controls appropriate to the environment, consistent with 21 CFR Part 11 expectations where applicable

14. Timeline & Milestones

Phase 1A: Discovery & Foundation (Months 0–12)

Month	Milestone
1	Guidance and access to patient data from institutional EHR and connected wearable/RPM platforms
3	Provide key technical integration metrics and criteria to TA1/TA2
6	Retrospective de-identified longitudinal EHR data extract for HF/Post-MI cohorts
9	IV&V Study 1 support (simulated patient testing)
12	Pre-production EHR environment fully integrated for API writes; deliverables: workflow mapping, impact assessment

Phase 1B: Preparation & Regulatory (Months 12–24)

Month	Milestone
15	IRB approval secured (FWA; AI/SaMD-capable review)
18	Beta patients for UI/UX testing; clinician/patient engagement resources operational; begin IDE activities
21	IV&V Study 2 support (live user testing)
24	Full site readiness for Phase 2; deliverables: EHR dashboard, on-call escalation, automated agent control

Phase 2: Scalability Study Execution (Months 24–39)

Period	Activity
Months 24–39	Pragmatic RCT enrollment and follow-up (patient follow-up through Month 15 post-randomization)
Continuous	Safety monitoring with TA2, capture of operational/technical/economic endpoints
Month 39	Final Clinical Study Report (CSR) completed for FDA submission

15. Budget Justification (Summary)

Cost Categories

Category	Items
EHR Integration	Vendor program fees (Epic/Cerner pathways), interface engine costs
Adjudication & Monitoring	Clinician adjudication effort, DSMB/medical monitor
Device Provisioning	Smartphones/data plans, wearables for underserved participants
Equity Execution	Translation, community outreach, navigation support, screening-log operations
Claims Linkage	Data-use agreements for payer-grade PMPM analyses
Burden Instrumentation	In-app timers, EHR log extraction, time-motion substudy
Security & Monitoring	Audit logging, operational monitoring infrastructure

TA3 Budgeting Categories (Spec-Aligned)

Per-patient costs (recruitment, enrollment, device provisioning)
IT integration costs (interface engine/vendor fees; integration staff time)
Clinical staff research time (adjudication and documentation)
Administrative overhead (IRB fees, grant management)

16. Data Management, Monitoring, and Quality Assurance

Data Sources

EHR (FHIR R4 and/or HL7 v2)
Order-signing logs and In-Basket message logs
Audit logs (TA1/TA2)
Connected wearable/RPM platform data
PROs (KCCQ)
Claims feeds (required for ≥1 site)

Auditability

All TA1/TA2 inputs/outputs and clinician actions captured with timestamps, versions, and unique identifiers. Logs are immutable and retained per protocol.
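
Illustrative sketch of one way to make a log tamper-evident: a SHA-256 hash chain in which each entry commits to the previous one. This technique is an assumption and is not stated in the protocol; true immutability also depends on storage controls (for example, write-once media). All names are hypothetical.

```python
import hashlib
import json
from typing import Dict, List

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, payload: Dict) -> str:
    """Hash the previous entry's hash together with a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log: List[Dict], payload: Dict) -> Dict:
    """Append a payload to the log, linking it to the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    entry = {"payload": payload, "prev_hash": prev_hash,
             "hash": _digest(prev_hash, payload)}
    log.append(entry)
    return entry


def verify_chain(log: List[Dict]) -> bool:
    """Return True if no entry has been altered, removed, or reordered."""
    prev_hash = GENESIS_HASH
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True
```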

Data Integrity Controls

Role-based access
Encryption in transit/at rest
Separation of duties between engineering and adjudication
Periodic log review for anomalies

Monitoring Plan

Risk-based monitoring with centralized data checks (missingness, outliers, protocol deviations)
Site monitoring for consent and endpoint ascertainment

Quality Management

Pre-Phase 2 validation (Fusion Protocol tests, downtime drills)
SOPs for incident response
Documented CAPA

16.1 Data Management and Sharing Plan (DMSP)

What Is Shared

Aggregated endpoint summaries
De-identified audit-log extracts for IV&V
Adjudication labels (de-identified)
Integration reliability metrics by site/vendor
Challenge-set and fusion protocol test reports

Cadence

Monthly operational/technical dashboards during Phase 2
Quarterly curated de-identified datasets for IV&V

De-Identification

HIPAA-aligned (safe harbor or expert determination)
Tokenization/pseudonymization for linkage
CUI handling for sensitive artifacts
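
Illustrative sketch of tokenization for linkage using a keyed hash (HMAC-SHA256), so the same patient identifier always maps to the same token without exposing the identifier. The normalization step and function name are hypothetical, key management is out of scope, and whether a keyed hash is acceptable under the chosen de-identification method is a determination for the responsible privacy officer or expert.

```python
import hashlib
import hmac


def linkage_token(patient_identifier: str, secret_key: bytes) -> str:
    """Derive a stable pseudonymous token from an identifier and a secret key."""
    normalized = patient_identifier.strip().upper()  # assumed normalization rule
    return hmac.new(secret_key, normalized.encode("utf-8"),
                    hashlib.sha256).hexdigest()
```

Because the token depends on the secret key, datasets tokenized with different keys cannot be linked to each other.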

17. TA3 Management, Collaboration, and Site Eligibility

Required Roles (Clinician-in-the-Loop Team)

Role	Responsibility
Supervising Cardiologist (PI)	Overall clinical responsibility
Clinical Adjudicators (NP/PA/MD)	Review pending orders; document accept/reject reasons
IT/Integration Specialist	Dedicated technical contact for EHR integration

Required Governance Capabilities

Site IRB operates under a Federalwide Assurance (FWA) and has AI/SaMD review capacity
Participation in independent DSMB
24/7 escalation coverage via existing on-call systems

Collaboration and IV&V

TA3 sites collaborate with IV&V Partner on evaluation metrics
Participate in IV&V Study 1 and Study 2 per program schedule

IP Boundary (Program Requirement)

Hospital/TA3 site owns the clinical data
Regain/Prime owns the AI models (TA1/TA2)

17.1 Dealbreakers (Ineligibility Factors)

TA3 proposals are rejected if they:

#	Dealbreaker
1	Deny EHR data access or production/pre-production integration environments
2	Cannot recruit a population representative of US demographics (insufficient diversity)
3	Restrict IP by claiming ownership over TA1/TA2 algorithms
4	Lack an FWA for human subjects research
5	Are located outside the United States (foreign situs)
6	Do not detail clinician engagement for UI/UX and beta testing

18. References (Selected)

ICH E6(R2): Guideline for Good Clinical Practice
ICH E9(R1): Addendum on Estimands and Sensitivity Analysis
CONSORT-AI and SPIRIT-AI reporting guidelines for clinical trials involving AI interventions
AHA/ACC/HFSA heart failure guideline (contemporary version) for GDMT definitions and target dosing references
Contemporary ACC/AHA guidance on secondary prevention after myocardial infarction for post-MI therapy elements

Appendix A: TA3 Traceability Matrix

The traceability matrix is a 55-row table that maps each requirement in the TA3 Official Specs (v1.2) to the corresponding section of this protocol, with notes and required evidence artifacts. The first ten rows are excerpted here.

Spec Ref	Requirement	Protocol Section	Evidence Artifacts
1	TA3 is integration partner	12, 17, 4.3	Site LOI/MOU, Integration architecture
1	Scalability Study objective	1, 2, 8, 10	Trial synopsis, KPI list, Dashboard template
2.1	Multi-site RCT design	5.1–5.3	CONSORT-AI checklist, SAP
2.1	Intervention arm workflow	1, 7.2, 12.1	Workflow diagram, Screenshots
2.1	Control arm = Usual Care	7.1	Site SOC description
2.1	IDE study	Header, 5.1, 5.5	IDE sponsor statement, Risk analysis
2.2	NI clinical efficacy	8.1, 10.1–10.3	GCTS scoring manual, Power calc
2.2	Operational efficiency	8.4, 10	Burden instrumentation plan
2.2	Technical robustness	5.1, 8.4, 12	Multi-vendor site list, Uptime dashboard
2.2	Reimbursement evidence	8.4, 10, 16.1	Claims linkage DUA, PMPM analysis plan

Full 55-row matrix available in protocol source document.

Appendix B: Integration Qualification Checklist

Each participating TA3 site must complete this checklist in pre-production and re-validate in production prior to first enrollment.

Capability	Acceptance Criteria	Evidence Artifact
SMART on FHIR + OIDC auth	Context launch works; least-privilege scopes; role-based access	Screenshot, Token scope listing
FHIR R4 read feeds	Successful retrieval of required resources for test patients	FHIR query logs, Completeness report
HL7 v2 fallback	ADT/ORU/ORM messages received and parsed	Interface logs, Message samples
ADT feed latency	Events available within ≤60 seconds	Timestamped receipt logs
In-Basket draft messages	Draft created in correct pool with correct patient context	EHR screenshots, Audit-log entry
Pending medication/lab orders	Pending orders created and routed correctly	Order lifecycle logs
Encounter note drafts	Draft note written and routed for review	Note lifecycle logs
Scheduling/follow-up tasks	Task created per site workflow	Task logs
Audit logging completeness	100% of actions have required fields	Schema + samples, Completeness report
Fail-safe behavior	System blocks, enters fail-safe, logs, and notifies	Downtime drill report
Performance under load	TA2 latency meets targets; write success ≥99%	Load test report
Security controls	Encryption; secrets management; access reviews	Security checklist

Multi-vendor requirement: Checklist completed for each EHR vendor in the TA3 network.

Appendix C: Workflow Diagrams

C.1 Care Loop and Safety Gating

EHR + Wearables/RPM + Patient Inputs
            |
            v
     TA1 Clinical Agent
 (draft plan + pending orders)
            |
            v
     TA2 Supervisory Agent
 (approve | hard-stop | escalate)
            |
     +------+------+------------------+
     |             |                  |
     v             v                  v
PEND in EHR   BLOCK + Escalate   DATA-UNCERTAIN
(GREEN/YELLOW)   (urgent queue)     (needs remediation)
     |
     v
Clinician action (sign / modify / reject)
     |
     v
EHR executes + patient/team notified + immutable audit log

C.2 Offline Improvement Loop

Without Contaminating Phase 2:

Clinician decisions + structured rationales
                |
                v
      Label set (accept/reject/modify + reason codes)
                |
                v
 Offline analysis/training for next release (Phase 1B / post-trial)
                |
                v
Versioned release candidate
 (frozen during Phase 2 unless safety-driven IDE amendment)