AI Agents v3 · Consolidation

Bring the app's AI work onto agents — carefully

Templates, plans, text agents, analyzers, reports, and the dashboard briefing are slated for deprecation, with agents expected to take over that work. This document maps every AI operation, proposes one shared scope-tagged tool registry and an org root agent, then runs an adversarial review that narrows the plan hard. The defensible thesis: unify at the tool/service layer, not the runtime layer.

Includes · 2 red-team critiques (technical + product/safety), code-verified
Branch · feat/agent-memory-learning-evolution-1559
Status · design (for review, not built)

00 · TL;DR & the corrected thesis

The brief asked to port the deprecated AI areas onto agents and share their tools. The research says "share the tools" is right; the adversarial review says "run them as agents" is right for only two of them.

The one-line conclusion

Unify at the tool/service layer (DRY), not at the runtime layer. Every AI operation should reuse one shared, scope-tagged tool catalog. But only the genuinely interactive, multi-step, judgment work should run inside an agent loop. The cheap, deterministic, event/cron-triggered extractors stay deterministic — they call the same shared tool services as plain functions, never an agent turn.

01 · SHARE TOOLS ✓

One scope-tagged catalog (shared/agent/assistant), one allowlist builder, relocated out of mods/assistant. This is the real, safe win.

02 · AGENTIFY 2 OF 6

Only plan executor & detailed briefing are already agentic. The rest (analyzers, task/fact extraction, enrichment, summary briefing, daily report) are single-shot — agentifying them is a regression.

03 · ROOT AGENT = BLOCKER

As specified it cannot act: domain tools need a concrete (user_id, org_id) member session; an org-only agent gets zero tools. Cloning to N members → N divergent brains. Needs a real identity redesign.

04 · SAFETY SURFACE

Routing contact-authored content (transcripts, inbound SMS) into a tool-armed agent turn scales the prompt-injection blast radius from "text-reply only" to "everything." Today those pipelines are toolless.

05 · COST 1 → 50×

A single extract_structured call vs a max_turns agent loop (cap 50). Per call, per night, per org, per member. Background chores shouldn't pay agent prices.

06 · PLANS NEED A DRAIN

The plan executor carries an approval queue, durable ask_user resume, and a plan_log audit timeline. Deprecating it without parity + an in-flight drain strands multi-day plans.

Part I

The substrate — one shared tool registry

01 · Two catalogs today

The plumbing already half-shares tools. Knowing exactly how is the foundation for everything else.

AiToolName (mods/ai/types, ~38 live "domain" tools) — built by collect_rig_tools() in mods/assistant/services/tool_registry_service.rs, gated by ABAC permission → page surfacing → tier flags. Reached by the assistant (page-surfaced) and agents (collect_agent_domain_rig_tools, all families). The only assistant-only tool is set_member_personalization.
AiAgentToolName (mods/ai_agent/tools/mod.rs, 14 "agent-control" tools) — built by build_agent_tools(allowlist, caller_thread_id), granular per-tool via the agent's stored tools_allowlist. Memory, agent-to-agent, escalation, skills, self-management — all curried with the thread.

Two more wrinkles: agents get domain tools all-or-nothing via one boolean attach_domain_tools (not the granular allowlist), and the plan executor has a third, entirely separate tool set in mods/plan/tools/build_tools.rs that touches neither catalog.

02 · Three tool scopes

Make scope a declared property, so "shared vs agent-only" is reviewable at a glance — not implied by a code path.

Scope	Meaning	Granted to	Why
shared	Domain capability over org data — contacts, messaging, calls, tasks, sales, analytics, infra.	Any AI area + any deterministic pipeline.	Scoped by capability: needs a `Session` + ABAC, no thread identity.
agent	An autonomous agent acting as itself over time — memory, schedules, agent-to-agent, escalation, skills.	Autonomous agents only.	Scoped by binding: curried with `caller_thread_id`, resolves a persistent identity.
assistant	Helps the live human session — `set_member_personalization`, arguably `delegate_bulk_operation`.	The human assistant only.	Scoped by actor: operates on a human's session/UI.

Capability vs binding — the load-bearing distinction

A shared tool only needs a Session, so a deterministic pipeline can call its service directly. An agent tool is curried with the thread and resolves "me, the agent" — it can never be a plain function call. This is exactly why the deterministic extractors can reuse shared tools without becoming agents (Part III).
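The distinction can be sketched in a few lines of Rust. This is a minimal illustration under stated assumptions, not the codebase's API: `Session`, `create_task`, and `build_remember_tool` here are hypothetical stand-ins, and only `caller_thread_id` is a name taken from the text. The shared capability is a plain function over a session; the agent tool is a closure that captures the thread it was built for.

```rust
// Hypothetical stand-ins for illustration; only `caller_thread_id` is a name from the text.

/// A member session: the only thing a shared tool needs.
#[derive(Clone, Debug, PartialEq)]
pub struct Session {
    pub user_id: u64,
    pub org_id: u64,
}

/// Shared capability: a plain function over a `Session`.
/// A pipeline, the assistant, or an agent can all call it directly.
pub fn create_task(session: &Session, title: &str) -> String {
    format!("org {} / user {}: task '{}'", session.org_id, session.user_id, title)
}

/// Agent-scoped tool: the thread id is captured when the tool is built,
/// so every call acts as "me, the agent on this thread".
pub fn build_remember_tool(caller_thread_id: u64) -> impl Fn(&str) -> String {
    move |note: &str| format!("thread {}: remembered '{}'", caller_thread_id, note)
}
```

A deterministic pipeline that holds a `Session` can call `create_task` with no agent involved, whereas the tool returned by `build_remember_tool` has no meaning outside the thread it was built for.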

03 · One shared registry

Fold the two catalogs into one scope-tagged catalog with a single build_tools(allowlist, ctx, scope_filter) in a neutral home. Each consumer supplies an allowlist + a scope filter.

AiToolName (unified catalog, AiAgentToolName folded in)
  • api_name()
  • display_label()
  • from_api_name()
  • scope() -> ToolScope { Shared | Agent | Assistant }   ← NEW
  • build(ctx) -> Box<dyn ToolDyn>
        │
build_tools(allowlist, ctx, scope_filter)
  drop if scope not allowed → ABAC fails → tier off → build(ctx)
    ▲               ▲                 ▲                 ▲
ASSISTANT       AGENT             TEXT-REPLY        DETERMINISTIC PIPELINE
page-derived    tools_allowlist   channel default   calls the tool SERVICE
{Shd,Asst}      {Shd,Agent}       {Shd,Agent}       directly (no agent loop)
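The filter chain in the unified builder can be sketched as follows. This is a simplified model, assuming a four-tool catalog and returning API names instead of `Box<dyn ToolDyn>`. The `ToolContext` fields (`abac_permitted`, `tier_enabled`) are hypothetical stand-ins for the real ABAC and tier checks, and the scope assigned to each tool follows the scope table, not verified code.

```rust
use std::collections::HashSet;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum ToolScope {
    Shared,
    Agent,
    Assistant,
}

// A tiny illustrative subset of the catalog.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum AiToolName {
    CreateTask,
    SendSms,
    CreateSchedule,
    SetMemberPersonalization,
}

impl AiToolName {
    pub fn api_name(self) -> &'static str {
        match self {
            AiToolName::CreateTask => "create_task",
            AiToolName::SendSms => "send_sms",
            AiToolName::CreateSchedule => "create_schedule",
            AiToolName::SetMemberPersonalization => "set_member_personalization",
        }
    }

    /// Scope is a declared property of the tool, not implied by a code path.
    pub fn scope(self) -> ToolScope {
        match self {
            AiToolName::CreateTask | AiToolName::SendSms => ToolScope::Shared,
            AiToolName::CreateSchedule => ToolScope::Agent,
            AiToolName::SetMemberPersonalization => ToolScope::Assistant,
        }
    }
}

/// Hypothetical context: sets standing in for the ABAC and tier-flag checks.
pub struct ToolContext {
    pub abac_permitted: HashSet<AiToolName>,
    pub tier_enabled: HashSet<AiToolName>,
}

/// Drop a tool if its scope is not allowed, ABAC fails, or the tier is off.
/// Returns API names in allowlist order (the real builder would build tools).
pub fn build_tools(
    allowlist: &[AiToolName],
    ctx: &ToolContext,
    scope_filter: &[ToolScope],
) -> Vec<&'static str> {
    allowlist
        .iter()
        .copied()
        .filter(|t| scope_filter.contains(&t.scope()))
        .filter(|t| ctx.abac_permitted.contains(t))
        .filter(|t| ctx.tier_enabled.contains(t))
        .map(|t| t.api_name())
        .collect()
}
```

With this shape, the assistant would pass `[Shared, Assistant]` as its scope filter and an agent `[Shared, Agent]`, so the same allowlist yields different tool sets per consumer without separate catalogs.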

"Neutral home" is real work, not a tag

The registry lives in mods/assistant today and mods/ai_agent already reaches sideways into it — a mods → mods edge. Adding plan/report/dashboard as consumers makes assistant a de-facto base, violating the bases → mods rule. The catalog must move to mods/ai (both consumers already depend on it) or a shared base. Budget for the relocation.

Part II

The operations — inventory, the split, the root agent

04 · Full AI-operation inventory

Every LLM operation in the app, its trigger, its current runtime, and whether it's tool-using or single-shot. The last column is the porting reality, justified in §5 and §9.

Operation	Trigger	Runtime today	Shape	Tools it needs
Assistant chat	human	rig agent loop	agentic	shared domain + `set_member_personalization`
Autonomous agents (text/webchat)	inbound msg / enqueue	rig agent loop	agentic	shared domain + agent-control
Plan executor / templates	approval + hourly job	aisdk agentic loop, 15 own tools	agentic	send_email/sms, update_contact, read history/notes, query_knowledge, ask_user, notify_user, complete/fail/schedule_next
Detailed briefing	human (after summary)	rig agent `.max_turns(10)` + structured extract	agentic	5 read tools (contact memory/messages/calls/details/tasks)
Text agents (SMS reply)	inbound SMS	aisdk loop (≤5), structured 3-suggestion	agentic	read_conversation, query_knowledge, send_message (already ported via #1454)
Summary briefing	human / scheduled	single `.schema::<BriefingOutput>`	deterministic	read-bundle (reused as services)
Daily report	cron, end-of-day	single free-form `prompt()`	deterministic	read yesterday calls/msgs/tasks/contacts
Call analyzers	post-call	single free-form `prompt()`, toolless	deterministic	none (transcript in, text out)
Task extraction from call	post-call	single `.schema::<ExtractedTasks>`, idempotent	deterministic	create_task / close_task (services)
Contact-fact extraction	post-call / msg / enrich	single structured, bi-temporal reconcile	deterministic	contact-memory reconcile primitives
Contact enrichment	human / bulk	single `.schema::<ConversationContactInfo>`	deterministic	update_contact (service)
Dashboard insights / KPIs	page load	pure SQL, no LLM	deterministic	none — never an agent
Voice call (realtime)	live call	streaming, no tool calls	—	none today

05 · The split that decides everything

"Bring the operations to agents" conflates two very different workloads. Separating them is the whole game.

Agentic — belongs in an agent loop

Plan executor, detailed briefing, text replies, the assistant. Multi-step, judgment, reacts to intermediate tool results, conversational. The agent runtime is the right model — these are already loops.

Deterministic extraction — must NOT

Analyzers, task/fact extraction, enrichment, summary briefing, daily report, insights. Single-shot, schema-locked or fixed-format, event/cron-triggered, toolless, cheap, idempotent. The output contract is the feature.

"Tools must be shared" ≠ "operations must be agents"

A deterministic pipeline and an agent can call the same create_task / extract_facts / analyze_transcript service. Share the capability; keep the runtime that fits the workload. This satisfies the brief's "tools must be shared" without dragging every chore through an agent turn.
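A minimal sketch of "share the capability, keep the runtime", assuming hypothetical names throughout (`extract_tasks_service`, `on_call_ended`, `extract_tasks_tool`). The real service makes a single structured LLM call; here a trivial line filter stands in for it so the example stays self-contained.

```rust
#[derive(Debug, PartialEq)]
pub struct Task {
    pub title: String,
}

/// The single source of truth: a plain service function.
/// Stand-in logic: treat every line starting with "TODO:" as a task.
pub fn extract_tasks_service(transcript: &str) -> Vec<Task> {
    transcript
        .lines()
        .filter_map(|line| line.trim().strip_prefix("TODO:"))
        .map(|title| Task { title: title.trim().to_string() })
        .collect()
}

/// Deterministic pipeline: a post-call hook that calls the service exactly once.
/// No agent turn, no tools in context.
pub fn on_call_ended(transcript: &str) -> Vec<Task> {
    extract_tasks_service(transcript)
}

/// Thin agent-tool wrapper: same service, string in and string out,
/// used only when an interactive agent decides it needs the capability.
pub fn extract_tasks_tool(transcript_arg: &str) -> String {
    extract_tasks_service(transcript_arg)
        .iter()
        .map(|task| task.title.as_str())
        .collect::<Vec<_>>()
        .join("; ")
}
```

Both entry points delegate to one function, so a fix to the extraction logic lands in both paths while the pipeline keeps its single-shot contract.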

06 · The root agent: idea vs reality

The cloning machinery already exists; the org-level twist does not work as a tweak.

What's already there

5 seeded platform agents (Master Orchestrator, Agent Creator singleton, Loquent Assistant, Follow-up Drafter, Text Reply). A platform agent has organization_id NULL AND user_id NULL — there is no is_system_agent column.
clone_system_agent_for_user(db, source_id, user_id, organization_id) clones per member; DEFAULT_USER_AGENT_SOURCES (4) is provisioned on signup. An auto_clone_for_new_members flag exists but is currently inert plumbing.
The ai_agent row already carries cron_expressions, event_triggers, budget, enable_learning (now defaulted true), send_mode (defaulted autonomous), plus the #1531 schedule poller.

Why "org agent cloned to members" breaks (verified)

① build_domain_tools_for_agent (run_ai_agent_thread_service.rs:1814) requires (Some(user_id), Some(org_id)). An org-only agent (user_id NULL) returns zero domain tools, and build_session_for_member demands a real member row (no synthesized service session exists).
② resolve_owning_user_org terminates only on a thread whose agent has both ids; an org ancestor is indistinguishable from "no owner," and all 5 callers consume a concrete user_id.
③ ai_agent_memory.agent_id is unique and learning is keyed per agent_id, so cloning to N members yields N divergent memories + N× billed learning digests.

The viable design is a single org-owned agent with a service identity + permission model and org-level memory/learning. That is a real spike, not a resolver patch.
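The first failure can be modelled in a few lines. This is a simplified sketch of the behaviour described above, not the body of build_domain_tools_for_agent: `AgentRow` and `domain_tools_sketch` are hypothetical names, and the two returned tool names are placeholders for the real domain set.

```rust
/// Hypothetical projection of an ai_agent row: both ids are nullable.
pub struct AgentRow {
    pub user_id: Option<u64>,
    pub organization_id: Option<u64>,
}

/// Domain tools are only built when both ids are present, because the
/// member session they need is keyed by (user_id, org_id).
pub fn domain_tools_sketch(agent: &AgentRow) -> Vec<&'static str> {
    match (agent.user_id, agent.organization_id) {
        // A concrete member: a session can be built, so tools are attached.
        (Some(_user_id), Some(_org_id)) => vec!["send_sms", "create_task"],
        // Org-only or platform agent: no member session, so no tools at all.
        _ => Vec::new(),
    }
}
```

Under this model an org-owned agent with `user_id: None` falls into the second arm, which is why a resolver change alone cannot make it act.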

07 · Shared tools the agentic work needs

New shared capability tools (so plans/briefing/assistant all reuse them) + the agent schedule tools from the prior round.

Tool	Scope	Backing	For
send_email	shared	plan executor's send_email path	plans, agents
query_knowledge	shared	`knowledge::handle_query_knowledge_tool_call`	plans, text agents, agents
read_conversation_history · read_contact_notes	shared	existing services	plans, briefing, agents
analyze_call_transcript · extract_tasks_from_call · extract_contact_facts · enrich_contact	shared	the deterministic services, exposed as callable tools	agent may call; pipeline calls service
create_schedule · cancel_schedule · list_my_schedules	agent (new)	`create_ai_agent_schedule` etc.	agents: "follow up tomorrow, then close it" (needs a one-shot `Once` recurrence + contact context in the wake)

Schedule self-creation carries a trust hazard

A scheduled wake is framed to the model as "from your owner — not the contact." If contact-authored content can cause the agent to create a schedule, the contact launders intent into a future owner-trusted turn. Guardrail: create_schedule off-by-default, capped per agent, and never invokable in the same turn that processed contact content (see §8.2).
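The three guardrail conditions can be expressed as one pre-flight check. This is a sketch under assumptions: `SchedulePolicy`, `TurnContext`, and `may_create_schedule` are hypothetical names, and the cap of 5 is an illustrative value, not a number from the codebase.

```rust
#[derive(Debug, PartialEq)]
pub enum ScheduleDenied {
    Disabled,
    ContactContentInTurn,
    CapReached,
}

pub struct SchedulePolicy {
    pub enabled: bool,
    pub max_per_agent: usize,
}

impl Default for SchedulePolicy {
    /// Off by default; the cap value is an assumption for illustration.
    fn default() -> Self {
        Self { enabled: false, max_per_agent: 5 }
    }
}

pub struct TurnContext {
    /// True if this turn processed any contact-authored content.
    pub processed_contact_content: bool,
    pub existing_schedules: usize,
}

/// Check all three guardrails before the create_schedule tool may run.
pub fn may_create_schedule(
    policy: &SchedulePolicy,
    turn: &TurnContext,
) -> Result<(), ScheduleDenied> {
    if !policy.enabled {
        return Err(ScheduleDenied::Disabled);
    }
    if turn.processed_contact_content {
        return Err(ScheduleDenied::ContactContentInTurn);
    }
    if turn.existing_schedules >= policy.max_per_agent {
        return Err(ScheduleDenied::CapReached);
    }
    Ok(())
}
```

Checking the contact-content flag before the cap means a tainted turn is refused regardless of how many schedules the agent already holds.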

Part III

Adversarial review & the revised plan

08 · Red-team findings

Two independent critics (technical + product/safety) attacked the plan against the actual code. The strongest objections, most-damaging first, each with my disposition.

blocker · 8.1 · The org root agent cannot execute a single tool

build_domain_tools_for_agent returns Vec::new() for user_id = NULL; build_session_for_member needs a real member row. The plan targeted resolve_owning_user_org, which the domain-tool path never calls. "Which member's permissions does an org agent run as?" has no answer in the architecture.

Accepted. Demote the root agent to a redesign spike: define a service identity + org permission model + org-level memory/learning before any build. Do not ship clone-to-members.

critical · 8.2 · Prompt-injection gains a blast radius it doesn't have today

Analyzers and reports feed contact-authored content (transcripts, inbound SMS) into a toolless single call — worst case it corrupts its own output. Route that content through an agent turn and it shares context with send_sms, update_contact_memory, create_agent, modify_agent_tools… The existing envelope sanitization is mitigation, not elimination. Agentifying widens the population of contact messages that reach a tool-armed context from "text-reply only" to "everything."

Accepted. Keep contact-content extraction toolless and deterministic. Never let contact content seed an owner-framed schedule. This alone kills the "route extraction through agents" idea.

critical · 8.3 · send_mode defaults to autonomous

The column default is "autonomous" (migration m20260611_130001:24); the much-cited fail-safe-to-Suggest only fires on an unrecognized string, not the default. A normally-created agent processing inbound content can send_sms with no human approval. Widening the agentified send paths multiplies unreviewed outbound to real customers, seeded by content the customer wrote.

Accepted. Any newly-agentified send path defaults to Suggest; autonomous send is an explicit per-channel opt-in with a plain-language consequence.
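The gap between the fail-safe and the default, plus the proposed opt-in, can be sketched as below. `parse_send_mode` and `effective_send_mode` are hypothetical names; the only facts taken from the finding are that the column default is "autonomous" and that the fallback to Suggest fires only on an unrecognized string.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum SendMode {
    Suggest,
    Autonomous,
}

/// The column default named in the finding.
pub const COLUMN_DEFAULT: &str = "autonomous";

/// Model of today's behaviour: the fail-safe covers unknown strings only,
/// so the default value still parses to Autonomous.
pub fn parse_send_mode(raw: &str) -> SendMode {
    match raw {
        "autonomous" => SendMode::Autonomous,
        _ => SendMode::Suggest,
    }
}

/// Proposed rule for newly agentified send paths: autonomous sending
/// requires an explicit per-channel opt-in on top of the stored mode.
pub fn effective_send_mode(stored: SendMode, channel_opted_in: bool) -> SendMode {
    if stored == SendMode::Autonomous && channel_opted_in {
        SendMode::Autonomous
    } else {
        SendMode::Suggest
    }
}
```

The second function is where the disposition lands: a row that merely inherited the column default resolves to Suggest until someone opts the channel in.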

blocker · 8.4 · Deprecating plans drops approval / resume / audit and freezes in-flight plans

The executor has a two-level human approval gate, durable ask_user pause/resume (persisted to plan.state + plan_log), and a plan_log replay/audit timeline. The agent system has no per-action staged-yes/no primitive (send_mode is turn-level). The kill-switch is a per-org tier gate with no drain path, so flipping it strands multi-day plans in AwaitingInput/StandBy.

Accepted. Plan-executor port is a separate, gated track: reproduce approval + durable resume + audit first, build a drain-to-terminal migration, remap billing — or rebuild "campaigns" natively before deprecating.
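One way to make the drain requirement enforceable is a guard the kill-switch must pass before it flips. This is a sketch only: `AwaitingInput` and `StandBy` are the states named in the finding, while `Completed`, `Failed`, `is_terminal`, and `can_deprecate` are assumed names for illustration.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum PlanState {
    AwaitingInput,
    StandBy,
    // Terminal state names are assumptions; the source only names the two above.
    Completed,
    Failed,
}

impl PlanState {
    pub fn is_terminal(self) -> bool {
        matches!(self, PlanState::Completed | PlanState::Failed)
    }
}

/// Refuse to deprecate while any plan is still in flight.
/// On refusal, returns the number of plans that still need draining.
pub fn can_deprecate(plans: &[PlanState]) -> Result<(), usize> {
    let in_flight = plans.iter().filter(|state| !state.is_terminal()).count();
    if in_flight == 0 {
        Ok(())
    } else {
        Err(in_flight)
    }
}
```

Returning the in-flight count gives the migration a concrete number to drive to zero before the tier gate is touched.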

high · 8.5 · Determinism becomes variance; owners read "different" as "broken"

Summary briefing uses .schema::<BriefingOutput> — a guaranteed shape on a cheap model. An agent can no-op (the rig-0.38 empty-turn failure is a documented gotcha here), return a different action set on identical data, or vary tone/length. For a non-technical owner, a 5am report that silently varies or skips reads as an unreliable product, not "stochastic LLM."

Accepted. Summary briefing, daily report, and insights stay deterministic. Determinism is the feature for these.

high · 8.6 · Cost: one cheap call → an up-to-50-turn loop, fanned out

Detailed briefing already shows the multiplier in-tree: .max_turns(10) + a structured pass = 1→up to 11×, with history replayed each turn. The agent cap is AGENT_MAX_TOOL_TURNS = 50 on Sonnet-tier models. Multiply by every call ended, every nightly run, every org × member. All usage is metered.

Accepted. Reserve the loop for interactive work. Background extraction stays a single cheap structured call.
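The multipliers are simple arithmetic, sketched here with hypothetical helper names. Two assumptions are baked in: every turn of the loop is used (worst case), and each turn's payload is the same size, so replaying history makes turn k re-send k turn-sized payloads. The fan-out figures in the usage note are illustrative, not measured.

```rust
/// Worst-case LLM calls for one loop: every turn used, plus an optional
/// structured extraction pass at the end.
pub fn worst_case_calls(max_turns: u64, structured_pass: bool) -> u64 {
    max_turns + u64::from(structured_pass)
}

/// With history replayed each turn, turn k carries k turn-sized payloads,
/// so a full loop of n turns carries 1 + 2 + ... + n = n(n+1)/2 of them.
pub fn replayed_turn_payloads(turns: u64) -> u64 {
    turns * (turns + 1) / 2
}

/// Total calls when the same loop runs per org and per member.
pub fn fan_out(orgs: u64, members_per_org: u64, calls_per_member: u64) -> u64 {
    orgs * members_per_org * calls_per_member
}
```

`worst_case_calls(10, true)` gives the 11 calls quoted for detailed briefing, and `worst_case_calls(50, false)` gives the 50-turn cap. As an illustration, 100 orgs with 5 members each at the 50-call worst case is `fan_out(100, 5, 50)` = 25,000 calls per nightly run, against 500 for a single-call extractor.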

high · 8.7 · Root agent cloned per member manufactures one-brand divergence

Per-member clones, combined with enable_learning defaulted true and backfilled to every agent (commit 4f9308d8), guarantee drift: each member's clone accrues its own memory + learning. Two members of one business reply differently to the same contact under one brand. That's brand incoherence, not personalization; ownership of "the company's AI" is ambiguous.

Accepted. Org-wide automations (anything that speaks to shared contacts under the brand) run on a single org-owned agent with org-level learning. Per-member agents stay personal scratchpads. Settle this before learning-on-by-default speaks to shared customers.

medium · 8.8 · "Configure your agent" makes guarantees harder, not easier

"Always extract tasks after every call" is trivial in a deterministic pipeline (it just runs). Expressed as agent instructions, it becomes prompt engineering against a model that may decline — and "why didn't it this time?" has no good answer. The repo's own UX rules target non-technical owners and "features that just work."

Accepted. Keep focused deterministic config for guaranteed-behavior areas; reuse the shared tool services underneath rather than collapsing config into a persona.

medium · 8.9 · Step-5 incoherence + orphaned billing/tier surfaces

"Keep the pipeline deterministic AND expose the same logic as an agent tool" means maintaining two invocation paths with different reliability contracts (the tool path loses strict:true at the argument boundary). Separately, each area owns an AiUsageFeature variant + tier gates (AutonomousPlans, CustomAnalyzers); agentifying silently re-meters customers and drops paywalls.

Partially accepted. Resolve the incoherence: the shared service is the single source; the agent-tool wrapper is thin and used only for the genuinely interactive case. Billing remap is an explicit, mandatory step — not an afterthought.
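An explicit remap can be enforced by the compiler if it is written as an exhaustive match. The sketch below is hypothetical throughout: `LegacyFeature`, `BillingHome`, the meter strings, and the pairing of each feature with a gate are invented for illustration. Only the gate names `AutonomousPlans` and `CustomAnalyzers` come from the finding.

```rust
/// Hypothetical stand-in for the per-area AiUsageFeature variants.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum LegacyFeature {
    PlanExecution,
    CallAnalysis,
    DetailedBriefing,
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum TierGate {
    AutonomousPlans,
    CustomAnalyzers,
    Ungated,
}

/// Where a feature is metered and which paywall still applies after the move.
#[derive(Debug, PartialEq)]
pub struct BillingHome {
    pub meter: &'static str,
    pub gate: TierGate,
}

/// No wildcard arm on purpose: adding a feature without deciding its
/// billing home becomes a compile error instead of a silent re-meter.
pub fn remap(feature: LegacyFeature) -> BillingHome {
    match feature {
        LegacyFeature::PlanExecution => BillingHome {
            meter: "plan_execution",
            gate: TierGate::AutonomousPlans,
        },
        LegacyFeature::CallAnalysis => BillingHome {
            meter: "call_analysis",
            gate: TierGate::CustomAnalyzers,
        },
        LegacyFeature::DetailedBriefing => BillingHome {
            meter: "detailed_briefing",
            gate: TierGate::Ungated,
        },
    }
}
```

The design choice is the missing `_` arm: the mapping table and the enum cannot drift apart without the build failing.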

09 · Revised per-area verdict

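After the red-team, here's the disposition for each operation. Two areas (text replies, detailed briefing) agentify outright; the plan executor agentifies only behind guardrails on a separate track; the rest reuse shared tools but keep their runtime.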

Area	Verdict	Why
Text replies	agentify	Already an agent loop (#1454). Unify onto the shared registry; default `Suggest`.
Detailed briefing	agentify	Already a rig loop. Fold into the agent runtime; reuse the 5 read tools as shared.
Plan executor / templates	agentify w/ guardrails	Agentic, but only after approval + durable resume + audit parity, an in-flight drain, and billing remap. Separate track.
Summary briefing	keep deterministic	Schema-locked guarantee, cheap, daily artifact. Reuse read-bundle services.
Daily report	keep deterministic	Dependable nightly output; variance reads as broken. A schedule may trigger the deterministic generator — not an agent loop.
Call analyzers	keep deterministic	Toolless on contact content (safety). Expose `analyze_transcript` as a shared tool an agent may call; background stays toolless.
Task extraction	keep deterministic	Idempotency + cost + safety. Shared `extract_tasks` service reused by both paths.
Contact-fact extraction	keep deterministic	Bi-temporal strict schema. Interactive path already exists via `update_contact_memory`.
Contact enrichment	keep deterministic	Structured profile fill. Shareable as a tool; runtime stays single-shot.
Insights / KPIs	keep — no LLM	Pure SQL. Agentifying arithmetic is non-negotiably wrong.
Org root agent	redesign first	Can't act / can't resolve / diverges when cloned. Needs a service-identity spike before any build.

10 · Revised build order

Front-load the safe, high-leverage shared-layer work; gate the risky runtime moves behind redesign.

P0 · Unify the tool registry at the service layer · L

Scope-tagged catalog, ToolContext, build_tools(allowlist, ctx, scope_filter), relocate out of mods/assistant into mods/ai. Behaviour-preserving; tests pin the same sets. The DRY win the brief actually wants.
P1 · Add shared capability tools · M

send_email, query_knowledge, read_conversation_history, read_contact_notes, plus thin tool wrappers over the extraction services. Tag shared. Pipelines keep calling the services directly.
P2 · Granular domain allowlist · M

Expand attach_domain_tools into a recommended shared subset; let an agent name individual domain tools. Update the capabilities UI.
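A possible shape for the migration, assuming the stored config becomes either the legacy boolean or an explicit list. `DomainTools`, `resolve_domain_allowlist`, and the contents of `RECOMMENDED_SHARED` are hypothetical; the tool names in the subset are taken from elsewhere in this document but their selection is illustrative.

```rust
/// Illustrative "recommended shared subset"; the actual membership is undecided.
pub const RECOMMENDED_SHARED: &[&str] =
    &["read_conversation_history", "query_knowledge", "create_task"];

/// Hypothetical config: the old boolean, or individually named domain tools.
pub enum DomainTools {
    Legacy(bool),
    Named(Vec<String>),
}

/// Resolve the config to a concrete per-tool allowlist.
pub fn resolve_domain_allowlist(cfg: &DomainTools) -> Vec<String> {
    match cfg {
        // Old attach_domain_tools = true expands to the recommended subset.
        DomainTools::Legacy(true) => {
            RECOMMENDED_SHARED.iter().map(|name| name.to_string()).collect()
        }
        // Old attach_domain_tools = false stays empty.
        DomainTools::Legacy(false) => Vec::new(),
        // New agents name exactly the tools they want.
        DomainTools::Named(names) => names.clone(),
    }
}
```

Keeping the legacy variant lets existing rows resolve without a data migration while new agents move to named tools.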
P3 · Schedule tools + guardrails · M

create_schedule, cancel_schedule, list_my_schedules + a one-shot Once recurrence + contact context in the wake. Off-by-default, capped, never same-turn-as-contact-content. Ships "follow up tomorrow, then close it."
P4 · Fold the already-agentic areas in · M

Detailed briefing → agent runtime; text replies fully onto the shared registry (default Suggest). No new runtime risk — they're already loops.
P5 · Org root agent: design spike · XL · gate

Service identity + org permission model + org-level shared memory/learning. Prototype before committing. Blocks any "clone to members" build.
P6 · Plan executor → campaigns · XL · gate

Reproduce approval queue + durable resume + audit timeline, build the in-flight drain, remap AiUsageFeature + tier gates. Only then deprecate mods/plan.
(no phase) · Keep deterministic (refactor to reuse, not rebuild) · ongoing

Summary briefing, daily report, analyzers, task/fact/enrichment extraction, insights: point them at the shared tool services for DRY. Runtime unchanged.

11 · Open decisions

The choices that actually gate the work. My lean is marked.

Do we accept "unify tools, not runtimes" as the governing principle?

Yes — share the catalog; agentify only the already-agentic areas; keep deterministic pipelines deterministic (they reuse shared services).
No — push everything through agents (accepts the cost / determinism / safety regressions in §8).

Org root agent identity model?

Single org-owned agent with a service identity + explicit org permission scope + org-level memory/learning (a new primitive).
Designate a "service member" whose Session the org agent borrows (reuses today's path; ties org behaviour to one human's permissions).
Drop the org agent; keep per-member only (no shared org brain).

Plan executor — port or rebuild?

Rebuild "campaigns" natively on agents (schedules + event_triggers) with first-class approval/audit, then migrate & deprecate mods/plan.
Port the executor's loop wholesale, reproducing its 15 tools + approval/resume/audit inside the agent runtime.
Keep mods/plan running in parallel until campaigns reach parity.

Billing for the deprecated areas?

Explicitly remap each AiUsageFeature variant + tier gate to its new home before flipping any area; no silent re-metering.
Collapse everything to AiUsageFeature::Agent (simpler, but changes customer line items + drops paywalls).

Learning-on-by-default for contact-facing agents under the org brand?

Gate it until org-vs-personal ownership is settled — a shared customer relationship shouldn't be shaped by one member's clone's private learning.
Leave on (current state); accept per-member behavioural drift across one brand.