Bring the app's AI work onto agents — carefully
Templates, plans, text agents, analyzers, reports, dashboard briefing are slated for deprecation: agents should do that work. This maps every AI operation, proposes one shared scope-tagged tool registry and an org root agent — then runs an adversarial review that narrows the plan hard. The defensible thesis: unify at the tool/service layer, not the runtime layer.
00 TL;DR & the corrected thesis
The brief asked to port the deprecated AI areas onto agents and share their tools. The research says "share the tools" is right; the adversarial review says "run them as agents" is right for only two of them.
The one-line conclusion
Unify at the tool/service layer (DRY), not at the runtime layer. Every AI operation should reuse one shared, scope-tagged tool catalog. But only the genuinely interactive, multi-step, judgment work should run inside an agent loop. The cheap, deterministic, event/cron-triggered extractors stay deterministic — they call the same shared tool services as plain functions, never an agent turn.
One scope-tagged catalog (shared/agent/assistant), one allowlist builder, relocated out of mods/assistant. This is the real, safe win.
Only plan executor & detailed briefing are already agentic. The rest (analyzers, task/fact extraction, enrichment, summary briefing, daily report) are single-shot — agentifying them is a regression.
As specified, the org root agent cannot act: domain tools need a concrete (user_id, org_id) member session, so an org-only agent gets zero tools. Cloning it to N members yields N divergent brains. It needs a real identity redesign.
Routing contact-authored content (transcripts, inbound SMS) into a tool-armed agent turn scales the prompt-injection blast radius from "text-reply only" to "everything." Today those pipelines are toolless.
Cost: a single extract_structured call versus a max_turns agent loop (cap 50), paid per call, per night, per org, per member. Background chores shouldn't pay agent prices.
The plan executor carries an approval queue, durable ask_user resume, and a plan_log audit timeline. Deprecating it without parity + an in-flight drain strands multi-day plans.
The substrate — one shared tool registry
01 Two catalogs today
The plumbing already half-shares tools. Knowing exactly how is the foundation for everything else.
- AiToolName (mods/ai/types, ~38 live "domain" tools): built by collect_rig_tools() in mods/assistant/services/tool_registry_service.rs, gated by ABAC permission → page surfacing → tier flags. Reached by the assistant (page-surfaced) and agents (collect_agent_domain_rig_tools, all families). The only assistant-only tool is set_member_personalization.
- AiAgentToolName (mods/ai_agent/tools/mod.rs, 14 "agent-control" tools): built by build_agent_tools(allowlist, caller_thread_id), granular per-tool via the agent's stored tools_allowlist. Memory, agent-to-agent, escalation, skills, self-management: all curried with the thread.
Two more wrinkles: agents get domain tools all-or-nothing via one boolean attach_domain_tools (not the granular allowlist), and the plan executor has a third, entirely separate tool set in mods/plan/tools/build_tools.rs that touches neither catalog.
02 Three tool scopes
Make scope a declared property, so "shared vs agent-only" is reviewable at a glance — not implied by a code path.
| Scope | Meaning | Granted to | Why |
|---|---|---|---|
| shared | Domain capability over org data — contacts, messaging, calls, tasks, sales, analytics, infra. | Any AI area + any deterministic pipeline. | Scoped by capability: needs a Session + ABAC, no thread identity. |
| agent | An autonomous agent acting as itself over time — memory, schedules, agent-to-agent, escalation, skills. | Autonomous agents only. | Scoped by binding: curried with caller_thread_id, resolves a persistent identity. |
| assistant | Helps the live human session — set_member_personalization, arguably delegate_bulk_operation. | The human assistant only. | Scoped by actor: operates on a human's session/UI. |
Capability vs binding — the load-bearing distinction
A shared tool only needs a Session, so a deterministic pipeline can call its service directly. An agent tool is curried with the thread and resolves "me, the agent" — it can never be a plain function call. This is exactly why the deterministic extractors can reuse shared tools without becoming agents (Part III).
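To make the capability-versus-binding distinction concrete, here is a minimal sketch. The `Session` struct, the string return values, and `make_remember_tool` are hypothetical stand-ins chosen for illustration; they are not the app's real types or signatures.

```rust
// Hypothetical stand-ins for illustration only; not the app's real types.
#[derive(Clone, Debug, PartialEq)]
pub struct Session {
    pub user_id: u64,
    pub org_id: u64,
}

/// A "shared" capability: a plain function that needs only a Session.
/// A deterministic pipeline can call it directly, with no agent involved.
pub fn create_task(session: &Session, title: &str) -> String {
    format!(
        "org {} / user {}: task '{}'",
        session.org_id, session.user_id, title
    )
}

/// An "agent" tool: the constructor captures (curries) the caller's thread id,
/// so the returned closure always acts as that one agent thread.
pub fn make_remember_tool(caller_thread_id: u64) -> impl Fn(&str) -> String {
    move |note: &str| format!("thread {} remembers: {}", caller_thread_id, note)
}
```

The shared function can be invoked anywhere a `Session` is available. The agent tool only exists once a thread id has been bound into it, which is why it cannot be offered to a pipeline as a plain function call.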
03 One shared registry
Fold the two catalogs into one scope-tagged catalog with a single build_tools(allowlist, ctx, scope_filter) in a neutral home. Each consumer supplies an allowlist + a scope filter.
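A minimal sketch of what a scope-tagged catalog with a single builder could look like. The `ToolScope`, `ToolSpec`, and `ToolContext` shapes, the catalog contents, and the scope assigned to `create_agent` are assumptions for illustration; the real builder would construct tool objects from the context rather than return names.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ToolScope {
    Shared,
    Agent,
    Assistant,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ToolSpec {
    pub name: &'static str,
    pub scope: ToolScope,
}

/// Hypothetical context; a real builder would use it to construct each tool.
pub struct ToolContext {
    pub user_id: u64,
    pub org_id: u64,
}

/// One catalog; every entry declares its scope (entries are illustrative).
pub const CATALOG: &[ToolSpec] = &[
    ToolSpec { name: "create_task", scope: ToolScope::Shared },
    ToolSpec { name: "update_contact", scope: ToolScope::Shared },
    ToolSpec { name: "create_agent", scope: ToolScope::Agent },
    ToolSpec { name: "set_member_personalization", scope: ToolScope::Assistant },
];

/// Returns the tool names a consumer may use: a tool must be both
/// allowlisted and tagged with one of the permitted scopes.
pub fn build_tools(
    allowlist: &[&str],
    _ctx: &ToolContext, // unused in this sketch
    scope_filter: &[ToolScope],
) -> Vec<&'static str> {
    CATALOG
        .iter()
        .filter(|tool| scope_filter.contains(&tool.scope))
        .filter(|tool| allowlist.contains(&tool.name))
        .map(|tool| tool.name)
        .collect()
}
```

With this shape, an agent that allowlists an assistant-scoped tool still does not receive it, because the scope filter is applied independently of the allowlist.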
"Neutral home" is real work, not a tag
The registry lives in mods/assistant today and mods/ai_agent already reaches sideways into it — a mods → mods edge. Adding plan/report/dashboard as consumers makes assistant a de-facto base, violating the bases → mods rule. The catalog must move to mods/ai (both consumers already depend on it) or a shared base. Budget for the relocation.
The operations — inventory, the split, the root agent
04 Full AI-operation inventory
Every AI operation in the app, its trigger, its current runtime, and whether it's tool-using or single-shot. The Shape column is the porting reality, justified in §5 and §9.
| Operation | Trigger | Runtime today | Shape | Tools it needs |
|---|---|---|---|---|
| Assistant chat | human | rig agent loop | agentic | shared domain + set_member_personalization |
| Autonomous agents (text/webchat) | inbound msg / enqueue | rig agent loop | agentic | shared domain + agent-control |
| Plan executor / templates | approval + hourly job | aisdk agentic loop, 15 own tools | agentic | send_email/sms, update_contact, read history/notes, query_knowledge, ask_user, notify_user, complete/fail/schedule_next |
| Detailed briefing | human (after summary) | rig agent .max_turns(10) + structured extract | agentic | 5 read tools (contact memory/messages/calls/details/tasks) |
| Text agents (SMS reply) | inbound SMS | aisdk loop (≤5), structured 3-suggestion | agentic | read_conversation, query_knowledge, send_message (already ported via #1454) |
| Summary briefing | human / scheduled | single .schema::<BriefingOutput> | deterministic | read-bundle (reused as services) |
| Daily report | cron, end-of-day | single free-form prompt() | deterministic | read yesterday calls/msgs/tasks/contacts |
| Call analyzers | post-call | single free-form prompt(), toolless | deterministic | none (transcript in, text out) |
| Task extraction from call | post-call | single .schema::<ExtractedTasks>, idempotent | deterministic | create_task / close_task (services) |
| Contact-fact extraction | post-call / msg / enrich | single structured, bi-temporal reconcile | deterministic | contact-memory reconcile primitives |
| Contact enrichment | human / bulk | single .schema::<ConversationContactInfo> | deterministic | update_contact (service) |
| Dashboard insights / KPIs | page load | pure SQL, no LLM | deterministic | none — never an agent |
| Voice call (realtime) | live call | streaming, no tool calls | — | none today |
05 The split that decides everything
"Bring the operations to agents" conflates two very different workloads. Separating them is the whole game.
Agentic — belongs in an agent loop
Plan executor, detailed briefing, text replies, the assistant. Multi-step, judgment, reacts to intermediate tool results, conversational. The agent runtime is the right model — these are already loops.
Deterministic extraction — must NOT
Analyzers, task/fact extraction, enrichment, summary briefing, daily report, insights. Single-shot, schema-locked or fixed-format, event/cron-triggered, toolless, cheap, idempotent. The output contract is the feature.
"Tools must be shared" ≠ "operations must be agents"
A deterministic pipeline and an agent can call the same create_task / extract_facts / analyze_transcript service. Share the capability; keep the runtime that fits the workload. This satisfies the brief's "tools must be shared" without dragging every chore through an agent turn.
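The idea of one service behind two runtimes can be sketched as follows. The extraction rule (lines starting with "TODO:"), the `ToolCall` struct, and `dispatch_tool` are hypothetical placeholders; the real extract_tasks service is an LLM call with a schema, not a string filter.

```rust
/// Shared service: one implementation of the capability.
/// (Stand-in rule for illustration: lines starting with "TODO:" become tasks.)
pub fn extract_tasks(transcript: &str) -> Vec<String> {
    transcript
        .lines()
        .filter_map(|line| line.trim().strip_prefix("TODO:"))
        .map(|task| task.trim().to_string())
        .filter(|task| !task.is_empty())
        .collect()
}

/// Deterministic path: a post-call job calls the service directly, every time.
pub fn post_call_pipeline(transcript: &str) -> Vec<String> {
    extract_tasks(transcript)
}

/// Agent path: a tool invocation an agent loop may choose to make.
pub struct ToolCall<'a> {
    pub name: &'a str,
    pub argument: &'a str,
}

/// Thin tool wrapper: routes a named call to the same shared service.
pub fn dispatch_tool(call: &ToolCall) -> Result<Vec<String>, String> {
    match call.name {
        "extract_tasks" => Ok(extract_tasks(call.argument)),
        other => Err(format!("unknown tool: {}", other)),
    }
}
```

Both paths produce the same result for the same input because they share the implementation. Only the trigger differs: the pipeline always runs, while the agent decides whether to call the tool.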
06 The root agent: idea vs reality
The cloning machinery already exists; the org-level twist does not work as a tweak.
What's already there
- 5 seeded platform agents (Master Orchestrator, Agent Creator singleton, Loquent Assistant, Follow-up Drafter, Text Reply). A platform agent has organization_id NULL AND user_id NULL; there is no is_system_agent column.
- clone_system_agent_for_user(db, source_id, user_id, organization_id) clones per member; DEFAULT_USER_AGENT_SOURCES (4) is provisioned on signup. An auto_clone_for_new_members flag exists but is currently inert plumbing.
- The ai_agent row already carries cron_expressions, event_triggers, budget, enable_learning (now defaulted true), send_mode (defaulted autonomous), plus the #1531 schedule poller.
Why "org agent cloned to members" breaks (verified)
① build_domain_tools_for_agent (run_ai_agent_thread_service.rs:1814) requires (Some(user_id), Some(org_id)): an org-only agent (user_id NULL) returns zero domain tools, and build_session_for_member demands a real member row (no synthesized service session exists).
② resolve_owning_user_org terminates only on a thread whose agent has both ids; an org ancestor is indistinguishable from "no owner," and all 5 callers consume a concrete user_id.
③ ai_agent_memory.agent_id is unique and learning is keyed per agent_id, so cloning to N members yields N divergent memories + N× billed learning digests.
The viable design is a single org-owned agent with a service identity + permission model and org-level memory/learning. That is a real spike, not a resolver patch.
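Point ① can be illustrated with a simplified sketch of the gating behaviour described above. This is not the body of build_domain_tools_for_agent; `AgentRow`, `domain_tools_for`, and the returned tool names are hypothetical and only mirror the reported (Some(user_id), Some(org_id)) requirement.

```rust
/// Hypothetical slice of an agent row: both ids are nullable.
#[derive(Debug, PartialEq)]
pub struct AgentRow {
    pub user_id: Option<u64>,
    pub organization_id: Option<u64>,
}

/// Domain tools are only built when BOTH ids are present, because the
/// session they run under belongs to a concrete member of an org.
pub fn domain_tools_for(agent: &AgentRow) -> Vec<&'static str> {
    match (agent.user_id, agent.organization_id) {
        // A real implementation would build a member session from these ids.
        (Some(_user_id), Some(_org_id)) => vec!["create_task", "update_contact"],
        // Org-only or platform agents fall through and get nothing.
        _ => Vec::new(),
    }
}
```

Under this rule an agent with an organization id but no user id ends up with an empty tool list, which is the "cannot act" failure: nothing errors, the agent simply has no domain capabilities.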
Adversarial review & the revised plan
08 Red-team findings
Two independent critics (technical + product/safety) attacked the plan against the actual code. The strongest objections, most-damaging first, each with my disposition.
build_domain_tools_for_agent returns Vec::new() for user_id = NULL; build_session_for_member needs a real member row. The plan targeted resolve_owning_user_org, which the domain-tool path never calls. "Which member's permissions does an org agent run as?" has no answer in the architecture.
Analyzers and reports feed contact-authored content (transcripts, inbound SMS) into a toolless single call — worst case it corrupts its own output. Route that content through an agent turn and it shares context with send_sms, update_contact_memory, create_agent, modify_agent_tools… The existing envelope sanitization is mitigation, not elimination. Agentifying widens the population of contact messages that reach a tool-armed context from "text-reply only" to "everything."
send_mode defaults to autonomous
The column default is "autonomous" (migration m20260611_130001:24); the much-cited fail-safe-to-Suggest only fires on an unrecognized string, not the default. A normally-created agent processing inbound content can send_sms with no human approval. Widening the agentified send paths multiplies unreviewed outbound to real customers, seeded by content the customer wrote.
Disposition: default to Suggest; autonomous send is an explicit per-channel opt-in with a plain-language consequence.
The executor has a two-level human approval gate, durable ask_user pause/resume (persisted to plan.state + plan_log), and a plan_log replay/audit timeline. The agent system has no per-action staged-yes/no primitive (send_mode is turn-level). The kill-switch is a per-org tier gate with no drain path, so flipping it strands multi-day plans in AwaitingInput/StandBy.
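On the send_mode default described above: the gap between "fail-safe on unknown values" and "safe by default" is easy to miss, so here is a small sketch of it. The `SendMode` enum, the lowercase "suggest" string, and `parse_send_mode` are assumptions for illustration; only the "autonomous" column default comes from the finding.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SendMode {
    Autonomous,
    Suggest,
}

/// The column default reported in the finding.
pub const COLUMN_DEFAULT: &str = "autonomous";

/// Parses a stored send_mode. The fallback arm is the fail-safe:
/// it only triggers for strings the parser does not recognize.
pub fn parse_send_mode(stored: &str) -> SendMode {
    match stored {
        "autonomous" => SendMode::Autonomous,
        "suggest" => SendMode::Suggest, // assumed spelling of the stored value
        _ => SendMode::Suggest,         // unrecognized value: fail safe
    }
}
```

A row created with the column default parses cleanly to `Autonomous`, so the fail-safe never gets a chance to apply. Changing the default, rather than the parser, is what makes new agents start in Suggest.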
Summary briefing uses .schema::<BriefingOutput> — a guaranteed shape on a cheap model. An agent can no-op (the rig-0.38 empty-turn failure is a documented gotcha here), return a different action set on identical data, or vary tone/length. For a non-technical owner, a 5am report that silently varies or skips reads as an unreliable product, not "stochastic LLM."
Detailed briefing already shows the multiplier in-tree: .max_turns(10) + a structured pass = 1→up to 11×, with history replayed each turn. The agent cap is AGENT_MAX_TOOL_TURNS = 50 on Sonnet-tier models. Multiply by every call ended, every nightly run, every org × member. All usage is metered.
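The multiplier above is simple arithmetic, sketched here so the worst case can be computed for any configuration. The function names and the fleet figures in the usage note (100 orgs, 5 members each) are hypothetical; only `.max_turns(10)`, the extra structured pass, and the cap of 50 come from the text.

```rust
/// Agent loop cap cited in the finding.
pub const AGENT_MAX_TOOL_TURNS: u64 = 50;

/// Worst-case model calls for one run: each loop turn is one call, plus any
/// extra single-shot passes (such as a structured extraction at the end).
pub const fn worst_case_calls(max_turns: u64, extra_passes: u64) -> u64 {
    max_turns + extra_passes
}

/// Worst-case calls for one scheduled sweep across every org and member.
pub const fn fleet_worst_case(calls_per_run: u64, orgs: u64, members_per_org: u64) -> u64 {
    calls_per_run * orgs * members_per_org
}
```

For the detailed briefing, `worst_case_calls(10, 1)` gives 11 calls where a single-shot pipeline makes 1. At the agent cap, a hypothetical nightly sweep over 100 orgs with 5 members each is `fleet_worst_case(50, 100, 5)`, or 25,000 calls in the worst case, before counting the history replayed on each turn.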
Per-member clones plus enable_learning defaulted true and backfilled to every agent (commit 4f9308d8) guarantee drift: each member's clone accrues its own memory + learning. Two members of one business reply differently to the same contact under one brand. That's brand incoherence, not personalization; ownership of "the company's AI" is ambiguous.
"Always extract tasks after every call" is trivial in a deterministic pipeline (it just runs). Expressed as agent instructions, it becomes prompt engineering against a model that may decline — and "why didn't it this time?" has no good answer. The repo's own UX rules target non-technical owners and "features that just work."
"Keep the pipeline deterministic AND expose the same logic as an agent tool" means maintaining two invocation paths with different reliability contracts (the tool path loses strict:true at the argument boundary). Separately, each area owns an AiUsageFeature variant + tier gates (AutonomousPlans, CustomAnalyzers); agentifying silently re-meters customers and drops paywalls.
09 Revised per-area verdict
After the red-team, here's the disposition for each operation. Two areas agentify; the rest reuse shared tools but keep their runtime.
| Area | Verdict | Why |
|---|---|---|
| Text replies | agentify | Already an agent loop (#1454). Unify onto the shared registry; default Suggest. |
| Detailed briefing | agentify | Already a rig loop. Fold into the agent runtime; reuse the 5 read tools as shared. |
| Plan executor / templates | agentify w/ guardrails | Agentic, but only after approval + durable resume + audit parity, an in-flight drain, and billing remap. Separate track. |
| Summary briefing | keep deterministic | Schema-locked guarantee, cheap, daily artifact. Reuse read-bundle services. |
| Daily report | keep deterministic | Dependable nightly output; variance reads as broken. A schedule may trigger the deterministic generator — not an agent loop. |
| Call analyzers | keep deterministic | Toolless on contact content (safety). Expose analyze_transcript as a shared tool an agent may call; background stays toolless. |
| Task extraction | keep deterministic | Idempotency + cost + safety. Shared extract_tasks service reused by both paths. |
| Contact-fact extraction | keep deterministic | Bi-temporal strict schema. Interactive path already exists via update_contact_memory. |
| Contact enrichment | keep deterministic | Structured profile fill. Shareable as a tool; runtime stays single-shot. |
| Insights / KPIs | keep — no LLM | Pure SQL. Agentifying arithmetic is non-negotiably wrong. |
| Org root agent | redesign first | Can't act / can't resolve / diverges when cloned. Needs a service-identity spike before any build. |
10 Revised build order
Front-load the safe, high-leverage shared-layer work; gate the risky runtime moves behind redesign.
- P0 · Unify the tool registry at the service layer · L
  Scope-tagged catalog, ToolContext, build_tools(allowlist, ctx, scope_filter), relocate out of mods/assistant into mods/ai. Behaviour-preserving; tests pin the same sets. The DRY win the brief actually wants.
- P1 · Add shared capability tools · M
  send_email, query_knowledge, read_conversation_history, read_contact_notes, plus thin tool wrappers over the extraction services. Tag shared. Pipelines keep calling the services directly.
- P2 · Granular domain allowlist · M
  Expand attach_domain_tools into a recommended shared subset; let an agent name individual domain tools. Update the capabilities UI.
- P3 · Schedule tools + guardrails · M
  create/cancel/list_schedule + a one-shot Once recurrence + contact context in the wake. Off-by-default, capped, never same-turn-as-contact-content. Ships "follow up tomorrow, then close it."
- P4 · Fold the already-agentic areas in · M
  Detailed briefing → agent runtime; text replies fully onto the shared registry (default Suggest). No new runtime risk: they're already loops.
- P5 · Org root agent, design spike · XL · gate
  Service identity + org permission model + org-level shared memory/learning. Prototype before committing. Blocks any "clone to members" build.
- P6 · Plan executor → campaigns · XL · gate
  Reproduce approval queue + durable resume + audit timeline, build the in-flight drain, remap AiUsageFeature + tier gates. Only then deprecate mods/plan.
- Keep deterministic (refactor to reuse, not rebuild) · ongoing
  Summary briefing, daily report, analyzers, task/fact/enrichment extraction, insights: point them at the shared tool services for DRY. Runtime unchanged.
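The three P3 guardrails (off by default, capped, never in the same turn as contact content) can be expressed as one check that runs before any schedule is created. This is a sketch under assumptions: `ScheduleGuard`, `ScheduleDenied`, `may_create_schedule`, and the default cap of 5 are hypothetical and not taken from the codebase.

```rust
/// Hypothetical per-agent schedule settings.
#[derive(Debug, Clone, Copy)]
pub struct ScheduleGuard {
    pub enabled: bool,
    pub max_active: usize,
}

impl Default for ScheduleGuard {
    /// Off by default; the cap value is an illustrative placeholder.
    fn default() -> Self {
        ScheduleGuard { enabled: false, max_active: 5 }
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum ScheduleDenied {
    Disabled,
    ContactContentInTurn,
    CapReached,
}

/// Checks all three guardrails before a schedule may be created.
pub fn may_create_schedule(
    guard: &ScheduleGuard,
    active_schedules: usize,
    turn_has_contact_content: bool,
) -> Result<(), ScheduleDenied> {
    if !guard.enabled {
        return Err(ScheduleDenied::Disabled);
    }
    // Contact-authored text in this turn must not be able to seed a schedule.
    if turn_has_contact_content {
        return Err(ScheduleDenied::ContactContentInTurn);
    }
    if active_schedules >= guard.max_active {
        return Err(ScheduleDenied::CapReached);
    }
    Ok(())
}
```

Returning a typed denial rather than a boolean keeps the reason available for an audit log or for a plain-language message to the owner.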
11 Open decisions
The choices that actually gate the work. My lean is marked.
Do we accept "unify tools, not runtimes" as the governing principle?
- Yes — share the catalog; agentify only the already-agentic areas; keep deterministic pipelines deterministic (they reuse shared services).
- No — push everything through agents (accepts the cost / determinism / safety regressions in §8).
Org root agent identity model?
- Single org-owned agent with a service identity + explicit org permission scope + org-level memory/learning (a new primitive).
- Designate a "service member" whose Session the org agent borrows (reuses today's path; ties org behaviour to one human's permissions).
- Drop the org agent; keep per-member only (no shared org brain).
Plan executor — port or rebuild?
- Rebuild "campaigns" natively on agents (schedules + event_triggers) with first-class approval/audit, then migrate & deprecate mods/plan.
- Port the executor's loop wholesale, reproducing its 15 tools + approval/resume/audit inside the agent runtime.
- Keep mods/plan running in parallel until campaigns reach parity.
Billing for the deprecated areas?
- Explicitly remap each AiUsageFeature variant + tier gate to its new home before flipping any area; no silent re-metering.
- Collapse everything to AiUsageFeature::Agent (simpler, but changes customer line items + drops paywalls).
Learning-on-by-default for contact-facing agents under the org brand?
- Gate it until org-vs-personal ownership is settled — a shared customer relationship shouldn't be shaped by one member's clone's private learning.
- Leave on (current state); accept per-member behavioural drift across one brand.