Running & Managing
This chapter shows how to keep agents humming in production: monitor health, automate schedules, process bulk tasks, manage spend, and get alerted when something drifts.
8.1 Live Task Dashboard
View – What it shows – Why it matters
Task List – Every job with a status badge (Queued ▢, Running ▶, Succeeded ✔, Failed ✖) – Spot stuck or failing runs instantly
Timeline – Gantt‑style trace of each tool call with latency bars – Identify slow MCP hops (> 200 ms)
Agent Chat – Real‑time stream of model thoughts (when debug mode is on) – Understand the agent's reasoning; copy snippets into support tickets
Details Pane – Inputs, outputs, cost, sandbox ID, policy decisions – Forensic debugging and auditing
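To make the Timeline's > 200 ms rule concrete, here is a minimal Python sketch that filters a trace for slow hops. The ToolCall record, its fields, and the slow_hops helper are hypothetical stand‑ins for whatever trace data you export; they are not the product's API.

```python
from dataclasses import dataclass

SLOW_HOP_MS = 200  # threshold quoted for the Timeline view


@dataclass
class ToolCall:
    """Hypothetical record for one Timeline bar."""
    tool: str
    start_ms: float
    end_ms: float

    @property
    def latency_ms(self) -> float:
        return self.end_ms - self.start_ms


def slow_hops(calls: list[ToolCall], threshold_ms: float = SLOW_HOP_MS) -> list[ToolCall]:
    """Return calls slower than threshold_ms, slowest first."""
    slow = [c for c in calls if c.latency_ms > threshold_ms]
    return sorted(slow, key=lambda c: c.latency_ms, reverse=True)


trace = [
    ToolCall("crm.search", 0, 120),
    ToolCall("db.query", 120, 470),
    ToolCall("kb.lookup", 470, 690),
]
for call in slow_hops(trace):
    print(f"{call.tool}: {call.latency_ms:.0f} ms")
# db.query: 350 ms
# kb.lookup: 220 ms
```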
Access via Home → Tasks or Builder → Tasks → View Runs.
Anomaly Highlight 🔍 – A row turns amber when its runtime exceeds the P95 (95th‑percentile) runtime, and red when a policy blocked the run.
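The highlight rule can be sketched as follows, assuming a nearest‑rank P95 over past runtimes and assuming a policy block takes precedence over a slow runtime. Both choices, and the p95 and row_colour helper names, are illustrative assumptions rather than the dashboard's documented logic.

```python
def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile (one common definition among several)."""
    if not values:
        raise ValueError("need at least one runtime")
    ordered = sorted(values)
    rank = -(-95 * len(ordered) // 100)  # ceil(0.95 * n) using integer math
    return ordered[rank - 1]


def row_colour(runtime_s: float, policy_blocked: bool, history_p95: float) -> str:
    """Colour a Task List row; a policy block is assumed to win over slowness."""
    if policy_blocked:
        return "red"
    if runtime_s > history_p95:
        return "amber"
    return "none"


history = [float(s) for s in range(1, 101)]  # 100 past runtimes: 1 s .. 100 s
threshold = p95(history)
print(threshold)                            # 95.0
print(row_colour(120.0, False, threshold))  # amber
print(row_colour(10.0, True, threshold))    # red
```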
8.2 Recurring Schedules
Why schedule? Eliminate manual triggers, ensure reports arrive before business hours, and smooth out workload spikes.
Template – Cron – Example
Hourly – 0 * * * * – Sync CRM leads
Daily – 30 6 * * * – 6:30 AM KPI digest
Weekly – 0 1 * * 1 – Monday 1 AM log rotation
Monthly – 0 2 1 * * – First‑of‑month invoice summary
Custom – Any 5‑field cron expression – Edge cases (e.g., the 15th and the last day of the month)
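To sanity‑check a template before saving it, a tiny matcher helps. This sketch is a hypothetical helper, not the scheduler itself: it supports only *, single numbers, and comma lists (no ranges, steps, or names), treats day‑of‑week 0 as Sunday, and follows classic cron in ORing day‑of‑month and day‑of‑week when both are restricted.

```python
from datetime import datetime, timedelta


def _field_matches(field: str, value: int) -> bool:
    """Match one cron field: '*', a single number, or a comma list like '1,15'."""
    if field == "*":
        return True
    return value in {int(part) for part in field.split(",")}


def cron_matches(expr: str, dt: datetime) -> bool:
    minute, hour, dom, month, dow = expr.split()
    cron_dow = (dt.weekday() + 1) % 7  # cron uses 0 = Sunday; Python uses 0 = Monday
    dom_ok = _field_matches(dom, dt.day)
    dow_ok = _field_matches(dow, cron_dow)
    # Classic cron: if both day fields are restricted, either one may match.
    day_ok = (dom_ok or dow_ok) if (dom != "*" and dow != "*") else (dom_ok and dow_ok)
    return (
        _field_matches(minute, dt.minute)
        and _field_matches(hour, dt.hour)
        and _field_matches(month, dt.month)
        and day_ok
    )


def next_run(expr: str, after: datetime) -> datetime:
    """Step minute by minute (up to a year) to find the next matching time."""
    t = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    for _ in range(366 * 24 * 60):
        if cron_matches(expr, t):
            return t
        t += timedelta(minutes=1)
    raise ValueError(f"no match within a year: {expr!r}")


now = datetime(2024, 1, 1, 7, 5)  # a Monday
for expr in ("0 * * * *", "30 6 * * *", "0 1 * * 1", "0 2 1 * *"):
    print(expr, "->", next_run(expr, now))
```

Stepping one minute at a time keeps the logic obvious but is only meant for quick checks like this one.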
How‑To: Builder → Tasks tab → flip Run on Schedule → pick template or enter Cron → save. Scheduled tasks appear with a ⏰ icon and green Next Run column.
Pro Tip ✨ Start with a small test window (e.g., next minute) before committing to production cadence.
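One way to build that test window is to pin every field to the next minute. This is a hypothetical helper; it assumes now is in the same timezone the scheduler uses, and because the day and month are pinned the expression recurs once a year, so replace it with the production cadence after the test.

```python
from datetime import datetime, timedelta


def one_shot_cron(now: datetime, lead_minutes: int = 1) -> str:
    """Cron expression pinned to now + lead_minutes (it recurs yearly, so remove it after testing)."""
    t = now + timedelta(minutes=lead_minutes)
    return f"{t.minute} {t.hour} {t.day} {t.month} *"


print(one_shot_cron(datetime(2024, 3, 5, 9, 59)))  # 0 10 5 3 *
```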
8.3 Bulk Task Queues
Need to process 10k PDFs or 50k leads? Use Bulk Upload.
Prepare a CSV with a header row matching the input names (file_url, customer_id, …).
Go to Builder → Tasks → Bulk Run and upload the CSV.
Choose the concurrency (default 50) and a dead‑letter policy (retry ×3 or move to the Failed queue).
Click Launch; a progress bar shows completed vs remaining tasks.
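The queue semantics in the steps above (bounded concurrency, a few retries, then a dead‑letter list) can be sketched locally with asyncio. This is an illustration, not the Bulk Run engine: the column names reuse the examples above, fake_agent stands in for a real agent run, and retry ×3 is interpreted as one attempt plus three retries.

```python
import asyncio
import csv
import io

REQUIRED_COLUMNS = {"file_url", "customer_id"}  # example input names from the steps above


def load_rows(csv_text: str) -> list[dict]:
    """Parse the upload and check that the header row names every required input."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"CSV header is missing: {sorted(missing)}")
    return list(reader)


async def run_bulk(rows, worker, concurrency=50, max_retries=3):
    """Run worker over rows with at most `concurrency` in flight; dead-letter rows that keep failing."""
    sem = asyncio.Semaphore(concurrency)
    succeeded, dead_letter = [], []

    async def handle(row):
        async with sem:
            for _ in range(1 + max_retries):  # first attempt plus retries
                try:
                    succeeded.append(await worker(row))
                    return
                except Exception:
                    await asyncio.sleep(0)  # a real queue would back off here
            dead_letter.append(row)

    await asyncio.gather(*(handle(row) for row in rows))
    return succeeded, dead_letter


async def fake_agent(row):
    """Stand-in for a real agent run; always fails for customer c2."""
    await asyncio.sleep(0)
    if row["customer_id"] == "c2":
        raise RuntimeError("tool error")
    return row["customer_id"]


upload = "file_url,customer_id\nhttps://example.com/a.pdf,c1\nhttps://example.com/b.pdf,c2\n"
ok, failed = asyncio.run(run_bulk(load_rows(upload), fake_agent, concurrency=2))
print(ok, [row["customer_id"] for row in failed])  # ['c1'] ['c2']
```

Holding the semaphore across retries keeps the in‑flight count at or below the chosen concurrency, at the cost of a failing row occupying a slot while it retries.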
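Scale Math: Each micro‑VM sandbox can handle ~2–3 tool calls/sec. A 10k‑task job averaging 3 calls per task means 30,000 calls; at 50 concurrency that is 100–150 calls/sec, or roughly 3.3–5 min of tool‑call time before queueing and retries are added.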
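The arithmetic behind a scale estimate is easy to reproduce. This sketch assumes every sandbox stays busy at a constant call rate and ignores queueing, retries, and model latency, so it yields an idealized lower bound rather than a wall‑clock prediction; bulk_eta_minutes is a hypothetical helper.

```python
def bulk_eta_minutes(tasks: int, calls_per_task: float, concurrency: int,
                     calls_per_sec_per_sandbox: float) -> float:
    """Idealized runtime: total tool calls divided by aggregate call throughput."""
    total_calls = tasks * calls_per_task
    calls_per_sec = concurrency * calls_per_sec_per_sandbox  # all sandboxes together
    return total_calls / calls_per_sec / 60


for rate in (2, 3):
    print(rate, round(bulk_eta_minutes(10_000, 3, 50, rate), 1))
# 2 5.0
# 3 3.3
```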
8.4 Cost & Usage Dashboards
Navigate: Home → Cost.
Metric – Breakdown – Use
Tokens Burned – per agent / per model – Tune prompts, choose a cheaper model
MCP Latency P95 – by tool – Identify slow back‑ends (DB, SaaS)
Sandbox Seconds – compute per integration – Spot heavy workloads (e.g., large GPT‑4 calls)
Spend vs Budget – daily burn vs set budget – Auto‑throttle when 80% is reached
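If you export raw usage to your own tooling, the per‑agent, per‑model, and per‑integration breakdowns in the table are plain group‑by sums. The record fields below (agent, model, tokens, integration, sandbox_s) and the sample values are made up for illustration; they do not describe the product's export format.

```python
from collections import defaultdict

# Made-up usage records; the field names are illustrative, not an export schema.
runs = [
    {"agent": "finance/invoices", "model": "gpt-4", "tokens": 12_000,
     "integration": "erp", "sandbox_s": 40.0},
    {"agent": "finance/invoices", "model": "gpt-4", "tokens": 8_000,
     "integration": "erp", "sandbox_s": 25.0},
    {"agent": "support/triage", "model": "small-model", "tokens": 3_000,
     "integration": "helpdesk", "sandbox_s": 5.0},
]


def totals(records, *keys, metric):
    """Sum `metric` grouped by the given key fields."""
    out = defaultdict(int)
    for record in records:
        out[tuple(record[k] for k in keys)] += record[metric]
    return dict(out)


print(totals(runs, "agent", "model", metric="tokens"))
print(totals(runs, "integration", metric="sandbox_s"))
```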
Budget Guardrails 💰 – Set a Hard Cap (terminate runs) or a Soft Cap (pause schedules, alert the owner). Budget events are logged in the Policy Journal.
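A minimal sketch of how the 80% throttle and the two cap types could combine, assuming the cap itself triggers at 100% of budget and that every budget event is also written to the Policy Journal. The action names and the budget_actions helper are hypothetical.

```python
from enum import Enum

THROTTLE_AT = 0.80  # "Auto-throttle when 80% is reached"
CAP_AT = 1.00       # assumption: the cap itself fires at 100% of budget


class CapType(Enum):
    HARD = "hard"  # terminate runs
    SOFT = "soft"  # pause schedules, alert owner


def budget_actions(spend: float, budget: float, cap: CapType) -> list[str]:
    """Return the (hypothetical) actions for today's spend against the budget."""
    ratio = spend / budget
    actions = []
    if ratio >= THROTTLE_AT:
        actions += ["throttle", "alert:#ops-alerts"]
    if ratio >= CAP_AT:
        if cap is CapType.HARD:
            actions.append("terminate_runs")
        else:
            actions += ["pause_schedules", "alert_owner"]
    if actions:
        actions.append("log:policy_journal")  # budget events are journaled
    return actions


print(budget_actions(85, 100, CapType.SOFT))
# ['throttle', 'alert:#ops-alerts', 'log:policy_journal']
```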
8.5 Alerts & Notifications
Event – Channel – Setup
Task Failure (> N retries) – Slack DM, Email – Toggle in Org Settings → Alerts
Budget Hit 80% – Slack #ops‑alerts – Enabled by default
Policy Block – Webhook POST – Custom endpoint
Latency Spike (> 500 ms) – PagerDuty – Add via Integrations
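The alert routing above reads naturally as a function. In this sketch the Event shape, the channel strings, and the default N = 3 retries are assumptions for illustration; the real thresholds are whatever you configure in Org Settings → Alerts and Integrations.

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str          # "task_failure", "budget", "policy_block" or "latency"
    value: float = 0   # retries, fraction of budget, or latency in ms


def route(event: Event, max_retries: int = 3) -> list[str]:
    """Map an event to alert channels following the Event/Channel pairs above."""
    if event.kind == "task_failure" and event.value > max_retries:
        return ["slack_dm", "email"]
    if event.kind == "budget" and event.value >= 0.80:
        return ["slack:#ops-alerts"]
    if event.kind == "policy_block":
        return ["webhook_post"]
    if event.kind == "latency" and event.value > 500:
        return ["pagerduty"]
    return []


print(route(Event("latency", 730)))  # ['pagerduty']
```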
Alerts include deep links to Task ID and Timeline for quick triage.
8.6 Operational Best Practices
Tag Agents by business unit (finance/, support/) – dashboards auto‑group by tag.
Stagger Schedules (± 5 min) to avoid a thundering herd on the database.
Enable Must‑Cite on KB tools to catch hallucinations early.
Review the Policy Journal weekly; repeated blocks may indicate missing tool scopes.
Export Metrics via OpenTelemetry to Grafana or Datadog.
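Staggering works best when each agent gets a stable offset, so the same agent always runs at the same shifted time. This sketch derives a ±5 min offset from a hash of the agent's tag and name; staggered_time is a hypothetical helper, and it only shifts hour and minute, so a weekly or monthly schedule near midnight would also need its day field adjusted.

```python
import hashlib


def staggered_time(agent: str, hour: int, minute: int, spread: int = 5) -> tuple[int, int]:
    """Shift hour:minute by a stable offset in [-spread, +spread] minutes derived from the agent name."""
    digest = hashlib.sha256(agent.encode("utf-8")).digest()
    offset = digest[0] % (2 * spread + 1) - spread
    total = (hour * 60 + minute + offset) % (24 * 60)  # wraps at midnight
    return divmod(total, 60)


for agent in ("finance/invoices", "finance/ledger", "support/triage"):
    h, m = staggered_time(agent, 6, 30)
    print(f"{agent}: {m} {h} * * *")
```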