Running & Managing

This chapter shows how to keep agents humming in production: monitor health, automate schedules, process bulk tasks, manage spend, and get alerted when something drifts.

8.1 Live Task Dashboard

| Panel | What It Shows | Why It Matters |
| --- | --- | --- |
| Task List | Every job with a status badge (Queued ▢, Running ▶, Succeeded ✔, Failed ✖) | Spot stuck or failing runs instantly |
| Timeline | Gantt‑style trace of each tool call with latency bars | Identify slow MCP hops (> 200 ms) |
| Agent Chat | Real‑time stream of model thoughts (when debug mode is on) | Understand reasoning; copy snippets for support tickets |
| Details Pane | Inputs, outputs, cost, sandbox ID, policy decisions | Forensic debugging & auditing |

Access via Home → Tasks or Builder → Tasks → View Runs.

Anomaly Highlight 🔍 – Rows turn amber when runtime exceeds the P95, and red when a policy blocked the run.
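The highlight rule can be reproduced offline, for example when analysing exported run data. The following is a minimal sketch, not the dashboard's actual implementation: the function names are hypothetical, the P95 is computed from whatever runtime history you pass in, and it assumes that red takes precedence when a run is both slow and policy‑blocked (the rule above does not say which wins).

```python
from statistics import quantiles


def p95(runtimes):
    """Return the 95th percentile of a list of runtimes (needs >= 2 values)."""
    # quantiles(n=20) yields 19 cut points; the last one is the 95th percentile.
    return quantiles(runtimes, n=20, method="inclusive")[-1]


def row_color(runtime_sec, policy_blocked, p95_sec):
    """Classify a task row using the amber/red rule.

    Assumption: a policy block outranks a slow runtime.
    """
    if policy_blocked:
        return "red"
    if runtime_sec > p95_sec:
        return "amber"
    return "default"


if __name__ == "__main__":
    history = [1.2, 0.9, 1.1, 1.4, 5.0, 1.0, 1.3, 0.8, 1.2, 1.1]
    cutoff = p95(history)
    print(row_color(4.9, False, cutoff), row_color(1.0, True, cutoff))
```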

8.2 Recurring Schedules

Why schedule? Eliminate manual triggers, ensure reports arrive before business hours, and smooth out workload spikes.

| Schedule Type | Cron DSL | Typical Use |
| --- | --- | --- |
| Hourly | 0 * * * * | Sync CRM leads |
| Daily | 30 6 * * * | 6:30 AM KPI digest |
| Weekly | 0 1 * * 1 | Monday 1 AM log rotation |
| Monthly | 0 2 1 * * | First‑of‑month invoice summary |
| Custom | Any Cron‑5 | Edge cases (e.g., 15th & last day) |

How‑To: Builder → Tasks tab → flip Run on Schedule → pick template or enter Cron → save. Scheduled tasks appear with a ⏰ icon and green Next Run column.

Pro Tip ✨ Start with a small test window (e.g., next minute) before committing to production cadence.

8.3 Bulk Task Queues

Need to process 10k PDFs or 50k leads? Use Bulk Upload.

  1. Prepare CSV with a header row matching input names (file_url, customer_id, …).

  2. Builder → Tasks → Bulk Run → upload CSV.

  3. Choose concurrency (default 50) and dead‑letter policy (retry×3 or move to Failed queue).

  4. Click Launch – progress bar shows completed vs remaining.
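Before uploading, step 1 can be checked locally. The following sketch compares a CSV header against the input names an agent expects and counts the data rows. `check_bulk_csv` is a hypothetical helper, and the expected names are supplied by you; nothing here calls the platform.

```python
import csv
import io


def check_bulk_csv(csv_text, input_names):
    """Compare the CSV header with the expected input names and count rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    return {
        "missing": [name for name in input_names if name not in header],
        "extra": [col for col in header if col not in input_names],
        "rows": sum(1 for _ in reader),
    }


if __name__ == "__main__":
    sample = (
        "file_url,customer_id\n"
        "https://example.com/a.pdf,42\n"
        "https://example.com/b.pdf,43\n"
    )
    print(check_bulk_csv(sample, ["file_url", "customer_id"]))
```

A non‑empty `missing` list means the upload would lack a required column; `extra` columns may be harmless but are worth a second look.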

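Scale Math: Each micro‑VM sandbox can handle ~2-3 tool calls/sec. A 10k job with avg 3 calls is 30k tool calls, or 200 tasks per slot at 50 concurrency. The tool calls alone take roughly 3-5 min (1-1.5 s per task); the ~25 min estimate works out to about 7.5 s per task end to end, so it presumably also covers model inference and sandbox start‑up.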
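The estimate can be worked through with a small calculator. This is a back‑of‑the‑envelope sketch under stated assumptions: tasks run in waves of `concurrency`, each sandbox issues its tool calls sequentially, and `overhead_sec` is a hypothetical per‑task allowance for everything that is not a tool call (it is not a documented platform parameter).

```python
import math


def estimate_bulk_minutes(tasks, calls_per_task, calls_per_sec,
                          concurrency, overhead_sec=0.0):
    """Estimate wall-clock minutes for a bulk run processed in waves."""
    per_task_sec = calls_per_task / calls_per_sec + overhead_sec
    waves = math.ceil(tasks / concurrency)  # tasks handled by each slot
    return waves * per_task_sec / 60


if __name__ == "__main__":
    # Tool calls only, at the slow and fast ends of the 2-3 calls/sec range.
    print(estimate_bulk_minutes(10_000, 3, 2, 50))
    print(estimate_bulk_minutes(10_000, 3, 3, 50))
    # With a hypothetical 6 s of per-task overhead at 2 calls/sec.
    print(estimate_bulk_minutes(10_000, 3, 2, 50, overhead_sec=6.0))
```

Under these assumptions, about 6 s of non‑tool time per task is what it takes to reach a 25 min run, so measuring real per‑task duration on a small batch is a more reliable basis for planning than the tool‑call rate alone.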

8.4 Cost & Usage Dashboards

Navigate: Home → Cost.

| Chart | Metric | Actionable Insight |
| --- | --- | --- |
| Tokens Burned | per agent / per model | Tune prompts, choose cheaper model |
| MCP Latency P95 | by tool | Identify slow back‑ends (DB, SaaS) |
| Sandbox Seconds | compute per integration | Spot heavy workloads (e.g., large GPT‑4 calls) |
| Spend vs Budget | daily burn vs set budget | Auto‑throttle when 80 % reached |

Budget Guardrails 💰 – Set Hard Cap (terminate runs) or Soft Cap (pause schedules, alert owner). Budget events are logged in Policy journal.
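The guardrail behaviour can be summarised as a small decision function. This is an illustrative sketch only: the `Budget` class and the action names are hypothetical, and it assumes that both caps trigger at 100 % of the budget, which this section does not state explicitly. The 80 % throttle threshold is taken from the Spend vs Budget chart.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    limit: float              # budget for the period, in currency units
    cap: str = "soft"         # "soft" or "hard"
    warn_ratio: float = 0.8   # auto-throttle threshold (80 %)


def budget_action(spend, budget):
    """Map current spend to the guardrail action it would trigger."""
    ratio = spend / budget.limit
    if ratio >= 1.0:  # assumption: caps apply once the full budget is spent
        if budget.cap == "hard":
            return "terminate_runs"
        return "pause_schedules_and_alert_owner"
    if ratio >= budget.warn_ratio:
        return "throttle"
    return "ok"


if __name__ == "__main__":
    soft = Budget(limit=100.0)
    hard = Budget(limit=100.0, cap="hard")
    for spend in (50.0, 80.0, 100.0):
        print(spend, budget_action(spend, soft), budget_action(spend, hard))
```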

8.5 Alerts & Notifications

| Trigger | Channel | Configuration |
| --- | --- | --- |
| Task Failure (> N retries) | Slack DM, Email | Toggle in Org Settings → Alerts |
| Budget Hit 80 % | Slack #ops‑alerts | Enabled by default |
| Policy Block | Webhook POST | Custom endpoint |
| Latency Spike (> 500 ms) | PagerDuty | Add via Integrations |

Alerts include deep links to Task ID and Timeline for quick triage.
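The trigger table can be modelled as a routing map plus two threshold checks, which is handy when prototyping a custom webhook consumer. Everything named here is an assumption for illustration: the trigger keys, the channel labels, the alert fields, and the URL layout under the placeholder `https://example.invalid` are hypothetical and do not describe the platform's real payload.

```python
# Trigger -> channels, mirroring the alert table (labels are hypothetical).
ROUTES = {
    "task_failure": ["slack_dm", "email"],
    "budget_80": ["slack:#ops-alerts"],
    "policy_block": ["webhook"],
    "latency_spike": ["pagerduty"],
}


def task_failed(retries, max_retries):
    """Task Failure fires only after more than N retries."""
    return retries > max_retries


def latency_spike(latency_ms, threshold_ms=500):
    """Latency Spike fires above the 500 ms threshold."""
    return latency_ms > threshold_ms


def build_alert(trigger, task_id, base_url="https://example.invalid"):
    """Assemble an alert with deep links to the task and its timeline."""
    return {
        "trigger": trigger,
        "channels": ROUTES[trigger],
        "task_url": f"{base_url}/tasks/{task_id}",
        "timeline_url": f"{base_url}/tasks/{task_id}/timeline",
    }


if __name__ == "__main__":
    if latency_spike(640):
        print(build_alert("latency_spike", "task-123"))
```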

8.6 Operational Best Practices

  1. Tag Agents by business unit (finance/, support/) – dashboards auto‑group.

  2. Stagger Schedules (± 5 min) to avoid thundering‑herd on DB.

  3. Enable Must‑Cite on KB tools to catch hallucinations early.

  4. Review Policy Journal weekly; look for repeated blocks (might indicate missing tool scopes).

  5. Export Metrics via OpenTelemetry to Grafana or Datadog.
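For practice 2, one way to stagger schedules without coordinating by hand is to derive a stable offset from each agent's name. The sketch below is one possible approach, not a platform feature: `stagger_offset` and `staggered_daily_cron` are hypothetical helpers, and the hash‑based spread is an assumption about how you might choose offsets within the ± 5 min window.

```python
import hashlib


def stagger_offset(agent_name, window_min=5):
    """Return a stable offset in [-window_min, +window_min] minutes."""
    digest = hashlib.sha256(agent_name.encode("utf-8")).digest()
    span = 2 * window_min + 1
    return int.from_bytes(digest[:4], "big") % span - window_min


def staggered_daily_cron(agent_name, hour, minute, window_min=5):
    """Shift a daily HH:MM schedule by the agent's offset, wrapping at midnight."""
    shift = stagger_offset(agent_name, window_min)
    total = (hour * 60 + minute + shift) % (24 * 60)
    return f"{total % 60} {total // 60} * * *"


if __name__ == "__main__":
    for name in ("finance/invoice-summary", "support/kpi-digest"):
        print(name, "->", staggered_daily_cron(name, 6, 30))
```

Because the offset depends only on the name, an agent keeps the same slot across redeployments, while agents that share a nominal time tend to spread across the window. With only 11 possible offsets, collisions are still possible.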
