Documentation
Real-time LLM cost tracking per agent, endpoint, and user. 3-line drop-in SDK.
Overview
Agent Cost Radar (ACR) provides open-source observability and routing recommendations for LLM agents. Drop in our SDK with three lines of code — every Anthropic and OpenAI call is tracked automatically: tokens, cost, latency, and per-agent breakdowns.
- Real-time per-agent cost tracking
- Anthropic + OpenAI auto-instrumentation
- Routing recommendations (Haiku vs Sonnet vs Opus)
- Free tier: 1,000 events/month
Quick start
Python
```python
import acr

acr.init(api_key="acr_live_...", project="my-app")

# any anthropic.messages.create() is now auto-tracked
import anthropic

client = anthropic.Anthropic()
client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
)
```
JavaScript
```javascript
import { ACR } from '@aimayak/acr-sdk';
import Anthropic from '@anthropic-ai/sdk';

const acr = new ACR({ apiKey: 'acr_live_...', project: 'my-app' });
const anthropic = acr.instrument(new Anthropic());

await anthropic.messages.create({
  model: 'claude-sonnet-4-5',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Hello' }],
});
```
Install the SDK
Python (pip)
```bash
pip install agent-cost-radar
```
JavaScript / TypeScript (npm)
```bash
npm install @aimayak/acr-sdk
# or
pnpm add @aimayak/acr-sdk
```
API Reference
Status: Draft. Subject to change before v1.0 (planned 2026-Q3).
Stable contract: event schema fields below are forward-compatible (new optional fields may be added).
Base URL
https://api.agentcostradar.com/v1
For local development: `http://localhost:8787/v1` (the default `wrangler dev` port for Cloudflare Workers).
Authentication
All requests require an `Authorization: Bearer <api_key>` header.
API keys are prefixed `acr_live_` (production) or `acr_test_` (sandbox).
```http
POST /v1/events HTTP/1.1
Host: api.agentcostradar.com
Authorization: Bearer acr_live_abc123...
Content-Type: application/json
```
POST /v1/events
Ingest one or more usage events from an instrumented LLM client.
Request — single event
```json
{
  "ts": "2026-05-14T19:35:00.123Z",
  "project": "my-app",
  "model": "claude-sonnet-4-5",
  "input_tokens": 1234,
  "output_tokens": 567,
  "cost_usd": 0.012345,
  "latency_ms": 842,
  "agent_id": "researcher-v2",
  "conversation_id": "conv_abc123",
  "metadata": {
    "user_id_hash": "sha256:...",
    "feature": "search"
  }
}
```
Request — batch (preferred; up to 1,000 events)
```json
{
  "events": [
    { "ts": "...", "project": "...", "model": "..." },
    { "ts": "...", "project": "...", "model": "..." }
  ]
}
```
Response — 202 Accepted
```json
{
  "accepted": 1,
  "rejected": 0,
  "request_id": "req_xyz789"
}
```
Response — 400 Bad Request
```json
{
  "error": "invalid_event",
  "details": "field 'model' is required",
  "rejected_indexes": [3]
}
```
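Outside the SDKs, the ingest endpoint can be called directly. A minimal sketch using only the standard library (the URL is the documented base URL; the key and events here are placeholders):

```python
import json
import urllib.request


def send_batch(events, api_key, base_url="https://api.agentcostradar.com/v1"):
    """POST a batch of usage events; returns the (accepted, rejected) counts."""
    if len(events) > 1000:
        # server would answer 413 for oversized batches; fail fast client-side
        raise ValueError("batch limit is 1,000 events")
    req = urllib.request.Request(
        f"{base_url}/events",
        data=json.dumps({"events": events}).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        body = json.load(resp)  # expects the 202 body: accepted / rejected / request_id
    return body["accepted"], body["rejected"]
```

In production you would add the retry behavior described under "Errors" below; this sketch omits it for brevity.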
Event schema
| Field | Type | Required | Description |
|---|---|---|---|
| `ts` | ISO 8601 string | yes | UTC timestamp of the LLM call |
| `project` | string | yes | Project slug (set via `init(project=...)`) |
| `model` | string | yes | Model name (e.g. `claude-sonnet-4-5`, `gpt-4o`) |
| `input_tokens` | integer | yes | Prompt tokens (incl. cached) |
| `output_tokens` | integer | yes | Completion tokens |
| `cost_usd` | float | yes | Computed cost in USD (6 decimal places) |
| `latency_ms` | integer | no | Round-trip latency, milliseconds |
| `agent_id` | string | no | Agent role identifier |
| `conversation_id` | string | no | Logical conversation grouping |
| `cache_read_tokens` | integer | no | Anthropic cache read tokens |
| `cache_write_tokens` | integer | no | Anthropic cache write tokens |
| `provider` | string | no | One of `anthropic`, `openai`, `azure`, `custom` |
| `metadata` | object | no | User-defined tags (≤ 8 KB) |
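The required-field rules above can be checked client-side before sending; a minimal sketch (the server remains the source of truth for full validation):

```python
# The six required fields from the event schema table.
REQUIRED = {"ts", "project", "model", "input_tokens", "output_tokens", "cost_usd"}


def validate_event(event):
    """Return a list of problems; an empty list means the event looks ingestible."""
    problems = [f"field '{f}' is required" for f in sorted(REQUIRED - event.keys())]
    for f in ("input_tokens", "output_tokens"):
        if f in event and (not isinstance(event[f], int) or event[f] < 0):
            problems.append(f"field '{f}' must be a non-negative integer")
    return problems


ok = {
    "ts": "2026-05-14T19:35:00Z",
    "project": "my-app",
    "model": "claude-sonnet-4-5",
    "input_tokens": 1234,
    "output_tokens": 567,
    "cost_usd": 0.012345,
}
assert validate_event(ok) == []
```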
GET /v1/events (planned, Week 3)
Query historical events with filters: `?project=...&from=...&to=...&model=...`
GET /v1/insights (planned, Week 4)
Returns AI-generated cost optimization insights for a project.
```json
{
  "project": "my-app",
  "period": "2026-05-07/2026-05-14",
  "total_cost_usd": 124.56,
  "insights": [
    {
      "kind": "model_downgrade",
      "savings_usd_per_month": 89.50,
      "message": "78% of calls use Opus for classification — Haiku would suffice."
    }
  ]
}
```
Rate limits
| Tier | Events / minute | Burst |
|---|---|---|
| Free | 1,000 | 5,000 |
| Pro | 50,000 | 200,000 |
| Enterprise | unlimited | — |
429 responses include a `Retry-After` header.
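A client honoring these limits should prefer the server's `Retry-After` value and otherwise back off exponentially with jitter. A sketch of just the delay calculation (transport and retry loop omitted):

```python
import random


def backoff_delay(attempt, retry_after=None, base=0.5):
    """Seconds to wait before retry `attempt` (0-based).

    A Retry-After header value (seconds, as a string) takes precedence;
    otherwise use exponential backoff plus uniform jitter.
    """
    if retry_after is not None:
        return float(retry_after)
    return base * (2 ** attempt) + random.uniform(0, base)
```

The `base` of 0.5 s is an illustrative choice, not a documented SDK default.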
Errors
| Code | Meaning |
|---|---|
| 400 | Invalid event payload |
| 401 | Missing or invalid API key |
| 403 | Project not allowed for this key |
| 413 | Batch too large (> 1,000 events or > 5 MB) |
| 429 | Rate limit exceeded |
| 5xx | Transient — SDK retries with exponential backoff |
SDK guide
SDK responsibilities
- Buffer & batch — accumulate events in memory, flush every 5 s or 100 events.
- Retry on 5xx / 429 — exponential backoff with jitter, max 3 attempts.
- Never block the user's LLM call — instrumentation runs after response, async.
- Drop on overflow — if buffer > 10K events (server unreachable), log warning and drop oldest.
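The buffering rules above can be sketched as a small in-memory queue. This is an illustrative model of the documented behavior, not the shipped SDK code; the flush transport is passed in as a callable, and a `deque` with `maxlen` implements drop-oldest on overflow (warning log omitted for brevity):

```python
import collections
import time


class EventBuffer:
    """Accumulates events; flushes every 5 s or 100 events; drops oldest past 10K."""

    def __init__(self, flush_fn, batch_size=100, flush_interval=5.0, max_buffer=10_000):
        # deque with maxlen silently discards the oldest item on overflow
        self._events = collections.deque(maxlen=max_buffer)
        self._flush_fn = flush_fn
        self._batch_size = batch_size
        self._flush_interval = flush_interval
        self._last_flush = time.monotonic()

    def track(self, event):
        self._events.append(event)
        if (len(self._events) >= self._batch_size
                or time.monotonic() - self._last_flush >= self._flush_interval):
            self.flush()

    def flush(self):
        # Drain the buffer in batch_size chunks, one flush_fn call per chunk.
        while self._events:
            n = min(self._batch_size, len(self._events))
            self._flush_fn([self._events.popleft() for _ in range(n)])
        self._last_flush = time.monotonic()
```

A real implementation would call `flush_fn` off the hot path (background task or thread) so instrumentation never blocks the user's LLM call.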
Python SDK
Source: sdk/python on GitHub
```python
from acr import init, track

# Auto-instrument
init(api_key="acr_live_...", project="my-app")

# Manual tracking (alternative)
track(
    model="claude-sonnet-4-5",
    input_tokens=1234,
    output_tokens=567,
    cost_usd=0.012345,
    agent_id="researcher",
)
```
JavaScript SDK
Source: sdk/javascript on GitHub
```javascript
import { ACR } from '@aimayak/acr-sdk';

const acr = new ACR({
  apiKey: 'acr_live_...',
  project: 'my-app',
  flushInterval: 5000, // ms
  batchSize: 100,
});

// Auto-instrument
const anthropic = acr.instrument(new Anthropic());

// Manual tracking
acr.track({
  model: 'claude-sonnet-4-5',
  inputTokens: 1234,
  outputTokens: 567,
  costUsd: 0.012345,
  agentId: 'researcher',
});
```