Attribution you can trust
Cost is attributed at the proxy, keyed by your server's API key and headers. Not self-reported from the client, and not spoofable by end users. Per customer, user, feature, workflow, and agent run.
Marginary ties every LLM call to a customer and joins it with your Stripe revenue. So you can see where your AI product makes money and where it burns it, for every customer, feature, and agent run.
One baseURL change. Prompts never stored. Metering failures never block your calls.
| Customer | Plan | Revenue | AI cost | Margin | Status |
|---|---|---|---|---|---|
| Meridian AI | Growth | $199.00 | $342.18 | −72% | losing money |
| Fieldnote | Free | $0.00 | $86.12 | — | free-tier burn |
| Halcyon Labs | Startup | $79.00 | $41.03 | +48% | healthy |
| Solstice Research | Startup | $79.00 | $12.77 | +84% | healthy |
| Beacon & Co | Indie | $19.00 | $2.41 | +87% | healthy |
No SDK to adopt, no agent to deploy. Marginary is a streaming-safe proxy in front of the APIs you already call. Your provider keys pass through untouched. Marginary never stores them and never logs them.
Swap your baseURL. Requests stream through byte for byte. If our metering breaks, your calls still go through.
import OpenAI from "openai"; const client = new OpenAI({ baseURL: "https://api.marginary.xanova.ai/v1", // ← the only change defaultHeaders: { "X-TokenOps-Key": env.TOKENOPS_KEY }, });
Headers, not payload changes. Every request lands attributed to a customer, feature, workflow, or agent run. Attribution happens on your server, so end users can't spoof it.
await client.chat.completions.create(body, { headers: { "X-TokenOps-Customer": "acme", "X-TokenOps-Feature": "support_chat", "X-TokenOps-Agent-Run": "run_4921", }, });
Sync Stripe subscriptions read-only (or post revenue manually). Cost meets revenue, and every customer gets a live gross margin.
$ GET /api/margin acme revenue $199.00 cost $312.40 margin −57% ● losing money
Built for the money path. Prices live in a database, not in code, and the math is fixed-point. If we don't know a model's price, we say unknown instead of guessing.
Cost is attributed at the proxy, keyed by your server's API key and headers. Not self-reported from the client, and not spoofable by end users. Per customer, user, feature, workflow, and agent run.
Stripe subscriptions sync read-only into a revenue join, and every customer gets a live gross margin. Worst first, so the account quietly eating your pricing plan is row one.
Free plans with real AI cost are flagged before they compound. Know which signups to convert, cap, or cut, with the dollar amount attached.
Soft alerts or a hard block at the proxy, per org, customer, or feature. The only sanctioned way a request gets rejected, and the only 402 you'll ever be glad to see.
Streaming passes through byte-for-byte. If our recorder, pricing lookup, or budget engine fails, your customer's AI call still completes. Auth is the only thing that fails closed.
Prompts and completions are never stored, logged, or traced. Not in the database, not in error logs, not anywhere. Token counts, models, latency, and your business tags. That's the entire record.
Great tools already exist for traces, routing, and invoices. None of them answer the founder's question:
“Which customers do I lose money on?”
Answering it takes joining per-request cost to per-customer revenue. And revenue lives in Stripe, not in your traces.
| Category | Main buyer | Primary value |
|---|---|---|
| LLM observabilityLangfuse · Helicone | AI engineers | Debug traces and prompts |
| Model gatewaysLiteLLM · Portkey | Platform teams | Routing, keys, rate limits |
| Usage billingOpenMeter · Lago | Billing & RevOps | Metering and invoicing your customers |
| MarginaryAI FinOps | Founders, SaaS & ops teams | Cost vs revenue: gross margin per customer |
Back-of-the-envelope unit economics for your AI feature. List prices, no caching discounts. Production traffic is spikier than any average, which is rather the point.
A margin product that wrecked your margin with usage-based billing would be a bad joke. Early access pricing. The founding cohort locks it in.
We're onboarding a founding cohort of AI SaaS teams spending $100 to $10k a month on model APIs. Setup is one line. The first worst-margin customer usually shows up the same day.