Paste a Claude Code session log, an API response, or a stripped {usage:…} JSONL — and the VU meter animates the per-turn burn across four channels: input, output, cache-read, cache-creation. Late-90s Winamp chrome on a Win98 dialog frame; CRT-green gradient bars, peak-hold ticks, log-scale heights. Optional unlock: a 20-item Token Budget Optimization Playbook.
Twenty concrete moves for cutting Claude token burn without losing capability. Caching invariants, pruning patterns, routing rules, measurement habits — each item is a config line, an SDK option, a CLI flag, or a process change with a P0/P1/P2 priority chip and primary-source citations. One 5,000-sat confirmed payment unlocks all seven current priced features site-wide (Promptle Hard Mode, Token Tetris Daily Challenge, Stumble Deep Cut, Vibe Code Security remediation, Comprehension Recovery Checklist, AI Review Noise Reduction Playbook, and Token Budget Optimization Playbook). Paste the same TXID on any priced page; the shared client-side verifier accepts any prior confirmed 5,000-sat tx to the address. Verified via mempool.space + blockstream.info — no account, no server.
To the address below. Any modern wallet works (Mutiny, Phoenix, Wallet of Satoshi, Sparrow). Wait for one block confirmation (~10–30 minutes).
From your wallet's transaction history. 64 hex characters, no prefix.
Verifier checks mempool.space + blockstream.info; the playbook below unlocks.
✓ Playbook unlocked. The 20 items are below — work them P0 → P1 → P2.
"cache_control": {"type": "ephemeral"} to the content block where the static prefix ends. Anthropic's Prompt caching reference confirms type: "ephemeral" is the canonical value; cached input bills at a fraction of standard input rates (read the doc for current per-model multipliers).cache_read_input_tokens / (cache_read_input_tokens + cache_creation_input_tokens). The four usage keys (snake_case) are documented in the Messages API reference. Vendor docs note typical hit-rates land between 30% and 98% depending on traffic patterns — calibrate against your own workload. If your hit-rate is dropping turn-over-turn rather than holding stable, your variable content is leaking into the cached prefix; move it below the last cache_control block.tools first, then system, then messages — stable preamble at the top, variable content (user message, conversation tail) last. The prompt-caching docs spell out that cache prefixes are created in this exact tools→system→messages order; placing tool definitions below the system prompt is the most common cause of unexpected cache invalidation. Cache breakpoints can mark up to 4 contiguous regions per request — every cache_control insertion costs one breakpoint, so consolidate."cache_control": {"type": "ephemeral", "ttl": "5m"} for active sessions and "ttl": "1h" for reusable contexts (cross-session agent loops, daily-job system prompts). Per the caching docs, the 5m TTL write costs 1.25× base input token rate; the 1h TTL costs 2× base — pay the 1h premium only when the same prefix will be reused across sessions within that hour. In single-session workflows, prefer 5m. Ordering constraint: if mixing both in one request, 1h breakpoints must appear before any 5m breakpoints.cache_creation_input_tokens; every subsequent call reads from cache_read_input_tokens at the discounted rate./compact manually when context fills (it accepts an optional focus hint: /compact focus on the API changes). In hand-rolled API loops, replicate it via a summarization call before context grows past your budget.grep, rg, awk, or your editor's symbol search first; paste only the relevant excerpt. The bigger the file, the bigger the input bill — and most of those tokens contribute nothing to the answer.description for a single-purpose tool is wasteful; tighten to the minimum the model needs to choose correctly. Short, unambiguous descriptions outperform long ones at lower cost.tools=[...] filter — every tool definition counts toward input tokens whether the model calls it or not. A focused agent with 4 tools beats a generalist with 14 in both quality and cost.model parameter per call.Task tool with subagent_type targeted at the right specialist; in the SDK, kick off a fresh messages.create with only the focused context.model: field in its frontmatter (.md or JSON config) — set model: haiku for lookup-style subagents, model: sonnet for normal work, model: opus for hard reasoning, or model: inherit to share the parent's model. The Task tool then routes by subagent_type; the model is set per subagent definition, not per Task invocation. See the subagents reference for the full frontmatter schema.npx ccusage@latest (or bunx ccusage if Bun is your runtime). It reads your ~/.config/claude/projects/*/*.jsonl log files (or ~/.claude/projects/*/*.jsonl on legacy installs) and shows per-day, per-project token + cost breakdowns. The same data this VU meter visualizes — but as a CLI you can run from cron.statusLine to show per-turn input/output deltas. The statusline docs document the JSON shape piped to your script: context_window.current_usage contains the same four *_tokens keys you see in API responses.npx ccusage@latest daily for a 7-day breakdown by project and model.max_tokens per call (capped output), or a custom rate limit at your proxy, or just team discipline + a weekly review. Catching runaway costs after the fact is too late — by then you've already paid.One turn per line: Claude Code session JSONL, stripped {usage:…}, single API response, or a JSON array. Empty paste falls back to a 4-turn demo dataset so the page is never blank.
Four channels (IN, OUT, CACHE, CRT) animate left-to-right, one column per turn, log-scaled to global peak. CRT-green gradient bars, amber peak-hold ticks, phosphor glow. Up to 63 turns; beyond that, scroll.
Send 5,000 sats. Once one block confirms, paste the TXID — the 20-item Token Budget Optimization Playbook opens. Caching, Pruning, Routing, Measurement, P0 → P2.
It is a Winamp-style VU meter for Claude token usage. You paste raw token data from a Claude Code session log, an Anthropic API response, or a custom script — and the page animates a four-channel bar display: IN (input_tokens), OUT (output_tokens), CACHE (cache_read_input_tokens), CRT (cache_creation_input_tokens). Each turn becomes one column; bars rise log-scaled to the global peak with a CRT-green gradient and peak-hold ticks. The aesthetic is late-90s media-player chrome on top of a Win98 dialog frame.
Four shapes are accepted. (1) Claude Code session JSONL (~/.config/claude/projects/<project>/<session>.jsonl on Claude Code 1.0.30+; legacy fallback ~/.claude/projects/ on older installs), one record per line with a message.usage sub-object. (2) Stripped JSONL — the same lines after a jq filter has dropped everything except {usage: ...}. (3) A single Anthropic API response JSON object containing a usage field. (4) A JSON array of any of the above. The parser walks each line up to two levels deep looking for a usage sub-object with at least input_tokens; everything else is discarded immediately on parse, so prompt content, tool calls, file paths, and branch names never leave your browser. If the first parse pass extracts zero turns, a manual-input fallback (three number fields) is shown.
Claude Code's session JSONL contains multiple records per assistant API step: streaming partials plus the final snapshot. Without dedupe, the VU meter would draw duplicate bars and inflate the apparent burn rate. The parser groups records by message.id (preferred), then requestId (fallback), and within each group keeps one — preferring records with stop_reason set, then the largest output_tokens value (streaming usually accumulates monotonically; the largest is the final), then the last record by file order as an idempotent tiebreaker. Records with no identity key fall through as one-turn-each. The dedupe rule mirrors Anthropic's Agent SDK cost-tracking guidance: count one usage snapshot per assistant API step.
All aggregate stats pack into 16 hex characters: ?vu=<16hex>. 2 bits encode the source-tag (0=Claude Code JSONL, 1=API direct, 2=manual, 3=other), 6 bits encode the turn count (capped at 63 — the live view also caps at 63 so the share URL never silently truncates), 16 bits each encode total input / output / cache-read tokens (capped at 65535; input and output show a > prefix on overflow; cache-read is capped silently in the current URL format), and 6 bits encode peak-input and peak-output log-scale buckets. The URL contains zero per-turn data — pasting a teammate's URL renders an aggregate-only summary frame, not the bouncing animation. To get the animation back, paste your own data above. Wrong length, non-hex, or out-of-range fields are rejected; tampered URLs do not silently fall through to a blank state or to your previous localStorage.
No. The parser runs entirely in your browser. Non-usage fields (message content, tool inputs, working directories, branch names, file paths) are discarded immediately on parse and never stored in any state variable. Only the usage object is retained. The share URL packs aggregate stats only (totals plus peak buckets), so even sharing a URL does not expose per-turn token counts. If you want to be extra careful, run jq -c 'select(.message.usage) | {usage: .message.usage}' on your JSONL before pasting — same animation, zero prompt content in transit.
A 5,000-sat one-time payment unlocks the Token Budget Optimization Playbook — 20 concrete items grouped into Caching, Pruning, Routing, and Measurement. Each item is a config line, an SDK option, a CLI flag, or a process change with a P0/P1/P2 priority chip, plus links to docs.claude.com and the ccusage repo where applicable. The free meter visualizes your current burn; the playbook gives you the moves to cut it. The same 5,000-sat payment also unlocks the other priced features on Promptshelf (Promptle Hard Mode, Token Tetris Daily Challenge, Stumble Deep Cut, Vibe Code Security remediation guide, Comprehension Recovery Checklist, AI Review Noise Reduction Playbook) — the shared client-side verifier accepts any prior 5,000-sat tx to the address, so paying once gets you all seven. Verified via mempool.space and blockstream.info; no account, no server.
Tip the project. Token Burn-Rate VU Meter is part of Promptshelf — a small library of single-file tools for AI coding workflows. If this saved you a debugging hour, send a few sats. Same address as the playbook unlock; any amount confirms.