Install & CLI

Up and running in one command.

Calma is one self-contained folder — pure Python standard library, no dependencies, Python 3.9+. Install it as a Claude Code plugin, drop it into any project that reads SKILL.md, or put the calma CLI on your PATH. macOS is first-class (verified sandbox, proven by a built-in self-test); Linux runs with reduced isolation and says so in the ledger; Windows is unsupported.

Claude Code plugin (recommended)

Installs the skill and the zero-touch Stop-hook guardrail — it watches your agent's final message for checkable numbers and re-executes the work to confirm them before the turn finishes. Invisible until a number doesn't hold.

claude code

/plugin marketplace add rikhinkavuru/calma

/plugin install calma@calma

Drop into any project

Works with every agent that reads SKILL.md — Claude Code, Codex, Cursor, and more. No build step, nothing to configure.

bash

git clone https://github.com/rikhinkavuru/calma

cp -r calma/.claude/skills/calma your-project/.claude/skills/

Plain CLI

Pure stdlib — no pip, no deps. install.sh symlinks bin/calma onto your PATH and prints a hint if needed.

bash

cd calma # the repo you just cloned

./install.sh # or: make install

calma demo

The CLI

Every command, one binary.

The same engine the skill calls, on your terminal. Verify a result, batch a whole sprint, sign and timestamp a verdict, or audit the public registry — all offline, all from calma.

calma --help

calma demo # zero-setup: catch a bundled real inflated backtest (offline, seconds)

calma verify <folder> "<claim>" # check a result against a claim (exit codes below)

calma verify <folder> # no claim: just check the result reproduces

calma batch <dir>... | --manifest m.tsv # verify MANY results + one summary table (CI/sprint)

calma recipes # the 500 built-in metrics, grouped by family

calma verify ... --json # machine-readable verdict (for agents / CI)

calma verify ... --check-determinism # run twice; flaky outputs can't confirm anything

calma verify ... --timeout 300 # raise the re-execution budget (default 120s)

calma verify ... --trust third-party # counterparty code: refuse unless a sandbox tier is live

calma teardown <folder> "<claim>" # shareable "claimed X -> really Y" card (+ --svg)

calma replay <run_dir> # re-run a saved verification; exit 0 iff it reproduces

calma stats <folder> # verification history: catches, hook activity

calma seal <run_dir> [--publish] # sign + RFC-3161 timestamp + counterparty instructions

calma attest keygen # one-time signing key; after this every verify is signed

calma attest verify <bundle> # check a signed bundle, fully offline

calma registry verify [dir] # audit the public catch-history chain offline

Exit codes (calma verify)

0  clean: CONFIRMED / CONFIRMED-WITH-CAVEATS
1  not clean: REFUTED / MIXED / CAN'T-CONFIRM
2  bad input: missing target, malformed contract, unknown --metric
3  refused: execution declined (e.g. third-party code, no verified sandbox)
4  killed: the re-execution exceeded the --timeout budget
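For CI jobs or agent wrappers, the exit codes above can be turned into a decision in a few lines. The following is a minimal Python sketch, assuming calma is on your PATH. The names EXIT_MEANINGS, describe_exit, build_verify_cmd and run_verify are illustrative and not part of Calma, and nothing is assumed about the shape of the --json output: stdout is returned unparsed.

```python
import subprocess

# Exit codes for `calma verify`, copied from the table above.
EXIT_MEANINGS = {
    0: "clean",
    1: "not clean",
    2: "bad input",
    3: "refused",
    4: "killed",
}


def describe_exit(code):
    """Return the documented label for an exit code, or 'unknown' otherwise."""
    return EXIT_MEANINGS.get(code, "unknown")


def build_verify_cmd(folder, claim=None, timeout_s=None):
    """Build the argv for `calma verify`, using only flags listed above."""
    cmd = ["calma", "verify", folder]
    if claim is not None:
        cmd.append(claim)
    cmd.append("--json")
    if timeout_s is not None:
        cmd += ["--timeout", str(timeout_s)]
    return cmd


def run_verify(folder, claim=None, timeout_s=None):
    """Hypothetical wrapper: run calma and return (exit_code, label, stdout)."""
    proc = subprocess.run(
        build_verify_cmd(folder, claim, timeout_s),
        capture_output=True,
        text=True,
    )
    return proc.returncode, describe_exit(proc.returncode), proc.stdout
```

A CI step might treat only code 0 as a pass and report the label for anything else, so that a refusal (3) or a timeout (4) is not mistaken for a refuted claim (1).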

Prove the sandbox on your machine:
python3 .claude/skills/calma/scripts/run_hermetic.py doctor

Opt out of the guardrail at any time in one of three ways: set CALMA_HOOK=0 in the environment, run touch .calma/hook-off, or put {"hook": {"enabled": false}} in .calma/config.json. Every decision is logged to .calma/auto_history.jsonl.
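The config-file opt-out and the decision log can also be handled from a script. The following is a minimal Python sketch, assuming it runs against the project root. disable_hook and read_history are hypothetical helper names, and preserving other keys already present in .calma/config.json is an assumption about what you want rather than documented Calma behavior. No field names are assumed for log entries; each non-empty line is simply parsed as JSON, as the .jsonl extension suggests.

```python
import json
from pathlib import Path


def disable_hook(project_root="."):
    """Set {"hook": {"enabled": false}} in .calma/config.json, keeping other keys."""
    config_path = Path(project_root) / ".calma" / "config.json"
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    # Assumes any existing "hook" value is a JSON object.
    config.setdefault("hook", {})["enabled"] = False
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config_path


def read_history(project_root="."):
    """Return parsed entries from .calma/auto_history.jsonl (empty list if absent)."""
    log_path = Path(project_root) / ".calma" / "auto_history.jsonl"
    if not log_path.exists():
        return []
    entries = []
    with log_path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines
                entries.append(json.loads(line))
    return entries
```

Calling disable_hook() once from the project root has the same effect as editing the file by hand, and len(read_history()) gives a quick count of logged decisions.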

Full docs and source are on GitHub, and the 500 recipes are listed on the recipes page.