Privacy & data handling

What we collect, what we don't, and how to run fully anonymously.


This document is the contract. If the CLI ever sends something not listed here, it's a bug — file an issue.

The CLI is open source. Every byte that leaves your machine is auditable in cli/upload.py, cli/signing.py, and cli/fingerprint.py. You can preview the exact JSON before it's sent:

llm-speed bench --quick --dry-run --print-payload

1. What we collect by default

Each upload contains exactly these fields. No others.

Hardware fingerprint (bucketed / coarse)

Field | Example | Why
os_name | Darwin, Linux, Windows | leaderboards filter by OS family
os_version | 26, 13, 6 (major version only) | major-version-level perf differences are real; patch level is identifying
cpu_model | Apple M3 Pro, AMD Ryzen 9 7950X3D | the canonical CPU label users compare on
cpu_cores | 12 | CPU-bound workloads scale here
ram_gb | 40 (rounded to nearest 8 GB) | memory headroom matters; serial-number-grade precision doesn't
gpus[].name | RTX 4090, M3 Pro, MI300X | the canonical GPU label
gpus[].kind | nvidia, amd, apple, intel, cpu | tells the leaderboard which hardware family
gpus[].memory_gb | 24 (rounded to nearest 8 GB) | the VRAM bucket determines which models fit
accelerator_summary | M3 Pro (18-core GPU) + 36GB unified | human-readable headline for the run page
fingerprint_hash | a1b2c3d4e5f60708 (16 hex chars) | groups runs of the same hardware class for outlier detection. Hashed over only the bucketed fields above, so two physically different machines with the same SoC + RAM bucket + major OS produce the SAME hash. Not a user-level identity. Omitted entirely in --strict-anon mode.
extras.backends | {"llama.cpp": "...", "metal": "Metal 3"} | so we can correlate decode tps with backend version
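To make the bucketing concrete, here is a minimal sketch of how the 8 GB rounding and the class-level hash could compose. The helper names are hypothetical; the CLI's actual logic lives in cli/fingerprint.py.

```python
import hashlib
import json

def bucket_gb(raw_gb: float, step: int = 8) -> int:
    """Round a memory size to the nearest `step` GB (the doc's 8 GB buckets)."""
    return max(step, round(raw_gb / step) * step)

def fingerprint_hash(os_name, os_major, cpu_model, cpu_cores, ram_gb, gpus):
    """Hash ONLY the bucketed fields, so machines of the same class collide."""
    bucketed = {
        "os_name": os_name,
        "os_version": os_major,              # major version only
        "cpu_model": cpu_model,
        "cpu_cores": cpu_cores,
        "ram_gb": bucket_gb(ram_gb),         # nearest 8 GB
        "gpus": [
            {"name": g["name"], "kind": g["kind"],
             "memory_gb": bucket_gb(g["memory_gb"])}
            for g in gpus
        ],
    }
    canonical = json.dumps(bucketed, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]  # 16 hex chars
```

Under this scheme, a 38 GB and a 40 GB machine with otherwise identical specs land in the same bucket and hash identically, which is exactly the "hardware class, not identity" property described above.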

Workload results (the actual numbers)

For each workload (W1..W5):

Field | Why
workload | which of W1..W5
suite_version | which methodology version (suite-v1, ...)
backend / backend_version | llama.cpp / ollama / vllm / mlx / hosted-api, plus version
model.{name,size,quant,digest} | the (model x quant) tuple is the leaderboard axis
ttft_ms, prefill_tps, decode_tps, decode_p50_latency_ms, decode_p95_latency_ms | the benchmark metrics
prompt_tokens, output_tokens, batch_size, context_tokens | workload params for replay
wall_ms | end-to-end timing
prefix_cache_hit_rate | when the backend exposes it
error, flags | captured failure mode + thermal/battery flags
extras | small backend-specific telemetry; capped at 256 chars per string field by a client-side privacy invariant (see below)
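To pin down what the decode metrics mean, here is a sketch of how they could be derived from the raw per-token timings the CLI keeps locally. `decode_metrics` and its arguments are illustrative names, not the CLI's actual API.

```python
import statistics

def decode_metrics(token_timestamps_ms, request_start_ms):
    """Derive decode metrics from absolute per-token arrival times (ms).

    Field names in the returned dict match the upload schema above.
    """
    ttft_ms = token_timestamps_ms[0] - request_start_ms
    # Inter-token gaps: one per decoded token after the first.
    gaps = [b - a for a, b in zip(token_timestamps_ms, token_timestamps_ms[1:])]
    decode_s = (token_timestamps_ms[-1] - token_timestamps_ms[0]) / 1000
    qs = statistics.quantiles(gaps, n=100)  # percentiles of inter-token latency
    return {
        "ttft_ms": ttft_ms,
        "decode_tps": (len(token_timestamps_ms) - 1) / decode_s if decode_s else 0.0,
        "decode_p50_latency_ms": qs[49],
        "decode_p95_latency_ms": qs[94],
    }
```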

Provenance

Field | Why
cli_version | so CLI-version-specific anomalies are diagnosable
started_at, finished_at | run-duration sanity check
signature (ed25519) | proves the bytes weren't edited after signing
public_key (ed25519, base64 32B raw) | the server verifies the signature with this; persistent in default mode (see Modes), a fresh ephemeral key per run in --strict-anon

2. What we DON'T collect

These are deliberately stripped client-side, even though some of them used to ship in earlier builds. They're either too identifying (browser-fingerprint-grade entropy) or simply not needed for the benchmark numbers.

  • PCI bus IDs (pci.bus_id from nvidia-smi)
  • GPU driver build numbers (full driver_version strings)
  • GPU vBIOS versions
  • Kernel patch versions (6.8.0-31-generic → just 6)
  • OS patch level (26.3.1 → just 26)
  • macOS marketing minor/patch (13.5.7 → just 13)
  • Linux distro patchlevel (Ubuntu 24.04.1 LTS is dropped entirely)
  • Locale
  • NVMe / disk model / serial
  • Hostname / username
  • Power source / thermal state (captured locally for run flags, never sent)
  • Prompt text — the workload prompts are part of the open-source suite, not user data
  • Model output text — never captured at all
  • Per-token raw timings (raw_timings_ms) — kept locally for replay/dispute, only included in upload if you opt in via include_raw_timings=True

Privacy invariant (client-side)

Before any upload, the CLI walks the payload tree and refuses to send if any non-whitelisted string field exceeds 256 characters. The whitelist is small: accelerator_summary, error, fingerprint_hash, public_key, signature, suite_version, cli_version, model.{identifier,name,size,quant,digest}, workload, backend, backend_version. Anything else triggers "upload payload contains unexpected long string at <path>; refusing to upload (privacy invariant)". This is a guardrail against a future regression where someone accidentally puts prompt or output text into a backend's extras dict.
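The invariant fits in a few lines. The names here (check_payload, ALLOWED_LONG) are illustrative, and matching whitelist entries by leaf field name is a simplification of ours, not necessarily how cli/upload.py does it.

```python
MAX_LEN = 256

# Leaf field names exempt from the length cap (from the whitelist above).
ALLOWED_LONG = {
    "accelerator_summary", "error", "fingerprint_hash", "public_key",
    "signature", "suite_version", "cli_version", "workload",
    "backend", "backend_version",
    "identifier", "name", "size", "quant", "digest",  # model.* fields
}

def check_payload(node, path=""):
    """Walk the payload; raise if a non-whitelisted string exceeds MAX_LEN."""
    if isinstance(node, dict):
        for key, value in node.items():
            check_payload(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for i, value in enumerate(node):
            check_payload(value, f"{path}[{i}]")
    elif isinstance(node, str) and len(node) > MAX_LEN:
        leaf = path.rsplit(".", 1)[-1].split("[")[0]
        if leaf not in ALLOWED_LONG:
            raise ValueError(
                f"upload payload contains unexpected long string at {path}; "
                "refusing to upload (privacy invariant)"
            )
```

Run before serialization, this turns "someone stuffed a prompt into extras" into a hard client-side failure rather than a silent leak.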


3. Three modes

Capability | default | --anon | --strict-anon
Hardware bucket (OS major, CPU, RAM bucket, GPU name + VRAM bucket) | yes | yes | yes
accelerator_summary | yes | yes | yes
fingerprint_hash (class-level, bucketed) | yes | yes | no
public_key (persistent across your runs) | yes | yes | no (ephemeral, rotated per run)
User-Agent: llm-speed-cli/<version> | sent | sent | not sent (default httpx UA)
Authorization: Bearer <api-key> | if provided | if provided | not sent
X-LLM-Speed-Anon: 1 header | no | yes | no (would be a beacon)
Server-side IP logging | hashed (rotating salt) | hashed | hashed
Cross-run correlation by the server | possible via public_key and fingerprint_hash | possible via public_key | not possible
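The header rows of that table reduce to a small function. upload_headers is a hypothetical name for illustration; the real request construction lives in cli/upload.py.

```python
def upload_headers(mode, cli_version, api_key=None):
    """Build HTTP request headers per privacy mode (sketch of the table above)."""
    if mode == "strict-anon":
        return {}  # default httpx UA, no auth header, no beacon header
    headers = {"User-Agent": f"llm-speed-cli/{cli_version}"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    if mode == "anon":
        headers["X-LLM-Speed-Anon"] = "1"
    return headers
```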

--strict-anon in detail

  • A fresh ed25519 keypair is generated in memory for each run. The persistent keypair on disk (~/.config/llm-speed/keys/ed25519.key) is never read or written.
  • No fingerprint_hash in the payload. Two consecutive --strict-anon runs from the same machine are unlinkable from the server's perspective.
  • The signature still proves "the bytes weren't tampered with after signing" — it just doesn't tie the run to a long-term identity.

--no-upload

  • No network call at all. The result is saved to ~/.cache/llm-speed/runs/<isoformat>-<fp_hash>.json and that's it.
  • Re-upload later with llm-speed bench --resume <path> (which verifies the signature locally before posting).

4. Server-side

The server (api/server.py) stores only the fields documented at /privacy.json (machine-readable, versioned). Specifically:

  • Client IPs are never logged in cleartext. Every access-log entry uses sha256(salt || client_ip)[:16] where the salt is generated fresh at midnight UTC and held in process memory only — never persisted to disk. The previous day's salt is retained for one hour past midnight UTC for log-correlation overlap, then dropped. After that hour, yesterday's hashed IPs cannot be correlated with today's: same IP, completely different hash.
  • The salt is never returned in any response and never written anywhere. A server restart loses all salts immediately.
  • The hashed IP is 16 hex characters — collision-resistant for abuse-rate-limiting, but small enough that it's not a stable cross-day identifier.
  • The server has no access to anything the CLI didn't put in the payload. Driver builds, PCI IDs, kernel patches, etc. are physically absent.
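A sketch of the rotating-salt scheme described above. The class and method names are illustrative (the real implementation is in api/server.py), and the one-hour overlap window for yesterday's salt is omitted for brevity.

```python
import hashlib
import os
from datetime import datetime, timezone

class IPHasher:
    """Hash client IPs with a per-UTC-day salt held only in process memory.

    A restart loses every salt; yesterday's hashes cannot be correlated
    with today's once the old salt is dropped.
    """
    def __init__(self):
        self._salts = {}  # UTC date -> 16 random bytes, never persisted

    def _salt_for(self, day):
        if day not in self._salts:
            self._salts[day] = os.urandom(16)  # fresh salt at midnight UTC
        return self._salts[day]

    def hash_ip(self, ip, now=None):
        now = now or datetime.now(timezone.utc)
        salt = self._salt_for(now.date())
        return hashlib.sha256(salt + ip.encode()).hexdigest()[:16]
```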

5. How to inspect what's sent

llm-speed bench --quick --dry-run --print-payload

--dry-run runs the workloads but skips both upload and local save. --print-payload prints the exact JSON that would be uploaded to stdout.

You can also save without uploading and pretty-print the file:

llm-speed bench --quick --no-upload --json /tmp/run.json
cat /tmp/run.json | jq

The bytes the CLI signs are the canonical serialization of that JSON with the signature and public_key fields excluded from hashing. The server verifies against the same canonicalization.
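A sketch of that canonicalization, assuming JSON with sorted keys and no whitespace; the CLI's exact scheme is defined in cli/signing.py.

```python
import json

def canonical_bytes(payload):
    """Bytes that get signed and verified: the payload minus signature and
    public_key, serialized deterministically (sorted keys, no whitespace)."""
    body = {k: v for k, v in payload.items() if k not in ("signature", "public_key")}
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
```

Because signature and public_key are stripped before hashing, adding the signature to the payload after signing does not change the bytes the server checks.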


6. How to delete your data

  • Self-serve consent revocation: rm ~/.config/llm-speed/consent.json — the next upload will re-prompt.
  • Email [email protected] with the id (the r_xxxxxxxx slug) of any result you want removed. We remove it within 7 days.
  • Public dispute thread: every result page has a dispute link. File a challenge; if it stands, the result is withdrawn.

--strict-anon submissions can't be linked to your machine after the fact (by design — there's no persistent identity to match against), so deletion of those runs requires the run id from your local cache.


7. First-run consent

The first time you run llm-speed bench against the public API in a mode that will upload (i.e. not --no-upload, not --dry-run, not --json, and not --strict-anon, which counts as implicit consent because it is the most private option), the CLI prints the consent text and asks [Y/n]. The choice is saved to ~/.config/llm-speed/consent.json (mode 0600) along with a timestamp, the CLI version, and the trimmed fingerprint_hash. Delete that file to revoke consent.


8. Open source

Every line of code that touches your data is in this repo:

  • cli/fingerprint.py (hardware bucketing and fingerprint_hash)
  • cli/signing.py (canonicalization and ed25519 signatures)
  • cli/upload.py (payload assembly, privacy invariant, upload)
  • api/server.py (server-side storage and hashed IP logging)

If something looks wrong, open an issue. We'd rather fix it than fight you.