# Privacy & data handling

What we collect, what we don't, and how to run fully anonymously.
This document is the contract. If the CLI ever sends something not listed here, it's a bug — file an issue.
The CLI is open source. Every byte that leaves your machine is auditable in
`cli/upload.py`, `cli/signing.py`, and `cli/fingerprint.py`.
You can preview the exact JSON before it's sent:

```
llm-speed bench --quick --dry-run --print-payload
```
## 1. What we collect by default
Each upload contains exactly these fields. No others.
### Hardware fingerprint (bucketed / coarse)

| Field | Example | Why |
|---|---|---|
| `os_name` | `Darwin`, `Linux`, `Windows` | leaderboards filter by OS family |
| `os_version` | `26`, `13`, `6` (major version only) | major-version-level perf differences are real; patch level is identifying |
| `cpu_model` | `Apple M3 Pro`, `AMD Ryzen 9 7950X3D` | the canonical CPU label users compare on |
| `cpu_cores` | `12` | CPU-bound workloads scale here |
| `ram_gb` | `40` (rounded to the nearest 8 GB) | memory headroom matters; serial-number-grade precision doesn't |
| `gpus[].name` | `RTX 4090`, `M3 Pro`, `MI300X` | the canonical GPU label |
| `gpus[].kind` | `nvidia`, `amd`, `apple`, `intel`, `cpu` | tells the leaderboard which family |
| `gpus[].memory_gb` | `24` (rounded to the nearest 8 GB) | the VRAM bucket determines which models fit |
| `accelerator_summary` | `M3 Pro (18-core GPU) + 36GB unified` | human-readable headline for the run page |
| `fingerprint_hash` | `a1b2c3d4e5f60708` (16 hex chars) | groups runs of the same hardware class for outlier detection. Hashed over only the bucketed fields above — two physically different machines with the same SoC + RAM bucket + major OS produce the SAME hash. NOT a user-level identity. Omitted entirely in `--strict-anon` mode. |
| `extras.backends` | `{"llama.cpp": "...", "metal": "Metal 3"}` | so we can correlate decode tps to backend version |
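To make the class-level property concrete, here is a minimal sketch of bucketed fingerprinting: round memory to the nearest 8 GB, keep only the coarse fields from the table above, and hash a deterministic serialization. The helper names and exact canonicalization are illustrative assumptions, not the actual `cli/fingerprint.py` code.

```python
import hashlib
import json

def bucket(value_gb: float, step: int = 8) -> int:
    """Round a memory size to the nearest `step` GB (hypothetical helper)."""
    return int(round(value_gb / step)) * step

def class_fingerprint(os_name, os_major, cpu_model, cpu_cores, ram_gb, gpus):
    """Hash only the bucketed fields, so any two machines in the same
    hardware class collapse to the same 16-hex-char value."""
    bucketed = {
        "os_name": os_name,
        "os_version": str(os_major),  # major version only, never patch level
        "cpu_model": cpu_model,
        "cpu_cores": cpu_cores,
        "ram_gb": bucket(ram_gb),
        "gpus": [{"name": g["name"], "kind": g["kind"],
                  "memory_gb": bucket(g["memory_gb"])} for g in gpus],
    }
    canon = json.dumps(bucketed, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canon.encode()).hexdigest()[:16]

# Two machines that differ only within a RAM bucket land on the same hash:
a = class_fingerprint("Darwin", 26, "Apple M3 Pro", 12, 36,
                      [{"name": "M3 Pro", "kind": "apple", "memory_gb": 36}])
b = class_fingerprint("Darwin", 26, "Apple M3 Pro", 12, 34,
                      [{"name": "M3 Pro", "kind": "apple", "memory_gb": 36}])
print(a == b, len(a))
```

The point of the sketch is the invariant, not the exact encoding: the hash input contains nothing finer-grained than the bucketed table fields, so it cannot identify a machine, only a hardware class.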
### Workload results (the actual numbers)

For each workload (W1..W5):

| Field | Why |
|---|---|
| `workload` | which of W1..W5 |
| `suite_version` | which methodology version (`suite-v1`, ...) |
| `backend` / `backend_version` | llama.cpp / ollama / vllm / mlx / hosted-api, plus version |
| `model.{name,size,quant,digest}` | the (model × quant) tuple is the leaderboard axis |
| `ttft_ms`, `prefill_tps`, `decode_tps`, `decode_p50_latency_ms`, `decode_p95_latency_ms` | the benchmark metrics |
| `prompt_tokens`, `output_tokens`, `batch_size`, `context_tokens` | workload params for replay |
| `wall_ms` | end-to-end timing |
| `prefix_cache_hit_rate` | when the backend exposes it |
| `error`, `flags` | captured failure mode + thermal/battery flags |
| `extras` | small backend-specific telemetry; capped at 256 chars per string field by a client-side privacy invariant (see below) |
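For orientation, a single workload entry built from the fields above might look like this. All values here are made up for illustration; they are not real measurements, and the exact schema is defined by the CLI, not this sketch.

```json
{
  "workload": "W2",
  "suite_version": "suite-v1",
  "backend": "llama.cpp",
  "backend_version": "b4012",
  "model": {"name": "llama-3.1-8b", "size": "8B", "quant": "Q4_K_M", "digest": "sha256:..."},
  "ttft_ms": 182.4,
  "prefill_tps": 512.0,
  "decode_tps": 38.7,
  "decode_p50_latency_ms": 24.1,
  "decode_p95_latency_ms": 31.9,
  "prompt_tokens": 1024,
  "output_tokens": 256,
  "batch_size": 1,
  "context_tokens": 4096,
  "wall_ms": 8410,
  "prefix_cache_hit_rate": null,
  "error": null,
  "flags": [],
  "extras": {}
}
```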
### Provenance

| Field | Why |
|---|---|
| `cli_version` | so CLI-version-specific anomalies are diagnosable |
| `started_at`, `finished_at` | run-duration sanity check |
| `signature` (ed25519) | proves the bytes weren't edited after signing |
| `public_key` (ed25519, base64 32B raw) | the server verifies the signature with this; in default mode it's persistent (see Modes), in `--strict-anon` it's a fresh ephemeral key per run |
## 2. What we DON'T collect

These are deliberately stripped client-side, even though some of them used to ship in earlier builds. They're either too identifying (browser-fingerprint-grade entropy) or simply not needed for the benchmark numbers.

- PCI bus IDs (`pci.bus_id` from `nvidia-smi`)
- GPU driver build numbers (full `driver_version` strings)
- GPU vBIOS versions
- Kernel patch versions (`6.8.0-31-generic` → just `6`)
- OS patch level (`26.3.1` → just `26`)
- macOS marketing minor/patch (`13.5.7` → just `13`)
- Linux distro patchlevel (`Ubuntu 24.04.1 LTS` is dropped entirely)
- Locale
- NVMe / disk model / serial
- Hostname / username
- Power source / thermal state (captured locally for run flags, never sent)
- Prompt text — the workload prompts are part of the open-source suite, not user data
- Model output text — never captured at all
- Per-token raw timings (`raw_timings_ms`) — kept locally for replay/dispute, only included in the upload if you opt in via `include_raw_timings=True`
### Privacy invariant (client-side)

Before any upload, the CLI walks the payload tree and refuses to send if any
non-whitelisted string field exceeds 256 characters. The whitelist is small:
`accelerator_summary`, `error`, `fingerprint_hash`, `public_key`, `signature`,
`suite_version`, `cli_version`, `model.{identifier,name,size,quant,digest}`,
`workload`, `backend`, `backend_version`. Anything else triggers:

    upload payload contains unexpected long string at <path>; refusing to upload (privacy invariant)

This is a guardrail against a future regression where someone accidentally puts
prompt or output text into a backend's extras dict.
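The invariant above can be sketched as a recursive walk. This is an illustrative simplification, not the actual `cli/upload.py` code: in particular it matches on the leaf field name rather than the full `model.*` path, and the function name is hypothetical.

```python
MAX_STRING = 256
# Whitelisted field names from the document; everything else is length-capped.
WHITELIST = {
    "accelerator_summary", "error", "fingerprint_hash", "public_key",
    "signature", "suite_version", "cli_version", "identifier", "name",
    "size", "quant", "digest", "workload", "backend", "backend_version",
}

def check_payload(node, path=""):
    """Walk the payload tree; raise on any long, non-whitelisted string."""
    if isinstance(node, dict):
        for key, value in node.items():
            check_payload(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for i, value in enumerate(node):
            check_payload(value, f"{path}[{i}]")
    elif isinstance(node, str):
        leaf = path.rsplit(".", 1)[-1].split("[")[0]
        if leaf not in WHITELIST and len(node) > MAX_STRING:
            raise ValueError(
                f"upload payload contains unexpected long string at {path}; "
                "refusing to upload (privacy invariant)"
            )

check_payload({"extras": {"note": "ok"}})  # short strings pass
try:
    check_payload({"extras": {"note": "x" * 300}})  # a leaked prompt would look like this
except ValueError as e:
    print("blocked:", e)
```

Because the check runs on the fully assembled payload just before upload, any future code path that smuggles long text into `extras` fails closed rather than leaking.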
## 3. Three modes

| Capability | default | `--anon` | `--strict-anon` |
|---|---|---|---|
| Hardware bucket (OS major, CPU, RAM bucket, GPU name + VRAM bucket) | yes | yes | yes |
| `accelerator_summary` | yes | yes | yes |
| `fingerprint_hash` (class-level, bucketed) | yes | yes | no |
| `public_key` (persistent across your runs) | yes | yes | no — ephemeral, rotated per run |
| `User-Agent: llm-speed-cli/<version>` | sent | sent | not sent (default httpx UA) |
| `Authorization: Bearer <api-key>` | if provided | if provided | not sent |
| `X-LLM-Speed-Anon: 1` header | no | yes | no (would be a beacon) |
| Server-side IP logging | hashed (rotating salt) | hashed | hashed |
| Cross-run correlation by the server | possible via `public_key` and `fingerprint_hash` | possible via `public_key` | not possible |
### `--strict-anon` in detail

- A fresh ed25519 keypair is generated in memory for each run. The persistent keypair on disk (`~/.config/llm-speed/keys/ed25519.key`) is never read or written.
- No `fingerprint_hash` in the payload. Two consecutive `--strict-anon` runs from the same machine are unlinkable from the server's perspective.
- The signature still proves "the bytes weren't tampered with after signing" — it just doesn't tie the run to a long-term identity.
### `--no-upload`

- No network call at all. The result is saved to `~/.cache/llm-speed/runs/<isoformat>-<fp_hash>.json` and that's it.
- Re-upload later with `llm-speed bench --resume <path>` (which verifies the signature locally before posting).
## 4. Server-side

The server (`api/server.py`) stores only the fields documented at `/privacy.json` (machine-readable, versioned). Specifically:

- Client IPs are never logged in cleartext. Every access-log entry uses `sha256(salt || client_ip)[:16]`, where the salt is generated fresh at midnight UTC and held in process memory only — never persisted to disk. The previous day's salt is retained for one hour past midnight UTC for log-correlation overlap, then dropped. After that hour, yesterday's hashed IPs cannot be correlated with today's: same IP, completely different hash.
- The salt is never returned in any response and never written anywhere. A server restart loses all salts immediately.
- The hashed IP is truncated to 16 hex characters — plenty of entropy for abuse rate-limiting, and thanks to the daily salt rotation it is not a stable cross-day identifier.
- The server has no access to anything the CLI didn't put in the payload. Driver builds, PCI IDs, kernel patches, etc. are physically absent.
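The rotating-salt scheme above can be sketched in a few lines. Names and structure are illustrative assumptions; the real logic lives in `api/server.py` (and would additionally drop yesterday's salt an hour past midnight UTC).

```python
import hashlib
import os
from datetime import datetime, timezone

_salts = {}  # date -> salt, held in process memory only, never persisted

def _salt_for_today() -> bytes:
    """Return today's salt, minting a fresh random one after midnight UTC."""
    today = datetime.now(timezone.utc).date()
    if today not in _salts:
        _salts[today] = os.urandom(32)
    return _salts[today]

def hash_ip(client_ip: str) -> str:
    """sha256(salt || client_ip), truncated to 16 hex chars for the access log."""
    return hashlib.sha256(_salt_for_today() + client_ip.encode()).hexdigest()[:16]

h1 = hash_ip("203.0.113.7")
h2 = hash_ip("203.0.113.7")
print(len(h1), h1 == h2)  # stable within one salt window, unlinkable across salts
```

The privacy property falls out of the construction: without the salt (which never leaves memory), the hash cannot be reversed or even brute-forced against the small IPv4 space, and once the salt rotates, the same IP maps to an unrelated value.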
## 5. How to inspect what's sent

```
llm-speed bench --quick --dry-run --print-payload
```

- `--dry-run` runs the workloads but skips both the upload and the local save.
- `--print-payload` prints the exact JSON that would be uploaded to stdout.

You can also save without uploading and pretty-print the file:

```
llm-speed bench --quick --no-upload --json /tmp/run.json
cat /tmp/run.json | jq
```

The bytes signed by the CLI are exactly that JSON, modulo the `signature` and `public_key` fields (which are excluded from canonical hashing). The server verifies against the same canonicalization.
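The exclude-then-canonicalize step described above can be sketched as follows. This is a minimal illustration assuming a sorted-key, no-whitespace JSON canonical form; the actual form used by `cli/signing.py` may differ.

```python
import hashlib
import json

def canonical_bytes(payload: dict) -> bytes:
    """Serialize the payload deterministically, minus the signature fields."""
    body = {k: v for k, v in payload.items() if k not in ("signature", "public_key")}
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()

run = {"workload": "W1", "decode_tps": 38.7, "signature": "aaa", "public_key": "bbb"}
digest = hashlib.sha256(canonical_bytes(run)).hexdigest()

# Changing the signature fields does not change the signed bytes,
# so signing and verification agree on what was covered:
run["signature"] = "something-else"
assert hashlib.sha256(canonical_bytes(run)).hexdigest() == digest
```

The signature is then computed over `canonical_bytes(run)` (with ed25519 in the real CLI), which is why editing any benchmark number after signing invalidates the run while re-encoding the JSON with different whitespace does not.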
## 6. How to delete your data

- Self-serve consent revocation: `rm ~/.config/llm-speed/consent.json` — the next upload will re-prompt.
- Email [email protected] with the `id` (the `r_xxxxxxxx` slug) of any result you want removed. We remove it within 7 days.
- Public dispute thread: every result page has a dispute link. File a challenge; if it stands, the result is withdrawn.

`--strict-anon` submissions can't be linked to your machine after the fact (by design — there's no persistent identity to match against), so deleting those runs requires the run id from your local cache.
## 7. First-run consent

The first time you run `llm-speed bench` against the public API in a mode that
will upload (i.e. not `--no-upload`, not `--dry-run`, not `--json`, not
`--strict-anon` — strict-anon is implicit consent because it's the most-private
option), the CLI prints the consent text and asks `[Y/n]`. The choice is saved
to `~/.config/llm-speed/consent.json` (mode 0600) along with a timestamp, the
CLI version, and the trimmed `fingerprint_hash`. Delete that file to revoke.
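Writing that consent file with 0600 permissions can be sketched like this. The record schema and function name are assumptions for illustration; only the path, mode, and stored fields come from the paragraph above.

```python
import json
import os
import tempfile
from datetime import datetime, timezone

def save_consent(path: str, cli_version: str, fingerprint_hash: str) -> None:
    """Record consent with owner-only permissions (hypothetical helper)."""
    record = {
        "accepted": True,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "cli_version": cli_version,
        "fingerprint_hash": fingerprint_hash,
    }
    os.makedirs(os.path.dirname(path), exist_ok=True)
    # os.open with mode 0o600 ensures the file is owner-only from creation,
    # with no window where it is world-readable (on POSIX systems).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        json.dump(record, f, indent=2)

tmp = os.path.join(tempfile.mkdtemp(), "consent.json")
save_consent(tmp, "0.4.2", "a1b2c3d4e5f60708")
print(oct(os.stat(tmp).st_mode & 0o777))
```

Deleting the file is the revocation mechanism: there is no server-side consent record to clean up, so `rm` genuinely resets the state.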
## 8. Open source

Every line of code that touches your data is in this repo:

- Fingerprint capture + trimming: `cli/fingerprint.py`
- Upload payload construction: `cli/upload.py`, `cli/signing.py`
- The server endpoints + IP hashing: `api/server.py`
- The verification path: `api/verify.py`
- This document: you are here.

If something looks wrong, open an issue. We'd rather fix it than fight you.