Privacy & data handling

What we collect, what we don't, and how to run fully anonymously.


This document is the contract. If the CLI ever sends something not listed here, it's a bug — file an issue.

The CLI is open source. Every byte that leaves your machine is auditable in cli/upload.py, cli/signing.py, and cli/fingerprint.py. You can preview the exact JSON before it's sent:

llm-speed bench --quick --dry-run --print-payload

1. What we collect by default

Each upload contains exactly these fields. No others.

Hardware fingerprint (bucketed / coarse)

Field | Example | Why
os_name | Darwin, Linux, Windows | leaderboards filter by OS family
os_version | 26, 13, 6 (major version only) | major-version-level perf differences are real; patch level is identifying
cpu_model | Apple M3 Pro, AMD Ryzen 9 7950X3D | the canonical CPU label users compare on
cpu_cores | 12 | CPU-bound workloads scale here
ram_gb | 40 (rounded to nearest 8 GB) | memory headroom matters; serial-number-grade precision doesn't
gpus[].name | RTX 4090, M3 Pro, MI300X | the canonical GPU label
gpus[].kind | nvidia, amd, apple, intel, cpu | tells the leaderboard which hardware family
gpus[].memory_gb | 24 (rounded to nearest 8 GB) | the VRAM bucket determines which models fit
accelerator_summary | M3 Pro (18-core GPU) + 36GB unified | human-readable headline for the run page
fingerprint_hash | a1b2c3d4e5f60708 (16 hex chars) | groups runs of the same hardware class for outlier detection. Hashed over only the bucketed fields above, so two physically different machines with the same SoC + RAM bucket + major OS produce the SAME hash. Not a user-level identity. Omitted entirely in --strict-anon mode.
extras.backends | {"llama.cpp": "...", "metal": "Metal 3"} | so we can correlate decode tps with backend version
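To make the bucketing concrete, here is a minimal sketch of how the 8 GB rounding and the class-level hash could compose. The helper names are hypothetical; the CLI's actual logic lives in cli/fingerprint.py.

```python
import hashlib
import json

def bucket_gb(raw_gb: float, step: int = 8) -> int:
    """Round a memory size to the nearest `step` GB (the doc's 8 GB buckets)."""
    return max(step, round(raw_gb / step) * step)

def fingerprint_hash(os_name, os_major, cpu_model, cpu_cores, ram_gb, gpus):
    """Hash ONLY the bucketed fields, so machines of the same class collide."""
    bucketed = {
        "os_name": os_name,
        "os_version": os_major,              # major version only
        "cpu_model": cpu_model,
        "cpu_cores": cpu_cores,
        "ram_gb": bucket_gb(ram_gb),         # nearest 8 GB
        "gpus": [
            {"name": g["name"], "kind": g["kind"],
             "memory_gb": bucket_gb(g["memory_gb"])}
            for g in gpus
        ],
    }
    canonical = json.dumps(bucketed, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]  # 16 hex chars
```

Under this scheme, a 38 GB and a 40 GB machine with otherwise identical specs land in the same bucket and hash identically, which is exactly the "hardware class, not identity" property described above.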

Workload results (the actual numbers)

For each workload (W1..W5):

Field | Why
workload | which of W1..W5
suite_version | which methodology version (suite-v1, ...)
backend / backend_version | llama.cpp / ollama / vllm / mlx / hosted-api, plus version
model.{name,size,quant,digest} | the (model x quant) tuple is the leaderboard axis
ttft_ms, prefill_tps, decode_tps, decode_p50_latency_ms, decode_p95_latency_ms | the benchmark metrics
prompt_tokens, output_tokens, batch_size, context_tokens | workload params for replay
wall_ms | end-to-end timing
prefix_cache_hit_rate | when the backend exposes it
error, flags | captured failure mode + thermal/battery flags
extras | small backend-specific telemetry; capped at 256 chars per string field by a client-side privacy invariant (see below)
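To pin down what the decode metrics mean, here is a sketch of how they could be derived from the raw per-token timings the CLI keeps locally. `decode_metrics` and its arguments are illustrative names, not the CLI's actual API.

```python
import statistics

def decode_metrics(token_timestamps_ms, request_start_ms):
    """Derive decode metrics from absolute per-token arrival times (ms).

    Field names in the returned dict match the upload schema above.
    """
    ttft_ms = token_timestamps_ms[0] - request_start_ms
    # Inter-token gaps: one per decoded token after the first.
    gaps = [b - a for a, b in zip(token_timestamps_ms, token_timestamps_ms[1:])]
    decode_s = (token_timestamps_ms[-1] - token_timestamps_ms[0]) / 1000
    qs = statistics.quantiles(gaps, n=100)  # percentiles of inter-token latency
    return {
        "ttft_ms": ttft_ms,
        "decode_tps": (len(token_timestamps_ms) - 1) / decode_s if decode_s else 0.0,
        "decode_p50_latency_ms": qs[49],
        "decode_p95_latency_ms": qs[94],
    }
```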

Provenance

Field | Why
cli_version | so CLI-version-specific anomalies are diagnosable
started_at, finished_at | run-duration sanity check
signature (ed25519) | proves the bytes weren't edited after signing
public_key (ed25519, base64 32B raw) | the server verifies the signature with this; persistent in default mode (see Modes), a fresh ephemeral key per run in --strict-anon

2. What we DON'T collect

These are deliberately stripped client-side, even though some of them used to ship in earlier builds. They're either too identifying (browser-fingerprint-grade entropy) or simply not needed for the benchmark numbers.

  • PCI bus IDs (pci.bus_id from nvidia-smi)
  • GPU driver build numbers (full driver_version strings)
  • GPU vBIOS versions
  • Kernel patch versions (6.8.0-31-generic → just 6)
  • OS patch level (26.3.1 → just 26)
  • macOS marketing minor/patch (13.5.7 → just 13)
  • Linux distro patchlevel (Ubuntu 24.04.1 LTS is dropped entirely)
  • Locale
  • NVMe / disk model / serial
  • Hostname / username
  • Power source / thermal state (captured locally for run flags, never sent)
  • Prompt text — the workload prompts are part of the open-source suite, not user data
  • Model output text — never captured at all
  • Per-token raw timings (raw_timings_ms) — kept locally for replay/dispute, only included in upload if you opt in via include_raw_timings=True

Privacy invariant (client-side)

Before any upload, the CLI walks the payload tree and refuses to send if any non-whitelisted string field exceeds 256 characters. The whitelist is small: accelerator_summary, error, fingerprint_hash, public_key, signature, suite_version, cli_version, model.{identifier,name,size,quant,digest}, workload, backend, backend_version. Anything else triggers "upload payload contains unexpected long string at <path>; refusing to upload (privacy invariant)". This is a guardrail against a future regression where someone accidentally puts prompt or output text into a backend's extras dict.
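The invariant fits in a few lines. The names here (check_payload, ALLOWED_LONG) are illustrative, and matching whitelist entries by leaf field name is a simplification of ours, not necessarily how cli/upload.py does it.

```python
MAX_LEN = 256

# Leaf field names exempt from the length cap (from the whitelist above).
ALLOWED_LONG = {
    "accelerator_summary", "error", "fingerprint_hash", "public_key",
    "signature", "suite_version", "cli_version", "workload",
    "backend", "backend_version",
    "identifier", "name", "size", "quant", "digest",  # model.* fields
}

def check_payload(node, path=""):
    """Walk the payload; raise if a non-whitelisted string exceeds MAX_LEN."""
    if isinstance(node, dict):
        for key, value in node.items():
            check_payload(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for i, value in enumerate(node):
            check_payload(value, f"{path}[{i}]")
    elif isinstance(node, str) and len(node) > MAX_LEN:
        leaf = path.rsplit(".", 1)[-1].split("[")[0]
        if leaf not in ALLOWED_LONG:
            raise ValueError(
                f"upload payload contains unexpected long string at {path}; "
                "refusing to upload (privacy invariant)"
            )
```

Run before serialization, this turns "someone stuffed a prompt into extras" into a hard client-side failure rather than a silent leak.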


3. Three modes

Capability | default | --anon | --strict-anon
Hardware bucket (OS major, CPU, RAM bucket, GPU name + VRAM bucket) | yes | yes | yes
accelerator_summary | yes | yes | yes
fingerprint_hash (class-level, bucketed) | yes | yes | no
public_key (persistent across your runs) | yes | yes | no (ephemeral, rotated per run)
User-Agent: llm-speed-cli/<version> | sent | sent | not sent (default httpx UA)
Authorization: Bearer <api-key> | if provided | if provided | not sent
X-LLM-Speed-Anon: 1 header | no | yes | no (would be a beacon)
Server-side IP logging | hashed (rotating salt) | hashed | hashed
Cross-run correlation by the server | possible via public_key and fingerprint_hash | possible via public_key | not possible
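The header rows of that table reduce to a small function. upload_headers is a hypothetical name for illustration; the real request construction lives in cli/upload.py.

```python
def upload_headers(mode, cli_version, api_key=None):
    """Build HTTP request headers per privacy mode (sketch of the table above)."""
    if mode == "strict-anon":
        return {}  # default httpx UA, no auth header, no beacon header
    headers = {"User-Agent": f"llm-speed-cli/{cli_version}"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    if mode == "anon":
        headers["X-LLM-Speed-Anon"] = "1"
    return headers
```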

--strict-anon in detail

  • A fresh ed25519 keypair is generated in memory for each run. The persistent keypair on disk (~/.config/llm-speed/keys/ed25519.key) is never read or written.
  • No fingerprint_hash in the payload. Two consecutive --strict-anon runs from the same machine are unlinkable from the server's perspective.
  • The signature still proves "the bytes weren't tampered with after signing" — it just doesn't tie the run to a long-term identity.

--no-upload

  • No network call at all. The result is saved to ~/.cache/llm-speed/runs/<isoformat>-<fp_hash>.json and that's it.
  • Re-upload later with llm-speed bench --resume <path> (which verifies the signature locally before posting).

4. Server-side

The server (api/server.py) stores only the fields documented at /privacy.json (machine-readable, versioned). Specifically:

  • Client IPs are never logged in cleartext. Every access-log entry uses sha256(salt || client_ip)[:16] where the salt is generated fresh at midnight UTC and held in process memory only — never persisted to disk. The previous day's salt is retained for one hour past midnight UTC for log-correlation overlap, then dropped. After that hour, yesterday's hashed IPs cannot be correlated with today's: same IP, completely different hash.
  • The salt is never returned in any response and never written anywhere. A server restart loses all salts immediately.
  • The hashed IP is 16 hex characters — collision-resistant for abuse-rate-limiting, but small enough that it's not a stable cross-day identifier.
  • The server has no access to anything the CLI didn't put in the payload. Driver builds, PCI IDs, kernel patches, etc. are physically absent.
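A sketch of the rotating-salt scheme described above. The class and method names are illustrative (the real implementation is in api/server.py), and the one-hour overlap window for yesterday's salt is omitted for brevity.

```python
import hashlib
import os
from datetime import datetime, timezone

class IPHasher:
    """Hash client IPs with a per-UTC-day salt held only in process memory.

    A restart loses every salt; yesterday's hashes cannot be correlated
    with today's once the old salt is dropped.
    """
    def __init__(self):
        self._salts = {}  # UTC date -> 16 random bytes, never persisted

    def _salt_for(self, day):
        if day not in self._salts:
            self._salts[day] = os.urandom(16)  # fresh salt at midnight UTC
        return self._salts[day]

    def hash_ip(self, ip, now=None):
        now = now or datetime.now(timezone.utc)
        salt = self._salt_for(now.date())
        return hashlib.sha256(salt + ip.encode()).hexdigest()[:16]
```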

5. How to inspect what's sent

llm-speed bench --quick --dry-run --print-payload

--dry-run runs the workloads but skips both upload and local save. --print-payload prints the exact JSON that would be uploaded to stdout.

You can also save without uploading and pretty-print the file:

llm-speed bench --quick --no-upload --json /tmp/run.json
cat /tmp/run.json | jq

The bytes the CLI signs are the canonical serialization of that JSON with the signature and public_key fields excluded from hashing. The server verifies against the same canonicalization.
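A sketch of that canonicalization, assuming JSON with sorted keys and no whitespace; the CLI's exact scheme is defined in cli/signing.py.

```python
import json

def canonical_bytes(payload):
    """Bytes that get signed and verified: the payload minus signature and
    public_key, serialized deterministically (sorted keys, no whitespace)."""
    body = {k: v for k, v in payload.items() if k not in ("signature", "public_key")}
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
```

Because signature and public_key are stripped before hashing, adding the signature to the payload after signing does not change the bytes the server checks.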


6. How to delete your data

  • Self-serve consent revocation: rm ~/.config/llm-speed/consent.json — the next upload will re-prompt.
  • Email [email protected] with the id (the r_xxxxxxxx slug) of any result you want removed. We remove it within 7 days.
  • Public dispute thread: every result page has a dispute link. File a challenge; if it stands, the result is withdrawn.

--strict-anon submissions can't be linked to your machine after the fact (by design — there's no persistent identity to match against), so deletion of those runs requires the run id from your local cache.


7. First-run consent

The first time you run llm-speed bench against the public API in a mode that will upload (i.e. not --no-upload, not --dry-run, not --json, and not --strict-anon, which counts as implicit consent because it is the most private option), the CLI prints the consent text and asks [Y/n]. The choice is saved to ~/.config/llm-speed/consent.json (mode 0600) along with a timestamp, the CLI version, and the trimmed fingerprint_hash. Delete that file to revoke consent.


8. Open source

Every line of code that touches your data is in this repo:

  • cli/fingerprint.py (hardware bucketing and fingerprint_hash)
  • cli/signing.py (canonicalization and ed25519 signatures)
  • cli/upload.py (payload assembly, privacy invariant, upload)
  • api/server.py (server-side storage and hashed IP logging)

If something looks wrong, open an issue. We'd rather fix it than fight you.