fix(oci): rate-limit failed token verifications, not successful pulls (BUNYIP-40) #52

Merged
nrupard merged 2 commits from fix/bunyip-40-oci-token-rate-limit into main 2026-06-03 16:51:57 +02:00

Closes BUNYIP-40. The OCI token endpoint (/auth/token) shared the /v1/auth/login cap (5/min/email) and counted every request, success or failure. Docker requests a fresh bearer token per repository per operation, so a single docker compose pull of 3 images issues 4-6 token requests within seconds. As a result, a legitimate member hit 429 TOOMANYREQUESTS on their first pull (found in BUNYIP-35 e2e A6).

Fix (short term, per the issue): rate-limit failures, not successes

Credential stuffing is many FAILED verifications; Docker's chattiness is many SUCCESSFUL ones with the same correct password. Only failures need the tight cap.

  • RateLimitConfig::OCI_TOKEN_FAILURES (5/min/email): incremented only on credential-verification failures (unknown email, no password, wrong password) via a new fail_credential() helper. The handler does a read-only check before verifying and blocks at/over the cap, so a request that may succeed never consumes the failure budget. Same threat model as the old login cap.
  • RateLimitConfig::OCI_TOKEN_THROUGHPUT (60/min/email): incremented on every request, purely to bound Argon2 CPU (~100ms/verify) so a flood of valid-credential requests cannot exhaust the server. Far above any real multi-image pull.
  • Authorization failures after a successful password check (deleted user, no active membership, no entitlement) are NOT counted toward the credential-failure cap, so an unentitled-but-valid member retrying does not lock themselves out of the credential budget.
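
The policy in the three bullets above can be sketched roughly as follows. This is a minimal in-memory illustration, not the code in this PR: the `Limit`, `Counters`, `Outcome` types, the `handle_token_request` function, the key prefixes, and the ordering of the two checks are all assumptions made for the example. Only the cap values (5/min and 60/min per email) and the read-only-check-before-verify idea come from the description.

```rust
use std::collections::HashMap;

/// Fixed-window limit: at most `max` events per `window_secs` per key.
#[derive(Clone, Copy)]
pub struct Limit {
    pub max: u32,
    pub window_secs: u64,
}

// Cap values from the PR description; the surrounding types are hypothetical.
pub const OCI_TOKEN_FAILURES: Limit = Limit { max: 5, window_secs: 60 };
pub const OCI_TOKEN_THROUGHPUT: Limit = Limit { max: 60, window_secs: 60 };

/// Hypothetical in-memory stand-in for the real rate-limit store.
#[derive(Default)]
pub struct Counters {
    // key -> (window start in seconds, count in that window)
    slots: HashMap<String, (u64, u32)>,
}

impl Counters {
    /// Read-only: count in the current window, without consuming budget.
    pub fn check(&self, key: &str, limit: Limit, now: u64) -> u32 {
        match self.slots.get(key) {
            Some(&(start, n)) if now.saturating_sub(start) < limit.window_secs => n,
            _ => 0,
        }
    }

    /// Increment the counter (resetting an expired window) and return the new count.
    pub fn increment(&mut self, key: &str, limit: Limit, now: u64) -> u32 {
        let slot = self.slots.entry(key.to_string()).or_insert((now, 0));
        if now.saturating_sub(slot.0) >= limit.window_secs {
            *slot = (now, 0);
        }
        slot.1 += 1;
        slot.1
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    Ok,
    Unauthorized,    // credential failure: counted
    Forbidden,       // authorization failure after a valid password: not counted
    TooManyRequests,
}

/// `password_ok` and `entitled` stand in for the real Argon2 verify and
/// membership/entitlement lookups.
pub fn handle_token_request(
    counters: &mut Counters,
    email: &str,
    password_ok: bool,
    entitled: bool,
    now: u64,
) -> Outcome {
    let fail_key = format!("oci_fail:{email}");
    let tput_key = format!("oci_tput:{email}");

    // Throughput: every request increments; block once the new count exceeds the cap (>).
    if counters.increment(&tput_key, OCI_TOKEN_THROUGHPUT, now) > OCI_TOKEN_THROUGHPUT.max {
        return Outcome::TooManyRequests;
    }
    // Failures: read-only check before verifying; block at or over the cap (>=).
    if counters.check(&fail_key, OCI_TOKEN_FAILURES, now) >= OCI_TOKEN_FAILURES.max {
        return Outcome::TooManyRequests;
    }
    if !password_ok {
        // Only a credential failure consumes the failure budget.
        counters.increment(&fail_key, OCI_TOKEN_FAILURES, now);
        return Outcome::Unauthorized;
    }
    if !entitled {
        return Outcome::Forbidden;
    }
    Outcome::Ok
}
```

In this sketch a run of valid requests never touches the failure counter, while five wrong passwords within a window make the sixth request return `TooManyRequests` regardless of its credentials, which matches the live results reported under Verification.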

Fix (long term): not in this PR

The durable fix is to support offline_token=true (Docker registry refresh tokens) in dunite-oci, so that docker login stores a refresh token and pulls exchange it without re-sending credentials. That removes password verification from the per-pull path entirely. It remains open and is not part of this PR.

Verification

rust-builder-glibc 1.94.1 container: clippy --workspace --all-targets -D warnings clean, fmt clean, 208 lib tests pass.

Live dev stack (/auth/token with basic auth):

| Scenario | Result |
|----------|--------|
| 10 consecutive successful single-image token requests | all 200 (previously 429 at the 6th) |
| wrong-password requests | 401, 401, 401, 401, 401, then 429 on the 6th |

🤖 Generated with Claude Code

fix(oci): rate-limit failed token verifications, not successful pulls (BUNYIP-40)
All checks were successful
Check / fmt / clippy / build / test (pull_request) Successful in 1m40s
658e697772
The OCI registry token endpoint (/auth/token) shared the /v1/auth/login cap (5/min/email) and counted EVERY request, success or failure. Docker requests a fresh bearer token per repository per operation, so a single `docker compose pull` of 3 images is 4-6 token requests in seconds and a legitimate member hit 429 on their first pull (found in BUNYIP-35 e2e A6).

Short-term fix from the issue: only credential failures are the credential-stuffing signal, so only they are tightly capped.

- New RateLimitConfig::OCI_TOKEN_FAILURES (5/min/email): credential-verification failures (unknown email, no password, wrong password) increment this counter via a new fail_credential() helper; the handler blocks at/over the cap (read-only check before verifying, so a request that may succeed never consumes the failure budget). Same threat model as the old login cap.
- New RateLimitConfig::OCI_TOKEN_THROUGHPUT (60/min/email): every request increments this, purely to bound Argon2 CPU (each verify ~100ms) so a flood of valid-credential requests cannot exhaust the server. Far above any real multi-image pull.
- Authorization failures after a successful password check (deleted user, no membership, no entitlement) are NOT counted toward the credential-failure cap.

Long-term fix (offline_token / refresh tokens in dunite-oci) remains open as the durable solution.

Verified in the rust-builder 1.94.1 container (clippy/fmt clean, 208 lib tests) and live: 10 consecutive successful single-image token requests all return 200 (previously 429 at the 6th), and 5 wrong-password requests return 401 then the 6th returns 429.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: nrupard <natrsmith11@gmail.com>
fix(oci): address PR #52 review - per-IP failure cap, dedup limit blocks
All checks were successful
Check / fmt / clippy / build / test (pull_request) Successful in 1m6s
Create release / Create release from merged PR (pull_request) Has been skipped
eb7f3996c3
- Add RateLimitConfig::OCI_TOKEN_IP_FAILURES (20/min/IP): credential failures now increment a per-source-IP counter as well as the per-email one, and the handler blocks on either. Closes the distributed-guessing gap the per-email cap alone left open (one host spraying a few guesses each across many accounts). Counts only failures, so legit users behind a shared NAT/gateway are unaffected. Skipped when no client IP is determinable.
- Extract failures_at_cap() (read-only >= comparison, fails closed) and too_many() (retry-after + audit + 429) helpers, removing the duplicated limit-exceeded blocks and pinning the deliberate >= (read) vs > (increment) distinction in one documented place so a future refactor cannot silently weaken the cap.
- fail_credential now logs (tracing::warn) when a counter increment fails instead of silently discarding the error, surfacing the documented degrade-open-under-DB-stress behaviour; it increments both the per-email and per-IP failure counters.
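
A rough sketch of how the per-IP cap and the two helpers described above might fit together. It is an illustration under stated assumptions, not the code in this commit: `FailureStore`, `blocked`, the key formats, and the `unavailable` flag (used to simulate a store error) are hypothetical, `eprintln!` stands in for `tracing::warn`, and the one-minute window is omitted for brevity. The cap values (5 per email, 20 per IP), the fail-closed read, and the warn-and-continue increment come from the commit message.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

pub const EMAIL_FAILURE_CAP: u32 = 5; // per email, from the commit message
pub const IP_FAILURE_CAP: u32 = 20; // per source IP, from the commit message

/// Hypothetical in-memory store; `unavailable` simulates a backend error.
#[derive(Default)]
pub struct FailureStore {
    counts: HashMap<String, u32>,
    pub unavailable: bool,
}

impl FailureStore {
    fn read(&self, key: &str) -> Result<u32, String> {
        if self.unavailable {
            return Err("store unavailable".to_string());
        }
        Ok(self.counts.get(key).copied().unwrap_or(0))
    }

    fn bump(&mut self, key: &str) -> Result<u32, String> {
        if self.unavailable {
            return Err("store unavailable".to_string());
        }
        let n = self.counts.entry(key.to_string()).or_insert(0);
        *n += 1;
        Ok(*n)
    }
}

/// Read-only comparison. Uses `>=` because the count has not been
/// incremented for the current request yet. Fails closed: if the store
/// cannot be read, the request is treated as being at the cap.
pub fn failures_at_cap(store: &FailureStore, key: &str, cap: u32) -> bool {
    match store.read(key) {
        Ok(n) => n >= cap,
        Err(_) => true,
    }
}

/// Block when either the per-email or the per-IP failure counter is at its cap.
/// The per-IP check is skipped when no client IP could be determined.
pub fn blocked(store: &FailureStore, email: &str, ip: Option<IpAddr>) -> bool {
    failures_at_cap(store, &format!("email:{email}"), EMAIL_FAILURE_CAP)
        || ip.map_or(false, |ip| {
            failures_at_cap(store, &format!("ip:{ip}"), IP_FAILURE_CAP)
        })
}

/// Record one credential failure against both counters. Increment errors
/// are logged rather than propagated, so the write path degrades open.
pub fn fail_credential(store: &mut FailureStore, email: &str, ip: Option<IpAddr>) {
    if let Err(e) = store.bump(&format!("email:{email}")) {
        eprintln!("warn: per-email failure counter not incremented: {e}");
    }
    if let Some(ip) = ip {
        if let Err(e) = store.bump(&format!("ip:{ip}")) {
            eprintln!("warn: per-IP failure counter not incremented: {e}");
        }
    }
}
```

With this shape, one host making a single wrong guess against each of 20 different accounts never trips a per-email cap, but its 21st attempt is blocked by the per-IP counter, which is the distributed-guessing gap the commit describes.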

Deferred (noted, not in this PR): hoisting failure-vs-success counting into a shared RateLimitRepository primitive that /v1/auth/login would also use - that changes login's rate-limit behaviour and warrants its own change. The long-term offline_token/refresh-token fix (dunite-oci) likewise remains the durable solution.

Verified in the rust-builder 1.94.1 container (clippy/fmt clean, 208 lib tests) and live: 10 consecutive valid-credential token requests all 200; 5 wrong-password requests 401 then the 6th 429; the per-IP failure counter is recorded alongside the per-email one.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: nrupard <natrsmith11@gmail.com>
nrupard deleted branch fix/bunyip-40-oci-token-rate-limit 2026-06-03 16:51:57 +02:00
Reference
psa-systems/bunyip!52