perf(oci): single debounced background evictor instead of one spawned task per blob (PSA-39) #7

Merged
nrupard merged 1 commit from feat/psa-39-single-debounced-evictor into main 2026-06-03 20:00:46 +02:00
Owner

Problem

dunite-oci's blob cache spawned a detached LRU eviction task on every blob store (tokio::spawn(evict_if_over_cap)). Under a docker pull (one store per blob, several blobs concurrent) this fired N eviction tasks in seconds, each needing its own DB connection. Against a bounded pool the tasks lost the acquire race and failed with PoolTimedOut, so the byte cap stopped being enforced while pulls were active (root-caused in bunyip BUNYIP-41; bunyip mitigated consumer-side with a dedicated pool, but the per-blob spawn pattern is the structural cause and belongs here).

Change

One long-lived background evictor task, spawned by BlobCache::new. It owns at most one connection and is poked through a capacity-1 mpsc channel: a burst of concurrent stores collapses to one pending notification (extra pokes dropped via try_send), so the cap is enforced with one eviction pass per burst rather than one task (one connection) per blob. The task holds only the eviction essentials (store, cache_dir, max_bytes), never a sender, so it exits cleanly once every BlobCache clone drops and the channel closes. The eviction loop moved to a free evict_over_cap fn shared by the task and the public evict_if_over_cap method (kept for blocking/startup passes).

new must now run inside a Tokio runtime (it tokio::spawns the evictor); documented on new and the module.
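
For readers who want the shape of the pattern without the crate in front of them, here is a minimal sketch of a capacity-1 "poke" channel feeding one long-lived evictor. It is not the dunite-oci code: it swaps Tokio for `std::thread` and `std::sync::mpsc::sync_channel` so it runs without dependencies, and `Shared`, `Cache`, `store`, `run_burst`, the in-memory size queue, and the pass counter are all hypothetical stand-ins. Only the names `evict_over_cap` and `max_bytes` come from the description above.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{sync_channel, SyncSender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for the store: blob sizes, oldest first.
struct Shared {
    blobs: Mutex<VecDeque<u64>>,
    max_bytes: u64,
    passes: AtomicUsize, // counts eviction passes, for observation only
}

/// One eviction pass: drop the oldest blobs until the total fits the cap.
fn evict_over_cap(shared: &Shared) {
    shared.passes.fetch_add(1, Ordering::SeqCst);
    let mut blobs = shared.blobs.lock().unwrap();
    let mut total: u64 = blobs.iter().sum();
    while total > shared.max_bytes {
        match blobs.pop_front() {
            Some(size) => total -= size,
            None => break,
        }
    }
}

#[derive(Clone)]
struct Cache {
    shared: Arc<Shared>,
    poke: SyncSender<()>,
}

impl Cache {
    fn new(max_bytes: u64) -> (Cache, JoinHandle<()>) {
        let shared = Arc::new(Shared {
            blobs: Mutex::new(VecDeque::new()),
            max_bytes,
            passes: AtomicUsize::new(0),
        });
        // Capacity 1: at most one notification can be pending.
        let (poke, pokes) = sync_channel::<()>(1);
        // The evictor holds the shared state but never a sender, so recv()
        // fails (and the loop ends) once every Cache clone has been dropped.
        let state = Arc::clone(&shared);
        let evictor = thread::spawn(move || {
            while pokes.recv().is_ok() {
                evict_over_cap(&state);
            }
        });
        (Cache { shared, poke }, evictor)
    }

    fn store(&self, size: u64) {
        // Commit first, then poke. A full channel means a poke is already
        // queued, so dropping this one loses nothing.
        self.shared.blobs.lock().unwrap().push_back(size);
        let _ = self.poke.try_send(());
    }
}

/// Fires `n` concurrent stores, then returns (final total bytes, passes).
fn run_burst(n: usize, blob_bytes: u64, max_bytes: u64) -> (u64, usize) {
    let (cache, evictor) = Cache::new(max_bytes);
    let writers: Vec<_> = (0..n)
        .map(|_| {
            let cache = cache.clone();
            thread::spawn(move || cache.store(blob_bytes))
        })
        .collect();
    for writer in writers {
        writer.join().unwrap();
    }
    let shared = Arc::clone(&cache.shared);
    drop(cache); // last sender gone: the evictor drains any queued poke and exits
    evictor.join().unwrap();
    let total: u64 = shared.blobs.lock().unwrap().iter().sum();
    (total, shared.passes.load(Ordering::SeqCst))
}
```

Calling `run_burst(8, 100, 250)` (the 100-byte blob size is an arbitrary choice) should leave the total at or under 250 after somewhere between 1 and 8 passes, depending on how the threads interleave. The real evictor is a Tokio task and talks to a database rather than a `Mutex`, so treat this purely as an illustration of the coalescing and shutdown behaviour.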

Why convergence still holds

A poke is dropped only while another poke is already queued, and that queued poke is consumed only after the current store has committed, so at least one eviction pass always observes the full post-burst total. Totals only grow until eviction runs, so the final pass drives the cache under the cap.
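
The worst case for this argument is a burst in which the evictor never gets to run until every store has finished. A small single-threaded sketch of that schedule, under assumptions: it uses `std::sync::mpsc::sync_channel` in place of the Tokio channel, `worst_case_burst` is a hypothetical helper, and the 100-byte blob size in the usage note is arbitrary.

```rust
use std::collections::VecDeque;
use std::sync::mpsc::{sync_channel, TrySendError};

/// Returns (final total bytes, dropped pokes, eviction passes).
fn worst_case_burst(blob_sizes: &[u64], max_bytes: u64) -> (u64, usize, usize) {
    let (poke_tx, poke_rx) = sync_channel::<()>(1);
    let mut blobs: VecDeque<u64> = VecDeque::new(); // oldest first
    let mut dropped = 0;

    // The burst: each store commits, then pokes. The evictor is starved.
    for &size in blob_sizes {
        blobs.push_back(size);
        match poke_tx.try_send(()) {
            Ok(()) => {}
            // Full: a poke is already queued and will be consumed later,
            // after this store has committed.
            Err(TrySendError::Full(())) => dropped += 1,
            Err(TrySendError::Disconnected(())) => unreachable!("receiver is alive"),
        }
    }

    // The evictor finally runs: one pass per queued poke.
    let mut passes = 0;
    while poke_rx.try_recv().is_ok() {
        passes += 1;
        let mut total: u64 = blobs.iter().sum();
        while total > max_bytes {
            match blobs.pop_front() {
                Some(size) => total -= size,
                None => break,
            }
        }
    }
    (blobs.iter().sum(), dropped, passes)
}
```

With eight 100-byte blobs and a cap of 250, `worst_case_burst(&[100; 8], 250)` returns `(200, 7, 1)`: seven pokes are dropped, yet the single surviving poke is consumed after all eight stores and its pass sees the full 800 bytes.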

Acceptance criteria

  • A burst of concurrent blob stores triggers at most one concurrent eviction pass using a single connection (structural: one sequential task).
  • No PoolTimedOut from eviction under a multi-blob pull (one connection holder instead of N).
  • Eviction still converges to the byte cap, with a regression test for the burst case.

Tests

burst_of_concurrent_stores_coalesces_into_few_eviction_passes_and_converges: fires N=8 concurrent stores over a cap of 250 and asserts both that the cache converges under the cap and that the number of eviction passes (counted via the test store's total_size_bytes calls) is well under N. fmt + clippy -D warnings clean; full cargo test -p dunite-oci green; burst test verified non-flaky across 20 runs.

Follow-up (not in this PR)

dunite-download's download_cache has the same per-store tokio::spawn(evict_if_over_cap) pattern. Not exercised by a fan-out pull the way OCI is, but worth the same treatment for cross-vertical consistency, ideally as a shared dunite-core debounced-evictor primitive. Suggest a separate issue.

#PSA-39

perf(oci): single debounced background evictor instead of one spawned task per blob (PSA-39)
All checks were successful
Checks / fmt + clippy + test (pull_request) Successful in 12s
create-release / create-release (pull_request) Has been skipped
3bdfa1d468
dunite-oci's blob cache spawned a detached LRU eviction task on every blob store (`tokio::spawn(evict_if_over_cap)`). Under a `docker pull` (one store per blob, several blobs concurrent) this fired N eviction tasks in seconds, each needing its own DB connection. Against a bounded pool the tasks lost the acquire race and failed with PoolTimedOut, so the byte cap stopped being enforced while pulls were active (root-caused in bunyip BUNYIP-41).

Replace the per-store spawn with one long-lived background evictor task, spawned by `BlobCache::new`. It owns at most one connection and is poked through a capacity-1 `mpsc` channel: a burst of concurrent stores collapses to one pending notification (extra pokes are dropped via `try_send`), so the cap is enforced with one eviction pass per burst instead of one task (one connection) per blob. The task holds only the eviction essentials (store, cache_dir, max_bytes), never a sender, so it exits cleanly when every `BlobCache` clone is dropped and the channel closes. The eviction loop moves to a free `evict_over_cap` fn shared by the task and the public `evict_if_over_cap` method (kept for blocking/startup passes); `new` must now run inside a Tokio runtime.

A poke is dropped only while another poke is already queued, and that queued poke is consumed after the current store committed, so at least one eviction pass always observes the full post-burst total: convergence to the cap is preserved. Regression test `burst_of_concurrent_stores_coalesces_into_few_eviction_passes_and_converges` fires N=8 concurrent stores over a cap of 250 and asserts both that the cache converges under the cap and that the number of eviction passes (counted via the test store's `total_size_bytes` calls) is well under N. Verified non-flaky across 20 runs.

#PSA-39

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
nrupard deleted branch feat/psa-39-single-debounced-evictor 2026-06-03 20:00:47 +02:00
Reference
psa-systems/dunite!7