fix(oci): blob cache total_size_bytes decode error broke LRU eviction (BUNYIP-41) #53
Closes BUNYIP-41. The api logged `oci blob cache eviction failed` once per blob during every `docker pull`, and the byte cap was never enforced while pulls were active.

Actual root cause (the issue's RCA was wrong)
`OciBlobCacheRepository::total_size_bytes` ran `SELECT SUM(size_bytes)` and decoded the result as `Option<i64>`. Postgres returns `SUM(bigint)` as NUMERIC (to avoid overflow), not `int8`, so the row decode failed on every non-empty table.

dunite-oci's blob cache calls `total_size_bytes` at the start of its LRU eviction pass, so eviction failed on every blob store. The failure is client-side (decoding the result), which is why Postgres logged nothing.

The issue hypothesized PgPool exhaustion / `PoolTimedOut`. That was a misdiagnosis: the failures clustered in well under the acquire timeout, the dev DB had ample connection headroom (16/100), and the query reached Postgres fine. I instrumented the raw `sqlx::Error` (the generic `AppError` mapping hid it) and it was a `ColumnDecode`, not a pool error. No pool-sizing change is needed.

Fix
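For reviewers who want the type mismatch spelled out, here is a minimal, database-free sketch of why decoding `SUM(bigint)` into an `i64` fails on non-empty tables and why casting the aggregate to `BIGINT` avoids it. Everything in it (`WireValue`, `DecodeError`, `decode_opt_i64`, `decode_i64`, `sum_bigint`, `sum_bigint_cast`) is a hypothetical toy model written for illustration. It is not sqlx's API and not the repository's actual code; it only assumes that the client refuses to decode a NUMERIC column into an 8-byte integer and lets a NULL through as `None`.

```rust
/// Toy stand-in for a column value tagged with the SQL type the server reported.
#[derive(Debug, Clone, PartialEq)]
enum WireValue {
    Int8(i64),
    Numeric(String), // arbitrary precision, carried as text in this model
    Null,
}

/// Toy stand-in for a client-side decode failure.
#[derive(Debug, PartialEq)]
enum DecodeError {
    TypeMismatch { expected: &'static str, found: &'static str },
    UnexpectedNull,
}

/// Strict decode into Option<i64>: NULL becomes None, NUMERIC is rejected.
fn decode_opt_i64(v: &WireValue) -> Result<Option<i64>, DecodeError> {
    match v {
        WireValue::Null => Ok(None),
        WireValue::Int8(n) => Ok(Some(*n)),
        WireValue::Numeric(_) => Err(DecodeError::TypeMismatch {
            expected: "INT8",
            found: "NUMERIC",
        }),
    }
}

/// Strict decode into a non-null i64.
fn decode_i64(v: &WireValue) -> Result<i64, DecodeError> {
    decode_opt_i64(v)?.ok_or(DecodeError::UnexpectedNull)
}

/// Models `SELECT SUM(size_bytes)`: NULL for an empty table, NUMERIC otherwise.
fn sum_bigint(rows: &[i64]) -> WireValue {
    if rows.is_empty() {
        return WireValue::Null;
    }
    // Accumulate in a wider type, as the server does to avoid overflow.
    let total: i128 = rows.iter().map(|&r| r as i128).sum();
    WireValue::Numeric(total.to_string())
}

/// Models `SELECT COALESCE(SUM(size_bytes), 0)::BIGINT`.
/// The real cast raises an out-of-range error; this toy simply panics.
fn sum_bigint_cast(rows: &[i64]) -> WireValue {
    let total: i128 = rows.iter().map(|&r| r as i128).sum(); // empty sum is 0
    WireValue::Int8(i64::try_from(total).expect("sum fits in BIGINT"))
}

fn main() {
    let rows = [10_i64, 20, 12];
    // Old query: fine on an empty table, decode error once any row exists.
    println!("{:?}", decode_opt_i64(&sum_bigint(&[])));
    println!("{:?}", decode_opt_i64(&sum_bigint(&rows)));
    // Cast query: always an INT8, never NULL.
    println!("{:?}", decode_i64(&sum_bigint_cast(&rows)));
}
```

The model also shows why the bug stayed hidden on an empty cache: the NULL case decodes cleanly, so the error only appears once at least one blob row exists.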
One line: `SELECT COALESCE(SUM(size_bytes), 0)::BIGINT`, decoded as a non-null `i64`. Plus a DB-backed regression test (`total_size_bytes_sums_without_decode_error`).

Verification
rust-builder 1.94.1 container: clippy `--workspace --all-targets -D warnings` clean, fmt clean, 209 lib tests pass.

Live dev stack: a cold-cache `docker pull` (`just verify-oci`) that previously logged 7-8 eviction failures now logs zero, and the cache bookkeeping populates correctly (`total_size_bytes` returns the summed bytes, e.g. 8 rows / 40922561 bytes).

Filed dunite follow-ups (structural / observability, separate repo)
🤖 Generated with Claude Code