feat(tuner): carry input_path, replace IDW with replay-then-rate validation (MK-25) #43
What
Implements MK-25: the feedback log now carries `input_path` on every `run` line, and `monkey-tuner` replaces the inverse-distance-weighted (IDW) score guess with a real replay-then-rate validation pass.

Follow-up to MK-10 (which shipped the tuner with IDW scoring and flagged this exact gap in its module header). Parent: MK-5.
Changes
- `monkey image auto --record` writes `input_path` (the path the user passed, unnormalised) on each `run` record. The rating record is unchanged.
- Adds a `src/lib.rs` library target so the `monkey-tuner` binary can call the recipe runner in-process instead of shelling out. `src/main.rs` becomes a thin dispatcher over `monkey::*`. New `recipe::replay(input, output, steps)` applies an explicit recipe with the exact per-op dispatch `auto` uses, so a replayed output is byte-identical to a real run. No new dependency.
- `monkey-tuner tune`: keeps the seeded 80/20 split + IDW grid search to rank candidates, then replays the top candidate (K=1 MVP) on the training inputs. It writes proposed outputs under `$XDG_CACHE_HOME/monkey/tuner/<class>/<timestamp>/` and current-recipe outputs into a sibling `current/` dir for A/B review, appends a `run` record per replayed output (carrying the proposed `recipe_sha` + original `input_path`), prints the `monkey image rate` commands, and writes `<class>.next.toml` with the IDW-predicted delta plus the replay-output-dir.
- Legacy records without `input_path` are skipped with a rate-limited warning (first 5 named, then a summary count). Clean break, no mixed-mode fallback.
- `monkey-tuner promote <class> --log <log>` is the validation gate: it re-reads the log, computes the measured proposed-vs-current delta over inputs rated under both recipe shas (matched by `(input_path, recipe_sha)`), refuses a non-positive delta unless `--force`, and stamps the measured delta onto the promoted recipe's leading comment. Without `--log` it falls back to the prior sha-pin-only behaviour.
- `monkey-tuner validate --class <name> --log <log>` reports the measured delta without writing.
- `## Workflow` gains the rate-the-replayed-outputs round-trip; `## Tuning` is rewritten for replay-then-rate.

Implementation notes / assumptions
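For reviewers, the `promote` gate described under Changes can be sketched roughly as below. This is a minimal illustration, not the code in this PR: the `Rated` struct, its `score` field, the `measured_delta` and `gate` functions, and the `Gate` enum are all hypothetical names, and it assumes the delta is the mean paired difference, that a later rating of the same `(input_path, recipe_sha)` pair replaces an earlier one, and that an empty paired set is reported separately.

```rust
use std::collections::BTreeMap;

/// One rated output, reduced to the fields the join needs.
/// `score` is an assumed name; only `input_path` and `recipe_sha` come from the PR text.
#[derive(Debug, Clone)]
pub struct Rated {
    pub input_path: String,
    pub recipe_sha: String,
    pub score: f64,
}

/// Mean (proposed - current) score over inputs rated under both shas.
/// Returns `None` when no input has been rated under both.
pub fn measured_delta(records: &[Rated], current_sha: &str, proposed_sha: &str) -> Option<f64> {
    // Key by (input_path, recipe_sha); a later rating overwrites an earlier one (assumption).
    let mut by_key: BTreeMap<(&str, &str), f64> = BTreeMap::new();
    for r in records {
        by_key.insert((r.input_path.as_str(), r.recipe_sha.as_str()), r.score);
    }

    let mut sum = 0.0;
    let mut pairs = 0usize;
    for (&(path, sha), &proposed) in &by_key {
        if sha != proposed_sha {
            continue;
        }
        // Only inputs that also carry a rating under the current recipe count.
        if let Some(&current) = by_key.get(&(path, current_sha)) {
            sum += proposed - current;
            pairs += 1;
        }
    }
    if pairs == 0 {
        None
    } else {
        Some(sum / pairs as f64)
    }
}

/// Outcome of the gate; the variant names are illustrative.
#[derive(Debug, PartialEq)]
pub enum Gate {
    Promote(f64),
    Refuse(f64),
    NoPairs,
}

/// Refuse a non-positive delta unless `force` is set.
pub fn gate(delta: Option<f64>, force: bool) -> Gate {
    match delta {
        None => Gate::NoPairs,
        Some(d) if d > 0.0 || force => Gate::Promote(d),
        Some(d) => Gate::Refuse(d),
    }
}
```

Unpaired ratings (an input rated under only one sha) drop out of the mean, which is why the gate needs no seed at promote time in this sketch.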
The local corpus tree is gitignored and never reaches CI, so (per the issue) this PR provides the schema change and the replay scaffolding; the maintainer validates the empirical end-to-end loop locally. Two decisions were made where the spec left room, both documented in code:
- The measured delta is computed over the `input_path`s rated under both the proposed and current recipe shas (the paired re-rated set), rather than a separate seeded held-out split. This realises "matched by input_path and recipe_sha" deterministically and needs no seed at promote time.
- `tune` appends a `run` record for each replayed proposed output (`input` = output hash, `input_path` = original path, `recipe_sha` = proposed). This is what lets a later `monkey image rate <output>` join back to the proposal so `promote`/`validate` can measure it.

The shortlist is fixed at K=1 (no `--shortlist` flag) per the MVP scope.

Testing
`just check` passes locally: fmt, clippy `--all-targets -D warnings`, build `--all-targets`, tests, and the builder-stage Docker (musl) compile.

New unit tests:

- `input_path` NDJSON round-trip
- legacy-record skipping with rate-limited warning
- missing-path replay branch + >50% abort threshold
- measured-delta join (paired and unpaired)
- `promote` non-positive-delta refusal

Closes MK-25.
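As a reviewer aid, the rate-limited warning for records without `input_path` (first 5 named, then a summary count) might look something like the sketch below. The `SkipWarnings` type, the `NAMED_WARNINGS` constant, the record identifiers, and the message wording are all hypothetical; the sketch also assumes the summary counts only the records beyond the first five, and it returns the lines instead of printing them so that a unit test can inspect them.

```rust
/// How many skipped records are named individually before switching to a count.
/// The value 5 comes from the PR text; the constant name is illustrative.
const NAMED_WARNINGS: usize = 5;

/// Collects warnings for records skipped because they carry no `input_path`.
#[derive(Debug, Default)]
pub struct SkipWarnings {
    skipped: usize,
    lines: Vec<String>,
}

impl SkipWarnings {
    /// Register one skipped record; only the first few get their own line.
    pub fn skip(&mut self, record_id: &str) {
        self.skipped += 1;
        if self.skipped <= NAMED_WARNINGS {
            self.lines
                .push(format!("warning: skipping record {record_id}: no input_path"));
        }
    }

    /// Finish the pass and append a summary line if any records went unnamed.
    pub fn finish(mut self) -> Vec<String> {
        if self.skipped > NAMED_WARNINGS {
            let rest = self.skipped - NAMED_WARNINGS;
            self.lines
                .push(format!("warning: {rest} more record(s) skipped: no input_path"));
        }
        self.lines
    }
}
```

A caller would create one `SkipWarnings` per log pass, call `skip` for each legacy record, and write the result of `finish` to stderr.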
🤖 Generated with Claude Code