feat(tuner): build monkey-tuner binary for log-driven recipe tuning (MK-10) #42
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "feat/monkey-tuner-MK-10"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
MK-10: build monkey-tuner binary
Ships the second half of MK-5 Layer 3: a standalone
monkey-tunerbinary that consumes the feedback log written bymonkey image auto --record/monkey image rateand closes the training loop.monkeyitself is unchanged.What it does
monkey-tuner tune --log <runs.jsonl> [--class <name>] [--seed N] [--recipe-dir DIR]reads the NDJSON log, joins each run to its most recent rating by the input blake3 hash, groups the pairs by class, and for every class with at least 20 rated runs grid-searches a per-class recipe parameter delta over a seeded 80/20 split. It writesrecipes/<class>.next.tomlwith a provenance header (live-recipe sha, seed, sample sizes, training and held-out mean scores) and flags non-positive deltas with# REGRESSION. Classes below the minimum are skipped with a one-line message.monkey-tuner replay --log <runs.jsonl> [--class <name>]reports the current-recipe mean score per class and writes nothing.monkey-tuner promote <class> [--recipe-dir DIR] [--force]atomically renames<class>.next.tomlto<class>.tomlviarename(2), refuses when the proposal is missing, and refuses (unless--force) when the live recipe changed since the proposal was generated (its sha is pinned in the proposal header).Scoring model: deliberate deviation from the spec wording
MK-10 step 4 describes re-running the recipe against each input to compute a new
output_blake3and looking up its rating. The feedback log (src/image/feedback.rs) records only blake3 hashes, never input paths or bytes, so the original inputs cannot be replayed; and a never-before-produced output would have no human rating to look up regardless. This PR implements the equivalent that the log schema actually supports: each rated run is one(args, score)observation, and a candidate argument set is scored by inverse-distance-weighted regression over those observations in normalised argument space. Grid search picks the best training point and the held-out split validates that it generalises. The deviation is documented in the module header, the README, and the commit body. True image replay is a follow-up that first needs the log to carry input paths.Wiring
src/bin/monkey-tuner.rsis auto-discovered as a second binary (noCargo.tomlchange, no new dependency: the seeded split usesStdRngalready in the tree viarand).binaryexport stage both copymonkey-tunernext tomonkey; the build-binary workflow publishesmonkey-tuner-linux-x86_64to the generic package registry and Dufs alongsidemonkey-linux-x86_64.## Tuningsubsection right after## Workflow.Verification
just checkpasses locally: fmt,clippy --all-targets -D warnings, build, tests, and the builder-stage Docker compile (which produces both musl binaries; thebinaryexport stage was confirmed to emitmonkeyandmonkey-tuner).Closes MK-10.
🤖 Generated with Claude Code