feat(image): add NDJSON feedback log (MK-5 Layer 3, feedback half) #27

Merged
nrupard merged 3 commits from feat/image-feedback-log-MK-5 into main 2026-05-21 17:34:33 +02:00
Member

What

Feedback-log half of MK-5 Layer 3. The external tuner is deferred to its own follow-up issue/PR per the umbrella plan (it only earns its keep once dozens of rated runs exist).

  • monkey image auto --record runs.jsonl appends one NDJSON line per run: ts_unix_ms, input blake3, class, recipe sha, steps, output blake3, duration_ms.
  • monkey image rate <input> --score 1..5 [--notes "..."] --record runs.jsonl hashes the input and appends a verdict keyed by the same blake3, so a rating joins its run.

Design choices (confirmed before building)

  • Scope: feedback log only; tuner is a separate follow-up.
  • Rate command lives under image (monkey image rate), consistent with classify/auto.
  • --record requires an explicit path. No default, no XDG_DATA writes, no hidden state.

Notes

  • The log is append-only NDJSON: grep-able, git-able, and a partial write loses at most the trailing line.
  • Recording is unified across the filtered and the byte-copy passthrough paths, so an unknown passthrough records an output hash equal to its input hash.
  • recipe_sha is the blake3 of the recipe source text, so the exact recipe version (including a --recipe-dir override) is captured. Recipe::load now returns that source; recipe::Step is pub + Serialize.
  • New deps: blake3, serde_json.

Testing

just pre-commit green (fmt, clippy -D warnings, build, 72 tests). New unit tests cover hash stability, score-range rejection, and one-NDJSON-line-per-record with notes omitted when absent. Manually verified in the builder image: auto --record then rate produce two joined records; an unknown passthrough records output == input; --score 9 errors with exit 1.

Closes MK-5 Layer 3 (feedback log). Remaining: the tuner + recipe promote (separate follow-up).

🤖 Generated with Claude Code

## What Feedback-log half of **MK-5 Layer 3**. The external tuner is deferred to its own follow-up issue/PR per the umbrella plan (it only earns its keep once dozens of rated runs exist). - `monkey image auto --record runs.jsonl` appends one NDJSON line per run: `ts_unix_ms`, input blake3, class, recipe sha, steps, output blake3, `duration_ms`. - `monkey image rate <input> --score 1..5 [--notes "..."] --record runs.jsonl` hashes the input and appends a verdict keyed by the same blake3, so a rating joins its run. ## Design choices (confirmed before building) - **Scope:** feedback log only; tuner is a separate follow-up. - **Rate command** lives under `image` (`monkey image rate`), consistent with `classify`/`auto`. - **`--record`** requires an explicit path. No default, no `XDG_DATA` writes, no hidden state. ## Notes - The log is append-only NDJSON: grep-able, git-able, and a partial write loses at most the trailing line. - Recording is unified across the filtered and the byte-copy passthrough paths, so an `unknown` passthrough records an output hash equal to its input hash. - `recipe_sha` is the blake3 of the recipe source text, so the exact recipe version (including a `--recipe-dir` override) is captured. `Recipe::load` now returns that source; `recipe::Step` is `pub` + `Serialize`. - New deps: `blake3`, `serde_json`. ## Testing `just pre-commit` green (fmt, clippy `-D warnings`, build, 72 tests). New unit tests cover hash stability, score-range rejection, and one-NDJSON-line-per-record with `notes` omitted when absent. Manually verified in the builder image: `auto --record` then `rate` produce two joined records; an `unknown` passthrough records `output == input`; `--score 9` errors with exit 1. Closes MK-5 Layer 3 (feedback log). Remaining: the tuner + `recipe promote` (separate follow-up). 🤖 Generated with [Claude Code](https://claude.com/claude-code)
feat(image): add NDJSON feedback log (MK-5 Layer 3, feedback half)
All checks were successful
Check / fmt + clippy + build + tests (pull_request) Successful in 19s
7d468d5da2
Add the feedback-log half of Layer 3. The external tuner is deferred to its own follow-up, since per the design it only earns its keep once dozens of rated runs exist.

`monkey image auto --record runs.jsonl` appends one NDJSON line per run: timestamp, input blake3, resolved class, recipe sha (blake3 of the recipe source), the applied steps, output blake3, and duration. `--record` takes an explicit path; there is no default location and no hidden state. Recording is unified across both the filtered and the byte-copy passthrough paths, so an `unknown` passthrough logs an output hash equal to its input hash.

`monkey image rate <input> --score 1..5 [--notes ...] --record runs.jsonl` hashes the input and appends a verdict keyed by that same blake3, so ratings join their runs. Out-of-range scores are rejected.

The log is append-only newline-delimited JSON: grep-able, git-able, and a partial write loses at most the trailing line rather than corrupting the file. New deps: `blake3` (hashing) and `serde_json` (NDJSON). `recipe::Step` is now public and `Serialize`, and `Recipe::load` returns the recipe source text so the exact recipe version can be hashed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
fix(image): hash canonical recipe steps for recipe_sha; honest durability doc
All checks were successful
Check / fmt + clippy + build + tests (pull_request) Successful in 19s
ec91012f35
Address review of the feedback log:

`recipe_sha` now hashes the recipe's parsed, serialized steps (`Recipe::sha`) instead of the raw source text. Two recipes that differ only in comments, whitespace, or key order now produce the same sha. This matters because MK-10's tuner groups runs by class + recipe_sha; hashing source text would have let a cosmetic edit fragment a class's rated samples across two shas. `Recipe::load` reverts to returning just the recipe (the source-text plumbing is gone).

Correct the durability claim. The log doc and README previously implied a partial write "loses at most the trailing line". A single `O_APPEND` `write_all` is not crash-atomic: a process killed mid-write can leave a partial line that the next append concatenates onto. The docs now say durability is best-effort and that readers must skip a malformed trailing line rather than assume every line parses.

Adds a test that two source-formatting variants of the same steps hash equal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
fix(image): set run kind internally, warn on orphan rating, stream hash, self-cleaning test tmpfiles
All checks were successful
Check / fmt + clippy + build + tests (pull_request) Successful in 17s
Create release / Create release from merged PR (pull_request) Has been skipped
6faff5b0c1
Clean up the remaining review items on the feedback log:

`RunRecord` fields are now private and built via `RunRecord::new`, which stamps `kind = "run"` internally. A caller can no longer set the wrong discriminator, mirroring how `RatingRecord` is constructed.

`monkey image rate` now warns (on stderr, without failing) when the log holds no `run` record matching the input hash, so a rating for an unrecorded or mis-pathed input is flagged rather than silently orphaned. The lookup skips malformed lines, matching the log's documented skip-bad-trailing-line contract.

`hash_file` streams the file through `blake3::Hasher` via `io::copy` instead of reading it fully into memory, so a large input (or a future video record) is not buffered.

Test temp files now use a self-deleting `TmpFile` drop guard, so a panicking assertion no longer leaves stale files in the tempdir. Adds tests for streamed hashing and the run-lookup helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
nrupard deleted branch feat/image-feedback-log-MK-5 2026-05-21 17:34:33 +02:00
Sign in to join this conversation.
No reviewers
No labels
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
pandoras-box/monkey!27
No description provided.