feat(image): add classify subcommand (MK-6) #24

Merged

David merged 1 commit from feat/image-classify into main

2026-05-20 21:35:37 +02:00

David commented

2026-05-20 21:22:13 +02:00

Owner

Implements MK-6 (Layer 1 of the MK-5 training-loop design).

Summary

Adds monkey image classify <input> which prints <class>\t<confidence> to stdout. Classes: color-photo, mono-photo, binary-scan, halftone-scan, receipt, screenshot, unknown.
Six heuristic features: saturation mean/std, luma bimodality, Sobel edge density, high-passed multi-line autocorrelation, 4-bpc effective palette size, aspect ratio. Each class is scored with a soft logistic combination of the relevant features, and the argmax wins. Confidence is the margin to the runner-up; a top score below min_score yields unknown. No ML, no network calls.
Thresholds load from an embedded src/image/classify.toml, overridable via $XDG_CONFIG_HOME/monkey/classify.toml (or $HOME/.config/monkey/classify.toml). Tuning iterations need no rebuild.
New deps: toml = "0.8", serde = { version = "1", features = ["derive"] }. Both small and well-established; no model runtime, no FFT crate.

What this unblocks

monkey image auto recipe runner (future sub-issue of MK-5), which dispatches on the label this subcommand emits.
The feedback log in Layer 3 of MK-5 groups runs by class for tuning; that grouping is meaningless without a class label.
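
For downstream consumers such as the planned recipe runner, the `<class>\t<confidence>` line could be parsed roughly as follows. This is a minimal sketch, not code from this PR: `ImageClass`, `from_label`, and `parse_classify_line` are hypothetical names, and only the seven labels and the tab-separated layout come from the summary above.

```rust
/// The seven labels listed in the PR summary.
/// The enum itself is hypothetical and not taken from the PR's source.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ImageClass {
    ColorPhoto,
    MonoPhoto,
    BinaryScan,
    HalftoneScan,
    Receipt,
    Screenshot,
    Unknown,
}

impl ImageClass {
    /// Maps a label as printed on stdout to a variant; `None` for anything else.
    pub fn from_label(label: &str) -> Option<Self> {
        Some(match label {
            "color-photo" => Self::ColorPhoto,
            "mono-photo" => Self::MonoPhoto,
            "binary-scan" => Self::BinaryScan,
            "halftone-scan" => Self::HalftoneScan,
            "receipt" => Self::Receipt,
            "screenshot" => Self::Screenshot,
            "unknown" => Self::Unknown,
            _ => return None,
        })
    }
}

/// Parses one `<class>\t<confidence>` line, tolerating a trailing newline.
pub fn parse_classify_line(line: &str) -> Option<(ImageClass, f32)> {
    let (label, confidence) = line.trim_end().split_once('\t')?;
    let class = ImageClass::from_label(label)?;
    let confidence: f32 = confidence.trim().parse().ok()?;
    Some((class, confidence))
}
```

A runner would then `match` on the returned `ImageClass` to choose a recipe; how it treats `Unknown` or low confidence is left open here.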

Design notes worth flagging

Halftone detection uses a moving-average high-pass before per-line autocorrelation. Without it, smooth gradients (a feature of mono and color photos) register as highly autocorrelated at short lags and are misclassified as halftone. The high-pass collapses gradients to near-zero residual while preserving genuine periodicity.
The per-axis halftone score is the median of the autocorrelation peaks from eight evenly spaced rows (or columns), not the max. An isolated band of text in an otherwise blank receipt produces a strong autocorrelation on a single row; the median rejects it.
Binary vs receipt is separated by aspect ratio; halftone wins over binary by periodicity. Screenshot vs mono-photo is separated by edge density (sharp seams vs smooth gradient).
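
The first two notes can be illustrated with a small sketch. It assumes luma lines as `f32` slices and a centered moving-average window, and it takes the lag range 3..=10 from the commit message; the function names, the window radius, and the energy normalization are illustrative choices, not the PR's actual code.

```rust
/// High-pass: subtract a centered moving average of width `2 * radius + 1`.
/// Only positions where the full window fits are kept, so a linear gradient
/// collapses to an all-zero residual.
pub fn high_pass(line: &[f32], radius: usize) -> Vec<f32> {
    let window = 2 * radius + 1;
    if line.len() < window {
        return Vec::new();
    }
    line.windows(window)
        .map(|w| w[radius] - w.iter().sum::<f32>() / window as f32)
        .collect()
}

/// Peak autocorrelation over lags 3..=10, normalized by the zero-lag energy.
/// Returns 0.0 for a (near-)flat residual.
pub fn autocorr_peak(residual: &[f32]) -> f32 {
    let energy: f32 = residual.iter().map(|v| v * v).sum();
    if energy < 1e-6 {
        return 0.0;
    }
    (3..=10usize)
        .filter(|&lag| lag < residual.len())
        .map(|lag| {
            let dot: f32 = residual
                .iter()
                .zip(&residual[lag..])
                .map(|(a, b)| a * b)
                .sum();
            dot / energy
        })
        .fold(0.0, f32::max)
}

/// Median of a list of values (0.0 for an empty list).
pub fn median(mut values: Vec<f32>) -> f32 {
    if values.is_empty() {
        return 0.0;
    }
    values.sort_by(|a, b| a.total_cmp(b));
    let mid = values.len() / 2;
    if values.len() % 2 == 0 {
        (values[mid - 1] + values[mid]) / 2.0
    } else {
        values[mid]
    }
}

/// Per-axis score: median of the per-line peaks, so one periodic line
/// among otherwise flat lines does not dominate.
pub fn halftone_score(lines: &[Vec<f32>], radius: usize) -> f32 {
    median(
        lines
            .iter()
            .map(|line| autocorr_peak(&high_pass(line, radius)))
            .collect(),
    )
}
```

With seven gradient lines and a single period-6 line, the median stays near zero where a max would report the periodic line; eight period-6 lines score high on either statistic.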

Test plan

[x] cargo test (62 tests pass, 10 new in image::classify::tests).
[x] just check (fmt + clippy -D warnings + cargo build + Docker builder-stage compile).
[x] One synthetic fixture per class is generated in-test (no binary blobs committed).
[ ] Run against a real corpus of mixed inputs in a follow-up to tune the embedded thresholds before Layer 2 ships.

David added 1 commit

2026-05-20 21:22:14 +02:00

feat(image): add classify subcommand for heuristic image typing

Check / fmt + clippy + build + tests (pull_request) Successful in 17s

Create release / Create release from merged PR (pull_request) Has been skipped

c46b7b5973

Adds `monkey image classify <input>` which prints `<class>\t<confidence>` to stdout. Classes covered: color-photo, mono-photo, binary-scan, halftone-scan, receipt, screenshot, unknown. Confidence is the margin between the top class and the runner-up; if the top score falls below `min_score`, the result is `unknown`.
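
The selection rule described above (argmax, margin as confidence, `min_score` fallback) might look like the following sketch. The function names, the `steepness` parameter, and the choice to report a confidence of 0.0 for `unknown` are assumptions made for illustration; the commit does not specify them.

```rust
/// Soft threshold: about 0 well below `threshold`, about 1 well above it.
/// `steepness` is a hypothetical tuning parameter.
pub fn soft_above(value: f32, threshold: f32, steepness: f32) -> f32 {
    1.0 / (1.0 + (-(value - threshold) * steepness).exp())
}

/// Picks the highest-scoring class and reports the margin to the runner-up.
/// Falls back to ("unknown", 0.0) when the best score is below `min_score`
/// (the 0.0 confidence for that case is an assumption of this sketch).
pub fn pick_class<'a>(scores: &[(&'a str, f32)], min_score: f32) -> (&'a str, f32) {
    let mut top: (&'a str, f32) = ("unknown", f32::NEG_INFINITY);
    let mut runner_up = f32::NEG_INFINITY;
    for &(label, score) in scores {
        if score > top.1 {
            runner_up = top.1;
            top = (label, score);
        } else if score > runner_up {
            runner_up = score;
        }
    }
    if scores.is_empty() || top.1 < min_score {
        return ("unknown", 0.0);
    }
    // With a single candidate there is no runner-up; use the score itself.
    let margin = if runner_up.is_finite() { top.1 - runner_up } else { top.1 };
    (top.0, margin.clamp(0.0, 1.0))
}
```

Per-class scores would be built by combining several `soft_above`-style terms; the exact combination lives in the PR's source and is not reproduced here.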

The classifier computes six cheap deterministic features over the input image (saturation mean/std, luma bimodality of the 64-bin histogram, Sobel edge density, multi-line autocorrelation peak at lags 3..=10 after a moving-average high-pass to suppress smooth gradients, 4-bit-per-channel effective palette size from a 64x64 downsample, and aspect ratio). It then scores each class with a soft logistic combination of the relevant features and picks the argmax. Heuristics-only by design: no ML, no network calls, no temp files.
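
As an illustration of one of these features, a 4-bit-per-channel palette count could be computed as below. This sketch simply counts distinct quantized colors in an RGB8 pixel buffer; the function name is hypothetical, the 64x64 downsample is assumed to have happened already, and the PR's notion of "effective" size may weight colors differently.

```rust
/// Counts distinct colors after quantizing each RGB channel to 4 bits,
/// giving at most 16 * 16 * 16 = 4096 possible keys.
pub fn palette_size_4bpc(pixels: &[[u8; 3]]) -> usize {
    let mut seen = [false; 4096];
    let mut count = 0;
    for &[r, g, b] in pixels {
        // Keep the top 4 bits of each channel and pack them into a 12-bit key.
        let key = (((r >> 4) as usize) << 8) | (((g >> 4) as usize) << 4) | ((b >> 4) as usize);
        if !seen[key] {
            seen[key] = true;
            count += 1;
        }
    }
    count
}
```

A binary scan or flat-colored screenshot would yield a small count under this measure, while a color photo would typically yield a much larger one.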

Thresholds load from an embedded `src/image/classify.toml` via `include_str!`, overridable by writing the same schema to `$XDG_CONFIG_HOME/monkey/classify.toml` (or `$HOME/.config/monkey/classify.toml`). The override path is read once per invocation so tuning iterations need no rebuild.
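
The override lookup can be sketched as a pure function over the two environment values. The function name is hypothetical, and treating an empty `XDG_CONFIG_HOME` as unset follows the XDG convention rather than anything stated in this commit.

```rust
use std::path::PathBuf;

/// Resolves the override file location from the two environment values.
/// Prefers `$XDG_CONFIG_HOME`, falls back to `$HOME/.config`, and returns
/// `None` when neither is usable (the embedded defaults would then apply).
pub fn override_path(xdg_config_home: Option<&str>, home: Option<&str>) -> Option<PathBuf> {
    let base = match xdg_config_home.filter(|s| !s.is_empty()) {
        Some(xdg) => PathBuf::from(xdg),
        None => PathBuf::from(home.filter(|s| !s.is_empty())?).join(".config"),
    };
    Some(base.join("monkey").join("classify.toml"))
}
```

A caller would pass `std::env::var("XDG_CONFIG_HOME").ok().as_deref()` and `std::env::var("HOME").ok().as_deref()`, then fall back to the embedded TOML when the resolved file does not exist.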

This is Layer 1 of the MK-5 training-loop design (parent: pandoras-box/monkey#MK-5). The output is the input that the future `monkey image auto` recipe runner will dispatch on, and the label that the future feedback log will group by. Subsequent issues will cover Layer 2 (recipe runner) and Layer 3 (NDJSON feedback log + external tuner).

Adds `toml` and `serde` (with `derive`) as new dependencies. Both are small and well-established; no model runtime, no FFT crate.

Ten unit tests cover one synthetic fixture per class (color gradient, gray gradient, binary text, period-6 halftone dots, narrow-aspect text receipt, banded screenshot with seams) plus the degenerate cases (solid colour, checkerboard, confidence-in-unit-interval). The synthetic fixtures forced two real improvements to the detector: a moving-average high-pass on the autocorrelation line so smooth gradients do not register as periodic, and median-of-eight rows/columns so an isolated text line in an otherwise blank scan does not look like a halftone.

#MK-6 State Done

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>