fix(classify): tune panel_density thresholds against form-scan corpus (partial, MK-20) #40

Merged

David merged 1 commit from feat/classify-panel-density-tune-MK-20 into main

2026-05-24 21:29:07 +02:00

David commented

2026-05-24 21:19:50 +02:00

Owner

Implements MK-20 (partial). Filing follow-up MK for the scoring change that unblocks the remaining 6 fixtures.

Summary

MK-16 shipped panel_density defaults that recognised only 1 of 11 form-scan fixtures in the local corpus. Empirical probe via --probe measured the real distribution; this PR moves the defaults to:

panel_luma_lo: 0.20 -> 0.50 (excludes binary-scan dense-text fusion at component mean_luma 0.32-0.39)
panel_luma_hi: 0.80 -> 0.92 (catches the lightest tax-form blue tints at 0.85-0.89)
panel_density_min: 0.05 -> 0.02 (real form-panel coverage is 0.02-0.18; MK-16's 0.05 floor cut off the low end of that band)

Inline comments on each value cite the observed band.
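As a rough illustration of what the three values gate, here is a minimal sketch. The `PanelThresholds` struct, its constants, and both method names are hypothetical and not taken from the crate; only the numeric defaults come from this PR. It also assumes both band edges are inclusive, which the PR does not state.

```rust
/// Hypothetical container for the three tuned defaults.
/// The real struct and field layout in the crate may differ.
#[derive(Debug, Clone, Copy)]
pub struct PanelThresholds {
    pub panel_luma_lo: f32,
    pub panel_luma_hi: f32,
    pub panel_density_min: f32,
}

impl PanelThresholds {
    /// Defaults shipped by MK-16.
    pub const MK16: Self = Self {
        panel_luma_lo: 0.20,
        panel_luma_hi: 0.80,
        panel_density_min: 0.05,
    };

    /// Defaults after this PR.
    pub const MK20: Self = Self {
        panel_luma_lo: 0.50,     // dense-text fusion was observed at 0.32-0.39
        panel_luma_hi: 0.92,     // lightest blue tints were observed at 0.85-0.89
        panel_density_min: 0.02, // real panel coverage was observed at 0.02-0.18
    };

    /// True when a component's mean luma lies inside the panel band.
    /// Assumption: both ends of the band are inclusive.
    pub fn is_panel_luma(&self, mean_luma: f32) -> bool {
        mean_luma >= self.panel_luma_lo && mean_luma <= self.panel_luma_hi
    }

    /// True when panel coverage reaches the density floor.
    pub fn passes_density(&self, panel_density: f32) -> bool {
        panel_density >= self.panel_density_min
    }
}
```

Under these assumptions, a component at mean_luma 0.35 passes the MK-16 band but not the new one, while a component at 0.87 does the opposite.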

Results

class           before    after    AC target
form-scan        1/11     5/11    11/11
binary-scan     52/52    52/52    52/52
receipt          5/5      5/5      5/5
color-photo      1/1      1/1      1/1

Net +4 form-scan classifications, zero regressions in the populated baseline directories.

A pre-existing miss in graphic/Scan-20260524T144404.png (classifies as color-photo) is unchanged: the fixture's features (sat_mean=0.187, palette_size=336) trigger color-photo's saturation+palette gate under main's defaults as well. Orthogonal to MK-20.

Why this is partial

The remaining 6 form-scan misses cannot be reached by threshold tuning alone. The structural blocker is a scoring-formula asymmetry:

binary = bimodal * low_sat * (1 - halftone) is a 3-factor product.
form_scan = soft_ge(panel_density) * soft_ge(non_panel_bimodality) * edges_text * low_sat * (1 - color) is a 5-factor product with each factor in [0, 1].

On feature-similar inputs (high bimodality, low saturation), binary's shorter product wins: no factor attenuates it on a form-scan signal the way (1 - halftone) attenuates it on a halftone signal. The 6 misses break down as follows: 4 lose to binary's bimodal * low_sat; 1 falls below min_score (fixture 135334 has an unusually low non_panel_bimodality of 0.85 and edges of 0.05); 1 loses to mono-photo via (1 - bimodal) (fixture 140051's overall bimodality is 0.61).
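The asymmetry can be seen in a small sketch. The two products follow the formulas quoted above, but the function names and `pick_class` are hypothetical, the `soft_ge(...)` terms are passed in already evaluated (its definition is not shown in this PR), and the argmax-with-floor selection rule is an assumption inferred from the wording "loses to" and "falls below min_score".

```rust
/// binary = bimodal * low_sat * (1 - halftone)
pub fn binary_score(bimodal: f32, low_sat: f32, halftone: f32) -> f32 {
    bimodal * low_sat * (1.0 - halftone)
}

/// form_scan = soft_ge(panel_density) * soft_ge(non_panel_bimodality)
///             * edges_text * low_sat * (1 - color)
/// `panel_term` and `non_panel_term` stand for the two soft_ge outputs,
/// each assumed to lie in [0, 1].
pub fn form_scan_score(
    panel_term: f32,
    non_panel_term: f32,
    edges_text: f32,
    low_sat: f32,
    color: f32,
) -> f32 {
    panel_term * non_panel_term * edges_text * low_sat * (1.0 - color)
}

/// Assumed selection rule: the highest score wins, provided it reaches
/// `min_score`; otherwise no class is chosen.
pub fn pick_class<'a>(scores: &[(&'a str, f32)], min_score: f32) -> Option<&'a str> {
    scores
        .iter()
        .copied()
        .filter(|&(_, score)| score >= min_score)
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(name, _)| name)
}
```

With made-up feature values (not measured on the corpus) such as `binary_score(0.9, 0.9, 0.0)` against `form_scan_score(0.9, 0.9, 0.8, 0.9, 0.05)`, the three-factor product is larger even though every form-scan factor is individually strong, because each extra factor below 1 can only shrink the five-factor product.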

Will file a follow-up MK to add a (1 - panel_density_high) factor (or similar) to the binary score so that the two classes compete on a level field. This PR is deliberately limited to the threshold-only change.
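One possible shape for that follow-up, purely as a sketch: `binary_score_panel_aware` is a hypothetical name, and `panel_density_high` is assumed to be a soft signal in [0, 1] that approaches 1 when panel coverage is clearly present. The actual follow-up may define the factor differently.

```rust
/// Current binary score as described in this PR.
pub fn binary_score(bimodal: f32, low_sat: f32, halftone: f32) -> f32 {
    bimodal * low_sat * (1.0 - halftone)
}

/// Hypothetical variant with the proposed extra factor.
/// `panel_density_high` is clamped to [0, 1] so the product stays in range.
pub fn binary_score_panel_aware(
    bimodal: f32,
    low_sat: f32,
    halftone: f32,
    panel_density_high: f32,
) -> f32 {
    let panel = panel_density_high.clamp(0.0, 1.0);
    binary_score(bimodal, low_sat, halftone) * (1.0 - panel)
}
```

When no panel signal is present the variant leaves the binary score unchanged; a strong panel signal scales it down, which would give form_scan room to win on inputs that are otherwise feature-similar.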

Test plan

cargo test --bin monkey classify (15/15 in-tree synthetic tests).
just classify-fixtures against the local 11-fixture form-scan corpus and full binary-scan / receipt / color-photo sets (see Results above).
just check (fmt + clippy + build + Docker stage).
CI on Forgejo.

David added 1 commit

2026-05-24 21:19:50 +02:00

fix(classify): tune panel_density thresholds against form-scan corpus (partial)

Check / fmt + clippy + build + tests (pull_request) Successful in 17s

Create release / Create release from merged PR (pull_request) Has been skipped

101aeffa7b

The MK-16 shipped defaults (`panel_luma_lo=0.20`, `panel_luma_hi=0.80`, `panel_density_min=0.05`) classified only 1 of 11 form-scan fixtures correctly. Empirical probe via `--probe` against the locally-staged corpus (10 tax-form fixtures + 1 colour-panel variant moved out of `halftone-scan/` after the MK-18 split) found:

- Real form panels in the corpus have component mean_luma 0.55-0.89, with the lightest tax-form blue tints sitting at 0.85-0.89, well above the previous `panel_luma_hi=0.80` cliff. Raising `panel_luma_hi` to 0.92 catches the lightest panels without sweeping in pure paper (luma ~0.95-1.00).

- Binary-scan fixtures that produce false panel signals via dense-text fusion at the 256x256 downsample land at component mean_luma 0.32-0.39. Raising `panel_luma_lo` from 0.20 to 0.50 excludes that population without losing any form-scan fixtures (the lowest real form panel observed, at 0.378 mean_luma, is on a fixture that also has a second, brighter component).

- Real form-panel coverage is 0.02-0.18, not the 0.05+ MK-16 estimated. Lowering `panel_density_min` from 0.05 to 0.02 admits five additional form-scan fixtures whose panels exist but are smaller than the original threshold.

After tuning: form-scan moves from 1/11 to 5/11. binary-scan stays at 52/52 (zero regression), receipt stays at 5/5, color-photo at 1/1. The graphic class shows one miss (Scan-20260524T144404.png classifies as color-photo) but that miss is present under main's defaults too: it is a pre-existing palette/saturation boundary issue, orthogonal to the panel_density tune.

The remaining 6 form-scan misses cannot be reached by threshold tuning alone. The structural blocker is in the scoring formula: `binary = bimodal * low_sat * (1 - halftone)` is a 3-factor product while `form_scan = soft_ge(panel_density) * soft_ge(non_panel_bimodality) * edges_text * low_sat * (1 - color)` is a 5-factor product with each factor in `[0, 1]`. On feature-similar inputs (a form-scan fixture and a binary-scan fixture that both have high bimodality and low saturation), binary's shorter product wins because no panel-density signal attenuates it the way `(1 - halftone)` attenuates it on halftone input. The remaining six fixtures either lose to binary's `bimodal * low_sat` (four cases), fall below `min_score` because their `non_panel_bimodality` or `edge_density` is unusually low (one case), or lose to mono-photo via `(1 - bimodal)` because their overall bimodality is unusually low (one case). A follow-up MK tracks adding a `(1 - panel_density_high)` factor to binary so the two classes can compete on a level field.

`just check` passes. No new dependencies. The override mechanism is unchanged: contributors with their own fixture corpus can still override via `$XDG_CONFIG_HOME/monkey/classify.toml`.

#MK-20 State Done

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>