feat(classify): any-aspect mono page is binary-scan, receipt is the tall specialisation (MK-8) #29

Merged

nrupard merged 2 commits from feat/classify-binary-aspect-MK-8 into main

2026-05-21 18:10:27 +02:00

nrupard commented

2026-05-21 18:07:06 +02:00

Member

What

MK-8: fix the binary-scan / receipt boundary from MK-6 validation. Class membership is visual structure, not business purpose - a hotel receipt scanned full-page is a binary scan.

Changes

- `score()`: drop `(1.0 - aspect_long)` from binary-scan, so a mono bimodal page scores on bimodality + low-sat regardless of aspect. Square/letter full-page scans now land in binary-scan instead of receipt or `unknown`.
- Receipt becomes a strict refinement (`aspect_long * edges_text * low_sat * (1 - halftone)`); when both score high the taller image wins via argmax (Receipt sorts after BinaryScan, so ties resolve to it).
- `classify.toml`: `receipt_aspect_min` 2.0 -> 2.2, with a comment documenting the inversion.
- New synthetic test `letter_aspect_mono_scan_is_binary_not_receipt` (~1.3 aspect, the HamptonInn/Enterprise shape).
- README `receipt` description states the aspect threshold.
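
To make the scoring change concrete, here is a minimal sketch, not the actual `score()` body. The `Features` struct, the two function names, and the sample values are hypothetical. It assumes every feature is already scaled to [0, 1] and that "bimodality + low-sat" combines as a product; only the receipt formula is taken verbatim from this PR.

```rust
/// Hypothetical feature vector; all fields are assumed to be scaled to [0, 1].
#[derive(Debug, Clone, Copy)]
struct Features {
    bimodality: f32,  // how two-peaked the luminance histogram is
    low_sat: f32,     // 1.0 means fully desaturated
    aspect_long: f32, // soft "tall strip" gate
    edges_text: f32,  // text-like edge density
    halftone: f32,    // halftone-screen evidence
}

/// Binary-scan after MK-8: no `(1.0 - aspect_long)` factor.
/// Combining the two terms as a product is an assumption of this sketch.
fn binary_scan_score(f: &Features) -> f32 {
    f.bimodality * f.low_sat
}

/// Receipt score, using the formula quoted in this PR.
fn receipt_score(f: &Features) -> f32 {
    f.aspect_long * f.edges_text * f.low_sat * (1.0 - f.halftone)
}

fn main() {
    // Made-up values for a letter-shaped mono scan: the aspect gate is low.
    let letter_page = Features {
        bimodality: 0.9,
        low_sat: 0.95,
        aspect_long: 0.1,
        edges_text: 0.8,
        halftone: 0.0,
    };
    println!("binary-scan: {:.3}", binary_scan_score(&letter_page));
    println!("receipt:     {:.3}", receipt_score(&letter_page));
}
```

With a low aspect gate the receipt product collapses while the binary-scan score is unaffected by aspect, which is the behaviour the new synthetic test is meant to pin down.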

Acceptance status

- [x] No regression on existing synthetic fixtures (12 prior classify tests pass).
- [x] Square / letter mono scan classifies binary-scan (synthetic test added).
- [x] README receipt description makes the aspect threshold explicit.
- [ ] HamptonInn / Enterprise PNG classify as binary-scan - **blocked on the user fixture set**.
- [ ] USPS / RegionsBank PNG still classify as receipt - **blocked on the user fixture set**.
- [ ] Real-fixture unit test under `tests/fixtures/classify/binary-scan/` - **blocked on the user fixture set**.

Blocked / needs input

Same pattern as MK-7: the real-image acceptance items need the user-provided validation set (HamptonInn, Enterprise, USPS, RegionsBank PNGs). Drop them into the repo and I will measure their features, confirm the boundary, and add the checked-in fixture test as a follow-up commit on this branch. The receipt_aspect_min value (issue open question: 2.2-2.8) may also shift once the real receipt corpus is measured.

The open question on widening sat_mean_mono_max (0.06 -> ~0.12) for scanner colour casts is deferred - it needs the real scans to confirm, and is the same fix flagged on the MK-7 sibling.

🤖 Generated with Claude Code


nrupard added 1 commit

2026-05-21 18:07:06 +02:00

feat(classify): accept any-aspect mono page as binary-scan; receipt is the tall specialisation (MK-8)

Check / fmt + clippy + build + tests (pull_request) Successful in 19s

42c9d58066

Fix the binary-scan / receipt boundary surfaced by MK-6 real-image validation. Class membership is a function of visual structure, not the document's business purpose: a hotel receipt scanned full-page is a binary scan that merely happens to be named "receipt".

Scoring change in `score()`:
- Drop the `(1.0 - aspect_long)` factor from binary-scan. A mono bimodal page now scores on bimodality + low saturation alone, regardless of aspect, so square and letter-shaped full-page scans land in binary-scan instead of tipping into receipt or falling through to `unknown`.
- Receipt stays `aspect_long * edges_text * low_sat * (1 - halftone)` and becomes a strict refinement: when both score high the taller image wins receipt via argmax (Receipt sorts after BinaryScan, so equal scores resolve to it).
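
The tie rule can be illustrated with a small sketch. This is not the repository's argmax; the `Class` enum is cut down to two variants and the `argmax` helper is hypothetical. It shows one way "Receipt sorts after BinaryScan, so equal scores resolve to it" can arise in Rust: `Iterator::max_by` returns the last element among equal maxima.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Class {
    BinaryScan,
    Receipt,
}

/// Return the class with the highest score, or `None` for an empty slice.
/// `max_by` keeps the last of several equal maxima, so a class listed
/// later in `scores` wins an exact tie.
fn argmax(scores: &[(Class, f32)]) -> Option<Class> {
    scores
        .iter()
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|&(class, _)| class)
}

fn main() {
    // Equal scores: the entry listed last is selected.
    let tied = [(Class::BinaryScan, 0.8_f32), (Class::Receipt, 0.8_f32)];
    println!("tied   -> {:?}", argmax(&tied));

    // A strictly higher binary-scan score still wins regardless of order.
    let clear = [(Class::BinaryScan, 0.9_f32), (Class::Receipt, 0.3_f32)];
    println!("clear  -> {:?}", argmax(&clear));
}
```

The tie-break depends on iteration order, so any reordering of the class list would silently change which class wins an exact tie.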

`classify.toml`: raise `receipt_aspect_min` 2.0 -> 2.2 so ordinary page aspect ratios (1.3-1.6) do not push near-square scans into receipt, with a comment documenting the inversion.

Adds `letter_aspect_mono_scan_is_binary_not_receipt` (a ~1.3-aspect mono scan, the HamptonInn/Enterprise shape, classifies binary-scan). README `receipt` description now states the aspect threshold explicitly. Existing synthetic fixtures unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

nrupard added 1 commit

2026-05-21 18:10:01 +02:00

test(classify): tighten MK-8 boundary test, clarify soft aspect threshold (review)

Check / fmt + clippy + build + tests (pull_request) Successful in 19s

Create release / Create release from merged PR (pull_request) Has been skipped

99f3b1e5f6

Review nits on the binary/receipt boundary:

`letter_aspect_mono_scan_is_binary_not_receipt` drops the redundant `aspect < receipt_aspect_min` assertion (it tested constant arithmetic, 260/200 = 1.3, not classifier behavior) and keeps the meaningful `class == BinaryScan` check.

`classify.toml`: clarify that `receipt_aspect_min` is a soft centre (gate = 0.5 at the ratio, ramps either side via soft_scale_aspect), not a hard cutoff, so a future tuner does not read it as a threshold.
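
For readers unfamiliar with the "soft centre" idea, a minimal sketch follows. It is a stand-in, not the real `soft_scale_aspect`: the logistic shape, the `soft_gate` name, and the `width` parameter are all assumptions. The only property taken from the commit is that the gate equals 0.5 at the configured ratio and ramps on either side of it.

```rust
/// Hypothetical logistic gate: exactly 0.5 at `centre`, approaching 1.0
/// for taller images and 0.0 for squarer ones. `width` (assumed) sets
/// how gradual the ramp is.
fn soft_gate(aspect: f32, centre: f32, width: f32) -> f32 {
    1.0 / (1.0 + (-(aspect - centre) / width).exp())
}

fn main() {
    let centre = 2.2; // receipt_aspect_min after this PR
    let width = 0.3;  // made-up ramp width for illustration
    for aspect in [1.3_f32, 2.0, 2.2, 2.8] {
        println!("aspect {:.1} -> gate {:.3}", aspect, soft_gate(aspect, centre, width));
    }
}
```

Because the gate is continuous, an image slightly below 2.2 still receives a non-zero receipt contribution, which is why the value should not be read as a cutoff.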

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>