Monkey see, Monkey say, Monkey hear: audio, video and image manipulation tools

monkey

A general purpose toolkit to convert and manipulate images, PDFs, and videos.

Install

Container image

docker pull dev.a8n.run/pandoras-box/monkey:latest
docker run --rm -v "$PWD":/work -w /work dev.a8n.run/pandoras-box/monkey \
    video convert input.mov output.webm --format webm

The runtime image bundles ffmpeg (required by monkey video) and poppler-utils (provides pdftoppm, required by monkey pdf to-images).

Static binary

A Linux x86_64 static binary is published per release:

  • Forgejo generic package: https://dev.a8n.run/api/packages/pandoras-box/generic/monkey/<tag>/monkey-linux-x86_64
  • Or build locally: just build-static

The static binary does not bundle external programs. To use monkey video convert, install ffmpeg; to use monkey pdf to-images, install poppler-utils (pdftoppm):

  • Alpine: apk add ffmpeg poppler-utils
  • Debian/Ubuntu: apt install ffmpeg poppler-utils
  • macOS: brew install ffmpeg poppler

Updating the static binary

The static binary can update itself in place from the Generic Package registry:

monkey version            # print the build banner
monkey version check      # report whether a newer build exists on the stable train
monkey version update     # download, SHA256-verify, and self-replace the running binary

version update accepts --train <name> (release channel; stable by default), --force (reinstall the current build), --dry-run (report without downloading), and --url <base> (point at a mirror or test server). When run inside the container image, version update refuses and tells you to docker pull instead, because the image is immutable.

Workflow

The canonical loop for processing a corpus of scanned images:

# 1. Process: classify + run the matching recipe in one shot,
#    appending each run to a feedback log at a stable location.
monkey image auto raw-scan.png cleaned.png \
    --record ~/.local/share/monkey/runs.jsonl

# 2. Review the output. If it looks right, score it 4-5; if not, 1-2 plus notes.
monkey image rate raw-scan.png --score 4 \
    --record ~/.local/share/monkey/runs.jsonl

# 3. Repeat across the corpus. Edit the recipe TOML or the
#    classify.toml thresholds based on what the log shows.

# 4. Once a class has >= 20 rated runs, tune it. `tune` re-runs the proposed
#    recipe on the corpus, writing the new outputs under
#    $XDG_CACHE_HOME/monkey/tuner/<class>/<timestamp>/ and printing one
#    `monkey image rate` command per replayed output.
monkey-tuner tune --log ~/.local/share/monkey/runs.jsonl

# 5. Review the replayed outputs, then rate each one (the command tune printed).
#    This measures the proposal instead of trusting the IDW (inverse-distance-weighted) prediction.
monkey image rate ~/.cache/monkey/tuner/form-scan/<ts>/raw-scan.png \
    --score 5 --record ~/.local/share/monkey/runs.jsonl

# 6. Check the measured delta, then promote (refuses on a non-positive delta).
monkey-tuner validate --class form-scan --log ~/.local/share/monkey/runs.jsonl
monkey-tuner promote form-scan --log ~/.local/share/monkey/runs.jsonl

The --record path is your choice; ~/.local/share/monkey/runs.jsonl is a convention that survives across sessions and follows the XDG Base Directory layout. The log itself is append-only NDJSON: grep-able, git-able, and safe to back up.

See the "monkey image auto", "monkey image classify", and "Feedback log: --record and monkey image rate" sections below for the per-command reference.

Tuning

Once a class has accumulated enough rated runs (at least 20), the separate monkey-tuner binary closes the training loop: it reads the feedback log, grid-searches a better set of recipe parameters per class, replays the proposal on the real corpus so you can rate the actual outputs, and promotes the winner once its measured score delta is positive. monkey-tuner ships in the same container image and static-binary bundle as monkey, and reuses only monkey's recipe runner so a replayed output is byte-identical to a real monkey image auto run.

# 1. See where the corpus stands (report-only, writes nothing).
monkey-tuner replay --log ~/.local/share/monkey/runs.jsonl

# 2. Propose a per-class recipe delta and replay it. Writes
#    recipes/<class>.next.toml plus the proposed and current outputs under
#    $XDG_CACHE_HOME/monkey/tuner/<class>/<timestamp>/, and prints the
#    `monkey image rate` commands for the new outputs. Classes below the data
#    minimum are skipped with a one-line message.
monkey-tuner tune --log ~/.local/share/monkey/runs.jsonl

# 3. Review the replayed outputs side by side, then rate each proposed output
#    (the commands tune printed). Each rating goes into the same log.
monkey image rate ~/.cache/monkey/tuner/form-scan/<ts>/raw-scan.png \
    --score 5 --record ~/.local/share/monkey/runs.jsonl

# 4. See the measured delta (read-only), then promote in place. promote
#    re-reads the log and refuses a non-positive measured delta unless --force.
monkey-tuner validate --class form-scan --log ~/.local/share/monkey/runs.jsonl
monkey-tuner promote form-scan --log ~/.local/share/monkey/runs.jsonl

Example tune output:

form-scan: 47 rated runs (training 38 / held-out 9)
  current mean score (held-out, IDW): 3.40
  proposed: despeckle.radius 4 -> 3, constrained-sharpen.strength 0.5 -> 0.7
  proposed mean score (held-out, IDW): 3.80 (+0.40 predicted)
  wrote recipes/form-scan.next.toml
  replayed 38 input(s) -> /home/you/.cache/monkey/tuner/form-scan/1779376589005
  rate the proposed outputs, then run `monkey-tuner validate`/`promote`:
    monkey image rate .../form-scan/1779376589005/raw-scan.png --score <1-5> --record ...

binary-scan: 12 rated runs - below minimum (refusing)

tune joins each run to its most recent rating by the input's blake3 hash, groups the pairs by class, then sweeps each numeric recipe argument over a 5-point grid bounded by the values observed in the log. The split between the training set (which picks the winner) and the held-out set (which validates it) is seeded (--seed, default 0xC0FFEE), so the same log always produces the same split.

The grid search ranks candidates by inverse-distance-weighted regression over the logged (args, score) pairs, but that prediction is only an extrapolation, so tune re-runs the top candidate on the corpus and writes the proposed and current outputs side by side.

Run records that predate the input-path schema are skipped with a warning (clean break, no mixed-mode fallback). A proposal whose predicted delta loses to the live recipe is still written, flagged # REGRESSION in the file header, so promotion is always a deliberate act.
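
The grid sweep and the inverse-distance-weighted (IDW) ranking described above can be sketched in a few lines. This is an illustration, not monkey-tuner's code: the function names, the squared-distance weighting (power 2), and the exact-hit shortcut are all assumptions.

```rust
/// Five evenly spaced values from `lo` to `hi`, inclusive (a 5-point grid
/// bounded by the smallest and largest value observed in the log).
fn five_point_grid(lo: f64, hi: f64) -> [f64; 5] {
    let step = (hi - lo) / 4.0;
    [lo, lo + step, lo + 2.0 * step, lo + 3.0 * step, hi]
}

/// Predict the score at `query` from logged (args, score) samples.
/// Each sample is weighted by 1 / distance^2 (the power is an assumption).
fn idw_predict(samples: &[(Vec<f64>, f64)], query: &[f64]) -> Option<f64> {
    let mut weighted_sum = 0.0;
    let mut weight_total = 0.0;
    for (args, score) in samples {
        let dist_sq: f64 = args.iter().zip(query).map(|(a, q)| (a - q).powi(2)).sum();
        if dist_sq == 0.0 {
            // The query coincides with a logged sample: use its score directly.
            return Some(*score);
        }
        let weight = 1.0 / dist_sq;
        weighted_sum += weight * score;
        weight_total += weight;
    }
    if weight_total > 0.0 {
        Some(weighted_sum / weight_total)
    } else {
        None // no samples, nothing to predict from
    }
}
```

With samples at radius 0 (score 2) and radius 2 (score 4), a query at radius 1 is equidistant from both and predicts 3. A candidate far from every logged point still gets a prediction, which is why the text above treats the IDW number as an extrapolation to be checked by replay.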

validate --class <name> --log <log> reports the measured proposed-vs-current delta (the mean rating of the replayed outputs minus the mean rating of the current outputs, matched per input path and recipe) without writing anything.

promote renames recipes/<class>.next.toml to recipes/<class>.toml with an atomic rename(2). It refuses if the proposal is missing, and refuses (unless --force) if the live recipe has changed since the proposal was generated (the live recipe's sha is pinned in the proposal's header). When --log is given it also re-reads the log, computes the measured delta, refuses a non-positive delta unless --force, and stamps the measured delta onto the promoted recipe's leading comment. All commands take --recipe-dir DIR, defaulting to $XDG_CONFIG_HOME/monkey/recipes, mirroring monkey image auto --recipe-dir.
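
The measured delta that validate reports and promote checks can be sketched as follows. The function names are hypothetical, and two behaviours are assumptions: only inputs rated under both the proposed and the current recipe are counted, and a missing delta is refused unless forced.

```rust
use std::collections::HashMap;

/// Mean proposed rating minus mean current rating, over inputs rated for both.
/// Keys are input paths; values are 1..=5 scores.
fn measured_delta(
    proposed: &HashMap<String, u8>,
    current: &HashMap<String, u8>,
) -> Option<f64> {
    let mut matched = 0u32;
    let mut proposed_sum = 0.0;
    let mut current_sum = 0.0;
    for (path, proposed_score) in proposed {
        if let Some(current_score) = current.get(path) {
            proposed_sum += f64::from(*proposed_score);
            current_sum += f64::from(*current_score);
            matched += 1;
        }
    }
    if matched == 0 {
        None // nothing to compare yet
    } else {
        Some((proposed_sum - current_sum) / f64::from(matched))
    }
}

/// Promotion gate: a non-positive (or missing) delta is refused unless forced.
fn promote_allowed(delta: Option<f64>, force: bool) -> bool {
    force || matches!(delta, Some(d) if d > 0.0)
}
```

Because the means are taken over the same matched inputs, the result equals the mean per-input improvement.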

Commands

Run monkey --help for the full listing and monkey <command> --help for per-command options.

monkey video convert

Convert a single video file to a supported container format. Requires ffmpeg on PATH.

Supported formats:

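| Format | Default video codec | Allowed video codecs           | Audio codec |
|--------|---------------------|--------------------------------|-------------|
| webm   | libvpx-vp9          | libvpx-vp9, libvpx, libaom-av1 | libopus     |
| mp4    | libx264             | libx264, libx265, libaom-av1   | aac         |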

An overridden codec must also be enabled in your local ffmpeg build.

# Positional input/output
monkey video convert clip.mov clip.webm --format webm

# Flag form (equivalent)
monkey video convert --input-file clip.mp4 --output-file clip.webm --format webm

# Override the video codec
monkey video convert input.avi output.mp4 --format mp4 --codec libx265

# Overwrite an existing output
monkey video convert input.mp4 output.webm --format webm --force

monkey writes to a .monkey-partial.<pid>.<ext> sidecar and renames on success, so a crash, Ctrl-C, or ffmpeg failure never leaves a half-written file in place of a valid output.
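
The write-then-rename pattern described above looks roughly like this in Rust. It is a minimal sketch using only the standard library; `partial_path` and `write_atomically` are hypothetical names, and treating `.monkey-partial.<pid>.<ext>` as the whole sidecar file name (placed next to the output) is an assumption.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Sidecar path in the same directory as `output`,
/// e.g. clip.webm -> .monkey-partial.<pid>.webm (naming is an assumption).
fn partial_path(output: &Path, pid: u32) -> PathBuf {
    let ext = output.extension().and_then(|e| e.to_str()).unwrap_or("tmp");
    output.with_file_name(format!(".monkey-partial.{pid}.{ext}"))
}

/// Run `produce` against the sidecar, then rename it over `output` on success.
/// On failure the sidecar is removed and `output` is never touched.
fn write_atomically(
    output: &Path,
    produce: impl FnOnce(&Path) -> io::Result<()>,
) -> io::Result<()> {
    let partial = partial_path(output, std::process::id());
    match produce(&partial) {
        Ok(()) => fs::rename(&partial, output),
        Err(err) => {
            let _ = fs::remove_file(&partial); // best-effort cleanup
            Err(err)
        }
    }
}
```

Keeping the sidecar in the output's own directory matters: a rename is only atomic when source and destination are on the same filesystem.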

monkey image <filter>

Image filters. Each takes an input and output path plus filter-specific options (see --help). All run in pure Rust; none shell out to an external program.

G'MIC-inspired: local-contrast, boost-screen, vivid-screen, smooth-bilateral, smooth-mean-curvature, smooth-median, sharp-abstract, stamp, constrained-sharpen, moire-removal, anti-alias, despeckle, sharpen-tones.

ImageMagick-inspired: shave (trim pixels from each side), deskew (detect and counter-rotate document skew), colors (reduce to N colours via NeuQuant), colorspace (convert colour space), contrast-stretch (per-channel histogram stretch), density (rewrite embedded DPI metadata, pixels unchanged).

monkey image local-contrast in.jpg out.jpg --strength 80
monkey image despeckle scan.png clean.png --threshold 5 --radius 10
monkey image deskew scan.png straight.png --threshold 0.5
monkey image colors photo.png flat.png --count 16

monkey image diff

Per-pixel comparison of two images (ImageMagick compare). Writes an output image that paints matching pixels with --bg and differing pixels with --highlight. --threshold (0.0-1.0) sets a per-channel tolerance: pixels whose maximum channel difference is at or below threshold * 255 count as a match.

monkey image diff a.png b.png diff.png
monkey image diff a.png b.png diff.png --bg white --highlight red --threshold 0.02
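
The --threshold rule can be written as a one-line predicate. A minimal sketch, assuming 8-bit RGB pixels; `pixels_match` is a hypothetical name, not monkey's API.

```rust
/// True when two RGB pixels count as a match: their largest per-channel
/// difference is at or below `threshold * 255` (threshold in 0.0-1.0).
fn pixels_match(a: [u8; 3], b: [u8; 3], threshold: f32) -> bool {
    let max_diff = a
        .iter()
        .zip(b.iter())
        .map(|(x, y)| x.abs_diff(*y))
        .max()
        .unwrap_or(0);
    f32::from(max_diff) <= threshold * 255.0
}
```

With --threshold 0.02 the tolerance is 5.1 levels, so a channel difference of 5 still matches and a difference of 6 is highlighted.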

monkey image classify

Heuristic image classifier. Prints <class>\t<confidence> to stdout. Classes: color-photo, mono-photo, binary-scan, halftone-scan, form-scan, receipt, screenshot, graphic, unknown. Confidence is the margin between the top class and the runner-up (0.0..1.0); a top score below min_score falls back to unknown.
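
The margin-based confidence and the min_score fallback can be sketched like this. `pick_class` is a hypothetical helper; how the per-class scores are computed is out of scope, and reporting the margin unchanged when the result falls back to unknown is an assumption.

```rust
/// Pick the top-scoring class and report the margin over the runner-up.
/// Falls back to "unknown" when the top score is below `min_score`.
fn pick_class(scores: &[(&str, f32)], min_score: f32) -> (String, f32) {
    let mut ranked: Vec<(&str, f32)> = scores.to_vec();
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1)); // highest score first
    let Some(&(top_name, top_score)) = ranked.first() else {
        return ("unknown".to_string(), 0.0);
    };
    let runner_up = ranked.get(1).map_or(0.0, |r| r.1);
    let confidence = (top_score - runner_up).clamp(0.0, 1.0);
    if top_score < min_score {
        ("unknown".to_string(), confidence)
    } else {
        (top_name.to_string(), confidence)
    }
}
```

A margin-based confidence is low whenever two classes score close together, even if both scores are high, which makes near-ties easy to spot when reviewing a corpus.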

halftone-scan is the AM-halftone class: newspaper / magazine scans with a detectable dot pitch. form-scan is the FM-screened sibling: tax forms, vehicle inspection forms, and similar documents whose shaded panels are stochastic and have no periodic dot pitch. The split is tracked in MK-18; scoring for form-scan (via the panel_density feature) lands with MK-16.

"AM" and "FM" come from radio: amplitude modulation varies the wave's strength at a fixed frequency, frequency modulation varies the frequency at fixed strength. Print-trade halftone reuses the metaphor on a 2D pixel grid:

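| screen | stands for           | dot grid              | what varies                              | typical use                            |
|--------|----------------------|-----------------------|------------------------------------------|----------------------------------------|
| AM     | Amplitude Modulation | regular, periodic     | dot size (bigger dot = darker tone)      | newspaper, magazine, book photo plates |
| FM     | Frequency Modulation | stochastic, no period | dot frequency (more dots = darker tone)  | office form panels, ink-jet dithering  |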

This is why monkey's classifier signals are different per class. halftone-scan detects AM print via halftone_autocorr_peak: the regular dot pitch produces a strong autocorrelation peak at the dot period. form-scan cannot use that signal because FM screening has no period to grip; it uses panel_density (connected-component area of mid-tone regions) instead.
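
To see why a regular dot pitch shows up as an autocorrelation peak, here is a one-dimensional sketch over a single row of pixel intensities. It only illustrates the idea; monkey's halftone_autocorr_peak feature works on images and its implementation is not shown here, so `autocorr_peak` and its normalisation are assumptions.

```rust
/// Strongest autocorrelation peak of `signal` for lags 1..=max_lag.
/// Returns (lag, correlation normalised by the lag-0 energy).
fn autocorr_peak(signal: &[f64], max_lag: usize) -> Option<(usize, f64)> {
    let n = signal.len();
    if n < 2 {
        return None;
    }
    let mean = signal.iter().sum::<f64>() / n as f64;
    let centered: Vec<f64> = signal.iter().map(|v| v - mean).collect();
    let energy: f64 = centered.iter().map(|v| v * v).sum();
    if energy == 0.0 {
        return None; // flat signal: no structure to correlate
    }
    let mut best: Option<(usize, f64)> = None;
    for lag in 1..=max_lag.min(n - 1) {
        // Correlate the row with a copy of itself shifted by `lag` pixels.
        let corr: f64 = centered[..n - lag]
            .iter()
            .zip(&centered[lag..])
            .map(|(a, b)| a * b)
            .sum();
        let normalised = corr / energy;
        if best.map_or(true, |(_, b)| normalised > b) {
            best = Some((lag, normalised));
        }
    }
    best
}
```

A row that repeats one dark pixel every 4 pixels peaks at lag 4. A stochastic (FM) pattern has no lag at which the shifted copy lines up with the original, so no comparable peak appears.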

graphic covers logos, brand marks, and simple diagrams: a small saturated palette with smooth, low-edge shapes. It is distinct from screenshot (dense UI edges) and from color-photo (large palette); color-photo now requires a palette above graphic_palette_max, so a saturated logo lands in graphic.

receipt is the narrow-aspect specialisation of binary-scan: a mono bimodal page is classified receipt only when its long/short side ratio reaches receipt_aspect_min (default 2.2, a true thermal roll). Any mono scan below that ratio, such as a square or letter-shaped page, stays binary-scan.
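
The aspect test itself is small. A sketch, with `is_receipt_shape` as a hypothetical name; it covers only the shape check, not the mono/bimodal detection that must also hold.

```rust
/// True when the long/short side ratio reaches `receipt_aspect_min`.
/// Orientation does not matter: the longer side is always the numerator.
fn is_receipt_shape(width: u32, height: u32, receipt_aspect_min: f64) -> bool {
    let (long, short) = if width >= height {
        (width, height)
    } else {
        (height, width)
    };
    short > 0 && f64::from(long) / f64::from(short) >= receipt_aspect_min
}
```

An A4 scan at 2480 x 3508 has a ratio of about 1.41 and stays binary-scan under the default 2.2.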

monkey image classify in.jpg
# color-photo	0.412

monkey image classify scan.png --features
# binary-scan	0.731
# features: width=2480 height=3508 aspect=1.414 sat_mean=0.018 sat_std=0.034 ...

Thresholds load from an embedded classify.toml. Override by writing the same schema to $XDG_CONFIG_HOME/monkey/classify.toml (or $HOME/.config/monkey/classify.toml). This subcommand is Layer 1 of the MK-5 training-loop design and is the input to the monkey image auto recipe runner.
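
The override lookup follows the usual XDG rule. A minimal sketch of the path resolution, assuming the environment values are passed in by the caller; `classify_override_path` is a hypothetical name, and treating an empty XDG_CONFIG_HOME as unset follows the XDG Base Directory specification rather than anything stated here about monkey.

```rust
use std::path::PathBuf;

/// Where a user-level classify.toml override would live.
/// `xdg_config_home` is the value of $XDG_CONFIG_HOME, if set.
fn classify_override_path(xdg_config_home: Option<&str>, home: &str) -> PathBuf {
    let base = match xdg_config_home {
        Some(dir) if !dir.is_empty() => PathBuf::from(dir),
        _ => PathBuf::from(home).join(".config"), // fallback: $HOME/.config
    };
    base.join("monkey").join("classify.toml")
}
```

In a real program the two arguments would come from `std::env::var("XDG_CONFIG_HOME")` and `std::env::var("HOME")`.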

monkey image auto

Classifies the input (via monkey image classify) and runs the matching class recipe end-to-end, so the caller never has to name a filter chain. Layer 2 of the MK-5 training-loop design.

monkey image auto in.jpg out.jpg
# auto: classified in.jpg as color-photo (0.412)
#   step 1/2: local-contrast (4ms)
#   step 2/2: constrained-sharpen (6ms)
# auto[color-photo]: in.jpg -> out.jpg (2 steps, 58ms)

monkey image auto --class binary-scan scan.png clean.png   # force a class, skip detection

A recipe is a TOML list of existing image filters, applied in order:

[[step]]
op = "local-contrast"
args = { strength = 60, sigma = 0.75, iterations = 1 }

[[step]]
op = "constrained-sharpen"
args = { strength = 1, radius = 3, damping = 10 }

op is the kebab-case name of any monkey image filter; args mirrors that filter's flags, and any omitted arg falls back to the filter's own default. There is no new DSL. Defaults ship embedded for every class (unknown is a passthrough copy). Override a class by writing <class>.toml under $XDG_CONFIG_HOME/monkey/recipes/ (or $HOME/.config/monkey/recipes/), or point --recipe-dir DIR at a directory of <class>.toml files. A misspelled op, an unknown args key, or a stray top-level key is rejected rather than silently ignored.

Feedback log: --record and monkey image rate

Layer 3 of the MK-5 training-loop design (feedback half). monkey image auto --record runs.jsonl appends one NDJSON line per run, and monkey image rate appends a human verdict joined to that run by the input's blake3 hash. The log is append-only newline-delimited JSON: grep-able and git-able. Durability is best-effort, not transactional (one O_APPEND write per record, not crash-atomic), so a reader should parse line-by-line and skip a malformed trailing line rather than assume every line is valid JSON. --record takes an explicit path; there is no default location.

monkey image auto --record runs.jsonl in.jpg out.jpg
monkey image rate in.jpg --score 4 --notes "edges crisp" --record runs.jsonl
// one run record (input/output are blake3 hex; recipe_sha hashes the recipe source;
// input_path is the file path you passed to `monkey image auto`, so the tuner can replay it)
{"kind":"run","ts_unix_ms":1779376589005,"input":"95f4...","input_path":"/home/you/scans/in.jpg","class":"color-photo","recipe_sha":"012f...","steps":[{"op":"local-contrast","args":{"strength":60,"sigma":0.75,"iterations":1}}],"output":"ccb7...","duration_ms":53}
// a rating, joined to the run above by `input`
{"kind":"rating","ts_unix_ms":1779376589118,"input":"95f4...","score":4,"notes":"edges crisp"}

score must be 1..5. The monkey-tuner binary consumes this log to propose, replay, and promote recipe deltas; see Tuning above.
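
A reader that follows the advice above (parse line by line, skip what does not parse) and then joins each input to its most recent rating might look like this. It is a sketch under stated assumptions: the `Record` type and both function names are hypothetical, and JSON decoding is left to a caller-supplied `parse` function because the Rust standard library has no JSON parser.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum Record {
    Run { input: String },
    Rating { input: String, ts_unix_ms: u64, score: u8 },
}

/// Read the log line by line. Lines that `parse` rejects (for example a
/// half-written trailing line) are counted and skipped, not treated as fatal.
fn read_log(text: &str, parse: impl Fn(&str) -> Option<Record>) -> (Vec<Record>, usize) {
    let mut records = Vec::new();
    let mut skipped = 0;
    for line in text.lines().filter(|l| !l.trim().is_empty()) {
        match parse(line) {
            Some(record) => records.push(record),
            None => skipped += 1,
        }
    }
    (records, skipped)
}

/// Most recent valid (1..=5) rating per input hash.
fn latest_ratings(records: &[Record]) -> HashMap<String, u8> {
    let mut latest: HashMap<String, (u64, u8)> = HashMap::new();
    for record in records {
        if let Record::Rating { input, ts_unix_ms, score } = record {
            if !(1..=5).contains(score) {
                continue; // out-of-range score: ignore
            }
            let newer = latest.get(input).map_or(true, |(ts, _)| ts_unix_ms >= ts);
            if newer {
                latest.insert(input.clone(), (*ts_unix_ms, *score));
            }
        }
    }
    latest.into_iter().map(|(input, (_, score))| (input, score)).collect()
}
```

Because the log is append-only, re-rating an input simply adds a newer line; the join picks the latest one and older ratings stay in the file as history.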

monkey pdf extract-images

Extract embedded images from a PDF.

monkey pdf extract-images input.pdf --output-dir images/

monkey pdf to-images

Rasterize each page of a PDF to image files via pdftoppm. Requires poppler-utils (pdftoppm) on PATH. Outputs are named <basename>-<N>.<ext>, where basename defaults to the PDF's file stem and N is zero-padded to the number of digits in the page count.

monkey pdf to-images input.pdf out/
monkey pdf to-images input.pdf out/ --format jpeg --dpi 150 --first-page 2 --last-page 5 --basename page

--format is png (default), jpeg, or tiff; --dpi defaults to 300; --first-page / --last-page are 1-indexed and default to the full document.
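
The output naming can be reproduced when a script needs to predict file names. A sketch, assuming the padding width is the number of digits in the total page count; `page_file_name` is a hypothetical helper, and the real names are produced by pdftoppm.

```rust
/// Build <basename>-<N>.<ext> with N zero-padded to the width of `page_count`.
fn page_file_name(basename: &str, page: u32, page_count: u32, ext: &str) -> String {
    let width = page_count.to_string().len();
    format!("{basename}-{page:0width$}.{ext}")
}
```

For a 12-page document, page 7 becomes input-07.png; for a 5-page document, page 3 becomes page-3.jpg.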

monkey pdf from-images

Bundle one or more images into a single multi-page PDF. The last path is the output PDF; preceding paths are the input images (at least one input and the output, so two arguments minimum).

monkey pdf from-images page-1.png page-2.png notes.pdf
monkey pdf from-images scan-*.jpg out.pdf --page-size letter

--page-size is auto (default; derives page size from each image's pixel dimensions and DPI), letter, or a4. In auto mode, --dpi overrides the DPI used to derive page size from pixel dimensions; without it, each image's embedded DPI metadata is read, falling back to 300.
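
The auto page-size arithmetic is pixels divided by DPI to get inches, times 72 to get PDF points. A sketch of that calculation with the DPI precedence described above; `auto_page_size_pt` is a hypothetical name, and expressing the result in points is an assumption about how the page size is represented.

```rust
const POINTS_PER_INCH: f64 = 72.0;

/// Page size in PDF points for one image.
/// DPI precedence: --dpi override, then embedded metadata, then 300.
fn auto_page_size_pt(
    width_px: u32,
    height_px: u32,
    dpi_override: Option<f64>,
    embedded_dpi: Option<f64>,
) -> (f64, f64) {
    let dpi = dpi_override.or(embedded_dpi).unwrap_or(300.0);
    (
        f64::from(width_px) / dpi * POINTS_PER_INCH,
        f64::from(height_px) / dpi * POINTS_PER_INCH,
    )
}
```

A 2480 x 3508 scan with no DPI information falls back to 300 and comes out at roughly 595.2 x 841.9 points, which is close to A4.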

monkey noteshrink

Clean up scanned handwritten notes: remove background noise and bleedthrough, reduce to a compact color palette, and optionally combine the results into a single PDF.

monkey noteshrink page-*.jpg --output-dir out/ --pdf notes.pdf

Development

Requires Rust with the 2024 edition (toolchain 1.93+).

just check        # fmt + clippy + tests
just test         # tests only
just build        # release build
just build-static # static musl build

Integration tests

tests/video_convert.rs runs monkey video convert end-to-end against a real ffmpeg. These tests are #[ignore]d so a default cargo test does not require ffmpeg:

cargo test --test video_convert -- --ignored

The input fixture is generated on the fly via ffmpeg's lavfi source, so no binary test assets are committed.

Release

just create-release minor   # or major / hotfix

Bumps Cargo.toml, pushes a release/<tag> branch, and prints a PR link. After the PR merges, the Forgejo workflows tag the release, publish the container image, and upload the static binary automatically.