perf(image): parallelize per-pixel kernels with rayon (MK-3) #22
Summary
Bring rayon-based row parallelism to every per-pixel kernel in the image module. On a 32-core dev box, `monkey image smooth-median --radius 7 --iterations 4` on a 4000x3000 plasma input drops from 67.5 s to 5.0 s (13.5x speedup), and the output is byte-identical to pre-change `main`.

- Shared dispatch helper in `src/image/kernel.rs`: `par_row_map(w, h, f)` and `par_row_map_chunked(w, h, chunk_rows, f)` route a `Fn(x, y) -> [f32; 3] + Sync` closure across rayon-owned row strips. `chunk_rows` is configurable (defaults to 1) so callers can amortise scheduling overhead on small images.
- Per-pixel kernels converted to `par_row_map`:
  - `kernel.rs`: `gaussian_blur` (both passes), `gaussian_blur_xy` (both passes), `grayscale`.
  - `filters.rs`: `smooth_mean_curvature`, `anti_alias`, `despeckle`, `sharpen_tones`, `stamp`.
  - `magick_ops.rs`: `shave`, `downsample_for_detection`, `rotate_bilinear`, `colors` (NeuQuant lookup).
- Per-pixel kernels that stream over `out.data` directly use `par_iter_mut()` zipped with input slices: `local_contrast`, `sharp_abstract` unsharp pass, `constrained_sharpen`, `blend::boost_screen`, `blend::vivid_screen_blend`.
- `median_filter` uses `par_chunks_mut` over output rows so each rayon task gets a thread-local `Vec<f32>` window scratch.
- `contrast_stretch` builds a parallel histogram via `par_chunks(...).fold(...).reduce(...)`, then runs the channel remap with `par_iter_mut().enumerate()`.
- `detect_skew_angle` evaluates the 0.5° candidate angles in parallel with `par_iter().map(projection_variance).reduce(...)`. The reducer matches the sequential strictly-greater tie-break (prefers the lower angle on equal scores).
- `diff::diff` parallelizes the per-pixel comparison over the raw RGBA byte buffer.

Out of scope (left alone):

- `bilateral_filter` (MK-2) already has bespoke parallel chunked dispatch with LUTs and the interior fast path.
- `smooth_bilateral`, `smooth_median`, `moire_removal`, `local_contrast`, `sharp_abstract`, `smooth_mean_curvature`.
- `density` (pure I/O).

No CLI surface change. No new dependency (`rayon` is already pulled in by MK-2).

#MK-3
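The tie-break note on `detect_skew_angle` is the subtle part: a parallel reduce combines partial results in groupings the code does not control, so the combiner has to reproduce the sequential scan's answer for any grouping. A std-only sketch, assuming candidates are indexed in ascending angle order and no score is NaN; `Candidate`, `pick` and `best_sequential` are hypothetical names, not code from this PR.

```rust
/// One candidate skew angle: its position in the (assumed ascending) angle
/// list and its projection score. Hypothetical type, not from the PR.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Candidate {
    pub idx: usize,
    pub score: f64,
}

/// Sequential reference: a strictly-greater comparison keeps the first maximum,
/// i.e. the lowest index (lowest angle) among equal scores.
pub fn best_sequential(cands: &[Candidate]) -> Option<Candidate> {
    let mut best: Option<Candidate> = None;
    for &c in cands {
        if best.map_or(true, |b| c.score > b.score) {
            best = Some(c);
        }
    }
    best
}

/// Combiner for a parallel reduce: the higher score wins and ties go to the
/// lower index. Assumes no score is NaN.
pub fn pick(a: Candidate, b: Candidate) -> Candidate {
    if b.score > a.score || (b.score == a.score && b.idx < a.idx) {
        b
    } else {
        a
    }
}
```

Because `pick` selects the maximum under a total order (score descending, then index ascending), it is associative, and `pick(a, b) == pick(b, a)` whenever the indices differ. It could therefore be passed to rayon's `reduce_with` and still return the same candidate as the sequential strictly-greater loop. A NaN score would break this, since every comparison against NaN is false.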
Test plan
- `just check` (fmt, clippy, build, tests, docker compile check)
- `smooth-median --radius 7 --iterations 4` on 4000x3000: 67.5 s -> 5.0 s, byte-identical via `cmp`