Benchmark
A strong-scaling test measures how render-to-video throughput scales with the number of worker processes. It sweeps:
- worker count: 1, 2, 4, 8, 16;
- figure reuse: with vs. without (reuse_figure_object); and
- encoding backend: parallel-video-io's CPU (libx264) and GPU (NVENC) encoders. The GPU backend is benchmarked only when a CUDA device is present.
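The sweep is the Cartesian product of the three axes above, with the GPU encoder dropped when no CUDA device is found. The sketch below is a hypothetical reconstruction: the names WORKER_COUNTS, ENCODERS and build_sweep are illustrative and are not taken from examples/scaling_test.py, and how the script actually detects CUDA is not shown here (it is passed in as a flag).

```python
from __future__ import annotations

import itertools

# Hypothetical constants mirroring the three sweep axes described above.
WORKER_COUNTS = [1, 2, 4, 8, 16]
REUSE_OPTIONS = [False, True]  # reuse_figure_object off / on
ENCODERS = ["cpu", "gpu"]      # libx264 vs. NVENC


def build_sweep(cuda_available: bool) -> list[dict]:
    """Return one config dict per (workers, reuse, encoder) combination."""
    # The GPU backend is only benchmarked when a CUDA device is present.
    encoders = ENCODERS if cuda_available else ["cpu"]
    return [
        {"workers": w, "reuse_figure_object": r, "encoder": e}
        for w, r, e in itertools.product(WORKER_COUNTS, REUSE_OPTIONS, encoders)
    ]


if __name__ == "__main__":
    for cfg in build_sweep(cuda_available=False):
        print(cfg)
```

Without a GPU this yields 5 × 2 = 10 runs; with one, 20.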
The animation rendered is very_complex_animation.py, a 14-subplot GridSpec figure that is expensive to build: exactly the case where the setup-once / update-many design pays off.
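To make the setup-once / update-many idea concrete, here is a minimal pure-Python sketch that counts layout builds instead of drawing anything. ExpensiveFigure, render_chunk and split are hypothetical stand-ins, not parallel-video-io's API; the point is only that with reuse each worker builds the layout once, while without reuse the layout is rebuilt for every frame.

```python
from __future__ import annotations


class ExpensiveFigure:
    """Hypothetical stand-in for the 14-subplot GridSpec figure."""

    builds = 0  # class-level counter of how many layouts were built

    def __init__(self) -> None:
        ExpensiveFigure.builds += 1  # the costly layout step happens here
        self.frame: int | None = None

    def update(self, frame_index: int) -> None:
        self.frame = frame_index  # cheap per-frame update of the plotted data


def render_chunk(frames: range, reuse_figure_object: bool) -> list[int]:
    """Render one worker's chunk of frames, optionally reusing one figure."""
    rendered = []
    fig = ExpensiveFigure() if reuse_figure_object else None  # build once per worker
    for i in frames:
        if not reuse_figure_object:
            fig = ExpensiveFigure()  # rebuild the whole layout for every frame
        fig.update(i)
        rendered.append(fig.frame)
    return rendered


def split(n_frames: int, n_workers: int) -> list[range]:
    """Split frame indices into contiguous chunks, one per worker."""
    size = -(-n_frames // n_workers)  # ceiling division
    return [range(s, min(s + size, n_frames)) for s in range(0, n_frames, size)]
```

Rendering 100 frames across 4 workers builds the layout 4 times with reuse and 100 times without it.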
How to run
pip install -e '.[benchmark]'
python examples/scaling_test.py # full sweep
python examples/scaling_test.py --quick # small, fast smoke run
Results are written to examples/output/scaling_test/ (results.csv, results.json, and the interactive figure embedded below as scaling_graph.html).
Result
Speedup is normalised to the serial (one worker), no-reuse, CPU-encode baseline, so the curves capture both the parallel speedup and the extra gains from figure reuse and GPU encoding. The dashed black line is ideal (zero-overhead) linear scaling.
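The normalisation amounts to dividing the baseline wall time by each run's wall time; the ideal line is simply speedup = number of workers. The helper names below are hypothetical and only illustrate the arithmetic. Because reuse and GPU encoding add gains on top of parallelism, the ratio against the ideal line can exceed 1.0.

```python
def speedup(baseline_seconds: float, run_seconds: float) -> float:
    """Speedup relative to the serial, no-reuse, CPU-encode baseline."""
    return baseline_seconds / run_seconds


def ideal_speedup(workers: int) -> float:
    """Zero-overhead linear scaling: the dashed black line."""
    return float(workers)


def fraction_of_ideal(baseline_seconds: float, run_seconds: float, workers: int) -> float:
    """1.0 means the point lies on the ideal line; >1.0 means above it."""
    return speedup(baseline_seconds, run_seconds) / ideal_speedup(workers)
```

With made-up timings (not measured results), a 120 s baseline and a 15 s run on 8 workers gives a speedup of 8.0, exactly on the ideal line.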
The left-most dark-blue point is serial processing with figure reuse. Points at 2+ workers show the parallel speedup; reuse curves (darker) sit above the no-reuse curves (lighter) because the expensive layout is built once per worker instead of once per frame.
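A toy cost model shows why the reuse curves sit above the no-reuse curves. It assumes each frame costs a fixed build time B (only when the layout is rebuilt) plus an update time U, and ignores encoding, inter-process and scheduling overhead; the function names and costs are illustrative, not measured. Under these assumptions the no-reuse time is ceil(F/n)·(B + U) and the reuse time is B + ceil(F/n)·U, so reuse wins whenever B > 0 and a worker handles more than one frame.

```python
import math


def modelled_time(frames: int, workers: int, build_cost: float,
                  update_cost: float, reuse: bool) -> float:
    """Idealised wall time of the busiest worker (no encode/IPC overhead)."""
    per_worker = math.ceil(frames / workers)  # frames handled by the busiest worker
    if reuse:
        return build_cost + per_worker * update_cost  # layout built once per worker
    return per_worker * (build_cost + update_cost)    # layout rebuilt every frame


def modelled_speedup(frames: int, workers: int, build_cost: float,
                     update_cost: float, reuse: bool) -> float:
    """Speedup against the serial, no-reuse baseline under this toy model."""
    baseline = modelled_time(frames, 1, build_cost, update_cost, reuse=False)
    return baseline / modelled_time(frames, workers, build_cost, update_cost, reuse)


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        no_reuse = modelled_speedup(100, n, 1.0, 1.0, reuse=False)
        with_reuse = modelled_speedup(100, n, 1.0, 1.0, reuse=True)
        print(f"{n:>2} workers: no reuse {no_reuse:5.2f}x, reuse {with_reuse:5.2f}x")
```

In this model even the one-worker reuse point lands above 1×, because reuse alone removes the per-frame layout cost.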