hyperfine: a replacement for time in command-line benchmarking (automatic statistics + multi-command comparison)

Motivation

I wanted to know which of three compression setups is fastest: gzip -9, zstd -19, or xz -9.
The traditional approach:

time gzip -9 < big.txt > /dev/null
# real 0m4.231s
time zstd -19 < big.txt > /dev/null
# real 0m3.876s

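A single run is not trustworthy (system jitter alone is 5-10%). Doing it properly means
running N times, averaging, and computing the standard deviation, which is tedious by hand.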
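
To give a sense of the bookkeeping involved, here is a minimal sketch of the statistics step alone, assuming the wall-clock times have already been collected one per line. The function name mean_stddev and the five timings are made up for illustration.

```bash
# Read one wall-clock time (in seconds) per line on stdin and
# print "mean stddev n", using the sample standard deviation (n - 1).
mean_stddev() {
  awk '{ sum += $1; sumsq += $1 * $1; n++ }
       END {
         mean = sum / n
         var = (n > 1) ? (sumsq - n * mean * mean) / (n - 1) : 0
         if (var < 0) var = 0   # guard against tiny negative rounding errors
         printf "%.3f %.3f %d\n", mean, sqrt(var), n
       }'
}

# Hypothetical timings from five manual runs of the same command.
printf '%s\n' 4.231 4.102 4.298 4.187 4.150 | mean_stddev
```

This still leaves out collecting the timings, warmup, and comparing commands, all of which hyperfine does for you.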

hyperfine is a benchmarking tool written in Rust that handles warmup, repeated runs,
statistics, and multi-command comparison automatically.

Installation

# Debian / Ubuntu
sudo apt install hyperfine

# or via Homebrew / Cargo
brew install hyperfine
cargo install hyperfine

hyperfine --version

Benchmarking a single command

hyperfine 'gzip -9 < big.txt > /dev/null'

Output:

Benchmark 1: gzip -9 < big.txt > /dev/null
  Time (mean ± σ):      4.187 s ±  0.082 s    [User: 4.135 s, System: 0.041 s]
  Range (min … max):    4.073 s …  4.298 s    10 runs

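By default hyperfine performs at least 10 runs and keeps measuring for at least 3 seconds, so fast commands get many more runs. It reports the mean, standard deviation, and range.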

Comparing multiple commands (the most useful mode)

hyperfine \
  'gzip -9 < big.txt > /dev/null' \
  'zstd -19 < big.txt > /dev/null' \
  'xz -9 < big.txt > /dev/null'

Output:

Benchmark 1: gzip -9 < big.txt > /dev/null
  Time (mean ± σ):      4.187 s ±  0.082 s

Benchmark 2: zstd -19 < big.txt > /dev/null
  Time (mean ± σ):      3.421 s ±  0.064 s

Benchmark 3: xz -9 < big.txt > /dev/null
  Time (mean ± σ):     15.842 s ±  0.214 s

Summary
  'zstd -19 < big.txt > /dev/null' ran
    1.22 ± 0.03 times faster than 'gzip -9 < big.txt > /dev/null'
    4.63 ± 0.10 times faster than 'xz -9 < big.txt > /dev/null'

The Summary at the end states the speed ratios directly.
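
The ratio is simply one mean divided by the other. The ± next to it is consistent with ordinary propagation of uncertainty for a quotient, which is my assumption about how it is derived. The sketch below reproduces the gzip line of the Summary from the means and standard deviations printed above; relative_speed is a hypothetical helper, not part of hyperfine.

```bash
# relative_speed MEAN_SLOW SD_SLOW MEAN_FAST SD_FAST
# Prints "ratio uncertainty", where
#   ratio       = MEAN_SLOW / MEAN_FAST
#   uncertainty = ratio * sqrt((SD_SLOW/MEAN_SLOW)^2 + (SD_FAST/MEAN_FAST)^2)
relative_speed() {
  awk -v a="$1" -v sa="$2" -v b="$3" -v sb="$4" 'BEGIN {
    ratio = a / b
    err = ratio * sqrt((sa / a) ^ 2 + (sb / b) ^ 2)
    printf "%.2f %.2f\n", ratio, err
  }'
}

# gzip -9 vs zstd -19, using the figures from the output above.
relative_speed 4.187 0.082 3.421 0.064   # prints "1.22 0.03"
```

Reading the ratio together with its uncertainty matters: a result like 1.02 ± 0.05 does not show a real difference.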

Parameter scans

hyperfine --parameter-list level 1,3,5,9,19 \
  'zstd -{level} < big.txt > /dev/null'

This runs five benchmarks (level = 1, 3, 5, 9, 19) and ends with a summary of relative speeds.

Numeric ranges:

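hyperfine --prepare 'cargo clean' --parameter-scan threads 1 16 \
  'cargo build --jobs {threads}'

This runs one benchmark (with the usual repeated runs) for each of threads = 1, 2, 3, ..., 16. The cargo clean in --prepare forces a full rebuild on every run; without it, every run after the first is an incremental no-op.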

Warmup

hyperfine --warmup 3 'some-command'

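This runs the command 3 times without counting those runs, so the OS page cache is warm and any JIT has done its work.
Essential for commands that touch the disk or have a cold-start cost.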

Prepare / cleanup hooks

hyperfine \
  --prepare 'sync && echo 3 | sudo tee /proc/sys/vm/drop_caches' \
  --cleanup 'rm /tmp/out' \
  'cp big.file /tmp/out'

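--prepare runs before every timing run (here it drops the page cache to simulate a cold start).
--cleanup runs once, after all timing runs of a command have finished, not after each run. Newer releases also offer --conclude as a per-run hook; check hyperfine --help on your version.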

Exporting raw data

hyperfine --export-json bench.json --export-markdown bench.md cmd1 cmd2

# or CSV
hyperfine --export-csv bench.csv cmd1 cmd2

JSON is for analysis scripts; Markdown can be pasted straight into a blog post or PR description.
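
As an illustration of the scripting side, the sketch below lists each result's mean and standard deviation, fastest first. It assumes jq is installed and that the export has a top-level results array whose entries carry command, mean, and stddev fields. The sample file is hand-written to mimic that shape, and summarize_results is a hypothetical helper.

```bash
# Hand-written sample in the shape of hyperfine's --export-json output
# (real files carry more fields per result, such as the individual run times).
cat > bench.json <<'EOF'
{
  "results": [
    { "command": "gzip -9 < big.txt > /dev/null", "mean": 4.187, "stddev": 0.082 },
    { "command": "zstd -19 < big.txt > /dev/null", "mean": 3.421, "stddev": 0.064 }
  ]
}
EOF

# Print "command<TAB>mean<TAB>stddev" for each result, sorted by mean (fastest first).
summarize_results() {
  jq -r '.results | sort_by(.mean) | .[] | [.command, .mean, .stddev] | @tsv' "$1"
}

summarize_results bench.json
```

Sorting on the mean in the script, rather than relying on the order of commands on the command line, keeps the report stable when benchmarks are added or reordered.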

Naming commands

Raw command strings make the output hard to read, so give each benchmark a name:

hyperfine \
  -n 'old algorithm' 'old_cmd args' \
  -n 'new algorithm' 'new_cmd args'

Summary
  'new algorithm' ran
    1.34 ± 0.05 times faster than 'old algorithm'

Pasting this summary into a PR is about the strongest evidence of a performance improvement you can offer.

Running benchmarks in CI

# .github/workflows/bench.yml
- name: Benchmark
  run: |
    # Build once up front; timing 'cargo build' alongside the binaries would pollute the summary.
    cargo build --release
    hyperfine --warmup 3 --runs 20 \
      --export-markdown bench.md \
      './before-binary' \
      './after-binary'

- name: Comment PR
  uses: peter-evans/create-or-update-comment@v4
  with:
    issue-number: ${{ github.event.pull_request.number }}
    body-path: bench.md

Every PR gets benchmarked automatically, with the results posted as a comment.

Comparison with perf / time / Python timeit

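time: crude, single run
/usr/bin/time -v: detailed but still a single run; repeating and aggregating is up to you
Python timeit: Python only, at the statement / function level
hyperfine: repeated runs of a black-box command, plus statistics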

They solve different problems:

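Tool	Good for
hyperfine	"Did my script get faster?" / "Which of two commands is faster?"
perf	"Which line in function X is slow?"
flamegraph	"Where does the whole program spend its time?"
timeit	"How fast is this Python expression?"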

Advanced: --show-output

By default hyperfine discards the command's stdout/stderr (redirected to /dev/null) so that output I/O does not skew the timing.
Turn output back on when debugging:

hyperfine --show-output 'cmd'

Limiting total run time

hyperfine --time-unit second --warmup 5 --min-runs 10 --max-runs 100 \
  'some-command'

--min-runs guarantees enough samples for meaningful statistics; --max-runs keeps a slow command from blowing up the total run time.

Results

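Settles an "is A or B faster" argument quickly, with ± error bars
A hyperfine summary in a PR is far more persuasive than saying "it's a bit faster"
With CI integration, performance regressions get caught automatically
Team decisions (algorithm / library / build tool) rest on objective data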

Pitfalls

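The command itself is extremely fast (< 1 ms): hyperfine's process-spawning overhead is larger than the command.
-N (--shell=none) removes the shell's share of it; for nanosecond-scale code use criterion (a Rust library) or Python's timeit.
The first run of an I/O command is noticeably slower: the page cache is cold. Use --warmup 3 or the
drop_caches --prepare command shown earlier, depending on whether you want to measure the warm or the cold case.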
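Shell expansion in the wrong place:
```bash
# ❌ Unquoted: your own shell expands *.txt and applies the redirect before hyperfine
#    starts, so hyperfine receives "cat" and each file name as separate commands
hyperfine cat *.txt > /dev/null

# ✅ Quote the whole command so the glob is expanded at run time by the shell hyperfine spawns.
#    The explicit sh -c only matters with -N / --shell=none; by default hyperfine already uses sh.
hyperfine 'sh -c "cat *.txt > /dev/null"'
```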

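CI runners are noisy: shared runners are affected by other jobs. Use a dedicated self-hosted
runner, or measure a baseline in the same job and compare relative differences rather than absolute numbers.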
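Failures inside --prepare can go unnoticed: hyperfine aborts when the prepare command exits non-zero, but in a
compound command such as 'a; b' only the last exit status counts, so a failing a is masked. Chain the steps with && or start the prepare command with set -e.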
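
For the noisy-runner pitfall above, a relative check against a baseline measured in the same job could look like the sketch below. The helper check_regression, the 5% threshold, and the two means are all hypothetical; in practice the means would come from the mean fields of the --export-json output.

```bash
# check_regression BASELINE_MEAN CANDIDATE_MEAN MAX_SLOWDOWN_PERCENT
# Exit status 0 if the candidate is within the allowed slowdown, 1 otherwise.
check_regression() {
  awk -v base="$1" -v cand="$2" -v limit="$3" 'BEGIN {
    change = (cand - base) / base * 100
    printf "change: %+.1f%% (limit +%s%%)\n", change, limit
    exit (change > limit ? 1 : 0)
  }'
}

# Hypothetical means in seconds for the baseline and the PR build.
if check_regression 1.340 1.372 5; then
  echo "within budget"
else
  echo "performance regression" >&2
fi
```

Pick a threshold wider than the run-to-run noise you actually observe on the runner, otherwise the gate fails on jitter alone.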