hyperfine: a benchmarking tool for CLI commands

Motivation

I want to know which command is faster:

find . -name '*.py' vs fd -e py
cat file | grep foo vs rg foo file
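two different build tools

The old way, time cmd, runs the command only once, so the result is noisy and tells you little.

hyperfine (written in Rust) is built for benchmarking: it runs the command repeatedly, supports warmup runs, computes statistics, compares commands, and prints clean output.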

Install

brew install hyperfine
cargo install hyperfine

Basic usage

$ hyperfine 'find . -name "*.py"'

Benchmark 1: find . -name "*.py"
  Time (mean ± σ):     345.2 ms ±  12.3 ms    [User: 90.1 ms, System: 245.6 ms]
  Range (min … max):   330.1 ms … 365.8 ms    10 runs

hyperfine picks the number of runs automatically (at least 10 by default) and reports mean, σ, and range.
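To make the summary line concrete, here is a minimal Python sketch of the same statistics computed from a list of wall-clock times. The timings are made up for illustration, and using the sample standard deviation (n - 1) is my assumption, not something checked against hyperfine's source.

```python
import statistics

def summarize(times_s):
    """Return mean, standard deviation, min, max and run count for run times in seconds."""
    return {
        "mean": statistics.mean(times_s),
        "stddev": statistics.stdev(times_s),  # sample stddev (assumption)
        "min": min(times_s),
        "max": max(times_s),
        "runs": len(times_s),
    }

# Hypothetical timings for 10 runs, in seconds.
times = [0.330, 0.338, 0.341, 0.343, 0.345, 0.346, 0.349, 0.352, 0.358, 0.366]
s = summarize(times)
print(f"Time (mean ± σ): {s['mean'] * 1000:.1f} ms ± {s['stddev'] * 1000:.1f} ms")
print(f"Range (min … max): {s['min'] * 1000:.1f} ms … {s['max'] * 1000:.1f} ms    {s['runs']} runs")
```

A single time cmd measurement is one draw from this distribution, which is why it can land anywhere inside the range.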

Comparing two commands

$ hyperfine 'find . -name "*.py"' 'fd -e py'

Benchmark 1: find . -name "*.py"
  Time (mean ± σ):     345.2 ms ±  12.3 ms

Benchmark 2: fd -e py
  Time (mean ± σ):      62.1 ms ±   3.4 ms

Summary
  'fd -e py' ran
    5.56 ± 0.36 times faster than 'find . -name "*.py"'

The summary shows at a glance that fd is about 5.6x faster than find here.
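The "5.56 ± 0.36" in the summary can be reproduced from the two mean ± σ lines with first-order error propagation for a ratio. The sketch below is my own reconstruction: it matches the numbers shown above, but that hyperfine computes it exactly this way is an assumption.

```python
import math

def relative_speed(mean_slow, sd_slow, mean_fast, sd_fast):
    """Ratio mean_slow / mean_fast with first-order propagated uncertainty."""
    ratio = mean_slow / mean_fast
    # Relative errors of the two means add in quadrature for a quotient.
    rel_err = math.sqrt((sd_slow / mean_slow) ** 2 + (sd_fast / mean_fast) ** 2)
    return ratio, ratio * rel_err

# Values in ms, taken from the find vs fd output above.
ratio, err = relative_speed(345.2, 12.3, 62.1, 3.4)
print(f"{ratio:.2f} ± {err:.2f} times faster")
```

The uncertainty matters when two commands are close: a ratio of 1.05 ± 0.10 does not show a real difference.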

warmup

The first run may hit a cold cache:

hyperfine --warmup 3 'cmd'

This does 3 untimed warmup runs, then the timed runs (at least 10 by default).
It keeps disk cache misses from skewing the result.

prepare / cleanup

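hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' 'cmd'

--prepare runs before every timed run; here it drops the page cache so each run measures the cold path (writing to drop_caches needs root, hence sudo tee).
--cleanup runs once after all runs of a command have finished, e.g. to delete an output file:

hyperfine --cleanup 'rm output.txt' 'process input > output.txt'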

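Parameter sweeps

hyperfine --parameter-list n 1,2,5 'sleep 0.{n}'

This benchmarks sleep 0.1, sleep 0.2, and sleep 0.5 as separate commands and prints a summary comparing them. (Values such as 100,1000,10000 would expand to 0.100, 0.1000, 0.10000, which are all the same 0.1 s.)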

Example: testing different thread counts:

hyperfine --parameter-scan threads 1 8 'make -j{threads}'

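This runs make with 1 through 8 jobs so you can spot the sweet spot. Add --prepare 'make clean' so every run is a full build rather than an incremental one.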
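As a rough illustration of what a scan expands to, the sketch below substitutes each value into the command template. The helper name expand_scan is hypothetical and only mimics the placeholder substitution; it does not run anything.

```python
def expand_scan(template, name, start, stop, step=1):
    """Substitute {name} in the template for each value from start to stop, inclusive."""
    placeholder = "{" + name + "}"
    return [template.replace(placeholder, str(v)) for v in range(start, stop + 1, step)]

commands = expand_scan("make -j{threads}", "threads", 1, 8)
for cmd in commands:
    print(cmd)
```

Each expanded command is then benchmarked as its own entry, which is why the run time of a scan grows with the number of values.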

Exporting data

hyperfine 'cmd1' 'cmd2' --export-markdown result.md
hyperfine 'cmd1' 'cmd2' --export-json result.json
hyperfine 'cmd1' 'cmd2' --export-csv result.csv

The Markdown table can be pasted straight into a PR to compare before and after.
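The JSON export is convenient for custom post-processing. The sketch below builds a small Markdown table from it. The sample is hand-written to resemble an export (a results list with command, mean, stddev, min, max in seconds); the values are illustrative, not real measurements.

```python
import json

# Hand-written sample shaped like a --export-json file; values are illustrative.
SAMPLE = """
{"results": [
  {"command": "cmd1", "mean": 0.3452, "stddev": 0.0123, "min": 0.3301, "max": 0.3658},
  {"command": "cmd2", "mean": 0.0621, "stddev": 0.0034, "min": 0.0580, "max": 0.0690}
]}
"""

def markdown_table(export):
    """Render one Markdown row per benchmarked command, with times in ms."""
    rows = ["| Command | Mean [ms] | σ [ms] |", "|:---|---:|---:|"]
    for r in export["results"]:
        rows.append(f"| `{r['command']}` | {r['mean'] * 1000:.1f} | {r['stddev'] * 1000:.1f} |")
    return "\n".join(rows)

table = markdown_table(json.loads(SAMPLE))
print(table)
```

In practice you would replace SAMPLE with the contents of result.json.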

Real case 1: speeding up CI tests

Trying different pytest options to speed up CI (-n comes from the pytest-xdist plugin):

hyperfine --warmup 1 \
    'pytest -p no:cacheprovider' \
    'pytest' \
    'pytest -n 4' \
    'pytest -n auto'

Command                      Mean
pytest (no cache)            45 s
pytest (cache)               38 s
pytest -n 4                  14 s
pytest -n auto (8 cores)      9 s

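Adding -n auto is roughly a 4x speedup over plain pytest (38 s to 9 s) and 5x over the no-cache run: a one-line change in the CI YAML with a big payoff.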

Real case 2: comparing build tools

hyperfine --warmup 2 --prepare 'rm -rf node_modules dist' \
    'npm install && npm run build' \
    'pnpm install && pnpm build' \
    'bun install && bun run build'

Result: "for this project, pnpm is 2x faster than npm."
Measuring beats going by impression.

ignore failure

hyperfine --ignore-failure 'might_fail_cmd'

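Some tools fail occasionally, but you still want to benchmark them. With --ignore-failure, hyperfine keeps going instead of aborting on a non-zero exit code. Note that the failed runs are still included in the timings.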

Visualizing data

hyperfine 'cmd1' 'cmd2' --export-json result.json
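python scripts/plot_histogram.py result.json   # script from a checkout of the hyperfine repository

A histogram shows the distribution, not just the mean.
It reveals long tails such as "usually 1 s, occasionally 10 s".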
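If you only need a quick look without plotting, a text histogram over the per-run times is enough to spot a long tail. This is a minimal sketch with made-up timings; in a real export the per-run times would come from the times array of each result.

```python
from collections import Counter

def text_histogram(times_s, bin_width_s=1.0):
    """Count run times per bin; bins are centered on multiples of bin_width_s."""
    counts = Counter(round(t / bin_width_s) * bin_width_s for t in times_s)
    return dict(sorted(counts.items()))

# Hypothetical run times: mostly around 1 s, with two slow outliers.
times = [1.02, 0.98, 1.05, 1.01, 0.99, 1.03, 10.2, 1.00, 0.97, 9.8]
hist = text_histogram(times)
for center, count in hist.items():
    print(f"~{center:5.1f} s | {'#' * count}")
```

With these numbers the mean is about 2.8 s, a value that no single run came close to, which is exactly what the mean alone hides.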

More runs for a tighter estimate

hyperfine --runs 100 'cmd'

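With 100 runs the estimate of the mean tightens (its standard error shrinks roughly as 1/√n) and σ becomes more reliable.
Downside: impractical for slow commands.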
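A quick way to see the payoff of more runs is the standard error of the mean, σ/√n. The sketch below assumes independent runs and reuses σ = 12.3 ms from the find example purely as an illustration.

```python
import math

def standard_error(stddev, runs):
    """Standard error of the mean for independent runs with the given stddev."""
    return stddev / math.sqrt(runs)

for runs in (10, 100, 1000):
    print(f"{runs:5d} runs -> standard error {standard_error(12.3, runs):.2f} ms")
```

Going from 10 to 100 runs cuts the uncertainty by a factor of about 3.2 at 10x the cost, so the returns diminish quickly.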

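Compared with perf / criterion

perf stat cmd: Linux performance counters (cache misses, branch misses, etc.); goes deeper
criterion (Rust library): microbenchmarks at the function level
hyperfine: comparisons at the command level

hyperfine is the Swiss Army knife for comparing tools and shell commands.
For deep profiling, use perf.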

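Compared with ab / wrk

ab / wrk measure HTTP servers (concurrency and requests per second).
hyperfine measures the time of a single invocation.

# Measure server response under load (use wrk for this)
wrk -t 4 -c 100 -d 30s http://localhost:8000/

# Measure how long a command takes
hyperfine 'curl http://localhost:8000/'

Different tools for different purposes.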

My usual aliases

alias bench='hyperfine --warmup 2'
alias bench-cold='hyperfine --warmup 0 --prepare "sync; echo 3 | sudo tee /proc/sys/vm/drop_caches"'

Measuring the effect of an optimization:

git stash         # back to the old code
bench 'cmd'

git stash pop     # restore the new code
bench 'cmd'

Or export both runs and compare the files:

git stash
hyperfine 'cmd' --export-json before.json
git stash pop
hyperfine 'cmd' --export-json after.json
hyperfine-compare before.json after.json   # a comparison script of your own; not a command shipped with hyperfine
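The comparison step can be a few lines of Python. The sketch below is a hypothetical stand-in for such a script: it assumes each export holds a results list whose first entry has a mean in seconds, and the before/after values are illustrative.

```python
import json

def load_first_result(path):
    """Read the first benchmark entry from a JSON export file."""
    with open(path) as f:
        return json.load(f)["results"][0]

def compare(before, after):
    """Return the speedup (before / after) and the percent change in mean time."""
    speedup = before["mean"] / after["mean"]
    change_pct = (after["mean"] - before["mean"]) / before["mean"] * 100
    return speedup, change_pct

# Illustrative values; in practice use load_first_result("before.json") and
# load_first_result("after.json") instead.
before = {"command": "cmd", "mean": 2.0, "stddev": 0.05}
after = {"command": "cmd", "mean": 1.6, "stddev": 0.04}
speedup, change_pct = compare(before, after)
print(f"{speedup:.2f}x speedup, {change_pct:+.1f}% mean time")
```

Check the two σ values before trusting a small difference: if the change is smaller than the spread, rerun with more runs.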

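Pitfalls I've hit

Large stdout: a command that prints several MB makes output handling part of the measurement. hyperfine sends the command's output to /dev/null by default; with --show-output, add > /dev/null to the command yourself.
Shell startup overhead: hyperfine runs commands through a shell, and spawning it costs a few ms on its own, so sub-millisecond commands are measured poorly. -N (--shell=none) runs the command without a shell.
Multi-core interference: other processes running during the benchmark make the numbers drift. Pin to one core with taskset 0x1 hyperfine ... for isolation.
Repeated build work: the second build is incremental, so it is faster than the first. Use --prepare 'rm -rf build/' to keep the comparison fair.
System noise: disk encryption, antivirus, Time Machine, and other background activity all interfere. Do many runs and watch σ.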