Litestream: real-time replication of SQLite to S3 (disaster recovery for small projects)

Backend Engineering Notes (后端工程纪要), official account @backend_jot, 2026-05-14 18:23

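Motivation

Why SQLite for personal and small projects:

Single file, zero ops
Fast enough (in WAL mode it can sustain on the order of 10k read QPS and 1k write QPS)
Backup = cp the file

The downside: if the single server dies, the service is down and everything written since the last backup is lost.

Litestream (by Ben Johnson) streams the SQLite WAL to S3, SFTP, or other storage
as it is written. It needs no application changes and replicates almost in real time (RPO measured in seconds).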

Install

# prebuilt binary
wget https://github.com/benbjohnson/litestream/releases/download/v0.3.13/litestream-v0.3.13-linux-amd64.tar.gz
tar -xf litestream-*.tar.gz
sudo mv litestream /usr/local/bin/

Configure

/etc/litestream.yml:

dbs:
  - path: /srv/myapp/db.sqlite3
    replicas:
      - type: s3
        bucket: my-backups
        path: myapp/db
        region: us-east-1
        access-key-id: ${AWS_ACCESS_KEY_ID}
        secret-access-key: ${AWS_SECRET_ACCESS_KEY}
        retention: 720h          # 30 days

Start it:

litestream replicate -config /etc/litestream.yml

systemd service

# /etc/systemd/system/litestream.service
[Unit]
Description=Litestream
After=network.target

[Service]
ExecStart=/usr/local/bin/litestream replicate -config /etc/litestream.yml
Restart=always
EnvironmentFile=/etc/litestream.env

[Install]
WantedBy=multi-user.target

sudo systemctl enable --now litestream

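How it works

In WAL mode, SQLite appends writes to the -wal file first and periodically checkpoints them into the main database file.

Litestream:

Takes a snapshot of the database as a baseline and uploads it to S3
Tails the WAL and uploads new frames incrementally (every second by default)
Takes over checkpointing and periodically uploads a fresh snapshot, so a restore has less WAL to replay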
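To make "tailing the WAL" more concrete, here is a minimal sketch that reads the header of a -wal file and counts how many frame slots the file holds. It follows the published SQLite WAL file format (32-byte header, 24-byte frame header before each page); the function names `inspect_wal` and `demo` are made up for illustration, and this is not how Litestream itself is implemented.

```python
import os
import sqlite3
import struct
import tempfile

WAL_HEADER_SIZE = 32        # bytes, fixed by the SQLite file format
WAL_FRAME_HEADER_SIZE = 24  # bytes, precedes every page image in the WAL
WAL_MAGICS = (0x377F0682, 0x377F0683)


def inspect_wal(wal_path):
    """Return page size, checkpoint sequence and frame slots of a -wal file."""
    with open(wal_path, "rb") as f:
        header = f.read(WAL_HEADER_SIZE)
    if len(header) < WAL_HEADER_SIZE:
        raise ValueError("WAL file is empty or truncated")
    # Eight big-endian 32-bit integers: magic, format version, page size,
    # checkpoint sequence, salt-1, salt-2, checksum-1, checksum-2.
    magic, _ver, page_size, ckpt_seq, _s1, _s2, _c1, _c2 = struct.unpack(">8I", header)
    if magic not in WAL_MAGICS:
        raise ValueError("not a SQLite WAL file")
    frame_size = WAL_FRAME_HEADER_SIZE + page_size
    # Counted from the file size, so this is an upper bound on valid frames.
    frame_slots = (os.path.getsize(wal_path) - WAL_HEADER_SIZE) // frame_size
    return {"page_size": page_size, "checkpoint_seq": ckpt_seq, "frame_slots": frame_slots}


def demo():
    """Create a throwaway WAL-mode database and inspect its -wal file."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "demo.sqlite3")
        conn = sqlite3.connect(db_path)
        conn.execute("PRAGMA journal_mode=WAL")
        conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
        conn.execute("INSERT INTO t (v) VALUES ('hello')")
        conn.commit()
        # Keep the connection open: closing the last one checkpoints
        # and removes the -wal file.
        info = inspect_wal(db_path + "-wal")
        page_size = conn.execute("PRAGMA page_size").fetchone()[0]
        conn.close()
        return info, page_size


if __name__ == "__main__":
    print(demo())
```

Every committed write adds frames to this file until the next checkpoint, which is why a process that only reads the WAL can replicate changes without touching the application.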

What ends up in S3 (simplified; real object names are grouped by generation and vary by version):

myapp/db/
  snapshots/
    20250314T100000.db.lz4
    20250320T100000.db.lz4
  wal/
    20250314T100000_000001.wal.lz4
    ...

You can restore to any point in time inside the retention window (snapshot + WAL replay).
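As a rough mental model of point-in-time restore (a toy, not Litestream's actual code): pick the newest snapshot taken at or before the target time, then replay every WAL segment written after that snapshot up to the target. The `Snapshot` and `WalSegment` types and the `plan_restore` function below are hypothetical names used only for this sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Snapshot:
    taken_at: datetime


@dataclass(frozen=True)
class WalSegment:
    written_at: datetime


def plan_restore(snapshots, wal_segments, target):
    """Pick the newest snapshot at or before target plus the WAL to replay on top."""
    candidates = [s for s in snapshots if s.taken_at <= target]
    if not candidates:
        raise ValueError("no snapshot at or before the target time")
    base = max(candidates, key=lambda s: s.taken_at)
    # Only segments written after the base snapshot and not past the target.
    replay = sorted(
        (w for w in wal_segments if base.taken_at < w.written_at <= target),
        key=lambda w: w.written_at,
    )
    return base, replay


if __name__ == "__main__":
    utc = timezone.utc
    snaps = [Snapshot(datetime(2025, 3, 14, 10, tzinfo=utc)),
             Snapshot(datetime(2025, 3, 20, 10, tzinfo=utc))]
    wal = [WalSegment(datetime(2025, 3, 14, 11, tzinfo=utc)),
           WalSegment(datetime(2025, 3, 16, 9, tzinfo=utc)),
           WalSegment(datetime(2025, 3, 21, 9, tzinfo=utc))]
    print(plan_restore(snaps, wal, datetime(2025, 3, 17, tzinfo=utc)))
```

The model also shows why more frequent snapshots shorten restores: the newer the base snapshot, the fewer WAL segments are left to replay.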

restore

# latest state
litestream restore -o /tmp/restored.db -config /etc/litestream.yml /srv/myapp/db.sqlite3

# point in time
litestream restore -o /tmp/restored.db \
    -timestamp '2025-03-14T10:00:00Z' \
    -config /etc/litestream.yml \
    /srv/myapp/db.sqlite3

RTO is on the order of 5 minutes, depending on database size.
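A restore is only useful if the result is a healthy database, so it can be worth scripting the restore together with a check. The sketch below builds the same `litestream restore` command line as above (only the `-o`, `-config` and `-timestamp` flags already shown) and then runs SQLite's `PRAGMA integrity_check` on the output. The function names are hypothetical, and the default config path is simply the one used in this post.

```python
import sqlite3
import subprocess


def build_restore_cmd(db_path, output_path,
                      config_path="/etc/litestream.yml", timestamp=None):
    """Build the argv for `litestream restore` using the flags shown above."""
    cmd = ["litestream", "restore", "-o", output_path, "-config", config_path]
    if timestamp is not None:
        cmd += ["-timestamp", timestamp]
    cmd.append(db_path)
    return cmd


def integrity_ok(sqlite_path):
    """Run SQLite's own consistency check on a database file."""
    conn = sqlite3.connect(sqlite_path)
    try:
        return conn.execute("PRAGMA integrity_check").fetchone()[0] == "ok"
    finally:
        conn.close()


def restore_and_verify(db_path, output_path, **kwargs):
    """Restore with the litestream binary on PATH, then verify the result."""
    subprocess.run(build_restore_cmd(db_path, output_path, **kwargs), check=True)
    return integrity_ok(output_path)
```

Running `restore_and_verify("/srv/myapp/db.sqlite3", "/tmp/restored.db")` from a scheduled job is one way to turn "we have backups" into "we have backups that restore".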

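Transparent to the application

Litestream does not change SQLite's behavior. The application keeps reading and writing db.sqlite3 as usual.
Litestream runs as a sidecar process that watches the WAL.

The application never needs to know Litestream exists, so adding it carries very little risk. The one interaction is locking: Litestream takes over checkpointing and briefly holds a lock while doing so, so the application should set a busy timeout.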
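On the application side, the only thing to get right is the connection setup. A minimal sketch, assuming Python's built-in `sqlite3` module: enable WAL mode, set a busy timeout so short lock waits do not surface as "database is locked" errors, and use `synchronous = NORMAL`. The 5000 ms value and the function name `open_app_db` are assumptions for illustration; tune them for your workload.

```python
import sqlite3


def open_app_db(path, busy_timeout_ms=5000):
    """Open a SQLite connection configured to coexist with a WAL-tailing sidecar."""
    conn = sqlite3.connect(path)
    # journal_mode is persistent: once set to WAL it sticks to the database file.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        conn.close()
        raise RuntimeError(f"could not enable WAL mode, got {mode!r}")
    # Wait up to busy_timeout_ms for a lock instead of failing immediately.
    conn.execute(f"PRAGMA busy_timeout = {int(busy_timeout_ms)}")
    # NORMAL is the usual durability/performance trade-off in WAL mode.
    conn.execute("PRAGMA synchronous = NORMAL")
    return conn
```

In a Django project the same PRAGMAs would typically be applied when a connection is created; how to hook that in depends on the Django version.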

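Performance impact

Litestream shares disk I/O with SQLite.
Small DB (< 1 GB) with moderate writes (< 100 writes/s): practically unnoticeable.
With high write throughput, the open question is whether Litestream's upload bandwidth can keep up.

Our production numbers:

1 GB SQLite
50 writes/s
Litestream + S3: CPU < 1%, network a few KB/s on average
WAL upload latency P99 < 2s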

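Not a replacement for HA

Litestream is disaster recovery (backup + restore), not HA.

HA requires:

Multiple servers / regions live at the same time
Automatic failover

SQLite + Litestream is a "primary/backup" setup: when the primary dies, a backup machine restores the database and starts the service, which means about 5 minutes of
downtime.

If you really need HA → Postgres with replication, or a managed DB.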

read replica

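Litestream 0.4+ has experimental read replicas (not available in the v0.3.13 binary installed above; check that your version supports the flag shown here):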

dbs:
  - path: /srv/myapp/db.sqlite3
    replicas:
      - type: s3
        bucket: ...

# on another server
litestream replicate \
    -read-only \
    -config replica.yml

It pulls from S3 and applies the WAL to a local read-only SQLite file.
That makes read scaling feasible (writes still go to the single primary).

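Cost

S3 storage: a 1 GB DB plus a month of WAL is a few GB → about $0.10/month.
S3 PUT: one WAL upload per second → 86,400 PUT/day × $0.005/1k ≈ $0.43/day, or about $13/month. That is the ceiling: Litestream only uploads when the WAL has changed, so a mostly idle database issues far fewer PUTs, and a longer sync-interval lowers the count further.

Total: a few dollars a month for a typical low-write project, well below an RDS db.t3.micro (about $15/month). A database that is written every second around the clock ends up close to that price.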
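The arithmetic is easy to redo for your own write pattern. A small sketch, assuming S3 Standard pricing of $0.023 per GB-month (my assumption, not from the post) and the $0.005 per 1,000 PUTs used above; `monthly_s3_cost` is a hypothetical helper name.

```python
def monthly_s3_cost(stored_gb, puts_per_day, days=30,
                    storage_usd_per_gb_month=0.023, put_usd_per_1k=0.005):
    """Estimate monthly S3 spend for replica storage plus PUT requests."""
    storage = stored_gb * storage_usd_per_gb_month
    puts = puts_per_day * days * put_usd_per_1k / 1000
    return {
        "storage": round(storage, 2),
        "puts": round(puts, 2),
        "total": round(storage + puts, 2),
    }


if __name__ == "__main__":
    # Worst case: one upload every second, all day.
    print(monthly_s3_cost(stored_gb=4, puts_per_day=86_400))
    # Assumed quiet project: the WAL changes in only 5% of all seconds.
    print(monthly_s3_cost(stored_gb=4, puts_per_day=86_400 * 0.05))
```

With one upload per second the PUT line alone comes to about $12.96 for a 30-day month, so request count, not storage, is what decides the bill.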

SFTP / GCS / Azure

The replica does not have to be S3:

replicas:
  - type: sftp
    host: backup.example.com
    user: backup
    path: /backups/myapp
    key-path: ~/.ssh/backup_key

Or GCS:

replicas:
  - type: gcs
    bucket: my-bucket
    path: myapp/db

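Compared with PG / MySQL

                  SQLite + Litestream                Postgres
Write QPS         thousands                          tens of thousands
Multiple readers  yes                                yes
Multiple writers  no (single writer)                 yes
HA                primary/backup (~5 min downtime)   streaming replica
Operations        minimal                            moderate
Backup            Litestream                         pg_dump / WAL-G

Simple app on a single server → SQLite + Litestream.
Multiple servers, high throughput, or mandatory HA → Postgres.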

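Real-world deployment

My personal projects / small SaaS (< 1k DAU):

VPS at $5/month (Hetzner)
Django + SQLite + Litestream → S3
nginx reverse proxy
Cloudflare CDN (free tier)

Total cost < $10/month.
db.sqlite3 and its WAL reach S3 automatically, with a fresh copy at most every 5 minutes (snapshot interval).
If the server blows up: new VPS + restore + deploy, back online in half an hour.
The SLA is not 99.99%, but 99% is easy to reach.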

Compared with cron + cp

cp db.sqlite3 /backup/db-$(date +%F).sqlite3

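Simple, but:

Long interval (daily) → RPO of one day
No PITR
A cp taken during a write can capture the database file and its -wal in an inconsistent state

Litestream gives second-level RPO, PITR, and WAL-consistent copies.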
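If a periodic copy is all you want, the consistency problem at least is avoidable: SQLite's online backup API copies a live database safely, including changes that are still only in the WAL. A minimal sketch using `sqlite3.Connection.backup` from the Python standard library; `snapshot_copy` is a hypothetical name. It fixes the inconsistency only, not the one-day RPO or the missing PITR.

```python
import sqlite3


def snapshot_copy(src_path, dest_path):
    """Copy a live SQLite database consistently, including uncheckpointed WAL content."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        # SQLite's online backup API: reads a consistent view of the source
        # even while other connections keep writing.
        src.backup(dest)
    finally:
        dest.close()
        src.close()
```

A cron job calling `snapshot_copy("/srv/myapp/db.sqlite3", "/backup/db.sqlite3")` is a safer drop-in for the plain cp above.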

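Monitoring

Litestream exposes Prometheus metrics once addr is set in the config (check the /metrics output for the exact metric names in your version):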

addr: ":9090"

litestream_replica_position_bytes
litestream_replica_last_sync_seconds

Alert when a replica has not synced for more than 60s.
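Without a full Prometheus stack, the same alert can be approximated by scraping the metrics endpoint directly. The sketch below parses the Prometheus text format and flags replicas whose value exceeds a threshold. It assumes the metric name listed above exists in your version and that its value means "seconds since the last successful sync"; both are assumptions to verify against the real /metrics output. The function names and the URL are illustrative, and the label parsing is simplified (it does not handle `}` inside label values).

```python
import re
import urllib.request

# name, optional {labels}, then the sample value
SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(\{[^}]*\})?\s+(\S+)')


def parse_metric(text, name):
    """Return {label_string: value} for every sample of `name` in Prometheus text format."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        m = SAMPLE_RE.match(line)
        if m and m.group(1) == name:
            samples[m.group(2) or ""] = float(m.group(3))
    return samples


def stale_replicas(text, name="litestream_replica_last_sync_seconds", max_age=60.0):
    """List label sets whose value exceeds max_age (assumed to be seconds since last sync)."""
    return sorted(labels for labels, age in parse_metric(text, name).items()
                  if age > max_age)


def fetch_metrics(url="http://localhost:9090/metrics", timeout=5):
    """Fetch the raw metrics page; the port matches the addr setting above."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

A cron job can call `stale_replicas(fetch_metrics())` and page someone when the list is non-empty. A failed fetch deserves an alert too, since it usually means the Litestream process is down.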

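Pitfalls

WAL not enabled: PRAGMA journal_mode=WAL; is mandatory. Without WAL mode Litestream
has nothing to tail.
Multi-process writes are awkward: SQLite restricts concurrent writers across processes.
Let only one process write, and let Litestream tail that WAL.
Database deleted: after a manual rm db.sqlite3 Litestream notices the deletion, but the data is still
in S3, so a restore brings it back. Be careful, though, that an accidental --full-resync does not overwrite the backup.
Restore is not fast: a large DB (10 GB+) takes several minutes to restore (download the snapshot,
then replay the WAL). RTO is not instant.
Missing monitoring: if the Litestream process dies, the application keeps running and nobody notices, so
backups stop silently. Use systemd Restart=always plus Prometheus monitoring.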
