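Full-Text Search Backends: Elasticsearch vs Meilisearch vs Quickwit

Motivation

We need to add an "article search" feature to a web app. SQL LIKE '%word%' is slow (a leading wildcard cannot use a regular B-tree index) and offers no fuzzy matching.
PostgreSQL's tsvector and SQLite's FTS5 are usable, but their Chinese word segmentation and advanced query support are weak out of the box.
So which dedicated search engine should we pick?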
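The segmentation weakness is easy to reproduce. The sketch below uses Python's built-in sqlite3 module and assumes the bundled SQLite was compiled with FTS5 (true for most builds, but not guaranteed). The table and helper names are made up for illustration. With the default tokenizer, a run of Chinese characters is indexed as one token, so a search for a word inside that run finds nothing, while LIKE still matches by scanning.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory FTS5 table holding one article (illustrative data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE articles USING fts5(title, body)")
    conn.execute(
        "INSERT INTO articles (title, body) VALUES (?, ?)",
        ("PostgreSQL 全文搜索", "tsvector ts_rank"),
    )
    conn.commit()
    return conn


def fts_titles(conn: sqlite3.Connection, phrase: str) -> list:
    """Full-text lookup; the phrase is quoted so it is treated as literal text."""
    fts_query = '"' + phrase.replace('"', '""') + '"'
    rows = conn.execute(
        "SELECT title FROM articles WHERE articles MATCH ? ORDER BY rank",
        (fts_query,),
    )
    return [row[0] for row in rows]


def like_titles(conn: sqlite3.Connection, word: str) -> list:
    """Substring lookup with LIKE; always a full scan."""
    rows = conn.execute(
        "SELECT title FROM articles WHERE title LIKE ?", (f"%{word}%",)
    )
    return [row[0] for row in rows]
```

Calling fts_titles(conn, "全文搜索") finds the article because the whole run is a single token, whereas fts_titles(conn, "搜索") is expected to come back empty even though like_titles(conn, "搜索") matches.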

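The three mainstream options:

Elasticsearch: the de facto industry standard; the most features and the heaviest footprint
Meilisearch: a modern, lightweight engine written in Rust that works out of the box
Quickwit: a Rust engine built for log search and designed around object storage

A comparison follows.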

Elasticsearch (8.x)

The established heavyweight, built on Lucene.

Install

docker run -d -p 9200:9200 \
  -e "discovery.type=single-node" \
  -e "xpack.security.enabled=false" \
  -m 2g \
  docker.elastic.co/elasticsearch/elasticsearch:8.15.0

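Give the container about 2 GB (-m 2g). Elasticsearch 8.x sizes the JVM heap automatically from the node's roles and the memory it can see, and it needs further room on top of the heap for JVM overhead and the file-system cache, so 2 GB of RAM is a sensible floor.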

Indexing and querying

from elasticsearch import Elasticsearch

es = Elasticsearch('http://localhost:9200')

# Index a document; refresh='wait_for' makes it visible to the search below
es.index(index='articles', id='1', refresh='wait_for', document={
    'title': 'PostgreSQL 全文搜索',
    'body': 'tsvector ts_rank ...',
    'tags': ['db', 'pg'],
    'created_at': '2024-05-24',
})

# Search
res = es.search(index='articles', query={
    'bool': {
        'must': [
            {'match': {'body': 'tsvector 全文'}},
        ],
        'filter': [
            {'term': {'tags': 'pg'}},
            {'range': {'created_at': {'gte': '2024-01-01'}}},
        ],
    },
})
for hit in res['hits']['hits']:
    print(hit['_score'], hit['_source']['title'])
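In a real view the filters are usually optional. As a minimal sketch, the helper below assembles the same bool query from optional arguments. The function name and its parameters are hypothetical; the field names (body, tags, created_at) are the ones used in the example above.

```python
from typing import Any, Dict, List, Optional


def build_article_query(
    text: str,
    tag: Optional[str] = None,
    since: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a bool query: scored full-text match plus optional unscored filters."""
    filters: List[Dict[str, Any]] = []
    if tag is not None:
        # term = exact match, evaluated in filter context (no scoring, cacheable)
        filters.append({"term": {"tags": tag}})
    if since is not None:
        filters.append({"range": {"created_at": {"gte": since}}})

    bool_query: Dict[str, Any] = {"must": [{"match": {"body": text}}]}
    if filters:
        bool_query["filter"] = filters
    return {"bool": bool_query}
```

The result can be passed straight to the client, for example es.search(index='articles', query=build_article_query('tsvector 全文', tag='pg', since='2024-01-01')).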

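Strengths

Very complete feature set: aggregations, complex queries, geo search, vector search
Proven at large scale (PB-class clusters)
Broad ecosystem (Logstash, Kibana, Beats)
Chinese word segmentation via the IK plugin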
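To make use of IK, the index needs an explicit mapping. The sketch below assumes the IK analysis plugin is already installed on every node. The analyzer names ik_max_word and ik_smart are the ones the plugin registers, while the helper names and the choice of fields are illustrative only.

```python
from typing import Any, Dict


def article_mappings(use_ik: bool = True) -> Dict[str, Any]:
    """Mappings for the articles index; falls back to the standard analyzer."""
    text_field: Dict[str, Any] = {"type": "text"}
    if use_ik:
        # Fine-grained segmentation at index time, coarser at query time
        text_field = {
            "type": "text",
            "analyzer": "ik_max_word",
            "search_analyzer": "ik_smart",
        }
    return {
        "properties": {
            "title": dict(text_field),
            "body": dict(text_field),
            "tags": {"type": "keyword"},      # exact-match filtering
            "created_at": {"type": "date"},   # range filtering
        }
    }


def create_articles_index(es: Any, name: str = "articles") -> None:
    """Create the index with the mappings above (es is an Elasticsearch client)."""
    es.indices.create(index=name, mappings=article_mappings())
```

Declaring tags as keyword also makes the term filter from the query example behave as an exact match rather than relying on dynamic mapping.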

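Weaknesses

Resource hungry (2 GB RAM at minimum; 8-32 GB per node in production)
Cluster operations are complex
The API surface is huge and the learning curve is steep
In 2021 Elastic moved from Apache 2.0 to a dual SSPL / Elastic License model, which cloud vendors disliked; AWS forked the project as OpenSearch

Best for: logs plus complex search, or teams that already have Elasticsearch experience.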

Meilisearch (1.x)

A newer entrant, under development since 2018. Written in Rust and optimized for in-product search.

Install

docker run -d -p 7700:7700 \
  -e MEILI_MASTER_KEY=your-key \
  -v meili-data:/meili_data \
  getmeili/meilisearch:v1.10

The image is about 200 MB and the server starts in a few seconds.

Indexing and querying

import meilisearch

client = meilisearch.Client('http://localhost:7700', 'your-key')
index = client.index('articles')

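# Index documents. add_documents only enqueues a task, so wait for it to finish.
task = index.add_documents([
    {'id': 1, 'title': 'PostgreSQL 全文搜索', 'body': '...', 'tags': ['db']},
    {'id': 2, 'title': 'Elasticsearch 入门', 'body': '...', 'tags': ['search']},
])
client.wait_for_task(task.task_uid)

# Default configuration: every field is searchable
results = index.search('全文搜索')
# {
#   "hits": [{"id": 1, "title": "...", ...}],
#   "processingTimeMs": 3
# }

# Filtering and sorting require the attributes to be declared first.
# Settings updates are asynchronous tasks too.
task = index.update_filterable_attributes(['tags'])
client.wait_for_task(task.task_uid)
task = index.update_sortable_attributes(['created_at'])
client.wait_for_task(task.task_uid)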

results = index.search('搜索', {
    'filter': 'tags = db',
    'sort': ['created_at:desc'],
    'limit': 20,
})
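Filter expressions are plain strings, so building them from user input needs some care with quoting. The helper below is a small sketch, not part of the Meilisearch client. The function names are hypothetical, and it assumes created_at is stored as a Unix timestamp and has been declared filterable alongside tags.

```python
from typing import Iterable, Optional


def quote(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_filter(
    tags: Optional[Iterable[str]] = None,
    created_after: Optional[int] = None,
) -> str:
    """Combine optional conditions into one filter expression joined by AND."""
    clauses = []
    tag_list = list(tags or [])
    if tag_list:
        clauses.append("tags IN [" + ", ".join(quote(t) for t in tag_list) + "]")
    if created_after is not None:
        clauses.append(f"created_at >= {int(created_after)}")
    return " AND ".join(clauses)
```

For example, index.search('搜索', {'filter': build_filter(['db', 'pg'], 1704067200)}) sends the filter tags IN ["db", "pg"] AND created_at >= 1704067200.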

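Strengths

Zero configuration to get started (typo tolerance and search-as-you-type are on by default)
Very fast: P99 under 50 ms on millions of documents
Minimal API
Low resource usage (about 150 MB RAM for 100,000 documents)
Built-in search preview UI at http://localhost:7700 (a mini-dashboard for trying queries, served in development mode; not a full admin console)
Handles Chinese, Japanese and other Asian languages (tokenized automatically)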

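Weaknesses

Complex queries are far less expressive than in Elasticsearch
Clustering is a Cloud-only feature in v1 (self-hosted runs as a single node)
Comparatively small ecosystem
Not suited to large log volumes (no time-series optimizations)

Best for: the search box of an e-commerce, documentation, blog, or content site.
Compared with ES: for roughly 90% of in-product search use cases, Meilisearch is the lower-maintenance choice.

Typo tolerance and synonyms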

index.update_typo_tolerance({
    'enabled': True,
    'minWordSizeForTypos': {'oneTypo': 4, 'twoTypos': 8},
})

index.update_synonyms({
    'js': ['javascript'],
    'k8s': ['kubernetes'],
})

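Typing "javasrcipt" still finds "javascript", and searching "js" also matches "javascript". The synonym mapping above is one-way; add the reverse entries if "javascript" should also match "js".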
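The two thresholds in minWordSizeForTypos decide how many typos a query word may contain based on its length. The function below is a sketch of that rule only, not Meilisearch's matching code. Its name is hypothetical and its defaults mirror the settings above (4 and 8).

```python
def allowed_typos(word: str, one_typo: int = 4, two_typos: int = 8) -> int:
    """Return how many typos are tolerated for a query word of this length."""
    length = len(word)
    if length >= two_typos:
        return 2
    if length >= one_typo:
        return 1
    return 0  # short words must match exactly
```

With these settings "js" must match exactly, "rust" tolerates one typo, and a ten-letter word such as "javasrcipt" tolerates two.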

Quickwit

A modern backend optimized for log search, built on tantivy (a Lucene-inspired full-text search library written in Rust).

Install

docker run -d -p 7280:7280 -p 7281:7281 \
  quickwit/quickwit:v0.8 run

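Design philosophy

Indexes live in object storage (S3, GCS, or a local disk), so there is no dependency on local SSDs
Compute and storage are decoupled, which allows scale-to-zero
Optimized for log search: append-only ingestion, data split by time, old data kept in cold storage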
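Splitting data by time pays off because a query with a time range only has to open the pieces whose time span overlaps it. The sketch below illustrates that pruning idea in isolation. It is a conceptual model, not Quickwit's implementation, and the Split class and its fields are invented for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Split:
    """An immutable chunk of the index with the time span of its documents."""
    split_id: str
    min_ts: int  # earliest document timestamp (inclusive)
    max_ts: int  # latest document timestamp (inclusive)


def prune_splits(
    splits: List[Split],
    start_ts: Optional[int] = None,
    end_ts: Optional[int] = None,
) -> List[Split]:
    """Keep only splits that can contain documents in [start_ts, end_ts)."""
    selected = []
    for split in splits:
        if start_ts is not None and split.max_ts < start_ts:
            continue  # entirely before the requested window
        if end_ts is not None and split.min_ts >= end_ts:
            continue  # entirely after the requested window
        selected.append(split)
    return selected
```

A query over the last hour of a year-long index then touches a handful of objects in storage instead of all of them, which is why queries without a time filter are the expensive case.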

Usage

# Create the index
curl -X POST http://localhost:7280/api/v1/indexes -H 'Content-Type: application/yaml' --data '
version: 0.7
index_id: logs
doc_mapping:
  field_mappings:
    - name: timestamp
      type: datetime
      input_formats: ['unix_timestamp']
      fast: true
    - name: level
      type: text
      tokenizer: raw
    - name: message
      type: text
      tokenizer: default
  timestamp_field: timestamp
search_settings:
  default_search_fields: [message]
'

# Ingest logs
curl -X POST http://localhost:7280/api/v1/logs/ingest \
  -H 'Content-Type: application/json' \
  -d '{"timestamp": 1716543210, "level": "ERROR", "message": "DB connection failed"}'

# Search (level is not a default search field, so it is named explicitly)
curl 'http://localhost:7280/api/v1/logs/search?query=level:ERROR+AND+connection&start_timestamp=...&end_timestamp=...'
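From application code, the same two calls mostly come down to formatting: the ingest endpoint takes newline-delimited JSON and the search endpoint takes URL-encoded parameters. The helpers below are a sketch using only the standard library. The function names are hypothetical, and the base URL assumes the local container started above.

```python
import json
from typing import Any, Dict, Iterable, Optional
from urllib.parse import urlencode

QUICKWIT_URL = "http://localhost:7280"  # assumption: local single-node setup


def to_ndjson(docs: Iterable[Dict[str, Any]]) -> str:
    """Serialize documents as newline-delimited JSON for the ingest endpoint."""
    return "".join(json.dumps(doc, separators=(",", ":")) + "\n" for doc in docs)


def search_url(
    index_id: str,
    query: str,
    start_ts: Optional[int] = None,
    end_ts: Optional[int] = None,
    max_hits: int = 20,
) -> str:
    """Build a search URL; timestamps are Unix seconds and both are optional."""
    params: Dict[str, Any] = {"query": query, "max_hits": max_hits}
    if start_ts is not None:
        params["start_timestamp"] = start_ts
    if end_ts is not None:
        params["end_timestamp"] = end_ts
    return f"{QUICKWIT_URL}/api/v1/{index_id}/search?{urlencode(params)}"
```

The body from to_ndjson can be POSTed to /api/v1/logs/ingest with any HTTP client, and the string from search_url can be fetched with a plain GET.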

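Strengths

Very cheap: object storage costs around $0.02/GB/month, against an estimated 10-50x for SSD-backed storage
Effectively unlimited retention (S3 has no capacity cap)
Fast queries over historical logs, provided the query carries a time filter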
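To put the price gap in concrete terms, here is the arithmetic for a hypothetical 10 TB of retained logs. The prices are this article's rough figures rather than a quote from any provider, and the function name is made up for the example.

```python
def monthly_storage_cost(size_gb: float, price_per_gb_month: float) -> float:
    """Monthly storage bill in dollars for a given volume and unit price."""
    return size_gb * price_per_gb_month


OBJECT_STORAGE_PRICE = 0.02  # $/GB/month, the rough figure used in this article
RETAINED_GB = 10_000         # hypothetical 10 TB of logs

object_cost = monthly_storage_cost(RETAINED_GB, OBJECT_STORAGE_PRICE)
ssd_cost_low = object_cost * 10   # lower end of the 10-50x estimate
ssd_cost_high = object_cost * 50  # upper end of the 10-50x estimate

print(f"object storage: ${object_cost:,.0f}/month")
print(f"SSD estimate:   ${ssd_cost_low:,.0f} to ${ssd_cost_high:,.0f}/month")
```

Under those assumptions the object-storage bill is about $200 a month, against an estimated $2,000 to $10,000 for SSD.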

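Weaknesses

Only fits log and time-series workloads
Not suited to in-product search
Complex queries are weaker than in ES

Best for: log aggregation, auditing, and anything that combines time series with text search.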

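Decision matrix

Criterion	Elasticsearch	Meilisearch	Quickwit	PG FTS / SQLite FTS5
Barrier to entry	High	Very low	Medium	Low
Resource usage	2 GB+ RAM	150 MB RAM	1 GB RAM	Shared with the DB
Document scale	PB-scale	Tens of millions	PB-scale (logs)	Tens of millions
Query complexity	Very high	Medium	Medium (log-oriented)	Medium
Clustering / HA	Complex	Single node	Decoupled compute and storage	Follows the DB
Freshness	Seconds	Seconds	Seconds	Immediate
Cost	Expensive (resources)	Cheap	Very cheap	No extra cost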
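One way to read the matrix together with the "best for" notes in each section is as a two-question decision. The function below is only a sketch of that reading. The workload labels and the function name are invented for the example, and real choices will weigh more factors (team experience, existing infrastructure, data volume).

```python
def recommend(workload: str, needs_complex_queries: bool = False) -> str:
    """Suggest a backend for 'logs' or 'product_search' workloads."""
    if workload == "logs":
        # Heavy aggregations and rich queries over logs still favor Elasticsearch
        return "Elasticsearch" if needs_complex_queries else "Quickwit"
    if workload == "product_search":
        return "Elasticsearch" if needs_complex_queries else "Meilisearch"
    raise ValueError(f"unknown workload: {workload!r}")
```

For the article-search feature that motivated this comparison, recommend("product_search") lands on Meilisearch.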

In practice: integrating Meilisearch with Django

# requirements.txt
# meilisearch

import meilisearch
from django.conf import settings
from django.db.models.signals import post_save, post_delete
from django.dispatch import receiver
from django.shortcuts import render

from .models import Post  # assumed location of the Post model

# MEILI_URL and MEILI_KEY are expected in the Django settings
client = meilisearch.Client(settings.MEILI_URL, settings.MEILI_KEY)
index = client.index('posts')

# Django signals: mirror document changes into Meilisearch
@receiver(post_save, sender=Post)
def index_post(sender, instance, **kwargs):
    # Note: many-to-many tags assigned after the first save() are not visible here yet
    index.add_documents([{
        'id': instance.id,
        'title': instance.title,
        'body': instance.body,
        'tags': list(instance.tags.values_list('name', flat=True)),
        'author': instance.author.name,
        'created_at': instance.created_at.timestamp(),
    }])

@receiver(post_delete, sender=Post)
def delete_post(sender, instance, **kwargs):
    index.delete_document(instance.id)

# Search view
def search(request):
    q = request.GET.get('q', '')
    results = index.search(q, {'limit': 20})
    return render(request, 'search.html', {'hits': results['hits']})

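Deployment: run Meilisearch as a container and point Django at it through MEILI_URL.

Results:

Search latency dropped from several seconds with PostgreSQL ILIKE to about 50 ms with Meilisearch
Typo tolerance, highlighting, and relevance ranking are supported
About 200 MB more memory than running PostgreSQL alone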
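Highlighting has to be requested per search. The sketch below shows one way to ask for it and to read the result. The helper names, the choice of the mark tag, and the crop length are illustrative, while the parameter names and the _formatted field follow the Meilisearch search API.

```python
from typing import Any, Dict, List


def highlighted_search_params(limit: int = 20) -> Dict[str, Any]:
    """Search options that return highlighted, cropped copies of title and body."""
    return {
        "limit": limit,
        "attributesToHighlight": ["title", "body"],
        "attributesToCrop": ["body"],
        "cropLength": 30,  # words kept around the match
        "highlightPreTag": "<mark>",
        "highlightPostTag": "</mark>",
    }


def extract_snippets(results: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Prefer the highlighted version of each hit, falling back to the raw hit."""
    return [hit.get("_formatted", hit) for hit in results.get("hits", [])]
```

In the search view this becomes extract_snippets(index.search(q, highlighted_search_params())). Since the snippets contain markup, the document text should be escaped or sanitized before the template renders it as HTML.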

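Pitfalls we hit

Meilisearch without a master key: anyone who can reach the API can read and write. Always set one in production.
ES heap size: the heap that Elasticsearch picks automatically can be too small for large datasets, especially inside a memory-limited container. Set it explicitly, for example ES_JAVA_OPTS="-Xms4g -Xmx4g", and keep it at no more than about half of the node's RAM.
Slow reindexing: with many documents a full reindex takes hours. Design for incremental sync from the start.
Undeclared filterable attributes: filtering on an attribute that was not declared fails with an error. In Meilisearch, every new filter attribute must first be added via update_filterable_attributes.
Data consistency: if the DB write succeeds but the Meilisearch sync fails, the two drift apart. For critical data use the outbox pattern: write to the DB and an outbox table in the same transaction, then let a background worker sync the outbox to Meilisearch.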
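The outbox pattern can be sketched in a few lines. The version below uses sqlite3 so that it is self-contained. The table layout, function names, and batch size are all assumptions made for the example, and index stands for any object exposing add_documents and delete_document, such as a Meilisearch index.

```python
import json
import sqlite3
from typing import Any

SCHEMA = """
CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT NOT NULL, body TEXT NOT NULL);
CREATE TABLE outbox (
    seq     INTEGER PRIMARY KEY AUTOINCREMENT,
    op      TEXT NOT NULL,              -- 'upsert' or 'delete'
    doc_id  INTEGER NOT NULL,
    payload TEXT,                       -- JSON document for upserts
    done    INTEGER NOT NULL DEFAULT 0
);
"""


def save_post(conn: sqlite3.Connection, post_id: int, title: str, body: str) -> None:
    """Write the post and its outbox entry in one transaction."""
    doc = {"id": post_id, "title": title, "body": body}
    with conn:  # commits both rows, or rolls both back on error
        conn.execute(
            "INSERT OR REPLACE INTO posts (id, title, body) VALUES (?, ?, ?)",
            (post_id, title, body),
        )
        conn.execute(
            "INSERT INTO outbox (op, doc_id, payload) VALUES ('upsert', ?, ?)",
            (post_id, json.dumps(doc)),
        )


def drain_outbox(conn: sqlite3.Connection, index: Any, batch_size: int = 100) -> int:
    """Push pending entries to the search index in order; return how many synced."""
    rows = conn.execute(
        "SELECT seq, op, doc_id, payload FROM outbox WHERE done = 0 ORDER BY seq LIMIT ?",
        (batch_size,),
    ).fetchall()
    synced = 0
    for seq, op, doc_id, payload in rows:
        try:
            if op == "upsert":
                index.add_documents([json.loads(payload)])
            else:
                index.delete_document(doc_id)
        except Exception:
            break  # leave this entry pending; the next run retries from here
        with conn:
            conn.execute("UPDATE outbox SET done = 1 WHERE seq = ?", (seq,))
        synced += 1
    return synced
```

A background worker would call drain_outbox on a schedule. Because entries are only marked done after the index call succeeds, a failed sync is retried instead of lost, at the cost of possible duplicate upserts, which are harmless since documents are keyed by id. A stricter worker would also wait for the Meilisearch task to finish before marking the entry done.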
