Cilium: eBPF-Based K8s Networking, Goodbye iptables

Background

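Problems with the usual K8s CNI setups (flannel / Calico in iptables mode, alongside kube-proxy):

With many Services, iptables rules grow into the tens of thousands → packets that traverse the chains are matched against N rules one by one, which is slow
Network policy is emulated with iptables → poor performance and hard to debug
No native L7 (HTTP) policy
Cross-node traffic pays encapsulation (VXLAN) overhead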

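Cilium builds all of this on eBPF in the kernel:

Pod-to-pod traffic (native routing / VXLAN / WireGuard)
Network policy (L3/L4 in eBPF; L7 through an Envoy proxy)
Service load balancing, replacing kube-proxy
Observability (Hubble)
mTLS

eBPF programs attach directly to kernel hooks and look up state in eBPF maps instead of walking iptables chains → higher performance and more flexibility.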

Install (local kind cluster)

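# Write the kind config first: no default CNI, no kube-proxy
cat > kind-config.yaml <<EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  disableDefaultCNI: true       # skip kindnetd; Cilium will be the CNI
  kubeProxyMode: none           # Cilium replaces kube-proxy
EOF

# Create the cluster (nodes stay NotReady until a CNI is installed)
kind create cluster --config kind-config.yaml

# Install Cilium and wait until it reports healthy
cilium install --version 1.16.0
cilium status --wait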

Network policy (L3/L4)

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-policy
  namespace: app
spec:
  endpointSelector:
    matchLabels:
      app: api
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP

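The api pods accept only TCP 8080 traffic from frontend pods.
Everything else is dropped, including other pods in the same namespace and other ports from frontend: once a policy selects an endpoint, ingress to it becomes default-deny.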
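To make those selection semantics concrete, here is a toy Python model of how a policy like the one above decides ingress. It is not Cilium's implementation (Cilium compiles policies into eBPF maps keyed by security identity and port); it assumes plain matchLabels within a single namespace and ignores everything else. The Policy class and the ingress_allowed helper are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Toy stand-in for a CiliumNetworkPolicy with a single ingress rule."""
    endpoint_labels: dict                      # spec.endpointSelector.matchLabels
    from_labels: dict                          # ingress[0].fromEndpoints[0].matchLabels
    ports: set = field(default_factory=set)    # {(port, protocol)} from toPorts


def matches(selector: dict, labels: dict) -> bool:
    # matchLabels semantics: every key/value in the selector must be present.
    # An empty selector therefore matches everything.
    return all(labels.get(k) == v for k, v in selector.items())


def ingress_allowed(policies, dst_labels, src_labels, port, proto="TCP") -> bool:
    selecting = [p for p in policies if matches(p.endpoint_labels, dst_labels)]
    if not selecting:
        # No policy selects the destination: traffic is allowed by default.
        return True
    # Selected endpoints are default-deny; any matching rule allows (rules are a union).
    return any(
        matches(p.from_labels, src_labels) and (port, proto) in p.ports
        for p in selecting
    )


policies = [Policy({"app": "api"}, {"app": "frontend"}, {(8080, "TCP")})]

print(ingress_allowed(policies, {"app": "api"}, {"app": "frontend"}, 8080))  # True
print(ingress_allowed(policies, {"app": "api"}, {"app": "batch"}, 8080))     # False
print(ingress_allowed(policies, {"app": "api"}, {"app": "frontend"}, 9090))  # False
print(ingress_allowed(policies, {"app": "db"}, {"app": "batch"}, 5432))      # True
```

The last line is the "no policy, default allow" case that comes back later in the pitfalls.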

L7 policy (HTTP)

# apiVersion / kind / metadata as in the previous policy
spec:
  endpointSelector:
    matchLabels:
      app: api
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              - method: "GET"
                path: "/api/users/.*"
              - method: "POST"
                path: "/api/login"

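frontend may only GET /api/users/* (the path is a regex) and POST /api/login; any other request is rejected by Cilium's Envoy proxy with 403 Forbidden.
Plain iptables can't do this: it only sees L3/L4 headers.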
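A rough model of that L7 check: a request passes only if it matches one (method, path) pair, with the path treated as a regular expression. This sketch assumes the regex must match the whole path (re.fullmatch) and compares methods literally, while Cilium also accepts regexes for the method; HTTP_RULES and l7_allowed are hypothetical names.

```python
import re

# Hypothetical in-memory copy of the http rules in the policy above: (method, path regex).
HTTP_RULES = [
    ("GET", r"/api/users/.*"),
    ("POST", r"/api/login"),
]


def l7_allowed(method: str, path: str) -> bool:
    """True if some rule allows the request; otherwise the proxy answers 403."""
    for allowed_method, path_regex in HTTP_RULES:
        # Assumption: the regex has to cover the entire path, not just a prefix.
        if method == allowed_method and re.fullmatch(path_regex, path):
            return True
    return False


for method, path in [("GET", "/api/users/42"), ("POST", "/api/users/42"),
                     ("POST", "/api/login"), ("POST", "/api/login/extra")]:
    print(method, path, "forwarded" if l7_allowed(method, path) else "403")
```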

hubble (observability)

cilium hubble enable

# CLI (when running locally, first: cilium hubble port-forward &)
hubble observe --namespace=app
# sample output (simplified)
TIMESTAMP    SOURCE                   DESTINATION         TYPE   VERDICT
12:34:56     frontend-xxx:34521       api-yyy:8080        L7     FORWARDED (GET /api/users/42)
12:34:57     frontend-xxx:34522       api-yyy:8080        L7     DROPPED (POST /admin/delete)

Every flow shows source / destination / verdict (plus method and path for L7 flows).
An essential tool for debugging network problems.
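For a quick "who is being dropped" summary, the JSON output (hubble observe -o json) can be aggregated in a few lines. This sketch assumes each line wraps the flow under a top-level "flow" key with verdict, source.pod_name and destination.pod_name fields; verify those field names against the hubble version you run. count_drops is a hypothetical helper and the sample lines are made up.

```python
import json
from collections import Counter


def count_drops(lines) -> Counter:
    """Count DROPPED flows per (source pod, destination pod) from JSON lines."""
    drops = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        flow = json.loads(line).get("flow", {})
        if flow.get("verdict") != "DROPPED":
            continue
        src = flow.get("source", {}).get("pod_name", "?")
        dst = flow.get("destination", {}).get("pod_name", "?")
        drops[(src, dst)] += 1
    return drops


sample = [
    '{"flow": {"verdict": "FORWARDED", "source": {"pod_name": "frontend-xxx"},'
    ' "destination": {"pod_name": "api-yyy"}}}',
    '{"flow": {"verdict": "DROPPED", "source": {"pod_name": "frontend-xxx"},'
    ' "destination": {"pod_name": "api-yyy"}}}',
]
print(count_drops(sample))  # Counter({('frontend-xxx', 'api-yyy'): 1})
```

In practice you would pass sys.stdin to count_drops and pipe hubble observe --namespace=app -o json into the script.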

Hubble UI in the browser (the UI has to be enabled: cilium hubble enable --ui):

cilium hubble ui

Live service map + flow log.

kube-proxy replacement

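# enable at install time with --set kubeProxyReplacement=true;
# without kube-proxy, Cilium also needs the API server address
helm install cilium ... --set kubeProxyReplacement=true \
  --set k8sServiceHost=<API_SERVER_IP> --set k8sServicePort=<API_SERVER_PORT>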

Cilium implements Service routing with eBPF maps → kube-proxy can be removed → the tens of thousands of iptables Service rules disappear.
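Why the rule count matters: kube-proxy's iptables mode matches Service rules in order, while Cilium resolves a Service VIP with a hash-map lookup. The toy model below counts how many rules a linear scan examines compared with a single dictionary lookup. It is a simplification, not kube-proxy's real chain layout: in iptables mode conntrack lets established connections skip most of the chains, so the cost mainly hits connection setup. All names are hypothetical.

```python
def build_services(n):
    """Hypothetical cluster with n Services, one backend each."""
    services = [(f"10.96.{i // 256}.{i % 256}", 80, f"backend-{i}") for i in range(n)]
    rule_list = list(services)  # iptables-style: ordered rules, matched top to bottom
    service_map = {(vip, port): be for vip, port, be in services}  # eBPF-style hash map
    return rule_list, service_map


def linear_lookup(rule_list, vip, port):
    """Walk the rules in order; return (backend, rules_examined)."""
    for examined, (r_vip, r_port, backend) in enumerate(rule_list, start=1):
        if (r_vip, r_port) == (vip, port):
            return backend, examined
    return None, len(rule_list)


def map_lookup(service_map, vip, port):
    """One hash lookup, independent of how many Services exist."""
    return service_map.get((vip, port))


rules, svc_map = build_services(20_000)
print(linear_lookup(rules, "10.96.78.31", 80))  # ('backend-19999', 20000)
print(map_lookup(svc_map, "10.96.78.31", 80))   # backend-19999
```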

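Performance:

Metric                       iptables             Cilium eBPF
Service connect latency      50 μs                5 μs
Pod-to-pod throughput        7 Gbps               9.5 Gbps
iptables Service rules       tens of thousands    0

The gap is significant in large clusters, because the iptables rule count keeps growing with the number of Services.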

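Cross-node traffic

Options:

VXLAN: best compatibility (the default)
Geneve: alternative encapsulation to VXLAN
Native (direct) routing: best performance; the underlying network must route pod CIDRs (simplest when nodes share an L2 network)
WireGuard: transparent encryption on top of either routing mode, e.g. across regions
IPsec: the other transparent-encryption option

In prod we use native routing (nodes in the same VPC) plus WireGuard for cross-region traffic.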
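Back-of-the-envelope arithmetic for the byte cost of each option, assuming a 1500-byte underlay MTU, an IPv4 underlay, no Geneve options and no stacking of encryption on top of a tunnel. VXLAN/Geneve carry a full inner Ethernet frame (14 + 20 + 8 + 8 = 50 bytes); WireGuard adds outer IPv4 + UDP + its 32-byte header and auth tag (60 bytes). Header bytes are only part of the story, since encap/decap also costs CPU.

```python
# Per-packet overhead in bytes over an IPv4 underlay (assumptions stated above).
OVERHEAD = {
    "native": 0,               # no encapsulation
    "vxlan": 14 + 20 + 8 + 8,  # inner Ethernet + outer IPv4 + UDP + VXLAN header = 50
    "geneve": 14 + 20 + 8 + 8, # same as VXLAN when no Geneve options are used
    "wireguard": 20 + 8 + 32,  # outer IPv4 + UDP + WireGuard header and tag = 60
}


def tcp_payload_per_packet(mode: str, link_mtu: int = 1500) -> int:
    inner_mtu = link_mtu - OVERHEAD[mode]
    return inner_mtu - 20 - 20  # minus inner IPv4 and TCP headers (no options)


for mode in OVERHEAD:
    payload = tcp_payload_per_packet(mode)
    share = payload / tcp_payload_per_packet("native")
    print(f"{mode:10s} {payload} B payload per packet, {share:.1%} of native")
```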

bandwidth manager

apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/egress-bandwidth: "10M"   # 10 Mbit/s; requires bandwidthManager.enabled=true

Cilium enforces the pod's egress bandwidth limit with eBPF → protects against noisy neighbors.
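Cilium's bandwidth manager is documented as rate limiting via EDT (earliest departure time): instead of queueing in a token bucket, each packet is stamped with the earliest time it may leave, and the fq qdisc holds it until then. The sketch below models only that timestamping step. It assumes "10M" means 10 Mbit/s with decimal suffixes and integer values; parse_rate and edt_departures are hypothetical helpers.

```python
NS_PER_S = 1_000_000_000


def parse_rate(value: str) -> int:
    """'10M' -> 10_000_000 bit/s (decimal K/M/G suffixes, integers only)."""
    units = {"K": 10**3, "M": 10**6, "G": 10**9}
    if value[-1] in units:
        return int(value[:-1]) * units[value[-1]]
    return int(value)


def edt_departures(packets, rate_bps: int):
    """packets: list of (arrival_ns, size_bytes). Returns each packet's departure time (ns).

    A packet may leave at its arrival time or when the previous packet's
    transmission budget is used up, whichever is later.
    """
    next_free = 0
    departures = []
    for arrival, size in packets:
        depart = max(arrival, next_free)
        departures.append(depart)
        next_free = depart + size * 8 * NS_PER_S // rate_bps
    return departures


rate = parse_rate("10M")
burst = [(0, 1250)] * 4                  # four 1250-byte packets arriving at t=0
print(edt_departures(burst, rate))       # [0, 1000000, 2000000, 3000000]
```

At 10 Mbit/s a 1250-byte packet "costs" 1 ms, so a burst gets spread out 1 ms apart.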

clustermesh (multi-cluster)

cilium clustermesh enable --context cluster1
cilium clustermesh enable --context cluster2
cilium clustermesh connect --context cluster1 --destination-context cluster2

Sharing a Service across the two clusters:

# Service with the same name and namespace in cluster1 and cluster2, annotated in both
metadata:
  annotations:
    service.cilium.io/global: "true"

A pod in cluster1 calling this Service → Cilium load-balances across the backends in both clusters.
No service mesh / API gateway needed.
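Conceptually, a global Service's backend set is the union of the backends of the same-named Service in every connected cluster where it is marked global. The toy model below shows only that merge; the data layout and global_backends helper are hypothetical, and it says nothing about how Cilium actually picks a backend from the merged set.

```python
def global_backends(clusters: dict, namespace: str, name: str) -> list:
    """Merge backends of a Service that exists as namespace/name in several clusters.

    clusters: {cluster_name: {(namespace, name): {"global": bool, "backends": [ip, ...]}}}
    Only copies marked global contribute, mirroring the "annotate it in every cluster" rule.
    """
    merged = []
    for cluster, services in sorted(clusters.items()):
        svc = services.get((namespace, name))
        if svc and svc["global"]:
            merged.extend((cluster, ip) for ip in svc["backends"])
    return merged


clusters = {
    "cluster1": {("app", "api"): {"global": True, "backends": ["10.1.0.5"]}},
    "cluster2": {("app", "api"): {"global": True, "backends": ["10.2.0.7", "10.2.0.8"]}},
}
print(global_backends(clusters, "app", "api"))
```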

mTLS (cilium 1.14+)

spec:
  endpointSelector:
    matchLabels: { app: api }
  ingress:
    - fromEndpoints:
        - matchLabels: { app: frontend }
      authentication:
        mode: required        # enforce mTLS (mutual authentication)

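Cilium uses SPIFFE identities (issued by SPIRE) for automatic mutual authentication.
Simpler than Istio (no sidecar). Note that the handshake runs between Cilium agents and does not encrypt the traffic itself; pair it with WireGuard or IPsec for encryption.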

Performance

CNI throughput benchmark (10 Gbps network):

CNI                        Pod-to-Pod
flannel VXLAN              6 Gbps
calico iptables            7 Gbps
calico eBPF                9 Gbps
Cilium (native routing)    9.5 Gbps

Cilium nearly saturates the link (9.5 of 10 Gbps).

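Compared with Calico eBPF

Feature                  Cilium          Calico (eBPF mode)
L7 policy                ✅              weak
Hubble observability     ✅ strong       basic
clustermesh              ✅              weak
mTLS                     ✅              planned
Complexity               high            medium
Ecosystem                large (CNCF)    large

Both are good. Cilium is more modern and feature-rich; Calico is older and more widely deployed.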

Compared with Istio

Aspect            Cilium                    Istio
Layer             CNI + L7 policy           service mesh (L7+)
Sidecar           not needed                Envoy sidecar
Resource cost     low (eBPF in kernel)      high (sidecar per pod)
mTLS              ✅                        ✅
Traffic policy    medium                    strong (canary / mirroring, etc.)
Complexity        medium                    high

Cilium Service Mesh (Cilium 1.12+) can partially replace Istio.
For heavy traffic management, Istio is still the choice.

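Real case: migrating from Calico to Cilium

Our prod cluster ran Calico for years. Problems:

Network policy debugging was painful
No L7 policy
Little observability

Migrating to Cilium:

Install Cilium and remove Calico (drain and test nodes one at a time)
Translate policies: standard K8s NetworkPolicy objects are enforced by Cilium as-is; Calico-specific policies are rewritten as CiliumNetworkPolicy (mostly a syntax mapping)
Enable Hubble + kube-proxy replacement
Enable mTLS (replacing Istio mTLS)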
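When a policy does need to become a CiliumNetworkPolicy (for example to add L7 rules later), the common subset maps field by field: podSelector → endpointSelector, from[].podSelector → fromEndpoints, ports → toPorts. The hypothetical knp_to_cnp helper below covers only ingress rules with podSelector peers and numeric ports, and refuses everything else so it gets reviewed by hand; it is a sketch, not a general converter.

```python
import json


def knp_to_cnp(knp: dict) -> dict:
    """Translate a simple networking.k8s.io/v1 NetworkPolicy (ingress only) into a
    CiliumNetworkPolicy dict. Unsupported shapes raise ValueError."""
    spec = knp["spec"]
    if "Egress" in spec.get("policyTypes", []) or spec.get("egress"):
        raise ValueError("egress rules are not handled by this sketch")
    ingress = []
    for rule in spec.get("ingress", []):
        peers, ports = rule.get("from", []), rule.get("ports", [])
        if not peers and not ports:
            raise ValueError("empty ingress rule (allow-all in K8s): review by hand")
        cnp_rule = {}
        if peers:
            if any(set(peer) != {"podSelector"} for peer in peers):
                raise ValueError("only podSelector peers are handled")
            # Both APIs scope these selectors to the policy's own namespace.
            cnp_rule["fromEndpoints"] = [peer["podSelector"] for peer in peers]
        if ports:
            if any(not isinstance(p.get("port"), int) for p in ports):
                raise ValueError("only numeric ports are handled")
            cnp_rule["toPorts"] = [{"ports": [
                {"port": str(p["port"]), "protocol": p.get("protocol", "TCP")}
                for p in ports
            ]}]
        ingress.append(cnp_rule)
    meta = knp["metadata"]
    return {
        "apiVersion": "cilium.io/v2",
        "kind": "CiliumNetworkPolicy",
        "metadata": {"name": meta["name"], "namespace": meta.get("namespace", "default")},
        "spec": {"endpointSelector": spec["podSelector"], "ingress": ingress},
    }


example = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "NetworkPolicy",
    "metadata": {"name": "api-policy", "namespace": "app"},
    "spec": {
        "podSelector": {"matchLabels": {"app": "api"}},
        "policyTypes": ["Ingress"],
        "ingress": [{
            "from": [{"podSelector": {"matchLabels": {"app": "frontend"}}}],
            "ports": [{"protocol": "TCP", "port": 8080}],
        }],
    },
}
print(json.dumps(knp_to_cnp(example), indent=2))
```

The JSON output can be fed to kubectl apply -f -, since kubectl accepts JSON manifests.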

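Results:

Service-to-service latency down 30% on average (no iptables)
Debugging "why can't A reach B" went from a whole morning to 5 minutes (Hubble shows the verdict directly)
Node CPU down 10% (kube-proxy no longer runs)

Migration challenges:

L7 policy sends traffic through a per-node Envoy proxy run by Cilium (not a per-pod sidecar); resources +20%
WireGuard across regions requires kernel 5.6+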
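Kernel version gates bit us more than once, so a pre-flight check on each node's uname -r output is cheap. The thresholds below are only the ones quoted in this post (5.6 for WireGuard, 5.10 for the eBPF features we rely on), not Cilium's full requirements matrix; REQUIREMENTS, parse_kernel and missing_features are hypothetical names.

```python
import re

# Minimum kernel versions quoted in this post, not an exhaustive feature matrix.
REQUIREMENTS = {
    "wireguard": (5, 6),
    "eBPF features we rely on": (5, 10),
}


def parse_kernel(release: str) -> tuple:
    """'5.4.0-150-generic' -> (5, 4, 0)."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not m:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(x or 0) for x in m.groups())


def missing_features(release: str) -> list:
    major_minor = parse_kernel(release)[:2]
    return [name for name, minimum in REQUIREMENTS.items() if major_minor < minimum]


print(missing_features("5.4.0-150-generic"))  # ['wireguard', 'eBPF features we rely on']
print(missing_features("5.15.0-91-generic"))  # []
```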

Monitoring

Hundreds of Prometheus metrics, for example:

cilium_drop_count_total
cilium_policy_l7_total
cilium_endpoint_state

The official Grafana dashboards work out of the box.
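For a one-off look without Grafana, the agent's /metrics text can be summed by drop reason. This stdlib-only sketch parses the Prometheus text format; it assumes cilium_drop_count_total carries a reason label, and the sample values are illustrative, not real measurements. drops_by_reason is a hypothetical helper.

```python
import re
from collections import defaultdict

SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][\w:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)")
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def drops_by_reason(text: str) -> dict:
    """Sum cilium_drop_count_total samples per 'reason' label."""
    totals = defaultdict(float)
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # HELP / TYPE comments
        m = SAMPLE_RE.match(line)
        if not m or m["name"] != "cilium_drop_count_total":
            continue
        labels = dict(LABEL_RE.findall(m["labels"] or ""))
        totals[labels.get("reason", "unknown")] += float(m["value"])
    return dict(totals)


SAMPLE = """\
# TYPE cilium_drop_count_total counter
cilium_drop_count_total{direction="INGRESS",reason="Policy denied"} 42
cilium_drop_count_total{direction="EGRESS",reason="Policy denied"} 8
cilium_drop_count_total{direction="INGRESS",reason="Invalid packet"} 3
"""
print(drops_by_reason(SAMPLE))
```

In Prometheus itself the equivalent view is sum by (reason) (rate(cilium_drop_count_total[5m])).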

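Pitfalls we hit

Kernel version: many eBPF features need kernel 5.10+. Ubuntu 20.04 ships 5.4 by default → upgrade.
kube-proxy replacement + HostNetwork: HostNetwork pods don't interact with Cilium's Service handling → some cases fall back to iptables.
Policy is default-allow: with no CiliumNetworkPolicy selecting a pod, all traffic is allowed. For default-deny → add a catch-all policy.
Hubble retention: by default only 4096 flows are kept in memory → in a large cluster you can't see history. Configure hubble-export to ship flows to ELK.
Multi-cluster identity conflicts: with clustermesh, pods with the same labels in different clusters → misrouting. Enforce namespace / label naming conventions, and give every cluster a unique cluster name and ID.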
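Alongside naming conventions, clustermesh expects every cluster to have a unique name, a unique cluster ID and non-overlapping pod CIDRs. A small pre-flight check over the planned values catches the obvious collisions. The 1-255 ID range is an assumption based on the default cluster limit, and clustermesh_problems is a hypothetical helper.

```python
import ipaddress
from itertools import combinations


def clustermesh_problems(clusters: list) -> list:
    """clusters: [{'name': str, 'id': int, 'pod_cidr': str}, ...] -> list of problems."""
    problems = []
    for c in clusters:
        if not 1 <= c["id"] <= 255:  # assumption: default range without extended limits
            problems.append(f"{c['name']}: cluster id {c['id']} outside 1-255")
    for a, b in combinations(clusters, 2):
        if a["name"] == b["name"]:
            problems.append(f"duplicate cluster name {a['name']}")
        if a["id"] == b["id"]:
            problems.append(f"{a['name']} and {b['name']} share cluster id {a['id']}")
        net_a = ipaddress.ip_network(a["pod_cidr"])
        net_b = ipaddress.ip_network(b["pod_cidr"])
        if net_a.overlaps(net_b):
            problems.append(f"{a['name']} and {b['name']} have overlapping pod CIDRs")
    return problems


clusters = [
    {"name": "cluster1", "id": 1, "pod_cidr": "10.1.0.0/16"},
    {"name": "cluster2", "id": 1, "pod_cidr": "10.1.128.0/17"},
]
for problem in clustermesh_problems(clusters):
    print(problem)
```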