streamlit vs gradio: build an internal data app in 5 minutes

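Motivation

Requests a data team sees all the time:

An interactive dashboard for PMs and business stakeholders
A demo UI for an ML model
Internal tools (look up a user, run a cohort analysis)

A full frontend (React + API) takes several days of work.
streamlit / gradio let you describe the UI in pure Python and get a usable app in about 5 minutes.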

streamlit

# app.py
import streamlit as st
import pandas as pd

st.title('Sales Dashboard')

uploaded = st.file_uploader('Upload CSV', type='csv')
if uploaded is not None:
    df = pd.read_csv(uploaded)
    st.dataframe(df)

    country = st.selectbox('Country', df.country.unique())
    filtered = df[df.country == country]

    st.bar_chart(filtered.groupby('product').amount.sum())
    st.metric('Total sales', f"${filtered.amount.sum():,.0f}")

streamlit run app.py
# The browser opens localhost:8501 automatically

The whole app is a single .py file, written imperatively from top to bottom.
Every widget interaction reruns the script from the top (that is the reactive model).

gradio

# app.py
import gradio as gr

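def predict(image):
    # Placeholder: run your model here. The values below are hard-coded
    # only so that the example runs end to end.
    label, score = 'cat', 0.93
    return label, score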

demo = gr.Interface(
    fn=predict,
    inputs=gr.Image(),
    outputs=[gr.Label(), gr.Number()],
    title='Image Classifier',
)
demo.launch()

gradio is more function-centric: declare the inputs and outputs, and it wraps the function in a UI automatically.

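Where each one fits

streamlit: general-purpose dashboards and internal tools (many widgets interacting)
gradio: ML model demos (input → run → output)

streamlit is a layout-driven, state-aware app framework; gradio is a wrapper that turns a function into a demo.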

streamlit details

session state

if 'count' not in st.session_state:
    st.session_state.count = 0

if st.button('+1'):
    st.session_state.count += 1

st.write(f'count: {st.session_state.count}')

The script reruns on every interaction, but session_state persists for the lifetime of the browser session.
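To make the rerun model concrete, here is a toy simulation in plain Python. It is not streamlit code: run_script and simulate_clicks are hypothetical names, and an ordinary dict stands in for st.session_state. It only shows why a local variable resets on every run while session state accumulates.

```python
def run_script(session_state: dict, clicked: bool) -> tuple[int, int]:
    """One top-to-bottom run of a toy 'script', as triggered by one interaction."""
    local_count = 0                      # plain variable: recreated on every run
    if "count" not in session_state:     # true only on the first run of a session
        session_state["count"] = 0
    if clicked:
        local_count += 1
        session_state["count"] += 1
    return local_count, session_state["count"]


def simulate_clicks(n: int) -> list[tuple[int, int]]:
    """Simulate n button clicks within one browser session."""
    session_state: dict = {}             # stands in for st.session_state
    return [run_script(session_state, clicked=True) for _ in range(n)]


print(simulate_clicks(3))  # [(1, 1), (1, 2), (1, 3)]
```

The local counter never gets past 1, because each click starts a fresh run; only the value kept in the session dict survives.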

cache

@st.cache_data
def load_data(path):
    return pd.read_csv(path)        # slow operation

df = load_data('big.csv')           # repeat calls are served from the cache

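@st.cache_data caches returned data (DataFrames, query results); @st.cache_resource caches shared resources such as a loaded model.

Without caching, every widget interaction rereads the CSV, which is slow.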
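The practical difference between the two decorators is what callers get back: a private copy of the data versus one shared object. The sketch below imitates that behaviour in plain Python. cache_data_like and cache_resource_like are hypothetical stand-ins written for illustration, not streamlit's implementation, and they only handle hashable positional arguments.

```python
import copy
from functools import wraps


def cache_data_like(fn):
    """Memoize by arguments and hand every caller its own copy of the result."""
    store = {}

    @wraps(fn)
    def wrapper(*args):
        if args not in store:
            store[args] = fn(*args)      # the body runs only on a cache miss
        return copy.deepcopy(store[args])

    return wrapper


def cache_resource_like(fn):
    """Memoize by arguments and hand every caller the same shared object."""
    store = {}

    @wraps(fn)
    def wrapper(*args):
        if args not in store:
            store[args] = fn(*args)
        return store[args]

    return wrapper


calls = []


@cache_data_like
def load_rows(path):
    calls.append(path)                   # records real (uncached) executions
    return [{"amount": 10}, {"amount": 20}]


@cache_resource_like
def load_model(name):
    return {"name": name}                # pretend this is a heavy model object


rows = load_rows("big.csv")
rows.append({"amount": 999})             # mutating the returned copy...
fresh = load_rows("big.csv")             # ...does not poison the cached value
```

This is why data that each rerun may mutate belongs under the data-style cache, while something like a model, which is expensive to duplicate, belongs under the resource-style cache.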

Multiple pages

my_app/
├── app.py                # home page
├── pages/
│   ├── 1_📊_Dashboard.py
│   ├── 2_🔍_Search.py
│   └── 3_⚙️_Settings.py

Run streamlit run app.py and a page switcher appears in the left sidebar automatically.

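Chart libraries

Built into streamlit:

st.line_chart, st.bar_chart, st.area_chart (lightweight)
st.pyplot() for matplotlib
st.plotly_chart() for plotly
st.altair_chart() for altair
st.vega_lite_chart() for Vega-Lite specs

Charts from any of these libraries can be dropped straight in.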

gradio details

Compound inputs/outputs

demo = gr.Interface(
    fn=lambda txt, n: txt * int(n),   # int(): the slider value may arrive as a float
    inputs=[
        gr.Textbox(label='Text'),
        gr.Slider(1, 10, step=1),
    ],
    outputs=gr.Textbox(),
)

Blocks (complex layouts)

def process(text):
    # Placeholder handler; swap in real logic
    return text.upper()

with gr.Blocks() as demo:
    gr.Markdown('# My App')
    with gr.Row():
        with gr.Column():
            inp = gr.Textbox()
            btn = gr.Button('Run')
        with gr.Column():
            out = gr.Textbox()

    btn.click(fn=process, inputs=inp, outputs=out)

demo.launch()

Blocks gets you close to streamlit's flexibility.

chat interface

def respond(message, history):
    return f"echo: {message}"

gr.ChatInterface(respond).launch()

A working LLM chat UI in about 3 lines. The large majority of chat demos on Hugging Face Spaces are built with gradio.
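Real LLM replies usually stream token by token. gr.ChatInterface accepts a generator function for that, where each yielded string replaces the previous partial reply. A minimal sketch, assuming an echo reply instead of a real model; respond_stream and build_demo are hypothetical names, and the sleep is only a stand-in for model latency.

```python
import time


def respond_stream(message, history):
    """Yield a growing partial reply; each yield replaces the previous one in the UI."""
    reply = f"echo: {message}"
    partial = ""
    for ch in reply:
        partial += ch
        time.sleep(0.01)      # stand-in for per-token model latency
        yield partial


def build_demo():
    import gradio as gr       # imported lazily so respond_stream is testable without gradio
    return gr.ChatInterface(respond_stream)


# To serve the UI: build_demo().launch()
```

Keeping the generator free of any gradio import means it can be unit-tested on its own, which is handy once the echo is replaced by a real model call.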

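Performance / scale

streamlit: each user connection gets its own session, but all sessions run
in the same process, so heavy compute in one session slows down other users
gradio: a queue system; concurrent requests line up and run in turn

Use gradio for LLM demos (the queue is on by default in Gradio 4+); use streamlit for multi-user dashboards.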
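The idea behind a request queue can be shown with a small asyncio sketch. This is not gradio's implementation: handle_requests is a hypothetical helper, and a semaphore stands in for the queue's concurrency limit. It only illustrates that requests beyond the limit wait their turn instead of all hitting the model at once.

```python
import asyncio


async def handle_requests(n_requests: int, concurrency_limit: int) -> int:
    """Run n fake requests through a bounded pool of slots; return peak concurrency."""
    slots = asyncio.Semaphore(concurrency_limit)
    running = 0
    peak = 0

    async def one_request() -> None:
        nonlocal running, peak
        async with slots:                 # requests beyond the limit wait here
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)     # stand-in for model inference
            running -= 1

    await asyncio.gather(*(one_request() for _ in range(n_requests)))
    return peak


print(asyncio.run(handle_requests(n_requests=5, concurrency_limit=1)))  # 1
```

With a limit of 1, five simultaneous requests never overlap, which is the behaviour you want when a single GPU-bound model sits behind the UI.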

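Deployment

Streamlit Community Cloud / Hugging Face

Streamlit Community Cloud free tier:

Connect a GitHub repo
Push, and it deploys automatically to a streamlit.app domain

Hugging Face Spaces:

Same idea, with free CPU hardware
Supports both gradio and streamlit

Self-hosting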

FROM python:3.12-slim
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.address", "0.0.0.0"]

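Put it behind an nginx reverse proxy with auth and you have an internal tool. Make sure the proxy forwards WebSocket upgrades, which streamlit depends on.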

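Authentication

Neither ships a full user system. gradio's launch(auth=...) covers simple password protection, and recent streamlit releases add OIDC login via st.login, but anything beyond that needs an external layer.
Options:

nginx + basic auth
Cloudflare Access (zero-trust)
The streamlit-authenticator package
An OAuth proxy (oauth2-proxy)

For internal tools I use Cloudflare Access: about 5 minutes to set up and nothing to maintain.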
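For a quick gradio demo where an external proxy is overkill, launch() takes an auth argument that can be a callable receiving the username and password and returning a bool. A minimal sketch of such a callable; the USERS table and check_login are hypothetical, and the unsalted SHA-256 is only there to keep the example short.

```python
import hashlib
import hmac

# Hypothetical user table: username -> SHA-256 hex digest of the password.
# A real deployment would load this from a secret store and use a slow,
# salted hash such as hashlib.scrypt.
USERS = {"alice": hashlib.sha256(b"correct horse").hexdigest()}


def check_login(username: str, password: str) -> bool:
    """Return True only for a known user with a matching password."""
    expected = USERS.get(username)
    if expected is None:
        return False
    given = hashlib.sha256(password.encode()).hexdigest()
    return hmac.compare_digest(given, expected)   # constant-time comparison


# Usage (requires gradio): demo.launch(auth=check_login)
```

This is password gating, not a user system: there are no roles, sessions to manage, or audit trail, so the proxy-based options in the list remain the better fit for anything long-lived.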

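Comparison with dash / panel

Dash (Plotly): built on Flask + React; more flexible, but more code to write
Panel (HoloViz): friendly to scientific computing; supports multiple plotting backends
Reflex (formerly Pynecone): write Python that compiles to React; strong on UI

streamlit / gradio put simplicity first; dash / reflex are for complex applications.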

My choices

Data dashboard → streamlit
ML model demo → gradio
Internal admin tool → streamlit
Needs a complex frontend → an SPA backed by FastAPI

Case: client demo tool

The task: demo a text-summarization LLM to a client.

import gradio as gr
from transformers import pipeline

summarizer = pipeline('summarization')

def summarize(text):
    return summarizer(text, max_length=100)[0]['summary_text']

gr.Interface(
    fn=summarize,
    inputs=gr.Textbox(lines=10, placeholder='Paste a long text'),
    outputs=gr.Textbox(label='Summary'),
    title='LLM Summarization Demo',
    examples=[['Long text 1...'], ['Long text 2...']],
).launch(share=True)        # share=True gives a temporary public URL (gradio.live)

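Deployed in 30 seconds, URL sent to the client, and the client can try it directly. Beats a slide deck by 100x.

The share=True tunnel is temporary: the link expires after a fixed period (72 hours at the time of writing; check the expiry gradio prints at launch, since it varies by version).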

Internal case: cohort analysis tool

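import datetime as dt

import duckdb
import plotly.express as px
import streamlit as st

st.title('User Cohort Analysis')

# Default range: the last 90 days (an arbitrary placeholder)
end = dt.date.today()
start = end - dt.timedelta(days=90)

date_range = st.date_input('Date range', value=(start, end))
group_by = st.selectbox('Group by', ['country', 'plan', 'source'])

# While the user is still picking the end date, fewer than two dates come back
if len(date_range) != 2:
    st.stop()

@st.cache_data
def query(date_range, group_by):
    # group_by comes from the fixed selectbox options and the dates are
    # date objects, so interpolating them is safe here; never do this
    # with free-text input
    return duckdb.sql(f"""
        SELECT {group_by}, DATE_TRUNC('week', signup_date) AS cohort,
               COUNT(*) AS users
        FROM read_parquet('s3://.../users/*.parquet')
        WHERE signup_date BETWEEN '{date_range[0]}' AND '{date_range[1]}'
        GROUP BY 1, 2
        ORDER BY 2
    """).df()

df = query(date_range, group_by)
st.plotly_chart(px.line(df, x='cohort', y='users', color=group_by))
st.dataframe(df)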

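Business users change the dropdown themselves to look at different dimensions.
What used to be "BA asks the data team to run it" is now self-service.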
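Because the query is assembled from user-facing widgets, it is worth keeping the SQL construction separate and defensive. The sketch below is one way to do that, under the assumption that the three group-by columns are the only ones ever allowed: the column name is checked against an allowlist and the dates are bound as parameters. build_cohort_query and run_cohort_query are hypothetical helpers, and the S3 path is left elided as in the original.

```python
ALLOWED_GROUP_BY = ("country", "plan", "source")


def build_cohort_query(group_by: str) -> str:
    """Return the cohort SQL with ? placeholders for the two dates."""
    if group_by not in ALLOWED_GROUP_BY:
        raise ValueError(f"unsupported group_by: {group_by!r}")
    # Column names cannot be bound as parameters, hence the allowlist above.
    return f"""
        SELECT {group_by}, DATE_TRUNC('week', signup_date) AS cohort,
               COUNT(*) AS users
        FROM read_parquet('s3://.../users/*.parquet')
        WHERE signup_date BETWEEN ? AND ?
        GROUP BY 1, 2
        ORDER BY 2
    """


def run_cohort_query(start, end, group_by):
    import duckdb   # imported lazily so the builder can be tested without duckdb
    return duckdb.execute(build_cohort_query(group_by), [start, end]).df()
```

In the app, run_cohort_query(date_range[0], date_range[1], group_by) would take the place of the f-string query, and it can still be wrapped in @st.cache_data.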

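Pitfalls I've hit

State resets: streamlit reruns the script on every interaction, so any
expensive op that isn't cached makes the app lag.
gradio queue: in Gradio 3.x the queue is off by default and requests block
under high concurrency; call demo.queue() to turn it on. Gradio 4+ enables
it by default.
streamlit and multiple tabs: the same user in several tabs does not share
state. st.session_state is scoped to one session, i.e. one tab.
share=True security: a gradio share link is public and has no auth. Use it
for demos and never expose secret data through it.
Upload size limit: streamlit defaults to 200 MB; to raise it, pass
--server.maxUploadSize=1000 (the value is in MB).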