
Scenario 4: Long Conversation Management

Module: Trimming (message trimming) + Summarization (conversation summaries)
Priority: 🟡 P2 (medium)
Business value: cost optimization; avoids exceeding the context-window limit

1. Business Background

1.1 Current Problems

As a conversation grows, the following issues appear:

  • GPT-4o's context window is 128K tokens; long conversations can easily exceed it
  • Every request carries the full history, consuming a large number of tokens
  • Old messages have low relevance but still occupy context

1.2 Solution Comparison


2. Strategy Design

2.1 Smart Management Strategy

2.2 Trigger Conditions

| Model | Context Window | Suggested Threshold | Summary Trigger |
|---|---|---|---|
| GPT-4o | 128K | 50 messages | 30 messages |
| GPT-4o-mini | 128K | 40 messages | 25 messages |
| Claude 3.5 | 200K | 60 messages | 35 messages |
| Gemini Pro | 32K | 20 messages | 15 messages |
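The thresholds above can be kept in a small config table. A minimal sketch, with the numbers taken from the table; the `THRESHOLDS` dict and `summary_due` helper are illustrative names, not part of the codebase:

```python
# Per-model trigger thresholds from the table above (illustrative names).
THRESHOLDS = {
    "gpt-4o":      {"context": 128_000, "max_messages": 50, "summarize_at": 30},
    "gpt-4o-mini": {"context": 128_000, "max_messages": 40, "summarize_at": 25},
    "claude-3.5":  {"context": 200_000, "max_messages": 60, "summarize_at": 35},
    "gemini-pro":  {"context": 32_000,  "max_messages": 20, "summarize_at": 15},
}

def summary_due(model: str, message_count: int) -> bool:
    """Return True once the conversation reaches the model's summary trigger."""
    cfg = THRESHOLDS.get(model)
    if cfg is None:
        return message_count >= 30  # conservative default for unknown models
    return message_count >= cfg["summarize_at"]
```

A lookup table like this keeps per-model tuning out of the summarizer logic itself.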

2.3 Summary Content Structure

Conversation summary structure:
┌───────────────────────────────────────────────┐
│ 📝 Conversation Summary                       │
├───────────────────────────────────────────────┤
│ [Topic] Python async programming discussion   │
│                                               │
│ [Key Decisions]                               │
│ • Chose asyncio over threading                │
│ • Settled on aiohttp as the HTTP client       │
│                                               │
│ [User Preferences]                            │
│ • Prefers concise code examples               │
│ • Wants comments in Chinese                   │
│                                               │
│ [Action Items]                                │
│ • Research async database connection pools    │
│ • Test concurrency performance                │
│                                               │
│ [Timeline]                                    │
│ 1. Covered async programming basics           │
│ 2. Compared asyncio vs threading              │
│ 3. Provided an aiohttp example                │
│ 4. Discussed error-handling best practices    │
└───────────────────────────────────────────────┘

3. Code Implementation

3.1 Conversation Summarization Service

Create file: services/conversation_summarizer.py

python
"""Conversation summarization service

Manages long conversations intelligently, preserving key information via summaries.
"""
from typing import List, Optional, Dict, Any
from dataclasses import dataclass
import os
import logging
from datetime import datetime

from langchain_core.messages import BaseMessage, HumanMessage, AIMessage, SystemMessage
from langchain_openai import ChatOpenAI

logger = logging.getLogger(__name__)


@dataclass
class ConversationSummary:
    """Conversation summary"""
    topics: List[str]           # discussed topics
    decisions: List[str]        # key decisions
    user_preferences: List[str] # user preferences
    action_items: List[str]     # action items
    timeline: List[str]         # timeline highlights
    raw_summary: str            # raw summary text
    created_at: datetime
    updated_at: datetime

class ConversationSummarizer:
    """Conversation summary manager"""

    SUMMARY_PROMPT = """Analyze the following conversation and produce a structured summary.

Conversation:
{conversation}

Output JSON in the following format:
{{
    "topics": ["topic 1", "topic 2"],
    "decisions": ["decision 1", "decision 2"],
    "user_preferences": ["preference 1", "preference 2"],
    "action_items": ["item 1", "item 2"],
    "timeline": ["point 1", "point 2", "point 3"],
    "summary": "a complete summary paragraph"
}}"""

    UPDATE_PROMPT = """Update the summary based on the existing summary and the new conversation content.

Existing summary:
{existing_summary}

New conversation content:
{new_messages}

Update the summary, keeping the JSON format."""

    def __init__(
        self,
        max_messages: int = 30,
        keep_recent: int = 10,
        summary_model: Optional[str] = None
    ):
        """
        Initialize the summarizer

        Args:
            max_messages: message-count threshold that triggers summarization
            keep_recent: number of recent messages to keep verbatim
            summary_model: model used to generate summaries
        """
        self.max_messages = max_messages
        self.keep_recent = keep_recent
        self.summary_model = summary_model or "openai/gpt-4o-mini"

        self._summaries: Dict[str, ConversationSummary] = {}

    def _get_llm(self) -> ChatOpenAI:
        """Get the LLM used for summarization"""
        return ChatOpenAI(
            model=self.summary_model,
            api_key=os.getenv("OPENROUTER_API_KEY"),
            base_url=os.getenv("OPENROUTER_BASE_URL", "https://openrouter.ai/api/v1"),
            temperature=0.3  # low temperature for consistent output
        )

    def should_summarize(self, messages: List[BaseMessage]) -> bool:
        """Decide whether a summary is needed"""
        return len(messages) >= self.max_messages

    def _format_messages(self, messages: List[BaseMessage]) -> str:
        """Format messages as plain text"""
        lines = []
        for msg in messages:
            role = "User" if isinstance(msg, HumanMessage) else "AI" if isinstance(msg, AIMessage) else "System"
            content = str(msg.content)
            # Truncate overly long content
            if len(content) > 500:
                content = content[:500] + "..."
            lines.append(f"[{role}] {content}")
        return "\n".join(lines)

    def generate_summary(
        self,
        messages: List[BaseMessage],
        existing_summary: Optional[ConversationSummary] = None
    ) -> ConversationSummary:
        """
        Generate or update a conversation summary

        Args:
            messages: message list
            existing_summary: an existing summary (update mode)

        Returns:
            the generated summary
        """
        llm = self._get_llm()

        if existing_summary:
            # Update mode
            prompt = self.UPDATE_PROMPT.format(
                existing_summary=existing_summary.raw_summary,
                new_messages=self._format_messages(messages[-10:])  # update with the 10 most recent only
            )
        else:
            # Create mode
            prompt = self.SUMMARY_PROMPT.format(
                conversation=self._format_messages(messages)
            )

        try:
            response = llm.invoke(prompt)
            import json

            # Try to parse the JSON
            content = response.content
            # Extract the JSON portion from a possible code fence
            if "```json" in content:
                content = content.split("```json")[1].split("```")[0]
            elif "```" in content:
                content = content.split("```")[1].split("```")[0]

            data = json.loads(content.strip())

            now = datetime.now()
            return ConversationSummary(
                topics=data.get("topics", []),
                decisions=data.get("decisions", []),
                user_preferences=data.get("user_preferences", []),
                action_items=data.get("action_items", []),
                timeline=data.get("timeline", []),
                raw_summary=data.get("summary", ""),
                created_at=existing_summary.created_at if existing_summary else now,
                updated_at=now
            )
        except Exception as e:
            logger.error(f"Summary generation failed: {e}")
            # Fall back to a minimal summary
            return ConversationSummary(
                topics=[],
                decisions=[],
                user_preferences=[],
                action_items=[],
                timeline=[],
                raw_summary=f"Summary generation failed: {str(e)}",
                created_at=datetime.now(),
                updated_at=datetime.now()
            )

    def get_summary_message(self, summary: ConversationSummary) -> SystemMessage:
        """Convert a summary into a system message"""
        parts = ["Summary of the earlier conversation:\n"]

        if summary.topics:
            parts.append(f"[Topics] {', '.join(summary.topics)}")

        if summary.decisions:
            parts.append("\n[Key Decisions]")
            for d in summary.decisions:
                parts.append(f"• {d}")

        if summary.user_preferences:
            parts.append("\n[User Preferences]")
            for p in summary.user_preferences:
                parts.append(f"• {p}")

        if summary.action_items:
            parts.append("\n[Action Items]")
            for a in summary.action_items:
                parts.append(f"• {a}")

        if summary.raw_summary:
            parts.append(f"\n[Summary] {summary.raw_summary}")

        return SystemMessage(content="\n".join(parts))

    def manage_messages(
        self,
        messages: List[BaseMessage],
        thread_id: str
    ) -> List[BaseMessage]:
        """
        Manage the message list, summarizing and trimming when necessary

        Args:
            messages: original message list
            thread_id: conversation ID

        Returns:
            the managed message list
        """
        if not self.should_summarize(messages):
            return messages

        # Fetch the existing summary, if any
        existing_summary = self._summaries.get(thread_id)

        # Messages to summarize (the older ones)
        old_messages = messages[:-self.keep_recent]

        # Generate or update the summary
        summary = self.generate_summary(old_messages, existing_summary)
        self._summaries[thread_id] = summary

        # Build the new message list: summary + recent messages
        summary_msg = self.get_summary_message(summary)
        recent_messages = messages[-self.keep_recent:]

        result = [summary_msg] + recent_messages

        logger.info(
            f"Managed long conversation: {len(messages)} original messages -> "
            f"1 summary + {len(recent_messages)} recent = {len(result)} messages"
        )

        return result


# ============ Simple trimmer ============

class MessageTrimmer:
    """Simple message trimmer (no summarization)"""

    def __init__(
        self,
        max_tokens: int = 4000,
        keep_recent: int = 10
    ):
        self.max_tokens = max_tokens
        self.keep_recent = keep_recent

    def trim(self, messages: List[BaseMessage]) -> List[BaseMessage]:
        """Trim the messages down to the most recent N"""
        if len(messages) <= self.keep_recent:
            return messages

        # Keep the first message (usually the system prompt) + the most recent N-1
        first = messages[0] if messages else None

        if first and not isinstance(first, SystemMessage):
            # No system prompt to preserve: keep the most recent N instead of N-1
            return messages[-self.keep_recent:]

        recent = messages[-(self.keep_recent - 1):]
        return [first] + recent if first else recent
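The trimming policy above (keep the leading system prompt plus the most recent messages) can be illustrated independently of LangChain. A minimal sketch using plain dicts; `trim_keep_system` is an illustrative stand-in for `MessageTrimmer.trim`, not code from the service:

```python
# Self-contained illustration of the "system prompt + recent N" trimming policy.
# Messages are plain dicts here; the service uses LangChain BaseMessage objects.
def trim_keep_system(messages: list, keep_recent: int = 10) -> list:
    """Keep the leading system message (if any) plus the most recent messages."""
    if len(messages) <= keep_recent:
        return messages
    if messages[0]["role"] == "system":
        # The system prompt counts toward the budget: 1 system + (N - 1) recent.
        return [messages[0]] + messages[-(keep_recent - 1):]
    return messages[-keep_recent:]

history = [{"role": "system", "content": "You are helpful."}] + [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"msg {i}"}
    for i in range(20)
]
trimmed = trim_keep_system(history, keep_recent=5)
# trimmed keeps the system prompt plus the 4 most recent messages
```

The summarization path cannot be demonstrated offline, since `manage_messages` makes an LLM call; only the trimming behavior is shown here.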

3.2 Integrating into the Chat Service

Modify services/langgraph_chat.py:

python
# Add at the top of the file
from services.conversation_summarizer import ConversationSummarizer, MessageTrimmer


class LangGraphChatService:
    """LangGraph chat service"""

    def __init__(self, config: Optional[ChatConfig] = None):
        self.config = config or ChatConfig()
        # Initialize the summarizer
        self.summarizer = ConversationSummarizer(
            max_messages=30,      # summarize after 30 messages
            keep_recent=10,       # keep the 10 most recent
            summary_model="openai/gpt-4o-mini"  # use a small model for summaries
        )
        # Simple trimmer (optional)
        self.trimmer = MessageTrimmer(
            max_tokens=4000,
            keep_recent=10
        )

    def _call_model_with_management(
        self,
        state: MessagesState,
        thread_id: str,
        use_summary: bool = True
    ):
        """Call the model (with message management)"""
        messages = state["messages"]

        # Manage the message length
        if use_summary:
            managed_messages = self.summarizer.manage_messages(messages, thread_id)
        else:
            managed_messages = self.trimmer.trim(messages)

        # Call the LLM
        # ... existing logic

4. Frontend Display

4.1 Summary Indicator Component

javascript
// static/js/summary-indicator.js

class ConversationSummaryIndicator {
    constructor() {
        this.summary = null;
    }

    show(summary) {
        this.summary = summary;

        const indicator = document.createElement('div');
        indicator.className = 'summary-indicator';
        indicator.innerHTML = `
            <div class="summary-header">
                <span class="summary-icon">📝</span>
                <span>Conversation Summary</span>
                <button class="toggle-btn" onclick="this.parentElement.parentElement.classList.toggle('expanded')">
                    Expand
                </button>
            </div>
            <div class="summary-content">
                ${this.renderSummaryContent(summary)}
            </div>
        `;

        // Insert at the top of the messages container
        const messagesContainer = document.getElementById('messages');
        messagesContainer.insertBefore(indicator, messagesContainer.firstChild);
    }

    renderSummaryContent(summary) {
        let html = '';

        if (summary.topics?.length) {
            html += `
                <div class="summary-section">
                    <strong>Topics</strong>
                    <div class="tags">
                        ${summary.topics.map(t => `<span class="tag">${t}</span>`).join('')}
                    </div>
                </div>
            `;
        }

        if (summary.decisions?.length) {
            html += `
                <div class="summary-section">
                    <strong>关键决策</strong>
                    <ul>
                        ${summary.decisions.map(d => `<li>${d}</li>`).join('')}
                    </ul>
                </div>
            `;
        }

        if (summary.action_items?.length) {
            html += `
                <div class="summary-section">
                    <strong>Action Items</strong>
                    <ul>
                        ${summary.action_items.map(a => `<li>${a}</li>`).join('')}
                    </ul>
                </div>
            `;
        }

        return html;
    }
}

4.2 CSS Styles

css
/* static/css/summary.css */

.summary-indicator {
    background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
    border-radius: 12px;
    margin: 10px 20px;
    overflow: hidden;
    color: white;
}

.summary-header {
    padding: 12px 16px;
    display: flex;
    align-items: center;
    gap: 8px;
    cursor: pointer;
}

.summary-icon {
    font-size: 18px;
}

.toggle-btn {
    margin-left: auto;
    background: rgba(255, 255, 255, 0.2);
    border: none;
    color: white;
    padding: 4px 12px;
    border-radius: 4px;
    cursor: pointer;
    font-size: 12px;
}

.summary-content {
    max-height: 0;
    overflow: hidden;
    transition: max-height 0.3s ease;
    background: rgba(0, 0, 0, 0.1);
}

.summary-indicator.expanded .summary-content {
    max-height: 300px;
    padding: 16px;
}

.summary-section {
    margin-bottom: 12px;
}

.summary-section:last-child {
    margin-bottom: 0;
}

.summary-section strong {
    display: block;
    margin-bottom: 8px;
    font-size: 13px;
    opacity: 0.9;
}

.tags {
    display: flex;
    flex-wrap: wrap;
    gap: 6px;
}

.tag {
    background: rgba(255, 255, 255, 0.2);
    padding: 4px 10px;
    border-radius: 20px;
    font-size: 12px;
}

.summary-section ul {
    margin: 0;
    padding-left: 16px;
    font-size: 13px;
}

.summary-section li {
    margin: 4px 0;
    opacity: 0.9;
}

5. Cost Analysis

5.1 Estimated Token Savings

Savings calculation:

  • Assume an average of 1,000 tokens per message
  • 50 messages = 50,000 tokens
  • Summary (~500 tokens) + 10 messages (10,000 tokens) = 10,500 tokens
  • Savings: (50,000 - 10,500) / 50,000 = 79%
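The arithmetic above can be checked directly; the per-message and summary token counts are the stated assumptions, not measurements:

```python
# Verify the savings estimate from the bullets above.
AVG_TOKENS_PER_MESSAGE = 1_000   # assumed average message size
SUMMARY_TOKENS = 500             # assumed summary size

full_history = 50 * AVG_TOKENS_PER_MESSAGE                 # 50,000 tokens
managed = SUMMARY_TOKENS + 10 * AVG_TOKENS_PER_MESSAGE     # 10,500 tokens
savings = (full_history - managed) / full_history

print(f"{savings:.0%}")  # 79%
```

Real savings will vary with actual message sizes; the ratio only holds once the conversation is long enough to trigger summarization.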

5.2 Additional Cost

| Item | Cost | Notes |
|---|---|---|
| Summary generation | ~500 tokens | once per 30 messages |
| Summary update | ~300 tokens | subsequent summary updates |
| Total additional cost | ~1% | negligible next to the 79% saved |

6. Implementation Plan

| Step | Task | Estimated Time |
|---|---|---|
| 1 | Create services/conversation_summarizer.py | 2h |
| 2 | Integrate into services/langgraph_chat.py | 1h |
| 3 | Frontend summary indicator | 1h |
| 4 | Testing and tuning | 1h |
| | Total | 5h (0.5 days) |