Is RAG Obsolete? Introducing CAG: Cache-Augmented Generation in Practice and How to Optimize It

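An AI customer-service assistant answers thousands of questions a day, and at least a third of them are repeats: "How is annual leave calculated?", "How do I claim travel expenses?", "What is the housing fund contribution rate?" The answers are usually already written into company policy and do not change for months.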

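Contents

I. RAG Works Well, but It Has "Amnesia"
II. CAG: Giving AI a "Memory Stick"
III. A Hybrid RAG+CAG Architecture
IV. How to Decide When to Cache
V. Cache Update Strategies
VI. Production-Grade Implementation and Best Practices
    6.2 Complete Code Example
VII. Summary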

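The problem is that every time someone asks, the AI system goes back to the document store and retrieves the answer all over again.

A better option is to keep stable, frequently requested knowledge close at hand, like keeping everyday tools within reach instead of fetching them from the warehouse each time.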

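This article walks through the evolution from traditional RAG to CAG in working code. Each step comes with runnable code so that readers can see how it actually works.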

Figure: RAG workflow diagram

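I. RAG Works Well, but It Has "Amnesia"

To see why, start with RAG, the technique that is popular today.

RAG stands for retrieval-augmented generation, and the idea is straightforward: before answering a question, the AI first retrieves relevant material from a knowledge base and then generates its answer from what it retrieved.

This approach mitigates model "hallucinations". However, it has an inherent flaw: it has no memory.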

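1.1 What Is RAG?

The RAG (retrieval-augmented generation) workflow is as follows:

1. The user asks a question.
2. The system retrieves relevant documents from the knowledge base.
3. The retrieved results are passed to the AI together with the question.
4. The AI generates an answer based on the retrieved content.

This sounds great, but there is a catch: every single query triggers a retrieval.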

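1.2 A Complete Traditional RAG Implementation

We start by implementing a standard RAG system, using an enterprise HR knowledge base as the example: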

import numpy as np
from typing import List, Dict
import time
from datetime import datetime

# 模拟向量数据库
class SimpleVectorDB:
    """简单的向量数据库实现"""

    def __init__(self):
        self.documents = []
        self.embeddings = []
        self.metadata = []

    def add_document(self, text: str, metadata: Dict = None):
        """添加文档到数据库"""
        # 这里用简单的词频向量模拟embedding
        embedding = self._text_to_vector(text)
        self.documents.append(text)
        self.embeddings.append(embedding)
        self.metadata.append(metadata or {})

    def _text_to_vector(self, text: str) -> np.ndarray:
        """将文本转换为向量（简化版）"""
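        # A real system would use an embedding model (e.g. from OpenAI or Hugging Face).
        # Toy stand-in: the scaled code point of each of the first 100 characters.
        # It is position-sensitive and carries no semantic meaning.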
        vector = np.zeros(100)
        for i, char in enumerate(text[:100]):
            vector[i] = ord(char) / 1000
        return vector

    def search(self, query: str, top_k: int = 3) -> List[Dict]:
        """检索最相关的文档"""
        query_vector = self._text_to_vector(query)

        # 计算余弦相似度
        similarities = []
        for i, doc_vector in enumerate(self.embeddings):
            similarity = np.dot(query_vector, doc_vector) / (
                np.linalg.norm(query_vector) * np.linalg.norm(doc_vector) + 1e-10
            )
            similarities.append({
                'index': i,
                'score': similarity,
                'text': self.documents[i],
                'metadata': self.metadata[i]
            })

        # 返回top_k结果
        similarities.sort(key=lambda x: x['score'], reverse=True)
        return similarities[:top_k]

class TraditionalRAG:
    """传统RAG系统"""

    def __init__(self):
        self.vector_db = SimpleVectorDB()
        self.search_count = 0  # 统计检索次数
        self.search_times = []  # 记录每次检索耗时

    def add_knowledge(self, text: str, metadata: Dict = None):
        """添加知识到系统"""
        self.vector_db.add_document(text, metadata)

    def query(self, question: str) -> Dict:
        """处理查询"""
        start_time = time.time()

        # 每次都要检索
        search_results = self.vector_db.search(question, top_k=2)

        search_time = time.time() - start_time
        self.search_count += 1
        self.search_times.append(search_time)

        # 组装上下文
        context = "\n\n".join([r['text'] for r in search_results])

        # 模拟LLM生成答案（实际应调用GPT/Claude API）
        answer = self._generate_answer(question, context)

        return {
            'question': question,
            'answer': answer,
            'context': context,
            'search_time': search_time,
            'total_searches': self.search_count
        }

    def _generate_answer(self, question: str, context: str) -> str:
        """模拟LLM生成答案"""
        # 实际应该调用OpenAI API或其他LLM
        return f"基于知识库：{context[:100]}… 回答：[模拟答案]"

    def get_statistics(self) -> Dict:
        """获取性能统计"""
        return {
            'total_searches': self.search_count,
            'avg_search_time': np.mean(self.search_times) if self.search_times else 0,
            'total_time': sum(self.search_times)
        }

# 使用示例
def demo_traditional_rag():
    """演示传统RAG的问题"""
    print("=" * 60)
    print("传统RAG系统演示")
    print("=" * 60)

    # 创建RAG系统
    rag = TraditionalRAG()

    # 添加企业知识（这些都是稳定的制度文档）
    knowledge_base = [
        {
            "text": "公司年假政策：入职满1年员工享有5天年假，满3年享有10天，满5年享有15天。年假必须在当年使用，不可跨年累积.",
            "metadata": {"category": "HR政策", "update_date": "2024-01-01"}
        },
        {
            "text": "差旅费报销标准：国内出差每天补贴200元，住宿费实报实销上限500元/天。需提供发票和出差申请单.",
            "metadata": {"category": "财务制度", "update_date": "2024-01-01"}
        },
        {
            "text": "公积金缴纳比例：公司和个人各缴纳12%，基数为上年度月平均工资。每年7月调整一次.",
            "metadata": {"category": "薪酬福利", "update_date": "2024-01-01"}
        },
        {
            "text": "病假规定：员工因病需请假，需提供医院证明。病假工资按基本工资的80%发放，每年累计不超过30天.",
            "metadata": {"category": "HR政策", "update_date": "2024-01-01"}
        }
    ]

    for kb in knowledge_base:
        rag.add_knowledge(kb['text'], kb['metadata'])
    print(f"\nLoaded {len(knowledge_base)} knowledge entries\n")

    # 模拟重复查询（这是关键问题所在）
    repeated_questions = [
        "年假怎么算？",
        "年假政策是什么？",
        "我能休几天年假？",
        "差旅费怎么报销？",
        "出差补贴标准是多少？",
        "年假能累积吗？",  # 又问年假
        "公积金比例是多少？",
        "年假政策详细说明",  # 再问年假
    ]

    print("Processing queries...\n")
    for i, question in enumerate(repeated_questions, 1):
        result = rag.query(question)
        print(f"查询 {i}: {question}")
        print(f"  检索耗时: {result['search_time']*1000:.2f}ms")
        print(f"  累计检索次数: {result['total_searches']}")
        print()

    # 显示统计信息
    stats = rag.get_statistics()
    print("=" * 60)
    print("性能统计")
    print("=" * 60)
    print(f"总检索次数: {stats['total_searches']}")
    print(f"平均检索耗时: {stats['avg_search_time']*1000:.2f}ms")
    print(f"总耗时: {stats['total_time']*1000:.2f}ms")
    print()
    print("⚠️  问题分析：")
    print("  - Annual-leave questions were asked 5 times, and each one triggered a fresh retrieval")
    print("  - 这些制度文档几个月都不会变，却要反复访问数据库")
    print("  - 随着查询量增加，成本和延迟线性上升")
    print()
# 运行演示
demo_traditional_rag()

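1.3 The Problem Exposed: Cost and Latency

Running the code above shows that:

Questions about annual leave were asked 5 times, and the system performed a retrieval for every one of them.
Every retrieval requires a trip to the vector database.
The cumulative number of retrievals grows linearly with query volume.

In a real production environment, the impact includes:

Cost: usage-based fees for the vector database (managed services such as Pinecone bill for query usage).
Latency: network round trips plus similarity computation, typically 50-200 ms.
Resources: database connections and CPU usage.

As the example makes clear, even if the same question is asked a hundred times, the AI system still runs a hundred retrievals. Connecting to the database, matching documents, extracting information: the whole pipeline costs both time and resources.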
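To put rough numbers on this, the sketch below estimates how much retrieval work per day is spent on repeats. The daily volume of 3,000 queries is an assumed figure standing in for "thousands"; the one-third repeat share and the 50-200 ms latency range come from the text above. The function name `redundant_retrieval_load` is hypothetical.

```python
def redundant_retrieval_load(queries_per_day: int, repeat_fraction: float,
                             latency_ms_low: float, latency_ms_high: float) -> dict:
    """Estimate how many retrievals are repeats and how much latency they add up to."""
    redundant = round(queries_per_day * repeat_fraction)
    return {
        "redundant_retrievals": redundant,
        "wasted_seconds_low": redundant * latency_ms_low / 1000,
        "wasted_seconds_high": redundant * latency_ms_high / 1000,
    }

# Assumed volume: 3,000 queries per day, one third of them repeats
estimate = redundant_retrieval_load(3000, 1 / 3, 50, 200)
print(estimate)
```

Under these assumptions, about 1,000 retrievals a day are repeats, adding up to somewhere between 50 and 200 seconds of cumulative retrieval latency.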

For knowledge that almost never changes, such as company policies, product manuals, and legal provisions, retrieving it afresh every time is overkill.

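II. CAG: Giving AI a "Memory Stick"

To address these problems, a new approach has emerged: cache-augmented generation (CAG).

In short, CAG gives the AI system "memory" by storing stable, unchanging knowledge in an internal store. When a related question comes up again, the system does not need to retrieve external documents and can pull the information straight from "memory".

It is like keeping frequently used tools within reach rather than walking to the warehouse for them each time.

The benefits are immediate:

Faster: without repeated trips to the database, response time drops substantially.
Cheaper: fewer retrievals mean less server load and lower cost.
More consistent: fixed knowledge is worded the same way every time, which avoids contradictory answers.

2.1 The Core Idea of CAG

The core tasks of CAG (cache-augmented generation) are:

1. Identify which knowledge is "static" (unchanged over long periods).
2. Cache that static knowledge directly in memory.
3. On each query, check the cache first; on a hit, skip external retrieval.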
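The cache-first lookup in steps 2 and 3 can be sketched in a few lines. This is a minimal illustration, not the implementation used later in the article: `make_cache_first_lookup` and `fake_retrieve` are hypothetical names, the cache is a plain dict keyed by the normalized question text, and step 1 (deciding what counts as static) is skipped by assuming everything retrieved is static.

```python
from typing import Callable, Dict

def make_cache_first_lookup(retrieve: Callable[[str], str]) -> Callable[[str], Dict[str, str]]:
    """Wrap a retrieval function with a plain in-memory dict cache."""
    cache: Dict[str, str] = {}

    def lookup(question: str) -> Dict[str, str]:
        key = question.strip().lower()
        if key in cache:                       # check the cache first
            return {"answer": cache[key], "source": "CACHE"}
        answer = retrieve(question)            # miss: fall back to external retrieval
        cache[key] = answer                    # keep the result for next time
        return {"answer": answer, "source": "RETRIEVAL"}

    return lookup

# Hypothetical stand-in for a real vector-database lookup
calls = []
def fake_retrieve(question: str) -> str:
    calls.append(question)
    return "5 days after 1 year, 10 days after 3 years, 15 days after 5 years"

lookup = make_cache_first_lookup(fake_retrieve)
first = lookup("How is annual leave calculated?")
second = lookup("how is annual leave calculated?  ")
print(first["source"], second["source"], len(calls))
```

The second call differs only in case and trailing whitespace, so it is served from the cache and the retrieval function runs just once.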

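So should all knowledge be cached?

Of course not. Stuffing everything into the cache would quickly exhaust the system's memory capacity.

2.2 A Complete CAG Implementation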

import hashlib
import json
import time
from datetime import datetime
from typing import Dict, List, Optional

import numpy as np

class KnowledgeCache:
    """知识缓存管理器"""

    def __init__(self, max_size: int = 100):
        self.cache = {}  # 缓存存储
        self.max_size = max_size
        self.hit_count = 0  # 命中次数
        self.miss_count = 0  # 未命中次数
        self.access_log = []  # 访问日志

    def _generate_key(self, query: str) -> str:
        """生成查询的缓存键"""
        # 使用语义哈希（这里简化为文本哈希）
        # 实际应该用embedding的相似度匹配
        normalized = query.lower().strip()
        return hashlib.md5(normalized.encode()).hexdigest()[:16]

    def get(self, query: str, similarity_threshold: float = 0.85) -> Optional[Dict]:
        """从缓存获取答案"""
        # 简化版：精确匹配
        # 实际应该用语义相似度匹配
        query_key = query.lower().strip()

        # 查找语义相似的缓存项
        for cached_query, cached_data in self.cache.items():
            if self._is_similar(query_key, cached_query):
                self.hit_count += 1
                self.access_log.append({
                    'query': query,
                    'result': 'HIT',
                    'timestamp': datetime.now().isoformat()
                })
                return cached_data

        self.miss_count += 1
        self.access_log.append({
            'query': query,
            'result': 'MISS',
            'timestamp': datetime.now().isoformat()
        })
        return None

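    def _is_similar(self, query1: str, query2: str) -> bool:
        """Return True if two queries overlap enough to share a cached answer."""
        # Simplified: Jaccard overlap of tokens; a real system would compare embeddings.
        # Chinese text has no spaces, so str.split() yields a single token and
        # "年假怎么算？" would never match the pre-cached "年假怎么算". For such text
        # we fall back to the set of characters, with punctuation dropped.
        def tokenize(text: str) -> set:
            words = text.split()
            if len(words) > 1:
                return set(words)
            return {ch for ch in text if ch.isalnum()}

        keywords1 = tokenize(query1)
        keywords2 = tokenize(query2)

        if not keywords1 or not keywords2:
            return False

        intersection = keywords1 & keywords2
        union = keywords1 | keywords2
        similarity = len(intersection) / len(union)

        return similarity > 0.5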

    def set(self, query: str, context: str, answer: str, metadata: Dict = None):
        """设置缓存"""
        query_key = query.lower().strip()

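        # Enforce the capacity limit (only when adding a new key)
        if query_key not in self.cache and len(self.cache) >= self.max_size:
            # FIFO eviction: dicts keep insertion order, so this drops the oldest
            # inserted entry. A true LRU would also reorder entries on every hit.
            oldest_key = next(iter(self.cache))
            del self.cache[oldest_key]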

        self.cache[query_key] = {
            'query': query,
            'context': context,
            'answer': answer,
            'metadata': metadata or {},
            'cached_at': datetime.now().isoformat()
        }

    def get_statistics(self) -> Dict:
        """获取缓存统计"""
        total_access = self.hit_count + self.miss_count
        hit_rate = self.hit_count / total_access if total_access > 0 else 0

        return {
            'hit_count': self.hit_count,
            'miss_count': self.miss_count,
            'total_access': total_access,
            'hit_rate': hit_rate,
            'cache_size': len(self.cache)
        }

class CAGSystem:
    """CAG（缓存增强生成）系统"""

    def __init__(self, cache_size: int = 100):
        self.vector_db = SimpleVectorDB()
        self.cache = KnowledgeCache(max_size=cache_size)
        self.search_count = 0
        self.search_times = []
        self.cache_hit_times = []

    def add_knowledge(self, text: str, metadata: Dict = None, cacheable: bool = False):
        """添加知识"""
        self.vector_db.add_document(text, metadata)

        # 如果标记为可缓存，预先生成常见问题的缓存
        if cacheable and metadata and 'common_questions' in metadata:
            for question in metadata['common_questions']:
                # 预先缓存答案
                answer = f"基于缓存：{text[:100]}…"
                self.cache.set(question, text, answer, metadata)

    def query(self, question: str) -> Dict:
        """处理查询（带缓存）"""
        start_time = time.time()

        # 先查缓存
        cached_result = self.cache.get(question)

        if cached_result:
            # 缓存命中！
            cache_time = time.time() - start_time
            self.cache_hit_times.append(cache_time)

            return {
                'question': question,
                'answer': cached_result['answer'],
                'context': cached_result['context'],
                'source': 'CACHE',
                'response_time': cache_time,
                'total_searches': self.search_count
            }

        # 缓存未命中，执行检索
        search_results = self.vector_db.search(question, top_k=2)
        search_time = time.time() - start_time
        self.search_count += 1
        self.search_times.append(search_time)

        # 组装上下文
        context = "\n\n".join([r['text'] for r in search_results])
        answer = self._generate_answer(question, context)

        # 存入缓存（如果是静态知识）
        if search_results and self._is_cacheable(search_results[0]):
            self.cache.set(question, context, answer,
                              search_results[0].get('metadata', {}))

        return {
            'question': question,
            'answer': answer,
            'context': context,
            'source': 'RETRIEVAL',
            'response_time': search_time,
            'total_searches': self.search_count
        }

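    def _is_cacheable(self, search_result: Dict) -> bool:
        """Decide whether a retrieval result is stable enough to cache."""
        metadata = search_result.get('metadata', {})
        # Treat a document as static if it has not been updated for more than 30 days
        update_date = metadata.get('update_date')
        if not update_date:
            return False
        try:
            last_update = datetime.strptime(update_date, "%Y-%m-%d")
        except (TypeError, ValueError):
            return False  # unparseable date: do not cache
        return (datetime.now() - last_update).days > 30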

    def _generate_answer(self, question: str, context: str) -> str:
        """模拟LLM生成答案"""
        return f"基于知识库：{context[:100]}… 回答：[模拟答案]"

    def get_statistics(self) -> Dict:
        """获取完整统计信息"""
        cache_stats = self.cache.get_statistics()

        return {
            'retrieval': {
                'total_searches': self.search_count,
                'avg_search_time': np.mean(self.search_times) if self.search_times else 0,
                'total_time': sum(self.search_times)
            },
            'cache': {
                'hit_count': cache_stats['hit_count'],
                'miss_count': cache_stats['miss_count'],
                'hit_rate': cache_stats['hit_rate'],
                'avg_hit_time': np.mean(self.cache_hit_times) if self.cache_hit_times else 0,
                'cache_size': cache_stats['cache_size']
            },
            'overall': {
                'total_queries': cache_stats['total_access'],
                'searches_saved': cache_stats['hit_count'],
                'cost_reduction': f"{cache_stats['hit_rate']*100:.1f}%"
            }
        }

# 使用示例
def demo_cag_system():
    """演示CAG系统的优势"""
    print("=" * 60)
    print("CAG系统演示（带缓存优化）")
    print("=" * 60)

    # 创建CAG系统
    cag = CAGSystem(cache_size=50)

    # 添加知识（标记静态知识为可缓存）
    knowledge_base = [
        {
            "text": "公司年假政策：入职满1年员工享有5天年假，满3年享有10天，满5年享有15天。年假必须在当年使用，不可跨年累积.",
            "metadata": {
                "category": "HR政策",
                "update_date": "2024-01-01",
                "common_questions": [
                    "年假怎么算",
                    "年假政策是什么",
                    "我能休几天年假",
                    "年假能累积吗"
                ]
            },
            "cacheable": True
        },
        {
            "text": "差旅费报销标准：国内出差每天补贴200元，住宿费实报实销上限500元/天。需提供发票和出差申请单.",
            "metadata": {
                "category": "财务制度",
                "update_date": "2024-01-01",
                "common_questions": [
                    "差旅费怎么报销",
                    "出差补贴标准是多少",
                    "出差住宿费报销"
                ]
            },
            "cacheable": True
        },
        {
            "text": "公积金缴纳比例：公司和个人各缴纳12%，基数为上年度月平均工资。每年7月调整一次.",
            "metadata": {
                "category": "薪酬福利",
                "update_date": "2024-01-01",
                "common_questions": [
                    "公积金比例是多少",
                    "公积金怎么缴纳"
                ]
            },
            "cacheable": True
        }
    ]

    for kb in knowledge_base:
        cag.add_knowledge(kb['text'], kb['metadata'], kb['cacheable'])
    print(f"\nLoaded {len(knowledge_base)} knowledge entries (common questions pre-cached)\n")

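    # Simulate repeated queries. They are phrased exactly like the pre-cached
    # common questions, so they match however the cache handles punctuation.
    test_questions = [
        "年假怎么算",  # annual leave, 1st time: pre-cached
        "年假政策是什么",  # annual leave, 2nd time: pre-cached
        "我能休几天年假",  # annual leave, 3rd time: pre-cached
        "差旅费怎么报销",  # travel expenses, 1st time: pre-cached
        "出差补贴标准是多少",  # travel expenses, 2nd time: pre-cached
        "年假能累积吗",  # annual leave, 4th time: pre-cached
        "公积金比例是多少",  # housing fund, 1st time: pre-cached
        "年假政策详细说明",  # annual leave, 5th time: not pre-cached, falls back to retrieval
    ]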

    print("Processing queries...\n")
    for i, question in enumerate(test_questions, 1):
        result = cag.query(question)

        # 显示结果
        source_icon = "⚡ [缓存]" if result['source'] == 'CACHE' else "🔍 [检索]"
        print(f"查询 {i}: {question}")
        print(f"  数据源: {source_icon}")
        print(f"  响应时间: {result['response_time']*1000:.2f}ms")
        print(f"  累计检索次数: {result['total_searches']}")
        print()

    # 显示详细统计
    stats = cag.get_statistics()
    print("=" * 60)
    print("性能统计对比")
    print("=" * 60)

    print("\n[Retrieval stats]")
    print(f"  实际检索次数: {stats['retrieval']['total_searches']}")
    print(f"  平均检索耗时: {stats['retrieval']['avg_search_time']*1000:.2f}ms")

    print("\n[Cache stats]")
    print(f"  缓存命中次数: {stats['cache']['hit_count']}")
    print(f"  缓存未命中: {stats['cache']['miss_count']}")
    print(f"  缓存命中率: {stats['cache']['hit_rate']*100:.1f}%")
    print(f"  平均缓存响应: {stats['cache']['avg_hit_time']*1000:.2f}ms")
    print(f"  当前缓存大小: {stats['cache']['cache_size']}")

    print("\n[Overall savings]")
    print(f"  总查询次数: {stats['overall']['total_queries']}")
    print(f"  节省检索次数: {stats['overall']['searches_saved']}")
    print(f"  成本降低: {stats['overall']['cost_reduction']}")

    print("\n✅ Summary of benefits:")
    print("  - 重复问题直接从缓存返回，无需检索")
    print("  - In production, 50-200ms vector-DB round trips can be replaced by in-memory lookups")
    print("  - 数据库访问次数大幅减少，成本显著降低")
    print()
# 运行CAG演示
demo_cag_system()
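The comments in `KnowledgeCache` note that a real system would match cached questions by vector similarity rather than by keyword overlap. The sketch below illustrates that idea under loud assumptions: `bigram_vector` is a toy stand-in for an embedding model (it only counts character bigrams), and `best_cached_match` is a hypothetical helper, not part of the classes above. The default threshold of 0.85 mirrors the `similarity_threshold` parameter of `KnowledgeCache.get`.

```python
import math
from collections import Counter
from typing import Dict, Optional

def bigram_vector(text: str) -> Counter:
    """Toy stand-in for an embedding: character-bigram counts, punctuation removed."""
    chars = [ch for ch in text.lower() if ch.isalnum()]
    return Counter(a + b for a, b in zip(chars, chars[1:]))

def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def best_cached_match(query: str, cached: Dict[str, str],
                      threshold: float = 0.85) -> Optional[str]:
    """Return the answer of the most similar cached question, or None below threshold."""
    query_vec = bigram_vector(query)
    best_question, best_score = None, 0.0
    for question in cached:
        score = cosine(query_vec, bigram_vector(question))
        if score > best_score:
            best_question, best_score = question, score
    if best_question is not None and best_score >= threshold:
        return cached[best_question]
    return None

cached = {
    "年假怎么算": "入职满1年5天，满3年10天，满5年15天",
    "差旅费怎么报销": "国内出差每天补贴200元",
}
print(best_cached_match("年假怎么算？", cached))   # same question with a trailing question mark
print(best_cached_match("公积金比例是多少", cached))  # unrelated question
```

In a production setting the bigram counts would be replaced by vectors from an embedding model, and the threshold would need tuning against real query logs.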

2.3 Performance Comparison: CAG vs. RAG

A direct comparison of the two systems:

def compare_rag_vs_cag():
    """直接对比RAG和CAG的性能"""
    print("=" * 60)
    print("RAG vs CAG 性能对比实验")
    print("=" * 60)

    # 准备测试数据
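    knowledge = {
        "text": "公司年假政策：入职满1年员工享有5天年假，满3年享有10天，满5年享有15天.",
        "metadata": {
            "category": "HR政策",
            # update_date marks the entry as static so CAGSystem may cache its answer
            "update_date": "2024-01-01",
            "common_questions": ["年假怎么算", "年假政策", "休几天年假"]
        }
    }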

    # 重复查询100次
    questions = ["年假怎么算？"] * 100

    # 测试传统RAG
    print("\n[Test 1: traditional RAG]")
    rag = TraditionalRAG()
    rag.add_knowledge(knowledge['text'], knowledge['metadata'])

    rag_start = time.time()
    for q in questions:
        rag.query(q)
    rag_total_time = time.time() - rag_start
    rag_stats = rag.get_statistics()

    print(f"总耗时: {rag_total_time*1000:.2f}ms")
    print(f"检索次数: {rag_stats['total_searches']}")
    print(f"平均延迟: {rag_stats['avg_search_time']*1000:.2f}ms")

    # 测试CAG
    print("\n[Test 2: CAG system]")
    cag = CAGSystem()
    cag.add_knowledge(knowledge['text'], knowledge['metadata'], cacheable=True)

    cag_start = time.time()
    for q in questions:
        cag.query(q)
    cag_total_time = time.time() - cag_start
    cag_stats = cag.get_statistics()

    print(f"总耗时: {cag_total_time*1000:.2f}ms")
    print(f"检索次数: {cag_stats['retrieval']['total_searches']}")
    print(f"缓存命中率: {cag_stats['cache']['hit_rate']*100:.1f}%")
    print(f"平均延迟: {cag_stats['cache']['avg_hit_time']*1000:.2f}ms")

    # 性能提升计算
    print("\n" + "=" * 60)
    print("性能提升")
    print("=" * 60)
    speedup = rag_total_time / cag_total_time
    search_reduction = (rag_stats['total_searches'] - cag_stats['retrieval']['total_searches']) / rag_stats['total_searches']

    print(f"速度提升: {speedup:.1f}x")
    print(f"检索次数减少: {search_reduction*100:.1f}%")
    print(f"成本节约: ~{search_reduction*100:.1f}%")
# 运行对比测试
compare_rag_vs_cag()
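The relationship between hit rate and average latency can also be written down directly: if a fraction h of queries is served from the cache, the expected latency is h × t_cache + (1 − h) × t_retrieval. The sketch below evaluates this formula; the 5 ms cache time and 100 ms retrieval time are assumed values picked from the ranges mentioned in this article, not measurements, and `expected_latency_ms` is a hypothetical helper.

```python
def expected_latency_ms(hit_rate: float, cache_ms: float, retrieval_ms: float) -> float:
    """Average per-query latency when a fraction hit_rate of queries hits the cache."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_ms + (1.0 - hit_rate) * retrieval_ms

# Assumed timings: 5 ms for a cache hit, 100 ms for a retrieval
for h in (0.0, 0.5, 0.9):
    print(f"hit rate {h:.0%}: {expected_latency_ms(h, cache_ms=5, retrieval_ms=100):.1f} ms")
```

The formula also shows why the hit rate matters more than the raw cache speed: at a 50% hit rate the average is still dominated by the retrieval path.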

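III. A Hybrid RAG+CAG Architecture

3.1 Why Combine Them?

This mirrors how the human brain works: the multiplication table and your home address are memorized for good, while today's lunch menu or tomorrow's weather still has to be looked up on the spot.

This dual-engine "memory plus external brain" model is a natural fit for knowledge-based AI: static knowledge is served from the cache, and dynamic knowledge comes from live retrieval.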

class HybridRAGCAG:
    """混合RAG+CAG系统"""

    def __init__(self, cache_size: int = 100):
        # 静态知识缓存
        self.static_cache = KnowledgeCache(max_size=cache_size)

        # 动态知识向量库
        self.dynamic_db = SimpleVectorDB()

        # 静态知识向量库（用于缓存未命中时的后备）
        self.static_db = SimpleVectorDB()

        # 统计信息
        self.stats = {
            'static_cache_hits': 0,
            'static_db_queries': 0,
            'dynamic_db_queries': 0,
            'total_queries': 0
        }

    def add_static_knowledge(self, text: str, metadata: Dict = None,
                                 common_questions: List[str] = None):
        """添加静态知识（长期不变）"""
        # 添加到静态数据库
        self.static_db.add_document(text, metadata)

        # 预缓存常见问题
        if common_questions:
            for question in common_questions:
                answer = f"[静态知识] {text}"
                self.static_cache.set(question, text, answer, metadata)

    def add_dynamic_knowledge(self, text: str, metadata: Dict = None):
        """添加动态知识（经常更新）"""
        # 只添加到动态数据库，不缓存
        self.dynamic_db.add_document(text, metadata)

    def query(self, question: str, require_realtime: bool = False) -> Dict:
        """智能查询：自动判断用缓存还是检索

        Args:
            question: 用户问题
            require_realtime: 是否强制要求实时数据
        """
        self.stats['total_queries'] += 1
        start_time = time.time()

        # 如果不要求实时数据，先查静态缓存
        if not require_realtime:
            cached_result = self.static_cache.get(question)
            if cached_result:
                self.stats['static_cache_hits'] += 1
                return {
                    'question': question,
                    'answer': cached_result['answer'],
                    'source': 'STATIC_CACHE',
                    'response_time': time.time() - start_time,
                    'confidence': 'high'
                }

        # 判断问题类型：需要动态数据还是静态数据？
        question_type = self._classify_question(question)

        if question_type == 'dynamic' or require_realtime:
            # 查询动态数据库
            results = self.dynamic_db.search(question, top_k=2)
            self.stats['dynamic_db_queries'] += 1
            source = 'DYNAMIC_RETRIEVAL'
        else:
            # 查询静态数据库
            results = self.static_db.search(question, top_k=2)
            self.stats['static_db_queries'] += 1
            source = 'STATIC_RETRIEVAL'

        # Build the context and generate the answer
        context = "\n".join([r['text'] for r in results]) if results else ""
        answer = self._generate_answer(question, context, source)

        # Cache only static results; dynamic data must stay fresh, so it is never cached
        if results and source == 'STATIC_RETRIEVAL':
            self.static_cache.set(question, context, answer,
                                  results[0].get('metadata', {}))

        return {
            'question': question,
            'answer': answer,
            'source': source,
            'response_time': time.time() - start_time,
            'confidence': 'high' if results else 'low'
        }

    def _classify_question(self, question: str) -> str:
        """判断问题需要动态数据还是静态数据"""
        # 简化版：通过关键词判断
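        # '本周' and '明天' are included so that questions such as "本周有什么会议？" route to the dynamic store
        dynamic_keywords = ['今天', '明天', '昨天', '本周', '最新', '现在', '当前', '实时', '最近']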
        static_keywords = ['政策', '制度', '规定', '标准', '流程', '怎么', '如何']

        question_lower = question.lower()

        # 包含动态关键词，返回dynamic
        for keyword in dynamic_keywords:
            if keyword in question_lower:
                return 'dynamic'

        # 包含静态关键词，返回static
        for keyword in static_keywords:
            if keyword in question_lower:
                return 'static'

        # 默认当作静态
        return 'static'

    def _generate_answer(self, question: str, context: str, source: str) -> str:
        """生成答案"""
        if not context:
            return "抱歉，没有找到相关信息。"
        return f"基于{source}：{context[:150]}…"

    def get_statistics(self) -> Dict:
        """获取详细统计"""
        cache_stats = self.static_cache.get_statistics()
        total_queries = self.stats['total_queries']

        return {
            'total_queries': total_queries,
            'static_cache_hits': self.stats['static_cache_hits'],
            'static_db_queries': self.stats['static_db_queries'],
            'dynamic_db_queries': self.stats['dynamic_db_queries'],
            'cache_hit_rate': cache_stats['hit_rate'],
            'db_access_rate': (self.stats['static_db_queries'] + self.stats['dynamic_db_queries']) / total_queries if total_queries > 0 else 0
        }

def demo_hybrid_system():
    """演示混合系统"""
    print("=" * 60)
    print("混合RAG+CAG系统演示")
    print("=" * 60)

    # 创建混合系统
    hybrid = HybridRAGCAG(cache_size=50)

    # 添加静态知识（制度文档）
    print("\n[Loading static knowledge]")
    static_knowledge = [
        {
            "text": "公司年假政策：入职满1年员工享有5天年假，满3年享有10天，满5年享有15天。年假必须在当年使用，不可跨年累积.",
            "common_questions": ["年假怎么算", "年假政策", "休几天年假", "年假能累积吗"]
        },
        {
            "text": "报销流程：提交申请单→部门主管审批→财务审核→财务打款。处理时间约3-5个工作日.",
            "common_questions": ["怎么报销", "报销流程", "报销要多久"]
        }
    ]

    for kb in static_knowledge:
        hybrid.add_static_knowledge(
            kb['text'],
            {'type': 'static', 'category': 'policy'},
            kb['common_questions']
        )
    print(f"已加载 {len(static_knowledge)} 条静态知识（已预缓存）")

    # 添加动态知识（实时数据）
    print("\n[Loading dynamic knowledge]")
    dynamic_knowledge = [
        {
            "text": "今天公司食堂菜单：午餐有红烧肉、清蒸鱼、麻婆豆腐。晚餐有宫保鸡丁、酸菜鱼、素炒时蔬.",
            "metadata": {'type': 'dynamic', 'date': '2025-11-05'}
        },
        {
            "text": "本周会议通知：周三下午3点全体会议，周五上午10点部门例会。请提前准备材料.",
            "metadata": {'type': 'dynamic', 'date': '2025-11-05'}
        }
    ]

    for kb in dynamic_knowledge:
        hybrid.add_dynamic_knowledge(kb['text'], kb['metadata'])
    print(f"已加载 {len(dynamic_knowledge)} 条动态知识")

    # 测试不同类型的查询
    print("\n" + "=" * 60)
    print("开始测试查询")
    print("=" * 60)

    test_cases = [
        {"q": "年假怎么算？", "type": "静态问题（应命中缓存）"},
        {"q": "年假政策是什么？", "type": "静态问题（应命中缓存）"},
        {"q": "报销流程是什么？", "type": "静态问题（应命中缓存）"},
        {"q": "今天食堂吃什么？", "type": "动态问题（应查询动态库）"},
        {"q": "本周有什么会议？", "type": "动态问题（应查询动态库）"},
        {"q": "年假能累积吗？", "type": "静态问题（应命中缓存）"},
        {"q": "今天食堂有什么菜？", "type": "动态问题（应查询动态库）"},
    ]

    for i, test in enumerate(test_cases, 1):
        result = hybrid.query(test['q'])

        # 根据source显示不同图标
        if result['source'] == 'STATIC_CACHE':
            icon = "⚡"
            color = "缓存"
        elif result['source'] == 'STATIC_RETRIEVAL':
            icon = "📚"
            color = "静态库"
        else:
            icon = "🔄"
            color = "动态库"

        print(f"\nQuery {i}: {test['q']}")
        print(f"  类型: {test['type']}")
        print(f"  数据源: {icon} {color}")
        print(f"  响应时间: {result['response_time']*1000:.2f}ms")

    # 显示统计
    print("\n" + "=" * 60)
    print("系统统计")
    print("=" * 60)
    stats = hybrid.get_statistics()

    print(f"\nTotal queries: {stats['total_queries']}")
    print(f"  ├─ 静态缓存命中: {stats['static_cache_hits']} ({stats['cache_hit_rate']*100:.1f}%)")
    print(f"  ├─ 静态库查询: {stats['static_db_queries']}")
    print(f"  └─ 动态库查询: {stats['dynamic_db_queries']}")
    print(f"\nDatabase access rate: {stats['db_access_rate']*100:.1f}%")
    print(f"成本节约: ~{(1-stats['db_access_rate'])*100:.1f}%")

    print("\n✅ Advantages of the hybrid architecture:")
    print("  - 静态知识走缓存，响应极快")
    print("  - 动态知识走检索，保证实时性")
    print("  - 自动判断问题类型，智能路由")
    print("  - 兼顾速度、成本和准确性")
# 运行混合系统演示
demo_hybrid_system()

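IV. How to Decide When to Cache

4.1 Why Cache Selectively?

Not all knowledge is suitable for caching. Caching blindly causes two problems:

Memory blow-up: caching too much consumes a large amount of memory.
Low hit rate: caching rarely used content wastes space.

A smarter caching strategy is therefore needed.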
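As a baseline for bounding memory, a fixed-capacity least-recently-used (LRU) cache is a common starting point. The sketch below is one way to write it with the standard library's `OrderedDict`; the class name `LRUCache` is hypothetical and it is not one of the caches defined in this article.

```python
from collections import OrderedDict
from typing import Any, Optional

class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, max_size: int = 100):
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self.max_size = max_size
        self._data: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)          # mark as most recently used
        return self._data[key]

    def set(self, key: str, value: Any) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)   # drop the least recently used entry

    def __len__(self) -> int:
        return len(self._data)

cache = LRUCache(max_size=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")        # "a" is now the most recently used entry
cache.set("c", 3)     # capacity exceeded: "b" is evicted, not "a"
print(cache.get("a"), cache.get("b"), cache.get("c"))
```

LRU caps memory but still admits every item once, including questions that are never asked again, which is the gap an access-frequency threshold is meant to close.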

4.2 Smart Caching Based on Access Frequency

class SmartCache:
    """智能缓存系统（基于LFU+LRU）"""

    def __init__(self, max_size: int = 100, min_access_count: int = 3):
        self.cache = {}
        self.access_count = {}  # 访问计数
        self.last_access = {}  # 最后访问时间
        self.max_size = max_size
        self.min_access_count = min_access_count  # 最小访问次数才缓存

        # 候选池：访问次数不够的暂存这里
        self.candidate_pool = {}
        self.candidate_access = {}

    def should_cache(self, key: str) -> bool:
        """判断是否应该缓存"""
        # 如果已经在候选池
        if key in self.candidate_access:
            self.candidate_access[key] += 1

            # 访问次数达到阈值，提升到正式缓存
            if self.candidate_access[key] >= self.min_access_count:
                return True
        else:
            # 首次访问，加入候选池
            self.candidate_access[key] = 1

        return False

    def set(self, key: str, value: Dict):
        """设置缓存（只缓存热数据）"""
        if not self.should_cache(key):
            # 暂存到候选池
            self.candidate_pool[key] = value
            return False

        # 达到缓存条件，正式缓存
        if len(self.cache) >= self.max_size:
            # 淘汰策略：LFU + LRU
            self._evict()

        self.cache[key] = value
        self.access_count[key] = self.candidate_access.get(key, 1)
        self.last_access[key] = time.time()

        # 从候选池移除
        if key in self.candidate_pool:
            del self.candidate_pool[key]

        return True

    def get(self, key: str) -> Optional[Dict]:
        """获取缓存"""
        if key in self.cache:
            # 更新访问统计
            self.access_count[key] += 1
            self.last_access[key] = time.time()
            return self.cache[key]

        # Candidate-pool entries are returned as-is. Accesses are counted only in
        # should_cache() (called from set()), so one query is never counted twice
        # and promotion happens on the min_access_count-th access.
        if key in self.candidate_pool:
            return self.candidate_pool[key]

        return None

    def _evict(self):
        """淘汰缓存项（LFU + LRU组合）"""
        if not self.cache:
            return

        # 找出访问次数最少的项
        min_count = min(self.access_count.values())
        candidates = [k for k, v in self.access_count.items() if v == min_count]

        # 如果有多个，选最久未访问的
        if len(candidates) > 1:
            evict_key = min(candidates, key=lambda k: self.last_access[k])
        else:
            evict_key = candidates[0]

        # 删除
        del self.cache[evict_key]
        del self.access_count[evict_key]
        del self.last_access[evict_key]

    def get_statistics(self) -> Dict:
        """获取统计信息"""
        return {
            'cache_size': len(self.cache),
            'candidate_size': len(self.candidate_pool),
            'total_size': len(self.cache) + len(self.candidate_pool),
            'avg_access_count': np.mean(list(self.access_count.values())) if self.access_count else 0,
            'hot_items': sorted(
                [(k, v) for k, v in self.access_count.items()],
                key=lambda x: x[1],
                reverse=True
            )[:5]  # 前5个热门项
        }

class SmartCachingSystem:
    """带智能缓存的完整系统"""

    def __init__(self, cache_size: int = 50, min_access: int = 3):
        self.vector_db = SimpleVectorDB()
        self.smart_cache = SmartCache(max_size=cache_size, min_access_count=min_access)

        self.stats = {
            'total_queries': 0,
            'cache_hits': 0,
            'db_queries': 0,
            'promoted_to_cache': 0  # 从候选池提升到正式缓存的次数
        }

    def add_knowledge(self, text: str, metadata: Dict = None):
        """添加知识"""
        self.vector_db.add_document(text, metadata)

    def query(self, question: str) -> Dict:
        """查询"""
        self.stats['total_queries'] += 1
        start_time = time.time()

        # 查缓存
        cached = self.smart_cache.get(question)
        if cached and question in self.smart_cache.cache:  # 正式缓存命中
            self.stats['cache_hits'] += 1
            return {
                'question': question,
                'answer': cached['answer'],
                'source': 'CACHE',
                'response_time': time.time() - start_time
            }

        # 检索
        results = self.vector_db.search(question, top_k=2)
        self.stats['db_queries'] += 1

        context = "\n".join([r['text'] for r in results]) if results else ""
        answer = f"基于检索: {context[:100]}…"

        # 尝试缓存（智能判断）
        cached_result = self.smart_cache.set(question, {
            'answer': answer,
            'context': context,
            'metadata': results[0].get('metadata', {}) if results else {}
        })

        if cached_result:
            self.stats['promoted_to_cache'] += 1

        return {
            'question': question,
            'answer': answer,
            'source': 'RETRIEVAL',
            'response_time': time.time() - start_time,
            'will_cache': cached_result
        }

    def get_statistics(self) -> Dict:
        """获取统计"""
        cache_stats = self.smart_cache.get_statistics()

        return {
            'queries': self.stats,
            'cache': cache_stats,
            'cache_hit_rate': self.stats['cache_hits'] / self.stats['total_queries'] if self.stats['total_queries'] > 0 else 0
        }

def demo_smart_caching():
    """演示智能缓存"""
    print("=" * 60)
    print("智能缓存系统演示")
    print("=" * 60)

    # 创建系统
    system = SmartCachingSystem(cache_size=10, min_access=3)

    # 添加知识
    knowledge = [
        "年假政策：入职满1年5天，满3年10天，满5年15天",
        "报销流程：提交申请→审批→财务审核→打款",
        "公积金比例：公司和个人各12%",
        "加班政策：工作日1.5倍，周末2倍，节假日3倍",
        "社保缴纳：养老8%医疗2%失业0.5%"
    ]

    for kb in knowledge:
        system.add_knowledge(kb)

    print(f"\nLoaded {len(knowledge)} knowledge entries\n")

    # 模拟真实查询分布（符合二八定律）
    print("Simulating a realistic query mix (80% of queries hit 20% of the questions)\n")

    # 热门问题（会被频繁查询）
    hot_questions = [
        "年假怎么算",
        "怎么报销",
        "公积金比例"
    ]

    # 冷门问题（偶尔查一次）
    cold_questions = [
        "加班怎么算",
        "社保比例",
        "病假政策",
        "迟到扣款",
        "离职流程"
    ]

    # 生成查询序列（80/20分布）
    query_sequence = []
    for _ in range(50):
        if np.random.random() < 0.8:  # 80%概率查热门问题
            query_sequence.append(np.random.choice(hot_questions))
        else:  # 20%概率查冷门问题
            query_sequence.append(np.random.choice(cold_questions))

    # 执行查询
    print("开始处理50次查询…
")
    cache_hits_timeline = []

    for i, question in enumerate(query_sequence, 1):
        result = system.query(question)

        if i <= 10 or i % 10 == 0:  # 只显示部分结果
            source_icon = "⚡" if result['source'] == 'CACHE' else "🔍"
            cached_tag = " [已提升到缓存]" if result.get('will_cache') else ""
            print(f"查询{i:2d}: {question:15s} {source_icon} {result['source']}{cached_tag}")

        # 记录命中率变化
        stats = system.get_statistics()
        cache_hits_timeline.append(stats['cache_hit_rate'])

    # 最终统计
    print("
" + "=" * 60)
    print("最终统计")
    print("=" * 60)

    final_stats = system.get_statistics()

    print(f"
【查询统计】")
    print(f"  总查询次数: {final_stats['queries']['total_queries']}")
    print(f"  缓存命中: {final_stats['queries']['cache_hits']}")
    print(f"  数据库查询: {final_stats['queries']['db_queries']}")
    print(f"  提升到缓存: {final_stats['queries']['promoted_to_cache']}")

    print(f"
【缓存统计】")
    print(f"  正式缓存: {final_stats['cache']['cache_size']} 项")
    print(f"  候选池: {final_stats['cache']['candidate_size']} 项")
    print(f"  总存储: {final_stats['cache']['total_size']} 项")
    print(f"  缓存命中率: {final_stats['cache_hit_rate']*100:.1f}%")
    print(f"  平均访问次数: {final_stats['cache']['avg_access_count']:.1f}")

    print(f"
【热门问题Top5】")
    for i, (question, count) in enumerate(final_stats['cache']['hot_items'], 1):
        print(f"  {i}. {question} - 访问{count}次")

    print("
✅ 智能缓存特点：")
    print("  - 只缓存被多次访问的热门问题（访问≥3次）")
    print("  - 冷门问题不占用宝贵的缓存空间")
    print("  - 自动淘汰不常用的缓存项")
    print("  - 符合真实业务场景的访问分布")

    # 显示命中率趋势
    print(f"
【命中率趋势】前20次查询：")
    for i in range(0, min(20, len(cache_hits_timeline)), 5):
        rate = cache_hits_timeline[i]
        bar = "█" * int(rate * 50)
        print(f"  查询{i+1:2d}: {bar} {rate*100:.1f}%")
# 运行智能缓存演示
demo_smart_caching()
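
The demo above looks questions up by their exact text, so "年假怎么算" and "年假怎么算？" would be counted and cached as two different questions. One possible refinement, sketched below, is to normalize each question into a canonical key before it reaches the cache. `normalize_query` is a hypothetical helper that is not part of the code above, and the rules it applies (Unicode NFKC folding, lowercasing, dropping punctuation and whitespace) are an assumption tuned for short Chinese queries.

```python
import unicodedata


def normalize_query(question: str) -> str:
    """Return a canonical cache key for a user question (hypothetical helper)."""
    # NFKC folds full-width characters such as "？" into their ASCII forms
    text = unicodedata.normalize("NFKC", question).lower()
    # Drop whitespace and every character in a Unicode punctuation category (P*)
    kept = [
        ch for ch in text
        if not ch.isspace() and not unicodedata.category(ch).startswith("P")
    ]
    return "".join(kept)


variants = ["年假怎么算", "年假怎么算？", " 年假 怎么算? "]
keys = {normalize_query(q) for q in variants}
print(keys)  # all three variants collapse into a single key
```

Removing all whitespace also merges space-separated words, so for English or mixed-language traffic it may be preferable to collapse runs of whitespace into a single space instead.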

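5. Cache Update Strategies

5.1 Handling knowledge updates

Even "static" knowledge changes from time to time, for example:

Company policy adjustments
Product information changes
Revisions to laws and regulations

When that happens, cached answers must be invalidated reliably; otherwise the system keeps serving the old content.

5.2 A complete cache update implementation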

import time
from datetime import datetime
from typing import Dict, Optional

class CacheWithTTL:
    """带过期时间的缓存"""

    def __init__(self, max_size: int = 100, default_ttl: int = 86400):
        """
        Args:
            max_size: 最大缓存数量
            default_ttl: 默认过期时间（秒），默认24小时
        """
        self.cache = {}
        self.max_size = max_size
        self.default_ttl = default_ttl

        self.stats = {
            'hits': 0,
            'misses': 0,
            'expires': 0,
            'invalidations': 0
        }

    def set(self, key: str, value: Dict, ttl: Optional[int] = None):
        """设置缓存项

        Args:
            key: 缓存键
            value: 缓存值
            ttl: 过期时间（秒），None则使用默认值
        """
        if len(self.cache) >= self.max_size:
            self._evict_oldest()

        expire_at = time.time() + (ttl if ttl is not None else self.default_ttl)

        self.cache[key] = {
            'value': value,
            'expire_at': expire_at,
            'created_at': time.time(),
            'version': value.get('metadata', {}).get('version', 1)
        }

    def get(self, key: str) -> Optional[Dict]:
        """获取缓存项"""
        if key not in self.cache:
            self.stats['misses'] += 1
            return None

        item = self.cache[key]

        # 检查是否过期
        if time.time() > item['expire_at']:
            self.stats['expires'] += 1
            del self.cache[key]
            return None

        self.stats['hits'] += 1
        return item['value']

    def invalidate(self, key: str):
        """主动失效某个缓存"""
        if key in self.cache:
            del self.cache[key]
            self.stats['invalidations'] += 1

    def invalidate_by_pattern(self, pattern: str):
        """按模式批量失效"""
        keys_to_delete = [k for k in self.cache.keys() if pattern in k]
        for key in keys_to_delete:
            self.invalidate(key)

    def update_version(self, key: str, new_version: int):
        """更新版本号（触发重新缓存）"""
        if key in self.cache:
            current_version = self.cache[key]['version']
            if new_version > current_version:
                # 版本更新，失效旧缓存
                self.invalidate(key)

    def _evict_oldest(self):
        """淘汰最旧的项"""
        if not self.cache:
            return
        oldest_key = min(self.cache.keys(),
                             key=lambda k: self.cache[k]['created_at'])
        del self.cache[oldest_key]

    def get_statistics(self) -> Dict:
        """获取统计"""
        total = self.stats['hits'] + self.stats['misses']
        return {
            **self.stats,
            'hit_rate': self.stats['hits'] / total if total > 0 else 0,
            'cache_size': len(self.cache)
        }

class VersionedKnowledgeBase:
    """带版本控制的知识库"""

    def __init__(self):
        self.documents = {}  # key: doc_id, value: {content, version, metadata}
        self.cache = CacheWithTTL(max_size=50, default_ttl=3600)  # 1小时TTL
        self.vector_db = SimpleVectorDB()

    def add_or_update_document(self, doc_id: str, content: str,
                                 metadata: Dict = None, version: int = 1):
        """添加或更新文档"""
        # 检查是否是更新
        is_update = doc_id in self.documents
        old_version = self.documents[doc_id]['version'] if is_update else 0

        # 保存文档
        self.documents[doc_id] = {
            'content': content,
            'version': version,
            'metadata': metadata or {},
            'updated_at': datetime.now().isoformat()
        }

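        # On an update, drop the superseded version from the vector store so
        # that retrieval cannot return outdated content
        if is_update:
            keep = [i for i, m in enumerate(self.vector_db.metadata)
                    if m.get('doc_id') != doc_id]
            self.vector_db.documents = [self.vector_db.documents[i] for i in keep]
            self.vector_db.embeddings = [self.vector_db.embeddings[i] for i in keep]
            self.vector_db.metadata = [self.vector_db.metadata[i] for i in keep]

        # Index the current version
        self.vector_db.add_document(content, {
            'doc_id': doc_id,
            'version': version,
            **(metadata or {})
        })

        # On a version bump, invalidate every cached answer derived from this
        # document. Cache keys are usually plain question strings, so match on
        # the doc_id stored in each entry's metadata rather than on the key.
        if is_update and version > old_version:
            print(f"📝 Document {doc_id} updated: v{old_version} -> v{version}")
            stale_keys = [k for k, item in self.cache.cache.items()
                          if item['value'].get('metadata', {}).get('doc_id') == doc_id]
            for key in stale_keys:
                self.cache.invalidate(key)
            return True

        return False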

    def query(self, question: str, doc_id: Optional[str] = None) -> Dict:
        """查询（支持版本检查）"""
        # 构建缓存键
        cache_key = f"{doc_id}:{question}" if doc_id else question

        # 查缓存
        cached = self.cache.get(cache_key)
        if cached:
            return {
                'question': question,
                'answer': cached['answer'],
                'source': 'CACHE',
                # The version lives in the cached entry's metadata
                'version': cached.get('metadata', {}).get('version', 'unknown')
            }

        # 检索
        results = self.vector_db.search(question, top_k=2)
        if not results:
            return {'question': question, 'answer': '未找到相关信息', 'source': 'NONE'}

        # 生成答案
        context = results[0]['text']
        result_doc_id = results[0]['metadata'].get('doc_id', 'unknown')
        result_version = results[0]['metadata'].get('version', 1)
        answer = f"[v{result_version}] {context}"

        # 缓存结果
        self.cache.set(cache_key, {
            'answer': answer,
            'context': context,
            'metadata': {
                'doc_id': result_doc_id,
                'version': result_version
            }
        }, ttl=3600)  # 1小时过期

        return {
            'question': question,
            'answer': answer,
            'source': 'RETRIEVAL',
            'version': result_version,
            'doc_id': result_doc_id
        }

def demo_cache_update():
    """Demonstrate cache invalidation driven by document versions."""
    print("=" * 60)
    print("Cache update and version control demo")
    print("=" * 60)

    kb = VersionedKnowledgeBase()

    # Scenario 1: initial knowledge
    print("\n[Scenario 1: initial load]")
    kb.add_or_update_document(
        doc_id="policy_annual_leave",
        content="年假政策v1：入职满1年5天，满3年10天，满5年15天",
        metadata={'category': 'HR'},
        version=1
    )
    print("✅ Added: annual leave policy v1")

    # First query
    print("\nQuery 1: 年假怎么算？")
    result1 = kb.query("年假怎么算")
    print(f"  Source: {result1['source']}")
    print(f"  Version: {result1['version']}")
    print(f"  Answer: {result1['answer'][:50]}…")

    # Second query (should hit the cache)
    print("\nQuery 2: 年假怎么算？")
    result2 = kb.query("年假怎么算")
    print(f"  Source: {result2['source']} ⚡")
    print(f"  Version: {result2['version']}")

    # Scenario 2: the policy changes
    print("\n" + "=" * 60)
    print("[Scenario 2: policy update]")
    print("The company revises its annual leave policy…")

    kb.add_or_update_document(
        doc_id="policy_annual_leave",
        content="年假政策v2：入职满1年7天，满3年12天，满5年20天。新增：满10年25天",
        metadata={'category': 'HR'},
        version=2
    )

    # Query again (the cached answer should be gone, so the new version is expected)
    print("\nQuery 3: 年假怎么算？")
    result3 = kb.query("年假怎么算")
    print(f"  Source: {result3['source']}")
    print(f"  Version: {result3['version']}")
    print(f"  Answer: {result3['answer'][:60]}…")

    # Scenario 3: the cache is rebuilt
    print("\nQuery 4: 年假怎么算？")
    result4 = kb.query("年假怎么算")
    print(f"  Source: {result4['source']} ⚡ (new version cached)")
    print(f"  Version: {result4['version']}")

    # Statistics
    print("\n" + "=" * 60)
    print("Cache statistics")
    print("=" * 60)
    stats = kb.cache.get_statistics()
    print(f"Cache hits: {stats['hits']}")
    print(f"Cache misses: {stats['misses']}")
    print(f"Invalidations: {stats['invalidations']}")
    print(f"Hit rate: {stats['hit_rate']*100:.1f}%")

    print("\n✅ Update mechanism in short:")
    print("  - Updating a document invalidates the cached answers derived from it")
    print("  - Version numbers keep cached answers consistent with the source")
    print("  - TTL expiry acts as a safety net")
    print("  - The next query fetches the latest version and caches it again")


# Run the cache update demo
demo_cache_update()
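
Because cached answers are usually keyed by the question text, invalidating "everything derived from document X" requires knowing which keys came from which document. A minimal sketch of one way to track this is a reverse index from `doc_id` to cache keys. `DocIndexedCache` and its method names are hypothetical and independent of the `CacheWithTTL` class above; TTL handling is omitted to keep the idea visible.

```python
from collections import defaultdict
from typing import Dict, Optional, Set


class DocIndexedCache:
    """Cache that remembers which keys were derived from which document."""

    def __init__(self) -> None:
        self._entries: Dict[str, Dict] = {}
        self._keys_by_doc: Dict[str, Set[str]] = defaultdict(set)

    def set(self, key: str, value: str, doc_id: str) -> None:
        self.invalidate_key(key)  # drop any stale index entry for this key
        self._entries[key] = {"value": value, "doc_id": doc_id}
        self._keys_by_doc[doc_id].add(key)

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        return entry["value"] if entry else None

    def invalidate_key(self, key: str) -> None:
        entry = self._entries.pop(key, None)
        if entry:
            self._keys_by_doc[entry["doc_id"]].discard(key)

    def invalidate_doc(self, doc_id: str) -> int:
        """Remove every entry derived from doc_id; return how many were removed."""
        keys = self._keys_by_doc.pop(doc_id, set())
        for key in keys:
            self._entries.pop(key, None)
        return len(keys)


cache = DocIndexedCache()
cache.set("年假怎么算", "v1 answer", doc_id="policy_annual_leave")
cache.set("年假有几天", "v1 answer", doc_id="policy_annual_leave")
cache.set("怎么报销", "reimbursement answer", doc_id="policy_expense")

removed = cache.invalidate_doc("policy_annual_leave")
print(removed, cache.get("年假怎么算"), cache.get("怎么报销"))
```

The index makes invalidation proportional to the number of affected entries instead of requiring a scan over the whole cache, at the cost of one extra set per document.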

5.3 Comparing three cache invalidation strategies

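def compare_invalidation_strategies():
    """Compare three cache invalidation strategies."""
    print("=" * 60)
    print("Comparing three cache invalidation strategies")
    print("=" * 60)

    print("\n[Strategy 1: fixed TTL (time to live)]")
    print("How it works: every entry expires after a fixed time")
    print("Pros: simple to implement, cleans itself up")
    print("Cons: may serve stale data until the entry expires")
    print("Use when: a short delay before updates become visible is acceptable")
    print("\nExample:")
    print("""    cache.set('question', answer, ttl=3600)  # expires after 1 hour
    """)

    print("\n[Strategy 2: version numbers]")
    print("How it works: every update increments the document version")
    print("Pros: precise control, old data is never returned")
    print("Cons: a versioning scheme has to be maintained")
    print("Use when: data consistency requirements are strict")
    print("\nExample:")
    print("""    # when a document is updated
    doc.version += 1
    cache.update_version(cache_key, doc.version)
    """)

    print("\n[Strategy 3: push-based invalidation]")
    print("How it works: a content update actively notifies the cache")
    print("Pros: best freshness")
    print("Cons: needs an additional notification mechanism")
    print("Use when: distributed systems, multi-node deployments")
    print("\nExample:")
    print("""    # publish an update event
    event_bus.publish('document_updated', doc_id='policy_123')

    # a listener invalidates the cache
    @event_bus.subscribe('document_updated')
    def on_document_updated(doc_id):
        cache.invalidate_by_pattern(doc_id)
    """)

    # Back-of-the-envelope comparison
    print("\n" + "=" * 60)
    print("Scenario test")
    print("=" * 60)

    # Scenario: the document changes once per hour and is queried once per minute
    ttl_configs = [
        {'name': 'TTL = 10 min', 'ttl': 600, 'update_interval': 3600},
        {'name': 'TTL = 30 min', 'ttl': 1800, 'update_interval': 3600},
        {'name': 'TTL = 60 min', 'ttl': 3600, 'update_interval': 3600},
    ]

    print("\nAssumption: the document is updated hourly and queried once per minute")
    print("(60 queries per update cycle)")
    print("\nEffect of each TTL setting (worst case):\n")

    for config in ttl_configs:
        ttl = config['ttl']
        update_interval = config['update_interval']

        # Worst case: an entry cached just before an update keeps serving the
        # old answer until it expires, i.e. for min(ttl, update_interval) seconds
        stale_window = min(ttl, update_interval)
        stale_responses = stale_window / 60  # one query per minute
        freshness_rate = (1 - stale_window / update_interval) * 100

        print(f"{config['name']}:")
        print(f"  Potentially stale responses per cycle: up to ~{int(stale_responses)}")
        print(f"  Worst-case freshness: {freshness_rate:.1f}%")
        print()

    print("💡 Recommendations:")
    print("  - Policy documents: TTL = 24 hours + version control")
    print("  - Product information: TTL = 1 hour + version control")
    print("  - Real-time data: no caching, or TTL < 5 minutes")


# Run the comparison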
compare_invalidation_strategies()
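
The push-based strategy is only shown as a printed snippet above, and `event_bus` is never defined. The following is a minimal in-process sketch of what such a bus could look like. The `EventBus` class, the `doc_id:question` key layout, and the handler are all assumptions made for illustration; a multi-node deployment would normally rely on a message broker rather than an in-memory object.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class EventBus:
    """Tiny synchronous publish/subscribe bus (illustrative only)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable]] = defaultdict(list)

    def subscribe(self, event: str) -> Callable:
        """Decorator that registers a handler for the given event name."""
        def decorator(func: Callable) -> Callable:
            self._handlers[event].append(func)
            return func
        return decorator

    def publish(self, event: str, **payload) -> int:
        """Call every handler registered for event; return how many ran."""
        handlers = self._handlers.get(event, [])
        for handler in handlers:
            handler(**payload)
        return len(handlers)


event_bus = EventBus()

# Assumed key layout: "<doc_id>:<question>"
cache = {
    "policy_123:年假怎么算": "cached answer A",
    "policy_456:怎么报销": "cached answer B",
}


@event_bus.subscribe("document_updated")
def on_document_updated(doc_id: str) -> None:
    # Remove every cached answer that was derived from the updated document
    for key in [k for k in cache if k.startswith(f"{doc_id}:")]:
        del cache[key]


notified = event_bus.publish("document_updated", doc_id="policy_123")
print(notified, list(cache))
```

Handlers run synchronously inside `publish`, so the cache is already clean when the call returns; with an asynchronous broker there would be a short window in which stale answers can still be served.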

6. Production-Grade Implementation and Best Practices

6.1 A complete production-style CAG system

import logging
import time
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Dict, List, Optional

import numpy as np

# The helper classes SimpleVectorDB, KnowledgeCache, CacheWithTTL and SmartCache
# from the earlier sections are repeated below in condensed form so that this
# block runs on its own. In a real project, keep them in modules and import them.

# --- Helper classes (condensed copies of the earlier definitions) ---
class SimpleVectorDB:
    """简单的向量数据库实现 - 用于生产级系统演示"""
    def __init__(self):
        self.documents = []
        self.embeddings = []
        self.metadata = []

    def add_document(self, text: str, metadata: Dict = None):
        embedding = self._text_to_vector(text)
        self.documents.append(text)
        self.embeddings.append(embedding)
        self.metadata.append(metadata or {})

    def _text_to_vector(self, text: str) -> np.ndarray:
        vector = np.zeros(100) # 简化处理
        for i, char in enumerate(text[:100]):
            vector[i] = ord(char) / 1000
        return vector

    def search(self, query: str, top_k: int = 3) -> List[Dict]:
        query_vector = self._text_to_vector(query)
        similarities = []
        for i, doc_vector in enumerate(self.embeddings):
            similarity = np.dot(query_vector, doc_vector) / (
                np.linalg.norm(query_vector) * np.linalg.norm(doc_vector) + 1e-10
            )
            similarities.append({
                'index': i,
                'score': similarity,
                'text': self.documents[i],
                'metadata': self.metadata[i]
            })
        similarities.sort(key=lambda x: x['score'], reverse=True)
        return similarities[:top_k]

class KnowledgeCache:
    """知识缓存管理器 - 用于生产级系统演示"""
    def __init__(self, max_size: int = 100):
        self.cache = {}
        self.max_size = max_size
        self.hit_count = 0
        self.miss_count = 0
        self.access_log = []

    def _is_similar(self, query1: str, query2: str) -> bool:
        # 简化版
        keywords1 = set(query1.split())
        keywords2 = set(query2.split())
        if not keywords1 or not keywords2: return False
        intersection = keywords1 & keywords2
        union = keywords1 | keywords2
        similarity = len(intersection) / len(union)
        return similarity > 0.5

    def get(self, query: str) -> Optional[Dict]:
        query_key = query.lower().strip()
        for cached_query, cached_data in self.cache.items():
            if self._is_similar(query_key, cached_query):
                self.hit_count += 1
                return cached_data
        self.miss_count += 1
        return None

    def set(self, query: str, context: str, answer: str, metadata: Dict = None):
        query_key = query.lower().strip()
        if len(self.cache) >= self.max_size:
            oldest_key = next(iter(self.cache))
            del self.cache[oldest_key]
        self.cache[query_key] = {
            'query': query,
            'context': context,
            'answer': answer,
            'metadata': metadata or {},
            'cached_at': datetime.now().isoformat()
        }

    def get_statistics(self) -> Dict:
        total_access = self.hit_count + self.miss_count
        hit_rate = self.hit_count / total_access if total_access > 0 else 0
        return {
            'hit_count': self.hit_count,
            'miss_count': self.miss_count,
            'total_access': total_access,
            'hit_rate': hit_rate,
            'cache_size': len(self.cache)
        }

class CacheWithTTL:
    """带过期时间的缓存 - 用于生产级系统演示"""
    def __init__(self, max_size: int = 100, default_ttl: int = 86400):
        self.cache = {}
        self.max_size = max_size
        self.default_ttl = default_ttl
        self.stats = {
            'hits': 0, 'misses': 0, 'expires': 0, 'invalidations': 0
        }

    def set(self, key: str, value: Dict, ttl: Optional[int] = None):
        if len(self.cache) >= self.max_size:
            self._evict_oldest()
        expire_at = time.time() + (ttl if ttl is not None else self.default_ttl)
        self.cache[key] = {
            'value': value, 'expire_at': expire_at, 'created_at': time.time(),
            'version': value.get('metadata', {}).get('version', 1)
        }

    def get(self, key: str) -> Optional[Dict]:
        if key not in self.cache:
            self.stats['misses'] += 1
            return None
        item = self.cache[key]
        if time.time() > item['expire_at']:
            self.stats['expires'] += 1
            del self.cache[key]
            return None
        self.stats['hits'] += 1
        return item['value']

    def invalidate(self, key: str):
        if key in self.cache:
            del self.cache[key]
            self.stats['invalidations'] += 1

    def invalidate_by_pattern(self, pattern: str):
        keys_to_delete = [k for k in list(self.cache.keys()) if pattern in k]
        for key in keys_to_delete:
            self.invalidate(key)

    def _evict_oldest(self):
        if not self.cache:
            return
        oldest_key = min(self.cache.keys(), key=lambda k: self.cache[k]['created_at'])
        del self.cache[oldest_key]

    def get_statistics(self) -> Dict:
        total = self.stats['hits'] + self.stats['misses']
        return {
            **self.stats,
            'hit_rate': self.stats['hits'] / total if total > 0 else 0,
            'cache_size': len(self.cache)
        }

class SmartCache:
    """Smart cache (LFU with LRU tie-breaking), condensed for the production demo."""

    def __init__(self, max_size: int = 100, min_access_count: int = 3):
        self.cache = {}
        self.access_count = {}
        self.last_access = {}
        self.max_size = max_size
        self.min_access_count = min_access_count
        self.candidate_pool = {}
        self.candidate_access = {}

    def should_cache(self, key: str) -> bool:
        # Count the access and admit the key once it reaches the threshold
        if key in self.candidate_access:
            self.candidate_access[key] += 1
            if self.candidate_access[key] >= self.min_access_count:
                return True
        else:
            self.candidate_access[key] = 1
        return False

    def set(self, key: str, value: Dict):
        if not self.should_cache(key):
            self.candidate_pool[key] = value
            return False
        if len(self.cache) >= self.max_size:
            self._evict()
        self.cache[key] = value
        self.access_count[key] = self.candidate_access.get(key, 1)
        self.last_access[key] = time.time()
        if key in self.candidate_pool:
            del self.candidate_pool[key]
        return True

    def get(self, key: str) -> Optional[Dict]:
        if key in self.cache:
            self.access_count[key] += 1
            self.last_access[key] = time.time()
            return self.cache[key]
        if key in self.candidate_pool:
            self.candidate_access[key] += 1
            if self.candidate_access[key] >= self.min_access_count:
                self.set(key, self.candidate_pool[key])
                return self.cache[key]
            return self.candidate_pool[key]
        return None

    def _evict(self):
        if not self.cache:
            return
        # LFU first; among equally rare keys, evict the least recently used
        min_count = min(self.access_count.values())
        candidates = [k for k, v in self.access_count.items() if v == min_count]
        if len(candidates) > 1:
            evict_key = min(candidates, key=lambda k: self.last_access[k])
        else:
            evict_key = candidates[0]
        del self.cache[evict_key]
        del self.access_count[evict_key]
        del self.last_access[evict_key]

    def get_statistics(self) -> Dict:
        hot_items = sorted(self.access_count.items(), key=lambda x: x[1], reverse=True)[:5]
        return {
            'cache_size': len(self.cache),
            'candidate_size': len(self.candidate_pool),
            'total_size': len(self.cache) + len(self.candidate_pool),
            'avg_access_count': np.mean(list(self.access_count.values())) if self.access_count else 0,
            'hot_items': hot_items
        }

# --- 辅助类定义结束 ---

class CacheStrategy(Enum):
    """缓存策略"""
    ALWAYS = "always"  # 总是缓存
    SMART = "smart"  # 智能判断
    NEVER = "never"  # 从不缓存

@dataclass
class CacheConfig:
    """缓存配置"""
    max_size: int = 100
    default_ttl: int = 3600
    min_access_count: int = 3
    strategy: CacheStrategy = CacheStrategy.SMART
    enable_metrics: bool = True

class ProductionCAGSystem:
    """生产级CAG系统"""
    def __init__(self, config: CacheConfig = None):
        self.config = config or CacheConfig()

        # 核心组件
        self.static_cache = CacheWithTTL(
            max_size=self.config.max_size,
            default_ttl=self.config.default_ttl
        )
        self.smart_cache = SmartCache(
            max_size=self.config.max_size,
            min_access_count=self.config.min_access_count
        )
        self.vector_db = SimpleVectorDB()

        # 监控指标
        self.metrics = {
            'total_queries': 0,
            'cache_hits': 0,
            'db_queries': 0,
            'avg_response_time': [],
            'errors': 0
        }
        # 日志
        self.logger = self._setup_logger()

    def _setup_logger(self):
        """设置日志"""
        logger = logging.getLogger('CAGSystem')
        logger.setLevel(logging.INFO)
        if not logger.handlers:
            handler = logging.StreamHandler()
            formatter = logging.Formatter(
                '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
            )
            handler.setFormatter(formatter)
            logger.addHandler(handler)
        return logger

    def add_knowledge(self,
                          text: str,
                          doc_id: str,
                          metadata: Dict = None,
                          cache_strategy: CacheStrategy = None,
                          ttl: int = None):
        """添加知识"""
        try:
            full_metadata = {
                'doc_id': doc_id,
                'cache_strategy': (cache_strategy or self.config.strategy).value,
                'ttl': ttl or self.config.default_ttl,
                **(metadata or {})
            }
            self.vector_db.add_document(text, full_metadata)
            self.logger.info(f"Added document: {doc_id}")
        except Exception as e:
            self.logger.error(f"Error adding document {doc_id}: {str(e)}")
            self.metrics['errors'] += 1
            raise

    def query(self, question: str, force_refresh: bool = False) -> Dict:
        """查询"""
        start_time = time.time()
        self.metrics['total_queries'] += 1
        try:
            # 强制刷新则跳过缓存
            if not force_refresh:
                # 先查静态缓存
                cached = self.static_cache.get(question)
                if cached:
                    self.metrics['cache_hits'] += 1
                    response_time = time.time() - start_time
                    self.metrics['avg_response_time'].append(response_time)
                    self.logger.info(f"Cache hit: {question[:30]}…")
                    return {
                        'question': question,
                        'answer': cached['answer'],
                        'source': 'STATIC_CACHE',
                        'response_time': response_time,
                        'cached': True
                    }
                # 再查智能缓存
                smart_cached = self.smart_cache.get(question)
                if smart_cached and question in self.smart_cache.cache:
                    self.metrics['cache_hits'] += 1
                    response_time = time.time() - start_time
                    self.metrics['avg_response_time'].append(response_time)
                    self.logger.info(f"Smart cache hit: {question[:30]}…")
                    return {
                        'question': question,
                        'answer': smart_cached['answer'],
                        'source': 'SMART_CACHE',
                        'response_time': response_time,
                        'cached': True
                    }

            # 缓存未命中，执行检索
            self.logger.info(f"Cache miss, retrieving: {question[:30]}…")
            results = self.vector_db.search(question, top_k=2)
            self.metrics['db_queries'] += 1
            if not results:
                response_time = time.time() - start_time
                self.metrics['avg_response_time'].append(response_time)
                return {
                    'question': question,
                    'answer': '未找到相关信息',
                    'source': 'NONE',
                    'response_time': response_time,
                    'cached': False
                }

            # 生成答案
            context = results[0]['text']
            metadata = results[0].get('metadata', {})
            answer = f"基于检索: {context[:100]}…"

            # 根据策略决定是否缓存
            strategy = CacheStrategy(metadata.get('cache_strategy', 'smart'))
            ttl = metadata.get('ttl', self.config.default_ttl)
            if strategy == CacheStrategy.ALWAYS:
                # 直接缓存到静态缓存
                self.static_cache.set(question, {
                    'answer': answer,
                    'context': context,
                    'metadata': metadata
                }, ttl=ttl)
                self.logger.info(f"Cached to static (ALWAYS): {question[:30]}…")
            elif strategy == CacheStrategy.SMART:
                # 让智能缓存决定
                cached = self.smart_cache.set(question, {
                    'answer': answer,
                    'context': context,
                    'metadata': metadata
                })
                if cached:
                    self.logger.info(f"Promoted to smart cache: {question[:30]}…")

            response_time = time.time() - start_time
            self.metrics['avg_response_time'].append(response_time)
            return {
                'question': question,
                'answer': answer,
                'source': 'RETRIEVAL',
                'response_time': response_time,
                'cached': False,
                'cache_strategy': strategy.value
            }
        except Exception as e:
            self.logger.error(f"Error processing query '{question}': {str(e)}")
            self.metrics['errors'] += 1
            raise

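    def invalidate_cache(self, pattern: str = None, doc_id: str = None):
        """Invalidate cached answers by key pattern and/or by source document."""
        try:
            if pattern:
                self.static_cache.invalidate_by_pattern(pattern)
                self.logger.info(f"Invalidated cache by pattern: {pattern}")
            if doc_id:
                # Cache keys are question strings, so match on the doc_id stored
                # in each entry's metadata rather than on the key itself
                def from_doc(value: Dict) -> bool:
                    return value.get('metadata', {}).get('doc_id') == doc_id
                for key, item in list(self.static_cache.cache.items()):
                    if from_doc(item['value']):
                        self.static_cache.invalidate(key)
                smart = self.smart_cache
                for key in [k for k, v in smart.cache.items() if from_doc(v)]:
                    del smart.cache[key], smart.access_count[key], smart.last_access[key]
                for key in [k for k, v in smart.candidate_pool.items() if from_doc(v)]:
                    del smart.candidate_pool[key]
                self.logger.info(f"Invalidated cache for doc: {doc_id}")
        except Exception as e:
            self.logger.error(f"Error invalidating cache: {str(e)}")
            raise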

    def get_health_status(self) -> Dict:
        """获取系统健康状态"""
        total_queries = self.metrics['total_queries']
        cache_hit_rate = self.metrics['cache_hits'] / total_queries if total_queries > 0 else 0
        avg_response = np.mean(self.metrics['avg_response_time']) if self.metrics['avg_response_time'] else 0
        # 健康评分
        health_score = 100
        if cache_hit_rate < 0.3:
            health_score -= 20  # 命中率低
        if avg_response > 0.1:
            health_score -= 15  # 响应慢
        if self.metrics['errors'] > 0:
            health_score -= 30  # 有错误
        status = 'healthy' if health_score >= 80 else 'degraded' if health_score >= 50 else 'unhealthy'
        return {
            'status': status,
            'health_score': health_score,
            'metrics': {
                'total_queries': total_queries,
                'cache_hit_rate': f"{cache_hit_rate*100:.1f}%",
                'avg_response_time': f"{avg_response*1000:.2f}ms",
                'db_queries': self.metrics['db_queries'],
                'errors': self.metrics['errors']
            },
            'cache_info': {
                'static_cache_size': self.static_cache.get_statistics()['cache_size'],
                'smart_cache_size': self.smart_cache.get_statistics()['cache_size']
            }
        }

    def export_metrics(self) -> Dict:
        """导出指标（用于监控系统）"""
        static_stats = self.static_cache.get_statistics()
        smart_stats = self.smart_cache.get_statistics()
        return {
            'timestamp': datetime.now().isoformat(),
            'queries': {
                'total': self.metrics['total_queries'],
                'cache_hits': self.metrics['cache_hits'],
                'db_queries': self.metrics['db_queries'],
                'errors': self.metrics['errors']
            },
            'performance': {
                'cache_hit_rate': self.metrics['cache_hits'] / self.metrics['total_queries'] if self.metrics['total_queries'] > 0 else 0,
                'avg_response_time': np.mean(self.metrics['avg_response_time']) if self.metrics['avg_response_time'] else 0,
                'p95_response_time': np.percentile(self.metrics['avg_response_time'], 95) if len(self.metrics['avg_response_time']) > 0 else 0
            },
            'cache': {
                'static': static_stats,
                'smart': smart_stats
            }
        }

def demo_production_system():
    """演示生产级系统"""
    print("=" * 60)
    print("生产级CAG系统演示")
    print("=" * 60)
    # 创建系统（不同配置）
    config = CacheConfig(
        max_size=50,
        default_ttl=3600,
        min_access_count=2,
        strategy=CacheStrategy.SMART,
        enable_metrics=True
    )
    system = ProductionCAGSystem(config)
    print(f"
系统配置:")
    print(f"  缓存大小: {config.max_size}")
    print(f"  默认TTL: {config.default_ttl}秒")
    print(f"  最小访问次数: {config.min_access_count}")
    print(f"  缓存策略: {config.strategy.value}")
    # 添加不同类型的知识
    print("
" + "=" * 60)
    print("添加知识")
    print("=" * 60)
    # 1. 静态知识（总是缓存）
    system.add_knowledge(
        text="公司年假政策：入职满1年5天，满3年10天，满5年15天",
        doc_id="policy_001",
        metadata={'category': '制度', 'type': 'static'},
        cache_strategy=CacheStrategy.ALWAYS,
        ttl=86400  # 24小时
    )
    print("✅ 添加静态知识: 年假政策（ALWAYS缓存）")
    # 2. 半静态知识（智能缓存）
    system.add_knowledge(
        text="产品价格表：基础版99元/月，专业版199元/月，企业版499元/月",
        doc_id="product_002",
        metadata={'category': '产品', 'type': 'semi-static'},
        cache_strategy=CacheStrategy.SMART,
        ttl=3600  # 1小时
    )
    print("✅ 添加半静态知识: 产品价格（SMART缓存）")
    # 3. 动态知识（不缓存）
    system.add_knowledge(
        text="今日促销：所有产品8折优惠，仅限今天！",
        doc_id="promo_003",
        metadata={'category': '促销', 'type': 'dynamic'},
        cache_strategy=CacheStrategy.NEVER,
        ttl=300  # 5分钟
    )
    print("✅ 添加动态知识: 促销信息（NEVER缓存）")
    # 模拟真实查询场景
    print("
" + "=" * 60)
    print("模拟真实查询")
    print("=" * 60)
    queries = [
        # 静态问题（高频）
        ("年假怎么算", 5),
        ("年假政策", 3),
        # 半静态问题（中频）
        ("产品价格", 3),
        ("多少钱", 2),
        # 动态问题（低频）
        ("今天有优惠吗", 2),
        ("促销活动", 1)
    ]
    print("
执行查询…")
    for question, count in queries:
        for i in range(count):
            result = system.query(question)
            if i == 0:  # 只显示首次查询
                cache_tag = "⚡" if result['cached'] else "🔍"
                print(f"  {cache_tag} {question}: {result['source']} ({result['response_time']*1000:.2f}ms)")
    # 显示健康状态
    print("
" + "=" * 60)
    print("系统健康状态")
    print("=" * 60)
    health = system.get_health_status()
    status_icon = "✅" if health['status'] == 'healthy' else "⚠️" if health['status'] == 'degraded' else "❌"
    print(f"
状态: {status_icon} {health['status'].upper()}")
    print(f"健康评分: {health['health_score']}/100")
    print(f"
指标:")
    for key, value in health['metrics'].items():
        print(f"  {key}: {value}")
    print(f"
缓存信息:")
    for key, value in health['cache_info'].items():
        print(f"  {key}: {value}")
    # 导出指标
    print("
" + "=" * 60)
    print("性能指标（可接入Prometheus/Grafana）")
    print("=" * 60)
    metrics = system.export_metrics()
    print(f"
时间戳: {metrics['timestamp']}")
    print(f"
查询统计:")
    print(f"  总查询: {metrics['queries']['total']}")
    print(f"  缓存命中: {metrics['queries']['cache_hits']}")
    print(f"  数据库查询: {metrics['queries']['db_queries']}")
    print(f"  错误数: {metrics['queries']['errors']}")
    print(f"
性能指标:")
    print(f"  缓存命中率: {metrics['performance']['cache_hit_rate']*100:.1f}%")
    print(f"  平均响应时间: {metrics['performance']['avg_response_time']*1000:.2f}ms")
    print(f"  P95响应时间: {metrics['performance']['p95_response_time']*1000:.2f}ms")
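    # Best-practice summary
    print("\n" + "=" * 60)
    print("Production best practices")
    print("=" * 60)
    print("""1. [Tiered caching strategy]
- Static knowledge: ALWAYS + long TTL (24 hours)
- Semi-static knowledge: SMART + medium TTL (1 hour)
- Dynamic knowledge: NEVER, or a short TTL (5 minutes)
2. [Monitoring targets]
- Cache hit rate: target >50%
- Average response time: target <50ms
- P95 response time: target <100ms
- Error rate: target <0.1%
3. [Capacity planning]
- Cache size = daily query volume × 0.2 (80/20 rule)
- Reserve 20% headroom for growth
- Alert thresholds: hit rate <30%, response time >100ms
4. [Invalidation strategy]
- Time-based: use TTL
- Active: triggered when a document is updated
- Batch: support pattern matching
5. [High availability]
- Fall back to retrieval when the cache fails
- Catch exceptions and log them
- Health-check endpoint
- Export metrics to the monitoring system
""")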
# Run the production system demo
demo_production_system()
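
The numbers in the best-practice checklist printed above can also be written down as code, which makes them easier to review and test. The following is a minimal sketch under stated assumptions: the names `TTL_SECONDS_BY_TIER`, `plan_cache_size`, and `check_alerts` are hypothetical and not part of the production system shown earlier, and the "response time >100ms" threshold is applied to whichever response-time figure the caller passes in, since the checklist does not say whether it means the average or the P95.

```python
import math
from typing import List

# TTL per knowledge tier, in seconds (values from the checklist above).
TTL_SECONDS_BY_TIER = {
    "static": 24 * 3600,   # ALWAYS cache, long TTL (24 hours)
    "semi_static": 3600,   # SMART cache, medium TTL (1 hour)
    "dynamic": 5 * 60,     # NEVER cache, or at most a short TTL (5 minutes)
}


def plan_cache_size(daily_queries: int,
                    hot_fraction: float = 0.2,
                    headroom: float = 0.2) -> int:
    """Cache entries = daily queries x 0.2, plus 20% headroom."""
    raw = daily_queries * hot_fraction * (1 + headroom)
    # Round first so float noise cannot push ceil() up by one entry.
    return math.ceil(round(raw, 6))


def check_alerts(hit_rate: float, response_seconds: float) -> List[str]:
    """Return alert messages using the checklist thresholds."""
    alerts = []
    if hit_rate < 0.30:
        alerts.append(f"cache hit rate {hit_rate:.0%} is below 30%")
    if response_seconds > 0.100:
        alerts.append(f"response time {response_seconds * 1000:.0f}ms is above 100ms")
    return alerts


if __name__ == "__main__":
    print(plan_cache_size(10_000))
    print(check_alerts(hit_rate=0.25, response_seconds=0.12))
```

With 10,000 queries per day, the formula gives 10,000 × 0.2 × 1.2 = 2,400 cache entries. In a real deployment, `check_alerts` would more likely be fed from the values returned by `export_metrics()` than from hard-coded numbers.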

6.2 Complete Code Example

Now we bring all the code together into a single runnable demo:

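def run_complete_demo():
    """Run the complete demo."""
    print("\n\n")
    print("=" * 80)
    print(" " * 20 + "Complete CAG demo: from RAG to production-grade CAG")
    print("=" * 80)

    print("\nThis demo will show:")
    print("  1. The performance problems of traditional RAG")
    print("  2. How CAG addresses them")
    print("  3. A hybrid RAG+CAG architecture")
    print("  4. Smart caching strategies")
    print("  5. Cache update mechanisms")
    print("  6. A production-grade implementation")

    print("\n" + "=" * 80)
    input("Press Enter to start the demo…")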

    # Run each demo in order
    demos = [
        ("Traditional RAG system", demo_traditional_rag),
        ("CAG system", demo_cag_system),
        ("RAG vs CAG performance comparison", compare_rag_vs_cag),
        ("Hybrid RAG+CAG system", demo_hybrid_system),
        ("Smart caching", demo_smart_caching),
        ("Cache update mechanism", demo_cache_update),
        ("Cache invalidation strategy comparison", compare_invalidation_strategies),
        ("Production-grade system", demo_production_system),
    ]

    for i, (name, demo_func) in enumerate(demos, 1):
        print(f"\n\n{'=' * 80}")
        print(f"Demo {i}/{len(demos)}: {name}")
        print("=" * 80)
        input("Press Enter to continue…")
        demo_func()
        print("\nDemo finished!")
        if i < len(demos):
            input("Press Enter for the next demo…")
    print("\n\n" + "=" * 80)
    print(" " * 30 + "All demos finished!")
    print("=" * 80)
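    print("\n📚 What you have learned:")
    print("  ✅ How RAG works and where it falls short")
    print("  ✅ How CAG improves performance through caching")
    print("  ✅ How to design a hybrid architecture")
    print("  ✅ How to implement smart caching strategies")
    print("  ✅ Cache update and invalidation mechanisms")
    print("  ✅ A complete production-grade implementation")
    print("\n💡 Next steps:")
    print("  - Apply these techniques in your own project")
    print("  - Tune the caching strategy to your actual workload")
    print("  - Connect a monitoring system and keep optimizing")
    print("  - Consider a distributed cache (Redis, etc.)")


# Run the complete demo when this file is executed directly
if __name__ == "__main__":
    run_complete_demo()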
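
`run_complete_demo` waits on `input()` before every demo, which is convenient for a live walkthrough but blocks unattended runs such as a CI job. A possible variant is sketched below, under the assumption that each demo is a zero-argument callable as in the `demos` list above. The `run_demos` helper and its `interactive` flag are hypothetical names introduced for illustration; the stand-in demos at the bottom only exist so the block runs on its own.

```python
import time
from typing import Callable, Dict, List, Tuple


def run_demos(demos: List[Tuple[str, Callable[[], None]]],
              interactive: bool = False) -> List[Dict]:
    """Run each demo in order and record its status and duration."""
    results = []
    for i, (name, demo_func) in enumerate(demos, 1):
        print(f"\n{'=' * 80}")
        print(f"Demo {i}/{len(demos)}: {name}")
        print("=" * 80)
        if interactive:
            input("Press Enter to continue…")
        start = time.perf_counter()
        try:
            demo_func()
            status = "ok"
        except Exception as exc:
            # Keep going so one failing demo does not hide the others.
            status = f"failed: {exc}"
        elapsed = time.perf_counter() - start
        results.append({"name": name, "status": status, "seconds": elapsed})
    return results


if __name__ == "__main__":
    # Stand-in demos; in the article's file these would be
    # demo_traditional_rag, demo_cag_system, and so on.
    def demo_ok() -> None:
        print("running a stand-in demo")

    def demo_broken() -> None:
        raise RuntimeError("simulated failure")

    for row in run_demos([("ok", demo_ok), ("broken", demo_broken)]):
        print(f"{row['name']}: {row['status']} ({row['seconds'] * 1000:.2f}ms)")
```

Passing `interactive=True` restores the press-Enter pauses of the original walkthrough, while the returned list gives a per-demo summary that a script can inspect.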

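7. Summary

A traditional AI system learns on the spot: every question triggers a fresh retrieval.

CAG gives the system a working memory instead, so core knowledge that rarely changes stays close at hand and is served directly when needed.

This is more than an incremental technical upgrade; it moves AI from a "lookup tool" toward an "intelligent assistant".

Looking ahead, an AI assistant would not only know how to look information up, but could also remember a user's habits, preferences, and conversation history.

At that point it could genuinely act as the user's "digital twin".

And all of that starts with giving AI the ability to remember.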
