The Best Reranker Models for RAG in 2025: How They Work, Their Strengths, and How to Choose

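Retrieval-Augmented Generation (RAG) marks an important step forward for natural language processing. It lets a large language model (LLM) retrieve external information beyond its training data before generating a response, which can substantially improve answer quality. The LLM can therefore handle domain-specific knowledge or recent information without costly retraining. Rerankers play a key role in refining the retrieved information so that the model receives the most relevant context. By combining information retrieval with text generation, RAG produces answers that are accurate, relevant, and natural-sounding.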

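Contents

Why Initial Retrieval Is Not Enough
Introducing the Reranker: Refining Search
How Reranking Improves RAG
The Best Reranker Models of 2025
How to Tell Whether a Reranker Is Working
Choosing the Right Reranker for Your Needs
Final Thoughts
FAQ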

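Why Initial Retrieval Is Not Enough

The first step in RAG is finding documents relevant to the user's query. Systems typically use keyword search or vector similarity for this. These methods are a good start, but the documents they return are not all equally useful. The embedding model may not capture the fine distinctions needed to pick out the most relevant information.

Vector search, which looks for similar meaning, can struggle with short queries or specialized terminology. LLMs also have limits on how much context they can handle. Feeding in too many documents, even marginally relevant ones, can confuse the model and lower the quality of the final answer. This noisy initial retrieval distracts the LLM and degrades its performance, so the first batch of results needs to be refined.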

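Figure: RAG system architecture

The figure shows the retrieval and generation steps of RAG: after the user asks a question, the system searches the vector store and pulls results for that question. The retrieved content is passed to the LLM together with the question, and the LLM then produces a structured output.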

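Introducing the Reranker: Refining Search

This is where rerankers become essential. Reranking can markedly improve the precision of search results. A reranker uses a more powerful model to analyze the initially retrieved documents and reorders them by how well they match the user's specific question and intent.

In RAG, rerankers act as a quality filter. They examine the first set of results and prioritize the documents that best answer the query, with the goal of pushing the most relevant passages to the top. Think of a reranker as an expert who double-checks the initial search and uses a deeper understanding of language to find the best match between documents and question.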

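Figure: Reranker workflow

The figure shows a two-stage search process. The first stage produces an initial result set through semantic or keyword matching. The second stage, reranking, refines that set to improve the relevance and ordering of the final results, giving the user more accurate and more useful answers.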

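How Reranking Improves RAG

Rerankers improve the accuracy of the context supplied to the LLM. They analyze the meaning of the user's question and of each retrieved document, and the relationship between them, rather than relying on simple keyword matching. This deeper understanding helps identify the most useful information.

By focusing the LLM on a smaller, higher-quality set of documents, rerankers lead to more precise answers. With better context, the LLM can give more informed and more direct responses. A reranker computes a score for how semantically close each document is to the query, and that score drives the final ordering. It can find relevant information even when no keywords match exactly.

This focus on high-quality context also helps reduce LLM hallucinations, where the model generates wrong but plausible-sounding information. Grounding the LLM in documents that the reranker has judged relevant makes the final output more trustworthy.

A standard RAG pipeline consists of retrieval and generation. An enhanced RAG pipeline adds a reranking step in between:

Retrieve: fetch an initial set of candidate documents.
Rerank: use a reranking model to reorder those documents by relevance to the query.
Generate: pass only the top-ranked, most relevant documents to the LLM to produce the answer.

This two-stage approach lets the initial retrieval cast a wide net (recall) while the reranker concentrates on picking the best items from it (precision). The division of labor improves the overall pipeline and gives the LLM better input.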

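Figure: Reranking improves RAG

In the figure, the query is used to search a vector database, which returns the 25 most relevant documents. These documents are passed to the reranker module, which refines the results and selects the top 3 as the final output.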
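The retrieve-then-rerank flow can be sketched in a few lines of plain Python. This is a minimal illustration, not a production design: the function names (first_stage_score, rerank_score, retrieve_then_rerank) are hypothetical, the first stage is a simple term-overlap count standing in for vector search, and the second stage is a cosine similarity over term counts standing in for a real reranking model.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def first_stage_score(query, doc):
    """Cheap, recall-oriented score: how many distinct query terms the doc contains."""
    doc_terms = set(tokenize(doc))
    return sum(1 for term in set(tokenize(query)) if term in doc_terms)


def rerank_score(query, doc):
    """Stand-in for a reranker: cosine similarity of term-count vectors.

    A real pipeline would call a cross-encoder or a reranking API here.
    """
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    dot = sum(q[t] * d[t] for t in q)
    q_norm = math.sqrt(sum(v * v for v in q.values()))
    d_norm = math.sqrt(sum(v * v for v in d.values()))
    return dot / (q_norm * d_norm) if q_norm and d_norm else 0.0


def retrieve_then_rerank(query, corpus, k_retrieve=25, k_final=3):
    """Stage 1 keeps k_retrieve candidates; stage 2 reorders them and keeps k_final."""
    candidates = sorted(
        corpus, key=lambda doc: first_stage_score(query, doc), reverse=True
    )[:k_retrieve]
    reranked = sorted(
        candidates, key=lambda doc: rerank_score(query, doc), reverse=True
    )
    return reranked[:k_final]


corpus = [
    "the reranker reorders retrieved documents by relevance to the query",
    "vector search retrieves documents with similar meaning",
    "the weather today is sunny with light wind",
    "a reranker scores each query and document pair",
]
top_docs = retrieve_then_rerank(
    "how does a reranker score documents", corpus, k_retrieve=3, k_final=2
)
for doc in top_docs:
    print(doc)
```

The defaults mirror the 25-to-3 funnel described above. In the toy run, the first stage ties the two reranker-related documents and the second stage breaks the tie, which is the division of labor the two-stage design relies on.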

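The Best Reranker Models of 2025

Here is a roundup of the reranking models drawing the most attention in 2025.

Figure: Reranker models compared

Several reranking models are popular choices for RAG pipelines: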

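Reranker	Model type	Source	Strengths	Weaknesses	Best suited for
Cohere[1]	Cross-encoder (API)	Closed source	High accuracy, multilingual, easy to use, flexible speed options	Cost (API fees), closed source	General RAG, enterprise, multilingual, ease of use
bge-reranker[2]	Cross-encoder	Open source	High accuracy, open source, runs on modest hardware	Requires self-hosting	General RAG, open-source preference, budget-conscious projects
Voyage[3]	Cross-encoder (API)	Closed source	Top-tier relevance and accuracy	Cost (API fees), potentially higher latency (top model)	Maximum-accuracy needs (finance, legal), relevance-critical applications
Jina[4]	Cross-encoder / ColBERT variant	Mixed	Balanced performance, cost-effective, long documents (Jina-ColBERT)	May not reach peak accuracy	General RAG, long documents, balancing cost and performance
FlashRank[5]	Lightweight cross-encoder	Open source	Very fast, low resource usage, easy to integrate	Lower accuracy than large models	Speed-critical applications, resource-constrained environments
ColBERT[6]	Multi-vector (late interaction)	Open source	Efficient at scale (large collections), fast retrieval	Compute- and storage-intensive indexing	Very large document collections, efficiency at scale
MixedBread (mxbai-rerank-v2)[7]	Cross-encoder	Open source	Claimed state-of-the-art performance, fast inference, multilingual, long context, versatile	Requires self-hosting, relatively new	High-performance RAG, multilingual, long documents/code/JSON, open-source preference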

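Cohere Rerank

Cohere Rerank uses a sophisticated neural network, likely Transformer-based, that acts as a cross-encoder. It processes the query and the document together to judge relevance precisely. It is a proprietary model accessed through an API.

Key features: It supports more than 100 languages, which makes it suitable for global applications. It integrates easily as a hosted service. Cohere also offers "Rerank 3 Nimble", a variant designed to run substantially faster in production while keeping accuracy high.
Performance: Cohere Rerank maintains high accuracy across the various embedding models used for the initial retrieval step. The Nimble variant noticeably shortens response times. Cost depends on API usage.
Strengths: Easy integration via API, strong and reliable performance, excellent multilingual capability, and a speed-optimized option (Nimble).
Weaknesses: It is a closed-source commercial service, so you pay per use and cannot modify the model.
Ideal use cases: General-purpose RAG applications, enterprise search platforms, customer-support chatbots, and situations that need broad language support without managing model infrastructure.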

Example code

First, install the Cohere libraries.

%pip install --upgrade --quiet cohere langchain-cohere

Set up Cohere and the ContextualCompressionRetriever.

from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank
from langchain_community.llms import Cohere
from langchain.chains import RetrievalQA

# `retriever` is assumed to be an existing base retriever (for example a
# vector store's .as_retriever()); it is not defined in this snippet.
llm = Cohere(temperature=0)

# Rerank the base retriever's results with Cohere before they reach the LLM.
compressor = CohereRerank(model="rerank-english-v3.0")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)

chain = RetrievalQA.from_chain_type(llm=llm, retriever=compression_retriever)
chain.invoke({"query": "What did the president say about Ketanji Brown Jackson"})

Output:

{'query': 'What did the president say about Ketanji Brown Jackson',
 'result': " The president speaks highly of Ketanji Brown Jackson, stating that she is one of the nation's top legal minds, and will continue the legacy of excellence of Justice Breyer. The president also mentions that he worked with her family and that she comes from a family of public school educators and police officers. Since her nomination, she has received support from various groups, including the Fraternal Order of Police and judges from both major political parties. \n\nWould you like me to extract another sentence from the provided text? "}

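bge-reranker (Base/Large)

These models come from the Beijing Academy of Artificial Intelligence (BAAI) and are open source under permissive licenses (MIT or Apache 2.0 depending on the model; check each model card). They are Transformer-based cross-encoders designed specifically for reranking and come in several sizes, such as Base and Large.

Key features: Being open source, the models can be freely deployed and modified. The bge-reranker-v2-m3 model, for example, has fewer than 600 million parameters, so it runs efficiently on common hardware, including consumer GPUs.
Performance: These models perform very well, especially the larger versions, and often come close to the top commercial models. They post strong mean reciprocal rank (MRR) scores. The cost lies mainly in the compute needed for self-hosting.
Strengths: No licensing fees (open source), high accuracy, the flexibility of self-hosting, and good performance even on modest hardware.
Weaknesses: Users must manage deployment, infrastructure, and updates themselves. Performance depends on the hosting hardware.
Ideal use cases: General RAG tasks, research projects, teams that prefer open-source tools, budget-conscious applications, and users comfortable with self-hosting.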

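Example code

from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import CrossEncoderReranker
from langchain_community.cross_encoders import HuggingFaceCrossEncoder


def pretty_print_docs(docs):
    # Small display helper: print each document's text, separated by a rule.
    print(
        f"\n{'-' * 100}\n".join(
            [f"Document {i + 1}:\n\n" + d.page_content for i, d in enumerate(docs)]
        )
    )


# `retriever` is assumed to be an existing base retriever; it is not defined here.
model = HuggingFaceCrossEncoder(model_name="BAAI/bge-reranker-base")

# Keep only the 3 highest-scoring documents after reranking.
compressor = CrossEncoderReranker(model=model, top_n=3)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)

compressed_docs = compression_retriever.invoke("What is the plan for the economy?")
pretty_print_docs(compressed_docs)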

Output:

Document 1:

More infrastructure and innovation in America. More goods moving faster and cheaper in America. More jobs where you can earn a good living in America. And instead of relying on foreign supply chains, let’s make it in America. Economists call it “increasing the productive capacity of our economy.” I call it building a better America. My plan to fight inflation will lower your costs and lower the deficit.
----------------------------------------------------------------------------------------------------
Document 2:

Second – cut energy costs for families an average of $500 a year by combatting climate change. Let’s provide investments and tax credits to weatherize your homes and businesses to be energy efficient and you get a tax credit; double America’s clean energy production in solar, wind, and so much more; lower the price of electric vehicles, saving you another $80 a month because you’ll never have to pay at the gas pump again.
----------------------------------------------------------------------------------------------------
Document 3:

Look at cars. Last year, there weren’t enough semiconductors to make all the cars that people wanted to buy. And guess what, prices of automobiles went up. So—we have a choice. One way to fight inflation is to drive down wages and make Americans poorer. I have a better plan to fight inflation. Lower your costs, not your wages. Make more cars and semiconductors in America. More infrastructure and innovation in America. More goods moving faster and cheaper in America.

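Voyage Rerank

Voyage AI offers proprietary neural reranking models (rerank-2 and rerank-2-lite) through an API. They are most likely advanced, carefully fine-tuned cross-encoders aimed at the highest possible relevance scores.

Key features: Their main distinction is top-tier relevance scores on benchmarks. Voyage provides a simple Python client library for integration. The lite version balances quality against speed and cost.
Performance: rerank-2 often leads benchmarks on pure relevance accuracy. The lite model is comparable to other strong competitors. The high-accuracy rerank-2 model may have slightly higher latency than some competitors. Cost scales with API usage.
Strengths: Best-in-class relevance, possibly the most accurate option currently available. Easy to use through the Python client.
Weaknesses: A proprietary API-based service with the associated fees. The most accurate model may be slightly slower than alternatives.
Ideal use cases: Applications where maximizing relevance is critical, such as financial analysis, legal document review, or high-stakes question answering where accuracy outweighs small speed differences.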

Example code

First, install the Voyage libraries.

%pip install --upgrade --quiet voyageai
%pip install --upgrade --quiet langchain-voyageai

Set up Voyage and the ContextualCompressionRetriever.

import os

from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain.retrievers import ContextualCompressionRetriever
from langchain_openai import OpenAI
from langchain_voyageai import VoyageAIRerank
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_voyageai import VoyageAIEmbeddings

# Load and chunk the source text, then build a FAISS index over the chunks.
documents = TextLoader("../../how_to/state_of_the_union.txt").load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
texts = text_splitter.split_documents(documents)
retriever = FAISS.from_documents(
    texts, VoyageAIEmbeddings(model="voyage-law-2")
).as_retriever(search_kwargs={"k": 20})

llm = OpenAI(temperature=0)

# Rerank the 20 retrieved chunks and keep the top 3.
compressor = VoyageAIRerank(
    model="rerank-lite-1", voyageai_api_key=os.environ["VOYAGE_API_KEY"], top_k=3
)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)

compressed_docs = compression_retriever.invoke(
    "What did the president say about Ketanji Jackson Brown"
)
# pretty_print_docs is assumed to be a small helper that prints each
# document's page_content; it is not defined in this snippet.
pretty_print_docs(compressed_docs)

Output:

Document 1:

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.
And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
----------------------------------------------------------------------------------------------------
Document 2:

So let’s not abandon our streets. Or choose between safety and equal justice.
Let’s come together to protect our communities, restore trust, and hold law enforcement accountable.
That’s why the Justice Department required body cameras, banned chokeholds, and restricted no-knock warrants for its officers.
----------------------------------------------------------------------------------------------------
Document 3:

I spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves.
I’ve worked on these issues a long time.
I know what works: Investing in crime prevention and community police officers who’ll walk the beat, who’ll know the neighborhood, and who can restore trust and safety.
So let’s not abandon our streets. Or choose between safety and equal justice.

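Jina Reranker

Jina offers reranking solutions that include neural models such as Jina Reranker v2 and Jina-ColBERT. Jina Reranker v2 is most likely a cross-encoder-style model. Jina-ColBERT implements the ColBERT architecture (described in more detail below) on top of Jina's base model.

Key features: Jina offers affordable options with strong performance. Jina-ColBERT stands out for handling very long documents, with a context length of up to 8,000 tokens, which reduces the need to split long texts into many chunks. Open-source components are also part of the Jina ecosystem.
Performance: Jina Reranker v2 offers a good balance of speed, cost, and relevance. Jina-ColBERT does well on long source documents. Pricing is generally competitive.
Strengths: Balanced performance, cost-effectiveness, strong long-document handling through Jina-ColBERT, and the flexibility of available open-source components.
Weaknesses: The standard Jina reranker may not reach the absolute peak accuracy of specialized models such as Voyage's top tier.
Ideal use cases: General-purpose RAG systems, applications that process long documents (technical manuals, research papers, books), and projects that need a good balance between cost and performance.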

Example code

from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import JinaEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load and chunk the source text, then build a FAISS index over the chunks.
documents = TextLoader(
    "../../how_to/state_of_the_union.txt",
).load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
texts = text_splitter.split_documents(documents)

embedding = JinaEmbeddings(model_name="jina-embeddings-v2-base-en")
retriever = FAISS.from_documents(texts, embedding).as_retriever(search_kwargs={"k": 20})

query = "What did the president say about Ketanji Brown Jackson"
docs = retriever.invoke(query)

Rerank with Jina.

from langchain.retrievers import ContextualCompressionRetriever
from langchain_community.document_compressors import JinaRerank

# Wrap the base retriever so its results are reranked by Jina.
compressor = JinaRerank()
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)

compressed_docs = compression_retriever.invoke(
    "What did the president say about Ketanji Jackson Brown"
)
# pretty_print_docs is assumed to be a small helper that prints each
# document's page_content; it is not defined in this snippet.
pretty_print_docs(compressed_docs)

Output:

Document 1:

So let’s not abandon our streets. Or choose between safety and equal justice.
Let’s come together to protect our communities, restore trust, and hold law enforcement accountable.
That’s why the Justice Department required body cameras, banned chokeholds, and restricted no-knock warrants for its officers.
----------------------------------------------------------------------------------------------------
Document 2:

I spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves.
I’ve worked on these issues a long time.
I know what works: Investing in crime prevention and community police officers who’ll walk the beat, who’ll know the neighborhood, and who can restore trust and safety.
So let’s not abandon our streets. Or choose between safety and equal justice.

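ColBERT

ColBERT (Contextualized Late Interaction over BERT) is a multi-vector model. Instead of representing a document with a single vector, it creates multiple contextualized vectors, typically one per token. It uses a "late interaction" mechanism in which the query's token vectors are compared with the document's pre-encoded token vectors. Because the query and document are encoded separately, document vectors can be computed and indexed ahead of time.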
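The late-interaction comparison is often summarized as a "MaxSim" score: each query token vector is matched with its most similar document token vector, and those maxima are summed. The sketch below illustrates that idea with made-up two-dimensional vectors. The function names are hypothetical, and a real ColBERT model produces normalized, much higher-dimensional token embeddings from a trained encoder.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs, doc_vecs):
    """Late-interaction score: for each query token vector, take the best
    match among the document token vectors, then sum over query tokens."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy token embeddings, invented purely for illustration.
query_vecs = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
doc_b = [[0.6, 0.4], [0.5, 0.5]]

scores = {
    "doc_a": maxsim_score(query_vecs, doc_a),
    "doc_b": maxsim_score(query_vecs, doc_b),
}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores)
print(ranking)
```

Because the document vectors do not depend on the query, they can be stored in an index ahead of time, and only the cheap max-and-sum step runs at query time.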

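Key features: Once documents are indexed, the architecture allows efficient retrieval from large collections. The multi-vector approach supports fine-grained comparison between query terms and document content. It is an open-source approach.
Performance: ColBERT strikes a good balance between retrieval effectiveness and efficiency, especially at scale. After the initial indexing step, retrieval latency is low. The main costs are the compute for indexing and for self-hosting.
Strengths: Very efficient on large document collections, scalable retrieval, open-source flexibility.
Weaknesses: The initial indexing process can be compute-intensive and requires substantial storage.
Ideal use cases: Large-scale RAG applications, systems that need fast retrieval over millions or billions of documents, and scenarios where precomputation time is acceptable.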

Example code

Install the RAGatouille library to use the ColBERT reranker.

pip install -U ragatouille

Now set up the ColBERT reranker.

from ragatouille import RAGPretrainedModel
from langchain.retrievers import ContextualCompressionRetriever

# Load the pretrained ColBERTv2 checkpoint.
RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")

# `retriever` is assumed to be an existing base retriever; it is not defined here.
compression_retriever = ContextualCompressionRetriever(
    base_compressor=RAG.as_langchain_document_compressor(), base_retriever=retriever
)

compressed_docs = compression_retriever.invoke(
    "What animation studio did Miyazaki found"
)
print(compressed_docs[0])

Output:

Document(page_content='In June 1985, Miyazaki, Takahata, Tokuma and Suzuki founded the animation production company Studio Ghibli, with funding from Tokuma Shoten. Studio Ghibli's first film, Laputa: Castle in the Sky (1986), employed the same production crew of Nausicaä. Miyazaki's designs for the film's setting were inspired by Greek architecture and "European urbanistic templates". Some of the architecture in the film was also inspired by a Welsh mining town; Miyazaki witnessed the mining strike upon his first', metadata={'relevance_score': 26.5194149017334})

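FlashRank

FlashRank is designed as a very lightweight, fast reranking library. It typically relies on small, optimized Transformer models, often distilled or pruned versions of larger ones. It aims to deliver a clear relevance improvement over plain similarity search with minimal compute overhead. It works like a cross-encoder but uses several techniques to speed up processing, and it is distributed as an open-source Python library.

Key features: Speed and efficiency are its main selling points. It is easy to integrate and light on resources (CPU or modest GPU usage), and usually takes very little code to set up.
Performance: FlashRank does not reach the peak accuracy of large cross-encoders such as Cohere or Voyage, but it aims for a substantial improvement over no reranking or basic bi-encoder reranking. Its speed makes it suitable for real-time or high-throughput scenarios. Cost is minimal (self-hosted compute).
Strengths: Very fast inference, low compute requirements, easy integration, open source.
Weaknesses: Accuracy may be lower than that of larger, more complex reranking models. The choice of models may be narrower than in broader frameworks.
Ideal use cases: Applications that need fast reranking on resource-constrained hardware such as CPUs or edge devices, high-volume search systems where latency is critical, and projects looking for a simple "better than nothing" reranking step with minimal complexity.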

Example code

from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import FlashrankRerank
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(temperature=0)

# `retriever` is assumed to be an existing base retriever whose documents
# carry an "id" entry in their metadata; neither is set up in this snippet.
compressor = FlashrankRerank()
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)

compressed_docs = compression_retriever.invoke(
    "What did the president say about Ketanji Jackson Brown"
)
print([doc.metadata["id"] for doc in compressed_docs])
# pretty_print_docs is assumed to be a small helper that prints each
# document's page_content; it is not defined in this snippet.
pretty_print_docs(compressed_docs)

This snippet uses FlashrankRerank inside a ContextualCompressionRetriever to improve the relevance of the retrieved documents. It reorders the documents fetched by the base retriever (the retriever variable) by their relevance to the query "What did the president say about Ketanji Jackson Brown". Finally, it prints the document IDs followed by the compressed, reranked documents.

Output:

[0, 5, 3]
Document 1:

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.
And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
----------------------------------------------------------------------------------------------------
Document 2:

He met the Ukrainian people.
From President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world.
Groups of citizens blocking tanks with their bodies. Everyone from students to retirees teachers turned soldiers defending their homeland.
In this struggle as President Zelenskyy said in his speech to the European Parliament “Light will win over darkness.” The Ukrainian Ambassador to the United States is here tonight.
----------------------------------------------------------------------------------------------------
Document 3:

And tonight, I’m announcing that the Justice Department will name a chief prosecutor for pandemic fraud.
By the end of this year, the deficit will be down to less than half what it was before I took office. The only president ever to cut the deficit by more than one trillion dollars in a single year.
Lowering your costs also means demanding more competition.
I’m a capitalist, but capitalism without competition isn’t capitalism
It’s exploitation—and it drives up prices.

The output shows that the retrieved chunks are reranked by relevance.

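MixedBread

This family from Mixedbread AI includes mxbai-rerank-base-v2 (0.5 billion parameters) and mxbai-rerank-large-v2 (1.5 billion parameters). They are open-source (Apache 2.0) cross-encoders built on the Qwen-2.5 architecture. What sets them apart is the training process, which adds a three-stage reinforcement-learning approach (GRPO, contrastive learning, preference learning) on top of the initial training.

Key features: Claims top performance across benchmarks such as BEIR. Supports more than 100 languages. Handles long contexts of up to 8k tokens (and is compatible with 32k). Designed for varied data types, including text, code, and JSON, and supports LLM tool selection. Available through Hugging Face and a Python library.
Performance: Benchmarks published by Mixedbread indicate that these models outperform other leading open- and closed-source competitors such as Cohere and Voyage on BEIR (the large model scores 57.49, the base model 55.57). They also show a notable speed advantage: the 1.5B-parameter model is markedly faster in latency tests than other large open-source rerankers. The cost is the compute for self-hosting.
Strengths: High benchmark performance (claimed state of the art), open-source license, fast inference relative to accuracy, broad language support, a very long context window, and versatility across data types (code, JSON).
Weaknesses: Requires self-hosting and infrastructure management. As a relatively new model, its long-term performance and community scrutiny are still developing.
Ideal use cases: General RAG that needs top-tier performance, multilingual applications, systems that handle code, JSON, or long documents, LLM tool/function-call selection, and teams that prefer high-performance open-source models.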

Example code

!pip install mxbai-rerank

from mxbai_rerank import MxbaiRerankV2

# Load the model, here we use our base sized model
model = MxbaiRerankV2("mixedbread-ai/mxbai-rerank-base-v2")

# Example query and documents
query = "Who wrote To Kill a Mockingbird?"
documents = [
    "To Kill a Mockingbird is a novel by Harper Lee published in 1960. It was immediately successful, winning the Pulitzer Prize, and has become a classic of modern American literature.",
    "The novel Moby-Dick was written by Herman Melville and first published in 1851. It is considered a masterpiece of American literature and deals with complex themes of obsession, revenge, and the conflict between good and evil.",
    "Harper Lee, an American novelist widely known for her novel To Kill a Mockingbird, was born in 1926 in Monroeville, Alabama. She received the Pulitzer Prize for Fiction in 1961.",
    "Jane Austen was an English novelist known primarily for her six major novels, which interpret, critique and comment upon the British landed gentry at the end of the 18th century.",
    "The Harry Potter series, which consists of seven fantasy novels written by British author J.K. Rowling, is among the most popular and critically acclaimed books of the modern era.",
    "The Great Gatsby, a novel written by American author F. Scott Fitzgerald, was published in 1925. The story is set in the Jazz Age and follows the life of millionaire Jay Gatsby and his pursuit of Daisy Buchanan.",
]

# Calculate the scores
results = model.rank(query, documents)
print(results)

Output:

[RankResult(index=0, score=9.847987174987793, document='To Kill a Mockingbird is a novel by Harper Lee published in 1960. It was immediately successful, winning the Pulitzer Prize, and has become a classic of modern American literature.'),
 RankResult(index=2, score=8.258672714233398, document='Harper Lee, an American novelist widely known for her novel To Kill a Mockingbird, was born in 1926 in Monroeville, Alabama. She received the Pulitzer Prize for Fiction in 1961.'),
 RankResult(index=3, score=3.579845428466797, document='Jane Austen was an English novelist known primarily for her six major novels, which interpret, critique and comment upon the British landed gentry at the end of the 18th century.'),
 RankResult(index=4, score=2.716982841491699, document='The Harry Potter series, which consists of seven fantasy novels written by British author J.K. Rowling, is among the most popular and critically acclaimed books of the modern era.'),
 RankResult(index=1, score=2.233165740966797, document='The novel Moby-Dick was written by Herman Melville and first published in 1851. It is considered a masterpiece of American literature and deals with complex themes of obsession, revenge, and the conflict between good and evil.'),
 RankResult(index=5, score=1.8150043487548828, document='The Great Gatsby, a novel written by American author F. Scott Fitzgerald, was published in 1925. The story is set in the Jazz Age and follows the life of millionaire Jay Gatsby and his pursuit of Daisy Buchanan.')]

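How to Tell Whether a Reranker Is Working

Evaluating a reranker matters. These common metrics help measure its effectiveness:

Accuracy@k: how often a relevant document appears in the top k results.
Precision@k: the proportion of the top k results that are relevant.
Recall@k: the proportion of all relevant documents that appear in the top k results.
Normalized Discounted Cumulative Gain (NDCG): measures ranking quality by weighing both relevance and position. Relevant documents ranked higher contribute more to the score. The value is normalized to the range 0 to 1 so that it can be compared across queries.
Mean Reciprocal Rank (MRR): focuses on the rank of the first relevant document found. It is the average of 1/rank over multiple queries and is useful when finding one good result quickly is what matters.
F1-score: the harmonic mean of precision and recall, giving a balanced view.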
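Several of these metrics are short enough to compute by hand. The sketch below is one straightforward way to do it, assuming binary relevance labels (a document is either relevant or not) and the common log2 discount for NDCG. The function names and the document IDs are illustrative only.

```python
import math


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top k results that are relevant."""
    return sum(1 for d in ranked_ids[:k] if d in relevant_ids) / k


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant_ids:
        return 0.0
    return sum(1 for d in ranked_ids[:k] if d in relevant_ids) / len(relevant_ids)


def reciprocal_rank(ranked_ids, relevant_ids):
    """1/rank of the first relevant document, or 0 if none is found."""
    for rank, d in enumerate(ranked_ids, start=1):
        if d in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """runs: list of (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def ndcg_at_k(ranked_ids, relevant_ids, k):
    """Binary-relevance NDCG with a log2 position discount."""
    dcg = sum(
        1.0 / math.log2(i + 1)
        for i, d in enumerate(ranked_ids[:k], start=1)
        if d in relevant_ids
    )
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(i + 1) for i in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


# One query: the reranker returned four documents, two of which are relevant.
ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}

print("precision@3:", precision_at_k(ranked, relevant, 3))
print("recall@3:", recall_at_k(ranked, relevant, 3))
print("reciprocal rank:", reciprocal_rank(ranked, relevant))
print("ndcg@4:", ndcg_at_k(ranked, relevant, 4))
```

Running the same functions on the retriever's ordering and on the reranker's ordering for a labeled query set gives a direct before-and-after comparison.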

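Choosing the Right Reranker for Your Needs

Choosing the best reranker means balancing several factors:

Relevance requirements: how accurate do the results need to be for the application?
Latency: how quickly must the reranker return results? Speed is critical for real-time applications.
Scalability: can the model handle current and future data volumes and user load?
Integration: how easily does the reranker fit into the existing RAG pipeline (embedding model, vector database, LLM framework)?
Domain specificity: do you need a model trained on data from a particular domain?
Cost: consider API fees for proprietary models or compute costs for self-hosted ones.

There are trade-offs:

Cross-encoders are more accurate but slower.
Bi-encoders are faster and more scalable but may be slightly less accurate.
LLM-based rerankers can be very accurate but are expensive and slow.
Multi-vector models aim for a balance.
Score-based methods are the fastest but may lack semantic depth.

The best reranker is the one that fits your specific performance, efficiency, and cost requirements.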

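Final Thoughts

Rerankers are essential for getting the most out of a RAG system. They refine the information fed to the LLM, which leads to better and more reliable results. A wide range of models is available, from high-accuracy cross-encoders to efficient bi-encoders to specialized models such as ColBERT, so developers have plenty of choice. Picking the right one requires understanding the trade-offs among accuracy, speed, scalability, and cost. As RAG evolves, particularly in handling diverse data types, rerankers will continue to play a key role in building smarter, more reliable AI applications. Careful evaluation and selection remain the key to success.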

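FAQ

Q1. What is Retrieval-Augmented Generation (RAG)?

A: RAG is a technique that improves large language models (LLMs) by letting the model retrieve external information before generating a response. This makes the model more accurate and more adaptable, and lets it draw on new knowledge without retraining.

Q2. Why is initial retrieval not enough in a RAG system?

A: Initial retrieval methods such as keyword search or vector similarity can return many documents, but not all of them are highly relevant. The resulting noisy input can degrade the LLM's performance, so the results need to be refined to improve answer quality.

Q3. What role does a reranker play in RAG?

A: A reranker reorders the retrieved documents by their relevance to the query. It acts as a quality filter, making sure the most relevant information is prioritized before it is passed to the LLM to generate an answer.

Q4. Why is Cohere Rerank a good choice?

A: Cohere Rerank offers high accuracy, multilingual support, and API-based integration. Its Nimble variant is optimized for faster responses, which makes it a strong fit for real-time applications.

Q5. Why is bge-reranker popular with open-source users?

A: bge-reranker is open source and can be self-hosted, which lowers cost while keeping accuracy high. It suits teams that want full control over the model.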

References

[1] Cohere: https://cohere.com/rerank
[2] bge-reranker: https://huggingface.co/BAAI/bge-reranker-large
[3] Voyage: https://docs.voyageai.com/docs/reranker
[4] Jina: https://huggingface.co/jinaai/jina-embeddings-v2-base-en
[5] FlashRank: https://github.com/PrithivirajDamodaran/FlashRank
[6] ColBERT: https://huggingface.co/colbert-ir/colbertv2.0
[7] MixedBread (mxbai-rerank-v2): https://www.mixedbread.com/blog/mxbai-rerank-v2
