LangChain Basics 02: ExampleSelectors


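What are ExampleSelectors?

As the name suggests, an example selector chooses examples. When writing prompts we often use a few-shot approach, giving the model a handful of examples as hints. So how do we find the N examples in the example library that best match the current context? That is the job of an example selector.

Basic usage

Base class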

from abc import ABC, abstractmethod
from typing import Any, Dict, List

from langchain_core.runnables import run_in_executor


class BaseExampleSelector(ABC):
    """Interface for selecting examples to include in prompts."""
    # Add an example
    @abstractmethod
    def add_example(self, example: Dict[str, str]) -> Any:
        """Add new example to store."""
    # Add an example asynchronously
    async def aadd_example(self, example: Dict[str, str]) -> Any:
        """Add new example to store."""
        return await run_in_executor(None, self.add_example, example)

    # Select examples
    @abstractmethod
    def select_examples(self, input_variables: Dict[str, str]) -> List[dict]:
        """Select which examples to use based on the inputs."""
    # Select examples asynchronously
    async def aselect_examples(self, input_variables: Dict[str, str]) -> List[dict]:
        """Select which examples to use based on the inputs."""
        return await run_in_executor(None, self.select_examples, input_variables)
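
To make the two required methods concrete, here is a minimal sketch of a custom selector. It is written as a plain class so it runs without LangChain installed; in a real project it would presumably inherit from BaseExampleSelector. The class name ClosestLengthExampleSelector and its "closest word count" rule are made up for illustration only.

```python
from typing import Dict, List


class ClosestLengthExampleSelector:
    """Hypothetical selector: picks the example whose question has the
    word count closest to the input question's word count."""

    def __init__(self, examples: List[Dict[str, str]]) -> None:
        self.examples = list(examples)

    def add_example(self, example: Dict[str, str]) -> None:
        # Store a new example so later selections can consider it.
        self.examples.append(example)

    def select_examples(self, input_variables: Dict[str, str]) -> List[Dict[str, str]]:
        # Compare the input's word count with each stored question's word count.
        target = len(input_variables["question"].split())
        best = min(
            self.examples,
            key=lambda ex: abs(len(ex["question"].split()) - target),
        )
        return [best]


selector = ClosestLengthExampleSelector(
    [
        {"question": "Who wrote Hamlet?", "answer": "William Shakespeare"},
        {
            "question": "What is the tallest mountain on Earth measured from sea level?",
            "answer": "Mount Everest",
        },
    ]
)
selector.add_example({"question": "Why?", "answer": "Because."})
picked = selector.select_examples({"question": "Who painted Guernica?"})
print(picked)
```

The point is only the shape of the contract: add_example mutates the store, and select_examples takes the prompt's input variables and returns a list of example dicts.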

Similarity-based selector

from langchain_core.example_selectors import SemanticSimilarityExampleSelector
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

examples = [
    {
        "question": "Who lived longer, Muhammad Ali or Alan Turing?",
        "answer": """
Are follow up questions needed here: Yes.
Follow up: How old was Muhammad Ali when he died?
Intermediate answer: Muhammad Ali was 74 years old when he died.
Follow up: How old was Alan Turing when he died?
Intermediate answer: Alan Turing was 41 years old when he died.
So the final answer is: Muhammad Ali
""",
    },
    {
        "question": "When was the founder of craigslist born?",
        "answer": """
Are follow up questions needed here: Yes.
Follow up: Who was the founder of craigslist?
Intermediate answer: Craigslist was founded by Craig Newmark.
Follow up: When was Craig Newmark born?
Intermediate answer: Craig Newmark was born on December 6, 1952.
So the final answer is: December 6, 1952
""",
    },
    {
        "question": "Who was the maternal grandfather of George Washington?",
        "answer": """
Are follow up questions needed here: Yes.
Follow up: Who was the mother of George Washington?
Intermediate answer: The mother of George Washington was Mary Ball Washington.
Follow up: Who was the father of Mary Ball Washington?
Intermediate answer: The father of Mary Ball Washington was Joseph Ball.
So the final answer is: Joseph Ball
""",
    },
    {
        "question": "Are both the directors of Jaws and Casino Royale from the same country?",
        "answer": """
Are follow up questions needed here: Yes.
Follow up: Who is the director of Jaws?
Intermediate Answer: The director of Jaws is Steven Spielberg.
Follow up: Where is Steven Spielberg from?
Intermediate Answer: The United States.
Follow up: Who is the director of Casino Royale?
Intermediate Answer: The director of Casino Royale is Martin Campbell.
Follow up: Where is Martin Campbell from?
Intermediate Answer: New Zealand.
So the final answer is: No
""",
    },
]
embeddings = OpenAIEmbeddings(openai_api_key="YOUR-OPEN-API-KEY",
                              openai_api_base="BASE-URL")
example_selector = SemanticSimilarityExampleSelector.from_examples(
    # The list of examples to select from
    examples,
    # The embedding model
    embeddings,
    # The vector store class
    Chroma,
    # The number of examples to return
    k=1,
)

question = "Who was the father of Mary Ball Washington?"
selected_examples = example_selector.select_examples({"question": question})
print(f"Examples most similar to the input: \n"
      f"question:\n"
      f"{selected_examples[0]['question']}\n"
      f"answer:\n"
      f"{selected_examples[0]['answer']}"
      )

OUTPUT:

Examples most similar to the input: 
question:
Who was the maternal grandfather of George Washington?
answer:

Are follow up questions needed here: Yes.
Follow up: Who was the mother of George Washington?
Intermediate answer: The mother of George Washington was Mary Ball Washington.
Follow up: Who was the father of Mary Ball Washington?
Intermediate answer: The father of Mary Ball Washington was Joseph Ball.
So the final answer is: Joseph Ball

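Maximal marginal relevance (diversity) selector

This kind of selector is actually more common in recommender systems. The similarity search described above only matches on similarity to the user's input and returns the closest items, which can leave the user looking at many near-identical results. The MMR-based selector introduced here addresses that by also rewarding diversity among the selected items.

It still starts from a similarity search, so it also needs a vector store to retrieve candidates.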
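
The trade-off between relevance and diversity can be sketched in plain Python. This is a simplified illustration of the commonly used MMR scoring rule on toy 2-D vectors, not LangChain's own code; the function names, the vectors, and the lambda_mult default are assumptions made for the example.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity between two non-zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def mmr_select(
    query: Sequence[float],
    candidates: List[Sequence[float]],
    k: int,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Return indices of k candidates, balancing relevance and diversity."""
    selected: List[int] = []
    while len(selected) < min(k, len(candidates)):
        best_idx, best_score = -1, -math.inf
        for i, vec in enumerate(candidates):
            if i in selected:
                continue
            relevance = cosine(query, vec)
            # Similarity to the closest item that was already picked.
            redundancy = max(
                (cosine(vec, candidates[j]) for j in selected), default=0.0
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
    return selected


query = [1.0, 1.0]
candidates = [
    [1.0, 0.8],   # index 0: most similar to the query
    [1.0, 0.75],  # index 1: near-duplicate of index 0
    [0.7, 1.0],   # index 2: slightly less similar, but different from index 0
]
diverse = mmr_select(query, candidates, k=2)
pure_similarity = mmr_select(query, candidates, k=2, lambda_mult=1.0)
print(diverse, pure_similarity)
```

With lambda_mult=1.0 the redundancy term vanishes and the result is an ordinary top-k similarity search; lowering it penalizes candidates that resemble items already chosen, which is why the near-duplicate at index 1 loses to index 2 in the first call.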

Basic usage

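from langchain_core.example_selectors import MaxMarginalRelevanceExampleSelector

# Reuses examples, embeddings and Chroma from the previous section
example_selector = MaxMarginalRelevanceExampleSelector.from_examples(
    # The list of examples to select from
    examples,
    # The embedding model
    embeddings,
    # The vector store class
    Chroma,
    # The number of examples to return
    k=2,
)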

question = "Who was the father of Mary Ball Washington?"
selected_examples = example_selector.select_examples({"question": question})
print(f"Examples most similar to the input:{selected_examples}")

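N-gram overlap selector

This selector ranks examples by their n-gram overlap with the input. With the default threshold no example is excluded; the examples are only reordered, as the output further down shows.

Basic concepts:

  • n-gram: an n-gram is a contiguous sequence of n words. For example, with n=2 (bigrams), the text "The quick brown fox" breaks down into ["The quick", "quick brown", "brown fox"].
  • Overlap: the n-gram overlap between two texts measures how much wording they share, which serves as a rough proxy for similarity. It compares surface text, not meaning.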
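
The two concepts above can be sketched in a few lines of plain Python. This is a deliberately simplified overlap score (the fraction of the query's bigrams that also appear in the candidate), not the scoring that NGramOverlapExampleSelector itself uses; the helper names ngrams and overlap_score are hypothetical.

```python
from typing import List, Set, Tuple


def ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    # Lowercase, split on whitespace, and collect every run of n words.
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_score(query: str, candidate: str, n: int = 2) -> float:
    # Fraction of the query's n-grams that also occur in the candidate.
    query_grams = ngrams(query, n)
    if not query_grams:
        return 0.0
    return len(query_grams & ngrams(candidate, n)) / len(query_grams)


query = "Who was the father of Mary Ball Washington?"
questions: List[str] = [
    "Who lived longer, Muhammad Ali or Alan Turing?",
    "When was the founder of craigslist born?",
    "Who was the maternal grandfather of George Washington?",
]
ranked = sorted(questions, key=lambda q: overlap_score(query, q), reverse=True)
for q in ranked:
    print(round(overlap_score(query, q), 3), q)
```

Because the comparison is purely lexical, a paraphrase with no shared wording would score zero even when it asks the same thing, which is the main trade-off against the embedding-based selectors.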

Usage:

from langchain_community.example_selectors import NGramOverlapExampleSelector
from langchain_core.prompts import PromptTemplate

examples = [
    {
        "question": "Who lived longer, Muhammad Ali or Alan Turing?",
        "answer": """
Are follow up questions needed here: Yes.
Follow up: How old was Muhammad Ali when he died?
Intermediate answer: Muhammad Ali was 74 years old when he died.
Follow up: How old was Alan Turing when he died?
Intermediate answer: Alan Turing was 41 years old when he died.
So the final answer is: Muhammad Ali
""",
    },
    {
        "question": "When was the founder of craigslist born?",
        "answer": """
Are follow up questions needed here: Yes.
Follow up: Who was the founder of craigslist?
Intermediate answer: Craigslist was founded by Craig Newmark.
Follow up: When was Craig Newmark born?
Intermediate answer: Craig Newmark was born on December 6, 1952.
So the final answer is: December 6, 1952
""",
    },
    {
        "question": "Who was the maternal grandfather of George Washington?",
        "answer": """
Are follow up questions needed here: Yes.
Follow up: Who was the mother of George Washington?
Intermediate answer: The mother of George Washington was Mary Ball Washington.
Follow up: Who was the father of Mary Ball Washington?
Intermediate answer: The father of Mary Ball Washington was Joseph Ball.
So the final answer is: Joseph Ball
""",
    },
    {
        "question": "Are both the directors of Jaws and Casino Royale from the same country?",
        "answer": """
Are follow up questions needed here: Yes.
Follow up: Who is the director of Jaws?
Intermediate Answer: The director of Jaws is Steven Spielberg.
Follow up: Where is Steven Spielberg from?
Intermediate Answer: The United States.
Follow up: Who is the director of Casino Royale?
Intermediate Answer: The director of Casino Royale is Martin Campbell.
Follow up: Where is Martin Campbell from?
Intermediate Answer: New Zealand.
So the final answer is: No
""",
    },
]
example_prompt = PromptTemplate.from_template("Question: {question}\n{answer}")
selector = NGramOverlapExampleSelector(
    examples=examples,
    example_prompt=example_prompt
)
question = "Who was the father of Mary Ball Washington?"
selected_examples = selector.select_examples({"question": question})
print(f"selected:{selected_examples}")

OUTPUT:

selected:[{'question': 'Who was the maternal grandfather of George Washington?', 'answer': '\nAre follow up questions needed here: Yes.\nFollow up: Who was the mother of George Washington?\nIntermediate answer: The mother of George Washington was Mary Ball Washington.\nFollow up: Who was the father of Mary Ball Washington?\nIntermediate answer: The father of Mary Ball Washington was Joseph Ball.\nSo the final answer is: Joseph Ball\n'}, {'question': 'When was the founder of craigslist born?', 'answer': '\nAre follow up questions needed here: Yes.\nFollow up: Who was the founder of craigslist?\nIntermediate answer: Craigslist was founded by Craig Newmark.\nFollow up: When was Craig Newmark born?\nIntermediate answer: Craig Newmark was born on December 6, 1952.\nSo the final answer is: December 6, 1952\n'}, {'question': 'Who lived longer, Muhammad Ali or Alan Turing?', 'answer': '\nAre follow up questions needed here: Yes.\nFollow up: How old was Muhammad Ali when he died?\nIntermediate answer: Muhammad Ali was 74 years old when he died.\nFollow up: How old was Alan Turing when he died?\nIntermediate answer: Alan Turing was 41 years old when he died.\nSo the final answer is: Muhammad Ali\n'}, {'question': 'Are both the directors of Jaws and Casino Royale from the same country?', 'answer': '\nAre follow up questions needed here: Yes.\nFollow up: Who is the director of Jaws?\nIntermediate Answer: The director of Jaws is Steven Spielberg.\nFollow up: Where is Steven Spielberg from?\nIntermediate Answer: The United States.\nFollow up: Who is the director of Casino Royale?\nIntermediate Answer: The director of Casino Royale is Martin Campbell.\nFollow up: Where is Martin Campbell from?\nIntermediate Answer: New Zealand.\nSo the final answer is: No\n'}]



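Length-based selector

The length-based selector adds examples in order until their combined length would exceed the configured maximum; the examples that no longer fit are left out. A longer user input therefore leaves room for fewer examples.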
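
The budgeting idea can be sketched without LangChain. The following is a minimal illustration under stated assumptions: length is measured in whitespace-separated words, and the names text_length and select_by_length are hypothetical helpers, not library functions.

```python
from typing import Dict, List


def text_length(text: str) -> int:
    # Assumption of this sketch: length is the number of whitespace-separated words.
    return len(text.split())


def select_by_length(
    examples: List[Dict[str, str]], query: str, max_length: int
) -> List[Dict[str, str]]:
    # The query consumes part of the budget before any example is added.
    remaining = max_length - text_length(query)
    selected: List[Dict[str, str]] = []
    for example in examples:
        cost = text_length(f"{example['question']} {example['answer']}")
        if cost > remaining:
            break  # stop at the first example that no longer fits
        selected.append(example)
        remaining -= cost
    return selected


examples = [
    {"question": "happy", "answer": "sad"},
    {"question": "tall", "answer": "short"},
    {"question": "energetic", "answer": "lethargic"},
]
short_query = select_by_length(examples, "big", max_length=5)
long_query = select_by_length(examples, "big and huge and massive", max_length=5)
print(len(short_query), len(long_query))
```

Examples are taken in their stored order, so putting the most useful ones first matters when the budget is tight.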

Using a selector with a prompt

from langchain_core.prompts import FewShotPromptTemplate
from langchain_core.prompts import PromptTemplate
from langchain_core.example_selectors import SemanticSimilarityExampleSelector
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

examples = [
    {
        "question": "Who lived longer, Muhammad Ali or Alan Turing?",
        "answer": """
Are follow up questions needed here: Yes.
Follow up: How old was Muhammad Ali when he died?
Intermediate answer: Muhammad Ali was 74 years old when he died.
Follow up: How old was Alan Turing when he died?
Intermediate answer: Alan Turing was 41 years old when he died.
So the final answer is: Muhammad Ali
""",
    },
    {
        "question": "When was the founder of craigslist born?",
        "answer": """
Are follow up questions needed here: Yes.
Follow up: Who was the founder of craigslist?
Intermediate answer: Craigslist was founded by Craig Newmark.
Follow up: When was Craig Newmark born?
Intermediate answer: Craig Newmark was born on December 6, 1952.
So the final answer is: December 6, 1952
""",
    },
    {
        "question": "Who was the maternal grandfather of George Washington?",
        "answer": """
Are follow up questions needed here: Yes.
Follow up: Who was the mother of George Washington?
Intermediate answer: The mother of George Washington was Mary Ball Washington.
Follow up: Who was the father of Mary Ball Washington?
Intermediate answer: The father of Mary Ball Washington was Joseph Ball.
So the final answer is: Joseph Ball
""",
    },
    {
        "question": "Are both the directors of Jaws and Casino Royale from the same country?",
        "answer": """
Are follow up questions needed here: Yes.
Follow up: Who is the director of Jaws?
Intermediate Answer: The director of Jaws is Steven Spielberg.
Follow up: Where is Steven Spielberg from?
Intermediate Answer: The United States.
Follow up: Who is the director of Casino Royale?
Intermediate Answer: The director of Casino Royale is Martin Campbell.
Follow up: Where is Martin Campbell from?
Intermediate Answer: New Zealand.
So the final answer is: No
""",
    },
]
embeddings = OpenAIEmbeddings(openai_api_key="OPEN-API-KEY",
                              openai_api_base="BASE_URL")
example_prompt = PromptTemplate.from_template("Question: {question}\n{answer}")
example_selector = SemanticSimilarityExampleSelector.from_examples(
    # This is the list of examples available to select from.
    examples,
    # This is the embedding class used to produce embeddings which are used to measure semantic similarity.
    embeddings,
    # This is the VectorStore class that is used to store the embeddings and do a similarity search over.
    Chroma,
    # This is the number of examples to produce.
    k=1,
)

prompt = FewShotPromptTemplate(
    example_prompt=example_prompt,
    suffix="Question: {input}",
    input_variables=["input"],
    example_selector=example_selector
)
question = "Who was the father of Mary Ball Washington?"
# selected_examples = example_selector.select_examples({"question": question})
res = prompt.invoke({"input": question})
print(f"res:{res}")

OUTPUT:

res:text='Question: Who was the maternal grandfather of George Washington?\n\nAre follow up questions needed here: Yes.\nFollow up: Who was the mother of George Washington?\nIntermediate answer: The mother of George Washington was Mary Ball Washington.\nFollow up: Who was the father of Mary Ball Washington?\nIntermediate answer: The father of Mary Ball Washington was Joseph Ball.\nSo the final answer is: Joseph Ball\n\n\nQuestion: Who was the father of Mary Ball Washington?'
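
The shape of this output suggests how the final prompt is put together: each selected example is rendered with the example template, and the rendered pieces plus the suffix are joined with blank lines. The sketch below reproduces that assembly in plain Python as a reading of the output above, not as the library's implementation; build_few_shot_prompt and the two-newline separator are assumptions, and the example answer is shortened for brevity.

```python
from typing import Dict, List


def build_few_shot_prompt(
    examples: List[Dict[str, str]],
    example_template: str,
    suffix_template: str,
    user_input: str,
    separator: str = "\n\n",
) -> str:
    # Render every selected example, then append the suffix with the user's input.
    pieces = [example_template.format(**example) for example in examples]
    pieces.append(suffix_template.format(input=user_input))
    return separator.join(pieces)


selected = [
    {
        "question": "Who was the maternal grandfather of George Washington?",
        # Shortened answer; the real example carries the full follow-up chain.
        "answer": "\nSo the final answer is: Joseph Ball\n",
    }
]
prompt_text = build_few_shot_prompt(
    selected,
    example_template="Question: {question}\n{answer}",
    suffix_template="Question: {input}",
    user_input="Who was the father of Mary Ball Washington?",
)
print(repr(prompt_text))
```

The run of three newlines before the final question comes from the answer's own trailing newline followed by the separator, which matches what the real output shows.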

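Summary

In this article we covered several common example selectors. Depending on the business scenario, you can pick the selector that fits and combine it with a few-shot prompt. The next article covers chat models: function calling, structured output, streaming, and more.

More updates to come!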