Zero-shot-CoT (zero-shot chain-of-thought) has achieved remarkable success on multi-step reasoning tasks, but an error analysis over 100 arithmetic test examples still reveals three classes of problems:
- Calculation errors (7%): arithmetic mistakes lead to wrong answers;
- Missing-step errors (12%): when multiple steps are involved, some intermediate reasoning steps are skipped;
- Semantic misunderstanding (27%): other errors in understanding the problem's semantics and in the coherence of the reasoning steps, likely caused by limitations of the LLM itself.
To address the errors caused by missing reasoning steps in Zero-shot-CoT, the Plan-and-Solve (PS) prompting scheme was introduced, also referred to as Plan-and-Execute: a prompting strategy, or task-decomposition method, commonly used to improve a language model's performance on complex problem solving.
The paper illustrates this with GPT-3 inputs and outputs, where (a) is the input with Zero-shot-CoT prompting, (b) the input with Plan-and-Solve prompting, and (c) the Plan-and-Solve answer. Although Zero-shot-CoT prompting encourages LLMs to produce multi-step reasoning with "Let's think step by step", it may still generate wrong reasoning steps when the problem is complex. Unlike Zero-shot-CoT, PS prompting first asks the LLM to devise a plan for solving the problem, generating a step-by-step plan and then carrying it out to find the answer.
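The contrast between the two prompting styles can be sketched as plain prompt builders. The PS trigger sentence below follows the paper's wording; the templates used later in this post are different, more elaborate ones:

```javascript
// Zero-shot-CoT: a single trigger phrase appended to the question.
const zeroShotCoT = (question) =>
  `Q: ${question}\nA: Let's think step by step.`;

// Plan-and-Solve: first ask for a plan, then for its execution.
const planAndSolve = (question) =>
  `Q: ${question}\nA: Let's first understand the problem and devise a plan ` +
  `to solve the problem. Then, let's carry out the plan and solve the ` +
  `problem step by step.`;

console.log(zeroShotCoT("What is 12 * 7 + 3?"));
console.log(planAndSolve("What is 12 * 7 + 3?"));
```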
The environment used here is configured as follows:
- Development language: Node.js
- Core dependencies in package.json: "@langchain/community": "^0.3.35", "@langchain/core": "^0.3.42", "@langchain/langgraph": "^0.2.51", "@langchain/ollama": "^0.2.0", "langchain": "^0.2.4", "langfuse-langchain": "^3.36.0"
- Models: qwen2.5:14b and llama3.3:latest; qwen2.5:14b handles Chinese better, while llama3.3:latest handles English better
- Search engine: SearXNG (searXNGSearch), using a self-hosted instance here; SerpAPI would also work
- Model management platform: Ollama, a tool focused on running large language models locally. With convenient model management, a rich library of prebuilt models, cross-platform support, and flexible customization options, it lets developers and researchers use LLMs efficiently in a local environment for all kinds of NLP tasks, without relying on cloud services or complex infrastructure.
- Model tracing platform: Langfuse, a platform dedicated to maintaining and managing large language models (LLMs). It aims to provide efficient tooling and technical support to help users deploy, monitor, and optimize LLMs in production.
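Assuming Ollama is already installed, the two models can be fetched with its standard CLI (a config step, shown here for completeness):

```shell
ollama pull qwen2.5:14b
ollama pull llama3.3:latest
ollama list   # verify both models are available locally
```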
Core code walkthrough

Plan
```
const plannerPrompt = ChatPromptTemplate.fromTemplate(
  `For the given objective, come up with a simple step by step plan. \
This plan should involve individual tasks, that if executed correctly will yield the correct answer. Do not add any superfluous steps. \
The result of the final step should be the final answer. Make sure that each step has all the information needed - do not skip steps.
{objective}`,
);
const model = llm.withStructuredOutput(plan, {
  name: "plan",
  description: "This tool is used to plan the steps to follow",
});
const planner = plannerPrompt.pipe(model);
```
This tells the model to decompose the problem into steps. For a Chinese prompt, use the version below, which is a direct translation of the English above:
对于给定的objective,制定一个简单的分步计划。该计划应包括单独的任务,如果执行得当,将产生正确的答案。不要添加任何多余的步骤。最后一步的结果应该是最终答案。确保每个步骤都有所需的所有信息,不要跳过步骤。{objective}
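As a sketch of what `withStructuredOutput` enforces here: the planner must emit JSON matching the plan schema (`z.object({ steps: z.array(z.string()) })`). A plain-JavaScript check of that shape, with a sample plan for the question used later in this post:

```javascript
// Expected planner output: an object holding an ordered array of step strings
// (mirrors the zod schema `z.object({ steps: z.array(z.string()) })`).
const samplePlan = {
  steps: [
    "Identify the 2024 Olympic table tennis mixed doubles champions.",
    "Look up the hometown of each champion.",
  ],
};

const isValidPlan = (p) =>
  p != null && Array.isArray(p.steps) && p.steps.every((s) => typeof s === "string");

console.log(isValidPlan(samplePlan)); // true
```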
Once the plan is in place, it is executed serially, one step at a time; execution relies mainly on the search engine.
```
const tools = [searXNGSearchTool, currentDateTool, new Calculator()];
const agentExecutor = createReactAgent({
  llm: llm,
  tools: tools,
});
```
Only three tools are registered here:
- searXNGSearchTool: web search tool
- currentDateTool: returns the current system time
- new Calculator(): arithmetic tool
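The serial execution described above can be sketched in plain JavaScript; `runTask` below is a hypothetical stand-in for the real `agentExecutor.invoke()` call:

```javascript
// Hypothetical executor: in the real code this is a ReAct agent invocation.
async function runTask(task) {
  return `result of "${task}"`;
}

// Execute the plan serially, one step at a time, recording each
// [task, result] pair the same way pastSteps is accumulated later.
async function executePlan(plan) {
  const pastSteps = [];
  for (const task of plan) {
    const result = await runTask(task);
    pastSteps.push([task, result]);
  }
  return pastSteps;
}

executePlan(["identify the champions", "find their hometowns"]).then((steps) =>
  console.log(steps)
);
```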
Replan

Based on the execution results, plan again:
```
const replannerPrompt = ChatPromptTemplate.fromTemplate(
  `For the given objective, come up with a simple step by step plan.
This plan should involve individual tasks, that if executed correctly will yield the correct answer. Do not add any superfluous steps.
The result of the final step should be the final answer. Make sure that each step has all the information needed - do not skip steps.
Your objective was this:
{input}
Your original plan was this:
{plan}
You have currently done the follow steps:
{pastSteps}
Update your plan accordingly. If no more steps are needed and you can return to the user, then respond with that and use the 'response' function.
Otherwise, fill out the plan. Only add steps to the plan that still NEED to be done. Do not return previously done steps as part of the plan.`,
);
const parser = new JsonOutputToolsParser();
const replanner = replannerPrompt
  .pipe(
    llm.bindTools([
      planTool,
      responseTool,
    ]),
  )
  .pipe(parser);
```
This prompt aggregates the previous plan, the executed steps, and their results for analysis: if the objective has been met, the result is returned; otherwise the remaining work is decomposed into further tasks.
The Chinese version of the prompt:

对于给定的目标,制定一个简单的分步计划。该计划应包括单独的任务,如果执行得当,将产生正确的答案。不要添加任何多余的步骤。最后一步的结果应该是最终答案。确保每个步骤都有所需的所有信息,不要跳过步骤。你的目标是:{input} 你最初的计划是这样的:{plan} 您当前已完成以下步骤:{pastSteps} 相应地更新你的计划。如果不需要更多步骤,并且您可以返回给用户,则使用"response"功能进行响应。否则,填写plan。只在计划中添加仍然需要完成的steps。不要将之前完成的步骤作为计划的一部分。

The complete execution flow is: START → planner → agent → replan, and from replan either back to agent (steps remain) or to END (a response is ready).
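Before reading the full listing, the overall control flow can be simulated with plain-JavaScript stubs (hypothetical stand-ins for the actual LLM calls), following the same loop the graph implements:

```javascript
// Stub planner/agent/replanner to show the control flow:
// START -> planner -> agent -> replan -> (agent | END).
function planner(objective) {
  return ["step 1", "step 2"];
}
function agent(task) {
  return `done: ${task}`;
}
// Replanner: return a response once the plan is exhausted,
// otherwise return the steps that still need to be done.
function replanner(state) {
  return state.plan.length === 0
    ? { response: "final answer" }
    : { plan: state.plan };
}

function run(objective) {
  let state = { input: objective, plan: planner(objective), pastSteps: [] };
  while (true) {
    const task = state.plan[0];
    state.pastSteps.push([task, agent(task)]);
    state.plan = state.plan.slice(1);
    const next = replanner(state);
    if (next.response) return next.response; // shouldEnd -> END
    state.plan = next.plan; // shouldEnd -> back to agent
  }
}

console.log(run("demo objective")); // "final answer" after two agent rounds
```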
The complete code is as follows:
```
import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";
import { JsonOutputToolsParser } from "@langchain/core/output_parsers/openai_tools";
import {ChatOllama} from "@langchain/ollama";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { Annotation } from "@langchain/langgraph";
import { CallbackHandler } from "langfuse-langchain";
const PlanExecuteState = Annotation.Root({
input: Annotation({
reducer: (x, y) => y ?? x ?? "",
}),
plan: Annotation({
reducer: (x, y) => y ?? x ?? [],
}),
pastSteps: Annotation({
reducer: (x, y) => x.concat(y),
}),
response: Annotation({
reducer: (x, y) => y ?? x,
}),
})
import {tool} from "@langchain/core/tools";
import axios from "axios";
const searXNGSearchTool = tool(
async (input)=> {
try {
// Send the search request to SearXNG
// console.info("searXNGSearchTool called>>>")
const response = await axios.get(`http://SEARXNG_IP:SEARXNG_PORT/search`, {
params: {
q: input, // search query
format: "json" // return results as JSON
// engines: "google" // optional: restrict to specific engines (all by default)
},
});
// Extract the search results
const results = response.data.results || [];
if (results.length === 0) {
return "No results found.";
}
// Format the results (title + snippet + link)
const formattedResults = results
.slice(0, 5) // limit to the first 5 results to keep the output short
.map((result, index) => {
return `${index + 1}. ${result.title}\n ${result.content || "No snippet available"}\n ${result.url}`;
})
.join("\n\n");
return formattedResults || "No relevant data extracted.";
} catch (error) {
console.error("SearXNG search failed:", error.message);
return `Error: ${error.message}`;
}
},
{
name:"searXNGSearchTool",
description:"A tool to search the web using a self-hosted SearXNG instance.",
schema:z.string()
}
);
const currentDateTool = tool(async (input)=> {
return new Date().toISOString();
},
{
name:"currentDateTool",
description:"A tool to get the current date and time.",
schema:z.object({})
})
import { createReactAgent } from "@langchain/langgraph/prebuilt";
import { Calculator } from "@langchain/community/tools/calculator";
const llm = new ChatOllama({
baseUrl: 'http://OLLAMA_IP:OLLAMA_PORT', // your Ollama address (defaults to the local instance)
// model: 'huihui_ai/deepseek-r1-abliterated:32b',
model: 'llama3.3:latest', // replace with the model you are running, e.g. "mistral" or "qwen2.5:14b"
// model: 'qwq:latest',
temperature: 0
})
const tools = [searXNGSearchTool,currentDateTool,new Calculator()];
const agentExecutor = createReactAgent({
llm: llm,
tools: tools,
});
const plan = zodToJsonSchema(
z.object({
steps: z
.array(z.string())
.describe("different steps to follow, should be in sorted order"),
}),
);
/*const planFunction = {
name: "plan",
description: "This tool is used to plan the steps to follow",
parameters: plan,
};*/
const planTool = {
type: "function",
function: {
name: "plan",
description: "This tool is used to plan the steps to follow",
parameters: plan,
},
};
const plannerPrompt = ChatPromptTemplate.fromTemplate(
`For the given objective, come up with a simple step by step plan. \
This plan should involve individual tasks, that if executed correctly will yield the correct answer. Do not add any superfluous steps. \
The result of the final step should be the final answer. Make sure that each step has all the information needed - do not skip steps.
{objective}`,
);
const model = llm.withStructuredOutput(plan,{ name: "plan",
description: "This tool is used to plan the steps to follow" });
const planner = plannerPrompt.pipe(model);
const response = zodToJsonSchema(
z.object({
response: z.string().describe("Response to user."),
}),
);
const responseTool = {
type: "function",
function: {
name: "response",
description: "Response to user.",
parameters: response,
},
};
const replannerPrompt = ChatPromptTemplate.fromTemplate(
`For the given objective, come up with a simple step by step plan.
This plan should involve individual tasks, that if executed correctly will yield the correct answer. Do not add any superfluous steps.
The result of the final step should be the final answer. Make sure that each step has all the information needed - do not skip steps.
Your objective was this:
{input}
Your original plan was this:
{plan}
You have currently done the follow steps:
{pastSteps}
Update your plan accordingly. If no more steps are needed and you can return to the user, then respond with that and use the 'response' function.
Otherwise, fill out the plan.
Only add steps to the plan that still NEED to be done. Do not return previously done steps as part of the plan.`,
);
const parser = new JsonOutputToolsParser();
const replanner = replannerPrompt
.pipe(
llm.bindTools([
planTool,
responseTool,
]),
).pipe(parser);
import { END, START, StateGraph } from "@langchain/langgraph";
import {HumanMessage} from "@langchain/core/messages";
async function executeStep(
state,
config,
) {
if(typeof state.plan == "string"){
state.plan = JSON.parse(state.plan);
}
const task = state.plan[0];
const input = {
messages: [new HumanMessage(task)],
};
const { messages } = await agentExecutor.invoke(input, config);
return {
pastSteps: [[task, messages[messages.length - 1].content.toString()]],
plan: state.plan.slice(1),
};
}
async function planStep(
state,
){
const plan = await planner.invoke({ objective: state.input });
return { plan: plan.steps };
}
async function replanStep(
state
) {
const input = {
input: state.input,
plan: state.plan.join("\n"),
pastSteps: state.pastSteps
.map(([step, result]) => `${step}: ${result}`)
.join("\n"),
}
console.info("input>>>",input)
const output = await replanner.invoke(input);
const toolCall = output[0];
if (toolCall.type == "response") {
return { response: toolCall.args?.response };
}
let plan = toolCall.args?.steps;
// the plan must be an array
if(typeof plan == "string"){
plan = JSON.parse(plan);
}
return { plan: plan };
}
function shouldEnd(state) {
return state.response ? "true" : "false";
}
const workflow = new StateGraph(PlanExecuteState)
.addNode("planner", planStep)
.addNode("agent", executeStep)
.addNode("replan", replanStep)
.addEdge(START, "planner")
.addEdge("planner", "agent")
.addEdge("agent", "replan")
.addConditionalEdges("replan", shouldEnd, {
true: END,
false: "agent",
});
// Finally, we compile it!
// This compiles it into a LangChain Runnable,
// meaning you can use it as you would any other runnable
const app = workflow.compile();
const langfuseHandler = new CallbackHandler({
publicKey: "pk-lf-b11de3a4-651a-4000-83d1-d7b76f29f2a8",
secretKey: "sk-lf-c92e2786-871d-4a16-8ca6-da9ae6bd383d",
baseUrl: "http://LANGFUSE_IP:LANGFUSE_PORT"
});
const config = { recursionLimit: 50,callbacks: [langfuseHandler] };
const inputs = {
input: "Where are the hometowns of the 2024 Olympic table tennis mixed doubles champions?",
};
let i = 0;
for await (const event of await app.stream(inputs, config)) {
console.log(">>>"+i++,JSON.stringify(event));
}
```
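The `Annotation` reducers at the top of the file decide how each node's partial return value is merged into the shared graph state: "last write wins" for `input`, `plan`, and `response`, append-only accumulation for `pastSteps`. Their behavior can be checked in isolation:

```javascript
// Reducers copied from PlanExecuteState.
const lastWins = (x, y) => y ?? x ?? "";        // input / plan / response
const accumulate = (x, y) => x.concat(y);       // pastSteps

let state = { input: "", plan: [], pastSteps: [] };

// The planner node returns { plan: [...] } -> the plan is replaced.
state.plan = lastWins(state.plan, ["identify champions", "find hometowns"]);

// The agent node returns { pastSteps: [[task, result]] } -> it is appended.
state.pastSteps = accumulate(state.pastSteps, [["identify champions", "..."]]);
state.pastSteps = accumulate(state.pastSteps, [["find hometowns", "..."]]);

console.log(state.pastSteps.length); // 2
```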
Note:
Replace the IPs and ports with your own. For Chinese, qwen2.5:14b gives better results; for English, use llama3.3:latest. With a mismatched model the answers degenerate into nonsense, and the prompt must be written in the matching language as well. If the question is complex, several agent/replan rounds are executed, so the run can take quite a while.
Conclusion:
Plan-and-Solve is commonly used to improve a language model's performance on complex problem solving. It mainly targets the following problems and challenges:
- Difficulty decomposing complex problems
  - Problem: many tasks (e.g. math problems, multi-step reasoning, planning tasks) involve several sub-steps. Asking the model to produce the answer in one shot is error-prone: it may skip key steps, drop information, or reason incompletely.
  - Solution: Plan-and-Solve splits the task into two phases:
    - Plan: first draw up a clear step-by-step plan of what needs to be done.
    - Solve: execute the plan step by step, ensuring each step is handled correctly.
  - Effect: decomposing the problem lowers the model's cognitive load per inference and improves accuracy.
- Lack of systematic reasoning
  - Problem: on tasks that require systematic reasoning (logical inference, mathematical computation, long-form QA), language models may produce disordered or incoherent responses that lack structured thinking.
  - Solution: Plan-and-Solve forces the model to plan before executing, keeping the reasoning organized. In the code, for example, the replanner prompts the model to generate an ordered plan (steps) and then completes it step by step.
  - Effect: the output follows a logical order instead of rambling from one thought to the next.
- Error accumulation across intermediate steps
  - Problem: in multi-step tasks, an error in one intermediate step can corrupt every subsequent step and push the final answer off course.
  - Solution: Plan-and-Solve allows potential problems to be spotted during planning and each step to be verified during execution. In the code, pastSteps records the completed steps, so the model can adjust the plan based on what is already known and avoid repeating mistakes.
  - Effect: explicitly recording and updating state limits error propagation and improves robustness.
- Limitations of single-shot answers
  - Problem: for tasks that need external information or multiple interactions (e.g. search, computation, data analysis), the model cannot finish everything in a single response.
  - Solution: Plan-and-Solve breaks the task into executable subtasks and completes them step by step with tool calls (such as searXNGSearchTool) or other external resources. In the code, for example, the model first identifies the 2024 Olympic table tennis mixed doubles champions, then plans a lookup of their hometowns.
  - Effect: supports dynamic adjustment and collaboration with external tools, suiting complex, open-ended problems.
- Ambiguity in user requirements
  - Problem: a user's question may be vague or carry multiple implicit goals, making it hard for the model to pin down the desired output.
  - Solution: Plan-and-Solve clarifies the goal and the path to it during the planning phase. In the code, the objective is to find the hometowns of the 2024 Olympic table tennis mixed doubles champions, and the plan splits cleanly into "identify the champions" and "look up their hometowns".
  - Effect: a better grasp of user intent, and answers that match the actual need.