Creating a Chatbot with the Phi-3.5-mini-instruct-onnx Model

This article shows how to call the Phi-3.5-mini-instruct-onnx model from C# to build a chatbot.

Model repository: microsoft/Phi-3.5-mini-instruct-onnx (main branch). I downloaded the cpu_and_mobile variant.
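The path passed to `new Model(...)` has to point at the folder that actually holds the model files, which after downloading the whole cpu_and_mobile directory is usually a sub-folder. The sketch below assumes, as in onnxruntime-genai model folders, that this is the folder containing `genai_config.json`. `ModelFolderLocator` and `FindModelDirectory` are hypothetical helper names, not part of the library.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of Microsoft.ML.OnnxRuntimeGenAI.
// Assumption: a loadable model folder is one that contains genai_config.json.
public static class ModelFolderLocator
{
    // Returns the first folder under `root` (including root itself) that
    // contains genai_config.json, or null if none is found.
    public static string? FindModelDirectory(string root)
    {
        if (!Directory.Exists(root))
        {
            return null;
        }

        // Search recursively; sort so the result is deterministic
        // when several model variants were downloaded.
        var configPath = Directory
            .EnumerateFiles(root, "genai_config.json", SearchOption.AllDirectories)
            .OrderBy(path => path, StringComparer.Ordinal)
            .FirstOrDefault();

        return configPath is null ? null : Path.GetDirectoryName(configPath);
    }
}
```

The result can be fed straight into `new Model(...)`; if it returns null, the download is probably incomplete or the root path is wrong.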

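First, install the Microsoft.ML.OnnxRuntimeGenAI NuGet package, version 0.5.2.

Note: it is best to stay on this version; newer releases changed the API and are not compatible with the code below.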

Using the small Phi CPU model

```csharp
using System;
using Microsoft.ML.OnnxRuntimeGenAI;

namespace small_model_example;

class Program
{
    static void Main(string[] args)
    {
        // Path to the model folder (the one that contains genai_config.json)
        var modelPath = @"F:\AI-Models\phi-3.5model";
        // Load the model. Model, Tokenizer, GeneratorParams, Generator and TokenizerStream
        // wrap native resources, so each one is disposed with `using`.
        using var model = new Model(modelPath);
        // Tokenizer for encoding prompts and decoding generated tokens
        using var tokenizer = new Tokenizer(model);
        // System prompt that defines the assistant's behavior and style
        var systemPrompt = "You are an AI assistant that helps people find information. Answer questions using a direct style. Do not share more information than the user requested.";
        Console.WriteLine("Ask your question. Type an empty string to exit.");
        // Loop until the user enters an empty line
        while (true)
        {
            Console.WriteLine();
            Console.Write("Q: ");
            var userQ = Console.ReadLine();
            // An empty line (or end of input) ends the session
            if (string.IsNullOrEmpty(userQ))
            {
                break;
            }
            // Prefix for the assistant's answer
            Console.Write("Phi3: ");
            // Full prompt in the Phi-3 chat format: system prompt, user question, then the assistant tag
            var fullPrompt = $"<|system|>{systemPrompt}<|end|><|user|>{userQ}<|end|><|assistant|>";
            // Encode the prompt text into token ids
            var tokens = tokenizer.Encode(fullPrompt);
            // Generation parameters for this round
            using var generatorParams = new GeneratorParams(model);
            // Maximum sequence length (prompt tokens plus generated tokens)
            generatorParams.SetSearchOption("max_length", 2048);
            // Do not share one buffer between the past and present KV cache
            generatorParams.SetSearchOption("past_present_share_buffer", false);
            // Input token sequence
            generatorParams.SetInputSequences(tokens);
            // Generator that produces the answer token by token
            using var generator = new Generator(model, generatorParams);
            // Stateful decoder: holds back partial multi-byte characters until they are complete
            using var tokenizerStream = tokenizer.CreateStream();
            while (!generator.IsDone())
            {
                // Compute logits, then pick the next token
                generator.ComputeLogits();
                generator.GenerateNextToken();
                // Decode only the newest token and print it
                var outputTokens = generator.GetSequence(0);
                Console.Write(tokenizerStream.Decode(outputTokens[outputTokens.Length - 1]));
            }
            // Newline before the next question
            Console.WriteLine();
        }
    }
}
```
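The program sends only the current question, so the model never sees earlier turns. Below is a minimal sketch of keeping a history in the same `<|system|>`/`<|user|>`/`<|assistant|>`/`<|end|>` format, assuming Phi-3.5 accepts previous turns concatenated this way (the model card's template also puts a newline after each tag, which this program omits). `ChatHistory` and `maxTurns` are hypothetical; the turn limit is a crude way to stay under `max_length` = 2048, which also has to cover the prompt.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: builds a multi-turn prompt with the same tags
// the chat program uses for a single turn.
public sealed class ChatHistory
{
    private readonly string _systemPrompt;
    private readonly int _maxTurns;
    private readonly List<(string User, string Assistant)> _turns = new();

    public ChatHistory(string systemPrompt, int maxTurns = 5)
    {
        _systemPrompt = systemPrompt;
        _maxTurns = Math.Max(0, maxTurns);
    }

    // Record a finished question/answer pair, dropping the oldest turns
    // so the prompt does not grow without bound.
    public void AddTurn(string user, string assistant)
    {
        _turns.Add((user, assistant));
        while (_turns.Count > _maxTurns)
        {
            _turns.RemoveAt(0);
        }
    }

    // Prompt for the next question; it ends with the assistant tag so the
    // model continues from there.
    public string BuildPrompt(string nextUserMessage)
    {
        var sb = new StringBuilder();
        sb.Append("<|system|>").Append(_systemPrompt).Append("<|end|>");
        foreach (var (user, assistant) in _turns)
        {
            sb.Append("<|user|>").Append(user).Append("<|end|>");
            sb.Append("<|assistant|>").Append(assistant).Append("<|end|>");
        }
        sb.Append("<|user|>").Append(nextUserMessage).Append("<|end|>");
        sb.Append("<|assistant|>");
        return sb.ToString();
    }
}
```

To use it, replace the `fullPrompt` line with `history.BuildPrompt(userQ)`, collect the decoded pieces of each answer into a string while printing them, and call `history.AddTurn(userQ, answer)` after the generation loop.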

Initial memory usage was 2700 MB; after six rounds of conversation it had grown to 3863 MB.
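To record figures like these from inside the program, the process's working set can be logged after every round. The sketch below reads `Process.WorkingSet64` from System.Diagnostics; `MemoryProbe` is a hypothetical helper, and its number may not exactly match what Task Manager displays. Logging per round with and without disposing each round's `GeneratorParams` and `Generator` (both are `IDisposable`) is one way to check whether objects kept alive between rounds account for part of the growth.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: reports this process's working set
// (physical memory currently in use) in megabytes.
public static class MemoryProbe
{
    public static double WorkingSetMb()
    {
        using var process = Process.GetCurrentProcess();
        // Make sure the cached process information is current.
        process.Refresh();
        return process.WorkingSet64 / (1024.0 * 1024.0);
    }

    // One log line per conversation round, e.g. printed after Console.WriteLine().
    public static string Report(int round) =>
        $"round {round}: {WorkingSetMb():F0} MB";
}
```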

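[Screenshot: Pasted image 20250526200400.png]

Overall, the Chinese output is only barely acceptable: the model produces a lot of irrelevant content and stray characters.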
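One possible contributor to the stray characters, offered as an assumption rather than something the screenshots confirm: if the tokenizer splits a Chinese character into several byte-level tokens, decoding each token on its own may produce incomplete UTF-8 sequences. The self-contained sketch below reproduces the effect with plain .NET decoding, using the three UTF-8 bytes of 中 as stand-ins for three tokens. For token-by-token output, onnxruntime-genai offers a stateful alternative, `tokenizer.CreateStream()`, whose `TokenizerStream.Decode` is meant for exactly this loop. `ChunkedUtf8Demo` is a hypothetical name.

```csharp
using System;
using System.Text;

// Illustration only: each byte[] chunk plays the role of one generated token.
public static class ChunkedUtf8Demo
{
    // Naive: decode every chunk independently, like decoding one token at a time.
    // Incomplete sequences become U+FFFD replacement characters.
    public static string DecodeEachChunk(byte[][] chunks)
    {
        var sb = new StringBuilder();
        foreach (var chunk in chunks)
        {
            sb.Append(Encoding.UTF8.GetString(chunk));
        }
        return sb.ToString();
    }

    // Stateful: a Decoder keeps leftover bytes until the rest of the
    // character arrives, then emits the complete character.
    public static string DecodeStreaming(byte[][] chunks)
    {
        var decoder = Encoding.UTF8.GetDecoder();
        var sb = new StringBuilder();
        foreach (var chunk in chunks)
        {
            // At most one char per byte, plus room for up to 3 pending bytes.
            var buffer = new char[chunk.Length + 4];
            int written = decoder.GetChars(chunk, 0, chunk.Length, buffer, 0);
            sb.Append(buffer, 0, written);
        }
        return sb.ToString();
    }
}
```

With the chunks `{ 0xE4 }`, `{ 0xB8 }`, `{ 0xAD }`, the naive version prints replacement characters while the streaming version prints 中.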

[Screenshot: Pasted image 20250526200536.png]

The English output is decent. This is related to the model's training corpus; to improve its Chinese later, you could fine-tune it on a Chinese corpus.

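Below is an English conversation, and the results are good. Speed-wise it is fast for CPU-only inference, and overall it is roughly on par with some current models.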
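To put a number on "fast for a CPU", tokens per second can be measured around the generation loop. A minimal sketch using `System.Diagnostics.Stopwatch`; `ThroughputMeter` is a hypothetical helper, and the rate it reports includes decoding and console output, not just model compute.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: counts generated tokens and divides by elapsed time.
public sealed class ThroughputMeter
{
    private readonly Stopwatch _watch = new();

    public int Tokens { get; private set; }

    // Call just before the generation loop starts.
    public void Start()
    {
        Tokens = 0;
        _watch.Restart();
    }

    // Call once after each GenerateNextToken().
    public void OnToken() => Tokens++;

    public double TokensPerSecond => Rate(Tokens, _watch.Elapsed);

    // Pure calculation, kept separate so it is easy to check.
    public static double Rate(int tokens, TimeSpan elapsed) =>
        elapsed.TotalSeconds > 0 ? tokens / elapsed.TotalSeconds : 0.0;
}
```

Calling `Start()` before `while (!generator.IsDone())`, `OnToken()` inside it, and printing `TokensPerSecond` after the loop gives one figure per answer, which makes runs on different CPUs or model variants comparable.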

[Screenshot: Pasted image 20250526200743.png]