Spring AI + MongoDB + Local Small Models for a RAG System: A Code Walkthrough of Vector Scoring and the Pain Points of Small Models


I. What the Code Example Does

This example uses Java and the Spring AI framework to implement a complete, privately deployed, lightweight RAG (Retrieval-Augmented Generation) Q&A system aimed at one vertical need: accurately explaining fault codes. The code is organized into the core modules below, closing the full loop from "vectorize and store the knowledge base" to "user query - vector retrieval - answer generation by a local small model".

1. Core Modules and Their Roles

Class: core role

  • MongoConfig: configures the MongoDB connection (credentials, address, database name) and exposes a MongoTemplate bean, the database entry point for the vector store
  • MongoDBVectorStore: custom MongoDB vector store implementation; contains the cosine-similarity scoring and vector retrieval logic, the heart of the whole system
  • VectorStoreConfig: declares the MongoDB vector store bean, wiring together the MongoTemplate and the Embedding model
  • OllamaLLMConfig: configures the connection to the local Ollama small models (qwen3:0.6b / deepseek-r1:1.5b) and the ChatClient
  • RagController: exposes the HTTP endpoints for the three core functions: knowledge-base initialization, vector retrieval, and RAG Q&A
  • RagApplication: application entry point; scans the packages and initializes the Spring context


1.1 MongoConfig

Configures the MongoDB connection (credentials, address, database name) and exposes a MongoTemplate bean, giving the vector store its database access point.

package com.conca.ai.mongodb;

import com.mongodb.MongoClientSettings;
import com.mongodb.MongoCredential;
import com.mongodb.ServerAddress;
import com.mongodb.client.MongoClients;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.mongodb.core.MongoTemplate;

import java.util.Collections;

@Configuration
public class MongoConfig {

    // Connection parameters (hardcoded here for the demo; in practice, read them from configuration)
    private final String mongoHost = "127.0.0.1";
    private final int mongoPort = 27017;
    private final String mongoDatabase = "chat_memory_db";
    private final String mongoUsername = "root";
    private final String mongoPassword = "123123";
    private final String authDatabase = "admin";
    
    @Bean
    public MongoTemplate mongoTemplate() {
        // 1. Build the credentials
        MongoCredential credential = MongoCredential.createCredential(
                mongoUsername,       // username
                authDatabase,        // authentication database
                mongoPassword.toCharArray() // password
        );

        // 2. Configure the MongoClient connection settings
        MongoClientSettings settings = MongoClientSettings.builder()
                .applyToClusterSettings(builder -> 
                        builder.hosts(Collections.singletonList(new ServerAddress(mongoHost, mongoPort))))
                .credential(credential) // attach the credentials
                .build();

        // 3. Create the MongoClient and initialize the MongoTemplate
        return new MongoTemplate(MongoClients.create(settings), mongoDatabase);
    }
    
   
}
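One note on configuration: the connection values above are hardcoded. With Spring Boot's standard `spring.data.mongodb.*` properties they can be externalized (the values below are this demo's; with these set, Boot can even auto-configure a `MongoTemplate` without this custom config class):

```properties
# application.properties - externalized equivalents of the hardcoded values above
spring.data.mongodb.host=127.0.0.1
spring.data.mongodb.port=27017
spring.data.mongodb.database=chat_memory_db
spring.data.mongodb.username=root
spring.data.mongodb.password=123123
spring.data.mongodb.authentication-database=admin
```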

1.2 MongoDBVectorStore 

Custom MongoDB vector store implementation; its core is the cosine-similarity computation and vector retrieval logic that powers the whole system.

/*
 * Copyright 2023-2025 the original author or authors.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      https://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package com.conca.ai.mongodb;

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

import com.mongodb.MongoCommandException;
import com.mongodb.client.result.DeleteResult;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import org.springframework.ai.document.Document;
import org.springframework.ai.document.DocumentMetadata;
import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.ai.embedding.EmbeddingOptionsBuilder;
import org.springframework.ai.model.EmbeddingUtils;
import org.springframework.ai.observation.conventions.VectorStoreProvider;
import org.springframework.ai.vectorstore.AbstractVectorStoreBuilder;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.filter.Filter;
import org.springframework.ai.vectorstore.observation.AbstractObservationVectorStore;
import org.springframework.ai.vectorstore.observation.VectorStoreObservationContext;
import org.springframework.beans.factory.InitializingBean;
import org.springframework.data.mongodb.UncategorizedMongoDbException;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.aggregation.Aggregation;
import org.springframework.data.mongodb.core.query.BasicQuery;
import org.springframework.data.mongodb.core.query.Criteria;
import org.springframework.data.mongodb.core.query.Query;
import org.springframework.util.Assert;

/**
 * MongoDB-based vector store implementation, adapted from the Spring AI MongoDB
 * Atlas store. Unlike the original, similarity search in this class is computed
 * in Java (full collection scan + cosine similarity) rather than delegated to
 * Atlas Vector Search.
 *
 * <p>
 * The store uses a MongoDB collection to persist vector embeddings along with their
 * associated document content and metadata. By default, it uses the "vector_store"
 * collection with a vector search index for similarity search operations.
 * </p>
 *
 * <p>
 * Features:
 * </p>
 * <ul>
 * <li>Automatic schema initialization with configurable collection and index
 * creation</li>
 * <li>Support for cosine similarity search</li>
 * <li>Metadata filtering using MongoDB Atlas Search syntax</li>
 * <li>Configurable similarity thresholds for search results</li>
 * <li>Batch processing support with configurable batching strategies</li>
 * <li>Observation and metrics support through Micrometer</li>
 * </ul>
 *
 * <p>
 * Basic usage example:
 * </p>
 * <pre>{@code
 * MongoDBAtlasVectorStore vectorStore = MongoDBAtlasVectorStore.builder(mongoTemplate, embeddingModel)
 *     .collectionName("vector_store")
 *     .initializeSchema(true)
 *     .build();
 *
 * // Add documents
 * vectorStore.add(List.of(
 *     new Document("content1", Map.of("key1", "value1")),
 *     new Document("content2", Map.of("key2", "value2"))
 * ));
 *
 * // Search with filters
 * List<Document> results = vectorStore.similaritySearch(
 *     SearchRequest.query("search text")
 *         .withTopK(5)
 *         .withSimilarityThreshold(0.7)
 *         .withFilterExpression("key1 == 'value1'")
 * );
 * }</pre>
 *
 * <p>
 * Advanced configuration example:
 * </p>
 * <pre>{@code
 * MongoDBAtlasVectorStore vectorStore = MongoDBAtlasVectorStore.builder(mongoTemplate, embeddingModel)
 *     .collectionName("custom_vectors")
 *     .vectorIndexName("custom_vector_index")
 *     .pathName("custom_embedding")
 *     .numCandidates(500)
 *     .metadataFieldsToFilter(List.of("category", "author"))
 *     .initializeSchema(true)
 *     .batchingStrategy(new TokenCountBatchingStrategy())
 *     .build();
 * }</pre>
 *
 * <p>
 * Database Requirements:
 * </p>
 * <ul>
 * <li>MongoDB Atlas cluster with Vector Search enabled</li>
 * <li>Collection with vector search index configured</li>
 * <li>Collection schema with id (string), content (string), metadata (document), and
 * embedding (vector) fields</li>
 * <li>Proper access permissions for index and collection operations</li>
 * </ul>
 *
 * @author Chris Smith
 * @author Soby Chacko
 * @author Christian Tzolov
 * @author Thomas Vitale
 * @author Ilayaperumal Gopinathan
 * @since 1.0.0
 */
public class MongoDBVectorStore extends AbstractObservationVectorStore implements InitializingBean {

	private static final Logger logger = LoggerFactory.getLogger(MongoDBVectorStore.class);

	public static final String ID_FIELD_NAME = "_id";

	public static final String METADATA_FIELD_NAME = "metadata";

	public static final String CONTENT_FIELD_NAME = "content";

	public static final String SCORE_FIELD_NAME = "score";

	public static final String DEFAULT_VECTOR_COLLECTION_NAME = "vector_store";

	private static final String DEFAULT_VECTOR_INDEX_NAME = "vector_index";

	private static final String DEFAULT_PATH_NAME = "embedding";

	private static final int DEFAULT_NUM_CANDIDATES = 200;

	private static final int INDEX_ALREADY_EXISTS_ERROR_CODE = 68;

	private static final String INDEX_ALREADY_EXISTS_ERROR_CODE_NAME = "IndexAlreadyExists";

	private final MongoTemplate mongoTemplate;

	private final String collectionName;

	private final String vectorIndexName;

	private final String pathName;

	private final List<String> metadataFieldsToFilter;

	private final int numCandidates;

//	private final MongoDBAtlasFilterExpressionConverter filterExpressionConverter;

	private final boolean initializeSchema;

	protected MongoDBVectorStore(Builder builder) {
		super(builder);

		Assert.notNull(builder.mongoTemplate, "MongoTemplate must not be null");

		this.mongoTemplate = builder.mongoTemplate;
		this.collectionName = builder.collectionName;
		this.vectorIndexName = builder.vectorIndexName;
		this.pathName = builder.pathName;
		this.numCandidates = builder.numCandidates;
		this.metadataFieldsToFilter = builder.metadataFieldsToFilter;
//		this.filterExpressionConverter = builder.filterExpressionConverter;
		this.initializeSchema = builder.initializeSchema;
	}

	@Override
	public void afterPropertiesSet() throws Exception {
		if (!this.initializeSchema) {
			return;
		}

		// Create the collection if it does not exist
		if (!this.mongoTemplate.collectionExists(this.collectionName)) {
			this.mongoTemplate.createCollection(this.collectionName);
		}
		// Create search index
		createSearchIndex();
	}

	private void createSearchIndex() {
		try {
			this.mongoTemplate.executeCommand(createSearchIndexDefinition());
		}
		catch (UncategorizedMongoDbException e) {
			Throwable cause = e.getCause();
			if (cause instanceof MongoCommandException commandException) {
				// Ignore any IndexAlreadyExists errors
				if (INDEX_ALREADY_EXISTS_ERROR_CODE == commandException.getCode()
						|| INDEX_ALREADY_EXISTS_ERROR_CODE_NAME.equals(commandException.getErrorCodeName())) {
					return;
				}
			}
			throw e;
		}
	}

	/**
	 * Provides the Definition for the search index
	 */
	private org.bson.Document createSearchIndexDefinition() {
		List<org.bson.Document> vectorFields = new ArrayList<>();

		vectorFields.add(new org.bson.Document().append("type", "vector")
			.append("path", this.pathName)
			.append("numDimensions", this.embeddingModel.dimensions())
			.append("similarity", "cosine"));

		vectorFields.addAll(this.metadataFieldsToFilter.stream()
			.map(fieldName -> new org.bson.Document().append("type", "filter").append("path", "metadata." + fieldName))
			.toList());

		return new org.bson.Document().append("createSearchIndexes", this.collectionName)
			.append("indexes",
					List.of(new org.bson.Document().append("name", this.vectorIndexName)
						.append("type", "vectorSearch")
						.append("definition", new org.bson.Document("fields", vectorFields))));
	}

	/**
	 * Maps a Bson Document to a Spring AI Document
	 * @param mongoDocument the mongoDocument to map to a Spring AI Document
	 * @return the Spring AI Document
	 */
	private Document mapMongoDocument(org.bson.Document mongoDocument, float[] queryEmbedding) {
		String id = mongoDocument.getString(ID_FIELD_NAME);
		String content = mongoDocument.getString(CONTENT_FIELD_NAME);
		Double score = mongoDocument.getDouble(SCORE_FIELD_NAME);
		Map<String, Object> metadata = mongoDocument.get(METADATA_FIELD_NAME, org.bson.Document.class);

		metadata.put(DocumentMetadata.DISTANCE.value(), 1 - score);

		// @formatter:off
		return Document.builder()
			.id(id)
			.text(content)
			.metadata(metadata)
			.score(score)
			.build(); // @formatter:on
	}

	@Override
	public void doAdd(List<Document> documents) {
		List<float[]> embeddings = this.embeddingModel.embed(documents, EmbeddingOptionsBuilder.builder().build(),
				this.batchingStrategy);
		// Iterate by index so each document pairs with its embedding without an O(n^2) indexOf lookup
		for (int i = 0; i < documents.size(); i++) {
			Document document = documents.get(i);
			MongoDBDocument mdbDocument = new MongoDBDocument(document.getId(), document.getText(),
					document.getMetadata(), embeddings.get(i));
			this.mongoTemplate.save(mdbDocument, this.collectionName);
		}
	}

	@Override
	public void doDelete(List<String> idList) {
		Query query = new Query(org.springframework.data.mongodb.core.query.Criteria.where(ID_FIELD_NAME).in(idList));
		this.mongoTemplate.remove(query, this.collectionName);
	}

	@Override
	public List<Document> similaritySearch(String query) {
		return similaritySearch(SearchRequest.builder().query(query).build());
	}

	@Override
	public List<Document> doSimilaritySearch(SearchRequest request) {

		// Step 1: embed the query text
		float[] queryEmbedding = this.embeddingModel.embed(request.getQuery());
		logger.info("{} >> embedding dimensions: {}", request.getQuery(), queryEmbedding.length);

		// Step 2: load every document in the collection (full scan, no index - the performance bottleneck)
		List<org.bson.Document> listDocument = this.mongoTemplate.findAll(org.bson.Document.class, this.collectionName);

		// Step 3: compute the cosine similarity against each stored vector
		Map<org.bson.Document, Float> mapSimilarity = new HashMap<>();
		for (org.bson.Document doc : listDocument) {
			// Convert the List<Double> stored by MongoDB into a float[]
			List<Double> vectorList = doc.getList("embedding", Double.class);
			float[] vector = new float[vectorList.size()];
			for (int i = 0; i < vectorList.size(); i++) {
				vector[i] = vectorList.get(i).floatValue();
			}
			float cosine = calculateCosineSimilarity(queryEmbedding, vector);
			logger.info("{} >>> {}", doc.get("content"), cosine);
			doc.put(SCORE_FIELD_NAME, (double) cosine);
			mapSimilarity.put(doc, cosine);
		}

		// Step 4: sort by similarity (descending) and keep the best matches
		List<Map.Entry<org.bson.Document, Float>> entryList = new ArrayList<>(mapSimilarity.entrySet());
		entryList.sort((entry1, entry2) -> Float.compare(entry2.getValue(), entry1.getValue()));
		List<org.bson.Document> listSort = entryList.stream()
				.map(Map.Entry::getKey)
				.collect(Collectors.toList());

		// NOTE: the limit is hardcoded to 2; request.getTopK() is ignored here
		return listSort
				.stream()
				.limit(2)
				.map(d -> mapMongoDocument(d, queryEmbedding))
				.toList();
	}
	
	/**
	 * Computes the cosine similarity (the core scoring algorithm).
	 * Formula: cos θ = (A · B) / (||A|| * ||B||)
	 * The result lies in [-1, 1]; the closer to 1, the more similar the vectors.
	 */
	private static float calculateCosineSimilarity(float[] vectorA, float[] vectorB) {
		if (vectorA.length != vectorB.length) {
			throw new IllegalArgumentException("Vector dimensions do not match");
		}

		float dotProduct = 0.0f; // dot product A·B
		float normA = 0.0f;      // sum of squares of A (square root taken below)
		float normB = 0.0f;      // sum of squares of B

		for (int i = 0; i < vectorA.length; i++) {
			dotProduct += vectorA[i] * vectorB[i];
			normA += vectorA[i] * vectorA[i];
			normB += vectorB[i] * vectorB[i];
		}

		// Guard against division by zero
		if (normA == 0 || normB == 0) {
			return 0.0f;
		}

		return dotProduct / (float) (Math.sqrt(normA) * Math.sqrt(normB));
	}

	@Override
	public VectorStoreObservationContext.Builder createObservationContextBuilder(String operationName) {

		return VectorStoreObservationContext.builder(VectorStoreProvider.MONGODB.value(), operationName)
			.collectionName(this.collectionName)
			.dimensions(this.embeddingModel.dimensions())
			.fieldName(this.pathName);
	}

	@Override
	public <T> Optional<T> getNativeClient() {
		@SuppressWarnings("unchecked")
		T client = (T) this.mongoTemplate;
		return Optional.of(client);
	}

	/**
	 * Creates a new builder instance for MongoDBAtlasVectorStore.
	 * @return a new MongoDBBuilder instance
	 */
	public static Builder builder(MongoTemplate mongoTemplate, EmbeddingModel embeddingModel) {
		return new Builder(mongoTemplate, embeddingModel);
	}

	public static class Builder extends AbstractVectorStoreBuilder<Builder> {

		private final MongoTemplate mongoTemplate;

		private String collectionName = DEFAULT_VECTOR_COLLECTION_NAME;

		private String vectorIndexName = DEFAULT_VECTOR_INDEX_NAME;

		private String pathName = DEFAULT_PATH_NAME;

		private int numCandidates = DEFAULT_NUM_CANDIDATES;

		private List<String> metadataFieldsToFilter = Collections.emptyList();

		private boolean initializeSchema = false;

//		private MongoDBAtlasFilterExpressionConverter filterExpressionConverter = new MongoDBAtlasFilterExpressionConverter();

		/**
		 * @throws IllegalArgumentException if mongoTemplate is null
		 */
		private Builder(MongoTemplate mongoTemplate, EmbeddingModel embeddingModel) {
			super(embeddingModel);
			Assert.notNull(mongoTemplate, "MongoTemplate must not be null");
			this.mongoTemplate = mongoTemplate;
		}

		/**
		 * Configures the collection name. This must match the name of the collection for
		 * the Vector Search Index in Atlas.
		 * @param collectionName the name of the collection
		 * @return the builder instance
		 * @throws IllegalArgumentException if collectionName is null or empty
		 */
		public Builder collectionName(String collectionName) {
			Assert.hasText(collectionName, "Collection Name must not be null or empty");
			this.collectionName = collectionName;
			return this;
		}

		/**
		 * Configures the vector index name. This must match the name of the Vector Search
		 * Index Name in Atlas.
		 * @param vectorIndexName the name of the vector index
		 * @return the builder instance
		 * @throws IllegalArgumentException if vectorIndexName is null or empty
		 */
		public Builder vectorIndexName(String vectorIndexName) {
			Assert.hasText(vectorIndexName, "Vector Index Name must not be null or empty");
			this.vectorIndexName = vectorIndexName;
			return this;
		}

		/**
		 * Configures the path name. This must match the name of the field indexed for the
		 * Vector Search Index in Atlas.
		 * @param pathName the name of the path
		 * @return the builder instance
		 * @throws IllegalArgumentException if pathName is null or empty
		 */
		public Builder pathName(String pathName) {
			Assert.hasText(pathName, "Path Name must not be null or empty");
			this.pathName = pathName;
			return this;
		}

		/**
		 * Sets the number of candidates for vector search.
		 * @param numCandidates the number of candidates
		 * @return the builder instance
		 */
		public Builder numCandidates(int numCandidates) {
			this.numCandidates = numCandidates;
			return this;
		}

		/**
		 * Sets the metadata fields to filter in vector search.
		 * @param metadataFieldsToFilter list of metadata field names
		 * @return the builder instance
		 * @throws IllegalArgumentException if metadataFieldsToFilter is null or empty
		 */
		public Builder metadataFieldsToFilter(List<String> metadataFieldsToFilter) {
			Assert.notEmpty(metadataFieldsToFilter, "Fields list must not be empty");
			this.metadataFieldsToFilter = metadataFieldsToFilter;
			return this;
		}

		/**
		 * Sets whether to initialize the schema.
		 * @param initializeSchema true to initialize schema, false otherwise
		 * @return the builder instance
		 */
		public Builder initializeSchema(boolean initializeSchema) {
			this.initializeSchema = initializeSchema;
			return this;
		}


		/**
		 * Builds the MongoDBAtlasVectorStore instance.
		 * @return a new MongoDBAtlasVectorStore instance
		 * @throws IllegalStateException if the builder is in an invalid state
		 */
		@Override
		public MongoDBVectorStore build() {
			return new MongoDBVectorStore(this);
		}

	}

	/**
	 * The representation of {@link Document} along with its embedding.
	 *
	 * @param id The id of the document
	 * @param content The content of the document
	 * @param metadata The metadata of the document
	 * @param embedding The vectors representing the content of the document
	 */
	public record MongoDBDocument(String id, String content, Map<String, Object> metadata, float[] embedding) {
	}

}

1.3 VectorStoreConfig

Declares the MongoDB vector store bean, wiring together the MongoTemplate and the Embedding model.

package com.conca.ai.config;

import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.mongodb.core.MongoTemplate;

import com.conca.ai.mongodb.MongoDBVectorStore;

@Configuration
public class VectorStoreConfig {
	
	@Bean
    public MongoDBVectorStore mongoDbVectorStore(MongoTemplate mongoTemplate,
          EmbeddingModel embeddingModel) {
		return MongoDBVectorStore
				.builder(mongoTemplate, embeddingModel)
				.build();
    }
	
//	// Where to browse the Spring AI MongoDB driver artifacts:
//	// https://repo.spring.io/ui/native/milestone/org/springframework/ai/
//	@Bean
//    public MongoDBAtlasVectorStore mongoDbAtlasVectorStore(MongoTemplate mongoTemplate,
//          EmbeddingModel embeddingModel) {
//		return MongoDBAtlasVectorStore
//				.builder(mongoTemplate, embeddingModel)
//				.build();
//    }
	
    
    
}

1.4 OllamaLLMConfig

Configures the connection to the local Ollama small models (qwen3:0.6b / deepseek-r1:1.5b) and the ChatClient.

package com.conca.ai.config;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.MessageChatMemoryAdvisor;
import org.springframework.ai.chat.memory.MessageWindowChatMemory;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.ollama.OllamaChatModel;
import org.springframework.ai.ollama.api.OllamaApi;
import org.springframework.ai.ollama.api.OllamaOptions;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

 
@Configuration
public class OllamaLLMConfig
{
	
	// Read the Ollama base-url from application.properties
    @Value("${spring.ai.ollama.base-url}")
    private String ollamaBaseUrl;
    
    // NOTE: the bean name says "Deepseek" but the active model below is qwen3:0.6b;
    // swap the .model(...) lines to switch models
    @Bean(name="ollamaDeepseekChatModel")
    public OllamaChatModel ollamaQwenChatModel() {
        OllamaApi ollamaApi = OllamaApi.builder().baseUrl(ollamaBaseUrl).build();
        
        OllamaOptions options = OllamaOptions.builder()
        		.model("qwen3:0.6b") // first model
//                .model("deepseek-r1:1.5b") // second model
                .temperature(0.9) // per-model parameter tuning
                .build();
        
       return OllamaChatModel.builder()
    		   .ollamaApi(ollamaApi)
    		   .defaultOptions(options).build();
    }
    
    @Bean(name = "ollamaDeepseekMongoChatClient")
    public ChatClient ollamaQwenMongoChatClient(@Qualifier("ollamaDeepseekChatModel") ChatModel qwen)
    {
//    	MessageWindowChatMemory windowChatMemory = MessageWindowChatMemory.builder()
//                .chatMemoryRepository(chatMemoryRepository)
//                .maxMessages(10)
//            .build();

        return ChatClient.builder(qwen)
                .defaultOptions(OllamaOptions.builder()
                        .build())
//                .defaultAdvisors(MessageChatMemoryAdvisor.builder(windowChatMemory).build())
                .build();
    }
    
}

1.5 RagController

Exposes the HTTP endpoints for the three core functions: knowledge-base initialization, vector retrieval, and RAG Q&A.

package com.conca.ai;

import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.chat.prompt.PromptTemplate;
import org.springframework.ai.document.Document;
import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.ai.embedding.EmbeddingRequest;
import org.springframework.ai.embedding.EmbeddingResponse;
import org.springframework.ai.ollama.api.OllamaOptions;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

import jakarta.annotation.Resource;
import lombok.extern.slf4j.Slf4j;
 
@RestController
@Slf4j
public class RagController
{
    @Resource
    private EmbeddingModel embeddingModel;
 
    @Resource
    private VectorStore vectorStore;
    
    @Resource(name="ollamaDeepseekMongoChatClient")
    private ChatClient chatClient;
 
    /**
     * Step 1: simulate text chunking, embed the texts, and store them in the MongoDB vector store
     * http://localhost:8080/rag/init
     */
    @GetMapping("/rag/init")
    public String add()
    {
    	List<Document> documents = List.of(
    			new Document("A0200:用户登录验证失败二级宏观错误码"),
                new Document("A0201:用户名或密码错误"),
                new Document("A0202:用户token过期失效"),
                new Document("B2222:订单接口调用失败"),
                new Document("B2223:库存扣减接口超时"),
                new Document("B3333:物流查询接口返回异常"),
                new Document("C1111:Kafka消息发送失败"),
                new Document("C1112:Kafka消费组重平衡异常"),
                new Document("C3333:Redis缓存写入失败"),
                new Document("C3334:Redis连接池耗尽"),
                new Document("D0001:数据库错误一级宏观错误码"),
                new Document("D0100:数据库连接超时二级宏观错误码"),
                new Document("D0101:SQL执行报错"),
                new Document("D0102:事务提交失败"),
                new Document("E0001:第三方服务错误一级宏观错误码"),
                new Document("E0100:短信发送服务调用失败二级宏观错误码"),
                new Document("E0101:人脸识别接口返回异常"),
                new Document("F0001:文件处理错误一级宏观错误码"),
                new Document("F0100:文件上传超出大小限制二级宏观错误码"),
                new Document("F0101:文件格式解析失败"),
                new Document("00000:系统OK正确执行后的返回"),
                new Document("C2222:Kafka消息解压严重"),
                new Document("B1111:支付接口超时"),
                new Document("A0100:用户注册错误二级宏观错误码"),
                new Document("A0001:用户端错误一级宏观错误码")
        );
        vectorStore.add(documents);
        // Return the embedding model's description as a simple confirmation
        return embeddingModel.toString();
    }
 
 
    /**
     * Simulates a similarity search against the MongoDB vector store
     * http://localhost:8080/rag/get?msg=LLM
     * @param msg the query text
     * @return the most similar documents
     */
    @GetMapping("/rag/get")
    public List<Document> getAll(@RequestParam(name = "msg") String msg)
    {
        SearchRequest searchRequest = SearchRequest.builder()
                .query(msg)
                .topK(2)
                .build();

        List<Document> list = vectorStore.similaritySearch(searchRequest);

        log.info("similaritySearch result: {}", list);

        return list;
    }
    
    /**
     * RAG Q&A backed by the MongoDB vector store
     * http://localhost:8080/rag/query?msg=A0100
     * @param msg the user question (a fault code)
     * @return the generated answer
     */
    @GetMapping("/rag/query")
    public String query(@RequestParam(name = "msg") String msg)
    {
 
        // 1. Retrieve similar documents (top 3 requested from MongoDB)
        SearchRequest searchRequest = SearchRequest.builder()
                .query(msg)
                .topK(3)
                .build();
 
        List<Document> similarDocuments = vectorStore.similaritySearch(searchRequest);

        // 2. Concatenate the retrieved texts into the context
        String context = similarDocuments.stream()
                .map(Document::getText)
                .reduce("", (a, b) -> a + "\n" + b);

//        // 3. First cut of the prompt, kept for reference
//        String promptTemplate = """
//                你是一个运维工程师,按照给出的编码给出对应故障解释,否则请明确说明“无法从提供的文档中找到相关答案”。
//                故障编码如下:
//                {context}
//                用户问题:
//                {question}
//                """;
        
        // 3. Build the prompt (context + question handed to the LLM)
        StringBuilder promptTemplate = new StringBuilder();
        promptTemplate.append("你是运维工程师,**只按以下故障编码表解释**,不做任何额外推理、补充或关联。");
        promptTemplate.append("若编码不在表中,**只输出:无法从提供的文档中找到相关答案**。");
        promptTemplate.append("输出规则:直接输出“编码 解释”,不添加任何额外文字。");
        promptTemplate.append("故障编码表(编码:解释):");
        promptTemplate.append("{context}。 ");
        promptTemplate.append("用户问题:");
        promptTemplate.append("{question}");
        
        PromptTemplate template = new PromptTemplate(promptTemplate.toString());
        Prompt prompt = template.create(Map.of("context", context, 
        		"question", msg));
        
        log.info("prompt sent to the model: {}", prompt);

        // 4. Call the Ollama model to generate the answer
        return chatClient.prompt(prompt).call().content();
 
    }
    
    
}
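A small aside on step 2 of /rag/query: reducing with an empty-string seed prepends a stray newline to the context. `String.join` produces a clean concatenation; a standalone sketch (the document texts are taken from the demo knowledge base):

```java
import java.util.List;

public class ContextJoinDemo {

    // Joins retrieved document texts with newlines, without the leading
    // newline that reduce("", (a, b) -> a + "\n" + b) produces
    static String joinContext(List<String> texts) {
        return String.join("\n", texts);
    }

    public static void main(String[] args) {
        List<String> docs = List.of("A0100:用户注册错误二级宏观错误码", "A0201:用户名或密码错误");
        System.out.println(joinContext(docs));
    }
}
```

In the controller this would replace the `reduce` call with `String.join("\n", similarDocuments.stream().map(Document::getText).toList())`.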

1.6 RagApplication (Application Entry Point)

Scans the packages and initializes the Spring context.

package com.conca.ai;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.ComponentScan;

@SpringBootApplication
@ComponentScan(basePackages = {"com.conca.ai"}) // scan com.conca.ai and its sub-packages (redundant: @SpringBootApplication already scans its own package)
public class RagApplication {
    public static void main(String[] args) {
        SpringApplication.run(RagApplication.class, args);
        
        
        // 1. Print the project's base package (the package of this class)
        String packagePath = RagApplication.class.getPackage().getName();
        // 2. Project name (adjust to taste)
        String projectName = "spring AI Stream Output Project";
        
        // Print a formatted startup banner
        System.out.println("====================================");
        System.out.println("         Application started 🚀");
        System.out.println("====================================");
        System.out.println("Project name: " + projectName);
        System.out.println("Base package: " + packagePath);
        System.out.println("Welcome to AI");
        System.out.println("====================================");
    }
}

2. The Core Logic of Vector Comparison in Java (Key Section)

The doSimilaritySearch method of MongoDBVectorStore is the heart of vector comparison: cosine similarity is computed entirely by hand in Java rather than delegated to MongoDB Atlas's native vector search. The core steps:

@Override
public List<Document> doSimilaritySearch(SearchRequest request) {
    // 1. Embed the user's query text into a float[] vector
    float[] queryEmbedding = this.embeddingModel.embed(request.getQuery());
    
    // 2. Load ALL stored vector documents from MongoDB (no index: the performance bottleneck)
    List<org.bson.Document> listDocument = this.mongoTemplate.findAll(org.bson.Document.class, this.collectionName);
    
    // 3. Walk every document and compute the cosine similarity one by one
    Map<org.bson.Document, Float> mapSimilarity = new HashMap<>();
    for (org.bson.Document doc : listDocument) {
        // 3.1 Convert the List<Double> stored in MongoDB into a float[] for the cosine computation
        List<Double> vectorList = doc.getList("embedding", Double.class);
        float[] vector = new float[vectorList.size()];
        for (int i = 0; i < vectorList.size(); i++) {
            vector[i] = vectorList.get(i).floatValue();
        }
        // 3.2 Core step: compute the cosine similarity
        float cosine = calculateCosineSimilarity(queryEmbedding, vector);
        doc.put(SCORE_FIELD_NAME, (double) cosine);
        mapSimilarity.put(doc, cosine);
    }
    
    // 4. Sort by similarity (descending) and return the top 2 (request.getTopK() is ignored)
    List<Map.Entry<org.bson.Document, Float>> entryList = new ArrayList<>(mapSimilarity.entrySet());
    entryList.sort((entry1, entry2) -> Float.compare(entry2.getValue(), entry1.getValue()));
    List<org.bson.Document> listSort = entryList.stream().map(Map.Entry::getKey).collect(Collectors.toList());
    
    return listSort.stream().limit(2).map(d -> mapMongoDocument(d, queryEmbedding)).toList();
}

// Cosine similarity, the core scoring algorithm: cos θ = (A · B) / (||A|| * ||B||)
private static float calculateCosineSimilarity(float[] vectorA, float[] vectorB) {
    if (vectorA.length != vectorB.length) {
        throw new IllegalArgumentException("Vector dimensions do not match");
    }
    float dotProduct = 0.0f; // dot product
    float normA = 0.0f;      // sum of squares of A
    float normB = 0.0f;      // sum of squares of B
    for (int i = 0; i < vectorA.length; i++) {
        dotProduct += vectorA[i] * vectorB[i];
        normA += vectorA[i] * vectorA[i];
        normB += vectorB[i] * vectorB[i];
    }
    if (normA == 0 || normB == 0) {
        return 0.0f;
    }
    return dotProduct / (float) (Math.sqrt(normA) * Math.sqrt(normB));
}
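For contrast with the hand-rolled full scan, MongoDB Atlas's native `$vectorSearch` aggregation stage performs the same ranking against the index that `createSearchIndexDefinition()` sets up. A hedged sketch (assumes an Atlas cluster with the vector index in place; `queryVector` would be the embedded query as a `List<Double>`; this does not run against a plain MongoDB server):

```java
// Index-backed alternative to the full scan (Atlas-only sketch)
List<org.bson.Document> pipeline = List.of(
        new org.bson.Document("$vectorSearch",
                new org.bson.Document("index", "vector_index")
                        .append("path", "embedding")
                        .append("queryVector", queryVector) // query embedding as List<Double>
                        .append("numCandidates", 200)
                        .append("limit", 2)),
        new org.bson.Document("$project",
                new org.bson.Document("content", 1)
                        .append("metadata", 1)
                        .append("score", new org.bson.Document("$meta", "vectorSearchScore"))));

List<org.bson.Document> results = mongoTemplate.getCollection("vector_store")
        .aggregate(pipeline)
        .into(new java.util.ArrayList<>());
```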

2.1 Core Steps

  1. Embed the user's query text into a float[] vector

  2. Load every stored vector document from MongoDB (full scan, no index: the performance bottleneck)

  3. Walk the documents and compute the cosine similarity one by one

        3.1 Read the stored List<Double> vector and convert it to float[] for the cosine computation

        3.2 Core step: compute the cosine similarity

  4. Sort by similarity in descending order and return the top 2 results
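The cosine computation can be sanity-checked standalone on toy 2-D vectors (illustrative values only; the real embeddings have 1024 dimensions, as the log below shows):

```java
public class CosineDemo {

    // Same algorithm as calculateCosineSimilarity above
    static float cosine(float[] a, float[] b) {
        float dot = 0f, na = 0f, nb = 0f;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0f : dot / (float) (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        System.out.println(cosine(new float[]{1, 0}, new float[]{1, 0})); // identical direction -> 1.0
        System.out.println(cosine(new float[]{1, 0}, new float[]{0, 1})); // orthogonal -> 0.0
        System.out.println(cosine(new float[]{1, 0}, new float[]{1, 1})); // 45 degrees -> ~0.7071
    }
}
```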

2.2 Vector-Comparison Log

Querying http://localhost:8080/rag/query?msg=A0100 makes the logic above print the following (the query embedding has 1024 dimensions; each subsequent line shows a document and its cosine score):

A0100>>1024

A0200:用户登录验证失败二级宏观错误码>>>0.549359

A0201:用户名或密码错误>>>0.58822674

A0202:用户token过期失效>>>0.52870256

B2222:订单接口调用失败>>>0.39715943

B2223:库存扣减接口超时>>>0.3791776

B3333:物流查询接口返回异常>>>0.35427678

C1111:Kafka消息发送失败>>>0.3918751

C1112:Kafka消费组重平衡异常>>>0.35871327

C3333:Redis缓存写入失败>>>0.3698006

C3334:Redis连接池耗尽>>>0.41192493

D0001:数据库错误一级宏观错误码>>>0.45165136

D0100:数据库连接超时二级宏观错误码>>>0.4534981

D0101:SQL执行报错>>>0.5083353

D0102:事务提交失败>>>0.5334933

E0001:第三方服务错误一级宏观错误码>>>0.46328068

E0100:短信发送服务调用失败二级宏观错误码>>>0.50492793

E0101:人脸识别接口返回异常>>>0.43495056

F0001:文件处理错误一级宏观错误码>>>0.46615037

F0100:文件上传超出大小限制二级宏观错误码>>>0.44284472

F0101:文件格式解析失败>>>0.51341987

00000:系统OK正确执行后的返回>>>0.4970258

C2222:Kafka消息解压严重>>>0.32977524

B1111:支付接口超时>>>0.40897542

A0100:用户注册错误二级宏观错误码>>>0.6324899

A0001:用户端错误一级宏观错误码>>>0.565165

3. End-to-End Flow

  1. Initialization: call /rag/init to embed the 25 fault-code texts and store them in MongoDB's vector_store collection;
  2. Retrieval: call /rag/get?msg=XXX to embed the input, score it against every stored vector, and return the top-2 similar documents;
  3. Q&A: call /rag/query?msg=XXX to retrieve similar documents (top 3 requested, though the store's hardcoded limit returns 2), concatenate them into the context, and have the local small model generate the answer.


II. Testing the Code Example

1. Start the application and initialize the knowledge base

http://localhost:8080/rag/init

MongoDB then stores the following content:


2. RAG with the small model qwen3:0.6b

Start qwen3:0.6b and visit http://localhost:8080/rag/query?msg=A0100


The exact content sent to the model is printed as:

Prompt{messages=[UserMessage{content='你是运维工程师,只按以下故障编码表解释,不做任何额外推理、补充或关联。若编码不在表中,只输出:无法从提供的文档中找到相关答案。输出规则:直接输出“编码 解释”,不添加任何额外文字。故障编码表(编码:解释):

A0100:用户注册错误二级宏观错误码

A0201:用户名或密码错误。 用户问题:A0100', properties={messageType=USER}, messageType=USER}], modelOptions=null}

3. RAG with the small model deepseek-r1:1.5b

Start deepseek-r1:1.5b and visit http://localhost:8080/rag/query?msg=A0100


The exact content sent to the model is printed as:

Prompt{messages=[UserMessage{content='你是运维工程师,只按以下故障编码表解释,不做任何额外推理、补充或关联。若编码不在表中,只输出:无法从提供的文档中找到相关答案。输出规则:直接输出“编码 解释”,不添加任何额外文字。故障编码表(编码:解释):

A0100:用户注册错误二级宏观错误码

A0201:用户名或密码错误。 用户问题:A0100', properties={messageType=USER}, messageType=USER}], modelOptions=null}

III. Core Weaknesses of Local Small Models (Observed in This Case)

The qwen3:0.6b and deepseek-r1:1.5b models used here are lightweight local models. They do make private deployment possible, but their weaknesses are glaring and directly undermine the system's usefulness.

1. Very poor control over output format (the biggest pain point)

The prompt strictly demands "output only the code and its explanation, nothing else", yet the tests show:

  • qwen3:0.6b: for input "A0100", it output the literal placeholder text 编码 解释 instead of filling it in;
  • deepseek-r1:1.5b: for input "A0100", it output A0100 解释:用户注册错误二级宏观错误码, adding "解释:" despite the rule.

Root cause: with so few parameters, small models only weakly grasp the semantic constraints in a prompt and cannot strictly follow output rules, so the answer format is inconsistent and hard to consume in automated pipelines.
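One mitigation for the formatting problem (expanded on in the suggestions at the end) is few-shot prompting: show the model worked input/output pairs in the exact expected shape. A sketch of how the prompt built in RagController could be extended; the first example pair reuses an entry from the demo knowledge base, while X9999 is a made-up unknown code:

```java
public class FewShotPromptDemo {

    // Hypothetical few-shot block prepended to the fault-code prompt;
    // the two worked examples show the model the exact expected output shape
    static String buildPrompt(String context, String question) {
        return """
            你是运维工程师,只按以下故障编码表解释,不做任何额外推理。
            若编码不在表中,只输出:无法从提供的文档中找到相关答案。
            示例1 - 用户问题:B1111 输出:B1111:支付接口超时
            示例2 - 用户问题:X9999 输出:无法从提供的文档中找到相关答案
            故障编码表(编码:解释):%s
            用户问题:%s
            """.formatted(context, question);
    }

    public static void main(String[] args) {
        System.out.println(buildPrompt("A0100:用户注册错误二级宏观错误码", "A0100"));
    }
}
```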

2. Unstable generation: repeated queries return different answers

Calling the same query "A0100" several times:

  • qwen3:0.6b: input "A0100", output: 无法从提供的文档中找到相关答案
  • deepseek-r1:1.5b: input "A0100", output: A0100 解释:用户注册错误二级宏观错误码

Root cause (comprehension): the context window is small (only 2k tokens for qwen3:0.6b) and semantic understanding is weak, so the model struggles to relate the concatenated documents to the question.

Root cause (randomness): small models are trained on less data with simpler architectures, so sampling randomness is hard to rein in (even temperature 0.1 does not fully eliminate it).

3. Performance and quality are hard to balance

  • qwen3:0.6b: fast responses (1-2 s) but a high error rate;
  • deepseek-r1:1.5b: somewhat more accurate, but slower (3-5 s) and memory-hungry on an ordinary PC (>4 GB).

IV. Suggested Improvements

  1. Vector storage: use MongoDB Atlas's native vector search (create a vector index) instead of computing similarity by hand in Java, to improve retrieval performance;

  2. Model level:

    • Upgrade the model: a 7B-class model (such as qwen3:7b) balances performance and quality;
    • Prompt engineering: add few-shot examples to improve format adherence;
  3. Pipeline: add a validation layer that checks the small model's output format and regenerates when the rules are broken.
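The validation layer in point 3 can be sketched in plain Java: accept an answer only if it is either a single code line in the expected shape or the exact fallback sentence, and regenerate otherwise. The regex encodes an assumed output contract, not anything from the original code:

```java
import java.util.regex.Pattern;

public class AnswerValidator {

    // Assumed output contract: "<5-char code>:<explanation>" on one line,
    // or the exact fallback sentence demanded by the prompt
    private static final Pattern CODE_LINE = Pattern.compile("^[A-F0-9]\\d{4}[::].+");
    private static final String FALLBACK = "无法从提供的文档中找到相关答案";

    static boolean isWellFormed(String answer) {
        String trimmed = answer.strip();
        return FALLBACK.equals(trimmed) || CODE_LINE.matcher(trimmed).matches();
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("A0100:用户注册错误二级宏观错误码")); // a valid code line
        System.out.println(isWellFormed("编码 解释"));                        // the echoed placeholder, rejected
    }
}
```

In RagController, the /rag/query flow could then loop: call the model, check isWellFormed, and retry a bounded number of times before returning the fallback sentence.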

This case confirms that a private RAG system built from Spring AI, a local small model, and MongoDB is feasible. But the capability gaps of the small models, combined with the performance cost of computing vector similarity by hand in Java, limit this approach to small-data, simple internal test scenarios; it does not meet the demands of a production environment.