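1. Environment Setup
1.1 Start the Milvus Service
- Standalone mode (quick start with Docker):
```bash
# Check the image tag and startup command against the official Milvus
# installation guide for the version you deploy.
docker run -d --name milvus-standalone \
  -p 19530:19530 \
  -p 9091:9091 \
  milvusdb/milvus:v2.3.4-standalone
```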
1.2 Add the Java SDK Dependency
Add the Milvus Java SDK to pom.xml:
```xml
<dependency>
    <groupId>io.milvus</groupId>
    <artifactId>milvus-sdk-java</artifactId>
    <version>2.3.4</version>
</dependency>
```
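2. Configure the Milvus Connection
2.1 Create the Configuration Class
```java
import io.milvus.client.MilvusClient;
import io.milvus.client.MilvusServiceClient;
import io.milvus.param.ConnectParam;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class MilvusConfig {

    @Value("${milvus.host:localhost}")
    private String host;

    @Value("${milvus.port:19530}")
    private int port;

    // MilvusClient is an interface; MilvusServiceClient is the SDK implementation.
    @Bean
    public MilvusClient milvusClient() {
        return new MilvusServiceClient(
                ConnectParam.newBuilder().withHost(host).withPort(port).build());
    }
}
```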
2.2 Configuration File
application.properties:
```properties
milvus.host=localhost
milvus.port=19530
```
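3. Data Model and Collection Management
3.1 Define the Collection Schema
```java
import io.milvus.client.MilvusClient;
import io.milvus.grpc.DataType;
import io.milvus.param.R;
import io.milvus.param.RpcStatus;
import io.milvus.param.collection.CreateCollectionParam;
import io.milvus.param.collection.FieldType;

public class MilvusCollection {

    public static final String COLLECTION_NAME = "doc_vectors";
    public static final int VECTOR_DIM = 1536; // adjust to the actual embedding dimension

    // Field names
    public static final String ID_FIELD = "id";
    public static final String CONTENT_FIELD = "content";
    public static final String VECTOR_FIELD = "vector";

    // Create the collection
    public static void createCollection(MilvusClient client) {
        // Auto-generated Int64 primary key
        FieldType idField = FieldType.newBuilder()
                .withName(ID_FIELD)
                .withDataType(DataType.Int64)
                .withPrimaryKey(true)
                .withAutoID(true)
                .build();

        // Original document text
        FieldType contentField = FieldType.newBuilder()
                .withName(CONTENT_FIELD)
                .withDataType(DataType.VarChar)
                .withMaxLength(1000)
                .build();

        // Embedding vector
        FieldType vectorField = FieldType.newBuilder()
                .withName(VECTOR_FIELD)
                .withDataType(DataType.FloatVector)
                .withDimension(VECTOR_DIM)
                .build();

        CreateCollectionParam createParam = CreateCollectionParam.newBuilder()
                .withCollectionName(COLLECTION_NAME)
                .addFieldType(idField)
                .addFieldType(contentField)
                .addFieldType(vectorField)
                .build();

        // The SDK reports failures through the returned R object rather than by throwing
        R<RpcStatus> response = client.createCollection(createParam);
        if (response.getStatus() != R.Status.Success.getCode()) {
            throw new IllegalStateException("Create collection failed: " + response.getMessage());
        }
    }
}
```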
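Rows that do not fit the schema are typically rejected at insert time, so it can help to check a record before sending it. The following is a minimal sketch in plain Java. `RecordValidator` is a hypothetical helper, not part of the Milvus SDK; its constants mirror `VECTOR_DIM` and the `withMaxLength(1000)` setting from the schema, and it assumes the VarChar limit is counted in UTF-8 bytes, which is worth confirming for your Milvus version.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical pre-insert checks; not part of the Milvus SDK. */
final class RecordValidator {

    static final int VECTOR_DIM = 1536;         // mirrors MilvusCollection.VECTOR_DIM
    static final int CONTENT_MAX_LENGTH = 1000; // mirrors withMaxLength(1000)

    private RecordValidator() {}

    /** Throws IllegalArgumentException if the record would not fit the schema. */
    static void validate(String content, float[] vector) {
        if (content == null || vector == null) {
            throw new IllegalArgumentException("content and vector must not be null");
        }
        // Assumption: the VarChar limit is measured in UTF-8 bytes, not characters.
        int byteLength = content.getBytes(StandardCharsets.UTF_8).length;
        if (byteLength > CONTENT_MAX_LENGTH) {
            throw new IllegalArgumentException(
                    "content is " + byteLength + " bytes, limit is " + CONTENT_MAX_LENGTH);
        }
        if (vector.length != VECTOR_DIM) {
            throw new IllegalArgumentException(
                    "vector has " + vector.length + " dimensions, expected " + VECTOR_DIM);
        }
    }
}
```

Calling `RecordValidator.validate(content, vector)` at the top of the storage method would turn a schema mismatch into an early, readable error. Long documents would still need to be split or truncated before they fit the 1000-byte field.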
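4. Implement the Data Storage Service
4.1 Data Insertion Service
```java
import io.milvus.client.MilvusClient;
import io.milvus.grpc.MutationResult;
import io.milvus.param.R;
import io.milvus.param.collection.FlushParam;
import io.milvus.param.dml.InsertParam;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

@Service
public class MilvusStorageService {

    private final MilvusClient milvusClient;

    @Autowired
    public MilvusStorageService(MilvusClient milvusClient) {
        this.milvusClient = milvusClient;
    }

    public void storeDocument(String content, float[] vector) {
        // FloatVector fields expect each vector as a List<Float>, not a float[]
        List<Float> vectorValues = new ArrayList<>(vector.length);
        for (float v : vector) {
            vectorValues.add(v);
        }

        // Build the insert data: one list of row values per field
        List<InsertParam.Field> fields = new ArrayList<>();
        fields.add(new InsertParam.Field(MilvusCollection.CONTENT_FIELD, Collections.singletonList(content)));
        fields.add(new InsertParam.Field(MilvusCollection.VECTOR_FIELD, Collections.singletonList(vectorValues)));

        InsertParam insertParam = InsertParam.newBuilder()
                .withCollectionName(MilvusCollection.COLLECTION_NAME)
                .withFields(fields)
                .build();

        // Execute the insert and surface failures, which the SDK reports via R
        R<MutationResult> response = milvusClient.insert(insertParam);
        if (response.getStatus() != R.Status.Success.getCode()) {
            throw new IllegalStateException("Insert failed: " + response.getMessage());
        }

        // Flush to seal and persist the inserted data; flushing on every
        // insert is costly, so use it sparingly in production
        milvusClient.flush(FlushParam.newBuilder()
                .addCollectionName(MilvusCollection.COLLECTION_NAME)
                .build());
    }
}
```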
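Embedding services usually return a `float[]`, while the Java SDK's FloatVector fields take each vector as a `List<Float>` (and a batch as a `List<List<Float>>`). A small reusable converter keeps that boxing in one place. This is a sketch only; `VectorConverter` is a hypothetical helper name, not an SDK class.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for turning primitive arrays into SDK-friendly lists. */
final class VectorConverter {

    private VectorConverter() {}

    /** Boxes a single float[] into a List<Float>, preserving order. */
    static List<Float> toFloatList(float[] vector) {
        List<Float> result = new ArrayList<>(vector.length);
        for (float value : vector) {
            result.add(value);
        }
        return result;
    }

    /** Converts a batch of float[] vectors, one inner list per input row. */
    static List<List<Float>> toFloatLists(List<float[]> vectors) {
        List<List<Float>> result = new ArrayList<>(vectors.size());
        for (float[] vector : vectors) {
            result.add(toFloatList(vector));
        }
        return result;
    }
}
```

With this in place, a vector field could be built as `new InsertParam.Field(MilvusCollection.VECTOR_FIELD, VectorConverter.toFloatLists(vectors))`.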
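4.2 End-to-End Call Flow
```java
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class DocumentController {

    @Autowired
    private OpenAIVectorizationService vectorizationService;

    @Autowired
    private MilvusStorageService storageService;

    @PostMapping("/upload")
    public ResponseEntity<String> uploadDocument(@RequestBody String content) {
        // Vectorize the document
        float[] vector = vectorizationService.vectorize(content);
        // Store it in Milvus
        storageService.storeDocument(content, vector);
        return ResponseEntity.ok("Document stored successfully");
    }
}
```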
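5. Collection Management Operations
5.1 Initialize the Collection (at Application Startup)
```java
import io.milvus.client.MilvusClient;
import io.milvus.param.R;
import io.milvus.param.collection.HasCollectionParam;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class Application implements CommandLineRunner {

    @Autowired
    private MilvusClient milvusClient;

    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }

    @Override
    public void run(String... args) {
        // hasCollection takes a param object and returns R<Boolean>, not a plain boolean
        R<Boolean> exists = milvusClient.hasCollection(HasCollectionParam.newBuilder()
                .withCollectionName(MilvusCollection.COLLECTION_NAME)
                .build());
        if (!Boolean.TRUE.equals(exists.getData())) {
            MilvusCollection.createCollection(milvusClient);
        }
    }
}
```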
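5.2 Create an Index (Optional)
```java
// Requires io.milvus.param.IndexType, io.milvus.param.MetricType
// and io.milvus.param.index.CreateIndexParam
public void createIndex() {
    IndexType indexType = IndexType.IVF_FLAT;
    // nlist is the number of cluster buckets; quotes inside the JSON string must be escaped
    String indexParam = "{\"nlist\":1024}";
    CreateIndexParam createIndexParam = CreateIndexParam.newBuilder()
            .withCollectionName(MilvusCollection.COLLECTION_NAME)
            .withFieldName(MilvusCollection.VECTOR_FIELD)
            .withIndexType(indexType)
            .withMetricType(MetricType.L2)
            .withExtraParam(indexParam)
            .build();
    milvusClient.createIndex(createIndexParam);
}
```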
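As a point of reference for what the index speeds up, the sketch below runs an exhaustive nearest-neighbor search under the L2 metric in plain Java. It is an illustration only and not how Milvus is implemented: an IVF_FLAT index groups vectors into `nlist` clusters and scans only a few of them per query instead of the whole corpus. `BruteForceL2` and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical reference implementation of exhaustive L2 search. */
final class BruteForceL2 {

    private BruteForceL2() {}

    /** Squared Euclidean distance between two equal-length vectors. */
    static double squaredL2(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Returns the indices of the k corpus vectors closest to the query, nearest first. */
    static List<Integer> topK(List<float[]> corpus, float[] query, int k) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < corpus.size(); i++) {
            indices.add(i);
        }
        // Sorts every candidate by distance: O(n log n) per query, which is the cost an index avoids
        indices.sort(Comparator.comparingDouble(i -> squaredL2(corpus.get(i), query)));
        int limit = Math.max(0, Math.min(k, indices.size()));
        return new ArrayList<>(indices.subList(0, limit));
    }
}
```

Comparing the results of this exhaustive search with an indexed search on a small sample is one way to sanity-check recall after changing `nlist`.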
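6. Key Considerations
- Batch optimization: when inserting in bulk, pass all rows of a field in a single list (for vectors, a List<List<Float>>) so that one insert call carries many rows.
```java
// Batch insert example
List<String> contents = ...;      // contents of multiple documents
List<List<Float>> vectors = ...;  // corresponding vectors, in the same order
fields.add(new InsertParam.Field(CONTENT_FIELD, contents));
fields.add(new InsertParam.Field(VECTOR_FIELD, vectors));
```
- Connection management:
  - Configure connection pooling; for production, prefer a Milvus cluster deployment (coordinated through etcd)
  - Handle connection timeouts and add a retry mechanism
- Data consistency:
  - Call flush() after inserting important data
  - Use Milvus's atomic operations to keep the data complete
- Performance tuning:
  - Choose an index type that suits the data scale (IVF_FLAT, HNSW, etc.)
  - Tune index parameters such as nlist and M to optimize query performance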
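For the batch-insert note above, very large document sets are usually sent in several bounded batches rather than one huge request. The sketch below shows one way to split a list into fixed-size chunks in plain Java. `Batches.partition` is a hypothetical helper, and a suitable batch size is an assumption to tune against your own payload and server limits.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for splitting rows into insert-sized chunks. */
final class Batches {

    private Batches() {}

    /** Splits items into consecutive batches of at most batchSize elements, preserving order. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

Partitioning the content list and the vector list with the same batch size keeps the rows aligned, so batch i of the contents can be inserted together with batch i of the vectors.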
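7. Overall Architecture Diagram
```
Spring Boot application
        │
        │ vectorization request
        ▼
[OpenAI/Cohere API]
        │
        │ returned vector
        ▼
[Business logic layer]
        │
        │ structured data
        ▼
[Milvus vector database]
        │
        ▼
(later supports RAG retrieval)
```
With the steps above, a Spring Boot application can vectorize documents and store them in the Milvus vector database end to end.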