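Background: I've recently been working on a RAG application. I had no prior context in this area, so I leaned on AI to quickly validate a small scenario and get it working.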
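First, I gave the AI my requirements: start from the smallest possible scenario, vectorize and store the data locally, make it possible to interact through small test cases, and build it with langchain.js.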
From the AI I first learned that a RAG application starts by splitting the text into chunks, and then turns those chunks into vectors with embeddings.
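// Assumes the @langchain/openai package.
import { OpenAIEmbeddings } from "@langchain/openai";

// Lazily created singleton, so the client is built only once.
let embeddings: OpenAIEmbeddings | null = null;

const getEmbeddings = (): OpenAIEmbeddings => {
  if (!embeddings) {
    // The fallback key is truncated in the original post.
    const apiKey = localStorage.getItem('OPENAI_API_KEY') || 'sk-';
    const baseUrl = localStorage.getItem('OPENAI_BASE_URL') || 'https://api.302.ai/v1';
    if (!apiKey) {
      throw new Error("OpenAI API Key is missing. Please set it in Settings.");
    }
    embeddings = new OpenAIEmbeddings({
      openAIApiKey: apiKey,
      configuration: {
        baseURL: baseUrl,
      },
    });
  }
  console.log('embeddings', embeddings);
  return embeddings;
};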
Next, I wanted to verify vectorizing a document and storing it.
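// Shape of one stored chunk, inferred from how docs is used below.
interface StoredDocument {
  pageContent: string;
  metadata: Record<string, any>;
  vector: number[];
}

// In-memory vector store for this prototype.
const docs: StoredDocument[] = [];

export const addText = async (text: string, metadata: Record<string, any> = {}): Promise<void> => {
  const embedder = getEmbeddings();
  // splitText is the chunking helper; its implementation is not included in this post.
  const chunks = splitText(text);
  console.log('chunks', chunks);
  // Embed all chunks in one batch call
  const vectors = await embedder.embedDocuments(chunks);
  console.log('vectors', vectors);
  chunks.forEach((chunk, i) => {
    docs.push({
      pageContent: chunk,
      metadata,
      vector: vectors[i],
    });
  });
  console.log('docs', docs);
  console.log(`[RAG] Added ${chunks.length} vectors to local store.`);
};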
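addText calls a splitText helper that the post does not show. Here is a minimal sketch of what it might look like, assuming fixed-size character windows with a small overlap. The function body, the chunkSize and overlap parameters, and their default values are my assumptions for illustration, not the original implementation.

```typescript
// Hypothetical helper (not from the original post): split text into
// fixed-size character chunks, where each chunk shares `overlap`
// characters with the previous one so text near a boundary lands in both.
function splitText(text: string, chunkSize = 500, overlap = 50): string[] {
  if (chunkSize <= 0) {
    throw new Error("chunkSize must be positive");
  }
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must satisfy 0 <= overlap < chunkSize");
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap; // how far the window advances each time
  for (let start = 0; start < text.length; start += step) {
    const chunk = text.slice(start, start + chunkSize).trim();
    if (chunk.length > 0) {
      chunks.push(chunk);
    }
    if (start + chunkSize >= text.length) {
      break; // this window already reached the end of the text
    }
  }
  return chunks;
}
```

Character windows are the crudest option, but they are enough for a first check. LangChain.js also ships a RecursiveCharacterTextSplitter, which is probably the better choice once the prototype grows.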
Following the same line of thought, I then wanted to run a search and match by similarity.
export const search = async (query: string, k: number = 4): Promise<StoredDocument[]> => {
  console.log(`[RAG] Searching for: "${query}"`);
  const embedder = getEmbeddings();

  // 1. Vectorize the query
  console.log("[RAG] Vectorizing query...");
  const queryVector = await embedder.embedQuery(query);
  console.log(`[RAG] Query vectorized. Dimension: ${queryVector.length}`);
  console.log('docs', docs);

  // 2. Score every stored chunk against the query
  //    (cosineSimilarity is a helper whose implementation is not included in this post)
  console.log(`[RAG] Calculating similarity against ${docs.length} stored chunks...`);
  const scoredDocs = docs.map(doc => ({
    doc,
    score: cosineSimilarity(queryVector, doc.vector),
  }));

  // 3. Sort by score, highest first
  scoredDocs.sort((a, b) => b.score - a.score);

  // 4. Log the top results
  console.log("[RAG] Top Results:");
  scoredDocs.slice(0, k).forEach((d, i) => {
    console.log(` ${i + 1}. Score: ${d.score.toFixed(4)} | Content: "${d.doc.pageContent.slice(0, 50)}..."`);
  });
  console.log('scoredDocs', scoredDocs);

  // Return only the k best-matching documents
  return scoredDocs.slice(0, k).map(d => d.doc);
};
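search depends on a cosineSimilarity helper that is also not shown in the post. A minimal sketch under the usual definition (dot product divided by the product of the two vector lengths) could look like this. Returning 0 for a zero-length vector is my own choice here, not something the original specifies.

```typescript
// Hypothetical helper (not from the original post): cosine similarity
// between two equal-length vectors, in the range [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  // Assumption: treat an all-zero vector as "no similarity" instead of NaN.
  return denom === 0 ? 0 : dot / denom;
}
```

A score near 1 means the two vectors point in almost the same direction, which is why sorting in descending order puts the closest chunks first.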
In the end, the minimal scenario verified the whole loop: vectorization, storage, and similarity search all run end to end.
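The same loop can also be checked offline, without an API key, by swapping in a toy embedder. The sketch below is only an illustration: fakeEmbed, VOCAB, ToyDoc, and toySearch are hypothetical names, and the word-count "embedding" is a stand-in I made up, not a real model. Because the toy vectors are scaled to unit length, a plain dot product gives the same ranking as cosine similarity.

```typescript
// Toy stand-in for an embedding model (an assumption, not a real model):
// count how often each vocabulary word occurs, then scale to unit length.
const VOCAB = ["cat", "dog", "pet", "stock", "market", "price"];

function fakeEmbed(text: string): number[] {
  const words = text.toLowerCase().split(/\W+/);
  const counts = VOCAB.map(v => words.filter(w => w === v).length);
  const norm = Math.sqrt(counts.reduce((sum, c) => sum + c * c, 0));
  return norm === 0 ? counts : counts.map(c => c / norm);
}

interface ToyDoc {
  pageContent: string;
  vector: number[];
}

// For unit-length vectors, the dot product equals the cosine similarity.
const dot = (a: number[], b: number[]): number =>
  a.reduce((sum, x, i) => sum + x * b[i], 0);

// Embed the query, score every stored doc, return the k best matches.
function toySearch(store: ToyDoc[], query: string, k = 1): ToyDoc[] {
  const q = fakeEmbed(query);
  return store
    .map(doc => ({ doc, score: dot(q, doc.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map(d => d.doc);
}

// "Index" two tiny documents, then query them.
const toyStore: ToyDoc[] = [
  "The cat is a quiet pet",
  "The stock market price fell today",
].map(pageContent => ({ pageContent, vector: fakeEmbed(pageContent) }));

const hits = toySearch(toyStore, "Which pet is a dog or a cat?");
console.log(hits[0].pageContent); // prints "The cat is a quiet pet"
```

A check like this only confirms that the store-and-rank plumbing works. Retrieval quality still has to be judged with the real embedding model.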
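What remains is simply working out how to build this into a more complete application.
In the AI era, using tools to validate and ship ideas quickly is a basic skill and habit for an engineer.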