Using AI to Learn How to Build a RAG Application


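Background: I've recently been working on a RAG application. I started with no prior context on the topic, so I relied on AI to help me build it.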

Validate quickly with a minimal scenario, then build it out

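First, I gave the AI my requirements:
- Start from the smallest possible scenario.
- Vectorize and store everything locally.
- Keep it interactive through small test cases.
- Build it with LangChain.js.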

The AI explained how a RAG application works. It first splits the text into chunks. It then turns each chunk into a vector using an embeddings model.
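The `addText` function later in the post calls a `splitText` helper that is never shown. Below is a minimal sketch of what chunking could look like. It assumes simple fixed-size character windows with overlap. The `splitText` signature and the `chunkSize`/`chunkOverlap` defaults are hypothetical, not the author's. A real LangChain.js project would more likely use a text splitter such as `RecursiveCharacterTextSplitter`.

```typescript
// Hypothetical stand-in for the post's splitText helper:
// fixed-size character windows that overlap by `chunkOverlap` characters.
function splitText(
  text: string,
  chunkSize: number = 500,
  chunkOverlap: number = 50
): string[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be in [0, chunkSize)");
  }
  const chunks: string[] = [];
  const step = chunkSize - chunkOverlap; // how far each window advances
  for (let start = 0; start < text.length; start += step) {
    const chunk = text.slice(start, start + chunkSize);
    // Skip windows that contain only whitespace
    if (chunk.trim().length > 0) chunks.push(chunk);
    // Stop once a window reaches the end of the text, so we don't emit a
    // trailing chunk that is fully contained in the previous one.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

For example, `splitText("abcdefghij", 4, 1)` yields `["abcd", "defg", "ghij"]`. Each chunk repeats the last character of the previous one, so a sentence cut at a boundary still appears intact in at least one chunk.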

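import { OpenAIEmbeddings } from "@langchain/openai";

// Lazily created, shared embeddings client
let embeddings: OpenAIEmbeddings | null = null;

const getEmbeddings = (): OpenAIEmbeddings => {
  if (!embeddings) {
    // Credentials come from localStorage (browser). The original fell back to a
    // hard-coded key here (truncated in the post); avoid shipping keys in client code.
    const apiKey = localStorage.getItem('OPENAI_API_KEY') || '';
    // Any OpenAI-compatible endpoint works; this is the post's default
    const baseUrl = localStorage.getItem('OPENAI_BASE_URL') || 'https://api.302.ai/v1';

    if (!apiKey) {
      throw new Error("OpenAI API Key is missing. Please set it in Settings.");
    }

    embeddings = new OpenAIEmbeddings({
      openAIApiKey: apiKey,
      configuration: {
        baseURL: baseUrl || undefined,
      }
    });
  }
  // Don't log the client object: it holds the API key
  return embeddings;
};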
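`getEmbeddings` needs a live API key and network access. For offline, repeatable small tests, you can swap in a deterministic local embedder exposing the same two methods the post calls: `embedDocuments` and `embedQuery`. The `HashingEmbedder` class below is hypothetical and only a sketch. It hashes character trigrams into a fixed-size vector (feature hashing). That captures surface overlap between texts but none of the semantics a real embedding model provides.

```typescript
// Hypothetical offline embedder with the same two methods the post calls on
// OpenAIEmbeddings. Uses feature hashing over character trigrams, so it only
// reflects surface overlap between texts, not meaning.
class HashingEmbedder {
  private readonly dim: number;

  constructor(dim: number = 256) {
    this.dim = dim;
  }

  // FNV-1a 32-bit hash of a string
  private hash(s: string): number {
    let h = 0x811c9dc5;
    for (let i = 0; i < s.length; i++) {
      h ^= s.charCodeAt(i);
      h = Math.imul(h, 0x01000193);
    }
    return h >>> 0;
  }

  private embed(text: string): number[] {
    const v = new Array<number>(this.dim).fill(0);
    const t = text.toLowerCase();
    let grams: string[];
    if (t.length === 0) grams = [];
    else if (t.length < 3) grams = [t];
    else grams = Array.from({ length: t.length - 2 }, (_, i) => t.slice(i, i + 3));
    // Count each trigram in the bucket its hash maps to
    for (const g of grams) v[this.hash(g) % this.dim] += 1;
    // L2-normalize so cosine similarity reduces to a dot product
    const norm = Math.sqrt(v.reduce((s, x) => s + x * x, 0));
    return norm === 0 ? v : v.map(x => x / norm);
  }

  async embedDocuments(texts: string[]): Promise<number[][]> {
    return texts.map(t => this.embed(t));
  }

  async embedQuery(text: string): Promise<number[]> {
    return this.embed(text);
  }
}
```

Because it mirrors the method names, `addText` and `search` could use it in place of the real client during local tests. You would just return it from `getEmbeddings` under a test flag (an assumption, not something the post does).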

Next, I wanted to verify that documents could be vectorized and stored.

// Assumes splitText(), getEmbeddings() and a module-level `docs: StoredDocument[]` store are in scope
export const addText = async (text: string, metadata: Record<string, any> = {}): Promise<void> => {
  const embedder = getEmbeddings();
  // 1. Split the text into chunks
  const chunks = splitText(text);
  console.log('chunks', chunks);
  // 2. Embed all chunks in one batch; vectors[i] belongs to chunks[i]
  const vectors = await embedder.embedDocuments(chunks);

  // 3. Keep each chunk next to its vector in the local in-memory store
  chunks.forEach((chunk, i) => {
    docs.push({
      pageContent: chunk,
      metadata,
      vector: vectors[i]
    });
  });
  console.log(`[RAG] Added ${chunks.length} vectors to local store.`);
};
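`addText` relies on a module-level `docs` array and a `StoredDocument` type, neither of which the post shows. Below is a minimal sketch of what they might look like. It assumes a plain in-memory array that can be persisted locally as a JSON string. The `addVectors`, `storeToJSON` and `loadStoreFromJSON` helpers are hypothetical. The sketch also checks that the embedder returned exactly one vector per chunk before zipping them together.

```typescript
// Assumed shape of one stored chunk; the post's addText pushes objects like this
interface StoredDocument {
  pageContent: string;
  metadata: Record<string, unknown>;
  vector: number[];
}

// Module-level in-memory "vector store"
const docs: StoredDocument[] = [];

// Hypothetical helper: zip chunks with their vectors, refusing mismatched batches
function addVectors(
  chunks: string[],
  vectors: number[][],
  metadata: Record<string, unknown> = {}
): number {
  if (chunks.length !== vectors.length) {
    throw new Error(`Got ${vectors.length} vectors for ${chunks.length} chunks`);
  }
  chunks.forEach((chunk, i) => {
    // Copy metadata so later changes to the caller's object don't leak in
    docs.push({ pageContent: chunk, metadata: { ...metadata }, vector: vectors[i] });
  });
  return docs.length;
}

// Hypothetical persistence helpers: serialize the store to a JSON string
// (e.g. for localStorage or a file) and load it back
function storeToJSON(): string {
  return JSON.stringify(docs);
}

function loadStoreFromJSON(json: string): void {
  const parsed = JSON.parse(json) as StoredDocument[];
  docs.length = 0; // clear in place so existing references to `docs` stay valid
  docs.push(...parsed);
}
```

A JSON snapshot is enough for a few hundred chunks. Beyond that, a real vector store becomes worth the extra dependency.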

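Building on that, the next step was search. This means embedding a query and matching it against the stored chunks by similarity.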
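Concretely, the similarity matching below uses the `cosineSimilarity` call in the `search` snippet. It scores each stored chunk vector $d$ against the query vector $q$, where $n$ is the embedding dimension:

$$
\cos(q, d) = \frac{q \cdot d}{\lVert q \rVert \, \lVert d \rVert} = \frac{\sum_{i=1}^{n} q_i d_i}{\sqrt{\sum_{i=1}^{n} q_i^2}\,\sqrt{\sum_{i=1}^{n} d_i^2}}
$$

Scores lie in $[-1, 1]$, and higher means more similar. If both vectors are already L2-normalized, the denominator is 1 and the score is just the dot product.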

// Assumes cosineSimilarity(a, b), getEmbeddings() and the `docs` store are in scope
export const search = async (query: string, k: number = 4): Promise<StoredDocument[]> => {
  console.log(`[RAG] Searching for: "${query}"`);
  const embedder = getEmbeddings();

  // 1. Vectorize the query with the same model used for the documents
  console.log("[RAG] Vectorizing query...");
  const queryVector = await embedder.embedQuery(query);
  console.log(`[RAG] Query vectorized. Dimension: ${queryVector.length}`);

  // 2. Score every stored chunk (a brute-force scan, fine for a small local store)
  console.log(`[RAG] Calculating similarity against ${docs.length} stored chunks...`);
  const scoredDocs = docs.map(doc => ({
    doc,
    score: cosineSimilarity(queryVector, doc.vector)
  }));

  // 3. Sort by score, highest first, and keep the top k
  scoredDocs.sort((a, b) => b.score - a.score);
  const top = scoredDocs.slice(0, k);

  // 4. Log the top results
  console.log("[RAG] Top Results:");
  top.forEach((d, i) => {
    console.log(`  ${i + 1}. Score: ${d.score.toFixed(4)} | Content: "${d.doc.pageContent.slice(0, 50)}..."`);
  });
  return top.map(d => d.doc);
};
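`search` calls `cosineSimilarity`, but the post never defines it. Below is a straightforward implementation of the cosine formula. It assumes both vectors have the same dimension and throws otherwise. It treats a zero vector as having similarity 0 rather than dividing by zero. LangChain.js also ships a `MemoryVectorStore` that does a similar in-memory scan for you. The hand-rolled version is mainly useful for seeing what happens inside.

```typescript
// Cosine similarity between two equal-length vectors
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];   // numerator: q · d
    normA += a[i] * a[i]; // squared length of a
    normB += b[i] * b[i]; // squared length of b
  }
  if (normA === 0 || normB === 0) return 0; // avoid division by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

For example, `cosineSimilarity([1, 0], [1, 0])` returns `1` and `cosineSimilarity([1, 0], [0, 1])` returns `0`. Only direction matters, so `[1, 2]` and `[2, 4]` score (up to rounding) `1`.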


In the end, this minimal scenario let me validate the whole loop end to end, from vectorization to similarity search.
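One way to turn that manual check into a repeatable small test case is a hit-rate check. For each query, it asks whether a chunk containing an expected phrase shows up in the top k results. The `RetrievalCase` type and `hitRateAtK` function are hypothetical. They only assume a `search(query, k)` function with the same shape as the post's `search`.

```typescript
// Minimal shape the check needs from a search result (the post's StoredDocument fits)
interface RetrievedDoc {
  pageContent: string;
}

// Hypothetical retrieval test case: a query plus a phrase the relevant chunk should contain
interface RetrievalCase {
  query: string;
  expected: string;
}

// Fraction of cases whose expected phrase appears in any of the top-k results
async function hitRateAtK(
  search: (query: string, k: number) => Promise<RetrievedDoc[]>,
  cases: RetrievalCase[],
  k: number = 4
): Promise<number> {
  if (cases.length === 0) return 0;
  let hits = 0;
  for (const c of cases) {
    const results = await search(c.query, k);
    if (results.some(d => d.pageContent.includes(c.expected))) hits += 1;
  }
  return hits / cases.length;
}
```

Running a handful of such cases after every change to chunk size or embedding model gives a quick signal on whether retrieval got better or worse.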

What remains is simply building a more complete application on top of it.

In the AI era, using tools to validate ideas quickly and get them into production is a basic skill and habit for any engineer.