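Maximizing the Potential of LLMs: Using Vector Databases

LLMs perform natural language processing (NLP) by representing the meaning of text as a vector. This representation of the words in a text is called an embedding.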

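Token limits: the biggest problem with LLM prompts

One of the biggest problems with LLM prompts today is the token limit. When GPT-3 was released, the combined limit for the prompt and the output was 2,048 tokens. With GPT-3.5, the limit rose to 4,096 tokens. GPT-4 now comes in two variants: one with a limit of 8,192 tokens and another with a limit of 32,768 tokens, roughly 50 pages of text.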
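To make these limits concrete, here is a minimal sketch that checks whether a prompt plus the space reserved for the answer fits within a model's limit. The dictionary keys and the helper name `fits_in_context` are hypothetical labels chosen for illustration; only the numbers come from the paragraph above.

```python
# Token limits quoted in the text; the keys are illustrative labels,
# not official API model identifiers.
TOKEN_LIMITS = {
    "gpt-3": 2048,
    "gpt-3.5": 4096,
    "gpt-4-8k": 8192,
    "gpt-4-32k": 32768,
}


def fits_in_context(prompt_tokens: int, max_output_tokens: int, model: str) -> bool:
    """Return True if prompt and output together stay within the model's limit."""
    return prompt_tokens + max_output_tokens <= TOKEN_LIMITS[model]


# A 3,500-token prompt that leaves room for a 500-token answer:
print(fits_in_context(3500, 500, "gpt-3"))    # False: 4,000 > 2,048
print(fits_in_context(3500, 500, "gpt-3.5"))  # True: 4,000 <= 4,096
```

Because prompt and output share one budget, every token of context you add is a token the model can no longer use for its answer.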

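So what can you do when you want a prompt whose context exceeds this limit? The only option is to make the context shorter. But how can you shorten it and still keep all the relevant information? The solution is to store the context in a vector database and retrieve the relevant parts with a similarity search query.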

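What are vector embeddings?

Let's start by explaining what vector embeddings are. Roy Keyes defines them this way: "Embeddings are learned transformations to make data more useful." A neural network learns to transform text into a vector space that captures its actual meaning. This is more useful because it can find synonyms as well as syntactic and semantic relationships between words.

[Figure: visualization of how these vectors encode meaning]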
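As a rough illustration of how vectors can encode meaning, the sketch below compares hand-made three-dimensional vectors with cosine similarity. The vectors are invented for this example and are not the output of any real embedding model, which would produce hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Hand-made "embeddings" (hypothetical values, for illustration only).
toy_embeddings = {
    "happy":  [0.9, 0.1, 0.0],
    "joyful": [0.8, 0.2, 0.1],
    "table":  [0.0, 0.1, 0.9],
}

# Close to 1: the two vectors point in almost the same direction.
print(cosine_similarity(toy_embeddings["happy"], toy_embeddings["joyful"]))
# Close to 0: the two vectors are nearly orthogonal.
print(cosine_similarity(toy_embeddings["happy"], toy_embeddings["table"]))
```

In a learned embedding space, words or passages with related meanings end up close together in the same way, without anyone assigning the coordinates by hand.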

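What do vector databases do?

A vector database stores and indexes vector embeddings. This makes it possible to retrieve vectors quickly and to find similar vectors.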

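Similarity search

We can find similar vectors by computing the distance between a query vector and every other vector. The nearest neighbors are the results most similar to the query vector. This is how a flat index in a vector database works. It is not very efficient, though: in a large database it can take a long time.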
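The exhaustive search described above can be sketched in a few lines of plain Python. This is a minimal illustration of the idea, not how any particular vector database implements it; the names `flat_search` and `stored` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def flat_search(query, vectors, k=2):
    """Compare the query against every stored vector and return the k nearest.

    Returns a list of (index, distance) pairs, closest first.
    """
    distances = [(i, euclidean(query, v)) for i, v in enumerate(vectors)]
    distances.sort(key=lambda pair: pair[1])
    return distances[:k]


stored = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]]
nearest = flat_search([1.0, 1.0], stored, k=2)
print([i for i, _ in nearest])  # [1, 3]
```

Every query touches every stored vector, so the cost grows linearly with the size of the database. That is the inefficiency the paragraph refers to.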

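To improve search performance, we can compute distances for only a subset of the vectors. This approach is called approximate nearest neighbors (ANN); it improves speed at the cost of result quality. Popular ANN indexes include Locality Sensitive Hashing (LSH), Hierarchical Navigable Small Worlds (HNSW), and Inverted File Index (IVF).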
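To show the speed versus quality trade-off, here is a simplified sketch in the spirit of an inverted file index. Vectors are grouped into cells around centroids, and a query only scans the closest cells. The centroids are hand-picked for this example (a real IVF index would typically learn them with k-means), and all function names are hypothetical. The `nprobe` parameter, the number of cells to scan, borrows its name from Faiss.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_centroids(vector, centroids, n):
    """Indices of the n centroids closest to the vector."""
    order = sorted(range(len(centroids)), key=lambda c: euclidean(vector, centroids[c]))
    return order[:n]


def build_ivf(vectors, centroids):
    """Group vector ids into cells, one cell per centroid."""
    cells = defaultdict(list)
    for i, v in enumerate(vectors):
        cells[nearest_centroids(v, centroids, 1)[0]].append(i)
    return cells


def ivf_search(query, vectors, centroids, cells, k=1, nprobe=1):
    """Scan only the nprobe cells closest to the query; return k nearest ids."""
    candidates = [i for c in nearest_centroids(query, centroids, nprobe) for i in cells[c]]
    candidates.sort(key=lambda i: euclidean(query, vectors[i]))
    return candidates[:k]


centroids = [[0.0, 0.0], [10.0, 10.0]]  # hand-picked for illustration
vectors = [[1.0, 1.0], [3.0, 3.0], [6.0, 6.0], [9.0, 9.0]]
cells = build_ivf(vectors, centroids)

query = [4.9, 4.9]
# Scanning one cell is faster but misses the true nearest neighbor (id 2).
print(ivf_search(query, vectors, centroids, cells, nprobe=1))  # [1]
# Scanning both cells finds it, at the cost of comparing against every vector.
print(ivf_search(query, vectors, centroids, cells, nprobe=2))  # [2]
```

The query sits just on the first cell's side of the boundary, so a single-cell search returns a good but not optimal match. This is the kind of quality loss ANN indexes accept in exchange for speed.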

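Integrating vector stores and LLMs

For this example, I downloaded the entire numpy documentation (more than 2,000 pages) as a PDF from this URL: https://numpy.org/doc/1.23/numpy-ref.pdf.

We can write Python code that converts the context document into embeddings and saves them to a vector store. We will use LangChain to load the document and split it into chunks, and Faiss (Facebook AI Similarity Search) as the vector database.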

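from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.document_loaders import PyPDFLoader

# Load the numpy reference PDF downloaded above and split it into chunks.
# The local filename is an assumption; adjust it to where you saved the file.
loader = PyPDFLoader("numpy-ref.pdf")
pages = loader.load_and_split()

# Embed each chunk (requires the OPENAI_API_KEY environment variable)
# and build a Faiss index from the resulting vectors.
embeddings = OpenAIEmbeddings()
db = FAISS.from_documents(pages, embeddings)

# Persist the index to disk so it can be reused later.
db.save_local("numpy_faiss_index")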
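The loading step splits the document into chunks before embedding them. As a rough illustration of what a text splitter does (this is not LangChain's actual implementation), fixed-size character chunking with overlap can be sketched as follows. The names `split_text`, `chunk_size`, and `overlap` are hypothetical.

```python
def split_text(text, chunk_size=20, overlap=5):
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters, so text cut at a chunk
    boundary keeps some of its surrounding context in the next chunk.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
        start += step
    return chunks


print(split_text("abcdefghijklmnopqrstuvwxyz", chunk_size=10, overlap=3))
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Smaller chunks give more precise retrieval but less context per hit; the right size depends on the documents and on how many chunks fit in the prompt.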

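Now we can run a similarity search query against this database to find the pages that may be relevant to our prompt. We then use the resulting chunks to fill the context of our prompt. We will use LangChain to make this easier: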

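from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains.qa_with_sources import load_qa_with_sources_chain
from langchain.llms import OpenAI

query = "How to calculate the median of an array"

# Reload the saved index with the same embedding model used to build it.
embeddings = OpenAIEmbeddings()
db = FAISS.load_local("numpy_faiss_index", embeddings)

# Retrieve the chunks most similar to the query.
docs = db.similarity_search(query)

# "stuff" places all retrieved chunks into a single prompt.
chain = load_qa_with_sources_chain(OpenAI(temperature=0), chain_type="stuff")
chain({"input_documents": docs, "question": query}, return_only_outputs=True)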
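Conceptually, the "stuff" chain type places all retrieved chunks into a single prompt. The sketch below shows that idea with a hypothetical prompt template and a hypothetical `build_prompt` helper; LangChain's real template differs, and the character budget here is only a crude stand-in for a token budget. The two chunk strings are placeholder text, not actual search results.

```python
def build_prompt(question, chunks, max_chars=2000):
    """Concatenate retrieved chunks into one prompt, stopping at a size budget."""
    context_parts = []
    used = 0
    for chunk in chunks:
        if used + len(chunk) > max_chars:
            break  # adding this chunk would exceed the budget
        context_parts.append(chunk)
        used += len(chunk)
    context = "\n---\n".join(context_parts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Placeholder chunks standing in for similarity search results.
retrieved = [
    "numpy.median computes the median along the specified axis.",
    "numpy.mean computes the arithmetic mean along the specified axis.",
]
prompt = build_prompt("How to calculate the median of an array", retrieved)
print(prompt)
```

Because chunks arrive ordered by similarity, cutting off at the budget drops the least relevant material first.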

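Our question to the model was "How to calculate the median of an array". Even though the full context we gave it far exceeds the token limit, we worked around the limit and got an answer: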

To calculate the median, you can use the numpy.median()
function, which takes an input array or object that can be
converted to an array and computes the median along the
specified axis. The axis parameter specifies the axis or axes
along which the medians are computed, and the default is to
compute the median along a flattened version of the array. The
function returns the median of the array elements.
For example, to calculate the median of an array "arr" along
the first axis, you can use the following code:

import numpy as np
median = np.median(arr, axis=0)

This will compute the median of the array elements along the
first axis, and return the result in the variable "median".

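This is just one clever solution to a very new problem. As LLMs continue to evolve, problems like this one may be solved without the need for such workarounds. However, I believe this evolution will open the door to new capabilities, which may in turn require other clever solutions to the challenges they bring.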