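Over the past two weeks, a new contender has emerged in the AI race: DeepSeek, whose V3 and R1 models go toe to toe with OpenAI's. V3 reportedly cost only about $5.6 million and six weeks of work to train, and it was built in China! The models hold their own against GPT-4o and o1, and they are free and open source, which means we can run a powerful large language model (LLM) locally. This post digs into what DeepSeek actually is, why you should care, how they pulled it off, and, most importantly, how you can use it to build your next million-dollar tech startup.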
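What is DeepSeek?
DeepSeek is a Chinese AI company, founded in July 2023, that focuses on building open-source large language models. DeepSeek R1 is not their first LLM, but it is their first reasoning model, comparable to OpenAI's o1.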
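Why should you care?
- It is as good as OpenAI's models, and on some tasks better
- It is completely free to use on their official website
- DeepSeek's API is more than 96% cheaper than OpenAI's (calculated by comparing R1's input-token price on a cache miss with o1's input-token price)
- It is 100% free and open source, released under the MIT license, so you can run it on your own machine. You can check it out on GitHub.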
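The 96% figure can be reproduced with a few lines of arithmetic. The two prices below are assumptions (list prices per 1M input tokens as I understand them for early 2025, not numbers stated in this post), so check the current pricing pages before relying on them:

```javascript
// Assumed list prices in USD per 1M input tokens (early 2025).
const R1_INPUT_CACHE_MISS = 0.55; // assumption: DeepSeek R1, cache miss
const O1_INPUT = 15.0;            // assumption: OpenAI o1

// Percentage saved when paying `cheaper` instead of `pricier`.
function percentCheaper(cheaper, pricier) {
  return (1 - cheaper / pricier) * 100;
}

const savings = percentCheaper(R1_INPUT_CACHE_MISS, O1_INPUT);
console.log(`${savings.toFixed(1)}% cheaper`); // prints "96.3% cheaper"
```

Output tokens and cache hits are priced differently, so the exact saving depends on your workload.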
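How did they do it?
They did it in a number of ways. Note that the explanations below are heavily simplified. If you want to dig deeper into how these techniques work, DeepSeek's technical reports are the place to look.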
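1. Deep low-level optimization
Because the US government restricts sales of high-end chips to China, DeepSeek, as a Chinese company, cannot get NVIDIA's top-tier GPUs (such as the H100) to train its models. That meant they had to squeeze more out of the chips they already had (NVIDIA H800s). In short, they used Mixture of Experts (MoE) and Multi-head Latent Attention (MLA) to cut the compute and memory each token needs and get the most out of those GPUs.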
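2. Only train what's necessary
In a conventional dense model, training on a token means updating the whole model, even the parts that contribute nothing to that token, which wastes a huge amount of resources. With MoE, each token is routed to just a few experts instead. To keep that routing even, they introduced auxiliary-loss-free (ALS) load balancing: a bias term per expert is adjusted so that no expert, and therefore no chip hosting it, is overloaded while another sits underused. As a result, only about 5% of the model's parameters are trained for each token, and training came out roughly 91% cheaper than GPT-4 (an estimated $63 million for GPT-4 versus $5.576 million for V3).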
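To make the bias idea concrete, here is a toy simulation. It is a sketch under simplifying assumptions, not DeepSeek's implementation: four experts, top-1 routing, made-up affinity scores that heavily favour expert 0, and a hypothetical update speed GAMMA. The bias only influences which expert gets picked.

```javascript
const NUM_EXPERTS = 4;
const NUM_TOKENS = 200;
const GAMMA = 0.01; // hypothetical bias update speed

// Small deterministic pseudo-random generator so runs are repeatable.
let seed = 42;
function rand() {
  seed = (seed * 1664525 + 1013904223) % 4294967296;
  return seed / 4294967296;
}

// Made-up affinity scores, skewed so expert 0 is almost always the favourite.
const base = [0.9, 0.5, 0.4, 0.2];
const tokens = Array.from({ length: NUM_TOKENS }, () =>
  base.map((b) => b + rand() * 0.5)
);

// Route every token to the expert with the highest (score + bias)
// and count how many tokens each expert receives.
function expertLoads(bias) {
  const loads = new Array(NUM_EXPERTS).fill(0);
  for (const scores of tokens) {
    let best = 0;
    for (let e = 1; e < NUM_EXPERTS; e++) {
      if (scores[e] + bias[e] > scores[best] + bias[best]) best = e;
    }
    loads[best] += 1;
  }
  return loads;
}

const bias = new Array(NUM_EXPERTS).fill(0);
const initialLoads = expertLoads(bias);

for (let step = 0; step < 200; step++) {
  const loads = expertLoads(bias);
  const mean = NUM_TOKENS / NUM_EXPERTS;
  for (let e = 0; e < NUM_EXPERTS; e++) {
    // Overloaded experts become less attractive, the others more attractive.
    bias[e] += loads[e] > mean ? -GAMMA : GAMMA;
  }
}

const finalLoads = expertLoads(bias);
console.log("before:", initialLoads, "after:", finalLoads);
```

No extra loss term is involved: the balancing happens purely by nudging the biases after each batch, which is where the "auxiliary-loss-free" name comes from.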
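3. Compression
Under the hood, attention relies on a large number of key-value pairs, and storing all of them eats up a lot of memory. To address this, DeepSeek uses low-rank joint compression of keys and values (KV). A projection matrix compresses each key-value pair into a much smaller vector, and only that compressed version is stored. When the data is needed, further projection matrices expand it back into keys and values, which shrinks the cache, speeds up processing, and lowers memory use.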
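A rough sketch of the mechanics, with tiny made-up dimensions and fixed toy weights (a real model learns the matrices, and D_MODEL, D_LATENT, and toyMatrix are hypothetical names for illustration only):

```javascript
// Multiply a (rows x cols) matrix by a vector of length cols.
function matVec(matrix, vector) {
  return matrix.map((row) => row.reduce((sum, w, i) => sum + w * vector[i], 0));
}

const D_MODEL = 8;  // hypothetical hidden size
const D_LATENT = 2; // hypothetical compressed size

// Deterministic toy weights standing in for learned parameters.
function toyMatrix(rows, cols, offset) {
  return Array.from({ length: rows }, (_, r) =>
    Array.from({ length: cols }, (_, c) => Math.sin(offset + r * cols + c))
  );
}

const wDown = toyMatrix(D_LATENT, D_MODEL, 1); // compresses the hidden state
const wUpK = toyMatrix(D_MODEL, D_LATENT, 2);  // expands the latent into a key
const wUpV = toyMatrix(D_MODEL, D_LATENT, 3);  // expands the latent into a value

const hidden = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8];

// Only this small latent vector is cached for the token.
const latent = matVec(wDown, hidden);

// Keys and values are rebuilt from the latent when attention needs them.
const key = matVec(wUpK, latent);
const value = matVec(wUpV, latent);

const cachedNumbers = latent.length;
const uncompressedNumbers = key.length + value.length;
console.log(`cache per token: ${cachedNumbers} numbers instead of ${uncompressedNumbers}`);
```

The saving comes from caching one shared latent per token instead of a separate full-size key and value.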
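4. Reinforcement learning
Part of the training works a lot like training a dog.
- The model is given problems that are complex to solve but easy to verify.
- If it answers correctly, it gets a "reward", which reinforces the patterns that led there.
- If it answers incorrectly, it adjusts itself so it does better in future iterations.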
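"Easy to verify" is what makes this loop cheap: the reward can be a simple rule instead of a human judgment. A minimal sketch of such a rule, assuming the expected answer is a number and the model's final answer is the last number in its response (the reward function here is hypothetical and much simpler than a real training setup):

```javascript
// Rule-based reward: 1 if the last number in the response equals the
// expected answer, otherwise 0.
function reward(response, expected) {
  const numbers = response.match(/-?\d+(\.\d+)?/g);
  if (!numbers) return 0; // no number at all counts as a wrong answer
  return Number(numbers[numbers.length - 1]) === expected ? 1 : 0;
}

console.log(reward("5 + 7 = 12. Final Answer: 12", 12)); // prints 1
console.log(reward("I think it is 13", 12));             // prints 0
```

During training, responses that earn a reward are made more likely and the rest less likely.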
The results:
| Training cost | Pre-training | Context extension | Post-training | Total |
|---|---|---|---|---|
| In H800 GPU hours | 2664K | 119K | 5K | 2788K |
| In USD | $5.328M | $0.238M | $0.01M | $5.576M |
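The dollar total follows directly from the GPU hours if each H800 hour is priced at $2. That rate is an assumption implied by the table itself ($5.328M for 2664K hours) rather than something stated in this post:

```javascript
const USD_PER_GPU_HOUR = 2; // assumption implied by the table

// GPU hours per training stage, taken from the table.
const gpuHours = {
  preTraining: 2664000,
  contextExtension: 119000,
  postTraining: 5000
};

const totalHours = Object.values(gpuHours).reduce((sum, h) => sum + h, 0);
const totalCostUsd = totalHours * USD_PER_GPU_HOUR;

console.log(totalHours);   // prints 2788000
console.log(totalCostUsd); // prints 5576000
```

Note that this covers only the GPU time for the final training run, not research, experiments, staff, or hardware.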
How can I use it?
DeepSeek's models are easy to use. Here is how:
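1. Online
You can use DeepSeek V3 and R1 for free on their official website.
2. API
If you don't want to host the model yourself, DeepSeek offers an official API that is more than 96% cheaper than OpenAI's.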
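How to use it
The API itself is very simple. It follows the OpenAI API format, so you can use the OpenAI package from NPM or PIP, or make plain HTTP requests.
Warning: never store your API key on the client side
- Apply for an API key
- Install the package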
npm install openai
- Make a request
import OpenAI from "openai";

// Point the OpenAI client at DeepSeek's OpenAI-compatible endpoint.
const openai = new OpenAI({
  baseURL: "https://api.deepseek.com",
  apiKey: "<DeepSeek API Key>" // keep the real key on the server, never in client code
});

const completion = await openai.chat.completions.create({
  messages: [
    {
      role: "system",
      content: "You are a helpful assistant."
    },
    {
      role: "user",
      content: "What is 5 + 7?"
    }
  ],
  model: "deepseek-chat"
});

// Print the text of the first (and only) choice.
console.log(completion.choices[0].message.content);
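Run it (the import syntax and top-level await need an ES module, so set "type": "module" in package.json or name the file index.mjs)
node index.js
Output: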
To find the sum of 5 and 7, follow these steps:
Start with the first number:
5
Add the second number to it:
5 + 7
Perform the addition:
5 + 7 = 12
Final Answer: 12
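If you would rather skip the package and make the HTTP request yourself, the sketch below shows one way to do it while keeping the key out of your source code. The environment variable name DEEPSEEK_API_KEY and the helper names are my own choices, and the code assumes Node 18 or newer for the built-in fetch:

```javascript
// Build the arguments for fetch() without sending anything yet.
function buildChatRequest(apiKey, userPrompt) {
  return {
    url: "https://api.deepseek.com/chat/completions",
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`
      },
      body: JSON.stringify({
        model: "deepseek-chat",
        messages: [
          { role: "system", content: "You are a helpful assistant." },
          { role: "user", content: userPrompt }
        ]
      })
    }
  };
}

// Server-side only: the key is read from an environment variable
// (DEEPSEEK_API_KEY is an assumed name) instead of being hard-coded.
async function ask(userPrompt) {
  const { url, options } = buildChatRequest(process.env.DEEPSEEK_API_KEY, userPrompt);
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`Request failed with status ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}
```

Usage on your server would look like ask("What is 5 + 7?").then(console.log). A browser app should call your own backend, which then calls DeepSeek, so the key never reaches the client.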
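3. Local use
Now for the fun part, ✨ self-hosting ✨. Unfortunately, the full model weighs in at around 400 GB. Most people don't have that much storage to spare for a single model, and running it at a startup would be very expensive. Fortunately, there are distilled models: smaller models fine-tuned on R1's output that are a fraction of the size. The bigger the model, the smarter but slower it is. Let's start by running DeepSeek on our own machine.
- Pick the size you want to run locally.
Note: V3 comes in only one size
Note: sizes as of February 2, 2025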
| Parameters (billions) | Size (GB) | Model |
|---|---|---|
| 1.5B | 1.1GB | deepseek-r1:1.5b |
| 7B | 4.7GB | deepseek-r1:7b |
| 8B | 4.9GB | deepseek-r1:8b |
| 14B | 9GB | deepseek-r1:14b |
| 32B | 20GB | deepseek-r1:32b |
| 70B | 43GB | deepseek-r1:70b |
| 671B | 404GB | deepseek-r1:671b or deepseek-v3 |
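If you want to pick a size programmatically, a small helper can choose the largest model whose download fits a given budget. This is only a sketch: largestModelThatFits is a hypothetical helper, and the sizes are the download sizes from the table, so leave extra headroom for actually running the model.

```javascript
// Model tags and download sizes from the table, sorted from small to large.
const MODELS = [
  { tag: "deepseek-r1:1.5b", sizeGb: 1.1 },
  { tag: "deepseek-r1:7b", sizeGb: 4.7 },
  { tag: "deepseek-r1:8b", sizeGb: 4.9 },
  { tag: "deepseek-r1:14b", sizeGb: 9 },
  { tag: "deepseek-r1:32b", sizeGb: 20 },
  { tag: "deepseek-r1:70b", sizeGb: 43 },
  { tag: "deepseek-r1:671b", sizeGb: 404 }
];

// Return the tag of the largest model that fits, or null if none does.
function largestModelThatFits(budgetGb) {
  const fitting = MODELS.filter((m) => m.sizeGb <= budgetGb);
  return fitting.length > 0 ? fitting[fitting.length - 1].tag : null;
}

console.log(largestModelThatFits(16)); // prints "deepseek-r1:14b"
console.log(largestModelThatFits(1));  // prints null
```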
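On your machine
Open a terminal and run the command for the model you picked above, for example ollama run deepseek-r1:7b if you use Ollama. Once the download finishes, you land in an interactive prompt.
Now you can type a prompt and get an answer.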
In a project
Ollama also provides a package for NPM and PIP.
Install the package
npm install ollama
Call it
import ollama from "ollama";

// Replace <MODEL> with the tag you picked, e.g. deepseek-r1:7b.
const response = await ollama.chat({
  model: "<MODEL>",
  messages: [
    {
      role: "user",
      content: "List 5 foods from Italy. Explain their origins"
    }
  ]
});

console.log(response.message.content);
Run it
node index.js
Output:
<think>
Okay, so I need to list five Italian foods and explain their origins. Hmm, let's see... I'm not an expert on Italian cuisine, but I know a few basics. Maybe I can start by thinking about the most famous dishes and then research where they come from.
First, pasta comes to mind. I think pasta has been around for a long time, maybe even before Christ. I've heard that Marco Polo brought it back to Europe, but I'm not sure if he actually introduced it to Italy or if it was already there. I should probably look into that more. Also, pizza is a big one—everyone knows it's from Italy. But again, when exactly? I think the word "pizza" comes from "pie," which refers to a flatbread. Maybe bakers in Italy started making something like modern pizza centuries ago.
Lasagna is another dish I can think of. It's a layered pasta with fillings and béchamel sauce. I believe it has roots in Emilia-Romagna, but I'm not sure about the exact origins or when it was first made. Then there's risotto, which is a rice dish cooked with broth and other ingredients. I think it originated in Lombardy, maybe around the 17th century.
Gelato is Italian ice cream, right? I've heard that it has been around since the Middle Ages. Maybe Arab traders introduced sugar to Italy, and then gelato became popular as a sweet treat. So, sugar was key here, and the techniques for making gelato might have come from the Arab world.
Wait, am I missing any other major dishes? Oh, like tiramisu. That's a classic, but I'm not sure about its exact origin either. It seems to be a dessert from the Emilia-Romagna region as well. Maybe it was created in the 1800s or something. But since the user only asked for five foods, maybe that's enough.
Let me try to organize this:
1. Pasta: Originating before Christ, possibly brought to Italy by Marco Polo but already existing there.
2. Pizza: From ancient Rome, a flatbread that evolved into modern pizza in Campania.
3. Lasagna: From Emilia-Romagna, medieval dish with layers of pasta and fillings.
4. Risotto: Originating in Lombardy around the 17th century, using rice and broth.
5. Gelato: Middle Ages, influenced by Arab traders introducing sugar.
I think that's a good list. Each has its own history, but I should double-check some details to ensure accuracy. For example, pasta's history is a bit tricky because it was known before Marco Polo, but his travels popularized it in Europe. Also, gelato's origins might be tied to the invention of sugar, which came through trade with Arab countries.
I wonder if there are other foods I could have included instead. Maybe something like risotto is more specific, or perhaps arancini, which are stuffed pasta balls, but they're a variant of pasta too. Or maybe minestrone soup, but that's more of a category than a specific dish.
No, the five listed seem to cover a good range from different regions and time periods in Italy. I think that's sufficient for this task.
</think>
Here is a list of five Italian foods along with their origins:
1. **Pasta**: Originating well before Christ, pasta has ancient roots and was likely already present in Italy when Marco Polo returned from his travels, where he popularized its use in Europe.
2. **Pizza**: The word "pizza" refers to a flatbread that traces its origins back to ancient Rome. Modern pizza as we know it evolved in Campania, particularly in areas like Napoli.
3. **Lasagna**: Hailing from the Emilia-Romagna region, lasagna is a medieval dish characterized by its layered structure of pasta interleaved with fillings and covered in béchamel sauce.
4. **Risotto**: Originating in Lombardy during the 17th century, risotto is a rice dish cooked with broth and various ingredients, known for its creamy texture.
5. **Gelato**: This Italian ice cream has medieval roots, influenced by Arab traders who introduced sugar to Italy. Gelato's techniques have been passed down through generations, becoming a beloved treat.
This selection highlights the diverse culinary history of Italy, spanning regions and centuries.
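Whoa, that looks pretty odd, doesn't it? The reason is simple.
The response is written in a format called Markdown.
There are three ways to deal with this.
1. Embrace it
Markdown is like plain text with upgrades. If Markdown is what you want, you're all set!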
2. Convert to plain text
We can use a library called remove-markdown to strip the Markdown syntax from the text.
Install the package
npm install remove-markdown
Update the code
import ollama from "ollama";
import removeMd from "remove-markdown";

const response = await ollama.chat({
  model: "<MODEL>",
  messages: [
    {
      role: "user",
      content: "List 5 foods from Italy. Explain their origins."
    }
  ]
});

// Strip the Markdown syntax and print plain text.
console.log(removeMd(response.message.content));
3. Convert to HTML
If you plan to render the content in a browser, we can use the marked library to convert the Markdown to HTML.
Install the package
npm install marked
Update the code
import ollama from "ollama";
import { writeFileSync } from "fs";
import { parse } from "marked";

const response = await ollama.chat({
  model: "<MODEL>",
  messages: [
    {
      role: "user",
      content: "List 5 foods from Italy. Explain their origins."
    }
  ]
});

// Optionally, save the response to an HTML file so it can be viewed in a browser
writeFileSync("response.html", `
<body>
${parse(response.message.content)}
</body>
`);
Open response.html in a browser to see the rendered output.
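One more thing applies to all three options: R1 puts its reasoning between <think> and </think> tags in front of the actual answer, as in the sample output earlier. If you only want to show the answer, you can split the two before converting. A minimal sketch; splitReasoning is a hypothetical helper, not part of the ollama package:

```javascript
// Separate the <think>...</think> block from the final answer.
function splitReasoning(content) {
  const match = content.match(/<think>([\s\S]*?)<\/think>/);
  if (!match) {
    // No reasoning block: the whole response is the answer.
    return { reasoning: "", answer: content.trim() };
  }
  return {
    reasoning: match[1].trim(),
    answer: content.replace(match[0], "").trim()
  };
}

const sample = "<think>\nOkay, five foods...\n</think>\nHere is a list of five Italian foods.";
const { reasoning, answer } = splitReasoning(sample);
console.log(answer); // prints "Here is a list of five Italian foods."
```

You would then pass answer, rather than response.message.content, to removeMd or parse.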
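Conclusion
DeepSeek is a very strong new contender in the AI industry. It pairs performance close to OpenAI's with a fraction of the training cost, a far cheaper API, and open models you can run yourself.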