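Environment
- Machine: MacBook Pro, M4 Pro chip, 48 GB RAM
- Model: Gemma4 31B Dense (about 19 GB)
- Framework: Ollama
The results were very good: the model passed two classic LLM logic-reasoning tests:
- Car wash test: the car wash is 50 meters from home; should you walk or drive? (tests commonsense reasoning)
- Stick-through-the-door test: can a 10-meter stick pass through a 3-meter-high door? (tests spatial reasoning)
Installation
Install Ollama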
brew install ollama
Start the service
ollama serve
Pull and run Gemma4 31B
(base) logan@U-JHY26QNJ-3201 ~ % ollama run gemma4:31b
pulling manifest
pulling 280af6832eca: 100% ▕████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 19 GB
pulling 7339fa418c9a: 100% ▕████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 11 KB
pulling 56380ca2ab89: 100% ▕████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 42 B
pulling 0940386273ff: 100% ▕████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 474 B
verifying sha256 digest
writing manifest
success
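Besides the interactive prompt, a running `ollama serve` also exposes a local HTTP API, which makes it easy to script the same prompts. Below is a minimal Python sketch, assuming the default address `http://localhost:11434` and the `/api/generate` endpoint. The helper names `build_request` and `ask` are made up for illustration and are not part of Ollama.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="gemma4:31b"):
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, model="gemma4:31b", timeout=600):
    """Send the prompt and return the text in the reply's "response" field."""
    request = build_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

With the server running, `ask("你是什么模型")` should return the model's reply as a string. The 600-second timeout is a guess, since a 31B model can take a while to load on the first call.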
Check the running Ollama models
ollama ps
As the output below shows, gemma4:31b is loaded and occupies 27 GB of memory:
NAME          ID              SIZE     PROCESSOR    CONTEXT    UNTIL
gemma4:31b    6316f0629137    27 GB    100% GPU     32768      4 minutes from now
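To put the 27 GB figure in context, here is a small Python sketch that parses the `ollama ps` output and compares it with the 19 GB download and the 48 GB of RAM on this machine. It assumes columns are separated by two or more spaces. The helpers `parse_ps` and `size_gb` are hypothetical names, and the sample text is just the output shown above.

```python
import re

# Sample copied from the `ollama ps` output in this post.
SAMPLE = """\
NAME          ID              SIZE     PROCESSOR    CONTEXT    UNTIL
gemma4:31b    6316f0629137    27 GB    100% GPU     32768      4 minutes from now
"""


def parse_ps(text):
    """Split each row on runs of 2+ spaces and key the cells by header name."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    header = re.split(r"\s{2,}", lines[0])
    return [dict(zip(header, re.split(r"\s{2,}", ln))) for ln in lines[1:]]


def size_gb(size):
    """Convert a string such as '27 GB' to a float; other units are rejected."""
    value, unit = size.split()
    if unit != "GB":
        raise ValueError(f"unexpected unit: {unit}")
    return float(value)


rows = parse_ps(SAMPLE)
loaded = size_gb(rows[0]["SIZE"])
print(f"extra memory beyond the 19 GB download: {loaded - 19:.0f} GB")  # 8 GB
print(f"share of 48 GB RAM in use by the model: {loaded / 48:.0%}")  # 56%
```

The post does not break down where the extra 8 GB goes; the context cache and runtime buffers are a plausible guess, but that is an assumption.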
Performance tests
What model are you?
>>> 你是什么模型
Thinking...
* User asks: "你是什么模型" (What model are you?).
* Language: Chinese.
* I am a large language model trained by Google.
* *Direct Answer:* I am a large language model.
* *Developer:* Trained by Google.
* *Capabilities (briefly):* Text generation, translation, coding, answering questions, etc.
* *Tone:* Helpful, polite, and clear.
* *Option 1 (Simple):* 我是一个由 Google 训练的大型语言模型。(I am a large language model trained by Google.)
* *Option 2 (Detailed):* 我是一个由 Google 训练的大型语言模型。我可以帮你完成很多任务,比如写代码、翻译语言、回答问题或者进行创意写作。(I am a large language model trained by Google. I can help you with many tasks, such as writing code, translating languages, answering questions, or creative writing.)
* The most accurate and concise answer is the standard identity statement.
...done thinking.
I am a large language model trained by Google.
What can you do?
Thinking...
"你能干啥" (What can you do?)
The user wants to know my capabilities, strengths, and limitations.
Helpful, professional, versatile, and clear.
To provide a comprehensive answer, I should categorize my skills. Good categories would be:
* Content Creation (Writing)
* Information Retrieval/Knowledge (Answering)
* Language Translation/Processing
* Coding/Technical Help
* Logical Reasoning/Analysis
* Daily Assistance/Planning
* *Writing:* Emails, essays, poems, scripts, stories, summaries.
* *Answering:* Explaining complex concepts, general trivia, providing facts.
* *Translation:* Multiple languages, polishing text, grammar checks.
* *Coding:* Python, JS, Java, C++, debugging, explaining code.
* *Analysis:* Summarizing long texts, extracting key points, brainstorming.
* *Daily Life:* Itineraries, meal plans, study schedules.
* *Introduction:* Acknowledge the role as an AI assistant.
* *Detailed List (using bullet points for readability):*
1. **文本创作 (Text Creation):** 写作、润色、创意。
2. **回答问题 (Answering Questions):** 知识百科、复杂概念简化。
3. **语言翻译 (Translation):** 多语言互译、地道表达。
4. **编程辅助 (Programming):** 写代码、找 Bug、解释代码。
5. **逻辑分析与总结 (Analysis & Summary):** 摘要、对比、头脑风暴。
6. **生活助手 (Daily Life):** 计划、建议。
* *Closing/Limitation:* Mention that while powerful, I can make mistakes (hallucinations) and don't have a physical body or real-time consciousness.
* *Check:* Did I sound too arrogant? *Adjustment:* Use phrases like "我可以尝试" (I can try) or "协助你" (assist you).
* *Check:* Is the language natural? *Adjustment:* Ensure the Chinese is fluent and professional.
...done thinking.
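Simply put, I am a versatile **intelligent assistant**. I can handle text, code, and logical analysis, and offer creative suggestions.
To give you a clearer picture, I have grouped my capabilities into several categories:
### 1. ✍️ Writing and Editing
* **Writing:** I can help you write emails, work reports, blog posts, novels, poems, speeches, and more.
* **Polishing:** If something you wrote feels insufficiently professional or fluent, I can help you revise it so it reads more naturally.
* **Summarizing:** Give me a long article and I can quickly summarize its key points for you.
* **Translation:** I am proficient in many languages and can produce high-quality translations between Chinese, English, Japanese, French, and others.
### 2. 🔍 Knowledge Lookup and Learning
* **Encyclopedic Q&A:** You can ask me about science, history, culture, geography, and many other subjects.
* **Simplifying complex concepts:** If a theory is hard to understand (such as "quantum entanglement" or "marginal effects in economics"), I can try to explain it in the simplest terms.
* **Study support:** I can help you draw up a study plan, or act as your practice partner while you study a subject.
### 3. 💻 Programming and Technical Support
* **Writing code:** I support Python, JavaScript, Java, C++, HTML/CSS, and many other programming languages.
* **Finding bugs:** Send me the code that throws an error and I can help analyze where the problem is and propose a fix.
* **Explaining code:** If you come across code you do not understand, I can explain its logic line by line.
### 4. 🧠 Logical Analysis and Creative Brainstorming
* **Planning:** For example, "I want to plan a surprise birthday party, any creative ideas?" or "Help me come up with 10 names for a tech company".
* **Logical reasoning:** I can help you compare multiple options and list their pros and cons.
* **Math:** I can handle calculations ranging from basic arithmetic to complex math problems.
### 5. 📅 Everyday Assistant
* **Making plans:** For example, "Help me plan a 5-day Tokyo itinerary" or "a one-month fitness plan for a beginner".
* **Giving suggestions:** For example, "what to eat for dinner" or "what gift would suit an older relative".
---
**⚠️ That said, I also have some limitations:**
* **Timeliness:** Although my knowledge is broad, my awareness of very recent events happening right now may lag.
* **Factual accuracy:** Although I try to provide accurate information, I occasionally "talk nonsense with a straight face" (hallucinate), so please verify critical information.
* **No physical form:** I have no body, so I cannot hand you a glass of water or carry out physical tasks in the real world.
**Which of my capabilities would you like to try now? Or just give me a task!**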
I need to wash my car, but the car wash is only 50 meters from home. Should I walk there or drive there?
Thinking...
Needs to go to a car wash.
The car wash is only 50 meters away from home.
"Should I walk there or drive there?"
* The core purpose of visiting a car wash is to *wash the car*.
* If the user walks there, how will the car get there?
* If the user drives there, they can actually get the car washed.
* *Logical path:* You can't wash a car that isn't at the car wash.
* *Humorous/Obvious angle:* Point out the absurdity of walking to a car wash to wash a car.
* *Nuance (though unlikely):* Maybe they are just going to check if it's open? Or pay for a membership? But the prompt says "I need to wash the car" (我现在需要洗车), implying the action of washing.
* *Initial thought:* "You have to drive, otherwise the car stays home." (Too blunt)
* *Better approach:* "Since you want to wash the car, you obviously need to drive it there!" (Playful and clear)
* *Detailed breakdown:*
1. Direct answer: Drive.
2. Reason: You can't wash a car by walking to the car wash.
3. Joke/Irony: Unless you're planning to push the car 50 meters (which is possible but illogical).
* *Response:* "你应该开车过去。因为如果你走过去,你的车还留在家里,洗车房没法洗你的车啊!😂" (You should drive. Because if you walk, the car stays home, and the car wash can't wash it!)
...done thinking.
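This is a classic logic trap! 😂
**You should drive.**
If you walk there, your car is still at home, and the car wash staff cannot give a bath to a car that is "not present".
Unless your plan is to:
1. Walk over first to see how long the line is.
2. Or push the car there (it is only 50 meters, but that would be exhausting).
**Conclusion: start the engine and go! 🚗💨**
A stick 10 meters tall and 1 cm thick: can it pass through a door 3 meters high and 2 meters wide?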
Thinking...
* Object: A rod (stick).
* Dimensions of object: Length = 10m, Diameter (Thickness) = 1cm.
* Dimensions of opening (door): Height = 3m, Width = 2m.
* Question: Can it pass through?
* *Constraint 1: The "Front" of the rod (Cross-section).*
* The diameter is 1cm. 1cm is much smaller than both 3m and 2m. So, if the rod is pushed through end-first (lengthwise), it fits easily.
* *Constraint 2: The "Side" of the rod (Length).*
* The length is 10m. 10m is larger than both 3m and 2m. So, it cannot pass through sideways (flatly).
* *Constraint 3: The "Diagonal" of the door.*
* Diagonal = $\sqrt{3^2 + 2^2} = \sqrt{9 + 4} = \sqrt{13} \approx 3.61\text{m}$.
* 10m is still larger than 3.61m. So, it cannot pass through diagonally as a rigid flat object.
* The prompt doesn't say the rod has to be carried horizontally or vertically *relative to the door frame*. It just asks if it *can* pass through.
* Standard physical movement: You hold the rod and push it through the door lengthwise.
* Yes, it can.
* Explanation: As long as the rod is moved **lengthwise** (the 10m dimension is the direction of movement), the only dimensions that matter for the opening are the thickness (1cm). Since 1cm < 2m and 1cm < 3m, it passes through easily.
* Wait, is this a riddle?
* Is the rod flexible? (Irrelevant, it fits even if rigid).
* Is it a trick about the space *behind* or *in front* of the door? (Usually not, unless specified).
* Conclusion: It's a test of basic spatial reasoning and the ability to distinguish between the dimension of movement and the dimensions of the object's cross-section.
...done thinking.
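**Yes.**
And very easily.
**The reason is simple:**
You only need to pass it through **lengthwise (along the direction of the stick's length)**.
In that case, what decides whether it fits is no longer the stick's length (10 meters) but its **thickness (1 cm)**. Since 1 cm is far smaller than the door's width (2 meters) and height (3 meters), the stick passes through as easily as a pen through a doorway.
**Unless** you insist on carrying it in **sideways** (parallel to the ground and perpendicular to the plane of the door), in which case 10 meters really will not fit through a 2-meter-wide door. But obviously any normal person would carry it lengthwise. :)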
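The numbers in the model's reasoning are easy to check. Here is a small Python sketch of the two cases it considered, using the dimensions from the prompt. The function names are hypothetical and the stick is treated as a rigid cylinder, which is an assumption.

```python
import math


def fits_end_first(diameter_m, door_width_m, door_height_m):
    """Pushed through lengthwise, only the cross-section has to clear the opening."""
    return diameter_m < min(door_width_m, door_height_m)


def fits_within_door_plane(length_m, door_width_m, door_height_m):
    """Held flat in the plane of the door, the stick can be no longer than the diagonal."""
    return length_m <= math.hypot(door_width_m, door_height_m)


diagonal = math.hypot(2.0, 3.0)  # sqrt(2^2 + 3^2) = sqrt(13)
print(f"door diagonal: {diagonal:.2f} m")  # door diagonal: 3.61 m
print(fits_end_first(0.01, 2.0, 3.0))  # True
print(fits_within_door_plane(10.0, 2.0, 3.0))  # False
```

The end-first check passes because 1 cm is far below both door dimensions, while the in-plane check fails because 10 m exceeds the roughly 3.61 m diagonal, matching the transcript.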
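Test summary
Gemma4 31B runs smoothly on an M4 Pro with 48 GB of RAM, and the reasoning quality was better than expected:
- The chain of thought is clear, and the full reasoning process is visible
- Neither the commonsense reasoning test nor the spatial reasoning test tripped it up
- Its Chinese is good, and the replies read naturally
If you have a Mac with an M4 Pro or better, it is well worth a try.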
Reference
ollama: ollama.com/library/gem…
gemma4: deepmind.google/models/gemm…
benchmark: artificialanalysis.ai
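A small plug
The AI field moves fast: today it is Gemma4, and tomorrow another model may top the leaderboards.
I built an AI daily digest site, AmazingIndex, which uses a quantitative algorithm to pick the most noteworthy AI news from dozens of sources each day, filtering out the noise so you do not miss what matters.
If you are interested, a quick daily glance is more efficient than scrolling Twitter 👀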