Python Web Scraping for Beginners: Write Your First Simple Scraper + Fixes for 3 Pitfalls Every Beginner Hits

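When you first start with Python web scraping, it is easy to get scared off by a wall of errors at the very first step. This post walks through the first scraper I wrote from scratch. The code is simple and runnable, and the pitfalls beginners hit most often are all covered in one place. It is aimed at people who, like me, are just getting started.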

1. Goal

Scrape a simple web page, get its title and content, and understand the basic scraping workflow.

2. Code (runs as-is)

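```python
import requests
from bs4 import BeautifulSoup

# Target URL (for demonstration only; a simple static page)
url = "https://www.baidu.com"

# Browser-like headers so the request is less likely to be rejected as a bot
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
}

try:
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # raise an exception on 4xx/5xx status codes
    response.encoding = response.apparent_encoding  # avoid garbled Chinese text

    soup = BeautifulSoup(response.text, "html.parser")
    title = soup.title.string if soup.title else "No title"

    print("Page title:", title)
    print("Status code:", response.status_code)

except requests.RequestException as e:
    print("Request failed:", e)
```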

3. Three Pitfalls Every Beginner Hits + Fixes

ModuleNotFoundError: No module named 'requests'
- Cause: the library is not installed
- Fix: install both libraries with pip:

```
pip install requests beautifulsoup4
```

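To see which of the two libraries is actually missing before installing anything, a small check like the one below may help. It is only a sketch: it uses the standard library's `importlib.util.find_spec`, and the `REQUIRED` mapping and the `missing_packages` helper are names made up for this illustration. It also shows that Beautiful Soup is installed as `beautifulsoup4` but imported as `bs4`.

```python
import importlib.util

# Import name -> pip package name (hypothetical mapping for this example).
# The two can differ: Beautiful Soup is imported as "bs4".
REQUIRED = {"requests": "requests", "bs4": "beautifulsoup4"}


def missing_packages(required):
    """Return the pip names of packages whose import name cannot be found."""
    return [
        pip_name
        for import_name, pip_name in required.items()
        if importlib.util.find_spec(import_name) is None
    ]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing. Install with: pip install " + " ".join(missing))
else:
    print("All required packages are installed.")
```

If the script reports a package as missing right after a successful install, `pip` is probably installing into a different Python environment than the one running the script.
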
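403 Forbidden / access denied
- Cause: the request carries no browser-like headers, so the server identifies it as a scraper (by default, requests announces itself as python-requests)
- Fix: add a User-Agent header, as in the headers dictionary in the code above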
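
To make the difference visible, the sketch below compares the User-Agent that requests sends by default with the one from the custom headers. It builds the request without sending it, so no network access is needed. The variable names `default_ua` and `prepared` are made up for this example.

```python
import requests

# What requests sends when no User-Agent is given, e.g. "python-requests/2.x.y"
default_ua = requests.utils.default_user_agent()

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
}

# Build the request without sending it, then inspect the final headers
session = requests.Session()
request = requests.Request("GET", "https://www.baidu.com", headers=headers)
prepared = session.prepare_request(request)

print("Default User-Agent:", default_ua)
print("Custom User-Agent: ", prepared.headers["User-Agent"])
```

A custom User-Agent gets past simple checks only. Sites with stricter protection may still return 403.
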

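Garbled text / garbled Chinese characters
- Cause: requests takes the encoding from the response headers. When the headers of a text response declare no charset, it falls back to ISO-8859-1, which garbles Chinese
- Fix: set the encoding before reading response.text:

```python
response.encoding = response.apparent_encoding
```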
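
Why the wrong encoding produces garbage can be reproduced without any network request. The sketch below uses only built-in string methods: it encodes a Chinese string as UTF-8, the way a server might send it, and then decodes the same bytes both ways. The sample text and variable names are chosen for illustration.

```python
# Bytes as a server might send them: Chinese text encoded as UTF-8
raw = "百度一下，你就知道".encode("utf-8")

# Wrong guess: every single byte is turned into one Latin-1 character
garbled = raw.decode("iso-8859-1")

# Right guess: the multi-byte sequences are turned back into Chinese characters
correct = raw.decode("utf-8")

print("Decoded as ISO-8859-1:", garbled)
print("Decoded as UTF-8:     ", correct)
```

The bytes are the same in both cases and only the interpretation differs, which is why changing `response.encoding` is enough to fix the output.
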

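4. Summary

Scraping comes down to three steps: request → parse → extract
As a beginner, get the basic flow working first; don't jump straight into complex anti-scraping countermeasures
Practice on static pages first, then move on to APIs and dynamic pages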
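
One way to keep the three steps separate is to give each its own function, as in the sketch below. The function names `fetch`, `parse`, and `extract` and the `SAMPLE_HTML` string are made up for this illustration, and the demo at the bottom runs on the inline HTML so that it works offline.

```python
import requests
from bs4 import BeautifulSoup


def fetch(url):
    """Step 1, request: download the page and return its HTML text."""
    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    response.encoding = response.apparent_encoding
    return response.text


def parse(html):
    """Step 2, parse: turn the HTML text into a searchable tree."""
    return BeautifulSoup(html, "html.parser")


def extract(soup):
    """Step 3, extract: pull out the title and the paragraph texts."""
    title = soup.title.get_text(strip=True) if soup.title else "No title"
    paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]
    return {"title": title, "paragraphs": paragraphs}


# Offline demo on a hypothetical page; for a real page, use parse(fetch(url))
SAMPLE_HTML = (
    "<html><head><title>Demo</title></head>"
    "<body><p>Hello</p><p>World</p></body></html>"
)
result = extract(parse(SAMPLE_HTML))
print(result)
```

With this split, a change in a site's layout should only require changes to `extract`.
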

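Closing

Coming up next:

Scraping a real website, hands-on
XPath and regular expressions
Handling simple anti-scraping measures
Small hands-on projects

Follow along, and let's learn web scraping from scratch together!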