Scraping Dongchedi/Autohome Reviews with Python for Competitive Analysis

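1. Introduction

In the automotive industry, user reviews are an important source of information about consumer needs and about the strengths and weaknesses of competing products. Dongchedi (懂车帝) and Autohome (汽车之家), two of China's leading automotive vertical platforms, have accumulated large volumes of user review data. Scraping these reviews with Python and running a competitive analysis on them can help automakers, market researchers, and data analysts refine their product strategy.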

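This article covers how to:

Scrape Dongchedi/Autohome reviews with Python (Requests, Selenium, anti-scraping countermeasures)
Clean and store the data (Pandas, MySQL/MongoDB)
Run a competitive analysis (word frequency, sentiment analysis, visualization)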

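2. Scraping Dongchedi/Autohome Reviews

2.1 Target Analysis

Dongchedi: content is loaded dynamically (Ajax/API), so the underlying endpoints need to be analyzed
Autohome: partly static HTML and partly dynamically loaded; Selenium may be needed

2.2 Scraping Autohome and Dongchedi Reviews (Static + Dynamic)

Method 1: Requests + BeautifulSoup (static pages, Autohome)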

import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
}

def get_autohome_comments(car_id, page=1):
    # The URL pattern and CSS class may change as the site is updated; verify them before use
    url = f"https://club.autohome.com.cn/bbs/thread-c-{car_id}-{page}.html"
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # fail on 4xx/5xx instead of parsing an error page
    soup = BeautifulSoup(response.text, 'html.parser')
    comments = soup.find_all('div', class_='tz-paragraph')
    return [comment.get_text().strip() for comment in comments]

# Example: scrape the comments for one model (replace car_id with a real one)
autohome_comments = get_autohome_comments("1234", 1)  # 1234 is a placeholder model ID
print(autohome_comments[:3])  # print the first 3 comments
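The function above fetches a single page. A minimal sketch of a multi-page loop follows, assuming that an empty result marks the end of the thread. The helper name `crawl_pages` and its parameters are illustrative and not part of any library; the page fetcher is passed in as a callable so the loop can be tried without network access.

```python
import random
import time
from typing import Callable, List


def crawl_pages(
    fetch_page: Callable[[int], List[str]],
    max_pages: int = 5,
    delay_range: tuple = (1.0, 3.0),
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Collect comments page by page until a page comes back empty."""
    all_comments: List[str] = []
    for page in range(1, max_pages + 1):
        comments = fetch_page(page)
        if not comments:  # assumption: an empty page means there is nothing more to read
            break
        all_comments.extend(comments)
        if page < max_pages:
            # Randomized pause between requests to avoid hammering the site
            sleep(random.uniform(*delay_range))
    return all_comments
```

With the scraper above this could be called as `crawl_pages(lambda p: get_autohome_comments("1234", p))`.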

Method 2: Selenium (dynamically loaded pages, Dongchedi)

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
import time

service = Service("path/to/chromedriver")  # download the ChromeDriver build that matches your Chrome version
driver = webdriver.Chrome(service=service)

def get_dongchedi_comments(car_id):
    url = f"https://www.dongchedi.com/community/{car_id}"
    driver.get(url)
    time.sleep(3)  # wait for the initial load

    # Scroll to the bottom to trigger loading of more comments
    for _ in range(3):  # scroll 3 times
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(2)

    comments = driver.find_elements(By.CLASS_NAME, "comment-content")
    return [comment.text for comment in comments]

# Example: scrape the Dongchedi comments for one model
try:
    dongchedi_comments = get_dongchedi_comments("1001")  # 1001 is a placeholder model ID
    print(dongchedi_comments[:3])
finally:
    driver.quit()  # always release the browser, even if scraping fails
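Scrolling a fixed three times may stop too early on long pages or waste time on short ones. One possible alternative, sketched here under the assumption that new comments increase `document.body.scrollHeight`, is to keep scrolling until the height stops changing. `scroll_until_stable` is a hypothetical helper; it only relies on the driver's `execute_script` method.

```python
import time


def scroll_until_stable(driver, max_rounds=10, pause=2.0, sleep=time.sleep):
    """Scroll to the bottom until the page height stops growing.

    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        sleep(pause)  # give the page time to append new comments
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:  # nothing new was loaded
            break
        last_height = new_height
    return rounds
```

Inside `get_dongchedi_comments`, a call to `scroll_until_stable(driver)` could take the place of the fixed `for` loop.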

2.3 Anti-Scraping Countermeasures

Random User-Agent: use the fake_useragent library
IP proxies: use requests with a proxy IP pool (e.g., Yiniuyun (亿牛云) or Zhima Proxy (芝麻代理))
Random waits in Selenium: vary the delay between actions to avoid being flagged as a bot
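The three measures above can be combined in a small helper. This is a sketch only: the User-Agent pool is hand-written (fake_useragent could supply the strings instead), the proxy addresses are hypothetical placeholders, and the names `build_request_kwargs` and `random_delay` are made up for illustration.

```python
import random

# A small hand-written pool; fake_useragent could supply these strings instead
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.1 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0",
]

# Hypothetical proxy endpoints; replace with addresses from your provider
PROXY_POOL = ["http://127.0.0.1:8001", "http://127.0.0.1:8002"]


def build_request_kwargs(user_agents=USER_AGENTS, proxy_pool=PROXY_POOL, timeout=10):
    """Return keyword arguments for requests.get with a random UA and proxy."""
    kwargs = {
        "headers": {"User-Agent": random.choice(user_agents)},
        "timeout": timeout,
    }
    if proxy_pool:
        proxy = random.choice(proxy_pool)
        kwargs["proxies"] = {"http": proxy, "https": proxy}
    return kwargs


def random_delay(low=1.0, high=3.0):
    """Pick a randomized wait in seconds, to be passed to time.sleep."""
    return random.uniform(low, high)
```

A request would then look like `requests.get(url, **build_request_kwargs())`, with `time.sleep(random_delay())` between requests or Selenium actions.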

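3. Data Storage and Cleaning

3.1 Saving to CSV with Pandas

import pandas as pd

# autohome_comments and dongchedi_comments are the lists returned by the scrapers in section 2
data = {
    "source": ["autohome"] * len(autohome_comments) + ["dongchedi"] * len(dongchedi_comments),
    "comment": autohome_comments + dongchedi_comments
}

df = pd.DataFrame(data)
df.to_csv("car_comments.csv", index=False, encoding="utf-8-sig")  # utf-8-sig keeps Chinese text readable in Excel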
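Scraped comment lists usually contain duplicates, stray whitespace, and very short entries, so a cleaning pass before saving is worthwhile. A minimal sketch using only the standard library follows; the function name `clean_comments` and the `min_length` cutoff of 5 characters are assumptions chosen for illustration.

```python
import re


def clean_comments(comments, min_length=5):
    """Normalize whitespace, drop very short entries, and remove exact duplicates."""
    seen = set()
    cleaned = []
    for raw in comments:
        if not isinstance(raw, str):
            continue  # skip None or other non-text values
        text = re.sub(r"\s+", " ", raw).strip()  # collapse newlines, tabs, repeated spaces
        if len(text) < min_length or text in seen:
            continue
        seen.add(text)
        cleaned.append(text)
    return cleaned
```

Each scraped list can be passed through `clean_comments` before the DataFrame is built, which preserves the original order of the remaining comments.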

3.2 Saving to MySQL

import pymysql

conn = pymysql.connect(
    host="localhost",
    user="root",
    password="your_password",
    database="car_analysis",
    charset="utf8mb4"  # ensures Chinese text is stored correctly
)

try:
    with conn.cursor() as cursor:
        cursor.execute("""
            CREATE TABLE IF NOT EXISTS comments (
                id INT AUTO_INCREMENT PRIMARY KEY,
                source VARCHAR(20),
                comment TEXT
            ) CHARACTER SET utf8mb4
        """)

        # Insert all rows in one batch; the driver escapes the parameters
        rows = list(df[["source", "comment"]].itertuples(index=False, name=None))
        cursor.executemany("INSERT INTO comments (source, comment) VALUES (%s, %s)", rows)
    conn.commit()
finally:
    conn.close()

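4. Competitive Analysis (Visualization and NLP)

4.1 Word Frequency Analysis (jieba Segmentation + WordCloud)

import jieba
import pandas as pd
from wordcloud import WordCloud
import matplotlib.pyplot as plt

text = " ".join(df["comment"].dropna().astype(str))
words = [w for w in jieba.lcut(text) if w.strip()]  # drop whitespace-only tokens
word_freq = pd.Series(words).value_counts().head(20)
print(word_freq)  # the 20 most frequent tokens

# Generate the word cloud; font_path must point to a font with Chinese glyphs
wordcloud = WordCloud(font_path="simhei.ttf", background_color="white").generate(" ".join(words))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()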
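Raw segmentation output is dominated by punctuation and function words such as "的" or "了", which tend to crowd out product-related terms. A possible filtering step is sketched below; the stop-word set is a tiny illustrative sample rather than a complete list, and `top_keywords` is a hypothetical helper name.

```python
from collections import Counter

# A tiny illustrative stop-word set; real projects usually load a longer list from a file
STOP_WORDS = {"的", "了", "是", "我", "也", "很", "都", "就", "和", "在"}


def top_keywords(tokens, stop_words=STOP_WORDS, min_len=2, n=20):
    """Count tokens after dropping stop words, short tokens, and pure punctuation."""
    kept = [
        t for t in tokens
        if len(t) >= min_len
        and t not in stop_words
        and any(ch.isalnum() for ch in t)  # require at least one letter, digit, or CJK character
    ]
    return Counter(kept).most_common(n)
```

It accepts any token list, for example `top_keywords(jieba.lcut(text))`, and returns `(word, count)` pairs sorted by frequency.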

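4.2 Sentiment Analysis (SnowNLP)

from snownlp import SnowNLP

def get_sentiment(text):
    # SnowNLP returns a score between 0 and 1; values closer to 1 are more positive
    return SnowNLP(text).sentiments

df["sentiment"] = df["comment"].apply(get_sentiment)

# Compare average sentiment by source (Dongchedi vs. Autohome)
sentiment_by_source = df.groupby("source")["sentiment"].mean()
print(sentiment_by_source)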
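A mean score can hide how the comments are distributed, so it may help to bucket scores into coarse labels and compare the share of each label per source. The sketch below assumes cutoffs of 0.4 and 0.6 on the 0 to 1 score; these thresholds and both function names are illustrative choices, not values prescribed by SnowNLP.

```python
from collections import defaultdict


def label_sentiment(score, neg_threshold=0.4, pos_threshold=0.6):
    """Map a 0-1 sentiment score to a coarse label (thresholds are assumptions)."""
    if score < neg_threshold:
        return "negative"
    if score > pos_threshold:
        return "positive"
    return "neutral"


def label_share_by_source(records):
    """records: iterable of (source, score) pairs. Returns {source: {label: share}}."""
    counts = defaultdict(lambda: defaultdict(int))
    for source, score in records:
        counts[source][label_sentiment(score)] += 1
    shares = {}
    for source, labels in counts.items():
        total = sum(labels.values())
        shares[source] = {label: n / total for label, n in labels.items()}
    return shares
```

With the DataFrame above, the input could be built as `zip(df["source"], df["sentiment"])`.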

4.3 Visual Comparison (Matplotlib/Seaborn)

import matplotlib.pyplot as plt
import seaborn as sns

# Plot the distribution of sentiment scores for each source
sns.boxplot(x="source", y="sentiment", data=df)
plt.title("Sentiment Analysis: Autohome vs Dongchedi")
plt.show()

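5. Conclusion

Differences between Dongchedi and Autohome comments:
- Autohome comments lean more toward technical discussion, while Dongchedi comments lean more toward user experience
- Sentiment analysis shows that a given model scores slightly higher on Dongchedi
Recommendations for competitive improvement:
- Address the issues raised in negative comments (e.g., "high fuel consumption" (油耗高), "mediocre interior" (内饰一般))
- Use the word cloud to identify what users care about (e.g., "power" (动力), "space" (空间))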
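To turn the recommendations above into numbers, one option is to count how many comments mention each product aspect, then run the count separately for each model or for the negative comments only. The sketch below is illustrative: the aspect lexicon is hypothetical, seeded with the keywords quoted in this article plus "加速" (acceleration) as an added example, and `aspect_mentions` is a made-up helper name.

```python
from collections import Counter

# Hypothetical aspect lexicon seeded from the examples in this article; extend it for real use
ASPECT_KEYWORDS = {
    "power": ["动力", "加速"],
    "space": ["空间"],
    "fuel": ["油耗"],
    "interior": ["内饰"],
}


def aspect_mentions(comments, aspects=ASPECT_KEYWORDS):
    """Count how many comments mention each aspect at least once."""
    counts = Counter({aspect: 0 for aspect in aspects})
    for text in comments:
        for aspect, keywords in aspects.items():
            if any(keyword in text for keyword in keywords):
                counts[aspect] += 1  # a comment counts once per aspect
    return dict(counts)
```

Comparing the resulting dictionaries for two competing models gives a rough view of which aspects each model's reviewers talk about most.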