💡 Sentiment Analysis System for Consumer Behavior in the Influencer Economy | A Complete Python Walkthrough

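This article is a condensed version of an undergraduate thesis from the School of Management, Wuhan Textile University. See the end of the article for how to get the full source code and dataset.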

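🎯 Research Background and Industry Value

The influencer economy wave:

✅ Rise of social commerce: livestreams and short videos have become new shopping scenarios
✅ Fan economy effect: influencer reach converts directly into purchasing power
✅ Emotion-driven consumption: purchase decisions rest increasingly on emotional identification
✅ Growing data value: massive volumes of reviews hold valuable business insight

Limitations of traditional analysis:

❌ Subjective judgment: relies on human experience and lacks data support
❌ Low efficiency: manually processing large volumes of reviews is slow and labor-intensive
❌ Limited depth: latent topics and sentiment tendencies are hard to uncover
❌ Poor timeliness: cannot respond quickly to market changes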

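🏗️ Technical Architecture

End-to-end analysis pipeline

📊 Data collection layer:
├── Crawler: fetches review data
├── Octoparse (八爪鱼) tool: auxiliary data collection
└── Local storage: saves the raw data

🛠️ Data processing layer:
├── Deduplication: removes duplicate reviews
├── Type conversion: maps ratings to sentiment labels
├── Data cleaning: removes irrelevant tokens
└── Word segmentation: splits Chinese text into words

🔍 Feature engineering layer:
├── POS tagging: noun extraction
├── Stop-word filtering: removes uninformative words
└── Sentiment word matching: builds positive and negative lexicons

📈 Analysis and modeling layer:
├── Word cloud visualization: displays keywords
├── Sentiment analysis: determines polarity
└── LDA topic model: mines latent topics

🎯 Application output layer:
├── Product optimization recommendations
├── Marketing strategy guidance
└── User insight report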
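
As a rough illustration of how the layers above chain together, the sketch below treats each processing step as a function that takes and returns a list of comments. The names `dedupe`, `strip_blank`, and `run_pipeline` are hypothetical stand-ins for the real layers, not the project's actual code.

```python
from typing import Callable, Iterable, List

# A stage is any function that maps a list of comments to a new list of comments.
Stage = Callable[[List[str]], List[str]]


def dedupe(comments: List[str]) -> List[str]:
    """Drop exact duplicates while keeping first-seen order."""
    return list(dict.fromkeys(comments))


def strip_blank(comments: List[str]) -> List[str]:
    """Trim whitespace and drop comments that end up empty."""
    trimmed = [comment.strip() for comment in comments]
    return [comment for comment in trimmed if comment]


def run_pipeline(comments: List[str], stages: Iterable[Stage]) -> List[str]:
    """Feed the output of each stage into the next one."""
    for stage in stages:
        comments = stage(comments)
    return comments


raw = ["拍照很清晰", "拍照很清晰", "  ", "物流很快 "]
processed = run_pipeline(raw, [dedupe, strip_blank])
print(processed)
```

Keeping every stage to the same input and output shape means a new step (for example segmentation) can be added to the list without touching the others.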

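Core tech stack

Area	Tool	Use case
Development environment	PyCharm IDE	Writing and debugging code
Word segmentation	jieba	Chinese text segmentation
Visualization	wordcloud	Word cloud generation
Topic model	LDA (gensim)	Latent topic mining
Data processing	Pandas + NumPy	Data cleaning and analysis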

⚡ Core Implementation

1. Data preprocessing module

import re

import jieba
import jieba.posseg as pseg
import pandas as pd


class DataPreprocessor:
    """
    Cleans, labels, and segments raw review data.
    """
    def __init__(self):
        self.stop_words = self.load_stop_words()

    def load_stop_words(self):
        """Load the stop-word list."""
        # Note: '没有' is also in SentimentAnalyzer.negation_words, so filtering
        # it here removes that negation before sentiment scoring.
        stop_words = set([
            '的', '了', '在', '是', '我', '有', '和', '就', 
            '不', '人', '都', '一', '个', '上', '也', '很',
            '到', '说', '要', '去', '你', '会', '着', '没有',
            '看', '好', '自己', '这', '京东', '手机', '荣耀'
        ])
        return stop_words

    def remove_duplicates(self, df):
        """
        Deduplicate: drop reviews whose rows are exact duplicates.
        """
        print("Deduplicating...")
        initial_count = len(df)
        df_cleaned = df.drop_duplicates()
        print(f"Deduplicated: {initial_count} → {len(df_cleaned)} reviews")
        return df_cleaned

    def convert_rating_to_sentiment(self, df, rating_col='rating'):
        """
        Map ratings to sentiment labels.
        1-3 stars → neg (negative), 5 stars → pos (positive), otherwise neutral
        """
        def rating_mapping(score):
            if score in [1, 2, 3]:
                return 'neg'
            elif score == 5:
                return 'pos'
            else:
                return 'neutral'

        df = df.copy()  # avoid modifying the caller's DataFrame in place
        df['sentiment'] = df[rating_col].apply(rating_mapping)
        return df

    def clean_text(self, text):
        """
        Clean text: remove digits, ASCII letters, and special characters.
        """
        if pd.isna(text):
            return ''

        # Remove ASCII digits and letters
        text = re.sub(r'[a-zA-Z0-9]', '', str(text))
        # Remove punctuation (\w keeps Chinese characters)
        text = re.sub(r'[^\w\s]', '', text)
        # Collapse extra whitespace
        text = re.sub(r'\s+', ' ', text).strip()

        return text

    def segment_words(self, text, use_stop_words=True):
        """
        Segment Chinese text into (word, POS flag) pairs.
        """
        # Guard against NaN and other non-string values
        if not isinstance(text, str) or not text:
            return []

        # Segment and POS-tag with jieba
        words = pseg.cut(text)

        # Filter out stop words and single-character tokens. This also drops
        # single-character sentiment and negation words such as '不' or '差'.
        filtered_words = []
        for word, flag in words:
            if (len(word) > 1 and 
                (not use_stop_words or word not in self.stop_words)):
                filtered_words.append((word, flag))

        return filtered_words

    def extract_noun_comments(self, df, text_col='comment'):
        """
        Keep reviews that contain a noun and store their tokens.
        """
        print("Extracting reviews that contain nouns...")

        segmented = df[text_col].apply(self.segment_words)
        # jieba noun tags start with 'n' (n, nr, ns, nz, ...)
        has_noun = segmented.apply(
            lambda pairs: any(flag.startswith('n') for _, flag in pairs)
        )

        noun_df = df[has_noun].copy()
        # Stored as a string, the format the later modules parse back into pairs
        noun_df['segmented_words'] = segmented[has_noun].apply(str)
        print(f"Reviews with nouns: {len(noun_df)}/{len(df)}")
        return noun_df

# Usage example
preprocessor = DataPreprocessor()

# Load the data
df = pd.read_csv('honor50_comments.csv')

# Preprocessing pipeline
df = preprocessor.remove_duplicates(df)
df = preprocessor.convert_rating_to_sentiment(df)
df['cleaned_comment'] = df['comment'].apply(preprocessor.clean_text)
# Segment the cleaned text rather than the raw comment
noun_df = preprocessor.extract_noun_comments(df, text_col='cleaned_comment')
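
To make the cleaning and labeling rules concrete, here is a small standalone sketch that applies the same three regex steps and the same rating thresholds to a few made-up reviews. The sample reviews and the function name `rating_to_label` are illustrative assumptions; only the standard library is used, so jieba and pandas are not needed to try it.

```python
import re


def clean_text(text: str) -> str:
    """Apply the same three regex steps as DataPreprocessor.clean_text."""
    text = re.sub(r"[a-zA-Z0-9]", "", text)    # ASCII letters and digits
    text = re.sub(r"[^\w\s]", "", text)        # punctuation; \w keeps CJK characters
    return re.sub(r"\s+", " ", text).strip()   # collapse whitespace


def rating_to_label(score: int) -> str:
    """1-3 stars -> 'neg', 5 stars -> 'pos', anything else -> 'neutral'."""
    if score in (1, 2, 3):
        return "neg"
    if score == 5:
        return "pos"
    return "neutral"


# Made-up reviews with their star ratings
samples = [
    ("手机很好用！！！5G网络OK", 5),
    ("电池不行，，，用了2天就卡", 2),
    ("一般般吧。。。", 4),
]
rows = [(clean_text(text), rating_to_label(score)) for text, score in samples]
for row in rows:
    print(row)
```

Because punctuation is deleted rather than replaced with a space, neighboring clauses are joined into one string before segmentation; whether that matters depends on how the segmenter handles the joined text.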

2. Sentiment analysis module

import ast
from collections import Counter

import matplotlib.pyplot as plt
import numpy as np                  # used by the LDA module
import pandas as pd
from gensim import corpora, models  # used by the LDA module
from wordcloud import WordCloud


class SentimentAnalyzer:
    """
    Lexicon-based sentiment analysis.
    """
    def __init__(self):
        self.positive_words = self.load_sentiment_words('positive')
        self.negative_words = self.load_sentiment_words('negative')
        self.negation_words = {
            '不', '没', '无', '非', '莫', '弗', '毋', '未', '否', '别',
            '無', '休', '不是', '不能', '不可', '没有', '不用', '不要', 
            '从没', '不太'
        }

    def load_sentiment_words(self, sentiment_type):
        """Load the sentiment lexicon."""
        # Base sentiment lexicon (HowNet sentiment analysis word set)
        if sentiment_type == 'positive':
            base_words = {'好', '优秀', '满意', '喜欢', '不错', '漂亮', '支持'}
        else:
            base_words = {'差', '坏', '不满', '讨厌', '不好', '糟糕', '反对'}

        # Add domain-specific words
        domain_words = {
            'positive': {'满意', '好评', '很快', '超值', '给力', '支持', '完美', '喜欢'},
            'negative': {'差评', '贵', '高', '卡', '不好', '慢', '问题'}
        }

        base_words.update(domain_words.get(sentiment_type, set()))
        return base_words

    @staticmethod
    def _parse_tokens(value):
        """Accept a list of (word, flag) pairs or its string form."""
        if isinstance(value, str):
            return ast.literal_eval(value)  # safer than eval() for stored data
        return value

    def calculate_sentiment_score(self, words):
        """
        Compute a sentiment score from (word, flag) pairs.
        """
        score = 0
        negation_count = 0

        for word, _pos in words:
            # Count negation words until the next sentiment word
            if word in self.negation_words:
                negation_count += 1
                continue

            # Match against the sentiment lexicons
            if word in self.positive_words:
                sentiment_value = 1
            elif word in self.negative_words:
                sentiment_value = -1
            else:
                continue

            # Negation handling: an odd number of negations flips the sign
            if negation_count % 2 == 1:
                sentiment_value = -sentiment_value

            score += sentiment_value
            negation_count = 0  # reset the negation counter

        return score

    def analyze_sentiments(self, df):
        """
        Batch sentiment analysis over the 'segmented_words' column.
        """
        sentiments = []
        scores = []

        for _, row in df.iterrows():
            # Assumes the segmentation result is already stored in the DataFrame
            words = self._parse_tokens(row['segmented_words'])
            score = self.calculate_sentiment_score(words)

            if score > 0:
                sentiment = 'pos'
            elif score < 0:
                sentiment = 'neg'
            else:
                sentiment = 'neutral'

            sentiments.append(sentiment)
            scores.append(score)

        df = df.copy()  # avoid modifying the caller's DataFrame in place
        df['predicted_sentiment'] = sentiments
        df['sentiment_score'] = scores

        return df

    def generate_wordcloud(self, texts, title, sentiment_type):
        """
        Generate a word cloud of sentiment words.
        """
        # Collect the words that belong to the requested polarity
        sentiment_words = []
        for text in texts:
            for word, _pos in self._parse_tokens(text):
                if (sentiment_type == 'positive' and word in self.positive_words) or \
                   (sentiment_type == 'negative' and word in self.negative_words):
                    sentiment_words.append(word)

        if not sentiment_words:
            print(f"No {sentiment_type} sentiment words found")
            return None

        # Build the cloud from word counts so wordcloud does not re-tokenize
        # the text or merge neighboring words into bigrams
        wordcloud = WordCloud(
            font_path='SimHei.ttf',  # a font with CJK glyphs; adjust the path for your system
            width=800,
            height=600,
            background_color='white',
            max_words=100
        ).generate_from_frequencies(Counter(sentiment_words))

        plt.figure(figsize=(10, 8))
        plt.imshow(wordcloud, interpolation='bilinear')
        plt.axis('off')
        plt.title(f'{title} - {sentiment_type} sentiment word cloud', fontsize=16)
        plt.tight_layout()
        return plt

# Usage example
analyzer = SentimentAnalyzer()
df_with_sentiment = analyzer.analyze_sentiments(noun_df)

# Generate the word clouds
positive_texts = df_with_sentiment[df_with_sentiment['predicted_sentiment'] == 'pos']['segmented_words']
negative_texts = df_with_sentiment[df_with_sentiment['predicted_sentiment'] == 'neg']['segmented_words']

positive_wc = analyzer.generate_wordcloud(positive_texts, 'Huawei Honor 50', 'positive')
negative_wc = analyzer.generate_wordcloud(negative_texts, 'Huawei Honor 50', 'negative')
plt.show()
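
The negation rule is easier to follow on a few hand-segmented examples. The sketch below reimplements the same counting logic on plain token lists. The tiny lexicons and the example sentences are illustrative assumptions, not the project's real word lists.

```python
# Tiny illustrative lexicons (assumptions, far smaller than a real lexicon)
POSITIVE = {"喜欢", "满意", "不错", "清晰"}
NEGATIVE = {"差评", "糟糕", "卡顿"}
NEGATIONS = {"不", "没有", "不是", "不太"}


def score_tokens(tokens):
    """Sum +1/-1 per sentiment word; an odd number of preceding negations flips the sign."""
    score = 0
    pending_negations = 0
    for token in tokens:
        if token in NEGATIONS:
            pending_negations += 1
            continue
        if token in POSITIVE:
            value = 1
        elif token in NEGATIVE:
            value = -1
        else:
            continue  # other words leave pending negations untouched
        if pending_negations % 2 == 1:
            value = -value
        score += value
        pending_negations = 0  # negations apply to one sentiment word only
    return score


examples = {
    "plain positive": ["拍照", "清晰", "满意"],
    "single negation": ["不", "喜欢"],
    "double negation": ["不是", "不", "满意"],
    "mixed": ["屏幕", "不错", "系统", "卡顿"],
}
scores = {name: score_tokens(tokens) for name, tokens in examples.items()}
print(scores)
```

One consequence of this design is that a negation stays pending across any number of non-sentiment words, so a negation early in a long review can flip a sentiment word much later on. Limiting negations to a small window before the sentiment word would be one way to tighten this.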

3. LDA topic analysis module

import ast

import numpy as np
from gensim import corpora, models
from sklearn.metrics.pairwise import cosine_similarity


class LDAAnalyzer:
    """
    LDA topic analysis.
    """
    def __init__(self):
        self.dictionary = None
        self.corpus = None

    def prepare_lda_data(self, segmented_texts):
        """
        Build the dictionary and bag-of-words corpus for LDA.
        """
        # Parse stored token lists safely (ast.literal_eval instead of eval)
        texts = []
        for item in segmented_texts:
            pairs = ast.literal_eval(item) if isinstance(item, str) else item
            texts.append([word for word, _pos in pairs])

        # Build the dictionary
        self.dictionary = corpora.Dictionary(texts)

        # Build the corpus
        self.corpus = [self.dictionary.doc2bow(text) for text in texts]

        return self.dictionary, self.corpus

    def find_optimal_topics(self, corpus, dictionary, max_topics=10):
        """
        Search for a topic count by average similarity between topics.
        """
        topic_similarities = []

        for k in range(2, max_topics + 1):
            # Train an LDA model with k topics
            lda_model = models.LdaModel(
                corpus=corpus,
                id2word=dictionary,
                num_topics=k,
                random_state=42,
                passes=10
            )

            # Build one vector per topic from its top 100 terms
            topic_vectors = []
            for i in range(k):
                topic_vec = dict(lda_model.get_topic_terms(i, topn=100))
                vector = [topic_vec.get(idx, 0) for idx in range(len(dictionary))]
                topic_vectors.append(vector)

            # Average cosine similarity over distinct topic pairs only.
            # Averaging the whole matrix with a zeroed diagonal would divide by
            # k*k instead of k*(k-1) and bias small k toward lower values.
            similarity_matrix = cosine_similarity(topic_vectors)
            pair_similarities = similarity_matrix[np.triu_indices(k, k=1)]
            avg_similarity = pair_similarities.mean()

            topic_similarities.append((k, avg_similarity))
            print(f"Topics: {k}, average similarity: {avg_similarity:.4f}")

        return topic_similarities

    def train_lda_model(self, corpus, dictionary, num_topics):
        """
        Train the LDA model.
        """
        lda_model = models.LdaModel(
            corpus=corpus,
            id2word=dictionary,
            num_topics=num_topics,
            random_state=42,
            passes=15,
            alpha='auto',
            per_word_topics=True
        )

        return lda_model

    def display_topics(self, lda_model, num_words=10):
        """
        Print the keywords of each topic.
        """
        topics = lda_model.print_topics(num_words=num_words)
        for idx, topic in topics:
            print(f"Topic {idx + 1}: {topic}")

        return topics

# Usage example
lda_analyzer = LDAAnalyzer()

# Prepare the data
dictionary, corpus = lda_analyzer.prepare_lda_data(noun_df['segmented_words'])

# Search for the topic count
topic_similarities = lda_analyzer.find_optimal_topics(corpus, dictionary)

# Train the LDA model (pick the topic count with the lowest similarity; 3 is used here)
optimal_topics = 3
lda_model = lda_analyzer.train_lda_model(corpus, dictionary, optimal_topics)

# Show the topics
topics = lda_analyzer.display_topics(lda_model)
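
The topic-count search comes down to one number per candidate k: the average cosine similarity over all distinct topic pairs, with lower meaning better-separated topics. The sketch below shows that calculation with the standard library only. The topic-word vectors are made-up values over a four-word vocabulary, not output from a trained model.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_similarity(topic_vectors):
    """Average cosine similarity over all distinct topic pairs."""
    pairs = list(combinations(topic_vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Hypothetical topic-word distributions for two candidate topic counts
candidates = {
    2: [[0.7, 0.3, 0.0, 0.0], [0.0, 0.0, 0.4, 0.6]],
    3: [[0.7, 0.3, 0.0, 0.0], [0.6, 0.4, 0.0, 0.0], [0.0, 0.0, 0.4, 0.6]],
}
similarities = {k: mean_pairwise_similarity(vs) for k, vs in candidates.items()}
best_k = min(similarities, key=similarities.get)
print(similarities)
print("lowest average similarity at k =", best_k)
```

In this toy data the third topic nearly duplicates the first, which raises the average similarity for k = 3, so the search prefers k = 2.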

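📊 Analysis Results

1. Sentiment analysis evaluation

Confusion matrix:

Actual \ Predicted	Negative (neg)	Positive (pos)	Total
Negative (neg)	8	8	16
Positive (pos)	75	626	701
Total	83	634	717

📊 Accuracy: 88.42% ((8 + 626) / 717), which suggests lexicon-based sentiment analysis works reasonably well on e-commerce reviews. The sample is heavily skewed toward positive reviews (701 of 717), and only 8 of the 16 negative reviews are identified, so accuracy alone overstates how well negative reviews are detected.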
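
The figures behind the accuracy number can be recomputed directly from the confusion matrix. The short sketch below does this with the standard library and also reports per-class precision and recall plus a majority-class baseline. The variable and function names are illustrative.

```python
# Keys are (actual label, predicted label); counts come from the confusion matrix
confusion = {
    ("neg", "neg"): 8,
    ("neg", "pos"): 8,
    ("pos", "neg"): 75,
    ("pos", "pos"): 626,
}
labels = ("neg", "pos")
total = sum(confusion.values())


def actual_total(label):
    """Number of reviews whose actual label is `label`."""
    return sum(confusion[(label, predicted)] for predicted in labels)


def predicted_total(label):
    """Number of reviews predicted as `label`."""
    return sum(confusion[(actual, label)] for actual in labels)


accuracy = sum(confusion[(label, label)] for label in labels) / total
recall = {label: confusion[(label, label)] / actual_total(label) for label in labels}
precision = {label: confusion[(label, label)] / predicted_total(label) for label in labels}
# Accuracy of always predicting the most frequent actual class
majority_baseline = max(actual_total(label) for label in labels) / total

print(f"accuracy          = {accuracy:.4f}")
for label in labels:
    print(f"{label}: precision = {precision[label]:.4f}, recall = {recall[label]:.4f}")
print(f"majority baseline = {majority_baseline:.4f}")
```

With these counts, always predicting pos would score higher on accuracy than the lexicon method, which is why per-class precision and recall are worth reporting alongside it.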

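2. Topic analysis of positive reviews

LDA topic mining results:

Topic 1	Topic 2	Topic 3
like (喜欢)	good (不错)	speed (速度)
camera (拍照)	satisfied (满意)	appearance (外观)
performance (运行)	beautiful (漂亮)	very fast (很快)
hand feel (手感)	support (支持)	effect (效果)
sharp (清晰)	color (颜色)	screen (屏幕)

Topic interpretation:

🎯 Topic 1: camera and running performance - reflects the core functional experience
💝 Topic 2: satisfaction and exterior design - reflects emotional identification and aesthetic needs
⚡ Topic 3: response speed and display quality - focuses on a smooth user experience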

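3. Topic analysis of negative reviews

Problem focus areas:

Topic 1	Topic 2	Topic 3
camera (拍照)	high (高)	hand feel (手感)
pixels (像素)	really (真的)	bad (不好)
performance (运行)	build quality (做工)	smooth (流畅)
thin and light (轻薄)	feeling (感觉)	especially (特别)

Problem diagnosis:

🔍 Topic 1: photo quality and laggy performance - core features need optimization
🛠️ Topic 2: high price and build issues - concerns about value for money and quality
📱 Topic 3: hand feel and smoothness - the physical interaction feels unsatisfying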

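🎯 Business Value

For phone manufacturers

📱 Product optimization: focus on core pain points such as camera and running speed
💰 Pricing strategy: adjust the price or raise perceived value
🎨 Design improvement: improve exterior feel and build quality
🔧 Quality control: tighten quality checks to reduce defective units reaching customers

For marketing strategy

🎯 Highlight selling points: emphasize strengths such as camera and speed in promotion
💬 Messaging: adapt marketing language to what users care about
⭐ Reputation management: encourage and amplify positive reviews
🔄 Product iteration: use user feedback to guide new product development

For consumer insight

👥 User profiles: understand how concerns differ across user groups
💭 Need discovery: identify latent needs and directions for improvement
📈 Trend tracking: follow how user reviews change over time
🎪 Experience optimization: improve the purchase and usage experience end to end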

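💡 Project Highlights

Technical features

Combined techniques: brings together LDA, sentiment analysis, and visualization
Domain adaptation: sentiment lexicon tuned for e-commerce reviews
Topic mining: uncovers latent topics in user reviews
Practical focus: results feed directly into business decisions

Method strengths

Interpretable: lexicon-based results are easy to understand
Efficient: lighter and faster than deep learning models
Extensible: the framework is easy to extend to other product domains
Business value: results have clear commercial applications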

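🚀 Directions for Further Improvement

Technical enhancements

🤖 Deep learning: bring in pretrained models such as BERT to improve accuracy
🔄 Real-time analysis: build a streaming pipeline for real-time monitoring
📊 Multi-dimensional analysis: segment the analysis by user profile
🌐 Cross-platform integration: combine data from multiple e-commerce platforms

Feature extensions

📈 Trend analysis: track how reviews change over time
👥 User segmentation: segment users by review behavior
🏆 Competitor comparison: compare against competitor review data
🎯 Personalized recommendation: recommend products based on review content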


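🎁 Getting the Resources

Complete project package:

✅ Full source code of the sentiment analysis system
✅ Cleaned Huawei Honor 50 dataset
✅ Complete LDA topic analysis code
✅ Sentiment lexicons and stop-word list
✅ Chart generation code

How to get it: this project includes the complete technical implementation and business analysis; the full resources are available for a fee.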

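💬 Technical Discussion

FAQ:

Q: How can sentiment analysis accuracy be improved further?
A: Combine the approach with a deep learning model, or expand the sentiment lexicon for the specific domain.

Q: How is the number of LDA topics chosen?
A: The project provides a search based on topic similarity; it can also be combined with the perplexity metric.

Q: Can this be used to analyze other products?
A: Yes. The framework is generic; only the data source and the domain lexicon need to change.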
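
For readers unfamiliar with perplexity as a second criterion for the topic count, the sketch below shows the underlying formula: the exponential of the negative mean log-likelihood per held-out token, where lower is better. The per-token probabilities are made-up values; in practice they would come from evaluating each trained model on held-out reviews.

```python
import math


def perplexity(token_log_probs):
    """exp of the negative mean natural-log likelihood per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Hypothetical per-token probabilities on held-out reviews for two topic counts
held_out = {
    3: [math.log(p) for p in (0.20, 0.10, 0.25, 0.05)],
    5: [math.log(p) for p in (0.10, 0.05, 0.20, 0.05)],
}
scores = {k: perplexity(log_probs) for k, log_probs in held_out.items()}
best_k = min(scores, key=scores.get)
print(scores)
print("lowest perplexity at k =", best_k)
```

Perplexity and topic similarity can disagree, so a reasonable approach is to shortlist topic counts with one metric and check the shortlist against the other, then inspect the topic keywords by hand.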

✨ If you found this project helpful, please like, bookmark, and follow! ✨