Big-Data-Based Sentiment Visualization and Analysis System for Bilibili Trending Video Comments [Python, Hadoop, Spark; suitable for graduation project development, data analysis, recommendation algorithms, topic selection guidance, coursework, theses, and hands-on practice]


💖💖Author: 计算机毕业设计小途 💙💙About me: I have long worked as a computer science instructor and genuinely enjoy teaching. My main languages are Java, WeChat Mini Programs, Python, Golang, and Android, and my projects span big data, deep learning, websites, mini programs, Android apps, and algorithms. I also take on custom project development, code walkthroughs, thesis defense coaching, and documentation writing, and I know a few techniques for lowering plagiarism-check similarity. I like sharing solutions to problems I run into during development and exchanging ideas about technology, so feel free to ask me about anything code-related! 💛💛A word of thanks: thank you all for your attention and support! 💜💜 Website projects | Android/mini-program projects | Big data projects | Deep learning projects


Introduction to the Big-Data-Based Sentiment Visualization and Analysis System for Bilibili Trending Video Comments

The Big-Data-Based Sentiment Visualization and Analysis System for Bilibili Trending Video Comments is a complete big data analysis solution aimed at computer science graduation projects. The system uses the Hadoop + Spark big data stack as its core architecture: massive volumes of Bilibili video comment data are stored on HDFS, Spark SQL handles efficient querying and processing, and Pandas and NumPy support deeper analysis and mining. Two backend implementations are provided, Python + Django and Java + Spring Boot; the frontend is built with Vue + ElementUI, ECharts renders multi-dimensional visualizations, and MySQL stores the structured results. Beyond basic modules such as user management, a personal center, and password changes, the core of the system consists of five analysis modules: a data visualization dashboard, video interaction feature analysis, comment sentiment analysis, comment hotspot analysis, and comment time distribution analysis. By mining trending video comments along multiple dimensions, including sentiment polarity classification, hot topic extraction, and time series analysis, the system helps users understand how audience feedback and sentiment toward a video evolve. The overall design follows a front-end/back-end separation, with clear responsibilities for the big data processing layer, the business logic layer, and the presentation layer. It satisfies the technical depth and breadth expected of a graduation project, demonstrates the practical value of big data techniques in a real scenario, and gives students end-to-end experience with data collection, storage, processing, and visualization.
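
To give a sense of the HDFS + Spark SQL layer described above, here is a minimal sketch of how comment data landed on HDFS could be registered and queried before being handed to the analysis functions shown later. The HDFS URI, Parquet path, and column names are illustrative assumptions, not part of the actual project code.

from pyspark.sql import SparkSession

# Minimal sketch (assumed paths/columns): read comments stored on HDFS as Parquet and query them with Spark SQL
spark = SparkSession.builder.appName("BilibiliCommentHDFSDemo").getOrCreate()
comments = spark.read.parquet("hdfs://namenode:9000/bilibili/comments")  # hypothetical HDFS path
comments.createOrReplaceTempView("comment_info")
top_videos = spark.sql("""
    SELECT video_id, COUNT(*) AS comment_count
    FROM comment_info
    GROUP BY video_id
    ORDER BY comment_count DESC
    LIMIT 10
""")
top_videos.show()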

Demo Video of the Big-Data-Based Sentiment Visualization and Analysis System for Bilibili Trending Video Comments

Demo video

Demo Screenshots of the Big-Data-Based Sentiment Visualization and Analysis System for Bilibili Trending Video Comments

评论情感倾向分析.png

评论时间分布分析.png

视频互动特征分析.png

数据大屏上.png

数据大屏下.png

用户评论热点分析.png

Code Showcase of the Big-Data-Based Sentiment Visualization and Analysis System for Bilibili Trending Video Comments

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, count, avg, sum as spark_sum, when, hour, to_date, length
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, TimestampType, FloatType
import re
import pandas as pd
import numpy as np
from snownlp import SnowNLP
from collections import Counter
import jieba
import jieba.analyse

# Shared SparkSession used by all analysis functions below
spark = SparkSession.builder \
    .appName("BilibiliCommentAnalysis") \
    .config("spark.executor.memory", "4g") \
    .config("spark.driver.memory", "2g") \
    .getOrCreate()
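
# Sentiment analysis module: score each comment with SnowNLP, label it as positive, neutral or
# negative, and aggregate the overall distribution plus a daily sentiment trend for one video.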
def analyze_comment_sentiment(video_id):
    # Load comment records from MySQL through the Spark JDBC data source
    comment_df = spark.read.format("jdbc") \
        .option("url", "jdbc:mysql://localhost:3306/bilibili_db") \
        .option("dbtable", "comment_info") \
        .option("user", "root") \
        .option("password", "123456") \
        .load()
    video_comments = comment_df.filter(col("video_id") == video_id).select("comment_id", "content", "user_id", "create_time")
    comment_list = video_comments.collect()
    sentiment_results = []
    for comment in comment_list:
        content = comment['content']
        if content and len(content.strip()) > 0:
            # Keep only Chinese characters, letters and digits before scoring with SnowNLP
            cleaned_content = re.sub(r"[^\u4e00-\u9fa5a-zA-Z0-9]", "", content)
            snow = SnowNLP(cleaned_content if cleaned_content else str(content))
            sentiment_score = snow.sentiments
            if sentiment_score >= 0.6:
                sentiment_label = "积极"
                sentiment_value = 1
            elif sentiment_score <= 0.4:
                sentiment_label = "消极"
                sentiment_value = -1
            else:
                sentiment_label = "中性"
                sentiment_value = 0
            sentiment_results.append({
                'comment_id': comment['comment_id'],
                'content': content,
                'user_id': comment['user_id'],
                'sentiment_score': round(sentiment_score, 4),
                'sentiment_label': sentiment_label,
                'sentiment_value': sentiment_value,
                'create_time': comment['create_time']
            })
    if not sentiment_results:
        return {'distribution': {}, 'trend': [], 'detail_data': []}
    sentiment_df = spark.createDataFrame(sentiment_results)
    positive_count = sentiment_df.filter(col("sentiment_value") == 1).count()
    negative_count = sentiment_df.filter(col("sentiment_value") == -1).count()
    neutral_count = sentiment_df.filter(col("sentiment_value") == 0).count()
    total_count = sentiment_df.count()
    avg_sentiment = sentiment_df.agg(avg("sentiment_score")).collect()[0][0]
    sentiment_distribution = {
        'positive_count': positive_count,
        'negative_count': negative_count,
        'neutral_count': neutral_count,
        'positive_rate': round(positive_count / total_count * 100, 2) if total_count > 0 else 0,
        'negative_rate': round(negative_count / total_count * 100, 2) if total_count > 0 else 0,
        'neutral_rate': round(neutral_count / total_count * 100, 2) if total_count > 0 else 0,
        'avg_sentiment_score': round(avg_sentiment, 4) if avg_sentiment else 0,
        'total_comments': total_count
    }
    sentiment_pandas_df = sentiment_df.toPandas()
    sentiment_pandas_df['create_date'] = pd.to_datetime(sentiment_pandas_df['create_time']).dt.date
    daily_sentiment = sentiment_pandas_df.groupby('create_date').agg({
        'sentiment_score': 'mean',
        'sentiment_value': lambda x: (x == 1).sum()
    }).reset_index()
    daily_sentiment.columns = ['date', 'avg_score', 'positive_count']
    sentiment_trend = daily_sentiment.to_dict('records')
    return {
        'distribution': sentiment_distribution,
        'trend': sentiment_trend,
        'detail_data': sentiment_pandas_df.to_dict('records')[:100]
    }
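
# Hotspot analysis module: segment comments with jieba, count keyword frequencies, combine
# frequency, likes and replies into a heat score, and group top keywords into rough topic categories.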
def extract_comment_hotspots(video_id, top_n=20):
    comment_df = spark.read.format("jdbc") \
        .option("url", "jdbc:mysql://localhost:3306/bilibili_db") \
        .option("dbtable", "comment_info") \
        .option("user", "root") \
        .option("password", "123456") \
        .load()
    video_comments = comment_df.filter(col("video_id") == video_id).select("comment_id", "content", "like_count", "reply_count")
    comments_list = video_comments.collect()
    stopwords = set(['的', '了', '在', '是', '我', '有', '和', '就', '不', '人', '都', '一', '一个', '上', '也', '很', '到', '说', '要', '去', '你', '会', '着', '没有', '看', '好', '自己', '这'])
    all_words = []
    keyword_comment_map = {}
    for comment in comments_list:
        content = comment['content']
        if content and len(content) > 0:
            words = jieba.cut(content)
            filtered_words = [w for w in words if len(w) >= 2 and w not in stopwords]
            all_words.extend(filtered_words)
            for word in filtered_words:
                if word not in keyword_comment_map:
                    keyword_comment_map[word] = []
                keyword_comment_map[word].append({
                    'comment_id': comment['comment_id'],
                    'content': content,
                    'like_count': comment['like_count'],
                    'reply_count': comment['reply_count']
                })
    word_counter = Counter(all_words)
    top_keywords = word_counter.most_common(top_n)
    hotspot_results = []
    for keyword, frequency in top_keywords:
        related_comments = keyword_comment_map.get(keyword, [])
        total_likes = sum([c['like_count'] for c in related_comments])
        total_replies = sum([c['reply_count'] for c in related_comments])
        heat_score = frequency * 1.0 + total_likes * 0.5 + total_replies * 0.3
        top_comments = sorted(related_comments, key=lambda x: x['like_count'], reverse=True)[:3]
        hotspot_results.append({
            'keyword': keyword,
            'frequency': frequency,
            'total_likes': total_likes,
            'total_replies': total_replies,
            'heat_score': round(heat_score, 2),
            'comment_count': len(related_comments),
            'top_comments': top_comments
        })
    hotspot_results = sorted(hotspot_results, key=lambda x: x['heat_score'], reverse=True)
    hotspot_df = pd.DataFrame(hotspot_results)
    keyword_categories = {}
    for item in hotspot_results[:10]:
        keyword = item['keyword']
        if any(tech_word in keyword for tech_word in ['技术', '效果', '质量', '画质', '剪辑']):
            category = '技术评价'
        elif any(content_word in keyword for content_word in ['内容', '情节', '剧情', '故事', '台词']):
            category = '内容评价'
        elif any(emotion_word in keyword for emotion_word in ['喜欢', '感动', '好看', '精彩', '有趣']):
            category = '情感表达'
        elif any(person_word in keyword for person_word in ['UP主', '博主', '作者', '演员', '主角']):
            category = '人物讨论'
        else:
            category = '其他话题'
        if category not in keyword_categories:
            keyword_categories[category] = []
        keyword_categories[category].append(keyword)
    category_summary = [{'category': k, 'keywords': v, 'count': len(v)} for k, v in keyword_categories.items()]
    return {
        'top_keywords': hotspot_results[:top_n],
        'category_summary': category_summary,
        'total_keyword_count': len(word_counter)
    }
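
# Time distribution module: aggregate comment counts and average likes per day, per hour
# (mapped to morning / afternoon / evening / early-morning periods) and per weekday.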
def analyze_comment_time_distribution(video_id):
    comment_df = spark.read.format("jdbc") \
        .option("url", "jdbc:mysql://localhost:3306/bilibili_db") \
        .option("dbtable", "comment_info") \
        .option("user", "root") \
        .option("password", "123456") \
        .load()
    video_comments = comment_df.filter(col("video_id") == video_id).select("comment_id", "create_time", "like_count", "reply_count")
    video_comments = video_comments.withColumn("comment_date", to_date(col("create_time")))
    video_comments = video_comments.withColumn("comment_hour", hour(col("create_time")))
    daily_stats = video_comments.groupBy("comment_date").agg(
        count("comment_id").alias("comment_count"),
        sum("like_count").alias("total_likes"),
        sum("reply_count").alias("total_replies"),
        avg("like_count").alias("avg_likes")
    ).orderBy("comment_date")
    daily_stats_list = daily_stats.collect()
    daily_distribution = []
    for row in daily_stats_list:
        daily_distribution.append({
            'date': str(row['comment_date']),
            'comment_count': row['comment_count'],
            'total_likes': row['total_likes'],
            'total_replies': row['total_replies'],
            'avg_likes': round(row['avg_likes'], 2) if row['avg_likes'] else 0
        })
    hourly_stats = video_comments.groupBy("comment_hour").agg(
        count("comment_id").alias("comment_count"),
        avg("like_count").alias("avg_likes")
    ).orderBy("comment_hour")
    hourly_stats_list = hourly_stats.collect()
    hourly_distribution = []
    for row in hourly_stats_list:
        hour_value = row['comment_hour']
        if 6 <= hour_value < 12:
            time_period = "上午"
        elif 12 <= hour_value < 18:
            time_period = "下午"
        elif 18 <= hour_value < 24:
            time_period = "晚上"
        else:
            time_period = "凌晨"
        hourly_distribution.append({
            'hour': hour_value,
            'time_period': time_period,
            'comment_count': row['comment_count'],
            'avg_likes': round(row['avg_likes'], 2) if row['avg_likes'] else 0
        })
    comment_pandas_df = video_comments.toPandas()
    comment_pandas_df['create_time'] = pd.to_datetime(comment_pandas_df['create_time'])
    comment_pandas_df['dayofweek'] = comment_pandas_df['create_time'].dt.dayofweek
    weekday_names = {0: '周一', 1: '周二', 2: '周三', 3: '周四', 4: '周五', 5: '周六', 6: '周日'}
    comment_pandas_df['weekday_name'] = comment_pandas_df['dayofweek'].map(weekday_names)
    weekly_stats = comment_pandas_df.groupby('weekday_name').agg({
        'comment_id': 'count',
        'like_count': 'mean'
    }).reset_index()
    weekly_stats.columns = ['weekday', 'comment_count', 'avg_likes']
    weekday_order = ['周一', '周二', '周三', '周四', '周五', '周六', '周日']
    weekly_stats['weekday'] = pd.Categorical(weekly_stats['weekday'], categories=weekday_order, ordered=True)
    weekly_stats = weekly_stats.sort_values('weekday')
    weekly_distribution = weekly_stats.to_dict('records')
    total_comments = video_comments.count()
    date_range = daily_stats.agg({"comment_date": "min"}).collect()[0][0], daily_stats.agg({"comment_date": "max"}).collect()[0][0]
    peak_hour = max(hourly_distribution, key=lambda x: x['comment_count']) if hourly_distribution else None
    peak_day = max(daily_distribution, key=lambda x: x['comment_count']) if daily_distribution else None
    time_summary = {
        'total_comments': total_comments,
        'date_range': {'start': str(date_range[0]), 'end': str(date_range[1])},
        'peak_hour': peak_hour['hour'] if peak_hour else None,
        'peak_hour_count': peak_hour['comment_count'] if peak_hour else 0,
        'peak_day': peak_day['date'] if peak_day else None,
        'peak_day_count': peak_day['comment_count'] if peak_day else 0
    }
    return {
        'daily_distribution': daily_distribution,
        'hourly_distribution': hourly_distribution,
        'weekly_distribution': weekly_distribution,
        'summary': time_summary
    }
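
For reference, the three analysis functions above can be exercised directly from a script before wiring them into the Django backend; the video_id below is only a placeholder, and in the running system the returned dictionaries would be serialized to JSON for the Vue + ECharts frontend.

if __name__ == "__main__":
    # Illustrative usage only: "BV1xx411c7mD" is a placeholder video id, not real data
    demo_video_id = "BV1xx411c7mD"
    sentiment = analyze_comment_sentiment(demo_video_id)
    hotspots = extract_comment_hotspots(demo_video_id, top_n=20)
    time_dist = analyze_comment_time_distribution(demo_video_id)
    print(sentiment['distribution'])
    print(hotspots['category_summary'])
    print(time_dist['summary'])
    spark.stop()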

Documentation Showcase of the Big-Data-Based Sentiment Visualization and Analysis System for Bilibili Trending Video Comments

文档.png
