Big-Data Douyin Public Opinion Analysis System for Driverless Ride-Hailing | A Sentiment Analysis System Built on 5 Core Technologies: A Complete Hadoop + Spark + Python Implementation


💖💖 Author: 计算机毕业设计江挽 💙💙 About me: I have long worked as a computer science instructor and genuinely enjoy teaching. My languages include Java, WeChat Mini Programs, Python, Golang, and Android, and my projects span big data, deep learning, websites, mini programs, Android apps, and algorithms. I regularly take on custom project development, code walkthroughs, thesis-defense coaching, and documentation writing, and I know a few techniques for reducing similarity scores. I like sharing solutions to problems I hit during development and talking shop, so feel free to ask me anything code-related! 💛💛 A word of thanks: I appreciate everyone's attention and support! 💜💜 Website projects · Android/mini-program projects · Big-data projects · Deep-learning projects

Introduction to the Big-Data Douyin Public Opinion Analysis System for Driverless Ride-Hailing

The Douyin public opinion analysis system for driverless ride-hailing is a comprehensive monitoring and analysis platform built on a big-data stack, designed to mine and analyze public discussion of driverless ride-hailing on Douyin and similar social media platforms. The Hadoop distributed storage framework serves as the data layer, and the Spark engine handles collection, cleaning, and analysis of large volumes of opinion data; the core algorithm modules are written in Python, with Pandas and NumPy used for statistical computation. The front end is built with Vue and the ElementUI component library, and analysis results are rendered as intuitive charts with Echarts. The back end is a Django application exposing RESTful APIs and provides the core modules for user management, opinion-data management, opinion analysis, and system administration. The platform tracks how topics related to driverless ride-hailing spread online, analyzes sentiment tendencies, and follows trending events, giving companies and research institutions data-driven decision support.
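As a sketch of how the Django back end might hand Spark aggregates to the Echarts front end (the function name and payload shape below are illustrative assumptions, not taken from the project code), sentiment counts can be reshaped into the `{"name": …, "value": …}` items an Echarts pie series consumes:

```python
def to_echarts_pie(rows):
    """Reshape (category, count) rows, e.g. collected from a Spark
    groupBy aggregation, into an Echarts pie-chart option dict."""
    return {
        "tooltip": {"trigger": "item"},
        "series": [{
            "type": "pie",
            "data": [{"name": cat, "value": n} for cat, n in rows],
        }],
    }

# Example counts, as they might come back from a daily_sentiment.collect():
payload = to_echarts_pie([("positive", 120), ("negative", 45), ("neutral", 85)])
```

A Django view would then return this dict via `JsonResponse` for the Vue layer to feed directly into an Echarts instance.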

Demo Video of the Big-Data Douyin Public Opinion Analysis System for Driverless Ride-Hailing

Demo video

Screenshots of the Big-Data Douyin Public Opinion Analysis System for Driverless Ride-Hailing

(Demo screenshots omitted; the original images failed to upload.)

Code Highlights of the Big-Data Douyin Public Opinion Analysis System for Driverless Ride-Hailing

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, count, avg, desc, asc
from pyspark.sql.types import StringType, FloatType
import jieba
from datetime import datetime, timedelta

spark = (
    SparkSession.builder
    .appName("UnmannedVehicleSentimentAnalysis")
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
    .getOrCreate()
)

def sentiment_analysis_processing():
    """舆情数据情感分析核心处理函数"""
    sentiment_df = spark.read.option("header", "true").csv("hdfs://localhost:9000/sentiment_data/douyin_comments.csv")
    sentiment_df = sentiment_df.filter(col("content").isNotNull() & (col("content") != ""))
    def analyze_sentiment_score(text):
        if not text or len(text.strip()) == 0:
            return 0.0
        positive_words = ['好', '棒', '赞', '优秀', '方便', '安全', '先进', '未来', '智能', '高科技']
        negative_words = ['差', '坏', '危险', '担心', '害怕', '不安全', '失业', '取代', '威胁', '不信任']
        words = list(jieba.cut(text))
        positive_count = sum(1 for word in words if word in positive_words)
        negative_count = sum(1 for word in words if word in negative_words)
        total_words = len(words)
        if total_words == 0:
            return 0.0
        sentiment_score = (positive_count - negative_count) / total_words
        return max(-1.0, min(1.0, sentiment_score))
    def classify_sentiment(score):
        if score > 0.1:
            return "positive"
        elif score < -0.1:
            return "negative" 
        else:
            return "neutral"
    sentiment_udf = spark.udf.register("sentiment_analysis", analyze_sentiment_score, FloatType())
    classify_udf = spark.udf.register("classify_sentiment", classify_sentiment, StringType())
    result_df = sentiment_df.withColumn("sentiment_score", sentiment_udf(col("content")))
    result_df = result_df.withColumn("sentiment_category", classify_udf(col("sentiment_score")))
    daily_sentiment = result_df.groupBy(col("publish_date"), col("sentiment_category")).agg(count("*").alias("count"))
    avg_sentiment_by_topic = result_df.groupBy("topic_keyword").agg(avg("sentiment_score").alias("avg_sentiment"), count("*").alias("comment_count"))
    trend_analysis = result_df.groupBy(col("publish_date")).agg(avg("sentiment_score").alias("daily_avg_sentiment"), count("*").alias("daily_comment_count")).orderBy(asc("publish_date"))
    daily_sentiment.write.mode("overwrite").option("header", "true").csv("hdfs://localhost:9000/analysis_results/daily_sentiment")
    avg_sentiment_by_topic.write.mode("overwrite").option("header", "true").csv("hdfs://localhost:9000/analysis_results/topic_sentiment")
    trend_analysis.write.mode("overwrite").option("header", "true").csv("hdfs://localhost:9000/analysis_results/sentiment_trend")
    result_df.cache()  # reused by several counts below
    total_comments = result_df.count()
    return {
        'total_comments': total_comments,
        'positive_ratio': result_df.filter(col("sentiment_category") == "positive").count() / total_comments,
        'negative_ratio': result_df.filter(col("sentiment_category") == "negative").count() / total_comments,
        'neutral_ratio': result_df.filter(col("sentiment_category") == "neutral").count() / total_comments,
        'overall_sentiment': result_df.agg(avg("sentiment_score")).collect()[0][0]
    }
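The lexicon-based scoring above can be tried standalone. The sketch below replicates the scoring and thresholding logic but swaps jieba segmentation for a naive whitespace split, so the example has no third-party dependency (a simplification, not the production tokenizer):

```python
POSITIVE = ['好', '棒', '赞', '优秀', '方便', '安全']
NEGATIVE = ['差', '坏', '危险', '担心', '害怕', '不安全']

def simple_sentiment_score(text):
    """Simplified stand-in for analyze_sentiment_score: lexicon hits over a
    whitespace split instead of jieba.cut, normalized by token count."""
    if not text or not text.strip():
        return 0.0
    words = text.split()  # naive tokenization; the real pipeline uses jieba
    pos = sum(1 for w in words if any(p in w for p in POSITIVE))
    neg = sum(1 for w in words if any(n in w for n in NEGATIVE))
    score = (pos - neg) / len(words)
    return max(-1.0, min(1.0, score))

def classify_sentiment(score):
    """Same thresholds as the Spark UDF: +/-0.1 separates the classes."""
    if score > 0.1:
        return "positive"
    elif score < -0.1:
        return "negative"
    return "neutral"
```

For example, `"安全 方便"` scores 1.0 and classifies as `"positive"`, while `"危险 害怕"` scores -1.0 and classifies as `"negative"`.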

def hotspot_event_detection():
    """热点事件检测与分析核心处理函数"""
    event_df = spark.read.option("header", "true").csv("hdfs://localhost:9000/event_data/douyin_posts.csv")
    event_df = event_df.filter(col("content").isNotNull() & (col("likes_count").isNotNull()) & (col("comments_count").isNotNull()) & (col("shares_count").isNotNull()))
    event_df = event_df.withColumn("engagement_score", col("likes_count") + col("comments_count") * 2 + col("shares_count") * 3)
    def extract_keywords(text):
        if not text:
            return ""
        keywords = ['无人驾驶', '自动驾驶', '网约车', '滴滴', '安全', '事故', '测试', '商用', '技术', 'AI', '人工智能', '未来', '交通']
        # return a comma-joined string so the result matches the StringType UDF registration
        return ",".join(keyword for keyword in keywords if keyword in text)
    def calculate_viral_index(likes, comments, shares, publish_hours_ago):
        likes, comments, shares = likes or 0, comments or 0, shares or 0
        if publish_hours_ago is None or publish_hours_ago <= 0:
            publish_hours_ago = 1
        base_score = likes + comments * 2 + shares * 3
        time_decay = 1 / (1 + publish_hours_ago / 24)
        return float(base_score * time_decay)
    keyword_udf = spark.udf.register("extract_keywords", extract_keywords, StringType())
    viral_udf = spark.udf.register("calculate_viral", calculate_viral_index, FloatType())
    current_time = datetime.now()
    event_df = event_df.withColumn("hours_since_publish", (col("publish_timestamp").cast("long") - current_time.timestamp()) / 3600)
    event_df = event_df.withColumn("viral_index", viral_udf(col("likes_count"), col("comments_count"), col("shares_count"), col("hours_since_publish")))
    hotspot_threshold = event_df.agg(avg("viral_index")).collect()[0][0] * 2
    hot_events = event_df.filter(col("viral_index") > hotspot_threshold).orderBy(desc("viral_index"))
    recent_events = event_df.filter(col("publish_date") >= (datetime.now() - timedelta(days=7)).strftime('%Y-%m-%d'))
    from pyspark.sql.functions import sum as spark_sum  # aliased so Python's builtin sum is not shadowed
    trending_events = recent_events.groupBy("topic_keyword").agg(spark_sum("viral_index").alias("total_viral_score"), count("*").alias("post_count"), avg("engagement_score").alias("avg_engagement")).orderBy(desc("total_viral_score"))
    event_timeline = event_df.groupBy(col("publish_date")).agg(spark_sum("viral_index").alias("daily_viral_score"), count("*").alias("daily_post_count"), avg("engagement_score").alias("daily_avg_engagement")).orderBy(asc("publish_date"))
    geographic_hotspots = event_df.groupBy("user_location").agg(spark_sum("viral_index").alias("location_viral_score"), count("*").alias("location_post_count")).filter(col("location_post_count") > 10).orderBy(desc("location_viral_score"))
    hot_events.write.mode("overwrite").option("header", "true").csv("hdfs://localhost:9000/analysis_results/hot_events")
    trending_events.write.mode("overwrite").option("header", "true").csv("hdfs://localhost:9000/analysis_results/trending_topics")
    event_timeline.write.mode("overwrite").option("header", "true").csv("hdfs://localhost:9000/analysis_results/event_timeline")
    return {
        'total_hot_events': hot_events.count(),
        'top_trending_topic': trending_events.first()['topic_keyword'] if trending_events.count() > 0 else None,
        'peak_activity_date': event_timeline.orderBy(desc("daily_viral_score")).first()['publish_date'] if event_timeline.count() > 0 else None,
        'average_daily_engagement': event_timeline.agg(avg("daily_avg_engagement")).collect()[0][0]
    }
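The time-decay weighting inside calculate_viral_index can be checked in isolation (the engagement numbers below are made up for illustration):

```python
def viral_index(likes, comments, shares, hours_since_publish):
    """Engagement weighted 1/2/3 for likes/comments/shares, then decayed
    by 1 / (1 + hours/24): a day-old post scores half its raw engagement."""
    if hours_since_publish <= 0:
        hours_since_publish = 1
    base = likes + comments * 2 + shares * 3
    decay = 1 / (1 + hours_since_publish / 24)
    return base * decay

# A post with 100 likes, 50 comments, 20 shares:
# base = 100 + 100 + 60 = 260; at 24 hours old, decay = 0.5
score = viral_index(100, 50, 20, 24)  # 130.0
```

The hyperbolic decay never reaches zero, so older posts are de-emphasized rather than dropped, which keeps slow-burning events visible in the ranking.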

def public_opinion_trend_analysis():
    """公众舆论趋势分析核心处理函数"""
    opinion_df = spark.read.option("header", "true").csv("hdfs://localhost:9000/opinion_data/public_comments.csv")
    opinion_df = opinion_df.filter(col("content").isNotNull() & (col("publish_date").isNotNull()))
    def extract_opinion_dimensions(text):
        safety_keywords = ['安全', '事故', '危险', '风险', '保障', '可靠']
        tech_keywords = ['技术', '算法', '传感器', '雷达', '摄像头', 'AI', '人工智能']
        economic_keywords = ['价格', '费用', '成本', '便宜', '昂贵', '经济', '实惠']
        social_keywords = ['就业', '失业', '司机', '工作', '社会', '影响', '替代']
        dimensions = {'safety': 0, 'technology': 0, 'economic': 0, 'social': 0}
        for keyword in safety_keywords:
            if keyword in text:
                dimensions['safety'] += 1
        for keyword in tech_keywords:
            if keyword in text:
                dimensions['technology'] += 1
        for keyword in economic_keywords:
            if keyword in text:
                dimensions['economic'] += 1
        for keyword in social_keywords:
            if keyword in text:
                dimensions['social'] += 1
        return max(dimensions, key=dimensions.get) if max(dimensions.values()) > 0 else 'general'
    def calculate_opinion_strength(text, likes, comments):
        if not text:
            return 0.0
        likes, comments = likes or 0, comments or 0
        text_length = len(text)
        engagement_factor = (likes + comments) / 10 if (likes + comments) > 0 else 0.1
        strength_score = (text_length / 100) * engagement_factor
        return min(10.0, max(0.1, strength_score))
    opinion_dimension_udf = spark.udf.register("extract_opinion", extract_opinion_dimensions, StringType())
    strength_udf = spark.udf.register("calculate_strength", calculate_opinion_strength, FloatType())
    opinion_df = opinion_df.withColumn("opinion_dimension", opinion_dimension_udf(col("content")))
    opinion_df = opinion_df.withColumn("opinion_strength", strength_udf(col("content"), col("likes_count"), col("comments_count")))
    from pyspark.sql.functions import sum as spark_sum, max as spark_max  # aliased so the Python builtins are not shadowed
    monthly_trends = opinion_df.groupBy(col("publish_month"), col("opinion_dimension")).agg(count("*").alias("mention_count"), avg("opinion_strength").alias("avg_strength")).orderBy(asc("publish_month"), desc("mention_count"))
    dimension_evolution = opinion_df.groupBy("opinion_dimension").agg(count("*").alias("total_mentions"), avg("opinion_strength").alias("overall_strength"), spark_max("publish_date").alias("latest_mention")).orderBy(desc("total_mentions"))
    # assumes the source CSV already carries a per-comment sentiment_score column
    weekly_sentiment_shift = opinion_df.groupBy(col("publish_week")).agg(avg("sentiment_score").alias("weekly_sentiment"), count("*").alias("weekly_volume"), spark_sum("opinion_strength").alias("weekly_intensity")).orderBy(asc("publish_week"))
    demographic_opinions = opinion_df.groupBy("user_age_group", "opinion_dimension").agg(count("*").alias("group_mentions"), avg("opinion_strength").alias("group_strength")).orderBy(desc("group_mentions"))
    influence_analysis = opinion_df.filter(col("user_followers_count") > 1000).groupBy("opinion_dimension").agg(spark_sum("opinion_strength").alias("influencer_impact"), count("*").alias("influencer_mentions")).orderBy(desc("influencer_impact"))
    monthly_trends.write.mode("overwrite").option("header", "true").csv("hdfs://localhost:9000/analysis_results/monthly_opinion_trends")
    dimension_evolution.write.mode("overwrite").option("header", "true").csv("hdfs://localhost:9000/analysis_results/opinion_dimensions")
    weekly_sentiment_shift.write.mode("overwrite").option("header", "true").csv("hdfs://localhost:9000/analysis_results/sentiment_evolution")
    return {
        'dominant_opinion_dimension': dimension_evolution.first()['opinion_dimension'] if dimension_evolution.count() > 0 else None,
        'trend_direction': 'positive' if weekly_sentiment_shift.count() > 0 and (weekly_sentiment_shift.orderBy(desc("publish_week")).first()['weekly_sentiment'] or 0) > 0 else 'negative',
        'opinion_intensity_peak': weekly_sentiment_shift.orderBy(desc("weekly_intensity")).first()['publish_week'] if weekly_sentiment_shift.count() > 0 else None,
        'total_analyzed_opinions': opinion_df.count()
    }
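The keyword-bucket classification in extract_opinion_dimensions amounts to counting hits per dimension and taking the argmax. A trimmed standalone version (shorter keyword lists than the original, for brevity):

```python
DIMENSION_KEYWORDS = {
    'safety': ['安全', '事故', '危险', '风险'],
    'technology': ['技术', '算法', '传感器', 'AI'],
    'economic': ['价格', '费用', '成本', '便宜'],
    'social': ['就业', '失业', '司机', '替代'],
}

def opinion_dimension(text):
    """Count keyword hits per dimension; the dimension with the most hits
    wins, and 'general' is returned when nothing matches. Ties resolve in
    dict insertion order, same as the original UDF."""
    counts = {dim: sum(1 for kw in kws if kw in text)
              for dim, kws in DIMENSION_KEYWORDS.items()}
    return max(counts, key=counts.get) if max(counts.values()) > 0 else 'general'
```

For example, `opinion_dimension('无人车的安全和事故风险')` returns `'safety'` (three hits), while text with no matching keywords falls through to `'general'`.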

Documentation of the Big-Data Douyin Public Opinion Analysis System for Driverless Ride-Hailing

(Documentation screenshot omitted.)
