Big Data Graduation Project Recommendation: A Complete Stress Detection Data Analysis System Based on Hadoop + Spark | Computer Science Graduation Project | Thesis Topic Selection


Preface

1. Development Tools

  • Big data framework: Hadoop + Spark (Hive is not used in this build; customization is supported)
  • Development language: Python + Java (both versions supported)
  • Backend framework: Django or Spring Boot (Spring + SpringMVC + MyBatis) (both versions supported)
  • Frontend: Vue + ElementUI + ECharts + HTML + CSS + JavaScript + jQuery
  • Key technologies: Hadoop, HDFS, Spark, Spark SQL, Pandas, NumPy
  • Database: MySQL

2. System Overview

The big-data stress detection and analysis system is a comprehensive platform that combines modern big-data processing with statistical analysis. It uses the Hadoop distributed storage architecture and the Spark compute engine as its core, together with a Django backend and a Vue frontend stack, forming a complete pipeline for collecting, storing, analyzing, and visualizing stress data.

The personal center module handles user profiles and personalized settings; the user management module provides full access control; and the comprehensive analysis module combines multi-dimensional data into an overall stress assessment. The four core analysis modules are stress level analysis, lifestyle stress analysis, personality trait stress analysis, and physiological indicator stress analysis, each built on Spark SQL together with Pandas and NumPy for in-depth data mining and statistical analysis.

ECharts components render the results as charts so users can see stress distributions and trends at a glance. A system administration module keeps the platform stable and the data secure. Business data is stored in MySQL, while the HDFS distributed file system provides efficient storage and access for large-scale data.
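The comprehensive analysis module combines the four module scores into a single composite. A minimal plain-Python sketch of that weighting and risk bucketing, using the weights (0.3/0.25/0.2/0.25) and thresholds (7.5, 5.0) from the source code shown later; the function names here are ours, not part of the system:

```python
def comprehensive_score(pressure, lifestyle, personality, physiological):
    # Weighted composite of the four module scores; weights sum to 1.0.
    return pressure * 0.3 + lifestyle * 0.25 + personality * 0.2 + physiological * 0.25

def risk_level(score):
    # Bucket a composite score (assumed 0-10 scale) into three risk levels.
    if score > 7.5:
        return "High risk"
    if score > 5.0:
        return "Medium risk"
    return "Low risk"
```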

3. System Feature Demo


4. System Interface Screenshots


5. Source Code


from pyspark.sql import SparkSession
from pyspark.sql.functions import col, avg, count, when, desc
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans
from django.http import JsonResponse
import pandas as pd
import numpy as np

spark = SparkSession.builder \
    .appName("PressureAnalysisSystem") \
    .config("spark.sql.adaptive.enabled", "true") \
    .getOrCreate()

def comprehensive_pressure_analysis(request):
    user_id = request.GET.get('user_id')
    if not user_id:
        return JsonResponse({"status": "error", "message": "user_id is required"}, status=400)
    jdbc_url = "jdbc:mysql://localhost:3306/pressure_db"
    def read_table(table):
        # All four analysis tables live in the same MySQL database.
        return spark.read.format("jdbc").option("url", jdbc_url).option("dbtable", table).option("user", "root").option("password", "password").load()
    pressure_df = read_table("pressure_data")
    lifestyle_df = read_table("lifestyle_data")
    personality_df = read_table("personality_data")
    physiological_df = read_table("physiological_data")
    # Join the four dimensions on user_id so every row carries all module scores.
    joined_df = pressure_df.join(lifestyle_df, "user_id", "inner").join(personality_df, "user_id", "inner").join(physiological_df, "user_id", "inner")
    user_data = joined_df.filter(col("user_id") == user_id)
    if user_data.count() == 0:
        return JsonResponse({"status": "error", "message": "no records for this user"}, status=404)
    pressure_score = user_data.select(avg("pressure_level")).collect()[0][0]
    lifestyle_score = user_data.select(avg("lifestyle_score")).collect()[0][0]
    personality_score = user_data.select(avg("personality_score")).collect()[0][0]
    physiological_score = user_data.select(avg("physiological_score")).collect()[0][0]
    # Weighted composite of the four module averages (weights sum to 1.0).
    comprehensive_score = (pressure_score * 0.3 + lifestyle_score * 0.25 + personality_score * 0.2 + physiological_score * 0.25)
    risk_level = "High risk" if comprehensive_score > 7.5 else "Medium risk" if comprehensive_score > 5.0 else "Low risk"
    # Percent change between consecutive readings, for the trend chart.
    trend_data = user_data.orderBy("create_time").select("create_time", "pressure_level").collect()
    trend_analysis = []
    for i, row in enumerate(trend_data):
        if i > 0:
            change_rate = (row['pressure_level'] - trend_data[i-1]['pressure_level']) / trend_data[i-1]['pressure_level'] * 100
            trend_analysis.append({"date": str(row['create_time']), "pressure": row['pressure_level'], "change_rate": round(change_rate, 2)})
    # Pearson correlation matrix of the four scores, computed via Pandas.
    correlation_matrix = joined_df.select("pressure_level", "lifestyle_score", "personality_score", "physiological_score").toPandas().corr()
    result_data = {"comprehensive_score": round(comprehensive_score, 2), "risk_level": risk_level, "pressure_avg": round(pressure_score, 2), "lifestyle_avg": round(lifestyle_score, 2), "personality_avg": round(personality_score, 2), "physiological_avg": round(physiological_score, 2), "trend_analysis": trend_analysis, "correlation": correlation_matrix.to_dict()}
    return JsonResponse({"status": "success", "data": result_data})
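The change-rate loop above can be factored into a small, testable helper; a minimal sketch (the function name is ours, not part of the source):

```python
def change_rates(readings):
    # Percent change between each consecutive pair of pressure readings,
    # rounded to two decimals as in the view above.
    return [round((cur - prev) / prev * 100, 2)
            for prev, cur in zip(readings, readings[1:])]
```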

def pressure_level_clustering_analysis(request):
    # Assemble the five pressure dimensions into a single feature vector.
    pressure_data = spark.read.format("jdbc").option("url", "jdbc:mysql://localhost:3306/pressure_db").option("dbtable", "pressure_data").option("user", "root").option("password", "password").load()
    feature_cols = ["work_pressure", "study_pressure", "family_pressure", "social_pressure", "health_pressure"]
    assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
    feature_df = assembler.transform(pressure_data)
    # Partition users into four groups with KMeans; the fixed seed keeps runs reproducible.
    kmeans = KMeans(k=4, seed=42, featuresCol="features", predictionCol="cluster")
    model = kmeans.fit(feature_df)
    clustered_df = model.transform(feature_df)
    # Member count and mean pressure per dimension for each cluster.
    cluster_stats = clustered_df.groupBy("cluster").agg(count("user_id").alias("user_count"), avg("work_pressure").alias("avg_work_pressure"), avg("study_pressure").alias("avg_study_pressure"), avg("family_pressure").alias("avg_family_pressure"), avg("social_pressure").alias("avg_social_pressure"), avg("health_pressure").alias("avg_health_pressure")).orderBy("cluster")
    cluster_results = []
    for row in cluster_stats.collect():
        cluster_info = {"cluster_id": row['cluster'], "user_count": row['user_count'], "work_pressure": round(row['avg_work_pressure'], 2), "study_pressure": round(row['avg_study_pressure'], 2), "family_pressure": round(row['avg_family_pressure'], 2), "social_pressure": round(row['avg_social_pressure'], 2), "health_pressure": round(row['avg_health_pressure'], 2)}
        total_pressure = cluster_info["work_pressure"] + cluster_info["study_pressure"] + cluster_info["family_pressure"] + cluster_info["social_pressure"] + cluster_info["health_pressure"]
        cluster_info["total_pressure"] = round(total_pressure, 2)
        if total_pressure > 30:
            cluster_info["risk_category"] = "High-pressure group"
        elif total_pressure > 20:
            cluster_info["risk_category"] = "Medium-pressure group"
        else:
            cluster_info["risk_category"] = "Low-pressure group"
        cluster_results.append(cluster_info)
    pressure_distribution = clustered_df.groupBy("cluster").agg(count(when(col("work_pressure") > 8, True)).alias("high_work_pressure"), count(when(col("study_pressure") > 8, True)).alias("high_study_pressure"), count(when(col("family_pressure") > 8, True)).alias("high_family_pressure")).collect()
    distribution_data = []
    for row in pressure_distribution:
        distribution_data.append({"cluster": row['cluster'], "high_work": row['high_work_pressure'], "high_study": row['high_study_pressure'], "high_family": row['high_family_pressure']})
    return JsonResponse({"status": "success", "clusters": cluster_results, "distribution": distribution_data, "total_users": clustered_df.count()})
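KMeans assigns each user to the centroid nearest in Euclidean distance. The assignment step can be sketched in plain Python (a hypothetical helper for illustration, not part of the system):

```python
def nearest_cluster(point, centroids):
    # Index of the closest centroid by squared Euclidean distance
    # (squaring preserves the ordering, so no sqrt is needed).
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centroids)), key=lambda i: dist2(point, centroids[i]))
```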

def lifestyle_pressure_correlation_analysis(request):
    # Load the lifestyle and pressure tables and join them on user_id.
    lifestyle_df = spark.read.format("jdbc").option("url", "jdbc:mysql://localhost:3306/pressure_db").option("dbtable", "lifestyle_data").option("user", "root").option("password", "password").load()
    pressure_df = spark.read.format("jdbc").option("url", "jdbc:mysql://localhost:3306/pressure_db").option("dbtable", "pressure_data").option("user", "root").option("password", "password").load()
    combined_df = lifestyle_df.join(pressure_df, "user_id", "inner")
    sleep_pressure_corr = combined_df.groupBy("sleep_hours").agg(avg("pressure_level").alias("avg_pressure")).orderBy("sleep_hours")
    exercise_pressure_corr = combined_df.groupBy("exercise_frequency").agg(avg("pressure_level").alias("avg_pressure")).orderBy("exercise_frequency")
    diet_pressure_corr = combined_df.groupBy("diet_quality").agg(avg("pressure_level").alias("avg_pressure")).orderBy("diet_quality")
    social_pressure_corr = combined_df.groupBy("social_activity").agg(avg("pressure_level").alias("avg_pressure")).orderBy("social_activity")
    work_schedule_analysis = combined_df.groupBy("work_schedule").agg(avg("pressure_level").alias("avg_pressure"), count("user_id").alias("user_count")).orderBy(desc("avg_pressure"))
    sleep_data = [{"sleep_hours": row['sleep_hours'], "pressure": round(row['avg_pressure'], 2)} for row in sleep_pressure_corr.collect()]
    exercise_data = [{"exercise_freq": row['exercise_frequency'], "pressure": round(row['avg_pressure'], 2)} for row in exercise_pressure_corr.collect()]
    diet_data = [{"diet_quality": row['diet_quality'], "pressure": round(row['avg_pressure'], 2)} for row in diet_pressure_corr.collect()]
    social_data = [{"social_activity": row['social_activity'], "pressure": round(row['avg_pressure'], 2)} for row in social_pressure_corr.collect()]
    work_schedule_data = [{"schedule_type": row['work_schedule'], "pressure": round(row['avg_pressure'], 2), "user_count": row['user_count']} for row in work_schedule_analysis.collect()]
    high_pressure_lifestyle = combined_df.filter(col("pressure_level") > 7).groupBy("sleep_hours", "exercise_frequency").count().orderBy(desc("count")).limit(10)
    high_pressure_patterns = [{"sleep_hours": row['sleep_hours'], "exercise_freq": row['exercise_frequency'], "count": row['count']} for row in high_pressure_lifestyle.collect()]
    lifestyle_recommendations = []
    for sleep_row in sleep_data:
        if sleep_row['pressure'] < 5.0:
            lifestyle_recommendations.append(f"Recommended: {sleep_row['sleep_hours']} hours of sleep is associated with lower stress levels")
    for exercise_row in exercise_data:
        if exercise_row['pressure'] < 5.0:
            lifestyle_recommendations.append(f"Recommended: exercising {exercise_row['exercise_freq']} times per week helps reduce stress")
    pandas_df = combined_df.select("sleep_hours", "exercise_frequency", "diet_quality", "social_activity", "pressure_level").toPandas()
    correlation_matrix = pandas_df.corr()['pressure_level'].to_dict()
    return JsonResponse({"status": "success", "sleep_analysis": sleep_data, "exercise_analysis": exercise_data, "diet_analysis": diet_data, "social_analysis": social_data, "work_schedule_analysis": work_schedule_data, "high_pressure_patterns": high_pressure_patterns, "recommendations": lifestyle_recommendations, "correlations": correlation_matrix})
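To expose the three views to the Vue frontend, a Django URLconf along these lines would be needed; the route paths and module layout are assumptions, since the source does not show a urls.py:

```python
# urls.py -- route paths and the views module location are assumptions.
from django.urls import path
from . import views

urlpatterns = [
    path('api/analysis/comprehensive/', views.comprehensive_pressure_analysis),
    path('api/analysis/clustering/', views.pressure_level_clustering_analysis),
    path('api/analysis/lifestyle/', views.lifestyle_pressure_correlation_analysis),
]
```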

6. System Documentation


End