💖💖作者:计算机毕业设计杰瑞 💙💙个人简介:曾长期从事计算机专业培训教学,本人也热爱上课教学,语言擅长Java、微信小程序、Python、Golang、安卓Android等,开发项目包括大数据、深度学习、网站、小程序、安卓、算法。平常会做一些项目定制化开发、代码讲解、答辩教学、文档编写、也懂一些降重方面的技巧。平常喜欢分享一些自己开发中遇到的问题的解决办法,也喜欢交流技术,大家有技术代码这一块的问题可以问我! 💛💛想说的话:感谢大家的关注与支持! 💜💜 网站实战项目 安卓/小程序实战项目 大数据实战项目 深度学习实战项目 计算机毕业设计选题推荐
基于大数据的学生生活习惯与成绩关联性的数据分析系统介绍
《学生生活习惯与成绩关联性的数据分析系统》是一套基于大数据技术构建的教育数据挖掘平台,采用Hadoop分布式存储架构和Spark大数据计算引擎,通过Python语言和Django框架实现后端业务逻辑,前端运用Vue+ElementUI+Echarts技术栈构建可视化界面。系统围绕学生的生活习惯数据与学业成绩之间的潜在关联进行深度分析,涵盖学生特征综合分析、背景环境分析、数字生活习惯分析、核心学习行为分析以及生活健康状况分析等多个维度。通过Spark SQL对海量教育数据进行高效处理,结合Pandas和NumPy进行数据预处理和统计计算,系统能够识别不同生活方式对学习效果的影响模式,为教育管理者提供基于数据驱动的决策支持。整个系统采用前后端分离架构,数据存储于MySQL数据库,支持大规模数据的并行计算和实时分析,通过丰富的图表展示让复杂的数据关系变得直观易懂,为高校学生管理和教学改进提供科学的量化分析工具。
基于大数据的学生生活习惯与成绩关联性的数据分析系统演示视频
基于大数据的学生生活习惯与成绩关联性的数据分析系统演示图片
基于大数据的学生生活习惯与成绩关联性的数据分析系统代码展示
import json
from datetime import datetime, timedelta

import numpy as np
import pandas as pd
from django.http import JsonResponse
from django.views.decorators.http import require_http_methods
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, avg, count, when, sum as spark_sum, stddev, corr
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, FloatType
# Module-level SparkSession shared by every view function in this file.
# Adaptive Query Execution and partition coalescing are enabled so Spark can
# right-size shuffle partitions for the JDBC-loaded tables at runtime.
spark = SparkSession.builder.appName("StudentLifeAnalysis").config("spark.sql.adaptive.enabled", "true").config("spark.sql.adaptive.coalescePartitions.enabled", "true").getOrCreate()
@require_http_methods(["POST"])
def student_comprehensive_analysis(request):
    """Correlate a composite life-habit score with students' final grades.

    POST body (JSON):
        student_ids   -- optional list; restricts the analysis to these ids.
        analysis_type -- optional str, default 'overall' (currently unused;
                         read so the request contract stays stable).

    Returns a JsonResponse containing the habit-score/grade correlation
    coefficient, per-grade-level aggregates, a high- vs low-performer habit
    comparison, and the ten best-performing habit-score combinations.
    """
    def _safe_round(value, digits=2):
        # Spark's corr()/stddev() yield None for empty or single-row inputs;
        # round(None, n) would raise TypeError, so pass None through instead.
        return round(value, digits) if value is not None else None

    data = json.loads(request.body)
    student_ids = data.get('student_ids', [])
    analysis_type = data.get('analysis_type', 'overall')
    # NOTE(review): JDBC credentials are hard-coded; they belong in Django
    # settings or environment variables — confirm before deployment.
    df = spark.read.format("jdbc").option("url", "jdbc:mysql://localhost:3306/student_analysis").option("dbtable", "student_life_data").option("user", "root").option("password", "password").load()
    filtered_df = df.filter(col("student_id").isin(student_ids)) if student_ids else df
    # Each habit is bucketed into a 1-3 score; 3 is the healthiest band.
    sleep_score = when(col("sleep_hours") >= 7, 3).when(col("sleep_hours") >= 6, 2).otherwise(1)
    exercise_score = when(col("exercise_frequency") >= 4, 3).when(col("exercise_frequency") >= 2, 2).otherwise(1)
    study_score = when(col("study_hours") >= 6, 3).when(col("study_hours") >= 4, 2).otherwise(1)
    # Fewer social activities scores higher here — presumably to reward
    # focus; verify the intended semantics with the data owner.
    social_score = when(col("social_activities") <= 3, 3).when(col("social_activities") <= 5, 2).otherwise(1)
    comprehensive_df = (filtered_df
                        .withColumn("sleep_score", sleep_score)
                        .withColumn("exercise_score", exercise_score)
                        .withColumn("study_score", study_score)
                        .withColumn("social_score", social_score)
                        .withColumn("total_score", col("sleep_score") + col("exercise_score") + col("study_score") + col("social_score")))
    grade_correlation = comprehensive_df.select(corr("total_score", "final_grade").alias("correlation")).collect()[0]["correlation"]
    avg_stats = comprehensive_df.groupBy("grade_level").agg(avg("total_score").alias("avg_score"), avg("final_grade").alias("avg_grade"), count("student_id").alias("student_count"), stddev("total_score").alias("score_stddev")).collect()
    high_performers = comprehensive_df.filter(col("final_grade") >= 85).select(avg("sleep_hours").alias("avg_sleep"), avg("exercise_frequency").alias("avg_exercise"), avg("study_hours").alias("avg_study")).collect()[0]
    low_performers = comprehensive_df.filter(col("final_grade") < 70).select(avg("sleep_hours").alias("avg_sleep"), avg("exercise_frequency").alias("avg_exercise"), avg("study_hours").alias("avg_study")).collect()[0]
    habit_distribution = comprehensive_df.groupBy("sleep_score", "exercise_score").agg(avg("final_grade").alias("avg_grade"), count("student_id").alias("count")).orderBy("avg_grade", ascending=False).collect()
    result_data = {
        "correlation_coefficient": _safe_round(grade_correlation, 4),
        "grade_statistics": [
            {
                "grade_level": row["grade_level"],
                "average_score": _safe_round(row["avg_score"]),
                "average_grade": _safe_round(row["avg_grade"]),
                "student_count": row["student_count"],
                "score_deviation": _safe_round(row["score_stddev"]),
            }
            for row in avg_stats
        ],
        "performance_comparison": {
            "high_performers": {
                "sleep": _safe_round(high_performers["avg_sleep"]),
                "exercise": _safe_round(high_performers["avg_exercise"]),
                "study": _safe_round(high_performers["avg_study"]),
            },
            "low_performers": {
                "sleep": _safe_round(low_performers["avg_sleep"]),
                "exercise": _safe_round(low_performers["avg_exercise"]),
                "study": _safe_round(low_performers["avg_study"]),
            },
        },
        "habit_patterns": [
            {
                "sleep_score": row["sleep_score"],
                "exercise_score": row["exercise_score"],
                "avg_grade": _safe_round(row["avg_grade"]),
                "student_count": row["count"],
            }
            for row in habit_distribution[:10]
        ],
    }
    return JsonResponse({"status": "success", "data": result_data})
@require_http_methods(["POST"])
def digital_life_habit_analysis(request):
    """Analyze digital-life habits (screen time, social media, gaming)
    against academic performance over a recent time window.

    POST body (JSON):
        time_range -- optional, days of history to include (default '30').
        dimensions -- optional list of dimension names (currently unused;
                      read so the request contract stays stable).

    Returns a JsonResponse with screen-time tiers, correlation metrics,
    gaming-level impact, per-device patterns, app-usage balance, addiction
    indicators, and an hourly usage timeline.
    """
    def _safe_round(value, digits=2):
        # corr()/avg() return None on empty input; round(None, n) raises.
        return round(value, digits) if value is not None else None

    data = json.loads(request.body)
    time_range = data.get('time_range', '30')
    analysis_dimensions = data.get('dimensions', ['screen_time', 'social_media', 'gaming'])
    # NOTE(review): JDBC credentials are hard-coded; move to settings/env.
    df = spark.read.format("jdbc").option("url", "jdbc:mysql://localhost:3306/student_analysis").option("dbtable", "digital_habit_data").option("user", "root").option("password", "password").load()
    # BUGFIX: the original filtered record_date >= today, discarding all
    # history and ignoring time_range entirely. Honor the requested window.
    cutoff_date = (datetime.now() - timedelta(days=int(time_range))).strftime('%Y-%m-%d')
    current_df = df.filter(col("record_date") >= cutoff_date)
    screen_time_impact = current_df.groupBy(when(col("daily_screen_time") <= 4, "Low").when(col("daily_screen_time") <= 8, "Medium").otherwise("High").alias("screen_time_level")).agg(avg("academic_performance").alias("avg_performance"), count("student_id").alias("count"), avg("attention_span").alias("avg_attention")).collect()
    social_media_correlation = current_df.select(corr("social_media_hours", "academic_performance").alias("social_correlation"), corr("social_media_hours", "sleep_quality").alias("sleep_correlation")).collect()[0]
    gaming_analysis = current_df.filter(col("gaming_hours") > 0).groupBy(when(col("gaming_hours") <= 2, "Moderate").when(col("gaming_hours") <= 5, "Heavy").otherwise("Excessive").alias("gaming_level")).agg(avg("academic_performance").alias("avg_performance"), avg("social_interaction_score").alias("avg_social"), count("student_id").alias("count")).collect()
    device_usage_pattern = current_df.groupBy("primary_device_type").agg(avg("daily_screen_time").alias("avg_screen_time"), avg("academic_performance").alias("avg_performance"), avg("eye_strain_score").alias("avg_eye_strain")).collect()
    # Entertainment/educational time ratio; +0.1 avoids division by zero.
    app_category_impact = current_df.select("student_id", "entertainment_app_time", "educational_app_time", "social_app_time", "academic_performance").rdd.map(lambda row: (row["student_id"], row["entertainment_app_time"] / (row["educational_app_time"] + 0.1), row["academic_performance"])).toDF(["student_id", "entertainment_ratio", "performance"])
    ratio_performance = app_category_impact.groupBy(when(col("entertainment_ratio") <= 1, "Balanced").when(col("entertainment_ratio") <= 3, "Entertainment_Heavy").otherwise("Entertainment_Dominant").alias("usage_type")).agg(avg("performance").alias("avg_performance"), count("student_id").alias("count")).collect()
    digital_addiction_indicators = current_df.select(avg(when(col("daily_screen_time") > 10, 1).otherwise(0)).alias("excessive_screen_rate"), avg(when(col("sleep_disruption_by_device") == 1, 1).otherwise(0)).alias("sleep_disruption_rate"), avg(when(col("device_dependency_score") > 7, 1).otherwise(0)).alias("high_dependency_rate")).collect()[0]
    time_distribution = current_df.groupBy("hour_of_peak_usage").agg(avg("academic_performance").alias("avg_performance"), count("student_id").alias("count")).orderBy("hour_of_peak_usage").collect()
    result_data = {
        "screen_time_analysis": [
            {
                "level": row["screen_time_level"],
                "avg_performance": _safe_round(row["avg_performance"]),
                "avg_attention": _safe_round(row["avg_attention"]),
                "student_count": row["count"],
            }
            for row in screen_time_impact
        ],
        "correlation_metrics": {
            "social_media_academic": _safe_round(social_media_correlation["social_correlation"], 4),
            "social_media_sleep": _safe_round(social_media_correlation["sleep_correlation"], 4),
        },
        "gaming_impact": [
            {
                "level": row["gaming_level"],
                "avg_performance": _safe_round(row["avg_performance"]),
                "avg_social_score": _safe_round(row["avg_social"]),
                "count": row["count"],
            }
            for row in gaming_analysis
        ],
        "device_patterns": [
            {
                "device_type": row["primary_device_type"],
                "avg_screen_time": _safe_round(row["avg_screen_time"]),
                "avg_performance": _safe_round(row["avg_performance"]),
                "avg_eye_strain": _safe_round(row["avg_eye_strain"]),
            }
            for row in device_usage_pattern
        ],
        "app_usage_balance": [
            {
                "usage_type": row["usage_type"],
                "avg_performance": _safe_round(row["avg_performance"]),
                "student_count": row["count"],
            }
            for row in ratio_performance
        ],
        "addiction_indicators": {
            "excessive_screen_rate": _safe_round(digital_addiction_indicators["excessive_screen_rate"], 3),
            "sleep_disruption_rate": _safe_round(digital_addiction_indicators["sleep_disruption_rate"], 3),
            "dependency_rate": _safe_round(digital_addiction_indicators["high_dependency_rate"], 3),
        },
        "usage_timeline": [
            {
                "hour": row["hour_of_peak_usage"],
                "avg_performance": _safe_round(row["avg_performance"]),
                "frequency": row["count"],
            }
            for row in time_distribution
        ],
    }
    return JsonResponse({"status": "success", "data": result_data})
@require_http_methods(["POST"])
def core_learning_behavior_analysis(request):
data = json.loads(request.body)
subject_filter = data.get('subjects', [])
behavior_types = data.get('behavior_types', ['study_duration', 'study_frequency', 'break_patterns'])
df = spark.read.format("jdbc").option("url", "jdbc:mysql://localhost:3306/student_analysis").option("dbtable", "learning_behavior_data").option("user", "root").option("password", "password").load()
filtered_df = df.filter(col("subject").isin(subject_filter)) if subject_filter else df
study_duration_effectiveness = filtered_df.groupBy(when(col("continuous_study_minutes") <= 45, "Short").when(col("continuous_study_minutes") <= 90, "Medium").when(col("continuous_study_minutes") <= 180, "Long").otherwise("Extended").alias("duration_category")).agg(avg("comprehension_score").alias("avg_comprehension"), avg("retention_score").alias("avg_retention"), count("student_id").alias("session_count"), avg("fatigue_level").alias("avg_fatigue")).collect()
break_pattern_analysis = filtered_df.filter(col("break_frequency") > 0).groupBy(when(col("break_interval_minutes") <= 30, "Frequent").when(col("break_interval_minutes") <= 60, "Regular").otherwise("Infrequent").alias("break_pattern")).agg(avg("productivity_score").alias("avg_productivity"), avg("concentration_duration").alias("avg_concentration"), count("student_id").alias("count")).collect()
study_time_preference = filtered_df.groupBy("preferred_study_time").agg(avg("academic_performance").alias("avg_performance"), avg("energy_level").alias("avg_energy"), avg("distraction_resistance").alias("avg_focus"), count("student_id").alias("count")).collect()
learning_method_effectiveness = filtered_df.groupBy("primary_learning_method").agg(avg("quiz_scores").alias("avg_quiz_score"), avg("assignment_completion_rate").alias("avg_completion"), avg("concept_understanding_rate").alias("avg_understanding"), count("student_id").alias("method_users")).collect()
subject_performance_correlation = filtered_df.groupBy("subject").agg(corr("study_hours_per_week", "final_grade").alias("time_grade_correlation"), avg("study_hours_per_week").alias("avg_weekly_hours"), avg("final_grade").alias("avg_grade"), stddev("final_grade").alias("grade_deviation")).collect()
procrastination_impact = filtered_df.select(avg(when(col("assignment_delay_days") > 3, 1).otherwise(0)).alias("procrastination_rate"), corr("assignment_delay_days", "final_grade").alias("delay_grade_correlation"), avg("stress_level").alias("avg_stress")).collect()[0]
study_environment_analysis = filtered_df.groupBy("study_environment_type").agg(avg("concentration_score").alias("avg_concentration"), avg("completion_time_minutes").alias("avg_completion_time"), count("student_id").alias("environment_users"), avg("satisfaction_score").alias("avg_satisfaction")).collect()
revision_strategy_effectiveness = filtered_df.filter(col("revision_frequency") > 0).groupBy(when(col("revision_frequency") >= 3, "High").when(col("revision_frequency") >= 1, "Medium").otherwise("Low").alias("revision_level")).agg(avg("exam_performance").alias("avg_exam_score"), avg("long_term_retention").alias("avg_retention"), count("student_id").alias("count")).collect()
multitasking_impact = filtered_df.select(avg(when(col("simultaneous_activities") > 2, col("efficiency_score")).otherwise(None)).alias("multitask_efficiency"), avg(when(col("simultaneous_activities") <= 2, col("efficiency_score")).otherwise(None)).alias("focused_efficiency"), corr("simultaneous_activities", "error_rate").alias("multitask_error_correlation")).collect()[0]
result_data = {"study_duration_analysis": [{"category": row["duration_category"], "avg_comprehension": round(row["avg_comprehension"], 2), "avg_retention": round(row["avg_retention"], 2), "session_count": row["session_count"], "avg_fatigue": round(row["avg_fatigue"], 2)} for row in study_duration_effectiveness], "break_patterns": [{"pattern": row["break_pattern"], "avg_productivity": round(row["avg_productivity"], 2), "avg_concentration": round(row["avg_concentration"], 2), "student_count": row["count"]} for row in break_pattern_analysis
基于大数据的学生生活习惯与成绩关联性的数据分析系统文档展示
💖💖作者:计算机毕业设计杰瑞 💙💙个人简介:曾长期从事计算机专业培训教学,本人也热爱上课教学,语言擅长Java、微信小程序、Python、Golang、安卓Android等,开发项目包括大数据、深度学习、网站、小程序、安卓、算法。平常会做一些项目定制化开发、代码讲解、答辩教学、文档编写、也懂一些降重方面的技巧。平常喜欢分享一些自己开发中遇到的问题的解决办法,也喜欢交流技术,大家有技术代码这一块的问题可以问我! 💛💛想说的话:感谢大家的关注与支持! 💜💜 网站实战项目 安卓/小程序实战项目 大数据实战项目 深度学习实战项目 计算机毕业设计选题推荐