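[Big Data] National Healthy Aging Public Opinion Survey Data Analysis and Visualization System: Computer Science Project, Hadoop + Spark Environment Setup, Data Science and Big Data Technology, with Source Code + Documentation + Walkthrough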

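I. About the Author

💖💖 Author: 计算机编程果茶熊
💙💙 About me: I spent many years teaching computer science training courses and worked as a programming instructor, and I still enjoy teaching. I work across several IT areas, including Java, WeChat mini programs, Python, Golang, and Android. I take on custom project development, code walkthroughs, thesis-defense coaching, and documentation writing, and I also know some techniques for lowering similarity scores in plagiarism checks. I like sharing solutions to problems I run into during development and talking about technology, so feel free to ask me any code-related questions!
💛💛 A word from me: Thank you all for following and supporting me!
💜💜 Website projects | Android/mini program projects | Big data projects | Graduation project topic ideas
💕💕 The source code is available at the end of this post; contact 计算机编程果茶熊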

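II. System Overview

Big data framework: Hadoop + Spark (Hive requires custom modification)
Development languages: Java + Python (both versions are supported)
Database: MySQL
Backend frameworks: Spring Boot (Spring + Spring MVC + MyBatis) + Django (both versions are supported)
Frontend: Vue + ECharts + HTML + CSS + JavaScript + jQuery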
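
Because the stack pairs Spark with a MySQL database, every Spark read over JDBC needs connection settings. The following is a minimal sketch, not the project's actual configuration code, of building those settings from environment variables rather than writing them into each view. The helper name `jdbc_options` and the `HEALTH_DB_*` variable names are hypothetical; the defaults mirror the `localhost:3306/health_aging` address used in the code shown in this post.

```python
import os


def jdbc_options(table, env=None):
    """Build Spark JDBC reader options for one MySQL table.

    Hypothetical helper: the HEALTH_DB_* variable names are assumptions
    made for this sketch, not settings defined by the project.
    """
    env = os.environ if env is None else env
    host = env.get("HEALTH_DB_HOST", "localhost")
    port = env.get("HEALTH_DB_PORT", "3306")
    database = env.get("HEALTH_DB_NAME", "health_aging")
    return {
        "url": f"jdbc:mysql://{host}:{port}/{database}",
        "dbtable": table,
        "user": env.get("HEALTH_DB_USER", "root"),
        "password": env.get("HEALTH_DB_PASSWORD", ""),
    }


# Usage with an existing SparkSession named `spark` (not created here):
# df = spark.read.format("jdbc").options(**jdbc_options("survey_data")).load()
```

Passing `env` explicitly keeps the helper easy to test; in normal use it falls back to the process environment.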

III. Video Walkthrough

National Healthy Aging Public Opinion Survey Data Analysis and Visualization System

IV. Selected Features


V. Selected Code


from pyspark.sql import SparkSession
from pyspark.sql.functions import col, when, avg, count, sum as spark_sum, stddev, percentile_approx
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, DoubleType
import pandas as pd
import numpy as np
from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt
import json
import mysql.connector
from datetime import datetime

# Shared Spark session with adaptive query execution enabled.
spark = (
    SparkSession.builder
    .appName("HealthAgingAnalysis")
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
    .getOrCreate()
)


def health_risk_analysis(request):
    """Score chronic-disease, lifestyle, and BMI risk for respondents aged 60+."""
    survey_data = (
        spark.read.format("jdbc")
        .option("url", "jdbc:mysql://localhost:3306/health_aging")
        .option("dbtable", "survey_data")
        .option("user", "root")
        .option("password", "password")
        .load()
    )
    chronic_disease_df = survey_data.select(
        "user_id", "age", "gender", "hypertension", "diabetes", "heart_disease",
        "stroke", "smoking", "drinking", "exercise_frequency", "bmi",
    ).filter(col("age") >= 60)
    # Weighted risk score: diseases (0-14) + lifestyle (0-5) + BMI (0-3).
    risk_score_df = (
        chronic_disease_df
        .withColumn(
            "disease_score",
            when(col("hypertension") == 1, 2).otherwise(0)
            + when(col("diabetes") == 1, 3).otherwise(0)
            + when(col("heart_disease") == 1, 4).otherwise(0)
            + when(col("stroke") == 1, 5).otherwise(0),
        )
        .withColumn(
            "lifestyle_score",
            when(col("smoking") == 1, 2).otherwise(0)
            + when(col("drinking") == 1, 1).otherwise(0)
            + when(col("exercise_frequency") < 3, 2).otherwise(0),
        )
        .withColumn("bmi_score", when(col("bmi") > 28, 3).when(col("bmi") > 24, 1).otherwise(0))
        .withColumn("total_risk_score", col("disease_score") + col("lifestyle_score") + col("bmi_score"))
        .withColumn(
            "risk_level",
            when(col("total_risk_score") >= 10, "High risk")
            .when(col("total_risk_score") >= 6, "Medium risk")
            .otherwise("Low risk"),
        )
        .cache()  # reused by every aggregation below
    )
    total_respondents = risk_score_df.count()
    # Risk level counts per five-year age band.
    age_group_risk = (
        risk_score_df
        .withColumn(
            "age_group",
            when(col("age") < 65, "60-64")
            .when(col("age") < 70, "65-69")
            .when(col("age") < 75, "70-74")
            .when(col("age") < 80, "75-79")
            .otherwise("80+"),
        )
        .groupBy("age_group", "risk_level")
        .agg(count("*").alias("count"))
        .orderBy("age_group", "risk_level")
    )
    gender_risk = (
        risk_score_df.groupBy("gender", "risk_level")
        .agg(count("*").alias("count"), avg("total_risk_score").alias("avg_score"))
        .orderBy("gender", "risk_level")
    )
    # Prevalence of each factor within the high-risk group (flags are 0/1, so avg is a rate).
    high_risk_factors = risk_score_df.filter(col("risk_level") == "High risk").agg(
        avg("hypertension").alias("hypertension_rate"),
        avg("diabetes").alias("diabetes_rate"),
        avg("heart_disease").alias("heart_disease_rate"),
        avg("smoking").alias("smoking_rate"),
        avg("drinking").alias("drinking_rate"),
        avg("bmi").alias("avg_bmi"),
    )
    # Pearson correlations are computed in pandas on the driver.
    correlation_matrix = (
        risk_score_df.select("age", "bmi", "exercise_frequency", "total_risk_score").toPandas().corr()
    )
    risk_distribution = (
        risk_score_df.groupBy("risk_level")
        .agg(
            count("*").alias("count"),
            (count("*") * 100.0 / total_respondents).alias("percentage"),
        )
        .collect()
    )
    result_data = {
        "age_group_risk": [row.asDict() for row in age_group_risk.collect()],
        "gender_risk": [row.asDict() for row in gender_risk.collect()],
        "high_risk_factors": high_risk_factors.collect()[0].asDict(),
        "correlation_matrix": correlation_matrix.to_dict(),
        "risk_distribution": [row.asDict() for row in risk_distribution],
    }
    return JsonResponse({"status": "success", "data": result_data})


def medical_service_utilization_analysis(request):
    """Summarize how respondents aged 60+ use medical services."""
    medical_data = (
        spark.read.format("jdbc")
        .option("url", "jdbc:mysql://localhost:3306/health_aging")
        .option("dbtable", "medical_service")
        .option("user", "root")
        .option("password", "password")
        .load()
    )
    user_profile = (
        spark.read.format("jdbc")
        .option("url", "jdbc:mysql://localhost:3306/health_aging")
        .option("dbtable", "user_profile")
        .option("user", "root")
        .option("password", "password")
        .load()
    )
    merged_data = medical_data.join(user_profile, "user_id", "inner").filter(col("age") >= 60)
    # Visit volume and cost per service type.
    service_type_stats = (
        merged_data.groupBy("service_type")
        .agg(
            count("*").alias("total_visits"),
            avg("cost").alias("avg_cost"),
            spark_sum("cost").alias("total_cost"),
        )
        .orderBy(col("total_visits").desc())
    )
    regional_utilization = (
        merged_data.groupBy("region", "service_type")
        .agg(count("*").alias("visit_count"), avg("cost").alias("avg_cost"))
        .orderBy("region", col("visit_count").desc())
    )
    age_service_pattern = (
        merged_data
        .withColumn(
            "age_group",
            when(col("age") < 65, "60-64")
            .when(col("age") < 70, "65-69")
            .when(col("age") < 75, "70-74")
            .when(col("age") < 80, "75-79")
            .otherwise("80+"),
        )
        .groupBy("age_group", "service_type")
        .agg(count("*").alias("usage_count"), avg("satisfaction_score").alias("avg_satisfaction"))
        .orderBy("age_group", col("usage_count").desc())
    )
    # visit_month is the "YYYY-MM" prefix of visit_date.
    monthly_trend = (
        merged_data.withColumn("visit_month", col("visit_date").substr(1, 7))
        .groupBy("visit_month")
        .agg(
            count("*").alias("monthly_visits"),
            avg("cost").alias("monthly_avg_cost"),
            spark_sum("cost").alias("monthly_total_cost"),
        )
        .orderBy("visit_month")
    )
    insurance_coverage = (
        merged_data.groupBy("insurance_type")
        .agg(
            count("*").alias("user_count"),
            avg("cost").alias("avg_out_of_pocket"),
            avg("reimbursement_rate").alias("avg_reimbursement"),
        )
        .orderBy(col("user_count").desc())
    )
    accessibility_analysis = (
        merged_data.groupBy("region")
        .agg(
            avg("travel_distance").alias("avg_distance"),
            avg("waiting_time").alias("avg_waiting_time"),
            count("*").alias("service_frequency"),
        )
        .orderBy(col("avg_distance").desc())
    )
    # Cost burden: visit cost relative to monthly income (above 20% high, above 10% medium).
    cost_burden_analysis = (
        merged_data
        .withColumn(
            "cost_burden_level",
            when(col("cost") > col("monthly_income") * 0.2, "High burden")
            .when(col("cost") > col("monthly_income") * 0.1, "Medium burden")
            .otherwise("Low burden"),
        )
        .groupBy("cost_burden_level")
        .agg(count("*").alias("user_count"), avg("satisfaction_score").alias("avg_satisfaction"))
        .orderBy(col("user_count").desc())
    )
    service_quality_metrics = (
        merged_data.groupBy("hospital_level")
        .agg(
            avg("satisfaction_score").alias("avg_satisfaction"),
            avg("treatment_effectiveness").alias("avg_effectiveness"),
            count("*").alias("total_services"),
        )
        .orderBy(col("avg_satisfaction").desc())
    )
    utilization_result = {
        "service_type_stats": [row.asDict() for row in service_type_stats.collect()],
        "regional_utilization": [row.asDict() for row in regional_utilization.collect()],
        "age_service_pattern": [row.asDict() for row in age_service_pattern.collect()],
        "monthly_trend": [row.asDict() for row in monthly_trend.collect()],
        "insurance_coverage": [row.asDict() for row in insurance_coverage.collect()],
        "accessibility_analysis": [row.asDict() for row in accessibility_analysis.collect()],
        "cost_burden_analysis": [row.asDict() for row in cost_burden_analysis.collect()],
        "service_quality_metrics": [row.asDict() for row in service_quality_metrics.collect()],
    }
    return JsonResponse({"status": "success", "data": utilization_result})


def sleep_quality_correlation_analysis(request):
    """Relate sleep quality to health indicators and lifestyle for respondents aged 60+."""
    sleep_data = (
        spark.read.format("jdbc")
        .option("url", "jdbc:mysql://localhost:3306/health_aging")
        .option("dbtable", "sleep_quality")
        .option("user", "root")
        .option("password", "password")
        .load()
    )
    health_indicators = (
        spark.read.format("jdbc")
        .option("url", "jdbc:mysql://localhost:3306/health_aging")
        .option("dbtable", "health_indicators")
        .option("user", "root")
        .option("password", "password")
        .load()
    )
    lifestyle_data = (
        spark.read.format("jdbc")
        .option("url", "jdbc:mysql://localhost:3306/health_aging")
        .option("dbtable", "lifestyle")
        .option("user", "root")
        .option("password", "password")
        .load()
    )
    comprehensive_data = (
        sleep_data.join(health_indicators, "user_id", "inner")
        .join(lifestyle_data, "user_id", "inner")
        .filter(col("age") >= 60)
    )
    total_respondents = comprehensive_data.count()
    # Sort by group size; sleep_score itself is not available after aggregation.
    sleep_quality_distribution = (
        comprehensive_data
        .withColumn(
            "sleep_quality_level",
            when(col("sleep_score") >= 80, "Excellent")
            .when(col("sleep_score") >= 60, "Good")
            .when(col("sleep_score") >= 40, "Fair")
            .otherwise("Poor"),
        )
        .groupBy("sleep_quality_level")
        .agg(
            count("*").alias("count"),
            (count("*") * 100.0 / total_respondents).alias("percentage"),
        )
        .orderBy(col("count").desc())
    )
    age_sleep_correlation = (
        comprehensive_data
        .withColumn(
            "age_group",
            when(col("age") < 65, "60-64")
            .when(col("age") < 70, "65-69")
            .when(col("age") < 75, "70-74")
            .when(col("age") < 80, "75-79")
            .otherwise("80+"),
        )
        .groupBy("age_group")
        .agg(
            avg("sleep_score").alias("avg_sleep_score"),
            avg("sleep_duration").alias("avg_sleep_duration"),
            stddev("sleep_score").alias("sleep_score_std"),
        )
        .orderBy("age_group")
    )
    # Pearson correlation of sleep_score with each health indicator, computed in pandas.
    health_sleep_correlation = comprehensive_data.select(
        "sleep_score", "blood_pressure_systolic", "blood_pressure_diastolic",
        "blood_sugar", "cholesterol", "bmi",
    ).toPandas()
    correlation_coefficients = health_sleep_correlation.corr()["sleep_score"].drop("sleep_score").to_dict()
    # Keep only lifestyle combinations with at least 10 respondents.
    lifestyle_impact = (
        comprehensive_data.groupBy("exercise_frequency", "smoking_status", "alcohol_consumption")
        .agg(avg("sleep_score").alias("avg_sleep_score"), count("*").alias("group_count"))
        .filter(col("group_count") >= 10)
        .orderBy(col("avg_sleep_score").desc())
    )
    # A sleep disorder is flagged for a score below 40 or a duration outside 6-9 hours.
    sleep_disorder_analysis = (
        comprehensive_data
        .withColumn(
            "has_sleep_disorder",
            when(
                (col("sleep_score") < 40) | (col("sleep_duration") < 6) | (col("sleep_duration") > 9), 1
            ).otherwise(0),
        )
        .groupBy("has_sleep_disorder")
        .agg(
            count("*").alias("count"),
            avg("hypertension").alias("hypertension_rate"),
            avg("diabetes").alias("diabetes_rate"),
            avg("depression_score").alias("avg_depression"),
        )
        .orderBy("has_sleep_disorder")
    )
    medication_sleep_impact = (
        comprehensive_data.filter(col("medication_count") > 0)
        .groupBy("medication_count")
        .agg(
            avg("sleep_score").alias("avg_sleep_score"),
            avg("sleep_duration").alias("avg_duration"),
            count("*").alias("user_count"),
        )
        .orderBy("medication_count")
    )
    # Keep only noise/light combinations with at least 5 respondents.
    environmental_factors = (
        comprehensive_data.groupBy("noise_level", "light_exposure")
        .agg(avg("sleep_score").alias("avg_sleep_score"), count("*").alias("sample_size"))
        .filter(col("sample_size") >= 5)
        .orderBy(col("avg_sleep_score").desc())
    )
    seasonal_pattern = (
        comprehensive_data
        .withColumn(
            "survey_season",
            when(col("survey_month").isin([12, 1, 2]), "Winter")
            .when(col("survey_month").isin([3, 4, 5]), "Spring")
            .when(col("survey_month").isin([6, 7, 8]), "Summer")
            .otherwise("Autumn"),
        )
        .groupBy("survey_season")
        .agg(avg("sleep_score").alias("avg_sleep_score"), avg("sleep_duration").alias("avg_duration"))
        .orderBy("survey_season")
    )
    sleep_result = {
        "sleep_quality_distribution": [row.asDict() for row in sleep_quality_distribution.collect()],
        "age_sleep_correlation": [row.asDict() for row in age_sleep_correlation.collect()],
        "health_correlation_coefficients": correlation_coefficients,
        "lifestyle_impact": [row.asDict() for row in lifestyle_impact.collect()],
        "sleep_disorder_analysis": [row.asDict() for row in sleep_disorder_analysis.collect()],
        "medication_impact": [row.asDict() for row in medication_sleep_impact.collect()],
        "environmental_factors": [row.asDict() for row in environmental_factors.collect()],
        "seasonal_pattern": [row.asDict() for row in seasonal_pattern.collect()],
    }
    return JsonResponse({"status": "success", "data": sleep_result})
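
The scoring rule inside `health_risk_analysis` is easier to check on a single record than on a Spark DataFrame. The sketch below restates the same weights and thresholds in plain Python so they can be unit-tested without a cluster. The function name `risk_level_for` and the dictionary-based record are assumptions made for illustration; missing values are treated as not matching a condition, which is how the Spark `when(...).otherwise(0)` expressions treat nulls.

```python
def risk_level_for(record):
    """Return (total_score, level) using the weights from health_risk_analysis.

    Hypothetical helper: `record` is a plain dict with the survey column names.
    """
    disease = (
        (2 if record.get("hypertension") == 1 else 0)
        + (3 if record.get("diabetes") == 1 else 0)
        + (4 if record.get("heart_disease") == 1 else 0)
        + (5 if record.get("stroke") == 1 else 0)
    )
    exercise = record.get("exercise_frequency")
    lifestyle = (
        (2 if record.get("smoking") == 1 else 0)
        + (1 if record.get("drinking") == 1 else 0)
        + (2 if exercise is not None and exercise < 3 else 0)
    )
    bmi = record.get("bmi")
    if bmi is not None and bmi > 28:
        bmi_score = 3
    elif bmi is not None and bmi > 24:
        bmi_score = 1
    else:
        bmi_score = 0
    total = disease + lifestyle + bmi_score
    if total >= 10:
        level = "High risk"
    elif total >= 6:
        level = "Medium risk"
    else:
        level = "Low risk"
    return total, level


example = {"hypertension": 1, "diabetes": 1, "smoking": 1, "exercise_frequency": 1, "bmi": 29}
print(risk_level_for(example))  # (12, 'High risk')
```

The maximum possible score is 22 (14 from diseases, 5 from lifestyle, 3 from BMI), so the high-risk threshold of 10 can be reached by, for example, heart disease plus stroke plus one further factor.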

VI. Selected Documentation


VII. END

💕💕 To get the source code, contact 计算机编程果茶熊
