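Recommended by University Advisors: The Spark Healthy-Lifestyle Data Analysis and Visualization System Most Worth Building in 2026 | Graduation Project | Computer Science Capstone | Software Development | Hands-On Project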

Preface

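💖💖 Author: Computer Programmer Xiao Yang (计算机程序员小杨)
💙💙 About me: I work in a computer-related field and am comfortable with Java, WeChat Mini Programs, Python, Golang, Android, and several other areas of IT. I take on custom project development, code walkthroughs, thesis-defense coaching, and documentation writing, and I know some techniques for reducing text-similarity scores. I enjoy technology, like digging into new tools and frameworks, and like solving practical problems with code. Feel free to ask me about technical or coding questions!
💛💛 A note from me: thank you for following and supporting my work!
💕💕 To get the source code, contact Computer Programmer Xiao Yang (details at the end of the article).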

1. Development Tools Overview

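Big data frameworks: Hadoop + Spark (Hive is not used in this build; customization is supported)
Languages: Python + Java (both versions are supported)
Back-end frameworks: Django + Spring Boot (Spring + SpringMVC + MyBatis) (both versions are supported)
Front end: Vue + ElementUI + ECharts + HTML + CSS + JavaScript + jQuery
Key technologies: Hadoop, HDFS, Spark, Spark SQL, Pandas, NumPy
Database: MySQL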

2. System Overview

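The big-data-based human healthy-lifestyle data analysis and visualization system is a health data analysis platform built on a modern big data stack. It uses Hadoop for distributed storage and Spark as the compute engine, builds its back-end services with Python data science libraries and the Django web framework, and renders visualizations on the front end with Vue.js, ElementUI, and ECharts. The system covers four stages of the health-data workflow: collection, storage, analysis, and visualization. Health records are stored in HDFS, queried and processed with Spark SQL, and analyzed further with Pandas and NumPy. The platform provides modules for personal health data management, correlation analysis of physiological decline, population health profiling, lifestyle impact assessment, and risk alerts for specific groups, along with a large-screen visualization dashboard. Together these help users see the patterns and trends in health data and provide data support for personal health management and public health decisions.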
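
As a rough illustration of the hand-off between the analysis back end and the ECharts front end described above, the sketch below turns a list of daily records into an ECharts line-chart `option` dictionary. The function name `build_trend_option` and the sample records are hypothetical and not taken from the project; only the `xAxis` / `yAxis` / `series` layout follows the standard ECharts option format.

```python
import json


def build_trend_option(records, metric, title):
    """Build an ECharts line-chart option from a list of daily record dicts.

    `records` is assumed to look like the trend data a back-end view returns,
    e.g. [{'date': '2026-01-01', 'weight': 70.5}, ...].
    """
    # ISO-formatted dates sort correctly as plain strings.
    ordered = sorted(records, key=lambda r: r['date'])
    return {
        'title': {'text': title},
        'tooltip': {'trigger': 'axis'},
        'xAxis': {'type': 'category', 'data': [r['date'] for r in ordered]},
        'yAxis': {'type': 'value'},
        'series': [{
            'name': metric,
            'type': 'line',
            # A missing reading becomes None, which serializes to JSON null.
            'data': [r.get(metric) for r in ordered],
        }],
    }


# Made-up sample data for illustration only.
sample = [
    {'date': '2026-01-03', 'weight': 70.1},
    {'date': '2026-01-01', 'weight': 70.8},
    {'date': '2026-01-02'},  # no weight recorded that day
]
option = build_trend_option(sample, 'weight', 'Weight trend')
print(json.dumps(option, indent=2))
```

Keeping missing readings as `null` rather than `0` avoids drawing a misleading dip on the chart; whether the real project does this is not stated in the source.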

3. System Feature Demo

4. System Interface


5. System Source Code



from pyspark.sql import SparkSession
# Import only the functions that are used; importing pyspark's `sum` would shadow the built-in.
from pyspark.sql.functions import col, avg, count, when, desc
import numpy as np
from django.http import JsonResponse

# Shared SparkSession with adaptive query execution enabled.
spark = (
    SparkSession.builder
    .appName("HealthDataAnalysis")
    .config("spark.sql.adaptive.enabled", "true")
    .getOrCreate()
)

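def health_data_analysis(request):
    # Parse both parameters as integers so that no raw request text reaches the SQL string.
    try:
        user_id = int(request.GET.get('user_id', ''))
        time_range = int(request.GET.get('time_range', '30'))
    except ValueError:
        return JsonResponse({'status': 'error', 'message': 'user_id and time_range must be integers'}, status=400)
    health_df = spark.sql(f"""
        SELECT user_id, record_date, heart_rate, blood_pressure_high, blood_pressure_low,
               sleep_hours, exercise_duration, calorie_intake, weight, bmi_value, stress_level
        FROM health_records
        WHERE user_id = {user_id} AND record_date >= date_sub(current_date(), {time_range})
        ORDER BY record_date DESC
    """)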
    avg_metrics = health_df.agg(
        avg("heart_rate").alias("avg_heart_rate"),
        avg("blood_pressure_high").alias("avg_bp_high"),
        avg("blood_pressure_low").alias("avg_bp_low"),
        avg("sleep_hours").alias("avg_sleep"),
        avg("exercise_duration").alias("avg_exercise"),
        avg("calorie_intake").alias("avg_calories"),
        avg("weight").alias("avg_weight"),
        avg("bmi_value").alias("avg_bmi"),
        avg("stress_level").alias("avg_stress")
    ).collect()[0]
    trend_analysis = health_df.select("record_date", "weight", "bmi_value", "heart_rate").orderBy("record_date")
    trend_data = []
    for row in trend_analysis.collect():
        trend_data.append({
            'date': str(row.record_date),
            'weight': float(row.weight) if row.weight else 0,
            'bmi': float(row.bmi_value) if row.bmi_value else 0,
            'heart_rate': int(row.heart_rate) if row.heart_rate else 0
        })
    health_score = calculate_health_score(avg_metrics.avg_heart_rate, avg_metrics.avg_bp_high, 
                                        avg_metrics.avg_sleep, avg_metrics.avg_exercise, avg_metrics.avg_bmi)
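    # Averages are None when the user has no records in the window, so guard each comparison.
    risk_factors = []
    if avg_metrics.avg_bp_high is not None and avg_metrics.avg_bp_high > 140:
        risk_factors.append("Hypertension risk")
    if avg_metrics.avg_bmi is not None and avg_metrics.avg_bmi > 25:
        risk_factors.append("Overweight risk")
    if avg_metrics.avg_sleep is not None and avg_metrics.avg_sleep < 7:
        risk_factors.append("Insufficient sleep")
    if avg_metrics.avg_exercise is not None and avg_metrics.avg_exercise < 150:
        risk_factors.append("Insufficient exercise")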
    return JsonResponse({
        'status': 'success',
        'avg_metrics': {
            'heart_rate': round(avg_metrics.avg_heart_rate, 1) if avg_metrics.avg_heart_rate else 0,
            'bp_high': round(avg_metrics.avg_bp_high, 1) if avg_metrics.avg_bp_high else 0,
            'bp_low': round(avg_metrics.avg_bp_low, 1) if avg_metrics.avg_bp_low else 0,
            'sleep_hours': round(avg_metrics.avg_sleep, 1) if avg_metrics.avg_sleep else 0,
            'exercise_duration': round(avg_metrics.avg_exercise, 1) if avg_metrics.avg_exercise else 0,
            'calorie_intake': round(avg_metrics.avg_calories, 1) if avg_metrics.avg_calories else 0,
            'weight': round(avg_metrics.avg_weight, 1) if avg_metrics.avg_weight else 0,
            'bmi': round(avg_metrics.avg_bmi, 1) if avg_metrics.avg_bmi else 0,
            'stress_level': round(avg_metrics.avg_stress, 1) if avg_metrics.avg_stress else 0
        },
        'trend_data': trend_data,
        'health_score': health_score,
        'risk_factors': risk_factors
    })

def population_health_portrait(request):
    age_group = request.GET.get('age_group', 'all')
    gender = request.GET.get('gender', 'all')
    region = request.GET.get('region', 'all')
    # Select columns explicitly: SELECT * over the join would return user_id twice,
    # which makes the later reference to "user_id" ambiguous.
    population_df = spark.sql("""
        SELECT hr.user_id, hr.bmi_value, hr.heart_rate, hr.blood_pressure_high,
               hr.sleep_hours, hr.exercise_duration,
               up.age, up.gender, up.region, up.lifestyle_type
        FROM health_records hr JOIN user_profiles up ON hr.user_id = up.user_id
    """)
    # Apply filters through the DataFrame API so request values are treated as
    # literals instead of being spliced into the SQL text.
    age_ranges = {'young': (18, 30), 'middle': (31, 50), 'senior': (51, 80)}
    if age_group in age_ranges:
        start_age, end_age = age_ranges[age_group]
        population_df = population_df.filter(col("age").between(start_age, end_age))
    if gender != 'all':
        population_df = population_df.filter(col("gender") == gender)
    if region != 'all':
        population_df = population_df.filter(col("region") == region)
    total_users = population_df.select("user_id").distinct().count()
    health_distribution = population_df.select("bmi_value", "heart_rate", "blood_pressure_high", 
                                             "sleep_hours", "exercise_duration").agg(
        avg("bmi_value").alias("avg_bmi"),
        avg("heart_rate").alias("avg_heart_rate"),
        avg("blood_pressure_high").alias("avg_bp_high"),
        avg("sleep_hours").alias("avg_sleep"),
        avg("exercise_duration").alias("avg_exercise"),
        count(when(col("bmi_value") < 18.5, 1)).alias("underweight_count"),
        count(when((col("bmi_value") >= 18.5) & (col("bmi_value") < 25), 1)).alias("normal_weight_count"),
        count(when((col("bmi_value") >= 25) & (col("bmi_value") < 30), 1)).alias("overweight_count"),
        count(when(col("bmi_value") >= 30, 1)).alias("obese_count"),
        count(when(col("blood_pressure_high") > 140, 1)).alias("hypertension_count"),
        count(when(col("sleep_hours") < 7, 1)).alias("sleep_insufficient_count"),
        count(when(col("exercise_duration") < 150, 1)).alias("exercise_insufficient_count")
    ).collect()[0]
    # Group by the bare column name, matching how row.lifestyle_type is read below.
    lifestyle_patterns = population_df.groupBy("lifestyle_type").agg(
        count("*").alias("user_count"),
        avg("bmi_value").alias("avg_bmi"),
        avg("heart_rate").alias("avg_heart_rate"),
        avg("sleep_hours").alias("avg_sleep")
    ).orderBy(desc("user_count")).collect()
    # The *_count fields count records, not users, so the record total is the matching
    # denominator; dividing by distinct users could push a rate above 100%.
    total_records = population_df.count()
    risk_analysis = {
        'hypertension_rate': (health_distribution.hypertension_count / total_records * 100) if total_records > 0 else 0,
        'obesity_rate': (health_distribution.obese_count / total_records * 100) if total_records > 0 else 0,
        'sleep_issue_rate': (health_distribution.sleep_insufficient_count / total_records * 100) if total_records > 0 else 0,
        'exercise_insufficient_rate': (health_distribution.exercise_insufficient_count / total_records * 100) if total_records > 0 else 0
    }
    lifestyle_data = []
    for row in lifestyle_patterns:
        lifestyle_data.append({
            'type': row.lifestyle_type,
            'count': row.user_count,
            'avg_bmi': round(row.avg_bmi, 1) if row.avg_bmi else 0,
            'avg_heart_rate': round(row.avg_heart_rate, 1) if row.avg_heart_rate else 0,
            'avg_sleep': round(row.avg_sleep, 1) if row.avg_sleep else 0
        })
    return JsonResponse({
        'status': 'success',
        'total_users': total_users,
        'health_averages': {
            'bmi': round(health_distribution.avg_bmi, 1) if health_distribution.avg_bmi else 0,
            'heart_rate': round(health_distribution.avg_heart_rate, 1) if health_distribution.avg_heart_rate else 0,
            'blood_pressure': round(health_distribution.avg_bp_high, 1) if health_distribution.avg_bp_high else 0,
            'sleep_hours': round(health_distribution.avg_sleep, 1) if health_distribution.avg_sleep else 0,
            'exercise_duration': round(health_distribution.avg_exercise, 1) if health_distribution.avg_exercise else 0
        },
        'weight_distribution': {
            'underweight': health_distribution.underweight_count,
            'normal': health_distribution.normal_weight_count,
            'overweight': health_distribution.overweight_count,
            'obese': health_distribution.obese_count
        },
        'risk_analysis': risk_analysis,
        'lifestyle_patterns': lifestyle_data
    })

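def lifestyle_impact_analysis(request):
    # Assumed mapping from request keys to health_records columns. Only these whitelisted
    # names are interpolated into the SQL text, so raw request input never reaches the query.
    factor_columns = {'exercise': 'exercise_duration', 'sleep': 'sleep_hours', 'calorie': 'calorie_intake'}
    metric_columns = {'bmi': 'bmi_value', 'weight': 'weight', 'heart_rate': 'heart_rate',
                      'blood_pressure': 'blood_pressure_high', 'stress': 'stress_level'}
    factor_key = request.GET.get('factor', 'exercise')
    metric_key = request.GET.get('metric', 'bmi')
    try:
        lifestyle_factor = factor_columns[factor_key]
        target_metric = metric_columns[metric_key]
        time_period = int(request.GET.get('period', '90'))
    except (KeyError, ValueError):
        return JsonResponse({'status': 'error', 'message': 'unsupported factor, metric, or period'}, status=400)
    correlation_df = spark.sql(f"""
        SELECT hr.user_id, hr.{lifestyle_factor}, hr.{target_metric}, hr.record_date,
               up.age, up.gender, up.region
        FROM health_records hr
        JOIN user_profiles up ON hr.user_id = up.user_id
        WHERE hr.record_date >= date_sub(current_date(), {time_period})
        AND hr.{lifestyle_factor} IS NOT NULL
        AND hr.{target_metric} IS NOT NULL
    """)
    # Range buckets are keyed by the short factor name, not the column name.
    factor_ranges = get_factor_ranges(factor_key)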
    impact_groups = []
    for range_key, (min_val, max_val) in factor_ranges.items():
        if max_val == float('inf'):
            group_df = correlation_df.filter(col(lifestyle_factor) >= min_val)
        else:
            group_df = correlation_df.filter((col(lifestyle_factor) >= min_val) & (col(lifestyle_factor) < max_val))
        group_stats = group_df.agg(
            count("*").alias("sample_count"),
            avg(target_metric).alias("avg_target"),
            avg("age").alias("avg_age")
        ).collect()[0]
        if group_stats.sample_count > 0:
            impact_groups.append({
                'range': range_key,
                'sample_count': group_stats.sample_count,
                'avg_target': round(group_stats.avg_target, 2) if group_stats.avg_target else 0,
                'avg_age': round(group_stats.avg_age, 1) if group_stats.avg_age else 0
            })
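    correlation_data = correlation_df.select(lifestyle_factor, target_metric).toPandas()
    if len(correlation_data) > 1:
        # Cast to float first so Decimal columns do not end up as object arrays.
        correlation_coefficient = float(np.corrcoef(
            correlation_data[lifestyle_factor].astype(float),
            correlation_data[target_metric].astype(float))[0, 1])
        # corrcoef yields NaN when either column is constant, and NaN is not valid JSON.
        if np.isnan(correlation_coefficient):
            correlation_coefficient = 0
    else:
        correlation_coefficient = 0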
    trend_analysis = correlation_df.groupBy("record_date").agg(
        avg(lifestyle_factor).alias("daily_avg_factor"),
        avg(target_metric).alias("daily_avg_target")
    ).orderBy("record_date")
    trend_data = []
    for row in trend_analysis.collect():
        trend_data.append({
            'date': str(row.record_date),
            'factor_avg': round(row.daily_avg_factor, 2) if row.daily_avg_factor else 0,
            'target_avg': round(row.daily_avg_target, 2) if row.daily_avg_target else 0
        })
    # Alias the count as "sample_count": Row is a tuple subclass, so row.count would
    # return the built-in tuple.count method instead of the column value.
    gender_impact = correlation_df.groupBy("gender").agg(
        avg(lifestyle_factor).alias("avg_factor"),
        avg(target_metric).alias("avg_target"),
        count("*").alias("sample_count")
    ).collect()
    gender_data = []
    for row in gender_impact:
        gender_data.append({
            'gender': row.gender,
            'avg_factor': round(row.avg_factor, 2) if row.avg_factor else 0,
            'avg_target': round(row.avg_target, 2) if row.avg_target else 0,
            'count': row.sample_count
        })
    return JsonResponse({
        'status': 'success',
        'correlation_coefficient': round(correlation_coefficient, 3),
        'impact_groups': impact_groups,
        'trend_data': trend_data,
        'gender_analysis': gender_data,
        'total_samples': correlation_df.count(),
        'analysis_period': f"{time_period} days"
    })

def calculate_health_score(heart_rate, bp_high, sleep_hours, exercise_duration, bmi):
    score = 100
    if heart_rate and heart_rate > 100:
        score -= 10
    if bp_high and bp_high > 140:
        score -= 15
    if sleep_hours and sleep_hours < 7:
        score -= 10
    if exercise_duration and exercise_duration < 150:
        score -= 10
    if bmi and (bmi < 18.5 or bmi > 25):
        score -= 15
    return max(score, 0)

def get_factor_ranges(factor):
    # Each range is a half-open interval [min, max); unknown factors get a two-bucket default.
    ranges = {
        'exercise': {'low': (0, 90), 'medium': (90, 210), 'high': (210, float('inf'))},
        'sleep': {'insufficient': (0, 6), 'moderate': (6, 8), 'sufficient': (8, float('inf'))},
        'calorie': {'low': (0, 1500), 'medium': (1500, 2500), 'high': (2500, float('inf'))}
    }
    return ranges.get(factor, {'low': (0, 50), 'high': (50, float('inf'))})
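
To make the grouping and correlation steps in `lifestyle_impact_analysis` more concrete, here is a small pure-Python sketch that runs without Spark. The helper names `bucket_for` and `pearson` and the sample numbers are hypothetical; the cut points (90 and 210) mirror the exercise ranges in `get_factor_ranges`, and `pearson` computes the same Pearson coefficient that `np.corrcoef(x, y)[0, 1]` returns, except that it falls back to 0.0 where the coefficient is undefined.

```python
import math

# Same cut points as the exercise ranges above: [0, 90), [90, 210), [210, inf).
EXERCISE_RANGES = {'low': (0, 90), 'medium': (90, 210), 'high': (210, float('inf'))}


def bucket_for(value, ranges):
    """Return the label of the half-open interval [min, max) that contains value."""
    for label, (lower, upper) in ranges.items():
        if lower <= value < upper:
            return label
    return None  # negative or otherwise out-of-range value


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 when it is undefined."""
    n = len(xs)
    if n < 2 or n != len(ys):
        return 0.0
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / math.sqrt(var_x * var_y)


# Made-up sample: exercise minutes and BMI for five people.
exercise = [30, 60, 120, 200, 300]
bmi = [29.0, 27.5, 25.0, 23.5, 22.0]

labels = [bucket_for(v, EXERCISE_RANGES) for v in exercise]
r = pearson(exercise, bmi)
print(labels)
print(round(r, 3))
```

In this made-up sample the coefficient is strongly negative, meaning more exercise goes with lower BMI. A correlation of this kind describes association only and says nothing about cause.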

6. System Documentation


End

💕💕 To get the source code, contact Computer Programmer Xiao Yang (计算机程序员小杨)