Big-Data Heart Failure Patient Feature Analysis System | Processing Massive Data with Hadoop+Spark: A Survival Curve Analysis System That Completes Computation over Tens of Millions of Records in 3 Seconds


💖💖 Author: 计算机毕业设计杰瑞 💙💙 About me: I have long worked in computer science training and teaching, which I genuinely enjoy. My languages include Java, WeChat Mini Programs, Python, Golang, and Android, and my projects span big data, deep learning, websites, mini programs, Android apps, and algorithms. I also take on custom project development, code walkthroughs, thesis-defense coaching, and documentation writing, and I know some plagiarism-reduction techniques. I like sharing solutions to problems I run into during development and discussing technology, so feel free to ask me about anything code-related! 💛💛 A few words: thank you all for your attention and support! 💜💜 Website projects · Android/mini-program projects · Big data projects · Deep learning projects · Recommended graduation project topics

Big-Data Heart Failure Patient Feature Analysis System: Introduction

The heart failure patient survival curve analysis system is a medical data analysis platform built on a big-data architecture, using Hadoop for distributed storage and the Spark in-memory compute engine for high-performance data processing. The front end is built with the Vue framework, the ElementUI component library, and ECharts visualization, giving users an intuitive interface with rich chart displays. The back end is developed on Spring Boot with MyBatis as the persistence layer, and uses Spark SQL for fast querying and analytical computation over large volumes of heart failure patient data. Core modules include basic patient feature analysis, death risk assessment, multi-dimensional feature correlation analysis, physiological indicator statistics, and cohort survival curve plotting; together they let the platform process large-scale clinical data and provide data mining and statistical analysis services for medical researchers. Thanks to its distributed architecture, the system supports second-level response times on datasets of tens of millions of records, and it uses the Pandas and NumPy scientific computing libraries for complex data preprocessing and feature engineering, providing a reliable technical foundation for cardiovascular disease research.
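The preprocessing and feature-engineering step mentioned above can be sketched with Pandas and NumPy. The column names below follow the heart failure clinical-records schema used later in the code (`age`, `time`, `death_event`, `serum_creatinine`, `ejection_fraction`); the `age_group` binning thresholds and the winsorization limits are illustrative assumptions, not taken from the original system:

```python
import pandas as pd
import numpy as np

def preprocess_patients(df: pd.DataFrame) -> pd.DataFrame:
    """Clean raw patient records and derive features used by the analysis views."""
    df = df.copy()
    # Drop rows missing the core survival fields
    df = df.dropna(subset=["age", "time", "death_event"])
    # Bucket ages into the age_group column the Spark aggregations group by
    # (thresholds are illustrative, not from the original system)
    bins = [0, 50, 60, 70, 200]
    labels = ["<50", "50-59", "60-69", "70+"]
    df["age_group"] = pd.cut(df["age"], bins=bins, labels=labels, right=False)
    # Winsorize serum creatinine to damp extreme lab values
    low, high = df["serum_creatinine"].quantile([0.01, 0.99])
    df["serum_creatinine"] = df["serum_creatinine"].clip(low, high)
    # Z-score the ejection fraction for downstream correlation work
    ef = df["ejection_fraction"].astype(float)
    df["ef_zscore"] = (ef - ef.mean()) / ef.std(ddof=0)
    return df

sample = pd.DataFrame({
    "age": [45, 62, 75, 58],
    "time": [120, 30, 8, 245],
    "death_event": [0, 1, 1, 0],
    "serum_creatinine": [0.9, 1.8, 9.4, 1.1],
    "ejection_fraction": [55, 30, 20, 45],
})
processed = preprocess_patients(sample)
print(processed[["age", "age_group", "ef_zscore"]])
```

In a real pipeline this function would run once after the raw CSV/MySQL load, before the cleaned frame is written back for Spark to consume.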

Big-Data Heart Failure Patient Feature Analysis System: Demo Video

Demo video

Big-Data Heart Failure Patient Feature Analysis System: Demo Screenshots


Big-Data Heart Failure Patient Feature Analysis System: Code Showcase

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, when, count, avg, stddev, corr
import pandas as pd
import numpy as np
from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt
import json

spark = (SparkSession.builder
         .appName("HeartFailureAnalysis")
         .config("spark.sql.adaptive.enabled", "true")
         .getOrCreate())

def analyze_patient_death_risk(request):
    try:
        # Read patient features from MySQL through the JDBC connector
        patient_data = (spark.read.format("jdbc")
                        .option("url", "jdbc:mysql://localhost:3306/heart_failure")
                        .option("dbtable", "patient_features")
                        .option("user", "root")
                        .option("password", "123456")
                        .load())
        risk_analysis = patient_data.groupBy("age_group").agg(
            count("*").alias("total_patients"),
            count(when(col("death_event") == 1, 1)).alias("death_count"),
            avg("ejection_fraction").alias("avg_ejection_fraction"),
            avg("serum_creatinine").alias("avg_creatinine"),
            avg("serum_sodium").alias("avg_sodium")
        )
        risk_analysis = risk_analysis.withColumn("death_rate", 
            (col("death_count") / col("total_patients") * 100).cast("decimal(5,2)"))
        high_risk_groups = risk_analysis.filter(col("death_rate") > 30.0)
        risk_factors = patient_data.select("age", "ejection_fraction", "serum_creatinine", "serum_sodium", "death_event")
        correlation_matrix = risk_factors.toPandas().corr()
        significant_correlations = {}
        for column in correlation_matrix.columns:
            if column != "death_event":
                corr_value = correlation_matrix.loc["death_event", column]
                if abs(corr_value) > 0.3:
                    significant_correlations[column] = round(corr_value, 4)
        survival_stats = patient_data.groupBy("death_event").agg(
            count("*").alias("count"),
            avg("time").alias("avg_survival_time"),
            stddev("time").alias("std_survival_time")
        ).collect()
        result_data = {
            "risk_groups": [row.asDict() for row in risk_analysis.collect()],
            "high_risk_groups": [row.asDict() for row in high_risk_groups.collect()],
            "correlations": significant_correlations,
            "survival_statistics": [row.asDict() for row in survival_stats]
        }
        return JsonResponse({"code": 200, "data": result_data, "message": "Death risk analysis completed"})
    except Exception as e:
        return JsonResponse({"code": 500, "message": f"Analysis failed: {str(e)}"})
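The `groupBy`/`agg` step in `analyze_patient_death_risk` reduces, per age group, to counting deaths and averaging lab values. The same logic in plain Python is a useful way to unit-test the expected numbers without a Spark cluster; this is a sketch, and the field names simply mirror the DataFrame columns above:

```python
from collections import defaultdict

def death_risk_by_group(patients):
    """Pure-Python mirror of the Spark groupBy/agg in analyze_patient_death_risk.

    patients: list of dicts with age_group, death_event, ejection_fraction keys.
    Returns {age_group: {total_patients, death_count, death_rate, avg_ejection_fraction}}.
    """
    groups = defaultdict(list)
    for p in patients:
        groups[p["age_group"]].append(p)
    result = {}
    for group, rows in groups.items():
        total = len(rows)
        deaths = sum(r["death_event"] for r in rows)
        result[group] = {
            "total_patients": total,
            "death_count": deaths,
            # Same death-rate formula as the withColumn("death_rate", ...) step
            "death_rate": round(deaths / total * 100, 2),
            "avg_ejection_fraction": round(
                sum(r["ejection_fraction"] for r in rows) / total, 2
            ),
        }
    return result

sample = [
    {"age_group": "70+", "death_event": 1, "ejection_fraction": 25},
    {"age_group": "70+", "death_event": 0, "ejection_fraction": 40},
    {"age_group": "<50", "death_event": 0, "ejection_fraction": 60},
]
print(death_risk_by_group(sample))
```

Comparing this reference implementation against the Spark output on a small fixture is a cheap way to catch aggregation regressions.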

def multi_dimensional_feature_correlation(request):
    try:
        # Read patient features from MySQL through the JDBC connector
        feature_data = (spark.read.format("jdbc")
                        .option("url", "jdbc:mysql://localhost:3306/heart_failure")
                        .option("dbtable", "patient_features")
                        .option("user", "root")
                        .option("password", "123456")
                        .load())
        numerical_features = ["age", "ejection_fraction", "serum_creatinine", "serum_sodium", "platelets", "creatinine_phosphokinase"]
        correlation_results = {}
        # Pairwise correlations over the upper triangle of the feature matrix
        for i, feature1 in enumerate(numerical_features):
            for feature2 in numerical_features[i + 1:]:
                correlation_value = feature_data.stat.corr(feature1, feature2)
                if abs(correlation_value) > 0.2:
                    correlation_results[f"{feature1}_vs_{feature2}"] = round(correlation_value, 4)
        cluster_analysis = {}
        for feature in numerical_features:
            feature_stats = feature_data.select(
                avg(feature).alias("mean"),
                stddev(feature).alias("std")
            ).collect()[0]
            # approxQuantile is a DataFrame method, not an aggregate column,
            # so it must be called outside of select()
            quartiles = feature_data.approxQuantile(feature, [0.25, 0.5, 0.75], 0.1)
            cluster_analysis[feature] = {
                "mean": round(feature_stats["mean"], 4),
                "std": round(feature_stats["std"], 4),
                "quartiles": [round(q, 4) for q in quartiles]
            }
        interaction_effects = feature_data.select(
            "ejection_fraction", "serum_creatinine", "age", "death_event"
        ).withColumn("ef_creatinine_ratio", col("ejection_fraction") / col("serum_creatinine")).withColumn("age_ef_interaction", col("age") * col("ejection_fraction") / 100)
        interaction_correlations = {}
        for interaction_col in ["ef_creatinine_ratio", "age_ef_interaction"]:
            corr_with_death = interaction_effects.stat.corr(interaction_col, "death_event")
            interaction_correlations[interaction_col] = round(corr_with_death, 4)
        dimensional_summary = {
            "total_correlations": len(correlation_results),
            "strong_correlations": len([v for v in correlation_results.values() if abs(v) > 0.5]),
            "moderate_correlations": len([v for v in correlation_results.values() if 0.2 < abs(v) <= 0.5]),
            "feature_interactions": interaction_correlations
        }
        response_data = {
            "correlations": correlation_results,
            "feature_statistics": cluster_analysis,
            "dimensional_summary": dimensional_summary
        }
        return JsonResponse({"code": 200, "data": response_data, "message": "Multi-dimensional feature correlation analysis completed"})
    except Exception as e:
        return JsonResponse({"code": 500, "message": f"Correlation analysis failed: {str(e)}"})

def generate_survival_curves(request):
    try:
        # Read patient features from MySQL through the JDBC connector
        survival_data = (spark.read.format("jdbc")
                         .option("url", "jdbc:mysql://localhost:3306/heart_failure")
                         .option("dbtable", "patient_features")
                         .option("user", "root")
                         .option("password", "123456")
                         .load())
        time_intervals = list(range(0, 301, 30))
        survival_curves = {}
        for group_feature in ["gender", "diabetes", "high_blood_pressure", "smoking"]:
            group_survival = {}
            distinct_values = survival_data.select(group_feature).distinct().collect()
            for row in distinct_values:
                group_value = row[group_feature]
                group_data = survival_data.filter(col(group_feature) == group_value)
                total_patients = group_data.count()
                survival_points = []
                for time_point in time_intervals:
                    # Simplified survivor count: deaths before time_point are
                    # excluded, but every censored (still-alive) patient is
                    # counted as a survivor at all time points; this is an
                    # approximation, not a Kaplan-Meier estimate
                    survivors_at_time = group_data.filter(
                        (col("time") >= time_point) | (col("death_event") == 0)
                    ).count()
                    survival_rate = round((survivors_at_time / total_patients) * 100, 2)
                    survival_points.append({"time": time_point, "survival_rate": survival_rate})
                group_survival[str(group_value)] = survival_points
            survival_curves[group_feature] = group_survival
        median_survival_times = {}
        for group_feature in ["gender", "diabetes", "high_blood_pressure", "smoking"]:
            group_medians = {}
            distinct_values = survival_data.select(group_feature).distinct().collect()
            for row in distinct_values:
                group_value = row[group_feature]
                group_data = survival_data.filter(col(group_feature) == group_value)
                deceased_patients = group_data.filter(col("death_event") == 1)
                if deceased_patients.count() > 0:
                    median_time = deceased_patients.approxQuantile("time", [0.5], 0.1)[0]
                    group_medians[str(group_value)] = round(median_time, 1)
                else:
                    group_medians[str(group_value)] = None
            median_survival_times[group_feature] = group_medians
        overall_survival = survival_data.agg(
            count("*").alias("total_patients"),
            count(when(col("death_event") == 1, 1)).alias("deaths"),
            avg("time").alias("avg_follow_up"),
            avg(when(col("death_event") == 1, col("time"))).alias("avg_death_time")
        ).collect()[0]
        curve_statistics = {
            "total_patients": overall_survival["total_patients"],
            "total_deaths": overall_survival["deaths"],
            "death_rate": round((overall_survival["deaths"] / overall_survival["total_patients"]) * 100, 2),
            "avg_follow_up": round(overall_survival["avg_follow_up"], 1),
            "avg_death_time": round(overall_survival["avg_death_time"], 1) if overall_survival["avg_death_time"] else None
        }
        response_data = {
            "survival_curves": survival_curves,
            "median_survival_times": median_survival_times,
            "curve_statistics": curve_statistics,
            "time_intervals": time_intervals
        }
        return JsonResponse({"code": 200, "data": response_data, "message": "Survival curve analysis completed"})
    except Exception as e:
        return JsonResponse({"code": 500, "message": f"Survival curve generation failed: {str(e)}"})
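The survival-rate loop in `generate_survival_curves` uses a simplified counting rule that treats every censored patient as a survivor at every time point. For comparison, here is a plain-Python sketch of the standard Kaplan-Meier product-limit estimator, which handles censoring by removing censored patients from the at-risk set; this is not part of the original system:

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimator.

    times  : follow-up time for each patient
    events : 1 if the patient died at that time, 0 if censored (alive at last follow-up)
    Returns a list of (time, survival_probability) step points.
    """
    # Process patients in order of follow-up time
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk = len(times)
    survival = 1.0
    curve = [(0, 1.0)]
    i = 0
    while i < len(order):
        t = times[order[i]]
        deaths = 0
        removed = 0
        # Group all patients sharing this event/censoring time
        while i < len(order) and times[order[i]] == t:
            deaths += events[order[i]]
            removed += 1
            i += 1
        if deaths > 0:
            # Multiply in the conditional probability of surviving past t
            survival *= 1.0 - deaths / at_risk
            curve.append((t, round(survival, 4)))
        # Both deaths and censored patients leave the at-risk set after t
        at_risk -= removed
    return curve

# Small worked example: deaths at t=5 and t=11, censoring at t=8 and t=14
print(kaplan_meier([5, 8, 11, 14], [1, 0, 1, 0]))
```

Note how the censored patient at t=8 shrinks the denominator for the death at t=11 (1 of 2 at risk) instead of being silently counted as a survivor, which is the key difference from the simplified rule above.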

Big-Data Heart Failure Patient Feature Analysis System: Documentation

