Big-Data-Based Chronic Kidney Disease Data Analysis System | Build a Chronic Kidney Disease Big Data Analysis System in 7 Days: A Complete Tutorial from Hadoop Environment Setup to Spark Data Processing


💖💖 Author: 计算机毕业设计杰瑞 💙💙 About me: I have long worked as a computer science instructor and genuinely enjoy teaching. My languages include Java, WeChat Mini Programs, Python, Golang, and Android, and my projects span big data, deep learning, websites, mini programs, Android apps, and algorithms. I also take on custom project development, code walkthroughs, thesis-defense coaching, and documentation writing, and know some techniques for lowering similarity scores in plagiarism checks. I like sharing solutions to problems I run into during development and discussing technology, so feel free to ask me about code or technical issues! 💛💛 A word of thanks: thank you all for your attention and support! 💜💜 Website practical projects | Android/Mini Program practical projects | Big data practical projects | Deep learning practical projects | Recommended computer science graduation project topics

Introduction to the Big-Data-Based Chronic Kidney Disease Data Analysis System

The chronic kidney disease (CKD) data analysis system is a medical data analysis platform built on the Hadoop + Spark big data stack, with a Python + Django back end and a Vue + ElementUI front end. The system stores large volumes of CKD patient data in the HDFS distributed file system and uses Spark SQL together with Pandas for efficient data cleaning and analysis. Its core modules cover blood biochemistry analysis, clinical pattern analysis, disease epidemiology analysis, disease progression analysis, renal function analysis, and multi-indicator comprehensive analysis, with ECharts rendering the complex medical data as intuitive charts. The platform also provides basic features such as user management, profile maintenance, and password changes, plus a large-screen display mode that makes it easy for medical institutions to present reports. By exploiting the parallel-computing strengths of big data technology, the system can process large-scale CKD datasets quickly, supplying data support for medical decision-making and demonstrating the practical value of modern big data technology in healthcare.
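
To make the data pipeline concrete, below is a minimal sketch of how the raw CKD dataset might be loaded from HDFS into Spark and registered as the kidney_disease_data table queried in the code section. The namenode address, HDFS path, and CSV format are illustrative assumptions, not details taken from the project.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("CKDDataLoad").getOrCreate()

# Hypothetical HDFS location; a real deployment would substitute its own path and schema.
raw_df = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("hdfs://namenode:9000/ckd/kidney_disease_data.csv"))

# Register the dataset as a SQL view so later code can query it with spark.sql().
raw_df.createOrReplaceTempView("kidney_disease_data")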

Demo Video of the Big-Data-Based Chronic Kidney Disease Data Analysis System

Demo video

Screenshots of the Big-Data-Based Chronic Kidney Disease Data Analysis System

[System screenshots]

Code Showcase of the Big-Data-Based Chronic Kidney Disease Data Analysis System
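
The three Django views below implement the blood biochemistry, disease progression, and multi-indicator analysis modules. They assume that Spark tables named kidney_disease_data and kidney_disease_progression have already been registered, for example as sketched in the introduction.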

from pyspark.sql import SparkSession
# Import the Spark functions explicitly; a wildcard import would shadow Python builtins such as sum().
from pyspark.sql.functions import col, avg, max, min, count, date_format, year, quarter, concat, lit
import pandas as pd
import numpy as np
from django.http import JsonResponse
from django.views.decorators.http import require_http_methods
import json

# Enable adaptive query execution so Spark tunes shuffle partition counts for these analytical queries.
spark = (SparkSession.builder
         .appName("ChronicKidneyDiseaseAnalysis")
         .config("spark.sql.adaptive.enabled", "true")
         .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
         .getOrCreate())

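# View 1: blood biochemistry analysis (per-patient stats, abnormal counts, monthly trends, creatinine-GFR correlation).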
@require_http_methods(["POST"])
def blood_biochemical_analysis(request):
    data = json.loads(request.body)
    patient_ids = data.get('patient_ids', [])
    analysis_type = data.get('analysis_type', 'creatinine')
    time_range = int(data.get('time_range', 30))
    if not patient_ids:
        return JsonResponse({'status': 'error', 'message': 'patient_ids is required'}, status=400)
    # Cast every id to int before interpolation so the f-string SQL cannot be abused for injection.
    id_list = ','.join(str(int(pid)) for pid in patient_ids)
    df = spark.sql(f"SELECT patient_id, test_date, creatinine, urea_nitrogen, gfr, albumin FROM kidney_disease_data WHERE patient_id IN ({id_list}) AND test_date >= date_sub(current_date(), {time_range})")
    correlation_value = None
    if analysis_type == 'creatinine':
        result_df = df.select("patient_id", "test_date", "creatinine").filter(col("creatinine").isNotNull())
        stats_df = result_df.groupBy("patient_id").agg(avg("creatinine").alias("avg_creatinine"), max("creatinine").alias("max_creatinine"), min("creatinine").alias("min_creatinine"), count("creatinine").alias("test_count"))
        # groupBy().count() yields a "count" column, which is read as the per-patient abnormal count below.
        abnormal_df = result_df.filter(col("creatinine") > 133).groupBy("patient_id").count()
        trend_df = result_df.withColumn("month", date_format(col("test_date"), "yyyy-MM")).groupBy("patient_id", "month").agg(avg("creatinine").alias("monthly_avg"))
        correlation_df = df.select("creatinine", "gfr").filter(col("creatinine").isNotNull() & col("gfr").isNotNull())
        correlation_value = correlation_df.stat.corr("creatinine", "gfr")
    elif analysis_type == 'urea_nitrogen':
        result_df = df.select("patient_id", "test_date", "urea_nitrogen").filter(col("urea_nitrogen").isNotNull())
        stats_df = result_df.groupBy("patient_id").agg(avg("urea_nitrogen").alias("avg_urea"), max("urea_nitrogen").alias("max_urea"), min("urea_nitrogen").alias("min_urea"), count("urea_nitrogen").alias("test_count"))
        abnormal_df = result_df.filter(col("urea_nitrogen") > 7.5).groupBy("patient_id").count()
        trend_df = result_df.withColumn("month", date_format(col("test_date"), "yyyy-MM")).groupBy("patient_id", "month").agg(avg("urea_nitrogen").alias("monthly_avg"))
    else:
        return JsonResponse({'status': 'error', 'message': f'unsupported analysis_type: {analysis_type}'}, status=400)
    stats_result = stats_df.collect()
    trend_result = trend_df.collect()
    abnormal_result = abnormal_df.collect()
    processed_data = []
    for row in stats_result:
        patient_data = row.asDict()
        patient_trends = [t.asDict() for t in trend_result if t['patient_id'] == row['patient_id']]
        patient_abnormal = next((a['count'] for a in abnormal_result if a['patient_id'] == row['patient_id']), 0)
        patient_data['trends'] = patient_trends
        patient_data['abnormal_count'] = patient_abnormal
        processed_data.append(patient_data)
    return JsonResponse({'status': 'success', 'data': processed_data, 'correlation': correlation_value})

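# View 2: disease progression analysis for one patient (stage transitions, annualized GFR decline, risk factors).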
@require_http_methods(["POST"])
def disease_progression_analysis(request):
    data = json.loads(request.body)
    patient_id = data.get('patient_id')
    analysis_period = data.get('period', 'yearly')
    if patient_id is None:
        return JsonResponse({'status': 'error', 'message': 'patient_id is required'}, status=400)
    # The int() cast guards the interpolated SQL against injection.
    progression_df = spark.sql(f"SELECT patient_id, test_date, stage, gfr, creatinine, proteinuria_level FROM kidney_disease_progression WHERE patient_id = {int(patient_id)} ORDER BY test_date")
    if analysis_period == 'yearly':
        time_grouped = progression_df.withColumn("year", year(col("test_date"))).groupBy("patient_id", "year")
    elif analysis_period == 'monthly':
        time_grouped = progression_df.withColumn("year_month", date_format(col("test_date"), "yyyy-MM")).groupBy("patient_id", "year_month")
    else:
        time_grouped = progression_df.withColumn("quarter", concat(year(col("test_date")), lit("-Q"), quarter(col("test_date")))).groupBy("patient_id", "quarter")
    progression_stats = time_grouped.agg(avg("gfr").alias("avg_gfr"), avg("creatinine").alias("avg_creatinine"), max("stage").alias("max_stage"), avg("proteinuria_level").alias("avg_proteinuria"))
    # Explicit orderBy keeps the collected rows in chronological order for the transition scan below.
    stage_transitions = progression_df.select("test_date", "stage").orderBy("test_date").collect()
    stage_changes = []
    for i in range(1, len(stage_transitions)):
        prev_stage = stage_transitions[i-1]['stage']
        curr_stage = stage_transitions[i]['stage']
        if prev_stage != curr_stage:
            stage_changes.append({'from_stage': prev_stage, 'to_stage': curr_stage, 'date': stage_transitions[i]['test_date']})
    gfr_decline_rate = progression_df.select("test_date", "gfr").orderBy("test_date")
    gfr_data = gfr_decline_rate.collect()
    decline_rates = []
    for i in range(1, len(gfr_data)):
        prev_gfr = gfr_data[i-1]['gfr']
        curr_gfr = gfr_data[i]['gfr']
        time_diff = (gfr_data[i]['test_date'] - gfr_data[i-1]['test_date']).days
        if time_diff > 0 and prev_gfr and curr_gfr:
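            # Annualize the per-day GFR drop (multiply by 365) to get a yearly decline rate.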
            rate = (prev_gfr - curr_gfr) / time_diff * 365
            decline_rates.append(rate)
    avg_decline_rate = sum(decline_rates) / len(decline_rates) if decline_rates else 0
    risk_factors = progression_df.filter((col("creatinine") > 200) | (col("gfr") < 30) | (col("proteinuria_level") > 3)).count()
    progression_result = progression_stats.collect()
    return JsonResponse({'status': 'success', 'progression_data': [row.asDict() for row in progression_result], 'stage_changes': stage_changes, 'avg_decline_rate': avg_decline_rate, 'risk_factor_count': risk_factors})

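# View 3: multi-indicator analysis (correlation matrix, descriptive stats, IQR outliers, k-means clustering).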
@require_http_methods(["POST"])
def multi_indicator_analysis(request):
    data = json.loads(request.body)
    indicators = data.get('indicators', ['creatinine', 'gfr', 'urea_nitrogen'])
    patient_group = data.get('patient_group', 'all')
    # DataFrame.stat.corr currently supports only Pearson correlation, so correlation_type is informational.
    correlation_type = data.get('correlation_type', 'pearson')
    if patient_group == 'all':
        base_df = spark.sql("SELECT * FROM kidney_disease_data WHERE creatinine IS NOT NULL AND gfr IS NOT NULL AND urea_nitrogen IS NOT NULL")
    else:
        # NOTE: patient_group is interpolated directly into SQL; whitelist or sanitize it in production.
        base_df = spark.sql(f"SELECT * FROM kidney_disease_data WHERE patient_group = '{patient_group}' AND creatinine IS NOT NULL AND gfr IS NOT NULL AND urea_nitrogen IS NOT NULL")
    selected_indicators = base_df.select(*indicators)
    correlation_matrix = {}
    for i, indicator1 in enumerate(indicators):
        # setdefault() rather than assignment: a fresh dict here would wipe the symmetric
        # entries written by the inner loop on earlier iterations.
        correlation_matrix.setdefault(indicator1, {})
        for j, indicator2 in enumerate(indicators):
            if i <= j:
                if indicator1 == indicator2:
                    correlation_matrix[indicator1][indicator2] = 1.0
                else:
                    corr_value = selected_indicators.stat.corr(indicator1, indicator2)
                    correlation_matrix[indicator1][indicator2] = corr_value
                    correlation_matrix.setdefault(indicator2, {})[indicator1] = corr_value
    statistical_summary = selected_indicators.describe().collect()
    summary_dict = {}
    for row in statistical_summary:
        summary_dict[row['summary']] = {indicator: row[indicator] for indicator in indicators}
    outlier_detection = {}
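    # Tukey's fences: values more than 1.5 * IQR outside the quartiles are counted as outliers.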
    for indicator in indicators:
        q1 = selected_indicators.approxQuantile(indicator, [0.25], 0.01)[0]
        q3 = selected_indicators.approxQuantile(indicator, [0.75], 0.01)[0]
        iqr = q3 - q1
        lower_bound = q1 - 1.5 * iqr
        upper_bound = q3 + 1.5 * iqr
        outliers = selected_indicators.filter((col(indicator) < lower_bound) | (col(indicator) > upper_bound)).count()
        outlier_detection[indicator] = {'count': outliers, 'lower_bound': lower_bound, 'upper_bound': upper_bound}
    # Sample ~10% of rows so the k-means step stays small enough to run on the driver.
    cluster_sample = selected_indicators.sample(fraction=0.1, seed=42).collect()
    cluster_info = {'centers': [], 'labels': []}
    if len(cluster_sample) >= 3:
        pandas_df = pd.DataFrame([row.asDict() for row in cluster_sample])
        from sklearn.preprocessing import StandardScaler
        from sklearn.cluster import KMeans
        scaler = StandardScaler()
        scaled_data = scaler.fit_transform(pandas_df[indicators])
        kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
        clusters = kmeans.fit_predict(scaled_data)
        cluster_info = {'centers': kmeans.cluster_centers_.tolist(), 'labels': clusters.tolist()}
    return JsonResponse({'status': 'success', 'correlation_matrix': correlation_matrix, 'statistical_summary': summary_dict, 'outlier_detection': outlier_detection, 'cluster_analysis': cluster_info})
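
For illustration, here is a hypothetical client call against the blood biochemistry endpoint. The /api/blood_biochemical_analysis/ route is an assumption, since the project's urls.py is not included in this post, and the call presumes the view is CSRF-exempt or that a CSRF token is supplied.

import requests

# Assumed route; blood_biochemical_analysis must be mapped to it in urls.py.
payload = {"patient_ids": [1001, 1002], "analysis_type": "creatinine", "time_range": 90}
resp = requests.post("http://localhost:8000/api/blood_biochemical_analysis/", json=payload)
print(resp.json())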

Documentation Showcase of the Big-Data-Based Chronic Kidney Disease Data Analysis System

[Documentation screenshot]
