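[Big Data] Brain Tumor Data Visualization and Analysis System: Computer Science Graduation Project, Hadoop + Spark Environment Setup, Data Science and Big Data Technology, with Source Code, Documentation, and Walkthrough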


Preface

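💖💖 Author: 计算机程序员小杨 💙💙 About me: I work in a computing-related field and am comfortable across several areas of IT, including Java, WeChat mini programs, Python, Golang, and Android. I take on custom project development, code walkthroughs, thesis-defense coaching, and documentation writing, and I also know some techniques for lowering similarity scores in plagiarism checks. I love technology, enjoy digging into new tools and frameworks, and like solving practical problems with code, so feel free to ask me about anything technical or code-related! 💛💛 A word from me: thank you all for your attention and support! 💕💕 To get the source code, see the contact details at the end of this post: 计算机程序员小杨 💜💜 Website projects · Android / mini program projects · Big data projects · Deep learning projects · Graduation project topic ideas 💜💜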

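1. Development Tools

Big data framework: Hadoop + Spark (Hive is not used in this build; customization is supported)
Development languages: Python + Java (both versions are supported)
Back-end frameworks: Django + Spring Boot (Spring + SpringMVC + MyBatis) (both versions are supported)
Front end: Vue + ElementUI + Echarts + HTML + CSS + JavaScript + jQuery
Key technologies: Hadoop, HDFS, Spark, Spark SQL, Pandas, NumPy
Database: MySQL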

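2. System Overview

The Brain Tumor Data Visualization and Analysis System is a medical data analysis platform built on the Hadoop + Spark big data stack. Python is the main development language; the back end uses Django to expose RESTful API endpoints, and the front end uses Vue, ElementUI, and Echarts to provide an interactive data visualization interface. Clinical data for brain tumor patients is stored in a MySQL database, HDFS provides distributed storage management, Spark SQL handles large-scale queries and aggregation, and Pandas and NumPy are used for the more complex statistical calculations.

Core features include user permission management; create, read, update, and delete operations on brain tumor case records; multi-dimensional distribution analysis of clinical features; demographic statistics such as patient age and sex; association mining of tumor risk factors; correlation analysis between clinical symptoms and pathological types; outcome evaluation and prognosis prediction for different treatment plans; and a large-screen dashboard that brings the analysis results together. By processing large volumes of medical data on a distributed computing engine, the system gives medical institutions a tool for in-depth analysis of brain tumor patient data, in support of clinical decision-making and research.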
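To make the "multi-dimensional distribution analysis" concrete, here is a minimal plain-Python sketch of the bucket-and-count step. It mirrors the age bands used in the source listing later in this post (0-18, 18-40, 40-60, 60+), but it runs without a Spark cluster. The function names and the sample patients are hypothetical and only serve as an illustration; the real system performs this grouping with Spark DataFrames.

```python
from collections import Counter


def age_group(age):
    """Map an age in years to the band labels used in the post's Spark code."""
    if age < 18:
        return "0-18"
    if age < 40:
        return "18-40"
    if age < 60:
        return "40-60"
    return "60+"


def age_tumor_distribution(patients):
    """Count patients per (age band, tumor type), sorted by band then type."""
    counts = Counter((age_group(p["age"]), p["tumor_type"]) for p in patients)
    return [
        {"age_group": band, "tumor_type": tumor, "count": n}
        for (band, tumor), n in sorted(counts.items())
    ]


# Hypothetical sample records, not taken from the project's data set.
sample_patients = [
    {"age": 12, "tumor_type": "glioma"},
    {"age": 45, "tumor_type": "glioma"},
    {"age": 47, "tumor_type": "glioma"},
    {"age": 70, "tumor_type": "meningioma"},
]
distribution = age_tumor_distribution(sample_patients)
```

Each entry in `distribution` has the same keys (`age_group`, `tumor_type`, `count`) as the `age_distribution` list that the first view in the listing returns.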

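The overview says the aggregated results are rendered with Echarts. As a rough sketch of how a list of aggregated rows could be reshaped into an Echarts pie-chart option, consider the helper below. The helper name `to_pie_option`, the chart title, and the sample counts are assumptions made for illustration; the post does not show how the project's front end actually builds its chart options.

```python
def to_pie_option(title, rows, name_key, value_key):
    """Build a minimal Echarts pie-chart option dict from aggregated rows."""
    return {
        "title": {"text": title},
        "tooltip": {"trigger": "item"},
        "series": [
            {
                "type": "pie",
                # Echarts pie series expect a list of {name, value} items.
                "data": [{"name": r[name_key], "value": r[value_key]} for r in rows],
            }
        ],
    }


# Hypothetical rows shaped like the "type_distribution" list in the API response.
type_distribution = [
    {"tumor_type": "glioma", "count": 120},
    {"tumor_type": "meningioma", "count": 80},
]
option = to_pie_option("Tumor type distribution", type_distribution, "tumor_type", "count")
```

The resulting dict can be serialized to JSON and passed to `setOption` on the front end, which keeps the reshaping logic in one place.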
3. System Feature Demo

(Demo video: Brain Tumor Data Visualization and Analysis System)

4. System Interface Screenshots

(Nine interface screenshots appear here in the original post; the images are not reproduced.)

5. System Source Code


import json

from django.http import JsonResponse
from django.views.decorators.http import require_http_methods
from pyspark.sql import SparkSession
# Only the Spark functions used in this listing are imported. Note that `round`
# is Spark's column function and shadows Python's built-in round() in this module.
from pyspark.sql.functions import avg, col, count, desc, round, when
# One SparkSession shared by all views in this module.
spark = (
    SparkSession.builder
    .appName("BrainTumorAnalysis")
    .config("spark.sql.warehouse.dir", "/user/hive/warehouse")
    .config("spark.executor.memory", "4g")
    .config("spark.driver.memory", "2g")
    .getOrCreate()
)
@require_http_methods(["POST"])
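def clinical_feature_distribution_analysis(request):
    """Return tumor type, grade, location, size, gender, and age distributions for a diagnosis date range."""
    data = json.loads(request.body)
    start_date = data.get('start_date')
    end_date = data.get('end_date')
    tumor_type = data.get('tumor_type', None)
    # Without both dates, the literal string 'None' would end up in the date filter.
    if not start_date or not end_date:
        return JsonResponse({"status": "error", "message": "start_date and end_date are required"}, status=400)
    # The SQL text is constant; request values are applied as Spark Column filters
    # instead of being spliced into the query string, which avoids SQL injection.
    query = "SELECT patient_id, tumor_type, tumor_grade, tumor_location, tumor_size, diagnosis_date, age, gender FROM brain_tumor_patients"
    # Credentials are hard-coded as in the original listing; a real deployment should load them from configuration.
    df = spark.read.format("jdbc").option("url", "jdbc:mysql://localhost:3306/brain_tumor_db").option("dbtable", f"({query}) as tmp").option("user", "root").option("password", "password").option("driver", "com.mysql.cj.jdbc.Driver").load()
    df = df.filter(col("diagnosis_date").between(start_date, end_date))
    if tumor_type:
        df = df.filter(col("tumor_type") == tumor_type)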
    tumor_type_dist = df.groupBy("tumor_type").agg(count("*").alias("count"), round(avg("tumor_size"), 2).alias("avg_size")).orderBy(desc("count"))
    tumor_grade_dist = df.groupBy("tumor_type", "tumor_grade").agg(count("*").alias("count")).orderBy("tumor_type", "tumor_grade")
    location_dist = df.groupBy("tumor_location").agg(count("*").alias("count"), round(avg("age"), 1).alias("avg_age")).orderBy(desc("count"))
    size_range_df = df.withColumn("size_range", when(col("tumor_size") < 2, "<2cm").when((col("tumor_size") >= 2) & (col("tumor_size") < 5), "2-5cm").when((col("tumor_size") >= 5) & (col("tumor_size") < 8), "5-8cm").otherwise(">=8cm"))
    size_dist = size_range_df.groupBy("size_range", "tumor_type").agg(count("*").alias("count")).orderBy("size_range", "tumor_type")
    gender_dist = df.groupBy("tumor_type", "gender").agg(count("*").alias("count")).orderBy("tumor_type", "gender")
    age_group_df = df.withColumn("age_group", when(col("age") < 18, "0-18").when((col("age") >= 18) & (col("age") < 40), "18-40").when((col("age") >= 40) & (col("age") < 60), "40-60").otherwise("60+"))
    age_tumor_dist = age_group_df.groupBy("age_group", "tumor_type").agg(count("*").alias("count")).orderBy("age_group", "tumor_type")
    type_data = [{"tumor_type": row["tumor_type"], "count": row["count"], "avg_size": float(row["avg_size"])} for row in tumor_type_dist.collect()]
    grade_data = [{"tumor_type": row["tumor_type"], "grade": row["tumor_grade"], "count": row["count"]} for row in tumor_grade_dist.collect()]
    location_data = [{"location": row["tumor_location"], "count": row["count"], "avg_age": float(row["avg_age"])} for row in location_dist.collect()]
    size_data = [{"size_range": row["size_range"], "tumor_type": row["tumor_type"], "count": row["count"]} for row in size_dist.collect()]
    gender_data = [{"tumor_type": row["tumor_type"], "gender": row["gender"], "count": row["count"]} for row in gender_dist.collect()]
    age_data = [{"age_group": row["age_group"], "tumor_type": row["tumor_type"], "count": row["count"]} for row in age_tumor_dist.collect()]
    total_patients = df.count()
    unique_types = df.select("tumor_type").distinct().count()
    avg_tumor_size = df.select(round(avg("tumor_size"), 2)).collect()[0][0]
    # avg() is None when no patient matches the filters; report null rather than fail in float(None).
    avg_tumor_size = float(avg_tumor_size) if avg_tumor_size is not None else None
    return JsonResponse({"status": "success", "total_patients": total_patients, "unique_types": unique_types, "avg_tumor_size": avg_tumor_size, "type_distribution": type_data, "grade_distribution": grade_data, "location_distribution": location_data, "size_distribution": size_data, "gender_distribution": gender_data, "age_distribution": age_data})
@require_http_methods(["POST"])
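def tumor_risk_factor_analysis(request):
    """Return counts of active patients grouped by family history, lifestyle, exposure, comorbidity, BMI, and age/gender."""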
    data = json.loads(request.body)
    analysis_type = data.get('analysis_type', 'comprehensive')
    query = "SELECT patient_id, tumor_type, tumor_grade, age, gender, family_history, smoking_history, alcohol_consumption, radiation_exposure, chemical_exposure, hypertension, diabetes, bmi, diagnosis_date FROM brain_tumor_patients WHERE status = 'active'"
    df = spark.read.format("jdbc").option("url", "jdbc:mysql://localhost:3306/brain_tumor_db").option("dbtable", f"({query}) as tmp").option("user", "root").option("password", "password").option("driver", "com.mysql.cj.jdbc.Driver").load()
    family_risk = df.groupBy("tumor_type", "family_history").agg(count("*").alias("count")).orderBy("tumor_type", "family_history")
    lifestyle_df = df.withColumn("lifestyle_risk", when((col("smoking_history") == "yes") | (col("alcohol_consumption") == "heavy"), "high").when((col("smoking_history") == "former") | (col("alcohol_consumption") == "moderate"), "medium").otherwise("low"))
    lifestyle_risk = lifestyle_df.groupBy("tumor_type", "lifestyle_risk").agg(count("*").alias("count"), round(avg("tumor_grade"), 2).alias("avg_grade")).orderBy("tumor_type", "lifestyle_risk")
    exposure_df = df.withColumn("environmental_exposure", when((col("radiation_exposure") == "yes") | (col("chemical_exposure") == "yes"), "exposed").otherwise("not_exposed"))
    exposure_risk = exposure_df.groupBy("tumor_type", "environmental_exposure").agg(count("*").alias("count")).orderBy("tumor_type", "environmental_exposure")
    comorbidity_df = df.withColumn("has_comorbidity", when((col("hypertension") == "yes") | (col("diabetes") == "yes"), "yes").otherwise("no"))
    comorbidity_risk = comorbidity_df.groupBy("tumor_type", "has_comorbidity").agg(count("*").alias("count"), round(avg("age"), 1).alias("avg_age")).orderBy("tumor_type", "has_comorbidity")
    bmi_df = df.withColumn("bmi_category", when(col("bmi") < 18.5, "underweight").when((col("bmi") >= 18.5) & (col("bmi") < 24), "normal").when((col("bmi") >= 24) & (col("bmi") < 28), "overweight").otherwise("obese"))
    bmi_risk = bmi_df.groupBy("tumor_type", "bmi_category").agg(count("*").alias("count")).orderBy("tumor_type", "bmi_category")
    age_risk_df = df.withColumn("age_category", when(col("age") < 30, "young").when((col("age") >= 30) & (col("age") < 50), "middle").otherwise("elderly"))
    age_risk = age_risk_df.groupBy("tumor_type", "age_category", "gender").agg(count("*").alias("count")).orderBy("tumor_type", "age_category", "gender")
    high_risk_patients = df.filter((col("family_history") == "yes") & ((col("smoking_history") == "yes") | (col("alcohol_consumption") == "heavy")) & ((col("radiation_exposure") == "yes") | (col("chemical_exposure") == "yes")))
    high_risk_count = high_risk_patients.count()
    high_risk_types = high_risk_patients.groupBy("tumor_type").agg(count("*").alias("count")).orderBy(desc("count"))
    family_data = [{"tumor_type": row["tumor_type"], "family_history": row["family_history"], "count": row["count"]} for row in family_risk.collect()]
    lifestyle_data = [{"tumor_type": row["tumor_type"], "risk_level": row["lifestyle_risk"], "count": row["count"], "avg_grade": float(row["avg_grade"])} for row in lifestyle_risk.collect()]
    exposure_data = [{"tumor_type": row["tumor_type"], "exposure": row["environmental_exposure"], "count": row["count"]} for row in exposure_risk.collect()]
    comorbidity_data = [{"tumor_type": row["tumor_type"], "has_comorbidity": row["has_comorbidity"], "count": row["count"], "avg_age": float(row["avg_age"])} for row in comorbidity_risk.collect()]
    bmi_data = [{"tumor_type": row["tumor_type"], "bmi_category": row["bmi_category"], "count": row["count"]} for row in bmi_risk.collect()]
    age_data = [{"tumor_type": row["tumor_type"], "age_category": row["age_category"], "gender": row["gender"], "count": row["count"]} for row in age_risk.collect()]
    high_risk_type_data = [{"tumor_type": row["tumor_type"], "count": row["count"]} for row in high_risk_types.collect()]
    return JsonResponse({"status": "success", "family_history_analysis": family_data, "lifestyle_risk_analysis": lifestyle_data, "environmental_exposure_analysis": exposure_data, "comorbidity_analysis": comorbidity_data, "bmi_analysis": bmi_data, "age_gender_analysis": age_data, "high_risk_patients_count": high_risk_count, "high_risk_tumor_types": high_risk_type_data})
@require_http_methods(["POST"])
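def treatment_prognosis_analysis(request):
    """Return outcome, survival, recurrence, quality-of-life, and complication statistics per treatment type."""
    data = json.loads(request.body)
    start_date = data.get('start_date')
    end_date = data.get('end_date')
    treatment_type = data.get('treatment_type', None)
    # Without both dates, the literal string 'None' would end up in the date filter.
    if not start_date or not end_date:
        return JsonResponse({"status": "error", "message": "start_date and end_date are required"}, status=400)
    # The join query is constant; request values are applied as Spark Column filters
    # instead of being spliced into the query string, which avoids SQL injection.
    query = "SELECT p.patient_id, p.tumor_type, p.tumor_grade, p.age, p.gender, t.treatment_type, t.treatment_start_date, t.treatment_end_date, t.treatment_outcome, f.survival_months, f.recurrence, f.quality_of_life_score, f.complication FROM brain_tumor_patients p JOIN treatments t ON p.patient_id = t.patient_id LEFT JOIN followup_records f ON p.patient_id = f.patient_id"
    df = spark.read.format("jdbc").option("url", "jdbc:mysql://localhost:3306/brain_tumor_db").option("dbtable", f"({query}) as tmp").option("user", "root").option("password", "password").option("driver", "com.mysql.cj.jdbc.Driver").load()
    df = df.filter(col("treatment_start_date").between(start_date, end_date))
    if treatment_type:
        df = df.filter(col("treatment_type") == treatment_type)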
    treatment_outcome = df.groupBy("treatment_type", "treatment_outcome").agg(count("*").alias("count")).orderBy("treatment_type", "treatment_outcome")
    survival_analysis = df.groupBy("treatment_type", "tumor_type").agg(round(avg("survival_months"), 1).alias("avg_survival"), count("*").alias("patient_count")).orderBy("treatment_type", "tumor_type")
    recurrence_rate = df.groupBy("treatment_type", "recurrence").agg(count("*").alias("count")).orderBy("treatment_type", "recurrence")
    total_by_treatment = df.groupBy("treatment_type").agg(count("*").alias("total"))
    recurrence_with_rate = recurrence_rate.join(total_by_treatment, "treatment_type").withColumn("recurrence_rate", round((col("count") / col("total")) * 100, 2)).select("treatment_type", "recurrence", "count", "recurrence_rate")
    qol_analysis = df.filter(col("quality_of_life_score").isNotNull()).groupBy("treatment_type", "tumor_grade").agg(round(avg("quality_of_life_score"), 2).alias("avg_qol_score"), count("*").alias("count")).orderBy("treatment_type", "tumor_grade")
    complication_analysis = df.filter(col("complication").isNotNull()).groupBy("treatment_type", "complication").agg(count("*").alias("count")).orderBy("treatment_type", desc("count"))
    age_group_df = df.withColumn("age_group", when(col("age") < 40, "<40").when((col("age") >= 40) & (col("age") < 60), "40-60").otherwise("60+"))
    age_prognosis = age_group_df.groupBy("treatment_type", "age_group").agg(round(avg("survival_months"), 1).alias("avg_survival"), count("*").alias("count")).orderBy("treatment_type", "age_group")
    gender_prognosis = df.groupBy("treatment_type", "gender").agg(round(avg("survival_months"), 1).alias("avg_survival"), round(avg("quality_of_life_score"), 2).alias("avg_qol"), count("*").alias("count")).orderBy("treatment_type", "gender")
    grade_outcome = df.groupBy("tumor_grade", "treatment_type", "treatment_outcome").agg(count("*").alias("count")).orderBy("tumor_grade", "treatment_type", "treatment_outcome")
    best_treatment = survival_analysis.orderBy(desc("avg_survival")).limit(5)
    outcome_data = [{"treatment_type": row["treatment_type"], "outcome": row["treatment_outcome"], "count": row["count"]} for row in treatment_outcome.collect()]
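    # survival_months comes from a LEFT JOIN, so a group's average can be None; emit null instead of failing in float(None).
    survival_data = [{"treatment_type": row["treatment_type"], "tumor_type": row["tumor_type"], "avg_survival_months": float(row["avg_survival"]) if row["avg_survival"] is not None else None, "patient_count": row["patient_count"]} for row in survival_analysis.collect()]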
    recurrence_data = [{"treatment_type": row["treatment_type"], "recurrence": row["recurrence"], "count": row["count"], "rate": float(row["recurrence_rate"])} for row in recurrence_with_rate.collect()]
    qol_data = [{"treatment_type": row["treatment_type"], "tumor_grade": row["tumor_grade"], "avg_qol_score": float(row["avg_qol_score"]), "count": row["count"]} for row in qol_analysis.collect()]
    complication_data = [{"treatment_type": row["treatment_type"], "complication": row["complication"], "count": row["count"]} for row in complication_analysis.collect()]
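    # Averages over follow-up columns can be None when a group has no follow-up records; emit null instead of failing in float(None).
    age_data = [{"treatment_type": row["treatment_type"], "age_group": row["age_group"], "avg_survival_months": float(row["avg_survival"]) if row["avg_survival"] is not None else None, "count": row["count"]} for row in age_prognosis.collect()]
    gender_data = [{"treatment_type": row["treatment_type"], "gender": row["gender"], "avg_survival_months": float(row["avg_survival"]) if row["avg_survival"] is not None else None, "avg_qol": float(row["avg_qol"]) if row["avg_qol"] is not None else None, "count": row["count"]} for row in gender_prognosis.collect()]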
    grade_data = [{"tumor_grade": row["tumor_grade"], "treatment_type": row["treatment_type"], "outcome": row["treatment_outcome"], "count": row["count"]} for row in grade_outcome.collect()]
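    # avg_survival can be None for groups without follow-up records; emit null instead of failing in float(None).
    best_treatment_data = [{"treatment_type": row["treatment_type"], "tumor_type": row["tumor_type"], "avg_survival_months": float(row["avg_survival"]) if row["avg_survival"] is not None else None} for row in best_treatment.collect()]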
    return JsonResponse({"status": "success", "treatment_outcome_distribution": outcome_data, "survival_analysis": survival_data, "recurrence_analysis": recurrence_data, "quality_of_life_analysis": qol_data, "complication_analysis": complication_data, "age_group_prognosis": age_data, "gender_prognosis": gender_data, "grade_outcome_analysis": grade_data, "best_treatment_options": best_treatment_data})
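
The view functions above read `start_date` and `end_date` straight from the request body. A small validation step could reject malformed values before they reach any query. The sketch below uses only the standard library; the helper name `parse_date_range` is hypothetical and is not part of the project's source.

```python
from datetime import date


def parse_date_range(start_text, end_text):
    """Validate two ISO dates (YYYY-MM-DD) and return them as normalized strings.

    Raises ValueError for missing, malformed, or reversed dates.
    """
    if not isinstance(start_text, str) or not isinstance(end_text, str):
        raise ValueError("start_date and end_date must be strings")
    start = date.fromisoformat(start_text)  # raises ValueError on bad input
    end = date.fromisoformat(end_text)
    if start > end:
        raise ValueError("start_date must not be after end_date")
    return start.isoformat(), end.isoformat()
```

A view could call this helper right after `json.loads(request.body)` and return an HTTP 400 response when it raises `ValueError`. Because the returned strings are re-serialized from parsed `date` objects, they can only contain digits and hyphens.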

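In `treatment_prognosis_analysis`, the recurrence rate is each (treatment_type, recurrence) count divided by the total for that treatment type, times 100, rounded to two decimals. The plain-Python sketch below reproduces that arithmetic on a few hypothetical records so the numbers are easy to check by hand. The function name, the sample data, and the "yes"/"no" recurrence values are assumptions; the post does not show the actual values stored in the `recurrence` column.

```python
from collections import Counter


def recurrence_rates(records):
    """Compute the share of each recurrence value within its treatment type."""
    pair_counts = Counter((r["treatment_type"], r["recurrence"]) for r in records)
    totals = Counter(r["treatment_type"] for r in records)
    return [
        {
            "treatment_type": treatment,
            "recurrence": recurrence,
            "count": n,
            # Percentage of this treatment's records with this recurrence value.
            "rate": round(n / totals[treatment] * 100, 2),
        }
        for (treatment, recurrence), n in sorted(pair_counts.items())
    ]


# Hypothetical follow-up records for illustration only.
sample_records = [
    {"treatment_type": "surgery", "recurrence": "yes"},
    {"treatment_type": "surgery", "recurrence": "no"},
    {"treatment_type": "surgery", "recurrence": "no"},
    {"treatment_type": "radiotherapy", "recurrence": "yes"},
]
rates = recurrence_rates(sample_records)
```

With these four records, surgery has three rows in total, so its "no" share is 2/3 and its "yes" share is 1/3, while the single radiotherapy row accounts for 100% of its group.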
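6. System Documentation

(A screenshot of the project documentation appears here in the original post; the image is not reproduced.)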

End

💕💕 To get the source code, contact 计算机程序员小杨