Big Data Graduation Project Topic: An Eye Cancer Analysis System Based on Python+Django, Explained | Graduation Project | CS Capstone | Program Development | Hands-On Project


Preface

💖💖Author: 计算机程序员小杨 💙💙About me: I work in a computer-related field and am proficient in Java, WeChat Mini Programs, Python, Golang, Android, and several other IT directions. I take on custom project development, code walkthroughs, thesis-defense coaching, and documentation writing, and I also know some techniques for lowering plagiarism-check scores. I love technology, enjoy digging into new tools and frameworks, and like solving real problems with code. Feel free to ask me anything about code! 💛💛A word of thanks: thank you all for your attention and support! 💕💕Contact 计算机程序员小杨 at the end of this article to get the source code 💜💜 Web projects | Android/Mini Program projects | Big data projects | Deep learning projects | Graduation project topics 💜💜

1. Development Tools Overview

- Big data framework: Hadoop + Spark (Hive is not used here; customization supported)
- Development language: Python + Java (both versions supported)
- Back end: Django + Spring Boot (Spring + SpringMVC + MyBatis) (both versions supported)
- Front end: Vue + ElementUI + Echarts + HTML + CSS + JavaScript + jQuery
- Key technologies: Hadoop, HDFS, Spark, Spark SQL, Pandas, NumPy
- Database: MySQL
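Because Spark reads MySQL over JDBC in this stack, the MySQL connector jar must be on Spark's classpath. A minimal session-setup sketch follows; the connector coordinates and version are an illustrative assumption, not taken from the source:

```python
from pyspark.sql import SparkSession

# Minimal SparkSession bootstrap; "mysql:mysql-connector-java:8.0.33" is an
# assumed connector coordinate -- pin whatever version matches your MySQL server.
spark = (
    SparkSession.builder
    .appName("EyeCancerAnalysis")
    .config("spark.jars.packages", "mysql:mysql-connector-java:8.0.33")
    .config("spark.sql.adaptive.enabled", "true")
    .getOrCreate()
)
```

With `spark.jars.packages` set, Spark downloads the driver at startup, so `spark.read.format("jdbc")` calls like the ones in the source listing below can resolve the `jdbc:mysql://...` URL.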

2. System Overview

The Big-Data-Based Eye Cancer Data Analysis and Visualization System is a medical data analytics platform built on modern big data technology. It pairs the Hadoop distributed storage framework with the Spark in-memory compute engine to process and mine large volumes of eye cancer clinical data efficiently. The front end is built with the Vue framework and the ElementUI component library, and uses the Echarts charting library for multidimensional data visualization. The back end is developed with Python's Django framework and integrates the Pandas and NumPy scientific computing libraries for statistical analysis of the medical data. The core modules cover eye cancer data management, patient profile analysis, clinical feature analysis, treatment effectiveness evaluation, and survival/prognosis prediction. Large-scale clinical data is stored in the HDFS distributed file system and queried with Spark SQL; the results, covering epidemiological characteristics of the disease, treatment-effect comparisons, and prognostic risk assessments, are presented on a visualization dashboard to support medical decision-making and to serve as a practical case study of big data technology in healthcare.
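To make the kind of statistic the system reports concrete, here is a minimal Pandas sketch of a per-stage survival-rate aggregation on toy data; the column names only mirror the real schema illustratively:

```python
import pandas as pd

# Toy clinical records; tumor_stage / survival_status mirror the system's
# schema only for illustration.
df = pd.DataFrame({
    "tumor_stage":     ["I", "I", "II", "II", "III"],
    "survival_status": ["alive", "alive", "alive", "dead", "dead"],
})

# Percentage of surviving patients per stage -- the same shape of result the
# backend computes with a Spark SQL group-by over the full dataset.
rates = (
    df.assign(alive=df["survival_status"].eq("alive"))
      .groupby("tumor_stage")["alive"]
      .mean()
      .mul(100.0)
)
print(rates.to_dict())  # {'I': 100.0, 'II': 50.0, 'III': 0.0}
```

On real data the only difference is scale: Spark performs the same group-by/mean in a distributed fashion, and the resulting rates feed the Echarts dashboards.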

3. System Function Demonstration

(Demo video: Eye Cancer Analysis System Based on Python+Django)

4. System Interface Screenshots

[System interface screenshots]

5. System Source Code


from pyspark.sql import SparkSession
from pyspark.sql.functions import col, count, avg, max, min, when
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from django.http import JsonResponse

spark = SparkSession.builder.appName("EyeCancerAnalysis").config("spark.sql.adaptive.enabled", "true").getOrCreate()

def patient_profile_analysis(request):
    # Load patient records from MySQL through Spark's JDBC reader.
    df = (spark.read.format("jdbc")
          .option("url", "jdbc:mysql://localhost:3306/eyecancer")
          .option("dbtable", "patient_data")
          .option("user", "root")
          .option("password", "password")
          .load())
    total_patients = df.count()
    age_distribution = df.groupBy("age_group").count().orderBy("age_group").collect()
    gender_distribution = df.groupBy("gender").count().collect()
    region_distribution = df.groupBy("region").count().orderBy(col("count").desc()).limit(10).collect()
    avg_age = df.agg(avg("age")).collect()[0][0]
    # Parenthesize the combined condition so .alias() names the whole
    # expression, not just the right-hand comparison.
    age_ranges = df.select(
        (col("age") < 18).alias("child"),
        ((col("age") >= 18) & (col("age") < 60)).alias("adult"),
        (col("age") >= 60).alias("elderly"),
    )
    age_stats = age_ranges.select(
        count(when(col("child"), 1)).alias("child_count"),
        count(when(col("adult"), 1)).alias("adult_count"),
        count(when(col("elderly"), 1)).alias("elderly_count"),
    ).collect()[0]
    occupation_stats = df.groupBy("occupation").count().orderBy(col("count").desc()).collect()
    education_stats = df.groupBy("education_level").count().collect()
    family_history = df.groupBy("family_history").count().collect()
    smoking_stats = df.groupBy("smoking_status").count().collect()
    result_data = {
        "total_patients": total_patients,
        "avg_age": round(avg_age, 2),
        "age_distribution": [{"age_group": row["age_group"], "count": row["count"]} for row in age_distribution],
        "gender_distribution": [{"gender": row["gender"], "count": row["count"]} for row in gender_distribution],
        "region_distribution": [{"region": row["region"], "count": row["count"]} for row in region_distribution],
        "age_stats": {"child": age_stats["child_count"], "adult": age_stats["adult_count"], "elderly": age_stats["elderly_count"]},
        "occupation_stats": [{"occupation": row["occupation"], "count": row["count"]} for row in occupation_stats[:10]],
        "education_stats": [{"education": row["education_level"], "count": row["count"]} for row in education_stats],
        "family_history": [{"status": row["family_history"], "count": row["count"]} for row in family_history],
        "smoking_stats": [{"status": row["smoking_status"], "count": row["count"]} for row in smoking_stats],
    }
    return JsonResponse(result_data, safe=False)

def clinical_feature_analysis(request):
    # Load clinical records; the JDBC URL requires the "mysql" subprotocol.
    clinical_df = (spark.read.format("jdbc")
                   .option("url", "jdbc:mysql://localhost:3306/eyecancer")
                   .option("dbtable", "clinical_data")
                   .option("user", "root")
                   .option("password", "password")
                   .load())
    tumor_type_stats = clinical_df.groupBy("tumor_type").count().orderBy(col("count").desc()).collect()
    tumor_stage_stats = clinical_df.groupBy("tumor_stage").count().orderBy("tumor_stage").collect()
    tumor_size_stats = clinical_df.agg(avg("tumor_size"), max("tumor_size"), min("tumor_size")).collect()[0]
    location_stats = clinical_df.groupBy("tumor_location").count().collect()
    vision_impact = clinical_df.groupBy("vision_loss_level").count().collect()
    symptom_analysis = clinical_df.select("primary_symptoms").rdd.flatMap(lambda x: x[0].split(",") if x[0] else []).map(lambda x: (x.strip(), 1)).reduceByKey(lambda a, b: a + b).collect()
    metastasis_stats = clinical_df.groupBy("metastasis_status").count().collect()
    lymph_node_involvement = clinical_df.groupBy("lymph_node_status").count().collect()
    histological_grade = clinical_df.groupBy("histological_grade").count().collect()
    biomarker_positive = clinical_df.filter(col("biomarker_status") == "positive").count()
    biomarker_total = clinical_df.count()
    biomarker_rate = (biomarker_positive / biomarker_total * 100) if biomarker_total > 0 else 0
    stage_size_correlation = clinical_df.groupBy("tumor_stage").agg(avg("tumor_size").alias("avg_size")).orderBy("tumor_stage").collect()
    # Join against the patient table (read via JDBC) to relate age to stage.
    patient_jdbc = (spark.read.format("jdbc")
                    .option("url", "jdbc:mysql://localhost:3306/eyecancer")
                    .option("dbtable", "patient_data")
                    .option("user", "root")
                    .option("password", "password")
                    .load())
    age_stage_correlation = clinical_df.join(patient_jdbc, "patient_id").groupBy("tumor_stage").agg(avg("age").alias("avg_age")).collect()
    response_data = {
        "tumor_type_distribution": [{"type": row["tumor_type"], "count": row["count"]} for row in tumor_type_stats],
        "tumor_stage_distribution": [{"stage": row["tumor_stage"], "count": row["count"]} for row in tumor_stage_stats],
        "tumor_size_stats": {"average": round(tumor_size_stats[0], 2), "maximum": tumor_size_stats[1], "minimum": tumor_size_stats[2]},
        "tumor_location": [{"location": row["tumor_location"], "count": row["count"]} for row in location_stats],
        "vision_impact": [{"level": row["vision_loss_level"], "count": row["count"]} for row in vision_impact],
        "common_symptoms": [{"symptom": symptom[0], "frequency": symptom[1]} for symptom in sorted(symptom_analysis, key=lambda x: x[1], reverse=True)[:10]],
        "metastasis_status": [{"status": row["metastasis_status"], "count": row["count"]} for row in metastasis_stats],
        "lymph_node_status": [{"status": row["lymph_node_status"], "count": row["count"]} for row in lymph_node_involvement],
        "histological_distribution": [{"grade": row["histological_grade"], "count": row["count"]} for row in histological_grade],
        "biomarker_positive_rate": round(biomarker_rate, 2),
        "stage_size_correlation": [{"stage": row["tumor_stage"], "avg_size": round(row["avg_size"], 2)} for row in stage_size_correlation],
        "age_stage_correlation": [{"stage": row["tumor_stage"], "avg_age": round(row["avg_age"], 2)} for row in age_stage_correlation],
    }
    return JsonResponse(response_data, safe=False)

def survival_prognosis_analysis(request):
    # Load and join the three source tables (JDBC URLs need the mysql subprotocol).
    survival_df = spark.read.format("jdbc").option("url", "jdbc:mysql://localhost:3306/eyecancer").option("dbtable", "survival_data").option("user", "root").option("password", "password").load()
    patient_df = spark.read.format("jdbc").option("url", "jdbc:mysql://localhost:3306/eyecancer").option("dbtable", "patient_data").option("user", "root").option("password", "password").load()
    clinical_df = spark.read.format("jdbc").option("url", "jdbc:mysql://localhost:3306/eyecancer").option("dbtable", "clinical_data").option("user", "root").option("password", "password").load()
    combined_df = survival_df.join(patient_df, "patient_id").join(clinical_df, "patient_id")
    overall_survival_rate = combined_df.filter(col("survival_status") == "alive").count() / combined_df.count() * 100
    survival_by_stage = combined_df.groupBy("tumor_stage").agg((count(when(col("survival_status") == "alive", 1)) / count("*") * 100).alias("survival_rate")).collect()
    survival_by_treatment = combined_df.groupBy("treatment_type").agg((count(when(col("survival_status") == "alive", 1)) / count("*") * 100).alias("survival_rate")).collect()
    age_survival_correlation = combined_df.groupBy((col("age") / 10).cast("int").alias("age_decade")).agg((count(when(col("survival_status") == "alive", 1)) / count("*") * 100).alias("survival_rate")).orderBy("age_decade").collect()
    survival_time_stats = combined_df.filter(col("survival_time").isNotNull()).agg(avg("survival_time"), max("survival_time"), min("survival_time")).collect()[0]
    risk_factors = combined_df.select("tumor_size", "tumor_stage", "age", "metastasis_status", "survival_status").na.drop()
    feature_cols = ["tumor_size", "age"]
    assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
    feature_df = assembler.transform(risk_factors)
    feature_df = feature_df.withColumn("label", when(col("survival_status") == "alive", 0).otherwise(1))
    train_df, test_df = feature_df.randomSplit([0.8, 0.2], seed=42)
    rf = RandomForestClassifier(featuresCol="features", labelCol="label", numTrees=50)
    model = rf.fit(train_df)
    predictions = model.transform(test_df)
    evaluator = BinaryClassificationEvaluator(labelCol="label", rawPredictionCol="rawPrediction")
    auc_score = evaluator.evaluate(predictions)
    feature_importance = model.featureImportances.toArray()
    recurrence_rate = combined_df.filter(col("recurrence_status") == "recurrent").count() / combined_df.count() * 100
    time_to_recurrence = combined_df.filter(col("recurrence_status") == "recurrent").agg(avg("time_to_recurrence")).collect()[0][0]
    prognosis_data = {
        "overall_survival_rate": round(overall_survival_rate, 2),
        "survival_by_stage": [{"stage": row["tumor_stage"], "survival_rate": round(row["survival_rate"], 2)} for row in survival_by_stage],
        "survival_by_treatment": [{"treatment": row["treatment_type"], "survival_rate": round(row["survival_rate"], 2)} for row in survival_by_treatment],
        "age_survival_correlation": [{"age_decade": f"{row['age_decade']*10}-{(row['age_decade']+1)*10}", "survival_rate": round(row["survival_rate"], 2)} for row in age_survival_correlation],
        "survival_time_stats": {"average_months": round(survival_time_stats[0], 2) if survival_time_stats[0] else 0, "max_months": survival_time_stats[1] if survival_time_stats[1] else 0, "min_months": survival_time_stats[2] if survival_time_stats[2] else 0},
        "risk_prediction_accuracy": round(auc_score, 3),
        "feature_importance": {"tumor_size": round(feature_importance[0], 3), "age": round(feature_importance[1], 3)},
        "recurrence_rate": round(recurrence_rate, 2),
        "avg_time_to_recurrence": round(time_to_recurrence, 2) if time_to_recurrence else 0,
    }
    return JsonResponse(prognosis_data, safe=False)
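All three views return JSON, so the Vue front end can call them once they are routed. A hypothetical `urls.py` wiring is sketched below; the module path `analysis.views` and the URL patterns are assumptions, not taken from the source:

```python
# urls.py -- hypothetical routing for the three analysis endpoints.
from django.urls import path

from analysis import views  # assumed module path for the views above

urlpatterns = [
    path("api/patient-profile/", views.patient_profile_analysis),
    path("api/clinical-features/", views.clinical_feature_analysis),
    path("api/survival-prognosis/", views.survival_prognosis_analysis),
]
```

The Echarts dashboard then fetches these endpoints and binds each JSON field (e.g. `age_distribution`, `survival_by_stage`) to a chart series.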

6. System Documentation

[Documentation screenshot]

Conclusion

💕💕Contact 计算机程序员小杨 to get the source code