1. About the Author
💖💖Author: 计算机编程果茶熊 💙💙About me: I taught computer science professionally for many years and worked as a programming instructor; I genuinely enjoy teaching, and I am proficient in Java, WeChat Mini Programs, Python, Golang, Android, and several other IT areas. I take on custom project development, code walkthroughs, thesis-defense coaching, and documentation writing, and I know some techniques for reducing text-similarity scores. I like sharing solutions to problems I run into during development and talking shop about technology; if you have questions about code, feel free to ask! 💛💛A few words: thank you all for your attention and support! 💜💜 Website projects | Android/Mini Program projects | Big data projects | CS graduation project topics 💕💕See the end of this post to contact 计算机编程果茶熊 for the source code
2. System Overview
Big data stack: Hadoop + Spark (Hive supported via custom modification). Languages: Java + Python (both versions available). Database: MySQL. Backend: Spring Boot (Spring + Spring MVC + MyBatis) and Django (both versions available). Frontend: Vue + ECharts + HTML + CSS + JavaScript + jQuery.
Built on the Hadoop + Spark big data stack, this system provides a comprehensive visualization and analysis platform for skin disease symptom data. It integrates medical data sources to mine and analyze patients' symptoms, diagnoses, and treatment histories. Django serves as the backend framework, paired with a Vue + ElementUI + ECharts frontend for an interactive interface; HDFS provides distributed storage for large volumes of medical data, Spark SQL handles efficient querying and computation, and Pandas and NumPy carry out statistical analysis. Functionally, the system offers user and permission management; CRUD operations on skin disease records; basic distribution analysis (age, gender, region); disease feature analysis (symptom types, severity, complications); treatment outcome analysis (medication plans, efficacy evaluation, relapse statistics); comprehensive correlation analysis (symptom-to-disease and treatment-to-outcome associations); and a visualization dashboard. Together these modules help clinicians examine skin disease patterns and treatment trends from multiple angles and support data-driven clinical decisions.
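As a rough illustration of how the comprehensive correlation module's association metrics work, here is a minimal pure-Python sketch. The symptom/disease records below are made up for demonstration and are not the project's real data; the real system computes the same support, confidence, and lift quantities with Spark SQL over MySQL-sourced records.

```python
from collections import Counter

# Hypothetical (symptom, disease) observations; the real system reads these from MySQL.
records = [
    ("itching", "eczema"), ("itching", "eczema"), ("itching", "psoriasis"),
    ("scaling", "psoriasis"), ("scaling", "psoriasis"), ("redness", "eczema"),
]
total = len(records)
pair_counts = Counter(records)
symptom_counts = Counter(s for s, _ in records)
disease_counts = Counter(d for _, d in records)

def association_metrics(symptom, disease):
    co = pair_counts[(symptom, disease)]
    support = co / total                       # P(symptom, disease)
    confidence = co / symptom_counts[symptom]  # P(disease | symptom)
    # lift > 1 means the pair co-occurs more often than independence would predict
    lift = support / (symptom_counts[symptom] / total * disease_counts[disease] / total)
    return support, confidence, lift

support, confidence, lift = association_metrics("itching", "eczema")
```

The `min_support` threshold exposed by the analysis endpoint simply filters out pairs whose support falls below the requested value before confidence and lift are reported.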
3. Video Walkthrough
4. Selected Features
5. Selected Code
from pyspark.sql import SparkSession
from pyspark.sql.functions import (col, count, avg, sum as spark_sum, when, datediff,
                                   round as spark_round, dense_rank, desc)
from pyspark.sql.window import Window
from django.http import JsonResponse
from django.views.decorators.http import require_http_methods
from django.db import connection
import json
import pandas as pd

# Shared Spark session; a small shuffle partition count suits the modest data volumes here.
spark = (SparkSession.builder
         .appName("SkinDiseaseAnalysis")
         .config("spark.sql.shuffle.partitions", "4")
         .config("spark.driver.memory", "2g")
         .getOrCreate())

@require_http_methods(["GET"])
def disease_feature_analysis(request):
    """Disease feature analysis: symptom distribution, severity by gender, complications, age correlation."""
    disease_type = request.GET.get('disease_type', 'all')
    severity_level = request.GET.get('severity', 'all')
    age_range = request.GET.get('age_range', 'all')
    # Build a parameterised query so user input never reaches the SQL string directly.
    query = ("SELECT patient_id, disease_name, symptom_type, severity, age, gender, "
             "onset_date, complication FROM skin_disease_data WHERE 1=1")
    params = []
    if disease_type != 'all':
        query += " AND disease_name=%s"
        params.append(disease_type)
    if severity_level != 'all':
        query += " AND severity=%s"
        params.append(severity_level)
    df_pd = pd.read_sql(query, connection, params=params or None)
    if df_pd.empty:
        return JsonResponse({'status': 'error', 'message': 'No matching data'}, status=404)
    df_spark = spark.createDataFrame(df_pd)
    if age_range != 'all':
        age_ranges = {'young': (0, 18), 'middle': (19, 45), 'elder': (46, 100)}
        min_age, max_age = age_ranges.get(age_range, (0, 100))
        df_spark = df_spark.filter((col('age') >= min_age) & (col('age') <= max_age))
    # Symptom distribution: patient count and average severity per symptom type.
    symptom_distribution = (df_spark.groupBy('symptom_type')
                            .agg(count('patient_id').alias('patient_count'),
                                 spark_round(avg('severity'), 2).alias('avg_severity'))
                            .orderBy(desc('patient_count')))
    symptom_result = symptom_distribution.collect()
    symptom_data = {
        'labels': [row['symptom_type'] for row in symptom_result],
        'counts': [row['patient_count'] for row in symptom_result],
        'avg_severity': [float(row['avg_severity']) for row in symptom_result],
    }
    # Severity distribution pivoted by gender ('男'/'女' match the Chinese labels stored in the data).
    severity_distribution = (df_spark.groupBy('severity', 'gender')
                             .agg(count('patient_id').alias('count'))
                             .orderBy('severity'))
    severity_pivot = (severity_distribution.groupBy('severity')
                      .pivot('gender').agg(spark_sum('count')).na.fill(0))
    severity_result = severity_pivot.collect()
    severity_data = {
        'levels': [row['severity'] for row in severity_result],
        'male': [int(row['男']) if '男' in row.asDict() else 0 for row in severity_result],
        'female': [int(row['女']) if '女' in row.asDict() else 0 for row in severity_result],
    }
    # Top 10 complication/disease pairs ('无' marks "none" in the source data).
    complication_data = df_spark.filter(col('complication').isNotNull() & (col('complication') != '无'))
    complication_stats = (complication_data.groupBy('complication', 'disease_name')
                          .agg(count('patient_id').alias('count'))
                          .orderBy(desc('count')).limit(10))
    complication_info = [{'complication': row['complication'],
                          'disease': row['disease_name'],
                          'count': row['count']} for row in complication_stats.collect()]
    # Average patient age per symptom type, keeping only symptoms with more than 5 cases.
    age_symptom_corr = (df_spark.groupBy('symptom_type')
                        .agg(spark_round(avg('age'), 1).alias('avg_age'),
                             count('patient_id').alias('total'))
                        .filter(col('total') > 5).orderBy('avg_age'))
    age_corr_result = age_symptom_corr.collect()
    age_correlation = {
        'symptoms': [row['symptom_type'] for row in age_corr_result],
        'avg_ages': [float(row['avg_age']) for row in age_corr_result],
    }
    return JsonResponse({'status': 'success', 'data': {
        'symptom_distribution': symptom_data,
        'severity_distribution': severity_data,
        'complication_analysis': complication_info,
        'age_correlation': age_correlation,
    }})

@require_http_methods(["POST"])
def treatment_effect_analysis(request):
    """Treatment outcome analysis: plan effectiveness, medication ranking, severity matrix, age groups."""
    body = json.loads(request.body)
    treatment_plan = body.get('treatment_plan', 'all')
    disease_name = body.get('disease_name', 'all')
    date_start = body.get('date_start')
    date_end = body.get('date_end')
    query = ("SELECT t.patient_id, t.treatment_plan, t.medication, t.treatment_start, "
             "t.treatment_end, t.effect_score, t.relapse_flag, d.disease_name, d.severity, d.age "
             "FROM treatment_records t JOIN skin_disease_data d ON t.patient_id=d.patient_id "
             "WHERE 1=1")
    params = []
    if treatment_plan != 'all':
        query += " AND t.treatment_plan=%s"
        params.append(treatment_plan)
    if disease_name != 'all':
        query += " AND d.disease_name=%s"
        params.append(disease_name)
    if date_start:
        query += " AND t.treatment_start>=%s"
        params.append(date_start)
    if date_end:
        query += " AND t.treatment_end<=%s"
        params.append(date_end)
    df_pd = pd.read_sql(query, connection, params=params or None)
    if df_pd.empty:
        return JsonResponse({'status': 'error', 'message': 'No treatment records found'}, status=404)
    df_spark = spark.createDataFrame(df_pd)
    df_spark = df_spark.withColumn('treatment_duration',
                                   datediff(col('treatment_end'), col('treatment_start')))
    # Per-plan effectiveness: patient total, mean score, mean duration and relapse rate.
    plan_effect = (df_spark.groupBy('treatment_plan')
                   .agg(count('patient_id').alias('total_patients'),
                        spark_round(avg('effect_score'), 2).alias('avg_effect'),
                        spark_round(avg('treatment_duration'), 1).alias('avg_duration'),
                        spark_sum(when(col('relapse_flag') == 1, 1).otherwise(0)).alias('relapse_count')))
    plan_effect = plan_effect.withColumn(
        'relapse_rate', spark_round((col('relapse_count') / col('total_patients')) * 100, 2))
    plan_result = plan_effect.orderBy(desc('avg_effect')).collect()
    plan_data = [{'plan': row['treatment_plan'], 'patients': row['total_patients'],
                  'avg_effect': float(row['avg_effect']), 'avg_duration': float(row['avg_duration']),
                  'relapse_rate': float(row['relapse_rate'])} for row in plan_result]
    # Top 5 medications per disease by average score, ranked with a window function.
    medication_effect = (df_spark.groupBy('medication', 'disease_name')
                         .agg(count('patient_id').alias('use_count'),
                              spark_round(avg('effect_score'), 2).alias('avg_score'))
                         .filter(col('use_count') > 3))
    window_spec = Window.partitionBy('disease_name').orderBy(desc('avg_score'))
    medication_ranked = (medication_effect
                         .withColumn('rank', dense_rank().over(window_spec))
                         .filter(col('rank') <= 5))
    medication_data = [{'disease': row['disease_name'], 'medication': row['medication'],
                        'count': row['use_count'], 'score': float(row['avg_score']),
                        'rank': row['rank']} for row in medication_ranked.collect()]
    # Severity x plan matrix of average effect scores.
    severity_effect = (df_spark.groupBy('severity', 'treatment_plan')
                       .agg(spark_round(avg('effect_score'), 2).alias('avg_effect'),
                            count('patient_id').alias('count'))
                       .filter(col('count') > 2))
    severity_pivot = (severity_effect.groupBy('severity')
                      .pivot('treatment_plan')
                      .agg(spark_round(avg('avg_effect'), 2)).na.fill(0))
    severity_result = severity_pivot.collect()
    severity_plans = [row['treatment_plan'] for row in
                      severity_effect.select('treatment_plan').distinct().collect()]
    severity_matrix = {
        'severity_levels': [row['severity'] for row in severity_result],
        'plans': severity_plans,
        'matrix': [[float(row[plan]) if plan in row.asDict() else 0 for plan in severity_plans]
                   for row in severity_result],
    }
    # Effect scores by age bracket (labels kept in Chinese to match the frontend).
    age_effect_data = df_spark.withColumn(
        'age_group',
        when(col('age') < 20, '<20岁')
        .when((col('age') >= 20) & (col('age') < 40), '20-40岁')
        .when((col('age') >= 40) & (col('age') < 60), '40-60岁')
        .otherwise('60岁以上'))
    age_effect_stats = (age_effect_data.groupBy('age_group')
                        .agg(spark_round(avg('effect_score'), 2).alias('avg_effect'),
                             count('patient_id').alias('total')))
    age_effect_result = age_effect_stats.orderBy('age_group').collect()
    age_effect_info = {
        'age_groups': [row['age_group'] for row in age_effect_result],
        'avg_effects': [float(row['avg_effect']) for row in age_effect_result],
        'totals': [row['total'] for row in age_effect_result],
    }
    return JsonResponse({'status': 'success', 'data': {
        'plan_effectiveness': plan_data,
        'medication_ranking': medication_data,
        'severity_effect_matrix': severity_matrix,
        'age_effect_analysis': age_effect_info,
    }})

@require_http_methods(["POST"])
def comprehensive_correlation_analysis(request):
    """Comprehensive correlation analysis: symptom-disease associations, effect deltas, multi-factor stats."""
    body = json.loads(request.body)
    analysis_type = body.get('type', 'symptom_disease')
    min_support = float(body.get('min_support', 0.05))
    top_n = int(body.get('top_n', 20))
    query_disease = ("SELECT d.patient_id, d.disease_name, d.symptom_type, d.severity, d.age, "
                     "d.gender, d.complication, t.treatment_plan, t.medication, t.effect_score "
                     "FROM skin_disease_data d LEFT JOIN treatment_records t "
                     "ON d.patient_id=t.patient_id")
    df_pd = pd.read_sql(query_disease, connection)
    if df_pd.empty:
        return JsonResponse({'status': 'error', 'message': 'No data available'}, status=404)
    df_spark = spark.createDataFrame(df_pd)
    total_records = df_spark.count()
    if analysis_type == 'symptom_disease':
        # Association-rule style metrics: support, confidence and lift per symptom/disease pair.
        symptom_disease = (df_spark.groupBy('symptom_type', 'disease_name')
                           .agg(count('patient_id').alias('co_occurrence')))
        symptom_disease = symptom_disease.withColumn('support', col('co_occurrence') / total_records)
        symptom_disease = symptom_disease.filter(col('support') >= min_support)
        symptom_total = df_spark.groupBy('symptom_type').agg(count('patient_id').alias('symptom_total'))
        disease_total = df_spark.groupBy('disease_name').agg(count('patient_id').alias('disease_total'))
        correlation = (symptom_disease.join(symptom_total, 'symptom_type')
                       .join(disease_total, 'disease_name'))
        correlation = correlation.withColumn(
            'confidence', spark_round(col('co_occurrence') / col('symptom_total'), 3))
        # lift = P(symptom, disease) / (P(symptom) * P(disease))
        correlation = correlation.withColumn(
            'lift', spark_round((col('co_occurrence') / total_records) /
                                (col('symptom_total') / total_records *
                                 col('disease_total') / total_records), 3))
        correlation_result = (correlation
                              .select('symptom_type', 'disease_name', 'co_occurrence',
                                      'support', 'confidence', 'lift')
                              .orderBy(desc('lift')).limit(top_n).collect())
        result_data = [{'symptom': row['symptom_type'], 'disease': row['disease_name'],
                        'count': row['co_occurrence'], 'support': float(row['support']),
                        'confidence': float(row['confidence']), 'lift': float(row['lift'])}
                       for row in correlation_result]
    elif analysis_type == 'treatment_effect':
        # Compare each plan/disease/severity cell against the overall mean effect score.
        treatment_effect_df = df_spark.filter(col('treatment_plan').isNotNull() &
                                              col('effect_score').isNotNull())
        treatment_disease = (treatment_effect_df.groupBy('treatment_plan', 'disease_name', 'severity')
                             .agg(spark_round(avg('effect_score'), 2).alias('avg_effect'),
                                  count('patient_id').alias('sample_count'))
                             .filter(col('sample_count') > 3))
        overall_avg = treatment_effect_df.agg(avg('effect_score')).collect()[0][0]
        treatment_disease = treatment_disease.withColumn(
            'effect_diff', spark_round(col('avg_effect') - overall_avg, 2))
        treatment_result = treatment_disease.orderBy(desc('effect_diff')).limit(top_n).collect()
        result_data = [{'treatment': row['treatment_plan'], 'disease': row['disease_name'],
                        'severity': row['severity'], 'avg_effect': float(row['avg_effect']),
                        'sample_count': row['sample_count'], 'effect_diff': float(row['effect_diff'])}
                       for row in treatment_result]
    elif analysis_type == 'multi_factor':
        # Cross disease, age bracket, gender and severity; bracket labels match the Chinese data.
        multi_factor_df = df_spark.filter(col('treatment_plan').isNotNull() &
                                          col('effect_score').isNotNull())
        multi_factor_df = multi_factor_df.withColumn(
            'age_group',
            when(col('age') < 30, '青年')
            .when((col('age') >= 30) & (col('age') < 50), '中年')
            .otherwise('老年'))
        factor_stats = (multi_factor_df.groupBy('disease_name', 'age_group', 'gender', 'severity')
                        .agg(spark_round(avg('effect_score'), 2).alias('avg_effect'),
                             count('patient_id').alias('count'))
                        .filter(col('count') > 2))
        factor_result = factor_stats.orderBy(desc('avg_effect')).limit(top_n).collect()
        main_analysis = [{'disease': row['disease_name'], 'age_group': row['age_group'],
                          'gender': row['gender'], 'severity': row['severity'],
                          'avg_effect': float(row['avg_effect']), 'count': row['count']}
                         for row in factor_result]
        gender_severity = (multi_factor_df.groupBy('gender', 'severity')
                           .agg(count('patient_id').alias('count'),
                                spark_round(avg('effect_score'), 2).alias('avg_effect')))
        gender_severity_data = [{'gender': row['gender'], 'severity': row['severity'],
                                 'count': row['count'], 'avg_effect': float(row['avg_effect'])}
                                for row in gender_severity.collect()]
        result_data = {'main_analysis': main_analysis, 'gender_severity_cross': gender_severity_data}
    else:
        return JsonResponse({'status': 'error', 'message': 'Unsupported analysis type'}, status=400)
    return JsonResponse({'status': 'success', 'analysis_type': analysis_type,
                         'data': result_data, 'total_records': total_records})
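The `when(...).otherwise(...)` bucketing used in `treatment_effect_analysis` behaves like an ordered if/elif chain where the first matching branch wins. A plain-Python equivalent, with the same output labels as the Spark version (sample ages below are made up for illustration):

```python
def age_group(age: int) -> str:
    # Mirrors the Spark when/otherwise chain: branches are tested in order.
    if age < 20:
        return "<20岁"
    if age < 40:
        return "20-40岁"
    if age < 60:
        return "40-60岁"
    return "60岁以上"

groups = [age_group(a) for a in (5, 20, 39, 45, 60, 78)]
```

Because the branches are ordered, each later condition only needs an upper bound; the lower bound is implied by the branches that failed before it.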
6. Selected Documentation
7. END
💕💕See the end of this post to contact 计算机编程果茶熊 for the source code