Same Big-Data Analysis, So Why Do Supervisors Favor the Cervical Cancer Risk System Most? The Answer May Surprise You

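💖💖Author: 计算机编程小咖 💙💙About me: I taught computer science training courses for many years and still love teaching. I work mainly in Java, WeChat Mini Programs, Python, Golang, and Android, and my projects cover big data, deep learning, websites, mini programs, Android apps, and algorithms. I regularly do custom project development, code walkthroughs, thesis-defense coaching, and documentation writing, and I also know some techniques for lowering similarity-check scores. I like sharing solutions to problems I run into during development and enjoy discussing technology, so feel free to ask me any coding questions! 💛💛A few words: Thank you all for your attention and support! 💜💜 Website projects | Android/Mini Program projects | Big data projects | Deep learning projects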

Cervical Cancer Risk Factor Analysis and Visualization System: Introduction

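The Big Data-Based Cervical Cancer Risk Factor Analysis and Visualization System combines big-data tooling with medical data analysis. Data is stored on Hadoop's distributed file system (HDFS) and processed with the Spark engine to mine and analyze cervical cancer risk factors. The backend is written in Python on the Django framework, with Pandas and NumPy handling data preprocessing and statistical computation; the frontend uses Vue.js with the ElementUI component library, and ECharts renders the charts that present the data from several angles.

Functionally, the system manages cervical cancer risk data end to end. Its basic modules cover user permission control, personal profile maintenance, and a large-screen data dashboard; its analysis modules cover demographics and lifestyle, patient risk profiling, screening method validation, and the association between sexual behavior and sexually transmitted diseases (STDs). Complex queries run on Spark SQL, and HDFS provides reliable storage and high-throughput access for large volumes of medical data. Together, these let the system examine the key factors behind cervical cancer from multiple dimensions and give medical researchers and clinicians data to inform research and clinical decisions, illustrating how big-data technology can be applied in precision medicine.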
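The charts are rendered with ECharts. As a minimal sketch, assuming the backend hands the chart rows like its `age_distribution` field (a list of `{"age_group": ..., "count": ...}` dicts), the hypothetical helper `to_echarts_bar` below reshapes them into a standard ECharts bar-chart option (`xAxis`, `yAxis`, `series`). Whether the original project builds this object in Django or in the Vue frontend is not shown; doing it in Python here is only for illustration.

```python
import json
from typing import Any


def to_echarts_bar(rows: list[dict[str, Any]], x_key: str, y_key: str,
                   title: str) -> dict[str, Any]:
    """Build an ECharts bar-chart option from a list of result rows.

    Hypothetical helper: assumes each row is a dict holding a category
    under x_key and a numeric value under y_key.
    """
    return {
        "title": {"text": title},
        "tooltip": {"trigger": "axis"},
        # Categories go on the x axis, in the order the rows arrive.
        "xAxis": {"type": "category", "data": [row[x_key] for row in rows]},
        "yAxis": {"type": "value"},
        # One bar series whose values line up with the categories.
        "series": [{"type": "bar", "data": [row[y_key] for row in rows]}],
    }


if __name__ == "__main__":
    # Illustrative rows shaped like the "age_distribution" field (not real data).
    sample = [{"age_group": "20-29", "count": 12},
              {"age_group": "30-39", "count": 30}]
    option = to_echarts_bar(sample, "age_group", "count", "Age distribution")
    print(json.dumps(option, indent=2))
```

On the frontend, the resulting object would be passed to an ECharts instance with `chart.setOption(option)`. Keeping the helper generic over `x_key` and `y_key` lets the same function serve other breakdowns, such as rows keyed by `pregnancies` or `method` with a `total` count.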

Cervical Cancer Risk Factor Analysis and Visualization System: Demo Video

[Demo video]

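Cervical Cancer Risk Factor Analysis and Visualization System: Screenshots

[Screenshot: Login page]

[Screenshot: Cervical cancer risk data]

[Screenshot: Patient risk profile analysis]

[Screenshot: Demographics and lifestyle analysis]

[Screenshot: Screening method validation analysis]

[Screenshot: Data dashboard (large screen)]

[Screenshot: Sexual behavior and STDs analysis]

[Screenshot: User management]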

Cervical Cancer Risk Factor Analysis and Visualization System: Code

```python
import json
import math
from itertools import combinations

from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt
from django.views.decorators.http import require_POST
from pyspark.sql import SparkSession
from pyspark.sql.functions import avg, col, count, when
# Aliased so Spark's sum does not shadow Python's built-in sum(), which is used on lists below.
from pyspark.sql.functions import sum as spark_sum

spark = SparkSession.builder.appName("CervicalCancerRiskAnalysis").getOrCreate()

# Connection settings from the original; in deployment, read them from Django settings or env vars.
JDBC_URL = "jdbc:mysql://localhost:3306/cervical_cancer_db"
JDBC_PROPERTIES = {"user": "root", "password": "password"}
# Binary indicator columns fed into the correlation matrix.
LIFESTYLE_COLUMNS = ["smoking", "hormonal_contraceptives", "iud", "stds", "dx_cancer", "dx_cin", "dx_hpv"]


def load_table(table_name):
    """Read one MySQL table into a Spark DataFrame over JDBC."""
    return spark.read.jdbc(JDBC_URL, table_name, properties=JDBC_PROPERTIES)


def round_or_none(value, digits=3):  # Spark's avg() returns null when there is nothing to average
    return None if value is None else round(value, digits)


# csrf_exempt (imported in the original) lets the Vue frontend POST JSON without a CSRF token;
# require_POST answers other HTTP methods with 405 instead of falling through and returning None.
@csrf_exempt
@require_POST
def cervical_cancer_risk_data_analysis(request):
    data = json.loads(request.body)
    age_range = data.get("age_range", [18, 80])
    lifestyle_factors = data.get("lifestyle_factors", [])
    risk_level = data.get("risk_level", "all")

    df = load_table("cervical_cancer_data")
    filtered_df = df.filter(col("age").between(age_range[0], age_range[1]))
    # Keep rows where every requested factor is present (== 1); unknown column names are skipped.
    for factor in lifestyle_factors:
        if factor in filtered_df.columns:
            filtered_df = filtered_df.filter(col(factor) == 1)
    if risk_level != "all":
        filtered_df = filtered_df.filter(col("risk_level") == risk_level)

    # Row count, mean age and mean number of pregnancies per risk level.
    risk_stats = filtered_df.groupBy("risk_level").agg(
        count("*").alias("count"), avg("age").alias("avg_age"),
        avg("num_of_pregnancies").alias("avg_pregnancies")).collect()
    # Pearson correlations via pandas; NaN (e.g. from a constant column) becomes None for valid JSON.
    corr = filtered_df.select(*LIFESTYLE_COLUMNS).toPandas().corr()
    correlation_matrix = {
        column: {row: (None if math.isnan(value) else value) for row, value in series.items()}
        for column, series in corr.items()}
    age_distribution = filtered_df.groupBy("age_group").count().orderBy("age_group").collect()
    cancer_flag = when(col("dx_cancer") == 1, 1).otherwise(0)  # 1 per confirmed cancer case
    pregnancy_risk = filtered_df.groupBy("num_of_pregnancies").agg(
        count("*").alias("total"), spark_sum(cancer_flag).alias("cancer_cases")).collect()
    screening_effectiveness = filtered_df.groupBy("screening_method").agg(
        count("*").alias("total_screened"), spark_sum(cancer_flag).alias("detected_cases")).collect()

    result_data = {
        "risk_statistics": [row.asDict() for row in risk_stats],
        "correlation_matrix": correlation_matrix,
        "age_distribution": [{"age_group": r["age_group"], "count": r["count"]} for r in age_distribution],
        "pregnancy_risk_analysis": [{"pregnancies": r["num_of_pregnancies"], "total": r["total"],
                                     "cancer_cases": r["cancer_cases"]} for r in pregnancy_risk],
        "screening_analysis": [{"method": r["screening_method"], "total": r["total_screened"],
                                "detected": r["detected_cases"]} for r in screening_effectiveness],
    }
    return JsonResponse({"status": "success", "data": result_data})


@csrf_exempt
@require_POST
def patient_risk_profile_analysis(request):
    data = json.loads(request.body)
    patient_id = data.get("patient_id")
    df = load_table("patient_profiles")
    rows = df.filter(col("patient_id") == patient_id).limit(1).collect()
    if not rows:
        return JsonResponse({"status": "error", "message": "Patient data not found"})
    patient_info = rows[0].asDict()
    age = patient_info["age"]
    # Missing values count as 0 so the comparisons below never see None.
    hc_years = patient_info["hormonal_contraceptives_years"] or 0
    pregnancies = patient_info["num_of_pregnancies"] or 0

    # Patients within 5 years of age sharing the same smoking and hormonal-contraceptive status.
    similar_patients = df.filter(
        col("age").between(age - 5, age + 5)
        & (col("smoking") == patient_info["smoking"])
        & (col("hormonal_contraceptives") == patient_info["hormonal_contraceptives"]))

    # (applies, factor, weight, description); the weights are the fixed values from the original.
    candidates = [
        (patient_info["smoking"] == 1, "Smoking", 0.25, "Smoking significantly increases cervical cancer risk"),
        (patient_info["hormonal_contraceptives"] == 1 and hc_years > 5, "Long-term hormonal contraception", 0.20,
         "Long-term hormonal contraceptive use increases risk"),
        (pregnancies > 3, "Multiple pregnancies", 0.15, "Multiple pregnancies are linked to cervical cancer risk"),
        (patient_info["stds"] == 1, "History of STDs", 0.30, "A history of STDs significantly raises cancer risk"),
    ]
    risk_factors = [{"factor": name, "risk_score": weight, "description": desc}
                    for applies, name, weight, desc in candidates if applies]
    total_risk_score = sum(f["risk_score"] for f in risk_factors)  # Python's built-in sum

    similar = similar_patients.agg(
        count("*").alias("similar_count"), avg("dx_cancer").alias("cancer_rate"),
        avg("dx_cin").alias("cin_rate")).collect()[0]
    age_group_stats = df.filter(col("age").between(age - 10, age + 10)).agg(
        avg("dx_cancer").alias("age_group_cancer_rate")).collect()[0]

    if total_risk_score > 0.5:
        recommendations = ["Annual cervical cancer screening is recommended",
                           "Quit smoking and avoid secondhand smoke"]
    elif total_risk_score > 0.3:
        recommendations = ["Screening every two years is recommended"]
    else:
        recommendations = ["Follow the routine screening schedule"]

    profile_result = {
        "patient_basic_info": patient_info,
        "risk_factors": risk_factors,
        "total_risk_score": round(total_risk_score, 3),
        "similar_patients_analysis": {"count": similar["similar_count"],
                                      "cancer_rate": round_or_none(similar["cancer_rate"]),
                                      "cin_rate": round_or_none(similar["cin_rate"])},
        "age_group_comparison": {"cancer_rate": round_or_none(age_group_stats["age_group_cancer_rate"])},
        "recommendations": recommendations,
    }
    return JsonResponse({"status": "success", "data": profile_result})


@csrf_exempt
@require_POST
def screening_method_validation_analysis(request):
    data = json.loads(request.body)
    screening_methods = data.get("screening_methods", ["cytology", "hpv_test", "colposcopy"])
    start_year, end_year = data.get("time_period", "2020-2024").split("-")
    df = load_table("screening_records")
    filtered_df = df.filter(col("screening_date").between(f"{start_year}-01-01", f"{end_year}-12-31"))

    def matches(result, diagnosis):  # number of rows with this screening result and confirmed diagnosis
        return spark_sum(when((col("screening_result") == result)
                              & (col("confirmed_diagnosis") == diagnosis), 1).otherwise(0))

    def ratio(num, den):
        return num / den if den > 0 else 0

    method_effectiveness = {}
    for method in screening_methods:
        method_data = filtered_df.filter(col("screening_method") == method)
        stats = method_data.agg(  # one Spark job instead of a separate count() per cell
            count("*").alias("total"), avg("cost").alias("avg_cost"),
            matches("positive", "cancer").alias("tp"), matches("positive", "normal").alias("fp"),
            matches("negative", "normal").alias("tn"), matches("negative", "cancer").alias("fn"),
        ).collect()[0]
        tp, fp, tn, fn = (stats[k] or 0 for k in ("tp", "fp", "tn", "fn"))  # null over zero rows -> 0
        by_age = method_data.groupBy("age_group").agg(
            count("*").alias("total"),
            spark_sum(when(col("screening_result") == "positive", 1).otherwise(0)).alias("positive_cases"),
        ).collect()
        method_effectiveness[method] = {
            "total_screenings": stats["total"],
            "sensitivity": round(ratio(tp, tp + fn), 3),
            "specificity": round(ratio(tn, tn + fp), 3),
            "positive_predictive_value": round(ratio(tp, tp + fp), 3),
            "negative_predictive_value": round(ratio(tn, tn + fn), 3),
            # Only "cancer"/"normal" records fill the four cells, so divide by their sum.
            "accuracy": round(ratio(tp + tn, tp + tn + fp + fn), 3),
            "cost_per_screening": round(stats["avg_cost"] or 0, 2),
            "age_group_detection": [{"age_group": r["age_group"], "total": r["total"],
                                     "positive_rate": round(r["positive_cases"] / r["total"], 3)}
                                    for r in by_age],
        }

    # Differences for every pair of requested methods, in request order.
    comparative_analysis = [
        {"methods": f"{m1} vs {m2}",
         "sensitivity_diff": round(a["sensitivity"] - b["sensitivity"], 3),
         "specificity_diff": round(a["specificity"] - b["specificity"], 3),
         "cost_diff": round(a["cost_per_screening"] - b["cost_per_screening"], 2)}
        for (m1, a), (m2, b) in combinations(method_effectiveness.items(), 2)]
    results = list(method_effectiveness.values())
    validation_result = {
        "method_effectiveness": method_effectiveness,
        "comparative_analysis": comparative_analysis,
        "overall_statistics": {
            "total_screenings": sum(m["total_screenings"] for m in results),
            "average_accuracy": round(sum(m["accuracy"] for m in results) / len(results), 3) if results else 0,
        },
    }
    return JsonResponse({"status": "success", "data": validation_result})
```
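The screening-method validation view reduces each method to four confusion-matrix counts: true positives (TP: screened positive, confirmed cancer), false positives (FP: screened positive, confirmed normal), true negatives (TN: screened negative, confirmed normal) and false negatives (FN: screened negative, confirmed cancer). For reference, the standard definitions of the metrics it reports are:

$$
\begin{aligned}
\text{Sensitivity} &= \frac{TP}{TP + FN} \\
\text{Specificity} &= \frac{TN}{TN + FP} \\
\text{PPV} &= \frac{TP}{TP + FP} \\
\text{NPV} &= \frac{TN}{TN + FN} \\
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN}
\end{aligned}
$$

As a worked example with hypothetical counts (illustrative only, not output of the system) TP = 45, FN = 5, TN = 900, FP = 50:

$$
\text{Sensitivity} = \tfrac{45}{50} = 0.900,\quad
\text{Specificity} = \tfrac{900}{950} \approx 0.947,\quad
\text{PPV} = \tfrac{45}{95} \approx 0.474,\quad
\text{NPV} = \tfrac{900}{905} \approx 0.994,\quad
\text{Accuracy} = \tfrac{945}{1000} = 0.945
$$

The low PPV next to high sensitivity and specificity comes from the low share of cancer cases among those screened (50 of 1,000 here): the much larger normal group contributes more false positives (50) than the cancer group contributes true positives (45).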

Cervical Cancer Risk Factor Analysis and Visualization System: Documentation

[Screenshot: Project documentation]
