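Is Medical Data Analysis Really an Easy Graduation Project Topic? A Hands-On Look at Building a Hepatitis Patient System with Spark + Python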


1. About the Author

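💖💖 Author: 计算机编程果茶熊
💙💙 About me: I spent years teaching computer science training courses and worked as a programming instructor, and I enjoy teaching. I work across several IT areas, including Java, WeChat mini programs, Python, Golang, and Android. I take on custom project development, code walkthroughs, thesis defense coaching, and documentation writing, and I know some techniques for lowering similarity scores in plagiarism checks. I like sharing fixes for problems I run into during development and talking about technology, so feel free to ask me about technical or code questions!
💛💛 A word from me: thank you for following and supporting me!
💜💜 Website projects | Android/mini program projects | Big data projects | Graduation project topic ideas
💕💕 See the end of the post for how to get the source code: contact 计算机编程果茶熊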

2. System Overview

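Big data framework: Hadoop + Spark (Hive support requires custom modification)
Languages: Java + Python (both versions are supported)
Database: MySQL
Backend framework: SpringBoot (Spring + SpringMVC + Mybatis) + Django (both versions are supported)
Frontend: Vue + Echarts + HTML + CSS + JavaScript + jQuery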

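The Hepatitis C Patient Data Visualization and Analysis System is a medical data analysis platform built on the Hadoop + Spark big data stack. The backend uses Python + Django, and the frontend uses Vue + ElementUI + Echarts for interactive data visualization. The system has nine core modules: user management, hepatitis C patient data management, biochemical indicator correlation analysis, biochemical indicator analysis, clinical application value analysis, disease progression analysis, patient basic characteristics analysis, patient group characteristics analysis, and a big-screen dashboard. Patient medical data is stored in HDFS, queried and processed with Spark SQL, and run through statistical calculations with data science libraries such as Pandas and NumPy. The results are then rendered as Echarts charts for clinicians and researchers, giving them data to support clinical care and research on hepatitis C.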
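To make the last step of that pipeline concrete, here is a minimal sketch of turning aggregated rows into an Echarts bar-chart option on the backend. The helper name `build_bar_option` and the sample rows are hypothetical and do not come from the project; only the `title`, `tooltip`, `xAxis`, `yAxis`, and `series` keys are standard Echarts option fields.

```python
def build_bar_option(rows, category_key, value_key, title):
    """Turn aggregated rows (a list of dicts) into an Echarts bar-chart option.

    Hypothetical helper for illustration; not part of the original project.
    """
    return {
        "title": {"text": title},
        "tooltip": {"trigger": "axis"},
        "xAxis": {"type": "category", "data": [row[category_key] for row in rows]},
        "yAxis": {"type": "value"},
        "series": [{"type": "bar", "data": [row[value_key] for row in rows]}],
    }


# Made-up rows, shaped like the grouped results the view code in this post returns
sample_rows = [
    {"viral_load_level": "High", "patient_count": 12},
    {"viral_load_level": "Medium", "patient_count": 7},
    {"viral_load_level": "Low", "patient_count": 3},
]

option = build_bar_option(
    sample_rows, "viral_load_level", "patient_count", "Patients by viral load level"
)
print(option["xAxis"]["data"])
```

The resulting dict can be serialized to JSON as-is and handed to `setOption` on the Vue side, which keeps chart layout decisions out of the Spark code.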

3. Hepatitis C Patient Data Visualization and Analysis System - Video Walkthrough

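[Video: Is Medical Data Analysis Really an Easy Graduation Project Topic? A Hands-On Look at Building a Hepatitis Patient System with Spark + Python]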

4. Hepatitis C Patient Data Visualization and Analysis System - Feature Showcase

[10 screenshots of the system's features]

5. Hepatitis C Patient Data Visualization and Analysis System - Code Showcase


from pyspark.sql import SparkSession
from pyspark.sql.functions import col, avg, count, stddev, corr, when, desc  # max/min dropped: unused, and they shadow the Python built-ins
import pandas as pd
import numpy as np
from django.http import JsonResponse
from django.views import View
import json

spark = SparkSession.builder.appName("HepatitisDataAnalysis").config("spark.sql.adaptive.enabled", "true").getOrCreate()

class BiochemicalCorrelationAnalysis(View):
    def post(self, request):
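        data = json.loads(request.body)
        # Cast IDs to int so request values are never spliced into the SQL text as-is
        patient_ids = [int(pid) for pid in data.get('patient_ids', [])]
        if not patient_ids:
            return JsonResponse({'error': 'patient_ids must be a non-empty list'}, status=400)
        # Keep only indicator names that match the columns selected below
        allowed_indicators = ['ALT', 'AST', 'TBIL', 'HCV_RNA']
        indicators = [name for name in data.get('indicators', allowed_indicators) if name in allowed_indicators]
        df = spark.sql(f"SELECT patient_id, ALT, AST, TBIL, HCV_RNA, collection_date FROM biochemical_data WHERE patient_id IN ({','.join(map(str, patient_ids))})")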
        correlation_results = {}
        for i, indicator1 in enumerate(indicators):
            for j, indicator2 in enumerate(indicators[i+1:], i+1):
                correlation_value = df.select(corr(col(indicator1), col(indicator2)).alias('correlation')).collect()[0]['correlation']
                correlation_results[f"{indicator1}_{indicator2}"] = round(correlation_value, 4) if correlation_value else 0
        significance_analysis = df.groupBy().agg(
            avg(col('ALT')).alias('avg_alt'),
            stddev(col('ALT')).alias('std_alt'),
            avg(col('AST')).alias('avg_ast'),
            stddev(col('AST')).alias('std_ast')
        ).collect()[0]
        abnormal_threshold = {
            'ALT': significance_analysis['avg_alt'] + 2 * significance_analysis['std_alt'],
            'AST': significance_analysis['avg_ast'] + 2 * significance_analysis['std_ast']
        }
        risk_patients = df.filter(
            (col('ALT') > abnormal_threshold['ALT']) | 
            (col('AST') > abnormal_threshold['AST'])
        ).select('patient_id').distinct().count()
        pattern_analysis = df.groupBy(
            when(col('HCV_RNA') > 1000, 'High').when(col('HCV_RNA') > 100, 'Medium').otherwise('Low').alias('viral_load_level')
        ).agg(
            avg('ALT').alias('avg_alt'),
            avg('AST').alias('avg_ast'),
            count('patient_id').alias('patient_count')
        ).orderBy(desc('patient_count')).collect()
        return JsonResponse({
            'correlation_matrix': correlation_results,
            'risk_patient_count': risk_patients,
            'pattern_analysis': [row.asDict() for row in pattern_analysis]
        })

class DiseaseProgressionAnalysis(View):
    def post(self, request):
        data = json.loads(request.body)
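        try:
            # Cast to int so request values are never spliced into the SQL text as-is
            patient_id = int(data.get('patient_id'))
            time_range = int(data.get('time_range', 12))  # look-back window in months
        except (TypeError, ValueError):
            return JsonResponse({'error': 'patient_id and time_range must be integers'}, status=400)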
        progression_df = spark.sql(f"""
            SELECT patient_id, collection_date, ALT, AST, TBIL, HCV_RNA, 
                   ROW_NUMBER() OVER (PARTITION BY patient_id ORDER BY collection_date) as visit_sequence
            FROM biochemical_data 
            WHERE patient_id = {patient_id} 
            AND collection_date >= DATE_SUB(CURRENT_DATE(), {time_range * 30})
            ORDER BY collection_date
        """)
        progression_trend = progression_df.select(
            'visit_sequence', 'ALT', 'AST', 'TBIL', 'HCV_RNA'
        ).collect()
        baseline_values = progression_df.filter(col('visit_sequence') == 1).select(
            'ALT', 'AST', 'TBIL', 'HCV_RNA'
        ).collect()[0] if progression_df.count() > 0 else None
        improvement_rate = {}  # stays empty when the patient has no records in the time range
        if baseline_values:
            latest_values = progression_df.orderBy(desc('visit_sequence')).select(
                'ALT', 'AST', 'TBIL', 'HCV_RNA'
            ).first()
            improvement_rate = {
                'ALT_change': ((latest_values['ALT'] - baseline_values['ALT']) / baseline_values['ALT'] * 100) if baseline_values['ALT'] > 0 else 0,
                'AST_change': ((latest_values['AST'] - baseline_values['AST']) / baseline_values['AST'] * 100) if baseline_values['AST'] > 0 else 0,
                'HCV_RNA_change': ((latest_values['HCV_RNA'] - baseline_values['HCV_RNA']) / baseline_values['HCV_RNA'] * 100) if baseline_values['HCV_RNA'] > 0 else 0
            }
        stage_classification = spark.sql(f"""
            SELECT 
                CASE 
                    WHEN ALT <= 40 AND AST <= 40 AND HCV_RNA < 100 THEN 'Stable'
                    WHEN ALT > 80 OR AST > 80 OR HCV_RNA > 10000 THEN 'Severe'
                    ELSE 'Moderate'
                END as disease_stage,
                collection_date,
                ROW_NUMBER() OVER (PARTITION BY patient_id ORDER BY collection_date) as visit_sequence
            FROM biochemical_data 
            WHERE patient_id = {patient_id}
            ORDER BY collection_date DESC
            LIMIT 5
        """).collect()
        treatment_response = 'Positive' if improvement_rate.get('HCV_RNA_change', 0) < -50 else 'Limited' if improvement_rate.get('HCV_RNA_change', 0) < 0 else 'Poor'
        return JsonResponse({
            'progression_data': [row.asDict() for row in progression_trend],
            'improvement_rate': improvement_rate,
            'stage_progression': [row.asDict() for row in stage_classification],
            'treatment_response': treatment_response
        })

class PatientGroupCharacteristics(View):
    def post(self, request):
        data = json.loads(request.body)
        analysis_type = data.get('analysis_type', 'demographic')
        group_filters = data.get('filters', {})
        base_query = "SELECT * FROM patient_data p JOIN biochemical_data b ON p.patient_id = b.patient_id"
        group_df = spark.sql(base_query)
        # Filter through the DataFrame API so request values are passed as literals instead of being spliced into SQL
        if 'age_range' in group_filters:
            min_age, max_age = group_filters['age_range']
            group_df = group_df.filter(col('age').between(min_age, max_age))
        if 'gender' in group_filters:
            group_df = group_df.filter(col('gender') == group_filters['gender'])
        # The join yields one row per lab record, so the count() aggregates below count records, not distinct patients
        if analysis_type == 'demographic':
            demographic_stats = group_df.groupBy('gender').agg(
                count('p.patient_id').alias('patient_count'),
                avg('age').alias('avg_age'),
                avg('ALT').alias('avg_alt'),
                avg('AST').alias('avg_ast')
            ).collect()
            age_distribution = group_df.groupBy(
                when(col('age') < 30, 'Young').when(col('age') < 50, 'Middle').otherwise('Senior').alias('age_group')
            ).agg(count('p.patient_id').alias('count')).collect()
        elif analysis_type == 'clinical':
            severity_distribution = group_df.groupBy(
                when((col('ALT') > 80) | (col('AST') > 80), 'High Risk')
                .when((col('ALT') > 40) | (col('AST') > 40), 'Medium Risk')
                .otherwise('Low Risk').alias('risk_level')
            ).agg(count('p.patient_id').alias('patient_count')).collect()
            treatment_effectiveness = group_df.filter(col('HCV_RNA') < 100).count()
            total_patients = group_df.count()
            response_rate = (treatment_effectiveness / total_patients * 100) if total_patients > 0 else 0
        geographical_distribution = group_df.groupBy('region').agg(
            count('p.patient_id').alias('patient_count'),
            avg('HCV_RNA').alias('avg_viral_load')
        ).orderBy(desc('patient_count')).collect()
        comorbidity_analysis = group_df.filter(col('comorbidity').isNotNull()).groupBy('comorbidity').agg(
            count('p.patient_id').alias('count')
        ).collect()
        return JsonResponse({
            'demographic_analysis': [row.asDict() for row in demographic_stats] if analysis_type == 'demographic' else [],
            'age_distribution': [row.asDict() for row in age_distribution] if analysis_type == 'demographic' else [],
            'severity_distribution': [row.asDict() for row in severity_distribution] if analysis_type == 'clinical' else [],
            'treatment_response_rate': response_rate if analysis_type == 'clinical' else 0,
            'geographical_stats': [row.asDict() for row in geographical_distribution],
            'comorbidity_patterns': [row.asDict() for row in comorbidity_analysis]
        })
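
The staging and treatment-response rules in `DiseaseProgressionAnalysis` are easier to check when pulled out of the SQL. The sketch below restates them as plain Python functions. The thresholds are copied from the `CASE` expression and the ternary in the listing above; the function names are hypothetical and not part of the project.

```python
def classify_stage(alt, ast, hcv_rna):
    """Mirror of the SQL CASE expression: Stable / Severe / Moderate."""
    if alt <= 40 and ast <= 40 and hcv_rna < 100:
        return "Stable"
    if alt > 80 or ast > 80 or hcv_rna > 10000:
        return "Severe"
    return "Moderate"


def percent_change(baseline, latest):
    """Percent change from baseline; 0 when the baseline is missing or not positive."""
    if baseline is None or latest is None or baseline <= 0:
        return 0.0
    return (latest - baseline) / baseline * 100


def treatment_response(hcv_rna_change):
    """Mirror of the ternary in the view: a drop of more than 50% counts as Positive."""
    if hcv_rna_change < -50:
        return "Positive"
    if hcv_rna_change < 0:
        return "Limited"
    return "Poor"


# Made-up values: viral load falls from 200 to 50 between the first and latest visit
change = percent_change(200, 50)
print(classify_stage(30, 35, 50), change, treatment_response(change))
```

Unlike the expressions in the view, `percent_change` also guards against missing values, which is one reason to keep this logic in small functions that can be unit tested without a Spark session.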

6. Hepatitis C Patient Data Visualization and Analysis System - Documentation Showcase

[Screenshot of the project documentation]

7. END
