What Kind of Big Data Graduation Project Passes the Defense Most Easily? A Parkinson's Disease Data Visualization Analysis System Gives You the Answer | Graduation Project / Topic Recommendations / Thesis Topic Selection / Data Analysis


计算机编程指导师 (Computer Programming Mentor)

⭐⭐ About me: I really enjoy digging into technical problems! I specialize in hands-on projects with Java, Python, mini-programs, Android, big data, web crawlers, Golang, data dashboards, deep learning, machine learning, prediction, and more.

⛽⛽ Hands-on projects: if you have questions about the source code or any technical issues, feel free to discuss them in the comments!

⚡⚡ If you run into specific technical problems or have computer science graduation project needs, you can also reach me through my homepage~~

⚡⚡ Source code homepage --> space.bilibili.com/35463818075…

Parkinson's Disease Data Visualization Analysis System - Introduction

The Spark + Django based Parkinson's disease data visualization analysis system is a medical data analysis platform that integrates big data processing technology with a web development framework, built specifically to process and analyze voice feature data from Parkinson's disease patients. The system stores large volumes of medical data on the Hadoop distributed file system (HDFS), uses Spark's in-memory computing for fast data processing and analysis, builds a stable backend service on the Django framework, and delivers intuitive data visualization on the frontend with a Vue + ElementUI + Echarts stack. The core functionality covers four analysis dimensions: overall dataset health and patient group profiling, comparative analysis of the core voice features of Parkinson's disease, feature correlation mining and key indicator identification, and in-depth exploration of nonlinear dynamical features. By mining 22 voice indicators, the system analyzes the differences between Parkinson's patients and healthy subjects from multiple angles, including pitch features, frequency perturbation (jitter), amplitude perturbation (shimmer), and voice quality, while nonlinear dynamical indicators such as RPDE, DFA, D2, and PPE support deeper feature exploration. The system uses MySQL for storage and provides complete data import, cleaning, analysis, and visualization functions, giving medical researchers a convenient and efficient tool for analyzing Parkinson's disease voice features.
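The data import and cleaning step mentioned above does not appear in the code section later in this post, so here is a minimal sketch of what it could look like with Spark: reading the raw voice-feature CSV from HDFS, doing basic cleaning, and writing the result into the same MySQL voice_features table that the analysis views read from. The HDFS path and connection settings are placeholders for illustration, not the project's actual configuration.

# Illustrative sketch only: load the raw voice-feature CSV from HDFS, clean it,
# and persist it to the MySQL table that the analysis views later read over JDBC.
# The HDFS path and MySQL credentials below are placeholders, not the project's real settings,
# and the MySQL connector JAR must be on the Spark classpath for the JDBC write to work.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ParkinsonDataImport").getOrCreate()

raw_df = spark.read.csv("hdfs://localhost:9000/parkinson/parkinsons.csv", header=True, inferSchema=True)
clean_df = raw_df.dropna().dropDuplicates()  # simple cleaning: drop incomplete and duplicated rows

clean_df.write.format("jdbc") \
    .option("url", "jdbc:mysql://localhost:3306/parkinson_db") \
    .option("dbtable", "voice_features") \
    .option("user", "root").option("password", "password") \
    .option("driver", "com.mysql.cj.jdbc.Driver") \
    .mode("overwrite").save()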

Parkinson's Disease Data Visualization Analysis System - Technical Framework

Development language: Python or Java (both versions are supported)

Big data framework: Hadoop + Spark (Hive is not used in this version; customization is supported)

Backend framework: Django or Spring Boot (Spring + SpringMVC + MyBatis) (both versions are supported)

Frontend: Vue + ElementUI + Echarts + HTML + CSS + JavaScript + jQuery

Key technologies: Hadoop, HDFS, Spark, Spark SQL, Pandas, NumPy

Database: MySQL (a minimal Django connection sketch follows below)
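To make the Django + MySQL combination above concrete, here is a minimal sketch of a possible database configuration, assuming the parkinson_db schema that the Spark JDBC reads in the code section use; the actual project settings (driver, credentials, host) may differ.

# settings.py (sketch) - assumes the parkinson_db database referenced by the Spark JDBC reads below
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "parkinson_db",
        "USER": "root",
        "PASSWORD": "password",
        "HOST": "localhost",
        "PORT": "3306",
    }
}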

Parkinson's Disease Data Visualization Analysis System - Background

As population aging accelerates, Parkinson's disease, a common neurodegenerative disorder, shows a rising prevalence year after year, placing a heavy burden on patients' families and on healthcare systems. Traditional diagnosis of Parkinson's disease relies mainly on clinicians' experience and observation of symptoms, an approach that is highly subjective and often fails to identify the disease accurately in its early stages. In recent years, medical research has found that Parkinson's patients exhibit characteristic pathological changes in their voice, including monotone pitch, slowed speech, and unclear articulation, and these voice changes often appear before motor symptoms do. Voice analysis, as a non-invasive, low-cost, and easy-to-administer method, therefore offers a new approach to early screening and auxiliary diagnosis of Parkinson's disease. When handling such voice data, medical institutions currently face challenges such as large data volumes, many analysis dimensions, and high computational complexity; traditional single-machine processing can no longer meet the needs of large-scale medical data analysis, so big data technology is urgently needed to improve processing efficiency and analysis accuracy.

This project has both practical application value and academic significance. From a technical perspective, the system combines big data processing with medical data analysis, demonstrating the feasibility of applying the Spark distributed computing framework in the medical field and providing a technical reference and practical experience for similar medical big data projects. Through in-depth analysis of 22 voice feature indicators, the system helps medical researchers better understand how the voice features of Parkinson's patients change, supporting early detection and monitoring of the disease with data. From a social perspective, the system offers medical institutions a relatively convenient auxiliary tool for Parkinson's disease screening; while it cannot replace a professional physician's diagnosis, it can serve as a useful reference for clinical decision-making, and in regions with limited medical resources it can, to some extent, broaden the coverage and improve the efficiency of disease screening. At the same time, the visualization features present complex voice feature data as intuitive charts, making the data easier for doctors and researchers to understand and analyze and improving the readability and usefulness of medical data. As a graduation design project, the system also demonstrates the potential of computer technology to address real-world problems.

 

Parkinson's Disease Data Visualization Analysis System - Video Demo

www.bilibili.com/video/BV128…  

Parkinson's Disease Data Visualization Analysis System - Screenshots

Login.png

Multi-dimensional feature correlation analysis.png

Nonlinear dynamics analysis.png

Cover.png

Parkinson's disease data management.png

Data dashboard (top half).png

Data dashboard (bottom half).png

Overall dataset analysis.png

Users.png

Voice acoustic feature analysis.png

Parkinson's Disease Data Visualization Analysis System - Code

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, avg, stddev, corr, when, count, max as spark_max, min as spark_min
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from django.http import JsonResponse
from django.views import View
import pandas as pd
import numpy as np
import json

spark = SparkSession.builder.appName("ParkinsonDataAnalysis").config("spark.sql.adaptive.enabled", "true").config("spark.serializer", "org.apache.spark.serializer.KryoSerializer").getOrCreate()

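# Analysis dimension 1: dataset overview and patient group profiling.
# Compares patients (status = 1) with healthy controls (status = 0): sample balance,
# per-feature mean/std/max/min, variance ratios, and the coefficient of variation
# (std / mean * 100) for a handful of key indicators.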
class PatientGroupProfileAnalysis(View):
    def post(self, request):
        df = spark.read.format("jdbc").option("url", "jdbc:mysql://localhost:3306/parkinson_db").option("dbtable", "voice_features").option("user", "root").option("password", "password").option("driver", "com.mysql.cj.jdbc.Driver").load()
        patient_data = df.filter(col("status") == 1)
        healthy_data = df.filter(col("status") == 0)
        patient_count = patient_data.count()
        healthy_count = healthy_data.count()
        total_count = df.count()
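        # Class balance: ratio of the smaller group to the larger group (1.0 = perfectly balanced)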
        balance_ratio = min(patient_count, healthy_count) / max(patient_count, healthy_count)
        voice_columns = ["MDVP_Fo_Hz", "MDVP_Fhi_Hz", "MDVP_Flo_Hz", "MDVP_Jitter_percent", "MDVP_Jitter_Abs", "MDVP_RAP", "MDVP_PPQ", "Jitter_DDP", "MDVP_Shimmer", "MDVP_Shimmer_dB", "Shimmer_APQ3", "Shimmer_APQ5", "MDVP_APQ", "NHR", "HNR", "RPDE", "DFA", "spread1", "spread2", "D2", "PPE"]
        patient_stats = patient_data.select([avg(col(c)).alias(f"{c}_mean") for c in voice_columns] + [stddev(col(c)).alias(f"{c}_std") for c in voice_columns]).collect()[0].asDict()
        healthy_stats = healthy_data.select([avg(col(c)).alias(f"{c}_mean") for c in voice_columns] + [stddev(col(c)).alias(f"{c}_std") for c in voice_columns]).collect()[0].asDict()
        overall_stats = df.select([avg(col(c)).alias(f"{c}_mean") for c in voice_columns] + [stddev(col(c)).alias(f"{c}_std") for c in voice_columns] + [spark_max(col(c)).alias(f"{c}_max") for c in voice_columns] + [spark_min(col(c)).alias(f"{c}_min") for c in voice_columns]).collect()[0].asDict()
        variance_comparison = {}
        for col_name in voice_columns:
            patient_var = patient_stats.get(f"{col_name}_std", 0) ** 2
            healthy_var = healthy_stats.get(f"{col_name}_std", 0) ** 2
            variance_ratio = patient_var / healthy_var if healthy_var > 0 else float('inf')
            variance_comparison[col_name] = {"patient_variance": patient_var, "healthy_variance": healthy_var, "variance_ratio": variance_ratio}
        key_indicators_dispersion = {}
        key_indicators = ["spread1", "PPE", "MDVP_Fo_Hz", "MDVP_Jitter_percent", "MDVP_Shimmer"]
        for indicator in key_indicators:
            patient_cv = (patient_stats.get(f"{indicator}_std", 0) / patient_stats.get(f"{indicator}_mean", 1)) * 100
            healthy_cv = (healthy_stats.get(f"{indicator}_std", 0) / healthy_stats.get(f"{indicator}_mean", 1)) * 100
            cv_ratio = patient_cv / healthy_cv if healthy_cv > 0 else float('inf')
            key_indicators_dispersion[indicator] = {"patient_cv": patient_cv, "healthy_cv": healthy_cv, "cv_ratio": cv_ratio}
        result_data = {"sample_balance": {"patient_count": patient_count, "healthy_count": healthy_count, "total_count": total_count, "balance_ratio": balance_ratio}, "patient_profile": patient_stats, "healthy_profile": healthy_stats, "overall_statistics": overall_stats, "variance_comparison": variance_comparison, "key_indicators_dispersion": key_indicators_dispersion}
        return JsonResponse({"status": "success", "data": result_data})

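# Analysis dimension 2: group-wise comparison of the core voice features.
# Covers pitch (MDVP_Fo/Fhi/Flo), frequency perturbation (the jitter family), amplitude
# perturbation (the shimmer family) and the NHR/HNR voice-quality indicators, reporting
# means, medians, 95th percentiles, effect sizes and abnormal-value rates per group.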
class VoiceFeatureDifferenceAnalysis(View):
    def post(self, request):
        df = spark.read.format("jdbc").option("url", "jdbc:mysql://localhost:3306/parkinson_db").option("dbtable", "voice_features").option("user", "root").option("password", "password").option("driver", "com.mysql.cj.jdbc.Driver").load()
        patient_data = df.filter(col("status") == 1)
        healthy_data = df.filter(col("status") == 0)
        pitch_features = ["MDVP_Fo_Hz", "MDVP_Fhi_Hz", "MDVP_Flo_Hz"]
        pitch_analysis = {}
        for feature in pitch_features:
            patient_avg = patient_data.select(avg(col(feature))).collect()[0][0]
            healthy_avg = healthy_data.select(avg(col(feature))).collect()[0][0]
            patient_std = patient_data.select(stddev(col(feature))).collect()[0][0]
            healthy_std = healthy_data.select(stddev(col(feature))).collect()[0][0]
            difference_percent = ((patient_avg - healthy_avg) / healthy_avg) * 100 if healthy_avg != 0 else 0
            pitch_analysis[feature] = {"patient_mean": patient_avg, "healthy_mean": healthy_avg, "patient_std": patient_std, "healthy_std": healthy_std, "difference_percent": difference_percent}
        jitter_features = ["MDVP_Jitter_percent", "MDVP_Jitter_Abs", "MDVP_RAP", "MDVP_PPQ", "Jitter_DDP"]
        jitter_analysis = {}
        jitter_correlation_matrix = df.select(jitter_features).toPandas().corr().to_dict()
        for feature in jitter_features:
            patient_avg = patient_data.select(avg(col(feature))).collect()[0][0]
            healthy_avg = healthy_data.select(avg(col(feature))).collect()[0][0]
            # approxQuantile avoids sorting the whole column on the driver; 0.5 = median, relative error 0.01
            patient_median = patient_data.approxQuantile(feature, [0.5], 0.01)[0]
            healthy_median = healthy_data.approxQuantile(feature, [0.5], 0.01)[0]
            effect_size = (patient_avg - healthy_avg) / ((patient_data.select(stddev(col(feature))).collect()[0][0] + healthy_data.select(stddev(col(feature))).collect()[0][0]) / 2)
            jitter_analysis[feature] = {"patient_mean": patient_avg, "healthy_mean": healthy_avg, "patient_median": patient_median, "healthy_median": healthy_median, "effect_size": effect_size}
        shimmer_features = ["MDVP_Shimmer", "MDVP_Shimmer_dB", "Shimmer_APQ3", "Shimmer_APQ5", "MDVP_APQ"]
        shimmer_analysis = {}
        shimmer_correlation_matrix = df.select(shimmer_features).toPandas().corr().to_dict()
        for feature in shimmer_features:
            patient_avg = patient_data.select(avg(col(feature))).collect()[0][0]
            healthy_avg = healthy_data.select(avg(col(feature))).collect()[0][0]
            # 95th percentile via approxQuantile instead of a full RDD sort
            patient_95_percentile = patient_data.approxQuantile(feature, [0.95], 0.01)[0]
            healthy_95_percentile = healthy_data.approxQuantile(feature, [0.95], 0.01)[0]
            relative_increase = (patient_avg / healthy_avg - 1) * 100 if healthy_avg != 0 else 0
            shimmer_analysis[feature] = {"patient_mean": patient_avg, "healthy_mean": healthy_avg, "patient_95_percentile": patient_95_percentile, "healthy_95_percentile": healthy_95_percentile, "relative_increase": relative_increase}
        voice_quality_features = ["NHR", "HNR"]
        voice_quality_analysis = {}
        for feature in voice_quality_features:
            patient_avg = patient_data.select(avg(col(feature))).collect()[0][0]
            healthy_avg = healthy_data.select(avg(col(feature))).collect()[0][0]
            patient_range = patient_data.select(spark_max(col(feature)) - spark_min(col(feature))).collect()[0][0]
            healthy_range = healthy_data.select(spark_max(col(feature)) - spark_min(col(feature))).collect()[0][0]
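            # Values more than two standard deviations above the healthy mean are treated as abnormal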
            abnormal_threshold = healthy_avg + 2 * healthy_data.select(stddev(col(feature))).collect()[0][0]
            patient_abnormal_count = patient_data.filter(col(feature) > abnormal_threshold).count()
            patient_abnormal_rate = (patient_abnormal_count / patient_data.count()) * 100
            voice_quality_analysis[feature] = {"patient_mean": patient_avg, "healthy_mean": healthy_avg, "patient_range": patient_range, "healthy_range": healthy_range, "abnormal_threshold": abnormal_threshold, "patient_abnormal_rate": patient_abnormal_rate}
        result_data = {"pitch_analysis": pitch_analysis, "jitter_analysis": jitter_analysis, "jitter_correlations": jitter_correlation_matrix, "shimmer_analysis": shimmer_analysis, "shimmer_correlations": shimmer_correlation_matrix, "voice_quality_analysis": voice_quality_analysis}
        return JsonResponse({"status": "success", "data": result_data})

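# Analysis dimension 3: feature correlation mining and key indicator identification.
# Combines per-feature correlation with the status label, Random Forest feature importance,
# model AUC/accuracy on a held-out split, and a merged ranking of both signals.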
class FeatureCorrelationAndImportanceAnalysis(View):
    def post(self, request):
        df = spark.read.format("jdbc").option("url", "jdbc:mysql://localhost:3306/parkinson_db").option("dbtable", "voice_features").option("user", "root").option("password", "password").option("driver", "com.mysql.cj.jdbc.Driver").load()
        voice_features = ["MDVP_Fo_Hz", "MDVP_Fhi_Hz", "MDVP_Flo_Hz", "MDVP_Jitter_percent", "MDVP_Jitter_Abs", "MDVP_RAP", "MDVP_PPQ", "Jitter_DDP", "MDVP_Shimmer", "MDVP_Shimmer_dB", "Shimmer_APQ3", "Shimmer_APQ5", "MDVP_APQ", "NHR", "HNR", "RPDE", "DFA", "spread1", "spread2", "D2", "PPE"]
        status_correlations = {}
        for feature in voice_features:
            correlation_result = df.select(corr(col(feature), col("status"))).collect()[0][0]
            absolute_correlation = abs(correlation_result) if correlation_result is not None else 0
            status_correlations[feature] = {"correlation": correlation_result, "absolute_correlation": absolute_correlation}
        sorted_correlations = dict(sorted(status_correlations.items(), key=lambda x: x[1]["absolute_correlation"], reverse=True))
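        # Assemble the listed voice columns into one feature vector and train a Random Forest,
        # so featureImportances gives a model-based view of which indicators matter most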
        assembler = VectorAssembler(inputCols=voice_features, outputCol="features")
        assembled_data = assembler.transform(df)
        train_data, test_data = assembled_data.randomSplit([0.8, 0.2], seed=42)
        rf = RandomForestClassifier(labelCol="status", featuresCol="features", numTrees=100, maxDepth=10, seed=42)
        rf_model = rf.fit(train_data)
        feature_importance = rf_model.featureImportances.toArray()
        importance_dict = {}
        for i, feature in enumerate(voice_features):
            importance_dict[feature] = {"importance_score": float(feature_importance[i]), "rank": 0}
        sorted_importance = dict(sorted(importance_dict.items(), key=lambda x: x[1]["importance_score"], reverse=True))
        for rank, (feature, data) in enumerate(sorted_importance.items(), 1):
            sorted_importance[feature]["rank"] = rank
        predictions = rf_model.transform(test_data)
        evaluator = BinaryClassificationEvaluator(labelCol="status", rawPredictionCol="rawPrediction", metricName="areaUnderROC")
        auc_score = evaluator.evaluate(predictions)
        accuracy = predictions.filter(col("status") == col("prediction")).count() / predictions.count()
        jitter_features = ["MDVP_Jitter_percent", "MDVP_Jitter_Abs", "MDVP_RAP", "MDVP_PPQ", "Jitter_DDP"]
        jitter_correlations = {}
        for i, feature1 in enumerate(jitter_features):
            jitter_correlations[feature1] = {}
            for j, feature2 in enumerate(jitter_features):
                if i != j:
                    correlation = df.select(corr(col(feature1), col(feature2))).collect()[0][0]
                    jitter_correlations[feature1][feature2] = correlation
        shimmer_features = ["MDVP_Shimmer", "MDVP_Shimmer_dB", "Shimmer_APQ3", "Shimmer_APQ5", "MDVP_APQ"]
        shimmer_correlations = {}
        for i, feature1 in enumerate(shimmer_features):
            shimmer_correlations[feature1] = {}
            for j, feature2 in enumerate(shimmer_features):
                if i != j:
                    correlation = df.select(corr(col(feature1), col(feature2))).collect()[0][0]
                    shimmer_correlations[feature1][feature2] = correlation
        combined_analysis = {}
        for feature in voice_features:
            correlation_rank = list(sorted_correlations.keys()).index(feature) + 1
            importance_rank = sorted_importance[feature]["rank"]
            combined_score = (1/correlation_rank + 1/importance_rank) / 2
            combined_analysis[feature] = {"correlation_rank": correlation_rank, "importance_rank": importance_rank, "combined_score": combined_score, "correlation_value": sorted_correlations[feature]["correlation"], "importance_value": sorted_importance[feature]["importance_score"]}
        final_ranking = dict(sorted(combined_analysis.items(), key=lambda x: x[1]["combined_score"], reverse=True))
        result_data = {"status_correlations": sorted_correlations, "feature_importance": sorted_importance, "model_performance": {"auc_score": auc_score, "accuracy": accuracy}, "jitter_internal_correlations": jitter_correlations, "shimmer_internal_correlations": shimmer_correlations, "combined_feature_ranking": final_ranking}
        return JsonResponse({"status": "success", "data": result_data})
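Since the three analysis classes above are ordinary Django class-based views, they can be exposed to the Vue frontend through a urls.py along the following lines. This is only a sketch: the module path analysis.views and the route paths are assumptions, not the project's actual routing.

# urls.py (sketch) - the module name "analysis.views" and the route paths are assumptions
from django.urls import path
from analysis.views import (
    PatientGroupProfileAnalysis,
    VoiceFeatureDifferenceAnalysis,
    FeatureCorrelationAndImportanceAnalysis,
)

urlpatterns = [
    path("api/analysis/profile/", PatientGroupProfileAnalysis.as_view()),
    path("api/analysis/voice-difference/", VoiceFeatureDifferenceAnalysis.as_view()),
    path("api/analysis/feature-importance/", FeatureCorrelationAndImportanceAnalysis.as_view()),
]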

 

Parkinson's Disease Data Visualization Analysis System - Closing Remarks

One complete set of source code + 30 core feature points: a Spark-based Parkinson's disease data visualization analysis system, a solid choice for your graduation project

The most practical yet most easily overlooked graduation project choice: a complete Hadoop + Spark solution for a Parkinson's disease big data analysis system

Find big data technology too complex to use? This Parkinson's disease data analysis system helps you master the core applications of Hadoop + Spark with ease

Thanks for liking, bookmarking, dropping coins, and following! If you run into technical problems or want to get the source code, feel free to discuss in the comments!

 

⚡⚡ Source code homepage --> space.bilibili.com/35463818075…

⚡⚡ If you run into specific technical problems or have computer science graduation project needs, you can also reach me through my homepage~~