🎓 Author: 计算机毕设小月哥 | Software Development Expert
🖥️ Bio: 8 years of software development experience. Proficient in Java, Python, WeChat Mini Programs, Android, big data, PHP, .NET/C#, Golang, and other stacks.
🛠️ Professional Services 🛠️
Custom development to your requirements
Source code delivery with walkthroughs
Technical writing (guidance on thesis topic selection [novel + innovative], task statements, proposal reports, literature reviews, foreign-language translation, etc.)
Defense presentation (PPT) preparation
🌟 Welcome to like 👍, star ⭐, and comment 📝
👇🏻 Featured columns recommended below 👇🏻 subscribe and follow!
🍅 ↓↓ Contact via my profile page for the source code ↓↓ 🍅
Big Data-Based Parkinson's Disease Data Visualization Analysis System - Feature Overview
The Big Data-Based Parkinson's Disease Data Visualization Analysis System is a medical big-data platform dedicated to in-depth analysis of the voice characteristics of Parkinson's patients. Its core stack pairs Hadoop distributed storage with the Spark processing engine, and combines a Django backend with a Vue frontend into a complete analysis pipeline. The system works on a Parkinson's dataset containing 22 acoustic measurements and compares patients against healthy controls along four analysis dimensions: it first profiles the overall health of the dataset and the patient population, examining sample balance and descriptive statistics of key indicators; it then digs into the core voice-feature differences of Parkinson's disease, comparing pitch, jitter (frequency perturbation), shimmer (amplitude perturbation), and voice quality; next it mines feature associations and identifies key indicators, applying machine learning to surface the acoustic measures most strongly tied to the disease; finally it explores nonlinear dynamical features in depth, offering diagnostic insight from the perspective of system chaos and signal complexity. Pandas and NumPy handle data processing, and Echarts renders the multi-dimensional visualizations, giving medical researchers a complete, technically current tool for analyzing Parkinson's voice features.
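The analysis modules surface as plain Django views (shown in the code section below). As a minimal sketch of how they might be wired up, assuming a hypothetical analysis app holding the view functions (the module path and route names are assumptions, not the project's actual layout):

from django.urls import path
from analysis import views  # hypothetical module path

urlpatterns = [
    path("api/group-analysis/", views.comprehensive_group_analysis),
    path("api/voice-differences/", views.voice_feature_difference_analysis),
    path("api/feature-importance/", views.feature_importance_mining),
]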
Big Data-Based Parkinson's Disease Data Visualization Analysis System - Background and Significance
Background: Parkinson's disease is a common neurodegenerative disorder that chiefly impairs motor function and speech, so early diagnosis matters greatly for treatment and quality of life. Traditional diagnosis relies on a clinician's experience and observation of motor symptoms, an approach that is subjective and often fails to identify the disease in its early stages. As phonetic research has advanced, investigators have found that the voices of Parkinson's patients change measurably: pitch becomes unstable, jitter (frequency perturbation) increases, and amplitude fluctuates abnormally. These acoustic indicators open a new path toward objective diagnosis. Meanwhile, the rapid growth of big-data technology has created fresh opportunities for mining medical data: distributed frameworks such as Hadoop and Spark can process large-scale medical datasets efficiently, and machine learning makes it practical to extract valuable information from complex data, laying the technical groundwork for intelligent diagnostic support for Parkinson's disease.
Significance: The significance of this project lies in both technical practice and application value. Technically, the system combines big-data processing with medical data analysis, giving computer-science students a solid interdisciplinary practice platform: it teaches how mainstream tools such as Hadoop and Spark are applied in a real project while building data-analysis and visualization skills. On the application side, as a graduation project its research depth and scope are necessarily limited, but it supplies a basic technical framework and analytical approach for research into computer-assisted Parkinson's diagnosis; the multi-dimensional analysis and visualization of voice features help researchers see directly how patients differ from healthy controls in speech. The development process also lays a technical foundation for deeper medical big-data research: although the system is currently intended for learning and demonstration, its architecture and methods have reference value and may offer useful inspiration for related work.
Big Data-Based Parkinson's Disease Data Visualization Analysis System - Technology Stack
Big data framework: Hadoop + Spark (Hive is not used in this build; customization is supported)
Languages: Python + Java (both versions available)
Backend frameworks: Django + Spring Boot (Spring + SpringMVC + MyBatis) (both versions available)
Frontend: Vue + ElementUI + Echarts + HTML + CSS + JavaScript + jQuery
Key technologies: Hadoop, HDFS, Spark, Spark SQL, Pandas, NumPy
Database: MySQL
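On the frontend, Vue feeds each chart an Echarts option object, so the Django side only needs to return JSON in a shape Echarts accepts. A minimal sketch of that handoff (the endpoint name and values here are illustrative assumptions, not the system's actual API):

from django.http import JsonResponse

def pitch_comparison_chart(request):
    # Illustrative Echarts bar-chart option comparing two group means; data values are placeholders.
    option = {
        "xAxis": {"type": "category", "data": ["patient", "healthy"]},
        "yAxis": {"type": "value"},
        "series": [{"type": "bar", "data": [1.23, 0.98]}],
    }
    return JsonResponse(option)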
Big Data-Based Parkinson's Disease Data Visualization Analysis System - Video Demo
Big Data-Based Parkinson's Disease Data Visualization Analysis System - Screenshots
Big Data-Based Parkinson's Disease Data Visualization Analysis System - Code Walkthrough
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, avg, stddev, corr
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import BinaryClassificationEvaluator
import pandas as pd
import numpy as np
from django.http import JsonResponse
# Shared Spark session; adaptive query execution is enabled for better shuffle behavior.
spark = (SparkSession.builder
         .appName("ParkinsonAnalysis")
         .config("spark.sql.adaptive.enabled", "true")
         .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
         .getOrCreate())
def comprehensive_group_analysis(request):
    # Load the Parkinson's voice dataset (22 acoustic features plus a 0/1 status label) from HDFS.
    parkinson_df = spark.read.option("header", "true").option("inferSchema", "true").csv("hdfs://localhost:9000/parkinson_data/parkinsons.csv")
    # Sample balance: patients are status == 1, healthy controls are status == 0.
    total_samples = parkinson_df.count()
    patient_count = parkinson_df.filter(col("status") == 1).count()
    healthy_count = parkinson_df.filter(col("status") == 0).count()
    balance_ratio = min(patient_count, healthy_count) / max(patient_count, healthy_count)
    voice_features = ["MDVP:Fo(Hz)", "MDVP:Fhi(Hz)", "MDVP:Flo(Hz)", "MDVP:Jitter(%)", "MDVP:Jitter(Abs)", "MDVP:RAP", "MDVP:PPQ", "Jitter:DDP", "MDVP:Shimmer", "MDVP:Shimmer(dB)", "Shimmer:APQ3", "Shimmer:APQ5", "MDVP:APQ", "Shimmer:DDA", "NHR", "HNR", "RPDE", "DFA", "spread1", "spread2", "D2", "PPE"]
    # Descriptive statistics (mean and standard deviation) for the full sample and each group.
    overall_stats = parkinson_df.select([avg(col(feature)).alias(f"{feature}_mean") for feature in voice_features] + [stddev(col(feature)).alias(f"{feature}_std") for feature in voice_features]).collect()[0].asDict()
    patient_stats = parkinson_df.filter(col("status") == 1).select([avg(col(feature)).alias(f"{feature}_patient_mean") for feature in voice_features] + [stddev(col(feature)).alias(f"{feature}_patient_std") for feature in voice_features]).collect()[0].asDict()
    healthy_stats = parkinson_df.filter(col("status") == 0).select([avg(col(feature)).alias(f"{feature}_healthy_mean") for feature in voice_features] + [stddev(col(feature)).alias(f"{feature}_healthy_std") for feature in voice_features]).collect()[0].asDict()
    # Within-group dispersion: a ratio above 1 means patients vary more on that feature.
    variance_comparison = {}
    for feature in voice_features:
        patient_std = parkinson_df.filter(col("status") == 1).select(stddev(col(feature)).alias("std")).collect()[0]["std"]
        healthy_std = parkinson_df.filter(col("status") == 0).select(stddev(col(feature)).alias("std")).collect()[0]["std"]
        variance_ratio = patient_std / healthy_std if healthy_std and healthy_std > 0 else 0
        variance_comparison[feature] = {"patient_std": patient_std, "healthy_std": healthy_std, "variance_ratio": variance_ratio}
    # Relative mean difference (%) on a handful of clinically salient indicators.
    key_indicators_analysis = {}
    for feature in ["spread1", "PPE", "MDVP:Fo(Hz)", "NHR", "HNR"]:
        patient_mean = parkinson_df.filter(col("status") == 1).select(avg(col(feature))).collect()[0][0]
        healthy_mean = parkinson_df.filter(col("status") == 0).select(avg(col(feature))).collect()[0][0]
        # abs() on the denominator matters: spread1 has a negative mean in this dataset.
        difference_percentage = abs(patient_mean - healthy_mean) / abs(healthy_mean) * 100 if healthy_mean else 0
        key_indicators_analysis[feature] = {"patient_mean": patient_mean, "healthy_mean": healthy_mean, "difference_percentage": difference_percentage}
    result_data = {"sample_balance": {"total_samples": total_samples, "patient_count": patient_count, "healthy_count": healthy_count, "balance_ratio": balance_ratio}, "overall_statistics": overall_stats, "patient_profile": patient_stats, "healthy_profile": healthy_stats, "variance_comparison": variance_comparison, "key_indicators": key_indicators_analysis}
    return JsonResponse(result_data)
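# Once the view above is routed (see the urls.py sketch earlier), it can be smoke-tested
# against a running Django dev server. A hedged example; the route path is an assumption:
#
#     import requests
#     resp = requests.get("http://127.0.0.1:8000/api/group-analysis/")
#     print(resp.json()["sample_balance"])  # counts and balance_ratio from the view above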
def voice_feature_difference_analysis(request):
    parkinson_df = spark.read.option("header", "true").option("inferSchema", "true").csv("hdfs://localhost:9000/parkinson_data/parkinsons.csv")
    # Pitch features: compare group means/stds and report a simple standardized effect size.
    pitch_features = ["MDVP:Fo(Hz)", "MDVP:Fhi(Hz)", "MDVP:Flo(Hz)"]
    pitch_analysis = {}
    for feature in pitch_features:
        patient_stats = parkinson_df.filter(col("status") == 1).agg(avg(col(feature)).alias("mean"), stddev(col(feature)).alias("std")).collect()[0]
        healthy_stats = parkinson_df.filter(col("status") == 0).agg(avg(col(feature)).alias("mean"), stddev(col(feature)).alias("std")).collect()[0]
        # Mean difference scaled by the average of the two group standard deviations.
        effect_size = abs(patient_stats["mean"] - healthy_stats["mean"]) / ((patient_stats["std"] + healthy_stats["std"]) / 2)
        pitch_analysis[feature] = {"patient_mean": patient_stats["mean"], "patient_std": patient_stats["std"], "healthy_mean": healthy_stats["mean"], "healthy_std": healthy_stats["std"], "effect_size": effect_size}
    # Jitter (frequency perturbation): patient/healthy mean ratios plus pairwise correlations.
    jitter_features = ["MDVP:Jitter(%)", "MDVP:Jitter(Abs)", "MDVP:RAP", "MDVP:PPQ", "Jitter:DDP"]
    jitter_analysis = {}
    jitter_correlations = []
    for i, feature1 in enumerate(jitter_features):
        patient_mean = parkinson_df.filter(col("status") == 1).select(avg(col(feature1))).collect()[0][0]
        healthy_mean = parkinson_df.filter(col("status") == 0).select(avg(col(feature1))).collect()[0][0]
        instability_ratio = patient_mean / healthy_mean if healthy_mean > 0 else 0
        jitter_analysis[feature1] = {"patient_mean": patient_mean, "healthy_mean": healthy_mean, "instability_ratio": instability_ratio}
        for feature2 in jitter_features[i + 1:]:
            correlation_coef = parkinson_df.select(corr(col(feature1), col(feature2)).alias("correlation")).collect()[0]["correlation"]
            jitter_correlations.append({"feature1": feature1, "feature2": feature2, "correlation": correlation_coef})
    # Shimmer (amplitude perturbation): same treatment as jitter.
    shimmer_features = ["MDVP:Shimmer", "MDVP:Shimmer(dB)", "Shimmer:APQ3", "Shimmer:APQ5", "MDVP:APQ"]
    shimmer_analysis = {}
    shimmer_correlations = []
    for i, feature1 in enumerate(shimmer_features):
        patient_mean = parkinson_df.filter(col("status") == 1).select(avg(col(feature1))).collect()[0][0]
        healthy_mean = parkinson_df.filter(col("status") == 0).select(avg(col(feature1))).collect()[0][0]
        amplitude_instability = patient_mean / healthy_mean if healthy_mean > 0 else 0
        shimmer_analysis[feature1] = {"patient_mean": patient_mean, "healthy_mean": healthy_mean, "amplitude_instability": amplitude_instability}
        for feature2 in shimmer_features[i + 1:]:
            correlation_coef = parkinson_df.select(corr(col(feature1), col(feature2)).alias("correlation")).collect()[0]["correlation"]
            shimmer_correlations.append({"feature1": feature1, "feature2": feature2, "correlation": correlation_coef})
    # Voice quality (noise-to-harmonics and harmonics-to-noise): medians and quartiles via NumPy.
    voice_quality_features = ["NHR", "HNR"]
    voice_quality_analysis = {}
    for feature in voice_quality_features:
        patient_values = parkinson_df.filter(col("status") == 1).select(col(feature)).rdd.map(lambda x: x[0]).collect()
        healthy_values = parkinson_df.filter(col("status") == 0).select(col(feature)).rdd.map(lambda x: x[0]).collect()
        patient_median = np.median(patient_values)
        healthy_median = np.median(healthy_values)
        patient_q75 = np.percentile(patient_values, 75)
        healthy_q25 = np.percentile(healthy_values, 25)
        voice_quality_analysis[feature] = {"patient_median": patient_median, "healthy_median": healthy_median, "patient_q75": patient_q75, "healthy_q25": healthy_q25, "quality_degradation": patient_median / healthy_median if healthy_median > 0 else 0}
    result_data = {"pitch_analysis": pitch_analysis, "jitter_analysis": jitter_analysis, "jitter_correlations": jitter_correlations, "shimmer_analysis": shimmer_analysis, "shimmer_correlations": shimmer_correlations, "voice_quality": voice_quality_analysis}
    return JsonResponse(result_data)
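# The effect size above scales the mean gap by the simple average of the two group
# standard deviations. A more conventional alternative (not what the view above computes)
# is Cohen's d with a pooled standard deviation; a small NumPy sketch:
def cohens_d(patient_values, healthy_values):
    # Cohen's d with pooled sample standard deviation (ddof=1).
    p = np.asarray(patient_values, dtype=float)
    h = np.asarray(healthy_values, dtype=float)
    n1, n2 = len(p), len(h)
    pooled_std = np.sqrt(((n1 - 1) * p.std(ddof=1) ** 2 + (n2 - 1) * h.std(ddof=1) ** 2) / (n1 + n2 - 2))
    return abs(p.mean() - h.mean()) / pooled_std if pooled_std > 0 else 0.0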
def feature_importance_mining(request):
    parkinson_df = spark.read.option("header", "true").option("inferSchema", "true").csv("hdfs://localhost:9000/parkinson_data/parkinsons.csv")
    voice_features = ["MDVP:Fo(Hz)", "MDVP:Fhi(Hz)", "MDVP:Flo(Hz)", "MDVP:Jitter(%)", "MDVP:Jitter(Abs)", "MDVP:RAP", "MDVP:PPQ", "Jitter:DDP", "MDVP:Shimmer", "MDVP:Shimmer(dB)", "Shimmer:APQ3", "Shimmer:APQ5", "MDVP:APQ", "Shimmer:DDA", "NHR", "HNR", "RPDE", "DFA", "spread1", "spread2", "D2", "PPE"]
    # Pearson correlation of each feature with the status label, ranked by absolute value.
    correlation_analysis = {}
    for feature in voice_features:
        correlation_coef = parkinson_df.select(corr(col(feature), col("status")).alias("correlation")).collect()[0]["correlation"]
        correlation_analysis[feature] = abs(correlation_coef) if correlation_coef is not None else 0
    sorted_correlations = sorted(correlation_analysis.items(), key=lambda x: x[1], reverse=True)
    # Random-forest feature importances on an 80/20 train/test split.
    assembler = VectorAssembler(inputCols=voice_features, outputCol="features")
    feature_vector_df = assembler.transform(parkinson_df)
    train_data, test_data = feature_vector_df.randomSplit([0.8, 0.2], seed=42)
    rf_classifier = RandomForestClassifier(featuresCol="features", labelCol="status", numTrees=100, maxDepth=10, seed=42)
    rf_model = rf_classifier.fit(train_data)
    feature_importances = rf_model.featureImportances.toArray()
    feature_importance_dict = {voice_features[i]: float(feature_importances[i]) for i in range(len(voice_features))}
    sorted_importance = sorted(feature_importance_dict.items(), key=lambda x: x[1], reverse=True)
    # Evaluate with area under the ROC curve (this is AUC, not classification accuracy).
    predictions = rf_model.transform(test_data)
    evaluator = BinaryClassificationEvaluator(labelCol="status", rawPredictionCol="rawPrediction", metricName="areaUnderROC")
    model_auc = evaluator.evaluate(predictions)
    # Nonlinear dynamics features: compare coefficients of variation between the two groups.
    nonlinear_features = ["RPDE", "DFA", "D2", "PPE", "spread1", "spread2"]
    nonlinear_analysis = {}
    for feature in nonlinear_features:
        patient_values = parkinson_df.filter(col("status") == 1).select(col(feature)).rdd.map(lambda x: x[0]).collect()
        healthy_values = parkinson_df.filter(col("status") == 0).select(col(feature)).rdd.map(lambda x: x[0]).collect()
        # abs() on the mean keeps the ratio meaningful for spread1/spread2, whose means are negative.
        patient_complexity = np.std(patient_values) / abs(np.mean(patient_values)) if np.mean(patient_values) != 0 else 0
        healthy_complexity = np.std(healthy_values) / abs(np.mean(healthy_values)) if np.mean(healthy_values) != 0 else 0
        complexity_ratio = patient_complexity / healthy_complexity if healthy_complexity > 0 else 0
        nonlinear_analysis[feature] = {"patient_complexity": patient_complexity, "healthy_complexity": healthy_complexity, "complexity_ratio": complexity_ratio}
    # Cross-domain correlations between the basic pitch features and the nonlinear features.
    cross_domain_correlations = {}
    linear_features = ["MDVP:Fo(Hz)", "MDVP:Fhi(Hz)", "MDVP:Flo(Hz)"]
    for linear_feature in linear_features:
        cross_correlations = {}
        for nonlinear_feature in nonlinear_features:
            correlation_coef = parkinson_df.select(corr(col(linear_feature), col(nonlinear_feature)).alias("correlation")).collect()[0]["correlation"]
            cross_correlations[nonlinear_feature] = correlation_coef if correlation_coef is not None else 0
        cross_domain_correlations[linear_feature] = cross_correlations
    # Pair up the top-10 features and score each pair by combined importance and correlation.
    top_10_features = sorted_importance[:10]
    feature_combination_analysis = {}
    for i in range(0, len(top_10_features), 2):
        if i + 1 < len(top_10_features):
            feature1, importance1 = top_10_features[i]
            feature2, importance2 = top_10_features[i + 1]
            combined_correlation = parkinson_df.select(corr(col(feature1), col(feature2)).alias("correlation")).collect()[0]["correlation"]
            synergy_score = (importance1 + importance2) * (1 + abs(combined_correlation)) if combined_correlation is not None else importance1 + importance2
            feature_combination_analysis[f"{feature1}+{feature2}"] = {"individual_importance": [importance1, importance2], "correlation": combined_correlation, "synergy_score": synergy_score}
    result_data = {"correlation_ranking": sorted_correlations, "feature_importance_ranking": sorted_importance, "model_performance": {"auc": model_auc, "top_features_count": len([x for x in sorted_importance if x[1] > 0.05])}, "nonlinear_complexity": nonlinear_analysis, "cross_domain_correlations": cross_domain_correlations, "feature_combinations": feature_combination_analysis}
    return JsonResponse(result_data)
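# Every view above re-reads the CSV from HDFS on each request. Since the dataset is small
# and static, caching the DataFrame once per worker process is a cheap optimization; a
# hedged sketch (not part of the original views):
_parkinson_df = None

def get_parkinson_df():
    # Lazily load and cache the dataset, reusing it across requests in this process.
    global _parkinson_df
    if _parkinson_df is None:
        _parkinson_df = (spark.read.option("header", "true")
                         .option("inferSchema", "true")
                         .csv("hdfs://localhost:9000/parkinson_data/parkinsons.csv")
                         .cache())
    return _parkinson_df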
Big Data-Based Parkinson's Disease Data Visualization Analysis System - Conclusion
🌟 Welcome to like 👍, star ⭐, and comment 📝
👇🏻 Featured columns recommended below 👇🏻 subscribe and follow!
🍅 ↓↓ Contact via my profile page for the source code ↓↓ 🍅