Python Big Data Analysis Graduation Project: Full Tech-Stack Implementation of a Pregnancy Data Visualization System


Preface

💖💖Author: 计算机程序员小杨 💙💙About me: I work in the computer field and am skilled in Java, WeChat Mini Programs, Python, Golang, Android, and several other IT areas. I take on custom project development, code walkthroughs, thesis-defense coaching, and documentation writing, and I also know a few techniques for reducing plagiarism-check similarity. I love technology, enjoy digging into new tools and frameworks, and like solving real problems with code, so feel free to ask me about anything code-related! 💛💛A word of thanks: thank you all for your attention and support! 💕💕Contact 计算机程序员小杨 at the end of this post to get the source code 💜💜 Website projects · Android/Mini Program projects · Big data projects · Deep learning projects · Graduation project topic selection 💜💜

I. Development Tools

Big data framework: Hadoop + Spark (Hive is not used in this build; customization is supported)
Development language: Python + Java (both versions are supported)
Back-end framework: Django + Spring Boot (Spring + Spring MVC + MyBatis) (both versions are supported)
Front end: Vue + ElementUI + Echarts + HTML + CSS + JavaScript + jQuery
Key technologies: Hadoop, HDFS, Spark, Spark SQL, Pandas, NumPy
Database: MySQL
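
To make the stack concrete, here is a minimal sketch of how the three Spark analysis views shown in Section V could be wired to the Vue + ElementUI + Echarts front end through Django URL routing. The app name `analysis` and the URL paths are illustrative assumptions, not part of the original project.

# urls.py -- illustrative sketch; the app name and paths are assumptions
from django.urls import path

from analysis import views  # hypothetical app module holding the analysis views from Section V

urlpatterns = [
    # Each endpoint returns JSON that the Vue + Echarts front end renders as charts
    path("api/analysis/weight-correlation/", views.pregnancy_weight_correlation_analysis),
    path("api/analysis/health-habits/", views.health_habits_impact_analysis),
    path("api/analysis/multi-factor/", views.multi_factor_comprehensive_analysis),
]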

II. System Overview

This system is a pregnancy health data analysis and visualization platform built on a big data stack, with Hadoop + Spark as the core processing framework, combined with the Python data-science ecosystem and the Django web framework. By mining newborn birth-weight data and related pregnancy records, the system provides multi-dimensional analysis features such as pregnancy weight-gain correlation analysis, assessment of the impact of health habits, and studies of maternal characteristics. At the data-processing layer, HDFS provides distributed storage, Spark SQL executes large-scale queries, and Pandas and NumPy handle fine-grained data processing, so the system can efficiently process large volumes of pregnancy health data. The front end is built with Vue + ElementUI for a modern user interface and integrates Echarts for rich data visualization, including distribution charts of core indicators and multi-factor comprehensive analysis reports. Beyond complete user-management and data-management features, the system uses big data analysis to uncover potential relationships between pregnancy-related factors and newborn birth weight, providing data support and decision-making references for the healthcare field and demonstrating the practical value of big data technology in a real business scenario.
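
As a rough illustration of the processing pipeline described above, the sketch below reads a pregnancy dataset from HDFS, runs an aggregation with Spark SQL, and hands the already-small result to Pandas for the visualization layer. The HDFS path and column names are assumptions chosen for the example, not the project's actual schema.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("PregnancyPipelineSketch").getOrCreate()

# Illustrative HDFS location and schema; adjust to the real cluster layout
raw_df = spark.read.csv("hdfs://namenode:9000/pregnancy/records.csv", header=True, inferSchema=True)
raw_df.createOrReplaceTempView("pregnancy_records")

# Spark SQL performs the heavy aggregation on the cluster
weekly_stats = spark.sql("""
    SELECT pregnancy_weeks,
           AVG(birth_weight) AS avg_birth_weight,
           COUNT(*) AS sample_count
    FROM pregnancy_records
    GROUP BY pregnancy_weeks
    ORDER BY pregnancy_weeks
""")

# Only the aggregated result is pulled into Pandas for chart-ready post-processing
chart_df = weekly_stats.toPandas()

Because only aggregated results cross the Spark-to-Pandas boundary, driver memory stays small even when the raw dataset is large.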

III. System Function Demo

[Demo video: Python big data analysis graduation project, full tech-stack implementation of the pregnancy data visualization system]

IV. System Interface Showcase

[Screenshots of the system interface]

V. System Source Code


from pyspark.sql import SparkSession
from pyspark.sql.functions import col, avg, count, when
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression
from pyspark.ml.evaluation import RegressionEvaluator
import pandas as pd
import numpy as np
from django.http import JsonResponse

spark = SparkSession.builder \
    .appName("PregnancyDataAnalysis") \
    .config("spark.sql.adaptive.enabled", "true") \
    .config("spark.sql.adaptive.coalescePartitions.enabled", "true") \
    .getOrCreate()

def pregnancy_weight_correlation_analysis(request):
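    """Bucket samples by pregnancy weight gain, compute per-bucket birth-weight statistics, and return a correlation matrix."""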
    try:
        # Load pregnancy records from MySQL through the JDBC connector
        df = spark.read.format("jdbc") \
            .option("url", "jdbc:mysql://localhost:3306/pregnancy_db") \
            .option("driver", "com.mysql.cj.jdbc.Driver") \
            .option("dbtable", "pregnancy_data") \
            .option("user", "root") \
            .option("password", "password") \
            .load()
        df_clean = df.filter(col("birth_weight").isNotNull() & col("pregnancy_weight_gain").isNotNull() & col("pre_pregnancy_weight").isNotNull())
        correlation_df = df_clean.select("pregnancy_weight_gain", "birth_weight", "pre_pregnancy_weight", "pregnancy_weeks")
        weight_ranges = correlation_df.withColumn("weight_gain_category", when(col("pregnancy_weight_gain") < 7, "低增重").when(col("pregnancy_weight_gain") < 16, "正常增重").otherwise("高增重"))
        category_stats = weight_ranges.groupBy("weight_gain_category").agg(avg("birth_weight").alias("avg_birth_weight"), count("birth_weight").alias("sample_count"), avg("pregnancy_weeks").alias("avg_weeks"))
        result_pandas = category_stats.toPandas()
        correlation_matrix = correlation_df.toPandas().corr()
        correlation_data = correlation_matrix.to_dict()
        analysis_result = {"category_analysis": result_pandas.to_dict('records'), "correlation_matrix": correlation_data, "total_samples": df_clean.count()}
        return JsonResponse({"status": "success", "data": analysis_result})
    except Exception as e:
        return JsonResponse({"status": "error", "message": str(e)})

def health_habits_impact_analysis(request):
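    """Compare average birth weight across smoking, exercise, nutrition and sleep groups, and contrast a high-risk subgroup with a normal group."""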
    try:
        # Load health-habit records and birth records, both from MySQL via JDBC
        habits_df = spark.read.format("jdbc") \
            .option("url", "jdbc:mysql://localhost:3306/pregnancy_db") \
            .option("driver", "com.mysql.cj.jdbc.Driver") \
            .option("dbtable", "health_habits_data") \
            .option("user", "root") \
            .option("password", "password") \
            .load()
        birth_df = spark.read.format("jdbc") \
            .option("url", "jdbc:mysql://localhost:3306/pregnancy_db") \
            .option("driver", "com.mysql.cj.jdbc.Driver") \
            .option("dbtable", "pregnancy_data") \
            .option("user", "root") \
            .option("password", "password") \
            .load()
        merged_df = habits_df.join(birth_df, "patient_id", "inner")
        smoking_analysis = merged_df.groupBy("smoking_status").agg(avg("birth_weight").alias("avg_weight"), count("birth_weight").alias("count"), avg("pregnancy_weeks").alias("avg_weeks"))
        exercise_analysis = merged_df.groupBy("exercise_frequency").agg(avg("birth_weight").alias("avg_weight"), count("birth_weight").alias("count"))
        nutrition_analysis = merged_df.groupBy("nutrition_score_range").agg(avg("birth_weight").alias("avg_weight"), count("birth_weight").alias("count"))
        sleep_analysis = merged_df.groupBy("sleep_quality").agg(avg("birth_weight").alias("avg_weight"), count("birth_weight").alias("count"))
        risk_factors = merged_df.filter((col("smoking_status") == "吸烟") | (col("alcohol_consumption") == "经常饮酒") | (col("sleep_quality") == "差"))
        high_risk_stats = risk_factors.agg(avg("birth_weight").alias("high_risk_avg_weight"), count("birth_weight").alias("high_risk_count"))
        normal_group = merged_df.filter((col("smoking_status") == "不吸烟") & (col("alcohol_consumption") == "不饮酒") & (col("sleep_quality") != "差"))
        normal_stats = normal_group.agg(avg("birth_weight").alias("normal_avg_weight"), count("birth_weight").alias("normal_count"))
        analysis_result = {"smoking_impact": smoking_analysis.toPandas().to_dict('records'), "exercise_impact": exercise_analysis.toPandas().to_dict('records'), "nutrition_impact": nutrition_analysis.toPandas().to_dict('records'), "sleep_impact": sleep_analysis.toPandas().to_dict('records'), "risk_comparison": {"high_risk": high_risk_stats.collect()[0].asDict(), "normal_group": normal_stats.collect()[0].asDict()}}
        return JsonResponse({"status": "success", "data": analysis_result})
    except Exception as e:
        return JsonResponse({"status": "error", "message": str(e)})

def multi_factor_comprehensive_analysis(request):
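    """Fit a linear regression of birth weight on maternal features, then summarize birth weight by age group, BMI category and weight range."""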
    try:
        # Load the pre-joined analysis view from MySQL via JDBC
        comprehensive_df = spark.read.format("jdbc") \
            .option("url", "jdbc:mysql://localhost:3306/pregnancy_db") \
            .option("driver", "com.mysql.cj.jdbc.Driver") \
            .option("dbtable", "comprehensive_analysis_view") \
            .option("user", "root") \
            .option("password", "password") \
            .load()
        feature_columns = ["maternal_age", "pre_pregnancy_bmi", "pregnancy_weight_gain", "exercise_score", "nutrition_score", "sleep_score"]
        assembler = VectorAssembler(inputCols=feature_columns, outputCol="features")
        feature_df = assembler.transform(comprehensive_df)
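        # Hold out 20% of the rows for evaluation; the fixed seed keeps the split reproducible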
        train_data, test_data = feature_df.randomSplit([0.8, 0.2], seed=42)
        lr = LinearRegression(featuresCol="features", labelCol="birth_weight")
        model = lr.fit(train_data)
        predictions = model.transform(test_data)
        evaluator = RegressionEvaluator(labelCol="birth_weight", predictionCol="prediction", metricName="rmse")
        rmse = evaluator.evaluate(predictions)
        # Convert Spark's coefficient vector to plain Python floats so the result is JSON-serializable
        feature_importance = dict(zip(feature_columns, [float(c) for c in model.coefficients.toArray()]))
        age_groups = comprehensive_df.withColumn("age_group", when(col("maternal_age") < 25, "年轻组").when(col("maternal_age") < 35, "适龄组").otherwise("高龄组"))
        age_analysis = age_groups.groupBy("age_group").agg(avg("birth_weight").alias("avg_weight"), count("birth_weight").alias("count"), avg("pregnancy_complications").alias("complication_rate"))
        bmi_groups = comprehensive_df.withColumn("bmi_category", when(col("pre_pregnancy_bmi") < 18.5, "偏瘦").when(col("pre_pregnancy_bmi") < 24, "正常").when(col("pre_pregnancy_bmi") < 28, "超重").otherwise("肥胖"))
        bmi_analysis = bmi_groups.groupBy("bmi_category").agg(avg("birth_weight").alias("avg_weight"), count("birth_weight").alias("count"))
        high_birth_weight = comprehensive_df.filter(col("birth_weight") > 4000).count()
        low_birth_weight = comprehensive_df.filter(col("birth_weight") < 2500).count()
        total_records = comprehensive_df.count()
        analysis_result = {"model_performance": {"rmse": rmse, "feature_importance": feature_importance}, "age_group_analysis": age_analysis.toPandas().to_dict('records'), "bmi_analysis": bmi_analysis.toPandas().to_dict('records'), "birth_weight_distribution": {"high_weight_rate": high_birth_weight / total_records, "low_weight_rate": low_birth_weight / total_records, "normal_rate": (total_records - high_birth_weight - low_birth_weight) / total_records}}
        return JsonResponse({"status": "success", "data": analysis_result})
    except Exception as e:
        return JsonResponse({"status": "error", "message": str(e)})



VI. System Documentation

[Screenshot of the project documentation]

Closing

💛💛A word of thanks: thank you all for your attention and support! 💕💕Contact 计算机程序员小杨 at the end of this post to get the source code 💜💜 Website projects · Android/Mini Program projects · Big data projects · Deep learning projects · Graduation project topic selection 💜💜