[Data Analysis] A Big-Data-Based Health Insurance Data Visualization and Analysis System | Hands-On Big Data Capstone Project · Thesis Topic Recommendation · Visualization Dashboard · Hadoop · Spark · Java


💖💖Author: 计算机毕业设计杰瑞 💙💙About me: I have long taught computer science courses and genuinely enjoy teaching. I work mainly in Java, WeChat mini programs, Python, Golang, and Android, and my projects span big data, deep learning, websites, mini programs, Android apps, and algorithms. I regularly take on custom project development, code walkthroughs, thesis-defense coaching, and documentation writing, and I know a few tricks for lowering plagiarism-check scores. I enjoy sharing fixes for problems I run into during development and talking shop, so feel free to ask me anything about code and technology! 💛💛A quick note: thank you all for your attention and support! 💜💜 Website projects · Android/mini-program projects · Big data projects · Deep learning projects · Thesis topic recommendations

Introduction to the Big-Data-Based Health Insurance Data Visualization and Analysis System

The big-data-based health insurance data visualization and analysis system is a comprehensive platform aimed at the data-analysis needs of the insurance industry, built on the Hadoop distributed storage architecture with Spark as the core compute engine. The backend ships in two implementations, Python Django and Java SpringBoot, while the frontend combines the Vue framework with the ElementUI component library and the ECharts charting library to build the interactive interface. At the storage layer, the system keeps its large volume of health insurance data on the HDFS distributed file system, runs efficient queries and preprocessing through Spark SQL, and performs deeper analysis with Pandas and NumPy.

The functional modules cover the core business scenarios: user management, health insurance data management, dashboard visualization, comprehensive clustering analysis, medical cost association analysis, policyholder profiling, and premium feature analysis. The system uses a clustering algorithm to identify the characteristics of different policyholder groups, applies association analysis to uncover latent patterns linking medical costs and insurance claims, and presents the results on a multi-dimensional visualization dashboard, providing data support for insurance product design and risk assessment. The overall architecture implements a complete analytics pipeline from data collection through storage and computation to visualization.
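To make the storage-and-compute chain concrete, here is a minimal sketch of the Spark SQL preprocessing step described above. It reuses the HDFS path and column names that appear in the project code further down; the filter thresholds and the derived claim_ratio column are illustrative assumptions, not part of the original system.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("InsurancePreprocess").getOrCreate()

# Register the raw HDFS data as a temporary view so it can be queried with SQL.
raw_df = spark.read.parquet("hdfs://localhost:9000/insurance/raw_data")
raw_df.createOrReplaceTempView("insurance")

# Deduplicate, drop implausible rows, and derive a simple claim ratio
# (thresholds and the claim_ratio column are assumptions for illustration).
cleaned_df = spark.sql("""
    SELECT DISTINCT age, bmi, medical_cost, premium, claim_amount,
           claim_amount / premium AS claim_ratio
    FROM insurance
    WHERE age BETWEEN 18 AND 100 AND premium > 0
""")
cleaned_df.write.mode("overwrite").parquet("hdfs://localhost:9000/insurance/cleaned_data")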

Demo Video of the Big-Data-Based Health Insurance Data Visualization and Analysis System

Demo video

Demo Screenshots of the Big-Data-Based Health Insurance Data Visualization and Analysis System

[System interface screenshots]

Code Showcase for the Big-Data-Based Health Insurance Data Visualization and Analysis System

from pyspark.sql import SparkSession
# Note: sum and round below are Spark column functions; they shadow the Python builtins.
from pyspark.sql.functions import col, avg, sum, count, when, round, desc
from pyspark.ml.feature import VectorAssembler, StandardScaler
from pyspark.ml.clustering import KMeans
from django.http import JsonResponse
from django.views import View
import pandas as pd
import numpy as np

# Shared SparkSession, created once at import time and reused by all views.
spark = (
    SparkSession.builder.appName("HealthInsuranceAnalysis")
    .config("spark.sql.warehouse.dir", "hdfs://localhost:9000/user/hive/warehouse")
    .config("spark.executor.memory", "2g")
    .config("spark.driver.memory", "1g")
    .getOrCreate()
)

class ComprehensiveClusteringAnalysisView(View):
    def get(self, request):
        # Pull the raw table from MySQL and persist it to HDFS as Parquet.
        df = (
            spark.read.format("jdbc")
            .option("url", "jdbc:mysql://localhost:3306/health_insurance")
            .option("driver", "com.mysql.cj.jdbc.Driver")
            .option("dbtable", "health_insurance_data")
            .option("user", "root")
            .option("password", "root")
            .load()
        )
        df.write.mode("overwrite").parquet("hdfs://localhost:9000/insurance/raw_data")
        insurance_df = spark.read.parquet("hdfs://localhost:9000/insurance/raw_data")
        # Cast the five numeric features to double and zero-fill missing values.
        feature_df = insurance_df.select(
            col("age").cast("double"), col("bmi").cast("double"),
            col("medical_cost").cast("double"), col("premium").cast("double"),
            col("claim_amount").cast("double"),
        ).na.fill({"age": 0, "bmi": 0, "medical_cost": 0, "premium": 0, "claim_amount": 0})
        # Assemble and standardize the feature vector, then cluster with K-means (k=4).
        assembler = VectorAssembler(inputCols=["age", "bmi", "medical_cost", "premium", "claim_amount"], outputCol="features")
        assembled_df = assembler.transform(feature_df)
        scaler = StandardScaler(inputCol="features", outputCol="scaled_features", withStd=True, withMean=True)
        scaled_df = scaler.fit(assembled_df).transform(assembled_df)
        kmeans = KMeans(k=4, seed=42, featuresCol="scaled_features", predictionCol="cluster")
        clustered_df = kmeans.fit(scaled_df).transform(scaled_df)
        # Per-cluster record counts and feature averages for the dashboard.
        cluster_stats = clustered_df.groupBy("cluster").agg(
            count("*").alias("count"),
            round(avg("age"), 2).alias("avg_age"),
            round(avg("bmi"), 2).alias("avg_bmi"),
            round(avg("medical_cost"), 2).alias("avg_medical_cost"),
            round(avg("premium"), 2).alias("avg_premium"),
            round(avg("claim_amount"), 2).alias("avg_claim_amount"),
        )
        result_list = []
        for row in cluster_stats.orderBy("cluster").collect():
            result_list.append({
                "cluster_id": int(row["cluster"]), "member_count": int(row["count"]),
                "average_age": float(row["avg_age"]), "average_bmi": float(row["avg_bmi"]),
                "average_medical_cost": float(row["avg_medical_cost"]),
                "average_premium": float(row["avg_premium"]),
                "average_claim_amount": float(row["avg_claim_amount"]),
            })
        # Persist the clustered rows so the downstream analysis views can reuse them.
        clustered_df.write.mode("overwrite").parquet("hdfs://localhost:9000/insurance/clustering_result")
        return JsonResponse({"code": 200, "msg": "Clustering analysis complete", "data": result_list})

class MedicalCostAssociationAnalysisView(View):
    def get(self, request):
        # Re-attach cluster labels to the raw data by joining on the feature columns.
        clustered_df = spark.read.parquet("hdfs://localhost:9000/insurance/clustering_result")
        insurance_df = spark.read.parquet("hdfs://localhost:9000/insurance/raw_data")
        full_df = insurance_df.join(
            clustered_df.select("age", "bmi", "medical_cost", "premium", "claim_amount", "cluster"),
            on=["age", "bmi", "medical_cost", "premium", "claim_amount"], how="left",
        )
        # Bucket policyholders into age bands and the standard BMI categories.
        age_range_df = full_df.withColumn(
            "age_range",
            when(col("age") < 30, "20-30")
            .when((col("age") >= 30) & (col("age") < 40), "30-40")
            .when((col("age") >= 40) & (col("age") < 50), "40-50")
            .otherwise("50+"),
        )
        bmi_range_df = age_range_df.withColumn(
            "bmi_range",
            when(col("bmi") < 18.5, "underweight")
            .when((col("bmi") >= 18.5) & (col("bmi") < 24), "normal")
            .when((col("bmi") >= 24) & (col("bmi") < 28), "overweight")
            .otherwise("obese"),
        )
        # Aggregate cost and claim statistics per (age band, BMI band, cluster) cell.
        association_stats = bmi_range_df.groupBy("age_range", "bmi_range", "cluster").agg(
            count("*").alias("record_count"),
            round(avg("medical_cost"), 2).alias("avg_medical_cost"),
            round(avg("claim_amount"), 2).alias("avg_claim_amount"),
            round(sum("medical_cost"), 2).alias("total_medical_cost"),
            round(sum("claim_amount"), 2).alias("total_claim_amount"),
        )
        association_results = association_stats.orderBy(desc("total_medical_cost")).limit(50).collect()
        # Pearson correlation between average medical cost and average claim amount.
        pandas_df = pd.DataFrame([row.asDict() for row in association_results])
        correlation_matrix = pandas_df[["avg_medical_cost", "avg_claim_amount", "record_count"]].corr()
        correlation_value = float(correlation_matrix.loc["avg_medical_cost", "avg_claim_amount"])
        result_list = []
        for row in association_results:
            result_list.append({
                "age_range": row["age_range"], "bmi_range": row["bmi_range"],
                "cluster_id": int(row["cluster"]) if row["cluster"] is not None else -1,
                "record_count": int(row["record_count"]),
                "avg_medical_cost": float(row["avg_medical_cost"]),
                "avg_claim_amount": float(row["avg_claim_amount"]),
                "total_medical_cost": float(row["total_medical_cost"]),
                "total_claim_amount": float(row["total_claim_amount"]),
            })
        association_stats.write.mode("overwrite").parquet("hdfs://localhost:9000/insurance/association_result")
        # Use np.round here: the builtin round is shadowed by the Spark round imported above.
        return JsonResponse({"code": 200, "msg": "Association analysis complete", "data": result_list, "correlation": float(np.round(correlation_value, 4))})

class InsuredProfileAnalysisView(View):
    def get(self, request):
        # Join cluster labels back onto the raw records, as in the association view.
        clustered_df = spark.read.parquet("hdfs://localhost:9000/insurance/clustering_result")
        insurance_df = spark.read.parquet("hdfs://localhost:9000/insurance/raw_data")
        full_df = insurance_df.join(clustered_df.select("age", "bmi", "medical_cost", "premium", "claim_amount", "cluster"), on=["age", "bmi", "medical_cost", "premium", "claim_amount"], how="left")
        # Profile each cluster along gender, region, smoking status, and number of children.
        gender_stats = full_df.groupBy("gender", "cluster").agg(count("*").alias("count"), round(avg("premium"), 2).alias("avg_premium"), round(avg("claim_amount"), 2).alias("avg_claim"))
        region_stats = full_df.groupBy("region", "cluster").agg(count("*").alias("count"), round(avg("medical_cost"), 2).alias("avg_medical_cost"))
        smoker_stats = full_df.groupBy("smoker", "cluster").agg(count("*").alias("count"), round(avg("medical_cost"), 2).alias("avg_medical_cost"), round(avg("claim_amount"), 2).alias("avg_claim"))
        children_stats = full_df.groupBy("children", "cluster").agg(count("*").alias("count"), round(avg("premium"), 2).alias("avg_premium"))
        # Flag high-risk policyholders: high medical cost or high claim amount.
        high_risk_df = full_df.filter((col("medical_cost") > 10000) | (col("claim_amount") > 8000))
        high_risk_profile = high_risk_df.groupBy("cluster").agg(count("*").alias("high_risk_count"), round(avg("age"), 2).alias("avg_age"), round(avg("bmi"), 2).alias("avg_bmi"))
        # Collect each profile into JSON-serializable lists (-1 marks an unmatched cluster).
        gender_list = [{"gender": row["gender"], "cluster_id": int(row["cluster"]) if row["cluster"] is not None else -1, "count": int(row["count"]), "avg_premium": float(row["avg_premium"]), "avg_claim": float(row["avg_claim"])} for row in gender_stats.collect()]
        region_list = [{"region": row["region"], "cluster_id": int(row["cluster"]) if row["cluster"] is not None else -1, "count": int(row["count"]), "avg_medical_cost": float(row["avg_medical_cost"])} for row in region_stats.collect()]
        smoker_list = [{"smoker": row["smoker"], "cluster_id": int(row["cluster"]) if row["cluster"] is not None else -1, "count": int(row["count"]), "avg_medical_cost": float(row["avg_medical_cost"]), "avg_claim": float(row["avg_claim"])} for row in smoker_stats.collect()]
        children_list = [{"children": int(row["children"]), "cluster_id": int(row["cluster"]) if row["cluster"] is not None else -1, "count": int(row["count"]), "avg_premium": float(row["avg_premium"])} for row in children_stats.collect()]
        high_risk_list = [{"cluster_id": int(row["cluster"]) if row["cluster"] is not None else -1, "high_risk_count": int(row["high_risk_count"]), "avg_age": float(row["avg_age"]), "avg_bmi": float(row["avg_bmi"])} for row in high_risk_profile.collect()]
        gender_stats.write.mode("overwrite").parquet("hdfs://localhost:9000/insurance/profile/gender")
        region_stats.write.mode("overwrite").parquet("hdfs://localhost:9000/insurance/profile/region")
        smoker_stats.write.mode("overwrite").parquet("hdfs://localhost:9000/insurance/profile/smoker")
        return JsonResponse({"code": 200, "msg": "Policyholder profile analysis complete", "data": {"gender_profile": gender_list, "region_profile": region_list, "smoker_profile": smoker_list, "children_profile": children_list, "high_risk_profile": high_risk_list}})
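Since these are Django class-based views, they still have to be wired into the URL configuration before the Vue frontend can call them. Below is a minimal sketch of that wiring; the module layout and URL prefixes are assumptions for illustration and do not appear in the original project code.

from django.urls import path
from .views import (
    ComprehensiveClusteringAnalysisView,
    MedicalCostAssociationAnalysisView,
    InsuredProfileAnalysisView,
)

urlpatterns = [
    # Each endpoint runs its Spark job on demand and returns a JSON summary
    # that the ECharts dashboard can render directly.
    path("api/analysis/clustering/", ComprehensiveClusteringAnalysisView.as_view()),
    path("api/analysis/association/", MedicalCostAssociationAnalysisView.as_view()),
    path("api/analysis/profile/", InsuredProfileAnalysisView.as_view()),
]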

Documentation Preview of the Big-Data-Based Health Insurance Data Visualization and Analysis System

[Thesis document screenshot]
