1. About the Author
💖💖 Author: 计算机编程果茶熊 💙💙 About me: I spent years teaching computer science professionally as a programming instructor, and I still enjoy teaching today. I work across several IT stacks, including Java, WeChat Mini Programs, Python, Golang, and Android. I take on custom project development, code walkthroughs, thesis-defense coaching, and documentation writing, and I also know some techniques for lowering similarity-check scores. I like sharing solutions to problems I run into during development and talking shop, so feel free to ask me anything about code or technology! 💛💛 A quick word: thank you all for your attention and support! 💜💜 Website projects · Android/Mini Program projects · Big data projects · CS capstone topic selection 💕💕 Contact 计算机编程果茶熊 at the end of this article to get the source code.
2. System Overview
Big data framework: Hadoop + Spark (Hive requires custom modification)
Development languages: Java + Python (both versions supported)
Database: MySQL
Backend frameworks: SpringBoot (Spring + SpringMVC + MyBatis) + Django (both versions supported)
Frontend: Vue + Echarts + HTML + CSS + JavaScript + jQuery
The Automobile Insurance Data Visualization and Analysis System is an intelligent insurance-business analytics platform built on big data technology. It uses a Hadoop + Spark distributed computing architecture as its data-processing core, combined with a Django backend and a Vue + ElementUI + Echarts frontend stack, to store, process, and visualize massive volumes of auto insurance data efficiently. Insurance business data lives in the HDFS distributed file system, large-scale queries and analysis run on Spark SQL, and data-science libraries such as Pandas and NumPy handle the more involved statistical computations. The platform integrates core modules for user management, auto insurance data management, customer-profile analysis, financial-benefit analysis, insurance-product analysis, marketing analysis, and risk-management analysis, and it provides an intuitive visualization dashboard. The system helps insurers gain deep insight into customer behavior, refine product-design strategy, and strengthen risk identification, supporting digital transformation and data-driven decision-making in the insurance business.
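To make that pipeline concrete, here is a minimal sketch of the flow just described: data lands in HDFS, Spark SQL does the heavy aggregation on the cluster, and only a small result set is handed to the JSON/Echarts layer. The HDFS path and column names mirror the ones used in the code section below; treat them as assumptions about the schema rather than the project's exact tables.

# Minimal sketch of the HDFS -> Spark SQL -> JSON pipeline described above.
# Path and column names follow the views shown later in this post; adjust
# them to your actual schema.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("InsuranceDataAnalysis").getOrCreate()

policies = spark.read.option("header", "true").option("inferSchema", "true") \
    .csv("hdfs://localhost:9000/insurance/policy_data.csv")
policies.createOrReplaceTempView("policies")

# Spark SQL runs the aggregation across the cluster...
premium_by_type = spark.sql("""
    SELECT vehicle_type, COUNT(*) AS policy_count, AVG(premium_amount) AS avg_premium
    FROM policies
    GROUP BY vehicle_type
""")

# ...and only the small aggregated result is collected for the JSON/Echarts layer.
records = premium_by_type.toPandas().to_dict("records")

The actual views below use the equivalent DataFrame API rather than SQL strings; in Spark the two styles are interchangeable.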
3. Video Walkthrough
4. Feature Showcase
5. Selected Code
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, count, sum, avg, when, desc, asc, lag, current_date, datediff
from pyspark.sql.window import Window
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans
import pandas as pd
import numpy as np
from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt
import json

# Shared Spark session with adaptive query execution enabled
spark = SparkSession.builder \
    .appName("InsuranceDataAnalysis") \
    .config("spark.sql.adaptive.enabled", "true") \
    .config("spark.sql.adaptive.coalescePartitions.enabled", "true") \
    .getOrCreate()
@csrf_exempt
def customer_portrait_analysis(request):
    # Load customer, policy, and claim data from HDFS
    customer_df = spark.read.option("header", "true").option("inferSchema", "true").csv("hdfs://localhost:9000/insurance/customer_data.csv")
    policy_df = spark.read.option("header", "true").option("inferSchema", "true").csv("hdfs://localhost:9000/insurance/policy_data.csv")
    claim_df = spark.read.option("header", "true").option("inferSchema", "true").csv("hdfs://localhost:9000/insurance/claim_data.csv")
    customer_policy = customer_df.join(policy_df, "customer_id", "left")
    # Drop the claim-side customer_id so the joined frame has no ambiguous column
    customer_claim = customer_policy.join(claim_df.drop("customer_id"), "policy_id", "left")
    # Segment customers into age bands (labels: 青年/中青年/中年/中老年/老年)
    age_groups = customer_claim.withColumn("age_group", when(col("age") < 25, "青年").when(col("age") < 35, "中青年").when(col("age") < 45, "中年").when(col("age") < 60, "中老年").otherwise("老年"))
    age_analysis = age_groups.groupBy("age_group").agg(count("customer_id").alias("customer_count"), avg("premium_amount").alias("avg_premium"), sum("claim_amount").alias("total_claim"))
    # Bucket customers by annual income (低收入/中等收入/高收入/超高收入)
    income_analysis = customer_claim.withColumn("income_level", when(col("annual_income") < 50000, "低收入").when(col("annual_income") < 100000, "中等收入").when(col("annual_income") < 200000, "高收入").otherwise("超高收入"))
    income_stats = income_analysis.groupBy("income_level").agg(count("customer_id").alias("customer_count"), avg("premium_amount").alias("avg_premium"), avg("claim_frequency").alias("avg_claim_freq"))
    vehicle_analysis = customer_claim.groupBy("vehicle_type", "vehicle_age").agg(count("policy_id").alias("policy_count"), avg("premium_amount").alias("avg_premium"), sum("claim_amount").alias("total_claim"))
    # K-Means segmentation over the numeric customer features (5 clusters)
    feature_cols = ["age", "annual_income", "premium_amount", "claim_frequency", "vehicle_age"]
    assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
    feature_data = assembler.transform(customer_claim.na.drop())
    kmeans = KMeans(k=5, seed=42, featuresCol="features", predictionCol="cluster")
    model = kmeans.fit(feature_data)
    clustered_data = model.transform(feature_data)
    cluster_summary = clustered_data.groupBy("cluster").agg(count("customer_id").alias("cluster_size"), avg("age").alias("avg_age"), avg("annual_income").alias("avg_income"), avg("premium_amount").alias("avg_premium"))
    # Weighted risk score: 40% claim frequency, 60% loss ratio, scaled to 0-100
    risk_score = customer_claim.withColumn("risk_score", (col("claim_frequency") * 0.4 + col("claim_amount") / col("premium_amount") * 0.6) * 100)
    high_risk_customers = risk_score.filter(col("risk_score") > 80).select("customer_id", "customer_name", "risk_score", "claim_frequency", "claim_amount").orderBy(desc("risk_score"))
    age_result = age_analysis.toPandas().to_dict('records')
    income_result = income_stats.toPandas().to_dict('records')
    vehicle_result = vehicle_analysis.toPandas().to_dict('records')
    cluster_result = cluster_summary.toPandas().to_dict('records')
    risk_result = high_risk_customers.limit(100).toPandas().to_dict('records')
    return JsonResponse({'age_analysis': age_result, 'income_analysis': income_result, 'vehicle_analysis': vehicle_result, 'customer_clusters': cluster_result, 'high_risk_customers': risk_result})
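One caveat worth noting in the clustering step above: the five input features sit on very different scales (annual_income in the tens of thousands, age in the tens), so distance-based K-Means will be dominated by the largest column. A common refinement, shown here as an optional variant rather than as part of the original system, is to standardize the assembled vector before fitting; this snippet would slot in right after the assembler.transform(...) line:

# Optional variant: scale the assembled features so no single large-scale
# column (e.g. annual_income) dominates the K-Means distance metric.
from pyspark.ml.feature import StandardScaler

scaler = StandardScaler(inputCol="features", outputCol="scaled_features")
scaled_data = scaler.fit(feature_data).transform(feature_data)
kmeans_scaled = KMeans(k=5, seed=42, featuresCol="scaled_features", predictionCol="cluster")
scaled_model = kmeans_scaled.fit(scaled_data)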
@csrf_exempt
def financial_benefit_analysis(request):
    premium_df = spark.read.option("header", "true").option("inferSchema", "true").csv("hdfs://localhost:9000/insurance/premium_data.csv")
    claim_df = spark.read.option("header", "true").option("inferSchema", "true").csv("hdfs://localhost:9000/insurance/claim_data.csv")
    expense_df = spark.read.option("header", "true").option("inferSchema", "true").csv("hdfs://localhost:9000/insurance/expense_data.csv")
    # Monthly premium, claim, and expense aggregates
    monthly_premium = premium_df.groupBy("year", "month").agg(sum("premium_amount").alias("total_premium"), count("policy_id").alias("policy_count"))
    monthly_claim = claim_df.groupBy("year", "month").agg(sum("claim_amount").alias("total_claim"), count("claim_id").alias("claim_count"))
    monthly_expense = expense_df.groupBy("year", "month").agg(sum("expense_amount").alias("total_expense"))
    financial_data = monthly_premium.join(monthly_claim, ["year", "month"], "left").join(monthly_expense, ["year", "month"], "left")
    financial_data = financial_data.fillna(0)
    # Gross/net profit, profit margin, and claim ratio per month
    profit_analysis = (financial_data
                       .withColumn("gross_profit", col("total_premium") - col("total_claim"))
                       .withColumn("net_profit", col("total_premium") - col("total_claim") - col("total_expense"))
                       .withColumn("profit_margin", col("net_profit") / col("total_premium") * 100)
                       .withColumn("claim_ratio", col("total_claim") / col("total_premium") * 100))
    # Profitability by product line
    product_premium = premium_df.groupBy("product_type").agg(sum("premium_amount").alias("product_premium"), count("policy_id").alias("product_policies"))
    product_claim = claim_df.groupBy("product_type").agg(sum("claim_amount").alias("product_claim"), count("claim_id").alias("product_claims"))
    product_profit = product_premium.join(product_claim, "product_type", "left").fillna(0)
    product_profit = (product_profit
                      .withColumn("product_net_profit", col("product_premium") - col("product_claim"))
                      .withColumn("product_profit_margin", col("product_net_profit") / col("product_premium") * 100)
                      .withColumn("product_claim_ratio", col("product_claim") / col("product_premium") * 100))
    # Quarterly roll-up (built on profit_analysis so profit_margin is available)
    quarterly_data = profit_analysis.withColumn("quarter", when(col("month") <= 3, "Q1").when(col("month") <= 6, "Q2").when(col("month") <= 9, "Q3").otherwise("Q4"))
    quarterly_summary = quarterly_data.groupBy("year", "quarter").agg(sum("total_premium").alias("quarter_premium"), sum("total_claim").alias("quarter_claim"), sum("total_expense").alias("quarter_expense"), avg("profit_margin").alias("avg_profit_margin"))
    # Expense share per cost category
    total_expense_all = expense_df.agg(sum("expense_amount")).collect()[0][0]
    cost_structure = expense_df.groupBy("expense_category").agg(sum("expense_amount").alias("category_expense")).withColumn("expense_percentage", col("category_expense") / total_expense_all * 100)
    # Month-over-month premium growth via a lag window; no partitionBy is fine
    # here because the monthly aggregates form a small, single-series dataset
    month_window = Window.orderBy("year", "month")
    growth_rate = financial_data.withColumn("prev_premium", lag("total_premium", 1).over(month_window)).withColumn("premium_growth_rate", (col("total_premium") - col("prev_premium")) / col("prev_premium") * 100)
    # Return on expense (built on profit_analysis so net_profit is available)
    roi_analysis = profit_analysis.withColumn("investment_return", col("net_profit") / col("total_expense") * 100).filter(col("total_expense") > 0)
    monthly_result = profit_analysis.orderBy("year", "month").toPandas().to_dict('records')
    product_result = product_profit.orderBy(desc("product_net_profit")).toPandas().to_dict('records')
    quarterly_result = quarterly_summary.orderBy("year", "quarter").toPandas().to_dict('records')
    cost_result = cost_structure.orderBy(desc("category_expense")).toPandas().to_dict('records')
    growth_result = growth_rate.filter(col("prev_premium").isNotNull()).toPandas().to_dict('records')
    roi_result = roi_analysis.toPandas().to_dict('records')
    return JsonResponse({'monthly_profit': monthly_result, 'product_profit': product_result, 'quarterly_summary': quarterly_result, 'cost_structure': cost_result, 'growth_analysis': growth_result, 'roi_analysis': roi_result})
@csrf_exempt
def risk_management_analysis(request):
    claim_df = spark.read.option("header", "true").option("inferSchema", "true").csv("hdfs://localhost:9000/insurance/claim_data.csv")
    policy_df = spark.read.option("header", "true").option("inferSchema", "true").csv("hdfs://localhost:9000/insurance/policy_data.csv")
    customer_df = spark.read.option("header", "true").option("inferSchema", "true").csv("hdfs://localhost:9000/insurance/customer_data.csv")
    fraud_df = spark.read.option("header", "true").option("inferSchema", "true").csv("hdfs://localhost:9000/insurance/fraud_detection.csv")
    # Per-policy claim frequency and severity
    claim_frequency = claim_df.groupBy("policy_id").agg(count("claim_id").alias("claim_count"), sum("claim_amount").alias("total_claim_amount"), avg("claim_amount").alias("avg_claim_amount"))
    high_frequency_claims = claim_frequency.filter(col("claim_count") > 3).orderBy(desc("claim_count"))
    # Risk level per claim type, bucketed by average severity (高风险/中风险/低风险)
    claim_type_analysis = claim_df.groupBy("claim_type").agg(count("claim_id").alias("type_count"), sum("claim_amount").alias("type_total_amount"), avg("claim_amount").alias("type_avg_amount")).withColumn("risk_level", when(col("type_avg_amount") > 50000, "高风险").when(col("type_avg_amount") > 20000, "中风险").otherwise("低风险"))
    # Regional risk score: 30% weight on claim volume, 70% on average severity
    geographical_risk = claim_df.groupBy("region", "city").agg(count("claim_id").alias("region_claims"), sum("claim_amount").alias("region_amount"), avg("claim_amount").alias("region_avg")).withColumn("regional_risk_score", col("region_claims") * 0.3 + col("region_avg") / 10000 * 0.7)
    age_risk_analysis = claim_df.join(customer_df, "customer_id", "inner").withColumn("age_group", when(col("age") < 25, "青年").when(col("age") < 35, "中青年").when(col("age") < 45, "中年").when(col("age") < 60, "中老年").otherwise("老年"))
    age_risk_stats = age_risk_analysis.groupBy("age_group").agg(count("claim_id").alias("age_claims"), avg("claim_amount").alias("age_avg_claim"), sum("claim_amount").alias("age_total_claim"))
    # Vehicle risk index: 40% claim volume, 60% average severity
    vehicle_risk = claim_df.join(policy_df, "policy_id", "inner").groupBy("vehicle_type", "vehicle_age").agg(count("claim_id").alias("vehicle_claims"), avg("claim_amount").alias("vehicle_avg_claim")).withColumn("vehicle_risk_index", col("vehicle_claims") * 0.4 + col("vehicle_avg_claim") / 10000 * 0.6)
    # Claims the fraud model flags with probability above 0.7
    fraud_detection = fraud_df.filter(col("fraud_probability") > 0.7).join(claim_df, "claim_id", "inner")
    suspicious_patterns = fraud_detection.groupBy("fraud_reason").agg(count("claim_id").alias("pattern_count"), avg("claim_amount").alias("pattern_avg_amount"))
    # Seasonal claim distribution (冬季/春季/夏季/秋季)
    seasonal_risk = claim_df.withColumn("season", when(col("month").isin([12, 1, 2]), "冬季").when(col("month").isin([3, 4, 5]), "春季").when(col("month").isin([6, 7, 8]), "夏季").otherwise("秋季"))
    seasonal_analysis = seasonal_risk.groupBy("season").agg(count("claim_id").alias("seasonal_claims"), avg("claim_amount").alias("seasonal_avg"), sum("claim_amount").alias("seasonal_total"))
    risk_threshold = claim_frequency.filter(col("total_claim_amount") > 100000).withColumn("risk_category", when(col("total_claim_amount") > 500000, "极高风险").when(col("total_claim_amount") > 200000, "高风险").otherwise("中高风险"))
    # Pending claims open for more than 30 days trigger an early warning; drop
    # the policy-side customer_id so the later select is unambiguous
    early_warning = claim_df.filter(col("claim_status") == "pending").join(policy_df.drop("customer_id"), "policy_id", "inner").withColumn("days_since_claim", datediff(current_date(), col("claim_date"))).filter(col("days_since_claim") > 30)
    frequency_result = high_frequency_claims.limit(50).toPandas().to_dict('records')
    type_result = claim_type_analysis.orderBy(desc("type_avg_amount")).toPandas().to_dict('records')
    geo_result = geographical_risk.orderBy(desc("regional_risk_score")).toPandas().to_dict('records')
    age_result = age_risk_stats.orderBy(desc("age_avg_claim")).toPandas().to_dict('records')
    vehicle_result = vehicle_risk.orderBy(desc("vehicle_risk_index")).toPandas().to_dict('records')
    fraud_result = suspicious_patterns.orderBy(desc("pattern_count")).toPandas().to_dict('records')
    seasonal_result = seasonal_analysis.toPandas().to_dict('records')
    threshold_result = risk_threshold.orderBy(desc("total_claim_amount")).toPandas().to_dict('records')
    warning_result = early_warning.select("policy_id", "customer_id", "claim_amount", "days_since_claim").toPandas().to_dict('records')
    return JsonResponse({'high_frequency_claims': frequency_result, 'claim_type_risk': type_result, 'geographical_risk': geo_result, 'age_risk_analysis': age_result, 'vehicle_risk_analysis': vehicle_result, 'fraud_patterns': fraud_result, 'seasonal_risk': seasonal_result, 'high_risk_policies': threshold_result, 'early_warning': warning_result})
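Each view above returns a plain JsonResponse, so exposing them to the Vue front end is just a matter of routing. A minimal urls.py sketch follows; the URL paths are illustrative assumptions, not the project's actual route configuration:

# Hypothetical urls.py wiring for the three analysis views above;
# the path strings are illustrative, not the project's actual routes.
from django.urls import path
from . import views

urlpatterns = [
    path('api/analysis/customer-portrait/', views.customer_portrait_analysis),
    path('api/analysis/financial-benefit/', views.financial_benefit_analysis),
    path('api/analysis/risk-management/', views.risk_management_analysis),
]

On the frontend, each Echarts panel would fetch one of these endpoints and bind the returned records array to its series data.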
6. Selected Documentation
7. END
💕💕 Contact 计算机编程果茶熊 to get the source code.