[Data Analysis] A Big-Data-Based Car Insurance Data Visualization and Analysis System | Computer Science Capstone Project | Visualization Dashboard | Documentation Guidance | Deployment | Hadoop, Spark, Java, Python


💖💖 Author: 计算机毕业设计江挽 💙💙 About me: I have long taught professional computer science training courses and genuinely enjoy teaching. My languages include Java, WeChat mini-programs, Python, Golang, and Android, and my projects span big data, deep learning, websites, mini-programs, Android apps, and algorithms. I also take on custom project development, code walkthroughs, thesis-defense coaching, and documentation writing, and I know some techniques for reducing similarity in plagiarism checks. I like sharing solutions to problems I hit during development and talking shop; feel free to ask me anything about code or technology! 💛💛 A word of thanks: thank you all for your attention and support! 💜💜 Website projects | Android/mini-program projects | Big data projects | Deep learning projects

Big-Data-Based Car Insurance Data Visualization and Analysis System: Introduction

The big-data-based car insurance data visualization and analysis system is an integrated data processing and analysis platform built on a Hadoop + Spark distributed computing stack. The system uses Spark SQL to query and process large volumes of car insurance data efficiently, applies Pandas and NumPy for deeper statistical analysis, and renders multi-dimensional visualizations with ECharts. On the back end, two implementations are offered, Django and Spring Boot; the front end is built with Vue + ElementUI. The system covers seven core functional modules: user management handles permissions and access control; insurance data management provides create/read/update/delete operations on the records; customer portrait analysis profiles customers along multiple label dimensions; financial benefit analysis evaluates the profitability of the insurance business; product analysis compares the different insurance types; marketing analysis identifies potential customer segments; and risk management analysis flags high-risk policy and claim behavior. By combining big data technology with real insurance business scenarios, the system supports data-driven decision making at insurance companies while also serving as a complete big data project case study for computer science students.
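The introduction mentions Pandas and NumPy for deeper analysis on top of Spark's aggregations, but the code listing later in this post stays within Spark SQL. Below is a minimal sketch of how an aggregated Spark result could be handed off to Pandas and NumPy for local statistics; the helper name premium_outlier_summary and the three-sigma rule are illustrative assumptions, not taken from the project code.

import numpy as np

def premium_outlier_summary(spark_df):
    # Illustrative helper (not part of the project): pull one column of an
    # already-aggregated (hence small) Spark DataFrame into Pandas/NumPy.
    premiums = spark_df.select("premium_amount").toPandas()["premium_amount"].to_numpy(dtype=float)
    # Flag premiums more than three standard deviations from the mean.
    mean, std = premiums.mean(), premiums.std()
    outliers = premiums[np.abs(premiums - mean) > 3 * std]
    return {"mean": float(mean), "std": float(std), "outlier_count": int(outliers.size)}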

Big-Data-Based Car Insurance Data Visualization and Analysis System: Demo Video

Demo video

Big-Data-Based Car Insurance Data Visualization and Analysis System: Demo Screenshots

[Demo screenshots]

Big-Data-Based Car Insurance Data Visualization and Analysis System: Code Showcase

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, when, datediff
from django.http import JsonResponse

# Shared SparkSession; shuffle partitions and executor memory are sized for a demo dataset.
spark = (
    SparkSession.builder
    .appName("CarInsuranceAnalysis")
    .config("spark.sql.shuffle.partitions", "4")
    .config("spark.executor.memory", "2g")
    .getOrCreate()
)

# JDBC connection settings for the MySQL source table, shared by all three views.
JDBC_OPTIONS = {
    "url": "jdbc:mysql://localhost:3306/car_insurance",
    "driver": "com.mysql.cj.jdbc.Driver",
    "dbtable": "insurance_data",
    "user": "root",
    "password": "123456",
}

def load_insurance_df():
    # Read the raw insurance records from MySQL into a Spark DataFrame.
    return spark.read.format("jdbc").options(**JDBC_OPTIONS).load()

def customer_portrait_analysis(request):
    # Load the insurance records and register them for Spark SQL queries.
    insurance_df = load_insurance_df()
    insurance_df.createOrReplaceTempView("insurance_table")
    # Customer age distribution, bucketed into five groups.
    age_distribution = spark.sql("""
        SELECT CASE WHEN age < 25 THEN 'under 25'
                    WHEN age >= 25 AND age < 35 THEN '25-35'
                    WHEN age >= 35 AND age < 45 THEN '35-45'
                    WHEN age >= 45 AND age < 55 THEN '45-55'
                    ELSE '55 and over' END AS age_group,
               COUNT(*) AS customer_count
        FROM insurance_table GROUP BY age_group ORDER BY age_group
    """)
    age_data = [{"age_group": row.age_group, "count": row.customer_count}
                for row in age_distribution.collect()]
    # Customer count and average premium by gender.
    gender_distribution = spark.sql("""
        SELECT gender, COUNT(*) AS customer_count, AVG(premium_amount) AS avg_premium
        FROM insurance_table GROUP BY gender
    """)
    gender_data = [{"gender": row.gender, "count": row.customer_count,
                    "avg_premium": float(row.avg_premium)}
                   for row in gender_distribution.collect()]
    # Top 10 car brands by policy count, with average insured years and total claims.
    car_brand_analysis = spark.sql("""
        SELECT car_brand, COUNT(*) AS brand_count, AVG(insurance_years) AS avg_years,
               SUM(claim_count) AS total_claims
        FROM insurance_table GROUP BY car_brand ORDER BY brand_count DESC LIMIT 10
    """)
    brand_data = [{"brand": row.car_brand, "count": row.brand_count,
                   "avg_years": float(row.avg_years), "claims": row.total_claims}
                  for row in car_brand_analysis.collect()]
    # Top 20 regions by customer count.
    region_analysis = spark.sql("""
        SELECT province, city, COUNT(*) AS customer_count, AVG(premium_amount) AS avg_premium
        FROM insurance_table GROUP BY province, city ORDER BY customer_count DESC LIMIT 20
    """)
    region_data = [{"province": row.province, "city": row.city, "count": row.customer_count,
                    "avg_premium": float(row.avg_premium)}
                   for row in region_analysis.collect()]
    # Bucket customers into risk levels by how often they have claimed.
    risk_level_df = insurance_df.withColumn(
        "risk_level",
        when(col("claim_count") >= 3, "high risk")
        .when((col("claim_count") >= 1) & (col("claim_count") < 3), "medium risk")
        .otherwise("low risk"))
    risk_level_df.createOrReplaceTempView("risk_table")
    risk_distribution = spark.sql("""
        SELECT risk_level, COUNT(*) AS customer_count, AVG(premium_amount) AS avg_premium,
               SUM(claim_amount) AS total_claim
        FROM risk_table GROUP BY risk_level
    """)
    risk_data = [{"risk_level": row.risk_level, "count": row.customer_count,
                  "avg_premium": float(row.avg_premium), "total_claim": float(row.total_claim)}
                 for row in risk_distribution.collect()]
    return JsonResponse({"code": 200, "message": "Customer portrait analysis complete",
                         "data": {"age_distribution": age_data,
                                  "gender_distribution": gender_data,
                                  "brand_analysis": brand_data,
                                  "region_analysis": region_data,
                                  "risk_distribution": risk_data}})

def financial_benefit_analysis(request):
    # Analysis window, overridable via query parameters.
    start_date = request.GET.get('start_date', '2023-01-01')
    end_date = request.GET.get('end_date', '2024-12-31')
    financial_df = load_insurance_df().filter(
        (col("policy_start_date") >= start_date) & (col("policy_start_date") <= end_date))
    financial_df.createOrReplaceTempView("financial_table")
    # Monthly premium income, claims paid, net profit, and distinct customers.
    monthly_revenue = spark.sql("""
        SELECT DATE_FORMAT(policy_start_date, 'yyyy-MM') AS month,
               SUM(premium_amount) AS total_premium, SUM(claim_amount) AS total_claim,
               SUM(premium_amount) - SUM(claim_amount) AS net_profit,
               COUNT(DISTINCT customer_id) AS customer_count
        FROM financial_table
        GROUP BY DATE_FORMAT(policy_start_date, 'yyyy-MM') ORDER BY month
    """)
    monthly_data = [{"month": row.month, "premium": float(row.total_premium),
                     "claim": float(row.total_claim), "profit": float(row.net_profit),
                     "customers": row.customer_count}
                    for row in monthly_revenue.collect()]
    # Profitability per insurance product, sorted by net profit.
    product_profit = spark.sql("""
        SELECT insurance_type, COUNT(*) AS policy_count,
               SUM(premium_amount) AS total_premium, SUM(claim_amount) AS total_claim,
               SUM(premium_amount) - SUM(claim_amount) AS net_profit,
               (SUM(premium_amount) - SUM(claim_amount)) / SUM(premium_amount) * 100 AS profit_rate
        FROM financial_table GROUP BY insurance_type ORDER BY net_profit DESC
    """)
    product_data = [{"product": row.insurance_type, "count": row.policy_count,
                     "premium": float(row.total_premium), "claim": float(row.total_claim),
                     "profit": float(row.net_profit), "rate": float(row.profit_rate)}
                    for row in product_profit.collect()]
    # Per-policy claim ratio; products averaging above 60% are flagged.
    claim_ratio_df = financial_df.withColumn(
        "claim_ratio", col("claim_amount") / col("premium_amount") * 100)
    claim_ratio_df.createOrReplaceTempView("claim_ratio_table")
    high_claim_products = spark.sql("""
        SELECT insurance_type, AVG(claim_ratio) AS avg_claim_ratio, COUNT(*) AS policy_count
        FROM claim_ratio_table GROUP BY insurance_type
        HAVING AVG(claim_ratio) > 60 ORDER BY avg_claim_ratio DESC
    """)
    high_claim_data = [{"product": row.insurance_type, "ratio": float(row.avg_claim_ratio),
                        "count": row.policy_count}
                       for row in high_claim_products.collect()]
    # Quarterly totals.
    quarterly_analysis = spark.sql("""
        SELECT CONCAT(YEAR(policy_start_date), '-Q', QUARTER(policy_start_date)) AS quarter,
               SUM(premium_amount) AS total_premium, SUM(claim_amount) AS total_claim,
               COUNT(*) AS policy_count
        FROM financial_table
        GROUP BY YEAR(policy_start_date), QUARTER(policy_start_date) ORDER BY quarter
    """)
    quarterly_data = [{"quarter": row.quarter, "premium": float(row.total_premium),
                       "claim": float(row.total_claim), "count": row.policy_count}
                      for row in quarterly_analysis.collect()]
    # Headline figures across the whole window.
    overall = spark.sql("""
        SELECT SUM(premium_amount) AS total_revenue, SUM(claim_amount) AS total_cost,
               SUM(premium_amount) - SUM(claim_amount) AS total_profit,
               COUNT(DISTINCT customer_id) AS total_customers, COUNT(*) AS total_policies
        FROM financial_table
    """).first()
    overall_data = {"revenue": float(overall.total_revenue), "cost": float(overall.total_cost),
                    "profit": float(overall.total_profit), "customers": overall.total_customers,
                    "policies": overall.total_policies}
    return JsonResponse({"code": 200, "message": "Financial benefit analysis complete",
                         "data": {"monthly_analysis": monthly_data,
                                  "product_profit": product_data,
                                  "high_claim_products": high_claim_data,
                                  "quarterly_analysis": quarterly_data,
                                  "overall_statistics": overall_data}})

def risk_management_analysis(request):
    risk_df = load_insurance_df()
    risk_df.createOrReplaceTempView("risk_management_table")
    # Customers with three or more claim records, ranked by frequency and amount.
    high_frequency_claims = spark.sql("""
        SELECT customer_id, customer_name, COUNT(*) AS claim_frequency,
               SUM(claim_amount) AS total_claim_amount, AVG(claim_amount) AS avg_claim_amount
        FROM risk_management_table WHERE claim_count > 0
        GROUP BY customer_id, customer_name HAVING COUNT(*) >= 3
        ORDER BY claim_frequency DESC, total_claim_amount DESC LIMIT 50
    """)
    high_freq_data = [{"customer_id": row.customer_id, "name": row.customer_name,
                       "frequency": row.claim_frequency,
                       "total_amount": float(row.total_claim_amount),
                       "avg_amount": float(row.avg_claim_amount)}
                      for row in high_frequency_claims.collect()]
    # Single claims above 50,000.
    large_amount_claims = spark.sql("""
        SELECT customer_id, customer_name, claim_amount, claim_date, claim_reason
        FROM risk_management_table WHERE claim_amount > 50000 ORDER BY claim_amount DESC
    """)
    large_amount_data = [{"customer_id": row.customer_id, "name": row.customer_name,
                          "amount": float(row.claim_amount), "date": str(row.claim_date),
                          "reason": row.claim_reason}
                         for row in large_amount_claims.collect()]
    # Days between policy start and claim; datediff avoids casting DATE columns
    # to long, which Spark SQL rejects.
    suspicious_pattern_df = risk_df.withColumn(
        "days_between_policy_claim", datediff(col("claim_date"), col("policy_start_date")))
    suspicious_pattern_df.createOrReplaceTempView("suspicious_table")
    # Large claims filed within 30 days of policy start.
    early_claims = spark.sql("""
        SELECT customer_id, customer_name, policy_start_date, claim_date,
               days_between_policy_claim, claim_amount
        FROM suspicious_table
        WHERE days_between_policy_claim < 30 AND claim_amount > 10000
        ORDER BY days_between_policy_claim ASC
    """)
    early_claim_data = [{"customer_id": row.customer_id, "name": row.customer_name,
                         "policy_date": str(row.policy_start_date),
                         "claim_date": str(row.claim_date),
                         "days_gap": int(row.days_between_policy_claim),
                         "amount": float(row.claim_amount)}
                        for row in early_claims.collect()]
    # Regions where more than 30% of policies have at least one claim.
    risk_region_analysis = spark.sql("""
        SELECT province, city, COUNT(*) AS total_policies,
               SUM(CASE WHEN claim_count > 0 THEN 1 ELSE 0 END) AS claim_policies,
               SUM(CASE WHEN claim_count > 0 THEN 1 ELSE 0 END) * 100.0 / COUNT(*) AS claim_rate,
               AVG(claim_amount) AS avg_claim_amount
        FROM risk_management_table GROUP BY province, city
        HAVING claim_rate > 30 ORDER BY claim_rate DESC LIMIT 30
    """)
    risk_region_data = [{"province": row.province, "city": row.city, "total": row.total_policies,
                         "claims": row.claim_policies, "rate": float(row.claim_rate),
                         "avg_amount": float(row.avg_claim_amount)}
                        for row in risk_region_analysis.collect()]
    # Simple additive fraud score: frequent claims, early claims, large claims,
    # and young drivers each contribute a fixed weight.
    fraud_score_df = risk_df.withColumn(
        "fraud_score",
        when(col("claim_count") >= 3, 30).otherwise(0)
        + when(datediff(col("claim_date"), col("policy_start_date")) < 30, 25).otherwise(0)
        + when(col("claim_amount") > 50000, 25).otherwise(0)
        + when(col("age") < 25, 20).otherwise(0))
    fraud_score_df.createOrReplaceTempView("fraud_score_table")
    high_risk_customers = spark.sql("""
        SELECT customer_id, customer_name, claim_count, claim_amount, age, fraud_score
        FROM fraud_score_table WHERE fraud_score >= 50
        ORDER BY fraud_score DESC, claim_amount DESC
    """)
    high_risk_data = [{"customer_id": row.customer_id, "name": row.customer_name,
                       "claims": row.claim_count, "amount": float(row.claim_amount),
                       "age": row.age, "score": row.fraud_score}
                      for row in high_risk_customers.collect()]
    # Claim reasons ranked by how often they occur.
    claim_type_analysis = spark.sql("""
        SELECT claim_reason, COUNT(*) AS occurrence_count, SUM(claim_amount) AS total_amount,
               AVG(claim_amount) AS avg_amount
        FROM risk_management_table WHERE claim_count > 0
        GROUP BY claim_reason ORDER BY occurrence_count DESC
    """)
    claim_type_data = [{"reason": row.claim_reason, "count": row.occurrence_count,
                        "total": float(row.total_amount), "average": float(row.avg_amount)}
                       for row in claim_type_analysis.collect()]
    return JsonResponse({"code": 200, "message": "Risk management analysis complete",
                         "data": {"high_frequency_claims": high_freq_data,
                                  "large_amount_claims": large_amount_data,
                                  "early_claims": early_claim_data,
                                  "risk_regions": risk_region_data,
                                  "high_risk_customers": high_risk_data,
                                  "claim_type_analysis": claim_type_data}})
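
The listing shows only the view functions; the URL routing is not included in the post. Below is a minimal sketch of how the three endpoints could be wired up in Django; the app name analysis and the URL paths are illustrative assumptions, not the project's actual configuration.

# urls.py (illustrative wiring; app name and paths are assumed, not from the project)
from django.urls import path
from analysis import views

urlpatterns = [
    path("api/customer-portrait/", views.customer_portrait_analysis),
    path("api/financial-benefit/", views.financial_benefit_analysis),
    path("api/risk-management/", views.risk_management_analysis),
]

With routes like these, the Vue + ECharts front end can request, for example, /api/financial-benefit/?start_date=2023-01-01&end_date=2023-12-31 and bind the returned JSON series directly to its charts.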

Big-Data-Based Car Insurance Data Visualization and Analysis System: Documentation Showcase

[Documentation screenshot]
