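Big-Data-Based Consumer Credit Score Profiling Analysis System | Hadoop+Spark+Python: Building a Consumer Credit Score Profiling Analysis Graduation Project on Three Tech Stacks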


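💖💖Author: 计算机毕业设计杰瑞 (Jerry, computer science graduation projects) 💙💙About me: I taught computer science training courses for many years and still love teaching. I work mainly in Java, WeChat Mini Programs, Python, Golang, and Android, and I have built projects in big data, deep learning, websites, mini programs, Android apps, and algorithms. I regularly do custom project development, code walkthroughs, thesis-defense coaching, and documentation writing, and I know some techniques for lowering similarity-check rates. I like sharing solutions to problems I run into during development and talking about technology, so if you have questions about code, feel free to ask! 💛💛A few words: thank you all for following and supporting me! 💜💜 Website projects | Android/Mini Program projects | Big data projects | Deep learning projects | Graduation project topic recommendations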

Big-Data-Based Consumer Credit Score Profiling Analysis System: Introduction

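The Consumer Credit Score Profiling Analysis System is an analytics platform built on a Hadoop + Spark + Python stack for mining and analyzing consumer credit data. Raw credit data is stored in HDFS (the Hadoop Distributed File System); Spark's in-memory computation handles data cleaning, feature engineering, and model training; and data-science libraries such as Pandas and NumPy carry out the more involved statistical analysis. The core features cover credit data management, user portrait analysis, consumption behavior analysis, credit score analysis, lifestyle preference analysis, and user segmentation portraits. The back end is a Django service that exposes RESTful API endpoints to a decoupled front end built with Vue, ElementUI, and ECharts, which renders the results as charts, while MySQL stores the processed structured data and the analysis results. Together, these components process large volumes of consumer credit data and build user portraits from multiple dimensions to support credit risk assessment and personalized services, covering the full pipeline of data collection, storage, processing, analysis, and visualization.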
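To make the idea of a multi-dimensional user portrait concrete, here is a minimal plain-Python sketch of the rule-based tagging that the UserPortraitAnalysis view performs in Spark in the code section: fixed cut-offs turn age, monthly income, and consumption score into labels, and a weighted stability score (education 0.3, occupation 0.4, work years 0.3, scaled by 20) decides the risk level. This is an illustration, not the author's code. The `UserRecord` dataclass, the `bucket` and `build_portrait` helpers, and the English labels are hypothetical, and education and occupation are assumed to be already mapped to their numeric scores.

```python
from dataclasses import dataclass


@dataclass
class UserRecord:
    """Hypothetical record; fields mirror columns of the user_credit_data table."""
    age: int
    monthly_income: float
    consumption_score: float
    education_score: int   # assumed pre-mapped, 1-5 (doctorate = 5)
    occupation_score: int  # assumed pre-mapped, 2-5
    work_years: float


def bucket(value, upper_bounds, labels):
    """Return the label of the first band whose upper bound exceeds value, else the last label."""
    for bound, label in zip(upper_bounds, labels):
        if value < bound:
            return label
    return labels[-1]


def build_portrait(u: UserRecord) -> dict:
    # Weighted stability score with the same weights as the Spark view
    stability = (u.education_score * 0.3 + u.occupation_score * 0.4 + (u.work_years / 10) * 0.3) * 20
    if stability > 80:
        risk = "low risk"
    elif stability >= 60:
        risk = "medium risk"
    else:
        risk = "high risk"
    if u.consumption_score > 80:
        pattern = "high spender"
    elif u.consumption_score >= 60:
        pattern = "moderate spender"
    else:
        pattern = "prudent spender"
    return {
        "age_group": bucket(u.age, [25, 35, 50], ["youth", "young adult", "middle-aged", "older adult"]),
        "income_level": bucket(u.monthly_income, [5000, 15000, 30000],
                               ["low income", "middle income", "upper-middle income", "high income"]),
        "consumption_pattern": pattern,
        "stability_score": round(stability, 2),
        "risk_level": risk,
    }


print(build_portrait(UserRecord(age=30, monthly_income=12000, consumption_score=70,
                                education_score=3, occupation_score=4, work_years=5)))
```

Keeping the cut-offs as plain lists makes them easy to tune and to unit-test before porting them into `when()` chains.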

Big-Data-Based Consumer Credit Score Profiling Analysis System: Demo Video

Demo video

Big-Data-Based Consumer Credit Score Profiling Analysis System: Screenshots

[Screenshots: eight images of the system]

Big-Data-Based Consumer Credit Score Profiling Analysis System: Code

```python
import json

from django.http import JsonResponse
from django.utils.decorators import method_decorator
from django.views import View
from django.views.decorators.csrf import csrf_exempt
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F  # namespaced, so F.sum/F.min/F.max don't shadow Python builtins

# One SparkSession shared by all views; adaptive query execution re-plans queries
# at runtime and coalesces small shuffle partitions
spark = (SparkSession.builder.appName("ConsumerCreditAnalysis")
         .config("spark.sql.adaptive.enabled", "true")
         .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
         .getOrCreate())

# Demo credentials: load them from Django settings or environment variables in practice.
# The MySQL Connector/J jar must be on Spark's classpath.
JDBC_URL = "jdbc:mysql://localhost:3306/credit_db"
JDBC_PROPS = {"user": "root", "password": "password", "driver": "com.mysql.cj.jdbc.Driver"}


def read_table(table: str) -> DataFrame:
    """Load a MySQL table as a Spark DataFrame over JDBC."""
    return spark.read.jdbc(JDBC_URL, table, properties=JDBC_PROPS)


def min_max_scaled(df: DataFrame, cols: list) -> dict:
    """Min-max scale each column to [0, 1]; all minima and maxima come from a single Spark job."""
    stats = df.agg(*[F.min(c).alias(f"{c}_min") for c in cols],
                   *[F.max(c).alias(f"{c}_max") for c in cols]).first()
    scaled = {}
    for c in cols:
        lo, hi = stats[f"{c}_min"], stats[f"{c}_max"]
        span = (hi - lo) or 1  # avoid dividing by zero when a column is constant
        scaled[c] = (F.col(c) - lo) / span
    return scaled


@method_decorator(csrf_exempt, name="dispatch")  # JSON API called by the separate Vue front end
class UserPortraitAnalysis(View):
    def post(self, request):
        user_id = json.loads(request.body).get("user_id")
        # Spark pushes this equality filter down to MySQL
        user_df = read_table("user_credit_data").filter(F.col("user_id") == user_id)
        # when() branches are tested in order, so each upper bound alone defines its band
        age_group = (F.when(F.col("age") < 25, "youth").when(F.col("age") < 35, "young adult")
                     .when(F.col("age") < 50, "middle-aged").otherwise("older adult"))
        income = F.col("monthly_income")
        income_level = (F.when(income < 5000, "low income").when(income < 15000, "middle income")
                        .when(income < 30000, "upper-middle income").otherwise("high income"))
        spend = F.col("consumption_score")
        consumption_pattern = (F.when(spend > 80, "high spender").when(spend >= 60, "moderate spender")
                               .otherwise("prudent spender"))
        # Education/occupation values are stored in Chinese: 博士 doctorate, 硕士 master's,
        # 本科 bachelor's, 高中 high school; anything else scores 1
        edu = F.col("education")
        education_score = (F.when(edu == "博士", 5).when(edu == "硕士", 4).when(edu == "本科", 3)
                           .when(edu == "高中", 2).otherwise(1))
        # 公务员 civil servant, 教师 teacher, 医生 doctor | 工程师 engineer, 经理 manager, 律师 lawyer |
        # 销售 sales, 文员 clerk, 技术员 technician; anything else scores 2
        occ = F.col("occupation")
        occupation_score = (F.when(occ.isin("公务员", "教师", "医生"), 5)
                            .when(occ.isin("工程师", "经理", "律师"), 4)
                            .when(occ.isin("销售", "文员", "技术员"), 3).otherwise(2))
        stability_score = (education_score * 0.3 + occupation_score * 0.4
                           + (F.col("work_years") / 10) * 0.3) * 20
        risk_level = (F.when(stability_score > 80, "low risk").when(stability_score >= 60, "medium risk")
                      .otherwise("high risk"))
        row = user_df.select("user_id", age_group.alias("age_group"), income_level.alias("income_level"),
                             consumption_pattern.alias("consumption_pattern"),
                             stability_score.alias("stability_score"), risk_level.alias("risk_level")).first()
        if row is None:  # unknown user_id: return 404 instead of failing on an empty result
            return JsonResponse({"code": 404, "message": "User not found", "data": None}, status=404)
        portrait_data = row.asDict()
        if portrait_data["stability_score"] is not None:
            portrait_data["stability_score"] = float(portrait_data["stability_score"])
        return JsonResponse({"code": 200, "message": "User portrait analysis complete", "data": portrait_data})


@method_decorator(csrf_exempt, name="dispatch")
class CreditScoreAnalysis(View):
    # Weights sum to 1.0; debt_ratio and recent_inquiry are inverted because higher raw values mean worse credit
    WEIGHTS = {"payment_history": 0.35, "debt_ratio": 0.30, "credit_history_length": 0.15,
               "credit_diversity": 0.10, "recent_inquiry": 0.10}
    INVERTED = {"debt_ratio", "recent_inquiry"}

    def post(self, request):
        analysis_type = json.loads(request.body).get("analysis_type", "overall")
        df = read_table("credit_score_data")
        scaled = min_max_scaled(df, list(self.WEIGHTS))
        weighted = sum(((1 - scaled[c]) if c in self.INVERTED else scaled[c]) * w
                       for c, w in self.WEIGHTS.items())
        credit_score = weighted * 550 + 300  # weighted is in [0, 1], so scores span 300-850
        score_level = (F.when(credit_score >= 750, "excellent").when(credit_score >= 700, "good")
                       .when(credit_score >= 650, "fair").when(credit_score >= 600, "poor")
                       .otherwise("very poor"))
        score_df = df.withColumn("calculated_score", credit_score).withColumn("score_level", score_level)
        if analysis_type == "distribution":
            rows = score_df.groupBy("score_level").count().orderBy(F.desc("count")).collect()
            distribution_data = [{"level": r["score_level"], "count": r["count"]} for r in rows]
            return JsonResponse({"code": 200, "message": "Credit score distribution analysis complete",
                                 "data": distribution_data})
        stats = score_df.agg(F.avg("calculated_score").alias("average_score"),
                             F.max("calculated_score").alias("max_score"),
                             F.min("calculated_score").alias("min_score"),
                             F.count(F.lit(1)).alias("total_users")).first()  # one job instead of four
        score_stats = {k: float(stats[k]) for k in ("average_score", "max_score", "min_score")}
        score_stats["total_users"] = stats["total_users"]
        return JsonResponse({"code": 200, "message": "Credit score statistics complete", "data": score_stats})


@method_decorator(csrf_exempt, name="dispatch")
class ConsumptionBehaviorAnalysis(View):
    # weekofyear/dayofmonth ignore the year (and month), so data spanning several years is pooled
    PERIOD_FUNCS = {"month": F.month, "week": F.weekofyear, "day": F.dayofmonth}

    def post(self, request):
        params = json.loads(request.body)
        period = params.get("time_period", "month")
        if period not in self.PERIOD_FUNCS:
            period = "month"  # unknown values fall back to monthly grouping
        category = params.get("category", "all")
        df = read_table("consumption_data")
        # A plain cast to date handles DATE, DATETIME/TIMESTAMP and 'yyyy-MM-dd HH:mm:ss' strings alike
        df = df.withColumn(period, self.PERIOD_FUNCS[period](F.to_date("transaction_time")))
        if category != "all":
            df = df.filter(F.col("category") == category)
        df = df.cache()  # reused by the aggregations below
        metrics = [F.sum("amount").alias("total_amount"),
                   F.count("transaction_id").alias("transaction_count"),
                   F.avg("amount").alias("avg_amount")]
        time_rows = df.groupBy(period).agg(*metrics).orderBy(period).collect()
        category_rows = df.groupBy("category").agg(*metrics).orderBy(F.desc("total_amount")).collect()
        users = df.groupBy("user_id").agg(
            F.sum("amount").alias("user_total"),
            F.count("transaction_id").alias("user_count"),
            F.countDistinct("category").alias("category_diversity"),
        ).cache()
        avg_total, avg_count, total_users = users.agg(
            F.avg("user_total"), F.avg("user_count"), F.count(F.lit(1))).first()
        # High-value: total spend above 1.5x the average user; frequent: over 2x the average transaction count
        behavior_stats = {
            "high_value_user_count": users.filter(F.col("user_total") > F.lit(avg_total) * 1.5).count(),
            "frequent_user_count": users.filter(F.col("user_count") > F.lit(avg_count) * 2).count(),
            "total_users": total_users,
        }
        users.unpersist()
        df.unpersist()

        def to_dicts(rows, key, out_key):
            return [{out_key: r[key],
                     "total_amount": float(r["total_amount"]),
                     "transaction_count": r["transaction_count"],
                     "avg_amount": float(r["avg_amount"])} for r in rows]

        analysis_result = {
            "time_analysis": to_dicts(time_rows, period, "period"),
            "category_analysis": to_dicts(category_rows, "category", "category"),
            "behavior_statistics": behavior_stats,
        }
        return JsonResponse({"code": 200, "message": "Consumption behavior analysis complete", "data": analysis_result})
```
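Because the Spark views are hard to unit-test without a cluster and a database, it can help to keep a plain-Python reference of the credit-score formula from CreditScoreAnalysis: min-max normalize each factor across all rows, invert the factors where a larger value is worse (debt ratio, recent inquiries), combine them with the 0.35/0.30/0.15/0.10/0.10 weights, then map the resulting [0, 1] value onto a 300-850 scale before applying the 750/700/650/600 level cut-offs. This is a cross-checking sketch, not part of the original project; the function names, sample rows, and English level names are hypothetical, and the 550 multiplier assumes the intended range is 300-850, which is what the level cut-offs suggest.

```python
WEIGHTS = {
    "payment_history": 0.35,
    "debt_ratio": 0.30,
    "credit_history_length": 0.15,
    "credit_diversity": 0.10,
    "recent_inquiry": 0.10,
}
INVERTED = {"debt_ratio", "recent_inquiry"}  # a larger raw value means worse credit
LEVELS = [(750, "excellent"), (700, "good"), (650, "fair"), (600, "poor")]  # checked top-down


def min_max(values):
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1
    return [(v - lo) / span for v in values]


def credit_scores(rows):
    """Score every row (a dict of the five factor columns) relative to the whole batch."""
    columns = {}
    for name in WEIGHTS:
        scaled = min_max([row[name] for row in rows])
        columns[name] = [1 - s for s in scaled] if name in INVERTED else scaled
    results = []
    for i in range(len(rows)):
        weighted = sum(WEIGHTS[name] * columns[name][i] for name in WEIGHTS)  # in [0, 1]
        score = weighted * 550 + 300  # maps [0, 1] onto 300-850
        level = next((label for cutoff, label in LEVELS if score >= cutoff), "very poor")
        results.append((score, level))
    return results


# Hypothetical sample: the best, the worst, and a middle-of-the-batch applicant
sample = [
    {"payment_history": 100, "debt_ratio": 0.1, "credit_history_length": 20,
     "credit_diversity": 5, "recent_inquiry": 0},
    {"payment_history": 40, "debt_ratio": 0.9, "credit_history_length": 1,
     "credit_diversity": 1, "recent_inquiry": 6},
    {"payment_history": 70, "debt_ratio": 0.5, "credit_history_length": 10.5,
     "credit_diversity": 3, "recent_inquiry": 3},
]
for score, level in credit_scores(sample):
    print(f"{score:.1f} {level}")
```

Note that min-max scores are relative to the batch: the best applicant in any batch lands near 850, so scores from different batches are not directly comparable.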

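The high-value and frequent-user counts in ConsumptionBehaviorAnalysis reduce to simple arithmetic on per-user totals: a user counts as high-value when their total spending exceeds 1.5 times the average user's total, and as frequent when their transaction count exceeds twice the average count. The stdlib sketch below reproduces that rule on in-memory data; the `(user_id, amount)` input shape and the `behavior_stats` function name are assumptions for illustration, not the project's API.

```python
from collections import defaultdict
from statistics import mean


def behavior_stats(transactions):
    """transactions: iterable of (user_id, amount) pairs (hypothetical input shape)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for user_id, amount in transactions:
        totals[user_id] += amount
        counts[user_id] += 1
    if not totals:  # no data: nothing to compare against an average
        return {"high_value_user_count": 0, "frequent_user_count": 0, "total_users": 0}
    avg_total = mean(totals.values())
    avg_count = mean(counts.values())
    return {
        "high_value_user_count": sum(1 for t in totals.values() if t > avg_total * 1.5),
        "frequent_user_count": sum(1 for c in counts.values() if c > avg_count * 2),
        "total_users": len(totals),
    }


# One heavy user ("a") and four light users
tx = [("a", 100.0)] * 10 + [("b", 50.0), ("c", 50.0), ("d", 50.0), ("e", 50.0)]
print(behavior_stats(tx))
```

Both thresholds are relative to the average, so a single very heavy user pulls the average up; a median-based threshold would be less sensitive to such outliers if that matters for the analysis.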
Big-Data-Based Consumer Credit Score Profiling Analysis System: Documentation

[Screenshot of the project documentation]
