💖💖作者:计算机编程小咖 💙💙个人简介:曾长期从事计算机专业培训教学,本人也热爱上课教学,语言擅长Java、微信小程序、Python、Golang、安卓Android等,开发项目包括大数据、深度学习、网站、小程序、安卓、算法。平常会做一些项目定制化开发、代码讲解、答辩教学、文档编写、也懂一些降重方面的技巧。平常喜欢分享一些自己开发中遇到的问题的解决办法,也喜欢交流技术,大家有技术代码这一块的问题可以问我! 💛💛想说的话:感谢大家的关注与支持! 💜💜 网站实战项目 安卓/小程序实战项目 大数据实战项目 深度学习实战项目
@TOC
电信客户流失数据分析系统介绍
基于大数据的电信客户流失数据分析系统是一套融合现代大数据技术与数据分析理念的综合性平台,专门针对电信行业客户流失预测与分析需求而设计。系统采用Hadoop分布式存储架构结合Spark大数据计算框架作为核心技术底座,能够高效处理海量电信客户数据,通过HDFS实现数据的可靠存储与管理。在开发实现上,系统提供Python+Django和Java+SpringBoot两套完整的技术方案,前端采用Vue框架搭配ElementUI组件库构建现代化用户界面,并集成Echarts图表库实现数据的直观可视化展示。系统功能涵盖完整的数据分析生命周期,包括系统首页、用户管理、个人信息维护、密码修改等基础功能,以及电信客户流失数据管理这一核心模块。在数据分析维度上,系统提供大屏可视化展示、合约分析、业务分析、总体分析、分群分析和特征分析六大分析模块,能够从多个角度深入挖掘客户行为特征与流失规律。通过运用Spark SQL进行大数据查询处理,结合Pandas、NumPy等数据科学库进行深度数据挖掘,系统能够为电信企业提供精准的客户流失预测与业务决策支持,实现从数据收集、存储、处理到分析展示的全流程大数据解决方案。
电信客户流失数据分析系统演示视频
电信客户流失数据分析系统演示图片
电信客户流失数据分析系统代码展示
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, when, count, sum, avg, desc, asc
from pyspark.ml.feature import VectorAssembler, StringIndexer
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt
import json
import pandas as pd
import numpy as np
# Shared SparkSession for every analysis view in this module. Adaptive Query
# Execution (AQE) is enabled so Spark can re-optimize plans and coalesce
# shuffle partitions at runtime.
spark = SparkSession.builder.appName("TelecomChurnAnalysis").config("spark.sql.adaptive.enabled", "true").config("spark.sql.adaptive.coalescePartitions.enabled", "true").getOrCreate()
@csrf_exempt
def customer_churn_prediction(request):
    """Train a RandomForest churn model on the HDFS customer dataset and
    return model quality, feature importances and high-risk customers as JSON.

    POST only; any other method gets a JSON error with status 405.

    NOTE(review): the model is retrained from scratch on every request —
    consider fitting once and persisting the model if this endpoint is
    called frequently.
    """
    if request.method != 'POST':
        return JsonResponse({"error": "POST required"}, status=405)
    customer_data = spark.read.option("header", "true").option("inferSchema", "true").csv("hdfs://localhost:9000/telecom/customer_data.csv")
    feature_cols = ['tenure', 'monthly_charges', 'total_charges', 'contract_type_index', 'payment_method_index']
    # Encode the categorical columns as numeric indices for the feature vector.
    for in_col, out_col in (("contract_type", "contract_type_index"),
                            ("payment_method", "payment_method_index")):
        customer_data = StringIndexer(inputCol=in_col, outputCol=out_col).fit(customer_data).transform(customer_data)
    assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
    customer_data_features = assembler.transform(customer_data)
    churn_indexer = StringIndexer(inputCol="churn", outputCol="churn_index")
    customer_data_final = churn_indexer.fit(customer_data_features).transform(customer_data_features)
    # Fixed seed keeps the train/test split reproducible across requests.
    train_data, test_data = customer_data_final.randomSplit([0.8, 0.2], seed=42)
    rf_classifier = RandomForestClassifier(featuresCol="features", labelCol="churn_index", numTrees=100)
    rf_model = rf_classifier.fit(train_data)
    predictions = rf_model.transform(test_data)
    # BinaryClassificationEvaluator's default metric is areaUnderROC (AUC),
    # not classification accuracy. The JSON key is kept as "accuracy" for
    # backward compatibility with the existing frontend.
    evaluator = BinaryClassificationEvaluator(labelCol="churn_index", rawPredictionCol="rawPrediction")
    auc = evaluator.evaluate(predictions)
    # Limit on the Spark side instead of collecting everything and slicing
    # in Python — avoids pulling the whole test set to the driver.
    churn_probability = predictions.select("customer_id", "probability", "prediction").limit(100).collect()
    high_risk_customers = predictions.filter(col("prediction") == 1.0).select("customer_id", "monthly_charges", "tenure").orderBy(desc("monthly_charges")).limit(50).collect()
    feature_importance = rf_model.featureImportances.toArray().tolist()
    result_data = {
        "accuracy": auc,  # actually AUC; key name preserved (see above)
        "high_risk_count": len(high_risk_customers),
        "feature_importance": feature_importance,
        # probability[1] is the model's estimated probability of the
        # positive (churn) class.
        "prediction_results": [{"customer_id": row.customer_id, "churn_risk": float(row.probability[1])} for row in churn_probability],
    }
    return JsonResponse(result_data)
@csrf_exempt
def business_analysis(request):
    """Return multi-dimensional business KPIs (service usage, contract,
    payment method, tenure, revenue trend, satisfaction, high-value
    customers) aggregated with Spark, as JSON.

    GET only; any other method gets a JSON error with status 405.
    """
    if request.method != 'GET':
        return JsonResponse({"error": "GET required"}, status=405)
    business_data = spark.read.option("header", "true").option("inferSchema", "true").csv("hdfs://localhost:9000/telecom/business_data.csv")
    # Reusable churn-rate expression: percentage of customers whose
    # churn flag equals "Yes" within each group. Column expressions are
    # immutable, so sharing one instance across aggregations is safe.
    churn_rate = (count(when(col("churn") == "Yes", 1)) / count("customer_id") * 100).alias("churn_rate")
    service_usage = business_data.groupBy("service_type").agg(
        count("customer_id").alias("customer_count"),
        avg("monthly_usage").alias("avg_usage"),
        sum("revenue").alias("total_revenue"),
    ).orderBy(desc("total_revenue"))
    contract_analysis = business_data.groupBy("contract_type").agg(
        count("customer_id").alias("customer_count"),
        avg("monthly_charges").alias("avg_monthly_charges"),
        churn_rate,
    ).orderBy(asc("churn_rate"))
    payment_method_stats = business_data.groupBy("payment_method").agg(
        count("customer_id").alias("customer_count"),
        avg("total_charges").alias("avg_total_charges"),
        churn_rate,
    ).orderBy(desc("customer_count"))
    # Bucket tenure (months) into coarse年限 groups for trend reporting.
    tenure_segments = business_data.withColumn(
        "tenure_group",
        when(col("tenure") <= 12, "0-1年").when(col("tenure") <= 24, "1-2年").when(col("tenure") <= 36, "2-3年").otherwise("3年以上"),
    )
    tenure_analysis = tenure_segments.groupBy("tenure_group").agg(
        count("customer_id").alias("customer_count"),
        avg("monthly_charges").alias("avg_monthly_charges"),
        churn_rate,
    ).orderBy(col("tenure_group"))
    monthly_revenue_trend = business_data.groupBy("signup_month").agg(
        sum("monthly_charges").alias("monthly_revenue"),
        count("customer_id").alias("new_customers"),
    ).orderBy("signup_month")
    service_satisfaction = business_data.filter(col("satisfaction_score").isNotNull()).groupBy("service_type").agg(
        avg("satisfaction_score").alias("avg_satisfaction"),
        count("customer_id").alias("customer_count"),
    ).orderBy(desc("avg_satisfaction"))
    # Hoist the dataset-wide average so the threshold is computed by one
    # Spark job instead of being embedded inside the filter expression.
    overall_avg_charges = business_data.agg(avg("total_charges")).first()[0]
    high_value_customers = business_data.filter(col("total_charges") > overall_avg_charges).groupBy("service_type").agg(
        count("customer_id").alias("high_value_count"),
        avg("monthly_charges").alias("avg_charges"),
    ).orderBy(desc("high_value_count"))
    result_data = {
        "service_usage": [{"service_type": row.service_type, "customer_count": row.customer_count, "avg_usage": float(row.avg_usage), "total_revenue": float(row.total_revenue)} for row in service_usage.collect()],
        "contract_analysis": [{"contract_type": row.contract_type, "customer_count": row.customer_count, "avg_monthly_charges": float(row.avg_monthly_charges), "churn_rate": float(row.churn_rate)} for row in contract_analysis.collect()],
        "payment_method_stats": [{"payment_method": row.payment_method, "customer_count": row.customer_count, "avg_total_charges": float(row.avg_total_charges), "churn_rate": float(row.churn_rate)} for row in payment_method_stats.collect()],
        "tenure_analysis": [{"tenure_group": row.tenure_group, "customer_count": row.customer_count, "churn_rate": float(row.churn_rate)} for row in tenure_analysis.collect()],
        # These three aggregations were previously computed but never
        # returned; exposed here as additive keys (backward compatible).
        "monthly_revenue_trend": [{"signup_month": row.signup_month, "monthly_revenue": float(row.monthly_revenue), "new_customers": row.new_customers} for row in monthly_revenue_trend.collect()],
        "service_satisfaction": [{"service_type": row.service_type, "avg_satisfaction": float(row.avg_satisfaction), "customer_count": row.customer_count} for row in service_satisfaction.collect()],
        "high_value_customers": [{"service_type": row.service_type, "high_value_count": row.high_value_count, "avg_charges": float(row.avg_charges)} for row in high_value_customers.collect()],
    }
    return JsonResponse(result_data)
@csrf_exempt
def customer_segmentation(request):
    """Segment customers along value, usage and loyalty dimensions, derive
    a composite customer segment, and report per-segment churn risk and
    retention priority as JSON.

    POST only; any other method gets a JSON error with status 405.
    """
    if request.method != 'POST':
        return JsonResponse({"error": "POST required"}, status=405)
    segmentation_data = spark.read.option("header", "true").option("inferSchema", "true").csv("hdfs://localhost:9000/telecom/customer_segmentation.csv")
    # Compute each dataset-wide average exactly once. The original code
    # re-ran a full collect() job for every reference (4 extra Spark jobs).
    avg_total_charges = segmentation_data.agg(avg("total_charges")).first()[0]
    avg_monthly_usage = segmentation_data.agg(avg("monthly_usage")).first()[0]
    # Value tier: 1.5x average spend = high, above average = medium.
    value_segments = segmentation_data.withColumn(
        "value_segment",
        when(col("total_charges") >= avg_total_charges * 1.5, "高价值客户")
        .when(col("total_charges") >= avg_total_charges, "中价值客户")
        .otherwise("低价值客户"),
    )
    # Usage tier: >=120% of average = heavy, >=80% = medium, else light.
    usage_segments = value_segments.withColumn(
        "usage_segment",
        when(col("monthly_usage") >= avg_monthly_usage * 1.2, "重度使用")
        .when(col("monthly_usage") >= avg_monthly_usage * 0.8, "中度使用")
        .otherwise("轻度使用"),
    )
    # Loyalty tier by tenure in months: 36+ loyal, 12+ stable, else new.
    loyalty_segments = usage_segments.withColumn(
        "loyalty_segment",
        when(col("tenure") >= 36, "忠诚客户").when(col("tenure") >= 12, "稳定客户").otherwise("新客户"),
    )
    # Composite segment derived from the three tiers; first matching rule wins.
    comprehensive_segments = loyalty_segments.withColumn(
        "customer_segment",
        when((col("value_segment") == "高价值客户") & (col("loyalty_segment") == "忠诚客户"), "VIP客户")
        .when((col("value_segment") == "高价值客户") & (col("usage_segment") == "重度使用"), "重点客户")
        .when((col("loyalty_segment") == "新客户") & (col("usage_segment") == "重度使用"), "潜力客户")
        .when(col("value_segment") == "低价值客户", "普通客户")
        .otherwise("一般客户"),
    )
    # Shared churn-percentage expression for the grouped aggregations below.
    churn_pct = count(when(col("churn") == "Yes", 1)) / count("customer_id") * 100
    segment_analysis = comprehensive_segments.groupBy("customer_segment").agg(
        count("customer_id").alias("customer_count"),
        avg("monthly_charges").alias("avg_monthly_charges"),
        avg("total_charges").alias("avg_total_charges"),
        churn_pct.alias("churn_rate"),
    ).orderBy(desc("customer_count"))
    # Cached: segment_analysis feeds four separate actions (list, count,
    # and two summary lookups), so avoid recomputing the aggregation.
    segment_analysis = segment_analysis.cache()
    risk_assessment = comprehensive_segments.groupBy("customer_segment", "value_segment").agg(
        count("customer_id").alias("segment_count"),
        churn_pct.alias("risk_level"),
    ).orderBy(desc("risk_level"))
    retention_priority = comprehensive_segments.filter(col("churn") == "Yes").groupBy("customer_segment").agg(
        count("customer_id").alias("churn_count"),
        avg("monthly_charges").alias("lost_revenue"),
    ).orderBy(desc("lost_revenue"))
    segment_analysis_list = [{"customer_segment": row.customer_segment, "customer_count": row.customer_count, "avg_monthly_charges": float(row.avg_monthly_charges), "churn_rate": float(row.churn_rate)} for row in segment_analysis.collect()]
    risk_assessment_list = [{"customer_segment": row.customer_segment, "value_segment": row.value_segment, "risk_level": float(row.risk_level)} for row in risk_assessment.collect()]
    segment_summary = {
        "total_segments": segment_analysis.count(),
        "highest_risk_segment": segment_analysis.orderBy(desc("churn_rate")).select("customer_segment").first().customer_segment,
        "most_valuable_segment": segment_analysis.orderBy(desc("avg_total_charges")).select("customer_segment").first().customer_segment,
    }
    result_data = {
        "segment_analysis": segment_analysis_list,
        "risk_assessment": risk_assessment_list,
        "segment_summary": segment_summary,
        # limit(10) on the Spark side instead of collecting everything
        # and slicing the Python list.
        "retention_priority": [{"customer_segment": row.customer_segment, "lost_revenue": float(row.lost_revenue)} for row in retention_priority.limit(10).collect()],
    }
    return JsonResponse(result_data)
电信客户流失数据分析系统文档展示
💖💖作者:计算机编程小咖
💙💙个人简介:曾长期从事计算机专业培训教学,本人也热爱上课教学,语言擅长Java、微信小程序、Python、Golang、安卓Android等,开发项目包括大数据、深度学习、网站、小程序、安卓、算法。平常会做一些项目定制化开发、代码讲解、答辩教学、文档编写、也懂一些降重方面的技巧。平常喜欢分享一些自己开发中遇到的问题的解决办法,也喜欢交流技术,大家有技术代码这一块的问题可以问我!
💛💛想说的话:感谢大家的关注与支持!
💜💜
网站实战项目
安卓/小程序实战项目
大数据实战项目
深度学习实战项目