Computer Programming Mentor
⭐⭐About me: I love digging into technical problems! I specialize in hands-on projects in Java, Python, WeChat mini-programs, Android, big data, web crawlers, Golang, data dashboards, deep learning, machine learning, and prediction.
⛽⛽Hands-on projects: if you have questions about source code or run into technical issues, feel free to discuss them in the comments!
⚡⚡For specific technical questions or graduation-project needs, you can also reach me via my profile page~~
⚡⚡Source code available on my profile --> space.bilibili.com/35463818075…
Car Brand Complaint Data Analysis and Visualization System - Overview
The Hadoop-based car brand complaint data analysis and visualization system is a big-data platform built for in-depth analysis of consumer complaint data in the automotive industry. It uses the Hadoop Distributed File System (HDFS) as the underlying storage layer and the Spark compute engine on top of it to store, retrieve, and analyze large volumes of complaint records efficiently. Python is the primary development language: the backend exposes RESTful APIs built with Django, while the frontend combines Vue.js with the ElementUI component library and the ECharts charting library to present an intuitive, friendly visualization interface. On the processing side, the system relies on Spark SQL for structured queries and on Pandas and NumPy for preprocessing and statistical analysis, turning raw complaint records into actionable business insight. Core features include brand-level complaint-volume rankings, distribution statistics over problem types, model-level complaint comparisons, and text mining of complaint descriptions based on natural language processing. Through this multidimensional cross analysis, the system helps manufacturers identify quality problems and improve after-sales service, gives consumers objective input for purchase decisions, and supplies regulators with supporting data, forming a complete analysis ecosystem for automotive complaint data.
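As a minimal sketch of the Pandas/NumPy step the paragraph describes, here is how brand-level counts can be turned into shares for the charting layer. The records and column names below are made up for illustration; in the real system these rows would come out of a Spark aggregation.

```python
import pandas as pd
import numpy as np

# Toy complaint records (made-up data) standing in for rows pulled out of Spark
complaints = pd.DataFrame({
    "brand": ["A", "A", "B", "B", "B", "C"],
    "problem_type": ["engine", "brakes", "engine", "engine", "paint", "brakes"],
})

# Brand-level complaint counts, mirroring the Spark groupBy/agg step
brand_counts = complaints.groupby("brand").size().sort_values(ascending=False)

# NumPy turns raw counts into shares for the visualization layer
shares = np.round(brand_counts.to_numpy() / brand_counts.sum(), 2)
result = dict(zip(brand_counts.index, shares))
```

A JSON-serializable dict like `result` is what the Django views ultimately hand to ECharts.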
Car Brand Complaint Data Analysis and Visualization System - Technical Stack
Development language: Python or Java (both versions supported)
Big data framework: Hadoop + Spark (Hive not used in this build; customization supported)
Backend framework: Django or Spring Boot (Spring + SpringMVC + MyBatis) (both versions supported)
Frontend: Vue + ElementUI + ECharts + HTML + CSS + JavaScript + jQuery
Key technologies: Hadoop, HDFS, Spark, Spark SQL, Pandas, NumPy
Database: MySQL
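To connect the Django backend in the stack above to MySQL, a settings fragment along these lines would be used. This is a hypothetical sketch: the schema name and credentials are placeholders, not the project's real values.

```python
# Hypothetical Django settings.py fragment; replace placeholders with real values
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "car_complaints",   # placeholder schema name
        "HOST": "127.0.0.1",
        "PORT": "3306",
        "USER": "root",             # placeholder credentials
        "PASSWORD": "",
    }
}
```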
Car Brand Complaint Data Analysis and Visualization System - Background
With the rapid growth of China's automotive industry and rising consumer-rights awareness, product-quality complaints have drawn increasing public attention. Traditional complaint handling relies on manual statistics and simple database queries, which cannot keep up with the growing volume of complaint data, let alone uncover the deeper patterns and value hidden in it. Manufacturers need in-depth analysis of complaint data to identify design flaws, production-quality issues, and service-process gaps, while consumers want transparent, objective quality information to guide purchase decisions. Most existing analysis tools stop at basic statistics and lack the ability to mine large volumes of unstructured complaint text, falling short of multidimensional, multi-level analysis needs. Mature big-data technology offers a way forward: the distributed storage and parallel computing of the Hadoop ecosystem, combined with machine learning and natural language processing, can process complaint data at scale and extract valuable business intelligence.
The project has practical value and technical interest on several fronts. For manufacturers, the multidimensional analysis helps pinpoint where quality problems cluster; with brand-level trend analysis and problem-type distribution statistics, they can target improvements to product design and production processes, reducing the risk of recalls and reputational damage. For consumers, the visualized results offer objective data for purchase decisions: model-level complaint comparisons and brand quality assessments support a more rational choice. For regulators, the system's data mining helps surface industry-wide issues and potential safety hazards, informing policy and industry standards. On the technical side, the project applies big-data processing to a concrete business scenario and explores how the Hadoop ecosystem handles unstructured text, providing a reference for similar analysis projects. The design and implementation also serve as a hands-on platform for computer science students to combine big-data technology, web development, and data visualization, giving the project some teaching value as well.
Car Brand Complaint Data Analysis and Visualization System - Video Demo
Car Brand Complaint Data Analysis and Visualization System - Screenshots
Car Brand Complaint Data Analysis and Visualization System - Code Walkthrough
```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, count, desc, when, regexp_extract, split, explode, collect_list, first
from pyspark.sql.types import *
import pandas as pd
import numpy as np
from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt
import json
import jieba
from collections import Counter

spark = (SparkSession.builder
         .appName("CarComplaintAnalysis")
         .master("local[*]")
         .config("spark.sql.adaptive.enabled", "true")
         .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
         .getOrCreate())

def brand_complaint_ranking_analysis(request):
    # Load raw complaint records from HDFS
    complaint_df = spark.read.option("header", "true").option("inferSchema", "true").csv("hdfs://localhost:9000/car_complaints/complaints_data.csv")
    # Total complaints per brand, highest first
    brand_complaint_counts = complaint_df.groupBy("投诉品牌").agg(count("*").alias("投诉总量")).orderBy(desc("投诉总量"))
    # Brand x problem-type matrix via pivot
    brand_problem_distribution = complaint_df.groupBy("投诉品牌", "问题类型").agg(count("*").alias("问题数量"))
    brand_problem_pivot = brand_problem_distribution.groupBy("投诉品牌").pivot("问题类型").agg(first("问题数量")).fillna(0)
    # Monthly complaint trend per brand, month extracted from the date string
    brand_time_trend = complaint_df.withColumn("投诉月份", regexp_extract(col("投诉日期"), r"(\d{4}-\d{2})", 1)).groupBy("投诉品牌", "投诉月份").agg(count("*").alias("月度投诉量")).orderBy("投诉品牌", "投诉月份")
    # Share of complaints whose description contains severity keywords
    severity_keywords = ["严重", "危险", "致命", "重大", "紧急"]
    severity_condition = col("投诉简述").rlike("|".join(severity_keywords))
    brand_severity_analysis = complaint_df.withColumn("严重投诉", when(severity_condition, 1).otherwise(0)).groupBy("投诉品牌").agg((count(when(col("严重投诉") == 1, True)) / count("*")).alias("严重投诉比例"))
    # Share of complaints whose description suggests the manufacturer responded
    response_keywords = ["已处理", "已解决", "回复", "联系", "处理中"]
    response_condition = col("投诉简述").rlike("|".join(response_keywords))
    brand_response_analysis = complaint_df.withColumn("有响应", when(response_condition, 1).otherwise(0)).groupBy("投诉品牌").agg((count(when(col("有响应") == 1, True)) / count("*")).alias("响应率"))
    # Keep only the 20 most-complained-about brands for the dashboard
    top_brands = brand_complaint_counts.limit(20).collect()
    brand_names = [row["投诉品牌"] for row in top_brands]
    filtered_problem_pivot = brand_problem_pivot.filter(col("投诉品牌").isin(brand_names))
    filtered_time_trend = brand_time_trend.filter(col("投诉品牌").isin(brand_names))
    filtered_severity = brand_severity_analysis.filter(col("投诉品牌").isin(brand_names))
    filtered_response = brand_response_analysis.filter(col("投诉品牌").isin(brand_names))
    # Shape everything into JSON-friendly structures for ECharts
    ranking_result = [{"brand": row["投诉品牌"], "complaint_count": row["投诉总量"]} for row in top_brands]
    problem_result = []
    for row in filtered_problem_pivot.collect():
        brand_data = {"brand": row["投诉品牌"]}
        for col_name in filtered_problem_pivot.columns[1:]:
            brand_data[col_name] = row[col_name] or 0
        problem_result.append(brand_data)
    trend_result = {}
    for row in filtered_time_trend.collect():
        brand = row["投诉品牌"]
        if brand not in trend_result:
            trend_result[brand] = []
        trend_result[brand].append({"month": row["投诉月份"], "count": row["月度投诉量"]})
    severity_result = [{"brand": row["投诉品牌"], "severity_ratio": float(row["严重投诉比例"])} for row in filtered_severity.collect()]
    response_result = [{"brand": row["投诉品牌"], "response_rate": float(row["响应率"])} for row in filtered_response.collect()]
    return JsonResponse({"ranking": ranking_result, "problem_distribution": problem_result, "time_trend": trend_result, "severity_analysis": severity_result, "response_analysis": response_result})

def vehicle_model_complaint_comparison(request):
    complaint_df = spark.read.option("header", "true").option("inferSchema", "true").csv("hdfs://localhost:9000/car_complaints/complaints_data.csv")
    # Complaint totals per vehicle series
    vehicle_series_counts = complaint_df.groupBy("投诉车系").agg(count("*").alias("投诉总量")).orderBy(desc("投诉总量"))
    # Pull the model year (e.g. "2021款") out of the model-name string
    model_year_pattern = r"(\d{4})款"
    model_year_analysis = complaint_df.withColumn("车型年款", regexp_extract(col("投诉车型"), model_year_pattern, 1)).filter(col("车型年款") != "").groupBy("投诉车系", "车型年款").agg(count("*").alias("投诉数量")).orderBy("投诉车系", desc("车型年款"))
    series_problem_analysis = complaint_df.groupBy("投诉车系", "投诉问题").agg(count("*").alias("问题频次")).orderBy("投诉车系", desc("问题频次"))
    # Rough price buckets inferred from the model-name string; the patterns are
    # approximate and overlap, so the first matching branch in the chain wins
    price_keywords = {"10万以下": r"[1-9]万", "10-20万": r"1[0-9]万", "20-30万": r"2[0-9]万", "30万以上": r"[3-9][0-9]万"}
    # Chain branches with .when() rather than nested .otherwise():
    # PySpark allows .otherwise() only once per when-chain
    price_condition = None
    for price_range, pattern in price_keywords.items():
        clause = col("投诉车型").rlike(pattern)
        price_condition = when(clause, price_range) if price_condition is None else price_condition.when(clause, price_range)
    price_complaint_analysis = complaint_df.withColumn("价格区间", price_condition.otherwise("未知")).filter(col("价格区间") != "未知").groupBy("价格区间").agg(count("*").alias("投诉数量"))
    new_model_threshold_months = 12  # reserved threshold, currently unused
    # Early-life complaint profile: ordered monthly counts per model
    complaint_with_months = complaint_df.withColumn("投诉年月", regexp_extract(col("投诉日期"), r"(\d{4}-\d{2})", 1))
    model_launch_analysis = complaint_with_months.groupBy("投诉车型", "投诉年月").agg(count("*").alias("月度投诉量"))
    # Note: row order before groupBy is not strictly guaranteed to survive the shuffle
    model_early_complaints = model_launch_analysis.orderBy("投诉车型", "投诉年月").groupBy("投诉车型").agg(collect_list("月度投诉量").alias("投诉序列"))
    top_series = vehicle_series_counts.limit(15).collect()
    series_names = [row["投诉车系"] for row in top_series]
    filtered_year_analysis = model_year_analysis.filter(col("投诉车系").isin(series_names))
    filtered_problem_analysis = series_problem_analysis.filter(col("投诉车系").isin(series_names))
    series_ranking_result = [{"series": row["投诉车系"], "complaint_count": row["投诉总量"]} for row in top_series]
    year_comparison_result = {}
    for row in filtered_year_analysis.collect():
        series = row["投诉车系"]
        if series not in year_comparison_result:
            year_comparison_result[series] = []
        year_comparison_result[series].append({"year": row["车型年款"], "count": row["投诉数量"]})
    problem_analysis_result = {}
    for row in filtered_problem_analysis.collect():
        series = row["投诉车系"]
        if series not in problem_analysis_result:
            problem_analysis_result[series] = []
        problem_analysis_result[series].append({"problem": row["投诉问题"], "frequency": row["问题频次"]})
    price_analysis_result = [{"price_range": row["价格区间"], "complaint_count": row["投诉数量"]} for row in price_complaint_analysis.collect()]
    # A model scores high on early risk when its first three months hold a large share of all its complaints
    early_adoption_result = []
    for row in model_early_complaints.collect():
        complaint_sequence = row["投诉序列"]
        if len(complaint_sequence) >= 3:
            early_risk_score = sum(complaint_sequence[:3]) / sum(complaint_sequence) if sum(complaint_sequence) > 0 else 0
            early_adoption_result.append({"model": row["投诉车型"], "early_risk_score": float(early_risk_score)})
    return JsonResponse({"series_ranking": series_ranking_result, "year_comparison": year_comparison_result, "problem_analysis": problem_analysis_result, "price_analysis": price_analysis_result, "early_adoption_risk": early_adoption_result})

def complaint_text_mining_analysis(request):
    complaint_df = spark.read.option("header", "true").option("inferSchema", "true").csv("hdfs://localhost:9000/car_complaints/complaints_data.csv")
    complaint_texts = complaint_df.select("投诉简述").rdd.map(lambda row: row[0]).filter(lambda x: x is not None).collect()
    # Word frequencies after jieba segmentation, with common stopwords removed
    stopwords = {"的", "了", "在", "和", "与", "或", "但", "然", "而", "就", "都", "很", "更", "最", "非常", "特别", "十分"}
    all_words = []
    for text in complaint_texts:
        words = jieba.cut(text)
        filtered_words = [word for word in words if len(word) > 1 and word not in stopwords]
        all_words.extend(filtered_words)
    word_freq = Counter(all_words)
    top_keywords = word_freq.most_common(50)
    # Crude keyword-based sentiment: positive hits minus negative hits per text
    sentiment_positive_keywords = ["满意", "不错", "好评", "赞", "优秀", "完美", "棒", "给力"]
    sentiment_negative_keywords = ["不满", "差", "烂", "垃圾", "失望", "愤怒", "恶心", "坑", "黑心", "欺骗"]
    complaint_sentiment_scores = []
    for text in complaint_texts:
        positive_count = sum(1 for keyword in sentiment_positive_keywords if keyword in text)
        negative_count = sum(1 for keyword in sentiment_negative_keywords if keyword in text)
        complaint_sentiment_scores.append(positive_count - negative_count)
    avg_sentiment = np.mean(complaint_sentiment_scores)
    sentiment_distribution = {"positive": sum(1 for score in complaint_sentiment_scores if score > 0), "neutral": sum(1 for score in complaint_sentiment_scores if score == 0), "negative": sum(1 for score in complaint_sentiment_scores if score < 0)}
    # Flag complaints mentioning safety-related keywords
    safety_keywords = ["安全", "事故", "危险", "碰撞", "制动", "刹车", "转向", "失控", "爆胎", "漏油", "起火", "自燃"]
    safety_related_complaints = []
    for i, text in enumerate(complaint_texts):
        safety_mentions = [keyword for keyword in safety_keywords if keyword in text]
        if safety_mentions:
            safety_related_complaints.append({"text_snippet": text[:100], "safety_keywords": safety_mentions, "sentiment_score": complaint_sentiment_scores[i]})
    # Flag complaints about service attitude
    service_attitude_keywords = ["态度", "服务", "客服", "销售", "维修", "推诿", "敷衍", "冷漠", "热情", "专业", "耐心"]
    service_complaints = []
    for text in complaint_texts:
        service_mentions = [keyword for keyword in service_attitude_keywords if keyword in text]
        if service_mentions:
            service_complaints.append({"text": text[:150], "service_keywords": service_mentions})
    # Count which resolutions complainants ask for
    solution_keywords = ["退货", "退款", "换货", "维修", "赔偿", "道歉", "解释", "处理", "解决", "改进"]
    solution_expectations = Counter()
    for text in complaint_texts:
        for keyword in solution_keywords:
            if keyword in text:
                solution_expectations[keyword] += 1
    # Near-duplicate detection via character-set overlap; O(n^2) pairwise
    # scan, acceptable for demo-sized datasets only
    complaint_df_with_id = complaint_df.withColumn("complaint_id", col("投诉编号"))
    text_similarity_threshold = 0.8
    potential_duplicates = []
    processed_complaints = complaint_df_with_id.select("complaint_id", "投诉简述").collect()
    for i in range(len(processed_complaints)):
        for j in range(i + 1, len(processed_complaints)):
            text1 = processed_complaints[i]["投诉简述"] or ""
            text2 = processed_complaints[j]["投诉简述"] or ""
            if len(text1) > 20 and len(text2) > 20:
                common_chars = set(text1) & set(text2)
                denom = max(len(set(text1)), len(set(text2)))
                similarity = len(common_chars) / denom if denom > 0 else 0
                if similarity > text_similarity_threshold:
                    potential_duplicates.append({"complaint1": processed_complaints[i]["complaint_id"], "complaint2": processed_complaints[j]["complaint_id"], "similarity": similarity})
    return JsonResponse({"keyword_analysis": [{"word": word, "frequency": freq} for word, freq in top_keywords], "sentiment_analysis": {"average_sentiment": float(avg_sentiment), "distribution": sentiment_distribution}, "safety_hazard_analysis": safety_related_complaints[:20], "service_attitude_analysis": service_complaints[:15], "solution_expectation_analysis": [{"solution": solution, "frequency": freq} for solution, freq in solution_expectations.most_common()], "duplicate_detection": potential_duplicates[:10]})
```
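The keyword-counting sentiment score used in `complaint_text_mining_analysis` above can be pulled out into a plain function and unit-tested without Spark or Django. This is a sketch with shortened keyword lists; the full lists live in the view itself.

```python
def keyword_sentiment(text, positive, negative):
    """Score = (# positive keywords present) - (# negative keywords present)."""
    pos = sum(1 for kw in positive if kw in text)
    neg = sum(1 for kw in negative if kw in text)
    return pos - neg

# Shortened keyword lists for illustration
positive_kw = ["满意", "不错"]
negative_kw = ["失望", "欺骗"]

score = keyword_sentiment("服务态度让人失望,感觉被欺骗", positive_kw, negative_kw)
```

Isolating the scoring like this keeps the heuristic easy to tune independently of the Spark pipeline.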
Car Brand Complaint Data Analysis and Visualization System - Closing Remarks
A big-data engineer's recommended CS capstone: implementing a car brand complaint data analysis and visualization system
Analyzing 300,000+ car complaint records in seconds: designing a Hadoop-based brand complaint visualization system
The hottest big-data stack of 2026: car complaint analysis with Hadoop + Spark + Vue
If this helped, remember to like, coin, and favorite, and hit follow so you don't lose your way while learning! If you run into technical problems or want the source code, leave a comment!