1. About the Author
💖💖 Author: 计算机编程果茶熊
💙💙 About me: I spent years teaching computer science as a programming instructor, and I still enjoy teaching. I work across several IT stacks, including Java, WeChat Mini Programs, Python, Golang, and Android. I take on customized project development, code walkthroughs, thesis-defense coaching, and documentation writing, and I also know some techniques for reducing plagiarism-check scores. I like sharing fixes for problems I hit during development and trading notes on technology, so if you have questions about code, feel free to ask!
💛💛 A word of thanks: thank you all for your follows and support!
💜💜 Website projects · Android/Mini Program projects · Big-data projects · Graduation project topic ideas
💕💕 Contact 计算机编程果茶熊 at the end of this post for the source code
2. System Overview
Big-data stack: Hadoop + Spark (Hive supported via custom modification) · Languages: Java + Python (both versions supported) · Database: MySQL · Back end: SpringBoot (Spring + SpringMVC + MyBatis) and Django (both versions supported) · Front end: Vue + Echarts + HTML + CSS + JavaScript + jQuery
The Starbucks National Store Data Visualization and Analysis System is a chain-store analytics platform built on a big-data architecture. Hadoop provides distributed storage (HDFS holds the bulk store operating data) and Spark serves as the compute engine; Spark SQL and Pandas handle data cleansing and analysis. The back end exposes RESTful endpoints built on Django, and the front end is a Vue + ElementUI management UI that uses Echarts for multidimensional visualization. The core modules cover store information management, brand strategy analysis, performance trend mining, geospatial distribution, and store-type statistics, with a dashboard that presents nationwide store operations at a glance. NumPy handles numerical computation and MySQL persists the structured results. The goal is to give chain operators data to support decisions: managers can examine regional distribution, store-type characteristics, and performance from multiple angles to spot operating patterns and shape differentiated strategies.
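To make the Django + Vue split concrete, here is a minimal sketch of how the Django side might route the three analysis views shown in section 5 to the Vue front end. The module path analysis.views, the URL paths, and the route names are assumptions for illustration, not taken from the project:

# urls.py -- hypothetical wiring of the analysis endpoints (paths and names are assumed)
from django.urls import path
from analysis.views import (
    StorePerformanceAnalysis,
    StoreGeoSpatialAnalysis,
    StoreBrandStrategyAnalysis,
)

urlpatterns = [
    path('api/performance/', StorePerformanceAnalysis.as_view(), name='store-performance'),
    path('api/geo/', StoreGeoSpatialAnalysis.as_view(), name='store-geo'),
    path('api/strategy/', StoreBrandStrategyAnalysis.as_view(), name='brand-strategy'),
]

The Vue + Echarts front end would POST its filter parameters to these endpoints and bind the returned JSON records to chart options.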
3. Video Walkthrough
4. Feature Showcase (Selected)
5. Code Showcase (Selected)
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, avg, count, when, dense_rank, sum as spark_sum
from pyspark.sql.window import Window
from django.http import JsonResponse
from django.views import View
import pandas as pd
import numpy as np

# Shared SparkSession for all analysis views; executor/driver memory sized for a small cluster
spark = SparkSession.builder \
    .appName("StarbucksAnalysis") \
    .config("spark.sql.warehouse.dir", "/user/hive/warehouse") \
    .config("spark.executor.memory", "4g") \
    .config("spark.driver.memory", "2g") \
    .getOrCreate()
class StorePerformanceAnalysis(View):
    def post(self, request):
        # Reporting window and optional region filter posted by the front end
        start_date = request.POST.get('start_date')
        end_date = request.POST.get('end_date')
        region = request.POST.get('region', 'all')
        # Load the sales and store tables from MySQL through the JDBC connector
        df_sales = spark.read.format("jdbc").option("url", "jdbc:mysql://localhost:3306/starbucks") \
            .option("dbtable", "store_sales").option("user", "root").option("password", "password").load()
        df_stores = spark.read.format("jdbc").option("url", "jdbc:mysql://localhost:3306/starbucks") \
            .option("dbtable", "store_info").option("user", "root").option("password", "password").load()
        df_joined = df_sales.join(df_stores, df_sales.store_id == df_stores.id, "left")
        df_filtered = df_joined.filter((col("sale_date") >= start_date) & (col("sale_date") <= end_date))
        if region != 'all':
            df_filtered = df_filtered.filter(col("region") == region)
        # Per-store aggregates: totals, daily averages, and days above the 10,000 sales threshold
        df_aggregated = df_filtered.groupBy("store_id", "store_name", "region", "city").agg(
            spark_sum("amount").alias("total_sales"),
            avg("amount").alias("avg_daily_sales"),
            count("sale_date").alias("operating_days"),
            spark_sum(when(col("amount") > 10000, 1).otherwise(0)).alias("high_performance_days"))
        # Daily totals per store, used below to compute day-over-day growth in pandas
        df_growth = df_filtered.groupBy("store_id", "sale_date").agg(
            spark_sum("amount").alias("daily_amount")).orderBy("store_id", "sale_date")
        pdf_growth = df_growth.toPandas()
        pdf_growth['sale_date'] = pd.to_datetime(pdf_growth['sale_date'])
        pdf_growth = pdf_growth.sort_values(['store_id', 'sale_date'])
        pdf_growth['prev_amount'] = pdf_growth.groupby('store_id')['daily_amount'].shift(1)
        pdf_growth['growth_rate'] = ((pdf_growth['daily_amount'] - pdf_growth['prev_amount'])
                                     / pdf_growth['prev_amount'] * 100)
        # A zero or missing previous day yields inf/NaN; normalize both to 0
        pdf_growth['growth_rate'] = pdf_growth['growth_rate'].replace([np.inf, -np.inf], np.nan).fillna(0)
        growth_stats = pdf_growth.groupby('store_id').agg({'growth_rate': ['mean', 'std']}).reset_index()
        growth_stats.columns = ['store_id', 'avg_growth_rate', 'growth_volatility']
        pdf_result = df_aggregated.toPandas()
        pdf_result = pdf_result.merge(growth_stats, on='store_id', how='left')
        pdf_result[['avg_growth_rate', 'growth_volatility']] = pdf_result[['avg_growth_rate', 'growth_volatility']].fillna(0)
        # Composite score: 40% total sales, 30% daily average, 30% average growth, each normalized to its maximum
        pdf_result['performance_score'] = (pdf_result['total_sales'] / pdf_result['total_sales'].max() * 0.4
                                           + pdf_result['avg_daily_sales'] / pdf_result['avg_daily_sales'].max() * 0.3
                                           + pdf_result['avg_growth_rate'] / pdf_result['avg_growth_rate'].max() * 0.3) * 100
        # Bucket the score into four grades for the dashboard
        pdf_result['performance_level'] = pd.cut(pdf_result['performance_score'], bins=[0, 60, 75, 90, 100],
                                                 labels=['needs improvement', 'fair', 'good', 'excellent'])
        result_dict = pdf_result.to_dict('records')
        return JsonResponse({'code': 200, 'data': result_dict, 'message': 'Performance analysis complete'})
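As a quick smoke test, this endpoint can be exercised with Django's built-in test client once the project settings are configured; the URL below assumes the hypothetical /api/performance/ route sketched in section 2:

from django.test import Client

client = Client()
response = client.post('/api/performance/', {
    'start_date': '2024-01-01',
    'end_date': '2024-03-31',
    'region': 'all',
})
payload = response.json()
print(payload['code'], payload['message'])  # expect: 200 Performance analysis complete
print(payload['data'][:3])                  # first three scored stores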
class StoreGeoSpatialAnalysis(View):
    def post(self, request):
        analysis_type = request.POST.get('type', 'density')  # requested analysis mode
        target_city = request.POST.get('city', None)
        df_stores = spark.read.format("jdbc").option("url", "jdbc:mysql://localhost:3306/starbucks") \
            .option("dbtable", "store_info").option("user", "root").option("password", "password").load()
        df_population = spark.read.format("jdbc").option("url", "jdbc:mysql://localhost:3306/starbucks") \
            .option("dbtable", "city_population").option("user", "root").option("password", "password").load()
        # Join population figures onto stores, then count stores per city
        df_with_pop = df_stores.join(df_population, df_stores.city == df_population.city_name, "left")
        city_stats = df_with_pop.groupBy("province", "city", "population").agg(count(df_stores.id).alias("store_count"))
        # Stores per 100,000 residents, bucketed into saturation levels
        city_stats = city_stats.withColumn("density", col("store_count") / col("population") * 100000)
        city_stats = city_stats.withColumn("saturation_level", when(col("density") > 5, "highly saturated")
                                           .when(col("density") > 2, "moderate").otherwise("under-penetrated"))
        pdf_city = city_stats.toPandas()
        pdf_city['population'] = pdf_city['population'].fillna(pdf_city['population'].median())
        pdf_city['density'] = pdf_city['density'].fillna(0)
        # Roll the city figures up to province level
        province_stats = pdf_city.groupby('province').agg({'store_count': 'sum', 'population': 'sum'}).reset_index()
        province_stats['province_density'] = province_stats['store_count'] / province_stats['population'] * 100000
        province_stats = province_stats.sort_values('province_density', ascending=False)
        # Coordinates for the per-city clustering step; drop rows that fail numeric conversion
        pdf_stores_detail = df_stores.toPandas()
        pdf_stores_detail['latitude'] = pd.to_numeric(pdf_stores_detail['latitude'], errors='coerce')
        pdf_stores_detail['longitude'] = pd.to_numeric(pdf_stores_detail['longitude'], errors='coerce')
        pdf_stores_detail = pdf_stores_detail.dropna(subset=['latitude', 'longitude'])
        if target_city:
            city_stores = pdf_stores_detail[pdf_stores_detail['city'] == target_city]
            if len(city_stores) > 1:
                # Haversine great-circle distance between every pair of stores in the city
                coords = city_stores[['latitude', 'longitude']].values
                distances = []
                for i in range(len(coords)):
                    for j in range(i + 1, len(coords)):
                        lat1, lon1 = np.radians(coords[i])
                        lat2, lon2 = np.radians(coords[j])
                        dlat = lat2 - lat1
                        dlon = lon2 - lon1
                        a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
                        c = 2 * np.arcsin(np.sqrt(a))
                        distances.append(6371 * c)  # Earth radius in km
                avg_distance = float(np.mean(distances)) if distances else 0
                min_distance = float(np.min(distances)) if distances else 0
                cluster_analysis = {'city': target_city, 'avg_distance_km': round(avg_distance, 2),
                                    'min_distance_km': round(min_distance, 2),
                                    'cluster_density': 'high' if avg_distance < 2 else 'medium' if avg_distance < 5 else 'low'}
            else:
                cluster_analysis = {'city': target_city, 'message': 'Too few stores to analyze'}
        else:
            cluster_analysis = {}
        return JsonResponse({'code': 200, 'data': {'city_stats': pdf_city.to_dict('records'),
                                                   'province_stats': province_stats.to_dict('records'),
                                                   'cluster_analysis': cluster_analysis},
                            'message': 'Geospatial analysis complete'})
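The nested loop above does O(n²) Python-level work and gets slow for cities with hundreds of stores. The same pairwise haversine distances can be produced in one vectorized NumPy pass; this helper (pairwise_haversine_km is our name, not part of the project) is a drop-in replacement for the loop:

import numpy as np

def pairwise_haversine_km(coords):
    # coords: (n, 2) array of [latitude, longitude] in degrees
    rad = np.radians(coords)
    lat, lon = rad[:, 0], rad[:, 1]
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = np.sin(dlat / 2) ** 2 + np.cos(lat[:, None]) * np.cos(lat[None, :]) * np.sin(dlon / 2) ** 2
    dist = 2 * 6371 * np.arcsin(np.sqrt(a))  # full n x n distance matrix in km
    iu = np.triu_indices(len(coords), k=1)   # keep each unordered pair exactly once
    return dist[iu]

With it, distances = pairwise_haversine_km(coords) feeds np.mean and np.min exactly as before.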
class StoreBrandStrategyAnalysis(View):
    def post(self, request):
        strategy_dimension = request.POST.get('dimension', 'comprehensive')  # requested analysis dimension
        df_stores = spark.read.format("jdbc").option("url", "jdbc:mysql://localhost:3306/starbucks") \
            .option("dbtable", "store_info").option("user", "root").option("password", "password").load()
        df_sales = spark.read.format("jdbc").option("url", "jdbc:mysql://localhost:3306/starbucks") \
            .option("dbtable", "store_sales").option("user", "root").option("password", "password").load()
        df_customer = spark.read.format("jdbc").option("url", "jdbc:mysql://localhost:3306/starbucks") \
            .option("dbtable", "customer_flow").option("user", "root").option("password", "password").load()
        df_full = df_stores.join(df_sales, df_stores.id == df_sales.store_id, "left") \
                           .join(df_customer, df_stores.id == df_customer.store_id, "left")
        # Count on df_stores.id: "store_id" exists in both joined tables and would be ambiguous
        store_type_stats = df_full.groupBy("store_type", "business_district_type").agg(
            count(df_stores.id).alias("store_count"), avg("amount").alias("avg_sales"),
            avg("customer_count").alias("avg_customer"), spark_sum("amount").alias("total_revenue"))
        window_type = Window.partitionBy("store_type").orderBy(col("total_revenue").desc())
        store_type_ranked = store_type_stats.withColumn("type_rank", dense_rank().over(window_type))
        pdf_type = store_type_ranked.toPandas()
        # Revenue share of each business-district type within its store type
        pdf_type['revenue_share'] = pdf_type.groupby('store_type')['total_revenue'].transform(lambda x: x / x.sum() * 100)
        pdf_type['customer_conversion'] = (pdf_type['avg_sales'] / pdf_type['avg_customer']).fillna(0)
        # Which store type performs best in each city tier
        city_level_stats = df_full.groupBy("city_level", "store_type").agg(
            count(df_stores.id).alias("count"), avg("amount").alias("performance")).toPandas()
        pivot_city = city_level_stats.pivot_table(index='city_level', columns='store_type', values='performance', fill_value=0)
        pivot_city['optimal_type'] = pivot_city.idxmax(axis=1)
        pivot_city_reset = pivot_city.reset_index()
        # Monthly sales per store, smoothed with a 3-month moving average to label each store's trend
        time_based = df_sales.withColumn("sale_month", col("sale_date").substr(1, 7)) \
            .groupBy("sale_month", "store_id").agg(spark_sum("amount").alias("monthly_sales")).toPandas()
        time_based['sale_month'] = pd.to_datetime(time_based['sale_month'])
        time_based = time_based.sort_values(['store_id', 'sale_month'])
        time_based['sales_ma3'] = time_based.groupby('store_id')['monthly_sales'].transform(
            lambda x: x.rolling(window=3, min_periods=1).mean())
        time_based['trend'] = time_based.groupby('store_id')['sales_ma3'].transform(
            lambda x: 'up' if x.diff().mean() > 0 else 'down')
        trend_summary = time_based.groupby('store_id').agg({'trend': 'first', 'monthly_sales': 'mean'}).reset_index()
        # Roll store-level trends up to store type so they share a join key with the type-level stats
        store_types = df_stores.select(col("id").alias("store_id"), "store_type").toPandas()
        trend_summary = trend_summary.merge(store_types, on='store_id', how='left')
        type_trends = trend_summary.groupby('store_type').agg(
            trend=('trend', lambda s: 'up' if (s == 'up').mean() >= 0.5 else 'down'),
            monthly_sales=('monthly_sales', 'mean')).reset_index()
        strategy_matrix = pdf_type.merge(type_trends, on='store_type', how='left')
        strategy_matrix['strategy_recommendation'] = strategy_matrix.apply(
            lambda row: 'scale up investment' if row['revenue_share'] > 30 and row['trend'] == 'up'
            else 'optimize operations' if row['revenue_share'] > 20 else 'reassess', axis=1)
        return JsonResponse({'code': 200, 'data': {'store_type_analysis': pdf_type.to_dict('records'),
                                                   'city_level_strategy': pivot_city_reset.to_dict('records'),
                                                   'strategy_matrix': strategy_matrix.head(20).to_dict('records')},
                            'message': 'Brand strategy analysis complete'})
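To see how the 3-month moving-average trend labeling behaves, here is a tiny self-contained example on invented monthly figures (the numbers are illustrative only):

import pandas as pd

toy = pd.DataFrame({
    'store_id':      [1, 1, 1, 1, 2, 2, 2, 2],
    'monthly_sales': [100, 120, 140, 160, 200, 180, 160, 140],
})
toy['sales_ma3'] = toy.groupby('store_id')['monthly_sales'].transform(
    lambda x: x.rolling(window=3, min_periods=1).mean())
toy['trend'] = toy.groupby('store_id')['sales_ma3'].transform(
    lambda x: 'up' if x.diff().mean() > 0 else 'down')
print(toy)
# Store 1's smoothed series rises month over month, so its rows are labeled 'up';
# store 2's falls, so it is labeled 'down'.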
6. Documentation Showcase (Selected)
7. END
💕💕 Contact 计算机编程果茶熊 at the end of this post for the source code