【Data Analysis】Big-Data-Based Shanghai Restaurant Data Visualization and Analysis System | Hands-On Big Data Graduation Project | Recommended Topic | Visualization Dashboard | Hadoop Spark Java Python


💖💖Author: 计算机毕业设计江挽 💙💙About me: I have long worked in computer-science training and genuinely enjoy teaching. My languages include Java, WeChat Mini Programs, Python, Golang, and Android, and my projects span big data, deep learning, websites, mini programs, Android apps, and algorithms. I also take on custom project development, code walkthroughs, thesis-defense coaching, and documentation writing, and I know some techniques for reducing plagiarism-check scores. I like sharing solutions to problems I run into during development and exchanging ideas about technology, so feel free to ask me about code and technical issues! 💛💛A word of thanks: thank you all for your attention and support! 💜💜 Website projects · Android/Mini Program projects · Big data projects · Deep learning projects

Introduction to the Big-Data-Based Shanghai Restaurant Data Visualization and Analysis System

The big-data-based Shanghai restaurant data visualization and analysis system is a graduation design project that combines Hadoop and Spark to mine and visualize massive data from Shanghai's catering industry. Hadoop provides the distributed storage foundation: HDFS manages restaurant operating data, consumer review data, and market competition data, while Spark's in-memory computing and Spark SQL's structured queries process and analyze millions of restaurant records quickly. On the implementation side, the system offers two back-end options, Python + Django and Java + SpringBoot; the front end is built with Vue + ElementUI, and Echarts renders the data as dynamic visualizations. The core features cover four modules: restaurant distribution analysis, consumption analysis, quality analysis, and competition analysis. Together they give users an intuitive view of the geographic distribution of Shanghai's restaurants, consumer behavior patterns, service quality levels, and the competitive landscape, providing data support for restaurant business decisions and market research, and giving computer-science students a hands-on platform for applying big data theory to a real-world scenario.
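The JSON produced by the back-end analysis endpoints is rendered by Echarts on the Vue front end. As an illustration of that hand-off (the helper below is hypothetical, not part of the project code), a few lines of Python can reshape a list of row dictionaries, such as the district distribution results, into the xAxis/series arrays an Echarts bar chart expects:

```python
# Hypothetical helper: reshape row dicts like
# [{"district": "浦东新区", "restaurant_count": 1200}, ...]
# into a minimal Echarts bar-chart option structure.
def rows_to_echarts_bar(rows, label_key, value_key):
    labels = [r[label_key] for r in rows]
    values = [r[value_key] for r in rows]
    return {
        "xAxis": {"type": "category", "data": labels},
        "yAxis": {"type": "value"},
        "series": [{"type": "bar", "data": values}],
    }

sample = [
    {"district": "浦东新区", "restaurant_count": 1200},
    {"district": "黄浦区", "restaurant_count": 800},
]
option = rows_to_echarts_bar(sample, "district", "restaurant_count")
```

On the front end, this dictionary can be serialized to JSON and passed directly to `echarts.setOption`.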

Demo Video of the Big-Data-Based Shanghai Restaurant Data Visualization and Analysis System

Demo video

Demo Screenshots of the Big-Data-Based Shanghai Restaurant Data Visualization and Analysis System

(System screenshots)

Code Showcase of the Big-Data-Based Shanghai Restaurant Data Visualization and Analysis System

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, count, avg, sum, desc, when, row_number, dense_rank
from pyspark.sql.window import Window
from django.http import JsonResponse
from django.views import View

# Shared SparkSession: small shuffle partition count and modest executor memory for a single-node setup
spark = SparkSession.builder \
    .appName("ShanghaiRestaurantAnalysis") \
    .config("spark.sql.shuffle.partitions", "10") \
    .config("spark.executor.memory", "2g") \
    .getOrCreate()

class RestaurantDistributionAnalysis(View):
    def post(self, request):
        district = request.POST.get('district', None)
        category = request.POST.get('category', None)
        # Load the restaurant master table from MySQL via JDBC
        df = spark.read.format("jdbc") \
            .option("url", "jdbc:mysql://localhost:3306/restaurant_db") \
            .option("driver", "com.mysql.cj.jdbc.Driver") \
            .option("dbtable", "restaurant_info") \
            .option("user", "root") \
            .option("password", "123456") \
            .load()
        # Restrict to Shanghai, then apply the optional district/category filters
        df_filtered = df.filter(col("city") == "上海")
        if district:
            df_filtered = df_filtered.filter(col("district") == district)
        if category:
            df_filtered = df_filtered.filter(col("category") == category)
        # Restaurant counts per district and per cuisine category
        district_distribution = df_filtered.groupBy("district").agg(count("restaurant_id").alias("restaurant_count")).orderBy(desc("restaurant_count"))
        category_distribution = df_filtered.groupBy("category").agg(count("restaurant_id").alias("category_count")).orderBy(desc("category_count"))
        # Cross-tabulation of district x category
        district_category_cross = df_filtered.groupBy("district", "category").agg(count("restaurant_id").alias("count")).orderBy(desc("count"))
        # Average per-person price by district
        avg_price_by_district = df_filtered.groupBy("district").agg(avg("avg_price").alias("avg_district_price")).orderBy(desc("avg_district_price"))
        # Simple density score: restaurant count weighted by average rating
        density_analysis = df_filtered.groupBy("district").agg(count("restaurant_id").alias("total_count"), avg("rating").alias("avg_rating")).withColumn("density_score", col("total_count") * col("avg_rating") / 100).orderBy(desc("density_score"))
        district_list = [row.asDict() for row in district_distribution.collect()]
        category_list = [row.asDict() for row in category_distribution.collect()]
        cross_list = [row.asDict() for row in district_category_cross.collect()]
        price_list = [row.asDict() for row in avg_price_by_district.collect()]
        density_list = [row.asDict() for row in density_analysis.collect()]
        result = {"district_distribution": district_list, "category_distribution": category_list, "district_category_cross": cross_list, "avg_price_by_district": price_list, "density_analysis": density_list}
        return JsonResponse({"code": 200, "msg": "Distribution analysis complete", "data": result})
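The groupBy aggregations above are easiest to verify on a handful of rows before running against the full dataset. A pure-Python mirror of the district count (a testing sketch, not part of the project code) reproduces the same grouping and descending sort:

```python
from collections import Counter

# Pure-Python mirror of:
#   df.groupBy("district").agg(count("restaurant_id")).orderBy(desc(...))
def district_counts(rows):
    counts = Counter(r["district"] for r in rows)
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)

sample = [
    {"district": "徐汇区"},
    {"district": "徐汇区"},
    {"district": "静安区"},
]
```

Running both versions on the same sample and comparing outputs is a cheap way to catch mistakes in the Spark expressions.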

class RestaurantConsumptionAnalysis(View):
    def post(self, request):
        start_date = request.POST.get('start_date')
        end_date = request.POST.get('end_date')
        price_range = request.POST.get('price_range', None)
        # Load order records and the restaurant master table via JDBC
        df_orders = spark.read.format("jdbc") \
            .option("url", "jdbc:mysql://localhost:3306/restaurant_db") \
            .option("driver", "com.mysql.cj.jdbc.Driver") \
            .option("dbtable", "order_records") \
            .option("user", "root") \
            .option("password", "123456") \
            .load()
        df_restaurants = spark.read.format("jdbc") \
            .option("url", "jdbc:mysql://localhost:3306/restaurant_db") \
            .option("driver", "com.mysql.cj.jdbc.Driver") \
            .option("dbtable", "restaurant_info") \
            .option("user", "root") \
            .option("password", "123456") \
            .load()
        # Keep orders inside the requested date window, then attach restaurant attributes
        df_orders_filtered = df_orders.filter((col("order_date") >= start_date) & (col("order_date") <= end_date))
        df_joined = df_orders_filtered.join(df_restaurants, "restaurant_id", "left")
        if price_range:
            # price_range arrives as "low-high", e.g. "50-100"
            price_bounds = price_range.split("-")
            df_joined = df_joined.filter((col("order_amount") >= int(price_bounds[0])) & (col("order_amount") <= int(price_bounds[1])))
        # Daily totals, order counts, and average ticket size
        daily_consumption = df_joined.groupBy("order_date").agg(sum("order_amount").alias("daily_total"), count("order_id").alias("daily_count"), avg("order_amount").alias("daily_avg")).orderBy("order_date")
        # Intra-day pattern by hour
        hourly_consumption = df_joined.groupBy("order_hour").agg(count("order_id").alias("hourly_count"), avg("order_amount").alias("hourly_avg")).orderBy("order_hour")
        # Bucket orders into four spending tiers
        price_range_distribution = df_joined.withColumn("price_level", when(col("order_amount") < 50, "低消费").when((col("order_amount") >= 50) & (col("order_amount") < 100), "中消费").when((col("order_amount") >= 100) & (col("order_amount") < 200), "高消费").otherwise("奢侈消费")).groupBy("price_level").agg(count("order_id").alias("count"), sum("order_amount").alias("total_amount")).orderBy(desc("total_amount"))
        # District ranking by total revenue, with average ticket per order
        district_consumption_rank = df_joined.groupBy("district").agg(sum("order_amount").alias("district_total"), count("order_id").alias("district_count")).withColumn("avg_per_order", col("district_total") / col("district_count")).orderBy(desc("district_total"))
        category_consumption = df_joined.groupBy("category").agg(sum("order_amount").alias("category_total"), count("order_id").alias("category_count"), avg("order_amount").alias("category_avg")).orderBy(desc("category_total"))
        # Top 100 users by total spend; the single-partition window is acceptable because groupBy has already reduced the data
        user_consumption_window = Window.orderBy(desc("user_total"))
        top_users = df_joined.groupBy("user_id").agg(sum("order_amount").alias("user_total"), count("order_id").alias("user_count")).withColumn("rank", row_number().over(user_consumption_window)).filter(col("rank") <= 100)
        daily_list = [row.asDict() for row in daily_consumption.collect()]
        hourly_list = [row.asDict() for row in hourly_consumption.collect()]
        price_level_list = [row.asDict() for row in price_range_distribution.collect()]
        district_rank_list = [row.asDict() for row in district_consumption_rank.collect()]
        category_list = [row.asDict() for row in category_consumption.collect()]
        top_users_list = [row.asDict() for row in top_users.collect()]
        result = {"daily_consumption": daily_list, "hourly_consumption": hourly_list, "price_distribution": price_level_list, "district_rank": district_rank_list, "category_consumption": category_list, "top_users": top_users_list}
        return JsonResponse({"code": 200, "msg": "Consumption analysis complete", "data": result})
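The four-tier price bucketing built with the `when()` chain above is easy to get subtly wrong at the boundaries. A pure-Python mirror of the same thresholds (a unit-testing sketch, not part of the project code) makes the boundary behavior explicit:

```python
# Mirror of the when() chain:
#   amount < 50          -> 低消费 (low)
#   50 <= amount < 100   -> 中消费 (mid)
#   100 <= amount < 200  -> 高消费 (high)
#   amount >= 200        -> 奢侈消费 (luxury)
def price_level(amount):
    if amount < 50:
        return "低消费"
    elif amount < 100:
        return "中消费"
    elif amount < 200:
        return "高消费"
    return "奢侈消费"
```

Checking the edge values 50, 100, and 200 against this reference confirms that each boundary falls into the higher tier, matching the `>=` conditions in the Spark expression.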

class RestaurantQualityAnalysis(View):
    def post(self, request):
        min_rating = float(request.POST.get('min_rating', 0))
        district = request.POST.get('district', None)
        # Load review records and restaurant info via JDBC
        df_reviews = spark.read.format("jdbc") \
            .option("url", "jdbc:mysql://localhost:3306/restaurant_db") \
            .option("driver", "com.mysql.cj.jdbc.Driver") \
            .option("dbtable", "review_records") \
            .option("user", "root") \
            .option("password", "123456") \
            .load()
        df_restaurants = spark.read.format("jdbc") \
            .option("url", "jdbc:mysql://localhost:3306/restaurant_db") \
            .option("driver", "com.mysql.cj.jdbc.Driver") \
            .option("dbtable", "restaurant_info") \
            .option("user", "root") \
            .option("password", "123456") \
            .load()
        df_joined = df_reviews.join(df_restaurants, "restaurant_id", "left")
        df_filtered = df_joined.filter(col("rating") >= min_rating)
        if district:
            df_filtered = df_filtered.filter(col("district") == district)
        # Histogram of rating values
        rating_distribution = df_filtered.groupBy("rating").agg(count("review_id").alias("rating_count")).orderBy("rating")
        avg_rating_by_category = df_filtered.groupBy("category").agg(avg("rating").alias("avg_category_rating"), count("review_id").alias("review_count")).orderBy(desc("avg_category_rating"))
        avg_rating_by_district = df_filtered.groupBy("district").agg(avg("rating").alias("avg_district_rating"), count("review_id").alias("review_count")).orderBy(desc("avg_district_rating"))
        # Map ratings to five quality tiers
        quality_level_distribution = df_filtered.withColumn("quality_level", when(col("rating") >= 4.5, "优秀").when((col("rating") >= 4.0) & (col("rating") < 4.5), "良好").when((col("rating") >= 3.5) & (col("rating") < 4.0), "中等").when((col("rating") >= 3.0) & (col("rating") < 3.5), "一般").otherwise("较差")).groupBy("quality_level").agg(count("review_id").alias("count")).orderBy(desc("count"))
        # Top 50 restaurants by average rating, requiring at least 10 reviews to avoid small-sample bias
        restaurant_rating_window = Window.orderBy(desc("restaurant_avg_rating"))
        top_restaurants = df_filtered.groupBy("restaurant_id", "restaurant_name", "district", "category").agg(avg("rating").alias("restaurant_avg_rating"), count("review_id").alias("review_count")).filter(col("review_count") >= 10).withColumn("rank", dense_rank().over(restaurant_rating_window)).filter(col("rank") <= 50)
        # Naive word count over review text; whitespace splitting only works for pre-tokenized text,
        # Chinese reviews would first need a segmenter such as jieba
        keyword_analysis = df_filtered.filter(col("review_content").isNotNull()).select("review_content").rdd.flatMap(lambda row: row[0].split()).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b).sortBy(lambda x: x[1], ascending=False).take(100)
        # Per-restaurant averages for the three sub-ratings
        service_score = df_filtered.filter(col("service_rating").isNotNull()).groupBy("restaurant_id").agg(avg("service_rating").alias("avg_service")).orderBy(desc("avg_service"))
        taste_score = df_filtered.filter(col("taste_rating").isNotNull()).groupBy("restaurant_id").agg(avg("taste_rating").alias("avg_taste")).orderBy(desc("avg_taste"))
        environment_score = df_filtered.filter(col("environment_rating").isNotNull()).groupBy("restaurant_id").agg(avg("environment_rating").alias("avg_environment")).orderBy(desc("avg_environment"))
        rating_dist_list = [row.asDict() for row in rating_distribution.collect()]
        category_rating_list = [row.asDict() for row in avg_rating_by_category.collect()]
        district_rating_list = [row.asDict() for row in avg_rating_by_district.collect()]
        quality_level_list = [row.asDict() for row in quality_level_distribution.collect()]
        top_restaurant_list = [row.asDict() for row in top_restaurants.collect()]
        keyword_list = [{"keyword": item[0], "count": item[1]} for item in keyword_analysis]
        service_list = [row.asDict() for row in service_score.take(20)]
        taste_list = [row.asDict() for row in taste_score.take(20)]
        environment_list = [row.asDict() for row in environment_score.take(20)]
        result = {"rating_distribution": rating_dist_list, "category_rating": category_rating_list, "district_rating": district_rating_list, "quality_level": quality_level_list, "top_restaurants": top_restaurant_list, "hot_keywords": keyword_list, "service_ranking": service_list, "taste_ranking": taste_list, "environment_ranking": environment_list}
        return JsonResponse({"code": 200, "msg": "Quality analysis complete", "data": result})
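All three views repeat the same six JDBC options with only the table name changing. A small module-level helper (a refactoring sketch under the same connection assumptions, a localhost MySQL with root/123456 credentials) would remove that duplication:

```python
# Shared JDBC connection settings used by every read in the views above
JDBC_OPTIONS = {
    "url": "jdbc:mysql://localhost:3306/restaurant_db",
    "driver": "com.mysql.cj.jdbc.Driver",
    "user": "root",
    "password": "123456",
}

def jdbc_options(table):
    # Merge the shared connection settings with the table to read
    return {**JDBC_OPTIONS, "dbtable": table}

# Intended usage inside a view (requires a live SparkSession):
# df = spark.read.format("jdbc").options(**jdbc_options("restaurant_info")).load()
```

Centralizing the settings also makes it easy to move the credentials out of source code, for example into Django settings or environment variables.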

Documentation Showcase of the Big-Data-Based Shanghai Restaurant Data Visualization and Analysis System

(Documentation screenshot)
