[Big Data] Global Tourism Experience Evaluation Data Analysis System | Computer Science Graduation Project | Hadoop+Spark Environment Setup | Data Science and Big Data Technology | Source Code + Documentation + Walkthrough Included


1. About the Author

💖💖 Author: 计算机编程果茶熊 💙💙 About me: I have long worked in computer science training and education as a programming instructor, and I enjoy teaching. I am proficient in Java, WeChat Mini Programs, Python, Golang, Android, and several other IT areas. I take on customized project development, code walkthroughs, thesis-defense coaching, and documentation writing, and I also know a few techniques for reducing similarity scores. I like sharing solutions to problems I run into during development and discussing technology, so feel free to ask me about anything code-related! 💛💛 A word of thanks: I appreciate everyone's attention and support! 💜💜 Web projects | Android/Mini Program projects | Big data projects | Graduation project topic selection 💕💕 To get the source code, contact 计算机编程果茶熊 (details at the end of this post)

2. System Overview

Big data framework: Hadoop + Spark (Hive support requires custom modification)
Development language: Java + Python (both versions supported)
Database: MySQL
Backend framework: SpringBoot (Spring + SpringMVC + MyBatis) + Django (both versions supported)
Frontend: Vue + Echarts + HTML + CSS + JavaScript + jQuery

The Global Tourism Experience Evaluation Data Analysis System is a tourism data analytics platform built on a Hadoop + Spark big data architecture. The system uses Python as its core development language: the backend exposes RESTful API services built with Django, the frontend implements a responsive interface with Vue + ElementUI, and data visualization is rendered with Echarts. On the data-processing side, the system stores large volumes of tourism review data in HDFS, uses Spark SQL for efficient querying and cleaning, and combines Pandas and NumPy for deeper data mining and statistical analysis. Its functional modules cover user and permission management, CRUD operations on tourism review data, multi-dimensional analysis of tourist preferences, analysis of tourism spending behavior, mining of satisfaction factors, analysis of tourists' decision paths, and a visualization dashboard. Business data and analysis results are persisted in MySQL, forming a closed loop from data collection and storage through processing to visualization, and providing data support for business decisions in the tourism industry.
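The code excerpts later in this post cover only the analysis endpoints, so as a rough illustration of the HDFS-to-MySQL step described above, here is a minimal sketch. It assumes raw reviews land in HDFS as CSV under a hypothetical path /data/tourism/reviews/ and that the target MySQL table tourism_evaluation uses the column names seen in the analysis code; the path, schema, and credentials are placeholders rather than the project's actual configuration.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_date

spark = SparkSession.builder.appName("TourismReviewETL").getOrCreate()

# Read raw review records from HDFS (hypothetical path and schema).
raw_df = spark.read.option("header", True).csv("hdfs:///data/tourism/reviews/")

# Basic cleaning: drop rows missing key fields, cast types, restrict rating range, deduplicate.
clean_df = (
    raw_df
    .dropna(subset=["destination_name", "rating_score", "evaluation_date"])
    .withColumn("rating_score", col("rating_score").cast("double"))
    .withColumn("evaluation_date", to_date(col("evaluation_date"), "yyyy-MM-dd"))
    .filter((col("rating_score") >= 0) & (col("rating_score") <= 5))
    .dropDuplicates(["user_id", "destination_name", "evaluation_date"])
)

# Persist the cleaned records to MySQL for the Django analysis endpoints to query.
(clean_df.write.format("jdbc")
    .option("url", "jdbc:mysql://localhost:3306/tourism_db")
    .option("driver", "com.mysql.cj.jdbc.Driver")
    .option("dbtable", "tourism_evaluation")
    .option("user", "root")
    .option("password", "root")
    .mode("append")
    .save())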

3. Video Walkthrough

Global Tourism Experience Evaluation Data Analysis System (video)

4. Feature Screenshots

(Seven feature screenshots omitted.)

5. Code Excerpts


from pyspark.sql import SparkSession
from pyspark.sql.functions import col, avg, count, sum, desc, when, concat_ws
from pyspark.sql.window import Window
import pyspark.sql.functions as F
from django.http import JsonResponse
from django.views.decorators.http import require_http_methods
import json
import pandas as pd
import numpy as np
from datetime import datetime
# Shared SparkSession used by all analysis views below.
spark = (
    SparkSession.builder
    .appName("TourismAnalysisSystem")
    .config("spark.sql.warehouse.dir", "/user/hive/warehouse")
    .config("spark.executor.memory", "4g")
    .config("spark.driver.memory", "2g")
    .getOrCreate()
)
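# Tourist preference analysis: filters reviews by date range (and optional region), then
# aggregates destination popularity, top tourism types per age group, seasonal destination
# rankings, companion-type and activity statistics, and returns them as JSON for the charts.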
@require_http_methods(["POST"])
def analyze_tourist_preference(request):
    try:
        params = json.loads(request.body)
        start_date = params.get('start_date')
        end_date = params.get('end_date')
        region = params.get('region', None)
        df = (
            spark.read.format("jdbc")
            .option("url", "jdbc:mysql://localhost:3306/tourism_db")
            .option("driver", "com.mysql.cj.jdbc.Driver")
            .option("dbtable", "tourism_evaluation")
            .option("user", "root")
            .option("password", "root")
            .load()
        )
        df_filtered = df.filter((col("evaluation_date") >= start_date) & (col("evaluation_date") <= end_date))
        if region:
            df_filtered = df_filtered.filter(col("destination_region") == region)
        destination_stats = df_filtered.groupBy("destination_name").agg(count("*").alias("visit_count"),avg("rating_score").alias("avg_rating"),avg("price_level").alias("avg_price")).orderBy(desc("visit_count"))
        age_preference = df_filtered.groupBy("age_group", "tourism_type").agg(count("*").alias("preference_count")).withColumn("rank", F.row_number().over(Window.partitionBy("age_group").orderBy(desc("preference_count")))).filter(col("rank") <= 5)
        season_analysis = df_filtered.withColumn("season", when((col("month") >= 3) & (col("month") <= 5), "春季").when((col("month") >= 6) & (col("month") <= 8), "夏季").when((col("month") >= 9) & (col("month") <= 11), "秋季").otherwise("冬季"))
        season_preference = season_analysis.groupBy("season", "destination_name").agg(count("*").alias("season_visit_count")).withColumn("season_rank", F.row_number().over(Window.partitionBy("season").orderBy(desc("season_visit_count")))).filter(col("season_rank") <= 10)
        travel_companion = df_filtered.groupBy("companion_type").agg(count("*").alias("companion_count"),avg("rating_score").alias("companion_avg_rating")).orderBy(desc("companion_count"))
        activity_preference = df_filtered.groupBy("activity_type").agg(count("*").alias("activity_count"),avg("satisfaction_score").alias("activity_satisfaction")).orderBy(desc("activity_count"))
        dest_pandas = destination_stats.limit(20).toPandas()
        age_pandas = age_preference.toPandas()
        season_pandas = season_preference.toPandas()
        companion_pandas = travel_companion.toPandas()
        activity_pandas = activity_preference.limit(15).toPandas()
        result = {"destination_ranking": dest_pandas.to_dict('records'),"age_preference": age_pandas.to_dict('records'),"season_preference": season_pandas.to_dict('records'),"companion_analysis": companion_pandas.to_dict('records'),"activity_preference": activity_pandas.to_dict('records'),"total_records": df_filtered.count()}
        return JsonResponse({"code": 200, "message": "游客偏好分析完成", "data": result})
    except Exception as e:
        return JsonResponse({"code": 500, "message": f"分析失败: {str(e)}"})
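# Tourism spending analysis: derives a total_cost column, then aggregates spending by region,
# time period (month or quarter), income level, cost tier, and stay duration, plus the average
# cost structure (share of accommodation, food, transport, entertainment, and shopping).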
@require_http_methods(["POST"])
def analyze_consumption_behavior(request):
    try:
        params = json.loads(request.body)
        analysis_dimension = params.get('dimension', 'overall')
        time_range = params.get('time_range', 'month')
        df = (
            spark.read.format("jdbc")
            .option("url", "jdbc:mysql://localhost:3306/tourism_db")
            .option("driver", "com.mysql.cj.jdbc.Driver")
            .option("dbtable", "tourism_evaluation")
            .option("user", "root")
            .option("password", "root")
            .load()
        )
        df = df.withColumn("total_cost", col("accommodation_cost") + col("food_cost") + col("transport_cost") + col("entertainment_cost") + col("shopping_cost"))
        consumption_stats = df.groupBy("destination_region").agg(avg("total_cost").alias("avg_total_cost"),avg("accommodation_cost").alias("avg_accommodation"),avg("food_cost").alias("avg_food"),avg("transport_cost").alias("avg_transport"),avg("entertainment_cost").alias("avg_entertainment"),avg("shopping_cost").alias("avg_shopping"),count("*").alias("sample_count")).orderBy(desc("avg_total_cost"))
        if time_range == 'month':
            time_group = df.groupBy("year", "month").agg(avg("total_cost").alias("period_avg_cost"),sum("total_cost").alias("period_total_cost"),count("*").alias("period_count")).orderBy("year", "month")
        else:
            time_group = df.groupBy("year", "quarter").agg(avg("total_cost").alias("period_avg_cost"),sum("total_cost").alias("period_total_cost"),count("*").alias("period_count")).orderBy("year", "quarter")
        income_consumption = df.groupBy("income_level").agg(avg("total_cost").alias("income_avg_cost"),avg("accommodation_cost").alias("income_avg_accommodation"),avg("shopping_cost").alias("income_avg_shopping"),count("*").alias("income_count")).orderBy("income_level")
        df_with_cost_level = df.withColumn("cost_level", when(col("total_cost") < 3000, "经济型").when((col("total_cost") >= 3000) & (col("total_cost") < 8000), "中档型").when((col("total_cost") >= 8000) & (col("total_cost") < 15000), "高档型").otherwise("奢华型"))
        cost_level_dist = df_with_cost_level.groupBy("cost_level").agg(count("*").alias("level_count"),avg("rating_score").alias("level_avg_rating")).orderBy(when(col("cost_level") == "经济型", 1).when(col("cost_level") == "中档型", 2).when(col("cost_level") == "高档型", 3).otherwise(4))
        stay_consumption = df.groupBy("stay_duration").agg(avg("total_cost").alias("duration_avg_cost"),count("*").alias("duration_count")).orderBy("stay_duration")
        cost_structure = df.select((col("accommodation_cost")/col("total_cost")*100).alias("accommodation_ratio"),(col("food_cost")/col("total_cost")*100).alias("food_ratio"),(col("transport_cost")/col("total_cost")*100).alias("transport_ratio"),(col("entertainment_cost")/col("total_cost")*100).alias("entertainment_ratio"),(col("shopping_cost")/col("total_cost")*100).alias("shopping_ratio"))
        cost_structure_avg = cost_structure.agg(avg("accommodation_ratio").alias("avg_accommodation_ratio"),avg("food_ratio").alias("avg_food_ratio"),avg("transport_ratio").alias("avg_transport_ratio"),avg("entertainment_ratio").alias("avg_entertainment_ratio"),avg("shopping_ratio").alias("avg_shopping_ratio"))
        consumption_pandas = consumption_stats.toPandas()
        time_pandas = time_group.toPandas()
        income_pandas = income_consumption.toPandas()
        cost_level_pandas = cost_level_dist.toPandas()
        stay_pandas = stay_consumption.toPandas()
        structure_pandas = cost_structure_avg.toPandas()
        result = {"region_consumption": consumption_pandas.to_dict('records'),"time_trend": time_pandas.to_dict('records'),"income_consumption": income_pandas.to_dict('records'),"cost_level_distribution": cost_level_pandas.to_dict('records'),"stay_consumption": stay_pandas.to_dict('records'),"cost_structure": structure_pandas.to_dict('records')[0] if len(structure_pandas) > 0 else {}}
        return JsonResponse({"code": 200, "message": "旅游消费分析完成", "data": result})
    except Exception as e:
        return JsonResponse({"code": 500, "message": f"消费分析失败: {str(e)}"})
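# Satisfaction factor analysis: buckets reviews into satisfaction levels, computes the correlation
# between the overall rating and the sub-scores, and summarizes complaint types, highlight features,
# price-level satisfaction, repeat-visitor behavior, and recommendation rates.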
@require_http_methods(["POST"])
def analyze_satisfaction_factors(request):
    try:
        params = json.loads(request.body)
        target_destination = params.get('destination', None)
        df = (
            spark.read.format("jdbc")
            .option("url", "jdbc:mysql://localhost:3306/tourism_db")
            .option("driver", "com.mysql.cj.jdbc.Driver")
            .option("dbtable", "tourism_evaluation")
            .option("user", "root")
            .option("password", "root")
            .load()
        )
        if target_destination:
            df = df.filter(col("destination_name") == target_destination)
        df = df.withColumn("satisfaction_level", when(col("rating_score") >= 4.5, "非常满意").when((col("rating_score") >= 3.5) & (col("rating_score") < 4.5), "满意").when((col("rating_score") >= 2.5) & (col("rating_score") < 3.5), "一般").otherwise("不满意"))
        satisfaction_dist = df.groupBy("satisfaction_level").agg(count("*").alias("level_count")).withColumn("percentage", col("level_count") / sum("level_count").over(Window.partitionBy()) * 100)
        factor_correlation = df.select("rating_score", "service_score", "environment_score", "facility_score", "transportation_score", "cost_performance_score")
        factor_pandas = factor_correlation.toPandas()
        correlation_matrix = factor_pandas.corr()
        factor_importance = correlation_matrix['rating_score'].drop('rating_score').sort_values(ascending=False)
        service_analysis = df.groupBy("service_score").agg(count("*").alias("service_count"),avg("rating_score").alias("service_avg_rating")).orderBy("service_score")
        environment_analysis = df.groupBy("environment_score").agg(count("*").alias("env_count"),avg("rating_score").alias("env_avg_rating")).orderBy("environment_score")
        negative_feedback = df.filter(col("rating_score") < 3.0)
        negative_reasons = negative_feedback.groupBy("complaint_type").agg(count("*").alias("complaint_count")).orderBy(desc("complaint_count"))
        positive_feedback = df.filter(col("rating_score") >= 4.5)
        positive_highlights = positive_feedback.groupBy("highlight_feature").agg(count("*").alias("highlight_count")).orderBy(desc("highlight_count"))
        price_satisfaction = df.groupBy("price_level").agg(avg("rating_score").alias("price_avg_rating"),avg("cost_performance_score").alias("price_avg_cost_perf"),count("*").alias("price_count")).orderBy("price_level")
        repeat_visit = df.groupBy("is_repeat_visitor").agg(count("*").alias("visitor_count"),avg("rating_score").alias("repeat_avg_rating"))
        recommendation_rate = df.groupBy("would_recommend").agg(count("*").alias("recommend_count")).withColumn("recommend_percentage", col("recommend_count") / sum("recommend_count").over(Window.partitionBy()) * 100)
        satisfaction_pandas = satisfaction_dist.toPandas()
        service_pandas = service_analysis.toPandas()
        environment_pandas = environment_analysis.toPandas()
        negative_pandas = negative_reasons.limit(10).toPandas()
        positive_pandas = positive_highlights.limit(10).toPandas()
        price_pandas = price_satisfaction.toPandas()
        repeat_pandas = repeat_visit.toPandas()
        recommend_pandas = recommendation_rate.toPandas()
        result = {"satisfaction_distribution": satisfaction_pandas.to_dict('records'),"factor_importance": factor_importance.to_dict(),"service_impact": service_pandas.to_dict('records'),"environment_impact": environment_pandas.to_dict('records'),"negative_feedback": negative_pandas.to_dict('records'),"positive_highlights": positive_pandas.to_dict('records'),"price_satisfaction": price_pandas.to_dict('records'),"repeat_visitor_analysis": repeat_pandas.to_dict('records'),"recommendation_rate": recommend_pandas.to_dict('records'),"correlation_matrix": correlation_matrix.to_dict()}
        return JsonResponse({"code": 200, "message": "满意度因素分析完成", "data": result})
    except Exception as e:
        return JsonResponse({"code": 500, "message": f"满意度分析失败: {str(e)}"})
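These three views return plain JSON, so exposing them to the Vue + Echarts frontend only requires URL routing. The snippet below is a minimal sketch rather than the project's actual routing: the /api/analysis/ prefix is an assumption, and in practice these POST endpoints would either be exempted from CSRF protection or called with a CSRF token.

# urls.py (sketch): wire the three analysis views to illustrative API paths.
from django.urls import path
from . import views  # the module containing the analysis views above

urlpatterns = [
    path("api/analysis/preference/", views.analyze_tourist_preference),
    path("api/analysis/consumption/", views.analyze_consumption_behavior),
    path("api/analysis/satisfaction/", views.analyze_satisfaction_factors),
]

The Vue pages then POST the selected date range or analysis dimension as JSON to these paths and feed response fields such as destination_ranking and season_preference into Echarts chart options.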

6. Documentation Excerpts

(Documentation screenshot omitted.)

7. END

💕💕 To get the source code, contact 计算机编程果茶熊.