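1. About the Author
Author: 计算机编程果茶熊
About me: I spent years teaching computer science training courses and working as a programming instructor, and I enjoy teaching. My main areas are Java, WeChat Mini Programs, Python, Golang, and Android. I take on custom project development, code walkthroughs, thesis-defense coaching, and documentation writing, and I know some techniques for lowering similarity-check scores. I like sharing solutions to problems I run into during development and talking about technology, so feel free to ask me any code-related questions.
A note to readers: thank you for your follows and support!
To get the source code, contact 计算机编程果茶熊.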
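2. System Overview
Big data framework: Hadoop + Spark (Hive requires custom modification)
Languages: Java + Python (both versions are supported)
Database: MySQL
Backend frameworks: SpringBoot (Spring + SpringMVC + Mybatis) + Django (both versions are supported)
Frontend: Vue + Echarts + HTML + CSS + JavaScript + jQuery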
3. Video Walkthrough
4. Feature Showcase
5. Code Excerpts
import pandas as pd
from django.http import JsonResponse
from pyspark.sql import SparkSession
# Note: importing sum here shadows Python's built-in sum() with Spark's column aggregate.
from pyspark.sql.functions import avg, col, count, desc, expr, sum, when

# One SparkSession shared by all views; adaptive query execution is enabled.
spark = (
    SparkSession.builder
    .appName("JiangxiScenicAnalysis")
    .config("spark.sql.adaptive.enabled", "true")
    .getOrCreate()
)
def scenic_distribution_analysis(request):
    """Distribution of scenic spots by city, location, density level, and type."""
    # inferSchema gives numeric columns (e.g. area_size) numeric types instead of strings.
    scenic_df = (spark.read.option("header", "true").option("inferSchema", "true")
                 .csv("hdfs://localhost:9000/scenic_data/scenic_basic_info.csv"))
    # Scenic spots per city, largest first.
    city_distribution = (scenic_df.groupBy("city")
                         .agg(count("scenic_id").alias("scenic_count"))
                         .orderBy(desc("scenic_count")))
    city_pandas = city_distribution.toPandas()
    city_coords_df = (spark.read.option("header", "true").option("inferSchema", "true")
                      .csv("hdfs://localhost:9000/geographic_data/city_coordinates.csv"))
    # The left join keeps cities that have no coordinates on file.
    distribution_with_coords = scenic_df.join(city_coords_df, "city", "left")
    # Per-city totals plus counts by type: 自然景观 (natural), 人文景观 (cultural), 主题乐园 (theme park).
    geo_analysis = (
        distribution_with_coords.select("city", "latitude", "longitude", "scenic_type")
        .groupBy("city", "latitude", "longitude")
        .agg(count("scenic_type").alias("total_scenic"),
             expr("sum(case when scenic_type='自然景观' then 1 else 0 end)").alias("natural_count"),
             expr("sum(case when scenic_type='人文景观' then 1 else 0 end)").alias("cultural_count"),
             expr("sum(case when scenic_type='主题乐园' then 1 else 0 end)").alias("theme_park_count")))
    geo_pandas = geo_analysis.toPandas()
    # Three equal-width bins over total_scenic: 低密度 (low), 中密度 (medium), 高密度 (high).
    density_analysis = geo_pandas.copy()
    density_analysis['density_level'] = pd.cut(density_analysis['total_scenic'], bins=3, labels=['低密度', '中密度', '高密度'])
    # observed=False keeps empty density levels in the output.
    regional_clusters = (density_analysis.groupby('density_level', observed=False)
                         .agg({'city': 'count', 'total_scenic': 'sum', 'natural_count': 'sum', 'cultural_count': 'sum'})
                         .reset_index())
    # Count and average area per scenic type.
    type_distribution = (scenic_df.groupBy("scenic_type")
                         .agg(count("scenic_id").alias("type_count"), avg("area_size").alias("avg_area"))
                         .orderBy(desc("type_count")))
    type_pandas = type_distribution.toPandas()
    result_data = {
        'city_distribution': city_pandas.to_dict('records'),
        'geo_distribution': geo_pandas.to_dict('records'),
        'cluster_analysis': regional_clusters.to_dict('records'),
        'type_statistics': type_pandas.to_dict('records'),
    }
    return JsonResponse(result_data, safe=False)
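The `pd.cut(..., bins=3)` call above splits cities into equal-width bins over the observed range of `total_scenic`, so the density labels are relative to the data rather than fixed thresholds. Below is a minimal pure-Python sketch of that idea. The helper name `equal_width_levels` and the sample counts are hypothetical, and the edge handling is simplified compared with pandas.

```python
import math


def equal_width_levels(values, labels):
    """Assign each value to one of len(labels) equal-width, right-closed bins.

    Simplified illustration of equal-width binning; not a drop-in
    replacement for pandas.cut.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        # Degenerate case: no spread, so any label is arbitrary.
        # This sketch simply uses the first one.
        return [labels[0] for _ in values]
    width = (hi - lo) / len(labels)
    levels = []
    for v in values:
        # Bins are (lo, lo+w], (lo+w, lo+2w], ...; the minimum is folded
        # into the first bin.
        idx = math.ceil((v - lo) / width) - 1
        idx = min(max(idx, 0), len(labels) - 1)
        levels.append(labels[idx])
    return levels


# Hypothetical scenic-spot counts for five cities.
counts = [2, 5, 8, 11, 20]
labels = ["低密度", "中密度", "高密度"]  # low, medium, high density
print(equal_width_levels(counts, labels))
```

With these sample counts the bin edges fall at 8 and 14, so the one city with 20 spots pushes three of the other four cities into the low-density bin. If the labels need to be comparable across datasets, explicit bin edges may be a better fit.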
def scenic_price_insight_analysis(request):
    """Ticket price statistics by level, season, city, price band, and value for money."""
    # inferSchema keeps prices numeric, so min/max compare numbers rather than strings.
    price_df = (spark.read.option("header", "true").option("inferSchema", "true")
                .csv("hdfs://localhost:9000/scenic_data/scenic_price_info.csv"))
    basic_df = (spark.read.option("header", "true").option("inferSchema", "true")
                .csv("hdfs://localhost:9000/scenic_data/scenic_basic_info.csv"))
    combined_df = price_df.join(basic_df, "scenic_id", "inner")
    # Price summary per scenic level.
    price_stats = combined_df.groupBy("scenic_level").agg(
        avg("current_price").alias("avg_price"),
        expr("percentile_approx(current_price, 0.5)").alias("median_price"),
        expr("min(current_price)").alias("min_price"),
        expr("max(current_price)").alias("max_price"),
        count("scenic_id").alias("scenic_count"))
    level_price_pandas = price_stats.toPandas()
    # Gap between peak and off-season prices, absolute and relative to the current price.
    seasonal_analysis = (
        combined_df.select("scenic_id", "scenic_name", "current_price", "peak_season_price", "off_season_price", "scenic_type")
        .withColumn("price_variance", col("peak_season_price") - col("off_season_price"))
        .withColumn("variance_ratio", (col("peak_season_price") - col("off_season_price")) / col("current_price")))
    seasonal_pandas = seasonal_analysis.toPandas()
    # Average price per city, limited to cities with at least 3 scenic spots.
    city_price_comparison = (
        combined_df.groupBy("city")
        .agg(avg("current_price").alias("city_avg_price"), count("scenic_id").alias("city_scenic_count"))
        .filter(col("city_scenic_count") >= 3)
        .orderBy(desc("city_avg_price")))
    city_price_pandas = city_price_comparison.toPandas()
    # Price bands: 低价位 (<50), 中价位 (50-99), 高价位 (100-199), 超高价位 (200 and up).
    price_range_distribution = (
        combined_df.withColumn(
            "price_range",
            when(col("current_price") < 50, "低价位")
            .when((col("current_price") >= 50) & (col("current_price") < 100), "中价位")
            .when((col("current_price") >= 100) & (col("current_price") < 200), "高价位")
            .otherwise("超高价位"))
        .groupBy("price_range", "scenic_type")
        .agg(count("scenic_id").alias("range_count")))
    range_pandas = price_range_distribution.toPandas()
    # Value for money: service rating per unit of price, top 20.
    price_efficiency = (
        combined_df.select("scenic_name", "current_price", "area_size", "service_rating")
        .withColumn("price_per_area", col("current_price") / col("area_size"))
        .withColumn("value_score", col("service_rating") / col("current_price") * 100)
        .orderBy(desc("value_score")))
    efficiency_pandas = price_efficiency.limit(20).toPandas()
    result_data = {
        'level_pricing': level_price_pandas.to_dict('records'),
        'seasonal_trends': seasonal_pandas.to_dict('records'),
        'city_comparison': city_price_pandas.to_dict('records'),
        'range_distribution': range_pandas.to_dict('records'),
        'value_ranking': efficiency_pandas.to_dict('records'),
    }
    return JsonResponse(result_data, safe=False)
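The price bands in the `when` chain above are half-open intervals: below 50, 50 up to (not including) 100, 100 up to (not including) 200, and 200 or more. Here is a small standalone sketch of the same boundaries in plain Python. The function name `price_range` and the sample prices are hypothetical. One point to watch: in the Spark version a null `current_price` fails every comparison and ends up in the `otherwise` branch, i.e. it is labeled 超高价位. This sketch deliberately returns `None` for missing prices instead.

```python
def price_range(price):
    """Map a ticket price to the four bands used in the Spark `when` chain."""
    if price is None:
        # Deliberate deviation from the listing: the Spark chain would
        # fall through to "超高价位" for a null price.
        return None
    if price < 50:
        return "低价位"    # low
    if price < 100:
        return "中价位"    # medium
    if price < 200:
        return "高价位"    # high
    return "超高价位"      # very high


# Hypothetical prices chosen to sit on the band boundaries.
for p in (49.9, 50, 100, 200, None):
    print(p, price_range(p))
```

If missing prices can occur in the real data, filtering them out before bucketing (or giving them their own label) would keep them from inflating the top band.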
def scenic_popularity_ranking_analysis(request):
    """Popularity ranking by visitors, satisfaction, reviews, season, growth, and city."""
    visit_df = (spark.read.option("header", "true").option("inferSchema", "true")
                .csv("hdfs://localhost:9000/scenic_data/scenic_visit_data.csv"))
    basic_df = (spark.read.option("header", "true").option("inferSchema", "true")
                .csv("hdfs://localhost:9000/scenic_data/scenic_basic_info.csv"))
    review_df = (spark.read.option("header", "true").option("inferSchema", "true")
                 .csv("hdfs://localhost:9000/scenic_data/scenic_review_data.csv"))
    comprehensive_df = visit_df.join(basic_df, "scenic_id", "inner").join(review_df, "scenic_id", "inner")
    # Per-spot averages and review totals.
    monthly_popularity = comprehensive_df.groupBy("scenic_id", "scenic_name", "city", "scenic_type").agg(
        avg("monthly_visitors").alias("avg_monthly_visitors"),
        avg("satisfaction_score").alias("avg_satisfaction"),
        sum("review_count").alias("total_reviews"),
        avg("recommendation_rate").alias("avg_recommendation"))
    # Weighted index: visitors 0.4, satisfaction 0.3, reviews 0.2, recommendation 0.1.
    popularity_score = monthly_popularity.withColumn(
        "popularity_index",
        col("avg_monthly_visitors") * 0.4 + col("avg_satisfaction") * 20 * 0.3
        + col("total_reviews") * 0.2 + col("avg_recommendation") * 10 * 0.1,
    ).orderBy(desc("popularity_index"))
    top_rankings = popularity_score.limit(50).toPandas()
    # Average visitors and satisfaction per scenic type.
    type_popularity = (
        comprehensive_df.groupBy("scenic_type")
        .agg(avg("monthly_visitors").alias("type_avg_visitors"),
             avg("satisfaction_score").alias("type_avg_satisfaction"),
             count("scenic_id").alias("type_scenic_count"))
        .orderBy(desc("type_avg_visitors")))
    type_pandas = type_popularity.toPandas()
    # Peak season: 夏季 (summer), 春季 (spring), 秋季 (autumn), 冬季 (winter).
    # Ties and rows with missing counts fall through to 冬季.
    seasonal_visitor_trends = (
        comprehensive_df.select("scenic_name", "spring_visitors", "summer_visitors", "autumn_visitors", "winter_visitors", "scenic_type")
        .withColumn(
            "peak_season",
            when((col("summer_visitors") > col("spring_visitors")) & (col("summer_visitors") > col("autumn_visitors")) & (col("summer_visitors") > col("winter_visitors")), "夏季")
            .when((col("spring_visitors") > col("summer_visitors")) & (col("spring_visitors") > col("autumn_visitors")) & (col("spring_visitors") > col("winter_visitors")), "春季")
            .when((col("autumn_visitors") > col("spring_visitors")) & (col("autumn_visitors") > col("summer_visitors")) & (col("autumn_visitors") > col("winter_visitors")), "秋季")
            .otherwise("冬季")))
    seasonal_pandas = seasonal_visitor_trends.toPandas()
    # Year-over-year growth in percent; rows without a positive baseline are dropped first.
    growth_analysis = (
        comprehensive_df.select("scenic_id", "scenic_name", "current_year_visitors", "last_year_visitors")
        .filter(col("last_year_visitors") > 0)
        .withColumn("growth_rate", (col("current_year_visitors") - col("last_year_visitors")) / col("last_year_visitors") * 100)
        .orderBy(desc("growth_rate")))
    growth_pandas = growth_analysis.limit(30).toPandas()
    # Visitors per scenic spot for each city.
    regional_hotspots = (
        comprehensive_df.groupBy("city")
        .agg(sum("monthly_visitors").alias("city_total_visitors"),
             avg("satisfaction_score").alias("city_avg_satisfaction"),
             count("scenic_id").alias("city_scenic_count"))
        .withColumn("regional_appeal", col("city_total_visitors") / col("city_scenic_count"))
        .orderBy(desc("regional_appeal")))
    regional_pandas = regional_hotspots.toPandas()
    result_data = {
        'comprehensive_ranking': top_rankings.to_dict('records'),
        'type_analysis': type_pandas.to_dict('records'),
        'seasonal_patterns': seasonal_pandas.to_dict('records'),
        'growth_trends': growth_pandas.to_dict('records'),
        'regional_hotspots': regional_pandas.to_dict('records'),
    }
    return JsonResponse(result_data, safe=False)
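To see how the `popularity_index` weights above play out, here is a small standalone arithmetic sketch that breaks the index into its four terms. The function name `popularity_terms` and the input values are hypothetical. The rating scales (satisfaction out of 5, recommendation out of 10) are assumptions suggested by the `* 20` and `* 10` factors; the listing does not state them.

```python
import math


def popularity_terms(avg_monthly_visitors, avg_satisfaction,
                     total_reviews, avg_recommendation):
    """Return the four weighted terms of popularity_index (weights from the listing)."""
    return {
        "visitors": avg_monthly_visitors * 0.4,
        "satisfaction": avg_satisfaction * 20 * 0.3,
        "reviews": total_reviews * 0.2,
        "recommendation": avg_recommendation * 10 * 0.1,
    }


# Hypothetical scenic spot; the 5-point and 10-point scales are assumptions.
terms = popularity_terms(10000, 4.5, 2000, 9.0)
index = math.fsum(terms.values())
for name, value in terms.items():
    print(f"{name}: {value:.1f} ({value / index:.1%} of the index)")
print(f"popularity_index: {index:.1f}")
```

With these inputs the visitor term is about 4000 and the satisfaction term about 27, so the ranking is driven almost entirely by raw visitor and review counts. Whether that is intended depends on the real data scales, which are not given here.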
6. Documentation Excerpts
7. END