[Big Data] Agricultural Product Data Visualization and Analysis System | Computer Science Graduation Project | Hadoop+Spark Environment Setup | Data Science and Big Data Technology | Source Code + Documentation + Walkthrough Included


1. About the Author

💖💖Author: 计算机编程果茶熊 💙💙About me: I have long worked in computer-science training and teaching, served as a programming instructor, and genuinely enjoy teaching. I specialize in Java, WeChat Mini Programs, Python, Golang, Android, and several other IT areas. I take on custom project development, code walkthroughs, thesis-defense coaching, and documentation writing, and I also know some techniques for reducing text-similarity scores. I like sharing solutions to problems I run into during development and enjoy talking shop, so feel free to ask me anything about code! 💛💛A word of thanks: I appreciate everyone's follows and support! 💜💜 Website practical projects | Android/Mini-Program practical projects | Big-data practical projects | Graduation-project topic selection 💕💕Contact 计算机编程果茶熊 at the end of the article to get the source code

2. System Overview

Big-data framework: Hadoop+Spark (Hive supported with custom modification). Development languages: Java and Python (both versions available). Database: MySQL. Backend frameworks: SpringBoot (Spring+SpringMVC+MyBatis) and Django (both versions available). Frontend: Vue+Echarts+HTML+CSS+JavaScript+jQuery.

The Agricultural Product Data Visualization and Analysis System is an agricultural data-analysis platform built on big-data technology. It adopts Hadoop+Spark as the core big-data processing framework, uses Python as the primary development language, serves its API from a Django backend, and renders an interactive visualization interface with a Vue+ElementUI+Echarts frontend stack. The system stores data in a distributed fashion on HDFS, runs large-scale queries and analysis through Spark SQL, performs fine-grained data manipulation with libraries such as Pandas and NumPy, and persists the processed results in MySQL. Its five core functional modules are price-trend analysis, production-structure analysis, macro-impact analysis, comprehensive correlation analysis, and a visualization dashboard. Together they analyze agricultural market data along multiple dimensions and present it intuitively, helping users understand market dynamics and development trends and providing data support for agricultural decision-making.
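The heart of the price-trend module is a day-over-day change rate followed by a variance-based volatility grade. That logic can be sketched in plain Python, independent of the Spark cluster; the helper names below are illustrative only and not part of the system's code, while the grading thresholds (variance above 100 is "High", above 50 is "Medium") follow the code shown later.

```python
def change_rates(prices):
    """Day-over-day change rate in percent; None for the first day,
    which has no previous price to compare against."""
    rates = [None]
    for prev, cur in zip(prices, prices[1:]):
        rates.append((cur - prev) / prev * 100)
    return rates

def volatility_level(variance):
    """Grade a price variance using the same thresholds as the system."""
    if variance > 100:
        return "High"
    if variance > 50:
        return "Medium"
    return "Low"
```

For example, a move from 100 to 110 grades as roughly a 10% daily change; the Spark job computes the same quantity per product with a `lag` window instead of a Python loop.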

3. Video Walkthrough

Agricultural Product Data Visualization and Analysis System

4. Feature Screenshots

(Screenshots of the system's feature modules are shown here.)

5. Code Excerpts


from pyspark.sql import SparkSession
# Note: sum/max/min here shadow the Python builtins of the same name.
from pyspark.sql.functions import col, avg, sum, max, min, count, when, lag, desc
from pyspark.sql.window import Window
from django.http import JsonResponse
from datetime import datetime, timedelta

def price_trend_analysis(request):
    # One SparkSession per request keeps the demo simple; a shared session
    # would be preferable in production.
    spark = (SparkSession.builder
             .appName("PriceTrendAnalysis")
             .config("spark.sql.adaptive.enabled", "true")
             .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
             .getOrCreate())
    df = (spark.read.format("jdbc")
          .option("url", "jdbc:mysql://localhost:3306/agriculture_data")
          .option("dbtable", "price_data")
          .option("user", "root")
          .option("password", "password")
          .load())
    product_type = request.GET.get('product_type', 'all')
    time_range = int(request.GET.get('time_range', 30))
    end_date = datetime.now()
    start_date = end_date - timedelta(days=time_range)
    filtered_df = df.filter((col("date") >= start_date) & (col("date") <= end_date))
    if product_type != 'all':
        filtered_df = filtered_df.filter(col("product_type") == product_type)
    # Day-over-day change per product via a lag window.
    window_spec = Window.partitionBy("product_type").orderBy("date")
    trend_df = filtered_df.withColumn("prev_price", lag("price", 1).over(window_spec))
    trend_df = trend_df.withColumn("price_change", col("price") - col("prev_price"))
    trend_df = trend_df.withColumn("change_rate", (col("price_change") / col("prev_price")) * 100)
    daily_avg = trend_df.groupBy("date", "product_type").agg(
        avg("price").alias("avg_price"),
        avg("change_rate").alias("avg_change_rate"),
        count("*").alias("data_points"))
    trend_analysis = daily_avg.groupBy("product_type").agg(
        avg("avg_price").alias("period_avg_price"),
        max("avg_price").alias("max_price"),
        min("avg_price").alias("min_price"),
        avg("avg_change_rate").alias("avg_volatility"))
    # Population variance of the change rate via E[x^2] - E[x]^2; the original
    # version nested an aggregate inside an aggregate, which Spark SQL rejects.
    volatility_df = trend_df.groupBy("product_type").agg(
        avg(col("change_rate") * col("change_rate")).alias("mean_sq_change"),
        avg("change_rate").alias("mean_change"))
    volatility_df = volatility_df.withColumn(
        "price_variance",
        col("mean_sq_change") - col("mean_change") ** 2).select("product_type", "price_variance")
    result_df = trend_analysis.join(volatility_df, "product_type")
    result_df = result_df.withColumn("volatility_level",
        when(col("price_variance") > 100, "High")
        .when(col("price_variance") > 50, "Medium")
        .otherwise("Low"))
    pandas_result = result_df.toPandas()
    spark.stop()
    result_data = []
    for _, row in pandas_result.iterrows():
        result_data.append({"product_type": row["product_type"], "avg_price": float(row["period_avg_price"]), "max_price": float(row["max_price"]), "min_price": float(row["min_price"]), "volatility": float(row["avg_volatility"]), "volatility_level": row["volatility_level"], "price_variance": float(row["price_variance"])})
    return JsonResponse({"status": "success", "data": result_data, "analysis_period": f"{time_range} days"})

def production_structure_analysis(request):
    spark = (SparkSession.builder
             .appName("ProductionStructureAnalysis")
             .config("spark.sql.adaptive.enabled", "true")
             .getOrCreate())
    production_df = (spark.read.format("jdbc")
                     .option("url", "jdbc:mysql://localhost:3306/agriculture_data")
                     .option("dbtable", "production_data")
                     .option("user", "root")
                     .option("password", "password")
                     .load())
    region = request.GET.get('region', 'all')
    year = request.GET.get('year', str(datetime.now().year))
    filtered_df = production_df.filter(col("year") == int(year))
    if region != 'all':
        filtered_df = filtered_df.filter(col("region") == region)
    total_production = filtered_df.agg(sum("production_volume").alias("total")).collect()[0]["total"]
    if not total_production:
        # Guard: an empty result set would otherwise cause a division by None below.
        spark.stop()
        return JsonResponse({"status": "error", "message": "no production data for the selected filters"})
    structure_df = filtered_df.groupBy("product_type", "region").agg(
        sum("production_volume").alias("regional_production"),
        avg("yield_per_hectare").alias("avg_yield"),
        sum("planting_area").alias("total_area"))
    structure_df = structure_df.withColumn("production_ratio", (col("regional_production") / total_production) * 100)
    structure_df = structure_df.withColumn("productivity_index", col("avg_yield") / col("total_area"))
    regional_summary = structure_df.groupBy("region").agg(
        sum("regional_production").alias("region_total"),
        avg("avg_yield").alias("region_avg_yield"),
        sum("total_area").alias("region_total_area"))
    regional_summary = regional_summary.withColumn("region_contribution", (col("region_total") / total_production) * 100)
    product_ranking = structure_df.groupBy("product_type").agg(sum("regional_production").alias("product_total")).orderBy(desc("product_total"))
    top_products = product_ranking.limit(10)
    efficiency_analysis = structure_df.withColumn("land_efficiency", col("regional_production") / col("total_area"))
    efficiency_ranking = efficiency_analysis.groupBy("product_type").agg(avg("land_efficiency").alias("avg_efficiency")).orderBy(desc("avg_efficiency"))
    concentration_df = structure_df.groupBy("product_type").agg(
        count("region").alias("production_regions"),
        max("production_ratio").alias("max_regional_share"))
    concentration_df = concentration_df.withColumn("concentration_level",
        when(col("max_regional_share") > 50, "High")
        .when(col("max_regional_share") > 30, "Medium")
        .otherwise("Low"))
    structure_pandas = structure_df.toPandas()
    regional_pandas = regional_summary.toPandas()
    top_pandas = top_products.toPandas()
    efficiency_pandas = efficiency_ranking.toPandas()
    concentration_pandas = concentration_df.toPandas()
    spark.stop()
    structure_result = []
    for _, row in structure_pandas.iterrows():
        structure_result.append({"product_type": row["product_type"], "region": row["region"], "production": int(row["regional_production"]), "ratio": float(row["production_ratio"]), "yield": float(row["avg_yield"]), "area": int(row["total_area"]), "productivity": float(row["productivity_index"])})
    regional_result = [{"region": row["region"], "total_production": int(row["region_total"]), "contribution": float(row["region_contribution"]), "avg_yield": float(row["region_avg_yield"])} for _, row in regional_pandas.iterrows()]
    # The three rankings were previously computed but never returned; include them.
    top_result = [{"product_type": row["product_type"], "total": int(row["product_total"])} for _, row in top_pandas.iterrows()]
    efficiency_result = [{"product_type": row["product_type"], "avg_efficiency": float(row["avg_efficiency"])} for _, row in efficiency_pandas.iterrows()]
    concentration_result = [{"product_type": row["product_type"], "regions": int(row["production_regions"]), "max_share": float(row["max_regional_share"]), "level": row["concentration_level"]} for _, row in concentration_pandas.iterrows()]
    return JsonResponse({"status": "success", "structure_data": structure_result, "regional_summary": regional_result, "top_products": top_result, "efficiency_ranking": efficiency_result, "concentration": concentration_result, "total_production": int(total_production), "analysis_year": year})

def macro_impact_analysis(request):
    impact_factor = request.GET.get('impact_factor', 'gdp')
    if impact_factor not in ('gdp', 'weather'):
        # Guard: without this check, an unknown factor would leave
        # impact_analysis and correlation_value undefined (NameError).
        return JsonResponse({"status": "error", "message": "impact_factor must be 'gdp' or 'weather'"}, status=400)
    spark = (SparkSession.builder
             .appName("MacroImpactAnalysis")
             .config("spark.sql.adaptive.enabled", "true")
             .getOrCreate())

    def read_table(table_name):
        # All three source tables live in the same MySQL schema.
        return (spark.read.format("jdbc")
                .option("url", "jdbc:mysql://localhost:3306/agriculture_data")
                .option("dbtable", table_name)
                .option("user", "root")
                .option("password", "password")
                .load())

    price_df = read_table("price_data")
    macro_df = read_table("macro_indicators")
    weather_df = read_table("weather_data")
    time_period = int(request.GET.get('time_period', 12))
    end_date = datetime.now()
    start_date = end_date - timedelta(days=time_period * 30)
    price_filtered = price_df.filter((col("date") >= start_date) & (col("date") <= end_date))
    macro_filtered = macro_df.filter((col("date") >= start_date) & (col("date") <= end_date))
    weather_filtered = weather_df.filter((col("date") >= start_date) & (col("date") <= end_date))
    # The tables are assumed to carry a precomputed "month" column alongside "date".
    monthly_price = price_filtered.groupBy("month", "product_type").agg(
        avg("price").alias("avg_monthly_price"), count("*").alias("price_records"))
    monthly_macro = macro_filtered.groupBy("month").agg(
        avg("gdp_growth_rate").alias("avg_gdp"), avg("inflation_rate").alias("avg_inflation"),
        avg("exchange_rate").alias("avg_exchange"), avg("interest_rate").alias("avg_interest"))
    monthly_weather = weather_filtered.groupBy("month", "region").agg(
        avg("temperature").alias("avg_temp"), sum("rainfall").alias("total_rainfall"),
        avg("humidity").alias("avg_humidity"))
    regional_weather = monthly_weather.groupBy("month").agg(
        avg("avg_temp").alias("national_temp"), avg("total_rainfall").alias("national_rainfall"),
        avg("avg_humidity").alias("national_humidity"))
    combined_df = monthly_price.join(monthly_macro, "month").join(regional_weather, "month")
    if impact_factor == 'gdp':
        correlation_df = combined_df.select("avg_monthly_price", "avg_gdp").toPandas()
        correlation_value = correlation_df["avg_monthly_price"].corr(correlation_df["avg_gdp"])
        impact_analysis = combined_df.withColumn("impact_score",
            when(col("avg_gdp") > 6.0, col("avg_monthly_price") * 1.1)
            .when(col("avg_gdp") < 3.0, col("avg_monthly_price") * 0.9)
            .otherwise(col("avg_monthly_price")))
    else:  # impact_factor == 'weather'
        correlation_df = combined_df.select("avg_monthly_price", "national_temp", "national_rainfall").toPandas()
        temp_corr = correlation_df["avg_monthly_price"].corr(correlation_df["national_temp"])
        rain_corr = correlation_df["avg_monthly_price"].corr(correlation_df["national_rainfall"])
        correlation_value = (temp_corr + rain_corr) / 2
        impact_analysis = combined_df.withColumn("impact_score",
            when((col("national_temp") > 35) | (col("national_rainfall") > 200), col("avg_monthly_price") * 1.2)
            .when((col("national_temp") < 10) | (col("national_rainfall") < 50), col("avg_monthly_price") * 1.15)
            .otherwise(col("avg_monthly_price")))
    impact_summary = impact_analysis.groupBy("product_type").agg(
        avg("impact_score").alias("avg_impact"), max("impact_score").alias("max_impact"),
        min("impact_score").alias("min_impact"), count("*").alias("data_points"))
    impact_summary = impact_summary.withColumn("impact_level",
        when(col("avg_impact") > col("min_impact") * 1.2, "High")
        .when(col("avg_impact") > col("min_impact") * 1.1, "Medium")
        .otherwise("Low"))
    result_pandas = impact_summary.toPandas()
    combined_pandas = combined_df.toPandas()
    spark.stop()
    impact_result = []
    for _, row in result_pandas.iterrows():
        impact_result.append({"product_type": row["product_type"], "avg_impact": float(row["avg_impact"]), "max_impact": float(row["max_impact"]), "min_impact": float(row["min_impact"]), "impact_level": row["impact_level"], "data_points": int(row["data_points"])})
    trend_data = []
    for _, row in combined_pandas.iterrows():
        trend_data.append({"month": row["month"], "price": float(row["avg_monthly_price"]), "gdp": float(row["avg_gdp"]) if impact_factor == 'gdp' else None, "temperature": float(row["national_temp"]) if impact_factor == 'weather' else None, "rainfall": float(row["national_rainfall"]) if impact_factor == 'weather' else None})
    return JsonResponse({"status": "success", "impact_analysis": impact_result, "correlation_value": float(correlation_value), "trend_data": trend_data, "analysis_factor": impact_factor, "time_period": f"{time_period} months"})

6. Documentation Preview

(A preview page of the project documentation is shown here.)

7. Closing

💕💕Contact 计算机编程果茶熊 to get the source code.