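Chose the big-data track but don't know what to build? A Spark-based drug data visualization and analysis system gives your graduation project solid grounding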


1. About the Author

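  • 💖💖 Author: 计算机编程果茶熊
  • 💙💙 About me: I spent years teaching computer science courses as a programming instructor, and I still enjoy teaching. I work across Java, WeChat Mini Programs, Python, Golang, Android, and other areas. I take on custom project development, code walkthroughs, thesis-defense coaching, and documentation writing, and I also know some techniques for lowering plagiarism-check similarity scores. I like sharing fixes for problems I run into during development and discussing technology, so feel free to ask me any coding questions!
  • 💛💛 A few words: thank you all for following and supporting me!
  • 💜💜
  • Website projects
  • Android / Mini Program projects
  • Big data projects
  • Computer science graduation project topics
  • 💕💕 Contact 计算机编程果茶熊 at the end of the article to get the source code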

2. System Introduction

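  • Big data framework: Hadoop + Spark (Hive requires custom modification)
  • Development language: Java + Python (both versions available)
  • Database: MySQL
  • Back-end framework: SpringBoot (Spring + SpringMVC + MyBatis) for the Java version, Django for the Python version
  • Front end: Vue + Echarts + HTML + CSS + JavaScript + jQuery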

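The Big-Data-Based National Drug Procurement Data Visualization and Analysis System is a platform for in-depth analysis and visual presentation of data from China's national centralized drug procurement program. It stores data in Hadoop's distributed file system (HDFS), processes large volumes of procurement records with the Spark engine, and runs complex queries and aggregations through Spark SQL, with Pandas and NumPy used for data preprocessing and statistical analysis. The back end pairs the Django framework with a MySQL database, and the front end uses Vue with the ElementUI component library and the Echarts charting library to present the results as charts.

The core functional modules are user management, drug information management, a large-screen dashboard, competition analysis, price analysis, feature classification analysis, and supply analysis. Together they let users examine procurement data from several angles, helping regulators and researchers follow drug market dynamics and giving their decisions a data basis. The stack spans storage, processing, service, and presentation layers, which leaves room to add further analyses later.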
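The clean-then-aggregate pattern that Spark SQL performs in this system can be illustrated without a cluster. The sketch below is plain Python using only the standard library, not the project's code: empty CSV fields stand in for the nulls that the views filter out, the column names mirror the procurement CSV read in the code section, and the sample rows and function names (`load_clean_rows`, `supplier_stats`) are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; column names mirror drug_procurement.csv used by the views
SAMPLE_CSV = """drug_id,drug_name,supplier_name,drug_price,procurement_quantity
1,Amlodipine,SupplierA,1.20,1000
2,Amlodipine,SupplierB,1.50,500
3,Metformin,SupplierA,0.80,2000
4,Metformin,,0.90,300
5,Aspirin,SupplierC,,700
"""

# Fields every aggregation depends on (the view filters these with isNotNull())
REQUIRED = ("drug_price", "supplier_name", "drug_name")


def load_clean_rows(text):
    """Parse CSV text and drop rows whose required fields are empty."""
    rows = csv.DictReader(io.StringIO(text))
    return [row for row in rows if all(row[field] for field in REQUIRED)]


def supplier_stats(rows):
    """Group by supplier: record count, average price, total quantity (like groupBy().agg())."""
    acc = defaultdict(lambda: {"drug_count": 0, "price_sum": 0.0, "total_quantity": 0})
    for row in rows:
        entry = acc[row["supplier_name"]]
        entry["drug_count"] += 1
        entry["price_sum"] += float(row["drug_price"])
        entry["total_quantity"] += int(row["procurement_quantity"])
    return {
        name: {
            "drug_count": e["drug_count"],
            "avg_price": e["price_sum"] / e["drug_count"],
            "total_quantity": e["total_quantity"],
        }
        for name, e in acc.items()
    }


if __name__ == "__main__":
    for name, stats in sorted(supplier_stats(load_clean_rows(SAMPLE_CSV)).items()):
        print(name, stats)
```

Spark distributes the same two steps across partitions; the logic per group is identical, which is why a small in-memory version like this is handy for checking expected numbers before running a job.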

3. Big-Data-Based National Drug Procurement Data Visualization and Analysis System - Video Walkthrough

[Video: system walkthrough]

4. Big-Data-Based National Drug Procurement Data Visualization and Analysis System - Feature Showcase

[Screenshots: system feature pages (8 images)]

5. Big-Data-Based National Drug Procurement Data Visualization and Analysis System - Code Showcase



from pyspark.sql import SparkSession
# Alias Spark's sum/max/min so they do not shadow Python's built-ins
from pyspark.sql.functions import (
    col, count, countDistinct, avg, desc, asc, when,
    sum as spark_sum, max as spark_max, min as spark_min,
)
from django.http import JsonResponse

# One SparkSession shared by all views; adaptive query execution lets Spark
# re-optimize plans and coalesce shuffle partitions at runtime
spark = (
    SparkSession.builder.appName("DrugDataAnalysis")
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
    .getOrCreate()
)


def to_records(df):
    # Collect an (already aggregated, small) DataFrame to the driver as JSON-ready dicts
    return [row.asDict() for row in df.collect()]


def drug_competition_analysis(request):
    drug_data = spark.read.option("header", "true").option("inferSchema", "true").csv("hdfs://localhost:9000/drug_data/drug_procurement.csv")
    # Keep only records that have the price, supplier and drug name every aggregation needs
    drug_data_clean = drug_data.filter(col("drug_price").isNotNull() & col("supplier_name").isNotNull() & col("drug_name").isNotNull())
    # Per-supplier record count, average price and total procured quantity
    supplier_stats = drug_data_clean.groupBy("supplier_name").agg(count("drug_id").alias("drug_count"), avg("drug_price").alias("avg_price"), spark_sum("procurement_quantity").alias("total_quantity"))
    # Market share (%) = supplier quantity / overall quantity * 100; the overall total is computed once
    overall_quantity = supplier_stats.agg(spark_sum("total_quantity")).collect()[0][0]
    market_share = supplier_stats.withColumn("market_share", col("total_quantity") / overall_quantity * 100)
    competition_rank = market_share.orderBy(desc("market_share")).limit(20)
    # Competition intensity per category = number of distinct suppliers (高/中/低竞争 = high/medium/low)
    category_competitors = drug_data_clean.groupBy("drug_category").agg(countDistinct("supplier_name").alias("competitor_count"))
    competition_intensity = category_competitors.withColumn("intensity_level", when(col("competitor_count") > 10, "高竞争").when(col("competitor_count") > 5, "中竞争").otherwise("低竞争"))
    # Price spread per drug; max/min must be aggregate functions inside agg(), not plain columns
    price_competition = drug_data_clean.groupBy("drug_name").agg(countDistinct("supplier_name").alias("supplier_count"), avg("drug_price").alias("avg_drug_price"), (spark_max("drug_price") - spark_min("drug_price")).alias("price_range"))
    # Drugs offered by more than 3 suppliers, widest price spread first
    competitive_drugs = price_competition.filter(col("supplier_count") > 3).orderBy(desc("price_range"))
    result_data = {"market_leaders": to_records(competition_rank), "category_competition": to_records(competition_intensity), "price_competitive_drugs": to_records(competitive_drugs)}
    return JsonResponse(result_data)


def drug_price_analysis(request):
    price_data = spark.read.option("header", "true").option("inferSchema", "true").csv("hdfs://localhost:9000/drug_data/price_history.csv")
    price_clean = price_data.filter(col("current_price").isNotNull() & col("historical_price").isNotNull() & col("procurement_date").isNotNull())
    # Absolute change and relative change (%) against the historical price
    price_trends = price_clean.withColumn("price_change", col("current_price") - col("historical_price")).withColumn("price_change_rate", (col("current_price") - col("historical_price")) / col("historical_price") * 100)
    # Monthly average price and change rate per drug category (procurement_month is a column of price_history.csv)
    monthly_trends = price_trends.groupBy("procurement_month", "drug_category").agg(avg("current_price").alias("monthly_avg_price"), avg("price_change_rate").alias("monthly_change_rate"))
    # Volatility = highest minus lowest current price per drug, computed with aggregate functions
    price_volatility = price_trends.groupBy("drug_name").agg(count("drug_id").alias("price_records"), (spark_max("current_price") - spark_min("current_price")).alias("price_volatility"), avg("price_change_rate").alias("avg_change_rate"))
    high_volatility_drugs = price_volatility.filter(col("price_volatility") > 50).orderBy(desc("price_volatility"))
    # Regional average vs. national average (unweighted mean of the regional averages)
    regional_price_diff = price_clean.groupBy("procurement_region", "drug_name").agg(avg("current_price").alias("regional_avg_price"))
    price_comparison = regional_price_diff.join(regional_price_diff.groupBy("drug_name").agg(avg("regional_avg_price").alias("national_avg")), "drug_name")
    regional_variance = price_comparison.withColumn("price_deviation", col("regional_avg_price") - col("national_avg")).withColumn("deviation_rate", (col("regional_avg_price") - col("national_avg")) / col("national_avg") * 100)
    # Average price and average quantity per drug and specification, lowest ratio first
    cost_effectiveness = price_clean.groupBy("drug_name", "drug_specification").agg(avg("current_price").alias("unit_price"), avg("procurement_quantity").alias("avg_quantity"))
    value_analysis = cost_effectiveness.withColumn("cost_per_unit", col("unit_price") / col("avg_quantity")).orderBy(asc("cost_per_unit"))
    price_result = {"monthly_trends": to_records(monthly_trends), "volatility_analysis": to_records(high_volatility_drugs), "regional_differences": to_records(regional_variance), "value_ranking": to_records(value_analysis)}
    return JsonResponse(price_result)


def drug_supply_analysis(request):
    supply_data = spark.read.option("header", "true").option("inferSchema", "true").csv("hdfs://localhost:9000/drug_data/supply_chain.csv")
    supply_clean = supply_data.filter(col("supplier_id").isNotNull() & col("supply_capacity").isNotNull() & col("delivery_time").isNotNull())
    # Total capacity, average delivery time and number of distinct drugs per supplier
    supplier_capacity = supply_clean.groupBy("supplier_name", "supplier_region").agg(spark_sum("supply_capacity").alias("total_capacity"), avg("delivery_time").alias("avg_delivery_days"), countDistinct("drug_id").alias("drug_varieties"))
    # Size tier by total capacity (大型/中型/小型供应商 = large/medium/small supplier)
    capacity_ranking = supplier_capacity.withColumn("capacity_level", when(col("total_capacity") > 10000, "大型供应商").when(col("total_capacity") > 5000, "中型供应商").otherwise("小型供应商"))
    # Regional supply and number of distinct suppliers per region
    regional_supply = supply_clean.groupBy("procurement_region").agg(spark_sum("supply_capacity").alias("regional_supply"), countDistinct("supplier_id").alias("supplier_count"))
    supply_adequacy = regional_supply.withColumn("supply_per_supplier", col("regional_supply") / col("supplier_count"))
    # count(when(...)) counts only on-time rows, because when() without otherwise() yields null elsewhere
    delivery_performance = supply_clean.groupBy("supplier_name").agg(avg("delivery_time").alias("avg_delivery"), count(when(col("delivery_status") == "on_time", 1)).alias("on_time_count"), count("delivery_id").alias("total_deliveries"))
    # Grades: 优秀/良好/需改进 = excellent/good/needs improvement
    reliability_score = delivery_performance.withColumn("on_time_rate", col("on_time_count") / col("total_deliveries") * 100).withColumn("reliability_grade", when(col("on_time_rate") > 95, "优秀").when(col("on_time_rate") > 85, "良好").otherwise("需改进"))
    # Supply risk from the number of distinct suppliers per drug (高/中/低风险 = high/medium/low risk)
    supply_risk_assessment = supply_clean.groupBy("drug_name").agg(countDistinct("supplier_id").alias("supplier_diversity"), avg("supply_capacity").alias("avg_supply_capacity"))
    risk_evaluation = supply_risk_assessment.withColumn("supply_risk", when(col("supplier_diversity") < 3, "高风险").when(col("supplier_diversity") < 6, "中风险").otherwise("低风险"))
    # Monthly supply vs. average demand per category (供过于求/供不应求/供需平衡 = oversupply/undersupply/balanced)
    seasonal_supply = supply_clean.groupBy("supply_month", "drug_category").agg(spark_sum("supply_capacity").alias("monthly_supply"), avg("procurement_demand").alias("monthly_demand"))
    supply_balance = seasonal_supply.withColumn("supply_demand_ratio", col("monthly_supply") / col("monthly_demand")).withColumn("balance_status", when(col("supply_demand_ratio") > 1.2, "供过于求").when(col("supply_demand_ratio") < 0.8, "供不应求").otherwise("供需平衡"))
    supply_result = {"capacity_analysis": to_records(capacity_ranking), "regional_supply": to_records(supply_adequacy), "reliability_assessment": to_records(reliability_score), "risk_evaluation": to_records(risk_evaluation), "seasonal_balance": to_records(supply_balance)}
    return JsonResponse(supply_result)

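To check the market-share and competition-intensity rules in `drug_competition_analysis` by hand, here is a minimal plain-Python sketch (standard library only, so it runs without Spark or Django). It assumes supplier quantities have already been aggregated into a dict; the thresholds (more than 10 suppliers is high, more than 5 is medium) come from the view, while the function names and the English labels are hypothetical stand-ins for the view's 高竞争/中竞争/低竞争.

```python
def market_shares(quantities):
    """Map supplier -> percent of the overall procured quantity."""
    total = sum(quantities.values())
    if total == 0:
        # Hypothetical guard: no procurement volume means no meaningful share
        return {name: 0.0 for name in quantities}
    return {name: 100 * q / total for name, q in quantities.items()}


def top_suppliers(quantities, n=20):
    """Suppliers sorted by market share, largest first, capped at n (like orderBy(desc(...)).limit(n))."""
    shares = market_shares(quantities)
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)[:n]


def intensity_level(competitor_count):
    """Same thresholds as the when() chain in drug_competition_analysis."""
    if competitor_count > 10:
        return "high"  # the view emits 高竞争
    if competitor_count > 5:
        return "medium"  # 中竞争
    return "low"  # 低竞争


if __name__ == "__main__":
    qty = {"SupplierA": 3000, "SupplierB": 500, "SupplierC": 1500}
    print(top_suppliers(qty, n=2))
    print([intensity_level(c) for c in (11, 10, 5)])
```

Because the thresholds use strict `>`, a category with exactly 10 suppliers lands in the medium tier and one with exactly 5 in the low tier, the same as in the Spark expression.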

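The price-change, volatility, and regional-deviation formulas in `drug_price_analysis` can be sketched the same way. This is an illustrative restatement in plain Python, not the project's code: `change_rate`, `volatility`, and `regional_deviation` are hypothetical names, and the zero-denominator guards are an added assumption that the Spark version does not have. As in the view, the national figure is the unweighted mean of the regional averages, so a region with few records counts as much as a region with many.

```python
from collections import defaultdict
from statistics import mean


def change_rate(current, historical):
    """Percent change against the historical price; None when historical is 0 (hypothetical guard)."""
    if historical == 0:
        return None
    return (current - historical) / historical * 100


def volatility(prices):
    """Price spread (max - min), the per-drug volatility measure used in the view."""
    return max(prices) - min(prices)


def regional_deviation(records):
    """records: iterable of (region, drug, price) tuples.

    Returns {(region, drug): deviation_rate_percent}, where the national figure
    is the unweighted mean of the regional averages for that drug.
    """
    prices = defaultdict(list)
    for region, drug, price in records:
        prices[(region, drug)].append(price)
    regional_avg = {key: mean(vals) for key, vals in prices.items()}

    per_drug = defaultdict(list)
    for (_, drug), avg_price in regional_avg.items():
        per_drug[drug].append(avg_price)
    national_avg = {drug: mean(vals) for drug, vals in per_drug.items()}

    result = {}
    for (region, drug), avg_price in regional_avg.items():
        nat = national_avg[drug]
        result[(region, drug)] = None if nat == 0 else (avg_price - nat) / nat * 100
    return result


if __name__ == "__main__":
    sample = [("North", "Aspirin", 10.0), ("North", "Aspirin", 14.0), ("South", "Aspirin", 8.0)]
    print(regional_deviation(sample))
    print(change_rate(12.0, 10.0), volatility([8.0, 10.0, 14.0]))
```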

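The grading rules in `drug_supply_analysis` (on-time rate, reliability grade, supply risk, and supply/demand balance) reduce to simple threshold functions. The plain-Python sketch below restates them for quick testing; the function names, the English labels (standing in for 优秀/良好/需改进, 高/中/低风险 and 供过于求/供不应求/供需平衡), and the empty-input guards are assumptions rather than part of the original system.

```python
def on_time_rate(statuses):
    """Percent of deliveries with status 'on_time'; 0.0 for an empty list (hypothetical guard)."""
    if not statuses:
        return 0.0
    on_time = sum(1 for status in statuses if status == "on_time")
    return 100 * on_time / len(statuses)


def reliability_grade(rate):
    """Thresholds from the view: >95 excellent, >85 good, otherwise needs improvement."""
    if rate > 95:
        return "excellent"
    if rate > 85:
        return "good"
    return "needs improvement"


def supply_risk(distinct_suppliers):
    """Fewer than 3 distinct suppliers is high risk, fewer than 6 medium, otherwise low."""
    if distinct_suppliers < 3:
        return "high"
    if distinct_suppliers < 6:
        return "medium"
    return "low"


def balance_status(monthly_supply, monthly_demand):
    """Classify supply/demand: ratio >1.2 oversupply, <0.8 undersupply, otherwise balanced.

    Returns None when demand is not positive (a hypothetical guard the view does not have).
    """
    if monthly_demand <= 0:
        return None
    ratio = monthly_supply / monthly_demand
    if ratio > 1.2:
        return "oversupply"
    if ratio < 0.8:
        return "undersupply"
    return "balanced"


if __name__ == "__main__":
    rate = on_time_rate(["on_time"] * 19 + ["late"])
    print(rate, reliability_grade(rate), supply_risk(4), balance_status(130, 100))
```

A supplier that is on time exactly 95% of the time is graded "good", not "excellent", because the view compares with a strict `>`.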
6. Big-Data-Based National Drug Procurement Data Visualization and Analysis System - Documentation Showcase

[Screenshot: project documentation]

7. END