[Big Data] Personalized Dietary Flavor Data Analysis System | Computer Science Graduation Project | Hadoop + Spark Environment Setup | Data Science and Big Data Technology | Source Code, Documentation, and Walkthrough Included


1. About the Author

💖💖Author: 计算机编程果茶熊 💙💙About me: I spent years teaching computer science courses and working as a programming instructor. I enjoy teaching and am proficient in Java, WeChat Mini Programs, Python, Golang, Android, and several other IT areas. I take on custom project development, code walkthroughs, thesis-defense coaching, and documentation writing, and I know a few techniques for lowering plagiarism-check scores. I like sharing solutions to problems I run into during development and talking shop about technology, so feel free to ask me anything about code! 💛💛A word of thanks: I appreciate everyone's attention and support! 💜💜 Website projects · Android/Mini Program projects · Big data projects · Graduation project topic selection 💕💕To get the source code, contact 计算机编程果茶熊 (details at the end of this post)

2. System Overview

Big data framework: Hadoop + Spark (Hive supported with custom modifications)
Development languages: Java + Python (both versions supported)
Database: MySQL
Backend frameworks: SpringBoot (Spring + SpringMVC + MyBatis) + Django (both versions supported)
Frontend: Vue + Echarts + HTML + CSS + JavaScript + jQuery

The Personalized Dietary Flavor Data Analysis System is an intelligent diet-analysis platform built on big data technology and machine learning. It uses the Hadoop + Spark distributed computing stack as its core engine, leverages Python's data-processing ecosystem, exposes a stable API through the Django backend framework, and renders interactive visualizations on the frontend with Vue + ElementUI + Echarts. Its main modules cover user management, dietary flavor data management, user profiling, geographic and cultural analysis, user clustering, key-factor analysis, and a visualization dashboard. Massive dietary datasets are stored on HDFS; Spark SQL handles efficient querying and analysis; Pandas and NumPy perform the fine-grained data processing and statistics; and the final results are persisted to MySQL. The system mines users' dietary preference patterns in depth, identifies regional differences in food culture, provides users with personalized dietary suggestions and flavor recommendations, and gives the catering industry data-driven decision support.
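To make the "Pandas for fine-grained processing" step concrete, here is a minimal, self-contained sketch of the per-user taste-preference aggregation (frequency and average rating per taste type) that the Spark code later performs at scale. The sample records and column names are illustrative, modeled on the user_food_behavior table described in this post.

```python
import pandas as pd

# Hypothetical sample of ordering records; columns mirror the
# user_food_behavior table used by the Spark analysis below.
records = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 2],
    "taste_type": ["spicy", "spicy", "sweet", "sour", "sweet"],
    "rating": [4.5, 5.0, 3.0, 4.0, 4.0],
})

def taste_preferences(df: pd.DataFrame, user_id: int) -> pd.DataFrame:
    """Frequency and average rating per taste type for one user,
    sorted by frequency descending (same shape as the Spark groupBy)."""
    user_df = df[df["user_id"] == user_id]
    prefs = (user_df.groupby("taste_type")["rating"]
             .agg(frequency="count", avg_rating="mean")
             .reset_index()
             .sort_values("frequency", ascending=False, ignore_index=True))
    return prefs

print(taste_preferences(records, 1))
```

For user 1 this ranks "spicy" first (2 orders, average rating 4.75); the production system computes the same statistic with Spark's `groupBy(...).agg(...)` over the full dataset.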

3. Video Walkthrough

Personalized Dietary Flavor Data Analysis System (video)

4. Feature Screenshots

(Feature screenshots omitted in this text version.)

5. Code Samples


from pyspark.sql import SparkSession
# min, max and explode must come from pyspark.sql.functions (they are used below);
# note that importing min/max/sum shadows the Python builtins of the same name.
from pyspark.sql.functions import col, count, avg, sum, when, desc, asc, min, max, explode
from pyspark.ml.feature import VectorAssembler, StandardScaler
from pyspark.ml.clustering import KMeans
from pyspark.ml.evaluation import ClusteringEvaluator
import pandas as pd
import numpy as np
from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt
import json
spark = (SparkSession.builder
         .appName("FoodFlavorAnalysis")
         .config("spark.sql.adaptive.enabled", "true")
         .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
         .getOrCreate())
def user_profile_analysis(user_id):
    """Build a multi-dimensional dietary profile for one user from the user_food_behavior table: tastes, cuisines, spend, timing, nutrition."""
    user_behavior_df = spark.sql(f"SELECT * FROM user_food_behavior WHERE user_id = {user_id}")
    taste_preferences = user_behavior_df.groupBy("taste_type").agg(count("*").alias("frequency"), avg("rating").alias("avg_rating")).orderBy(desc("frequency"))
    cuisine_preferences = user_behavior_df.groupBy("cuisine_type").agg(count("*").alias("order_count"), avg("rating").alias("satisfaction")).orderBy(desc("order_count"))
    spice_level_analysis = user_behavior_df.groupBy("spice_level").agg(count("*").alias("count"), avg("rating").alias("rating")).collect()
    price_range_analysis = user_behavior_df.select(avg("price").alias("avg_price"), min("price").alias("min_price"), max("price").alias("max_price")).collect()[0]
    time_pattern = user_behavior_df.groupBy("order_hour").agg(count("*").alias("order_frequency")).orderBy("order_hour")
    monthly_trend = user_behavior_df.groupBy("order_month").agg(count("*").alias("monthly_orders"), avg("rating").alias("monthly_satisfaction")).orderBy("order_month")
    flavor_intensity = user_behavior_df.select(avg("sweetness").alias("avg_sweetness"), avg("saltiness").alias("avg_saltiness"), avg("sourness").alias("avg_sourness"), avg("bitterness").alias("avg_bitterness")).collect()[0]
    dietary_restrictions = user_behavior_df.filter(col("dietary_type").isNotNull()).groupBy("dietary_type").count().collect()
    ingredient_preferences = user_behavior_df.select(explode(col("ingredients")).alias("ingredient")).groupBy("ingredient").agg(count("*").alias("ingredient_frequency")).orderBy(desc("ingredient_frequency")).limit(20)
    cooking_method_prefs = user_behavior_df.groupBy("cooking_method").agg(count("*").alias("method_count"), avg("rating").alias("method_satisfaction")).orderBy(desc("method_count"))
    seasonal_preferences = user_behavior_df.groupBy("season").agg(count("*").alias("seasonal_orders"), avg("rating").alias("seasonal_rating")).collect()
    nutrition_analysis = user_behavior_df.select(avg("calories").alias("avg_calories"), avg("protein").alias("avg_protein"), avg("carbs").alias("avg_carbs"), avg("fat").alias("avg_fat")).collect()[0]
    user_profile = {"taste_preferences": taste_preferences.toPandas().to_dict('records'), "cuisine_preferences": cuisine_preferences.toPandas().to_dict('records'), "spice_level": spice_level_analysis, "price_range": price_range_analysis.asDict(), "time_patterns": time_pattern.toPandas().to_dict('records'), "monthly_trends": monthly_trend.toPandas().to_dict('records'), "flavor_profile": flavor_intensity.asDict(), "dietary_info": dietary_restrictions, "top_ingredients": ingredient_preferences.toPandas().to_dict('records'), "cooking_methods": cooking_method_prefs.toPandas().to_dict('records'), "seasonal_data": seasonal_preferences, "nutrition_summary": nutrition_analysis.asDict()}
    return user_profile
def geographic_cultural_analysis(region_id):
    """Summarize one region's dietary culture: signature dishes, flavor indices, ingredients, cooking traditions, history, and economics."""
    regional_data = spark.sql(f"SELECT * FROM food_cultural_data WHERE region_id = {region_id}")
    signature_dishes = regional_data.groupBy("dish_name").agg(count("*").alias("popularity"), avg("cultural_significance").alias("significance_score")).orderBy(desc("popularity")).limit(50)
    flavor_characteristics = regional_data.select(avg("sweetness_index").alias("regional_sweetness"), avg("spiciness_index").alias("regional_spiciness"), avg("saltiness_index").alias("regional_saltiness"), avg("umami_index").alias("regional_umami")).collect()[0]
    ingredient_distribution = regional_data.select(explode(col("local_ingredients")).alias("ingredient"), col("availability_score")).groupBy("ingredient").agg(count("*").alias("usage_frequency"), avg("availability_score").alias("availability")).orderBy(desc("usage_frequency"))
    cooking_techniques = regional_data.groupBy("traditional_method").agg(count("*").alias("technique_frequency"), avg("preservation_level").alias("tradition_strength")).orderBy(desc("technique_frequency"))
    cultural_influence_factors = regional_data.groupBy("cultural_factor").agg(count("*").alias("factor_impact"), avg("influence_score").alias("influence_strength")).collect()
    historical_evolution = regional_data.groupBy("time_period").agg(count("*").alias("period_dishes"), avg("innovation_score").alias("creativity_level")).orderBy("time_period")
    festival_food_analysis = regional_data.filter(col("festival_association").isNotNull()).groupBy("festival_association").agg(count("*").alias("festival_dishes"), avg("ceremonial_importance").alias("ritual_significance")).collect()
    climate_food_correlation = regional_data.select(avg("temperature_adaptation").alias("temp_factor"), avg("humidity_adaptation").alias("humidity_factor"), avg("seasonal_variation").alias("season_factor")).collect()[0]
    cross_cultural_exchanges = regional_data.filter(col("foreign_influence") > 0).groupBy("influence_source").agg(count("*").alias("exchange_count"), avg("adaptation_degree").alias("localization_level")).collect()
    modern_adaptations = regional_data.filter(col("modernization_score") > 5).groupBy("adaptation_type").agg(count("*").alias("modern_count"), avg("acceptance_rate").alias("acceptance")).orderBy(desc("modern_count"))
    nutritional_patterns = regional_data.select(avg("regional_calories").alias("avg_cal_intake"), avg("protein_ratio").alias("protein_pct"), avg("carb_ratio").alias("carb_pct"), avg("vegetable_ratio").alias("veg_pct")).collect()[0]
    economic_food_relationship = regional_data.groupBy("economic_level").agg(avg("food_cost_ratio").alias("cost_proportion"), count("*").alias("economic_group_size")).collect()
    cultural_analysis_result = {"signature_dishes": signature_dishes.toPandas().to_dict('records'), "flavor_profile": flavor_characteristics.asDict(), "ingredient_patterns": ingredient_distribution.toPandas().to_dict('records'), "cooking_traditions": cooking_techniques.toPandas().to_dict('records'), "cultural_factors": cultural_influence_factors, "historical_trends": historical_evolution.toPandas().to_dict('records'), "festival_foods": festival_food_analysis, "climate_adaptation": climate_food_correlation.asDict(), "cultural_exchanges": cross_cultural_exchanges, "modern_changes": modern_adaptations.toPandas().to_dict('records'), "nutrition_patterns": nutritional_patterns.asDict(), "economic_relations": economic_food_relationship}
    return cultural_analysis_result
def user_clustering_analysis():
    """Segment users with KMeans on standardized taste/behavior features, choosing k by silhouette score."""
    user_features_df = spark.sql("SELECT user_id, avg_sweetness, avg_spiciness, avg_saltiness, order_frequency, price_sensitivity, cuisine_diversity, health_consciousness, adventure_score FROM user_feature_matrix")
    feature_columns = ["avg_sweetness", "avg_spiciness", "avg_saltiness", "order_frequency", "price_sensitivity", "cuisine_diversity", "health_consciousness", "adventure_score"]
    assembler = VectorAssembler(inputCols=feature_columns, outputCol="features")
    feature_vector_df = assembler.transform(user_features_df)
    scaler = StandardScaler(inputCol="features", outputCol="scaled_features", withStd=True, withMean=True)
    scaler_model = scaler.fit(feature_vector_df)
    scaled_df = scaler_model.transform(feature_vector_df)
    silhouette_scores = []
    for k in range(3, 11):
        kmeans = KMeans(k=k, featuresCol="scaled_features", predictionCol="cluster", seed=42, maxIter=100)
        model = kmeans.fit(scaled_df)
        predictions = model.transform(scaled_df)
        evaluator = ClusteringEvaluator(featuresCol="scaled_features", predictionCol="cluster", metricName="silhouette")
        score = evaluator.evaluate(predictions)
        silhouette_scores.append((k, score))
    optimal_k = max(silhouette_scores, key=lambda x: x[1])[0]
    final_kmeans = KMeans(k=optimal_k, featuresCol="scaled_features", predictionCol="cluster", seed=42, maxIter=100)
    final_model = final_kmeans.fit(scaled_df)
    clustered_df = final_model.transform(scaled_df)
    cluster_statistics = clustered_df.groupBy("cluster").agg(count("*").alias("cluster_size"), avg("avg_sweetness").alias("sweetness_avg"), avg("avg_spiciness").alias("spiciness_avg"), avg("avg_saltiness").alias("saltiness_avg"), avg("order_frequency").alias("frequency_avg"), avg("price_sensitivity").alias("price_avg"), avg("cuisine_diversity").alias("diversity_avg"), avg("health_consciousness").alias("health_avg"), avg("adventure_score").alias("adventure_avg"))
    cluster_centers = final_model.clusterCenters()
    cluster_profiles = []
    for i, center in enumerate(cluster_centers):
        profile = {"cluster_id": i, "center_coordinates": center.tolist(), "interpretation": f"Cluster {i} characteristics"}
        cluster_profiles.append(profile)
    user_cluster_mapping = clustered_df.select("user_id", "cluster").toPandas().to_dict('records')
    cluster_behavioral_patterns = clustered_df.groupBy("cluster").agg(avg("order_frequency").alias("avg_orders"), avg("price_sensitivity").alias("price_conscious"), avg("cuisine_diversity").alias("variety_seeking")).collect()
    clustered_df.createOrReplaceTempView("clustered_users")
    dietary_cluster_analysis = spark.sql("SELECT c.cluster, d.dietary_type, COUNT(*) as type_count FROM clustered_users c JOIN user_dietary_info d ON c.user_id = d.user_id GROUP BY c.cluster, d.dietary_type").collect()
    clustering_results = {"optimal_clusters": optimal_k, "silhouette_scores": silhouette_scores, "cluster_statistics": cluster_statistics.toPandas().to_dict('records'), "cluster_centers": cluster_profiles, "user_assignments": user_cluster_mapping, "behavioral_patterns": cluster_behavioral_patterns, "dietary_distributions": dietary_cluster_analysis}
    return clustering_results
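The clustering routine above sweeps k from 3 to 10 and keeps the value with the best silhouette score. That selection step is plain Python and can be sketched in isolation; the scores below are made-up illustration values, not real ClusteringEvaluator output.

```python
# Made-up (k, silhouette) pairs standing in for ClusteringEvaluator results.
silhouette_scores = [(3, 0.41), (4, 0.52), (5, 0.58), (6, 0.55), (7, 0.49)]

def pick_optimal_k(scores):
    """Return the candidate k whose silhouette score is highest."""
    best_k, _best_score = max(scores, key=lambda pair: pair[1])
    return best_k

print(pick_optimal_k(silhouette_scores))  # prints 5
```

Silhouette scores lie in [-1, 1], higher meaning tighter and better-separated clusters, which is why a simple `max` over the (k, score) pairs suffices here.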

6. Documentation Preview

(Documentation screenshot omitted in this text version.)

7. END

💕💕To get the source code, contact 计算机编程果茶熊