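Food Taste Difference Data Analysis System Based on Big Data | Master the Food Taste Difference Data Analysis System in 7 Days: A Full Walkthrough from Hadoop Setup to Spark Data Processing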


> 💖💖Author: 计算机毕业设计杰瑞

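💙💙About me: I spent many years teaching computer science training courses and still enjoy teaching. I work mainly with Java, WeChat Mini Programs, Python, Golang, and Android, and my projects cover big data, deep learning, websites, mini programs, Android apps, and algorithms. I take on custom project development, code walkthroughs, thesis-defense coaching, and documentation writing, and I also know some techniques for lowering similarity-check scores. I like sharing solutions to problems I run into during development and talking about technology, so feel free to ask me about anything code-related! 💛💛A word from me: thank you all for your follows and support! 💜💜 Website projects | Android/mini program projects | Big data projects | Deep learning projects | Graduation project topic recommendations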

Introduction to the Food Taste Difference Data Analysis System Based on Big Data

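The Food Taste Difference Data Analysis System is a comprehensive data analysis platform built on a big data technology stack. It uses the Hadoop + Spark distributed computing framework to process large volumes of taste preference data, builds its backend services with Python and Django, and renders data visualizations on the frontend with Vue + ElementUI + Echarts. Its core functionality spans eight modules, including taste difference data management, macro data distribution analysis, lifestyle difference analysis, and geographic and cultural taste analysis, which together reveal how dietary taste preferences vary by region, age, and cultural background. The platform runs large-scale queries with Spark SQL, uses Pandas and NumPy for data processing and statistical analysis, and applies multidimensional cross-factor analysis and user clustering to give research on population dietary habits an empirical basis. It also supports profile analysis for a specific user: from one person's taste data it generates a detailed preference report, which can feed personalized diet recommendations and nutrition and health research.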
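To make the macro distribution analysis more concrete, here is a minimal sketch of the per-region averaging that the article's code performs with `groupBy("region").agg(avg(...))`. It uses plain Python instead of Spark so it can run anywhere. The sample records, the region names, and the helper `average_by_region` are assumptions made for illustration and are not part of the original system.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample records; the real system reads them from a Spark table.
records = [
    {"region": "Sichuan", "spicy_level": 9, "sweet_level": 3},
    {"region": "Sichuan", "spicy_level": 7, "sweet_level": 4},
    {"region": "Jiangsu", "spicy_level": 2, "sweet_level": 8},
    {"region": "Jiangsu", "spicy_level": 4, "sweet_level": 6},
]


def average_by_region(rows, fields):
    """Group rows by region, then average each requested field per group."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["region"]].append(row)

    summary = {}
    for region, members in grouped.items():
        stats = {}
        for field in fields:
            # "spicy_level" becomes "avg_spicy", matching the aliases in the Spark code.
            stats["avg_" + field.split("_")[0]] = mean(m[field] for m in members)
        stats["user_count"] = len(members)
        summary[region] = stats
    return summary


region_stats = average_by_region(records, ["spicy_level", "sweet_level"])
for region, stats in region_stats.items():
    print(region, stats)
```

Spark distributes the same group-then-aggregate step across a cluster, which is what makes it practical on data that does not fit on one machine.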

Demo Video of the Food Taste Difference Data Analysis System Based on Big Data

[Demo video]

Demo Screenshots of the Food Taste Difference Data Analysis System Based on Big Data

[Eight demo screenshots]

Code Showcase of the Food Taste Difference Data Analysis System Based on Big Data

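# Window is required by the windowed ratios and rankings in user_clustering_analysis.
from pyspark.sql import SparkSession, Window
# Note: this wildcard import replaces Python built-ins such as max, min, sum, abs, and pow with their Spark column versions.
from pyspark.sql.functions import *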
from pyspark.ml.clustering import KMeans
from pyspark.ml.feature import VectorAssembler
import pandas as pd
import numpy as np
from django.http import JsonResponse
from django.views import View
import json

# Shared Spark session with adaptive query execution enabled.
spark = (
    SparkSession.builder.appName("FoodTasteAnalysis")
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
    .getOrCreate()
)

def macro_data_distribution_analysis(request):
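    # survey_date is selected as well because the seasonal trend query below filters on it (assumes taste_data has that column).
    taste_df = spark.sql("SELECT region, age_group, spicy_level, sweet_level, salty_level, sour_level, survey_date FROM taste_data")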
    region_distribution = taste_df.groupBy("region").agg(avg("spicy_level").alias("avg_spicy"), avg("sweet_level").alias("avg_sweet"), avg("salty_level").alias("avg_salty"), avg("sour_level").alias("avg_sour"), count("*").alias("user_count"))
    age_taste_correlation = taste_df.groupBy("age_group").agg(avg("spicy_level").alias("spicy_preference"), avg("sweet_level").alias("sweet_preference"), stddev("spicy_level").alias("spicy_variance"))
    taste_intensity_stats = taste_df.select(mean("spicy_level").alias("overall_spicy_mean"), percentile_approx("spicy_level", 0.5).alias("spicy_median"), max("spicy_level").alias("max_spicy_preference"))
    regional_taste_ranking = region_distribution.orderBy(desc("avg_spicy"))
    taste_diversity_index = taste_df.groupBy("region").agg(stddev("spicy_level").alias("spicy_diversity"), stddev("sweet_level").alias("sweet_diversity")).withColumn("diversity_score", col("spicy_diversity") + col("sweet_diversity"))
    dominant_taste_by_region = taste_df.groupBy("region").agg(when(avg("spicy_level") > avg("sweet_level"), "spicy").when(avg("sweet_level") > avg("salty_level"), "sweet").otherwise("balanced").alias("dominant_taste"))
    taste_preference_matrix = taste_df.crosstab("region", "spicy_level")
    seasonal_trend_data = taste_df.filter(col("survey_date").between("2023-01-01", "2023-12-31")).groupBy(month("survey_date").alias("month")).agg(avg("spicy_level").alias("monthly_spicy_avg"))
    extreme_taste_users = taste_df.filter((col("spicy_level") > 8) | (col("sweet_level") > 8) | (col("salty_level") > 8))
    taste_balance_score = taste_df.withColumn("balance_score", abs(col("spicy_level") - col("sweet_level")) + abs(col("sweet_level") - col("salty_level")))
    regional_comparison = taste_df.groupBy("region").agg(collect_list("spicy_level").alias("spicy_distribution"), collect_list("sweet_level").alias("sweet_distribution"))
    cultural_taste_mapping = taste_df.join(spark.sql("SELECT region, cultural_type FROM region_culture_mapping"), "region").groupBy("cultural_type").agg(avg("spicy_level").alias("cultural_spicy_avg"))
    result_data = {"region_stats": region_distribution.toPandas().to_dict("records"), "age_correlation": age_taste_correlation.toPandas().to_dict("records"), "overall_stats": taste_intensity_stats.toPandas().to_dict("records"), "diversity_index": taste_diversity_index.toPandas().to_dict("records")}
    return JsonResponse({"success": True, "data": result_data})

def lifestyle_difference_analysis(request):
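    # salty_level is selected as well because the income analysis below averages it (assumes lifestyle_taste_data has that column).
    lifestyle_df = spark.sql("SELECT user_id, exercise_frequency, sleep_hours, work_type, income_level, spicy_level, sweet_level, salty_level, dining_frequency FROM lifestyle_taste_data")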
    exercise_taste_correlation = lifestyle_df.groupBy("exercise_frequency").agg(avg("spicy_level").alias("avg_spicy"), avg("sweet_level").alias("avg_sweet"), count("*").alias("user_count")).withColumn("exercise_taste_ratio", col("avg_spicy") / col("avg_sweet"))
    sleep_impact_analysis = lifestyle_df.withColumn("sleep_category", when(col("sleep_hours") < 6, "insufficient").when(col("sleep_hours") > 8, "adequate").otherwise("normal")).groupBy("sleep_category").agg(avg("spicy_level").alias("spicy_preference"), avg("sweet_level").alias("sweet_preference"))
    work_type_preference = lifestyle_df.groupBy("work_type").agg(avg("spicy_level").alias("work_spicy_avg"), avg("sweet_level").alias("work_sweet_avg"), stddev("spicy_level").alias("spicy_variation"))
    income_taste_relationship = lifestyle_df.withColumn("income_bracket", when(col("income_level") < 5000, "low").when(col("income_level") < 15000, "medium").otherwise("high")).groupBy("income_bracket").agg(avg("spicy_level"), avg("sweet_level"), avg("salty_level"))
    dining_pattern_analysis = lifestyle_df.groupBy("dining_frequency").agg(avg("spicy_level").alias("dining_spicy_avg"), count("user_id").alias("frequency_count")).withColumn("taste_intensity_score", col("dining_spicy_avg") * col("frequency_count") / 100)
    health_conscious_segmentation = lifestyle_df.filter(col("exercise_frequency") > 3).withColumn("health_score", col("exercise_frequency") * col("sleep_hours") / 10).groupBy("health_score").agg(avg("sweet_level").alias("health_sweet_preference"))
    lifestyle_cluster_prep = VectorAssembler(inputCols=["exercise_frequency", "sleep_hours", "income_level", "spicy_level", "sweet_level"], outputCol="lifestyle_features").transform(lifestyle_df)
    lifestyle_kmeans = KMeans(k=4, featuresCol="lifestyle_features", predictionCol="lifestyle_cluster").fit(lifestyle_cluster_prep)
    lifestyle_clustered = lifestyle_kmeans.transform(lifestyle_cluster_prep)
    cluster_taste_profile = lifestyle_clustered.groupBy("lifestyle_cluster").agg(avg("spicy_level").alias("cluster_spicy_avg"), avg("sweet_level").alias("cluster_sweet_avg"), avg("exercise_frequency").alias("cluster_exercise_avg"), count("*").alias("cluster_size"))
    stress_taste_correlation = lifestyle_df.withColumn("stress_indicator", when(col("sleep_hours") < 6, 1).when(col("work_type") == "high_pressure", 1).otherwise(0)).groupBy("stress_indicator").agg(avg("spicy_level").alias("stress_spicy_level"))
    lifestyle_taste_matrix = lifestyle_df.crosstab("work_type", "exercise_frequency")
    behavioral_pattern_analysis = lifestyle_df.withColumn("active_lifestyle", when((col("exercise_frequency") > 3) & (col("sleep_hours") > 7), "active").otherwise("sedentary")).groupBy("active_lifestyle").agg(avg("spicy_level"), avg("sweet_level"))
    result_data = {"exercise_correlation": exercise_taste_correlation.toPandas().to_dict("records"), "sleep_impact": sleep_impact_analysis.toPandas().to_dict("records"), "work_preference": work_type_preference.toPandas().to_dict("records"), "income_relationship": income_taste_relationship.toPandas().to_dict("records"), "cluster_profile": cluster_taste_profile.toPandas().to_dict("records")}
    return JsonResponse({"success": True, "data": result_data})

def user_clustering_analysis(request):
    user_data = spark.sql("SELECT user_id, age, gender, region, spicy_level, sweet_level, salty_level, sour_level, bitter_level, dining_frequency, cuisine_preference FROM user_taste_profile")
    feature_columns = ["age", "spicy_level", "sweet_level", "salty_level", "sour_level", "bitter_level", "dining_frequency"]
    assembler = VectorAssembler(inputCols=feature_columns, outputCol="taste_features")
    user_features = assembler.transform(user_data)
    kmeans_model = KMeans(k=5, featuresCol="taste_features", predictionCol="taste_cluster", maxIter=100, seed=42).fit(user_features)
    clustered_users = kmeans_model.transform(user_features)
    cluster_centers = kmeans_model.clusterCenters()
    cluster_statistics = clustered_users.groupBy("taste_cluster").agg(count("user_id").alias("cluster_size"), avg("spicy_level").alias("avg_spicy"), avg("sweet_level").alias("avg_sweet"), avg("salty_level").alias("avg_salty"), avg("age").alias("avg_age"))
    cluster_demographic = clustered_users.groupBy("taste_cluster", "gender").agg(count("user_id").alias("gender_count")).withColumn("gender_ratio", col("gender_count") / sum("gender_count").over(Window.partitionBy("taste_cluster")))
    regional_cluster_distribution = clustered_users.groupBy("taste_cluster", "region").agg(count("user_id").alias("regional_count")).withColumn("regional_percentage", col("regional_count") / sum("regional_count").over(Window.partitionBy("taste_cluster")) * 100)
    cluster_taste_profile = clustered_users.groupBy("taste_cluster").agg(max("spicy_level").alias("max_spicy"), min("spicy_level").alias("min_spicy"), stddev("spicy_level").alias("spicy_std"), skewness("sweet_level").alias("sweet_skewness"))
    dominant_cuisine_by_cluster = clustered_users.groupBy("taste_cluster", "cuisine_preference").agg(count("user_id").alias("cuisine_count")).withColumn("rank", row_number().over(Window.partitionBy("taste_cluster").orderBy(desc("cuisine_count")))).filter(col("rank") == 1)
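    # Mean pairwise Euclidean distance between clusters on spicy and sweet levels. The cross join is quadratic in the number of users, so this is only practical on small data or a sample.
    cluster_similarity_matrix = clustered_users.alias("c1").crossJoin(clustered_users.alias("c2")).groupBy(col("c1.taste_cluster").alias("cluster_a"), col("c2.taste_cluster").alias("cluster_b")).agg(avg(sqrt(pow(col("c1.spicy_level") - col("c2.spicy_level"), 2) + pow(col("c1.sweet_level") - col("c2.sweet_level"), 2))).alias("euclidean_distance"))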
    age_cluster_relationship = clustered_users.withColumn("age_group", when(col("age") < 25, "young").when(col("age") < 40, "middle").otherwise("senior")).groupBy("age_group", "taste_cluster").agg(count("user_id").alias("age_cluster_count"))
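    # Re-assign an 80% sample of the raw users with the fitted model. Sampling user_data rather than clustered_users avoids re-creating the taste_features column, which VectorAssembler rejects.
    cluster_stability_analysis = kmeans_model.transform(assembler.transform(user_data.sample(fraction=0.8, seed=42))).groupBy("taste_cluster").agg(count("user_id").alias("stable_count"))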
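    # Euclidean distance from each user to the center of its assigned cluster, built from column expressions. The threshold of 3 is kept from the original.
    distance_to_center = None
    for cluster_id, center in enumerate(cluster_centers):
        squared_distance = lit(0.0)
        for feature_index, feature_name in enumerate(feature_columns):
            squared_distance = squared_distance + pow(col(feature_name) - lit(float(center[feature_index])), 2)
        is_assigned = col("taste_cluster") == cluster_id
        distance_to_center = when(is_assigned, sqrt(squared_distance)) if distance_to_center is None else distance_to_center.when(is_assigned, sqrt(squared_distance))
    outlier_detection = clustered_users.withColumn("distance_to_center", distance_to_center).withColumn("is_outlier", when(col("distance_to_center") > 3, 1).otherwise(0))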
    cluster_behavioral_pattern = clustered_users.groupBy("taste_cluster").agg(avg("dining_frequency").alias("avg_dining_freq"), mode("cuisine_preference").alias("preferred_cuisine"), collect_list("region").alias("cluster_regions"))
    intra_cluster_variance = clustered_users.groupBy("taste_cluster").agg(variance("spicy_level").alias("spicy_variance"), variance("sweet_level").alias("sweet_variance")).withColumn("total_variance", col("spicy_variance") + col("sweet_variance"))
    result_data = {"cluster_stats": cluster_statistics.toPandas().to_dict("records"), "demographic_profile": cluster_demographic.toPandas().to_dict("records"), "regional_distribution": regional_cluster_distribution.toPandas().to_dict("records"), "taste_profile": cluster_taste_profile.toPandas().to_dict("records"), "dominant_cuisine": dominant_cuisine_by_cluster.toPandas().to_dict("records"), "behavioral_pattern": cluster_behavioral_pattern.toPandas().to_dict("records")}
    return JsonResponse({"success": True, "data": result_data, "model_info": {"cluster_count": 5, "centers": [center.tolist() for center in cluster_centers]}})
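The outlier step in `user_clustering_analysis` is easier to follow on a toy example. The sketch below, in plain Python, computes each user's Euclidean distance to the center of the cluster it was assigned to and flags the user when that distance exceeds a threshold. The centers, the users, and the helper `flag_outliers` are hypothetical values chosen for illustration. Only the threshold of 3 and the output field names come from the code above.

```python
import math

# Hypothetical cluster centers and users, each described by (spicy_level, sweet_level).
centers = [(8.0, 3.0), (2.0, 7.0)]
users = [
    {"user_id": 1, "features": (8.0, 3.0), "cluster": 0},
    {"user_id": 2, "features": (5.0, 7.0), "cluster": 1},
    {"user_id": 3, "features": (2.0, 2.0), "cluster": 1},
]


def flag_outliers(rows, cluster_centers, threshold=3.0):
    """Attach the distance to the assigned center and a 0/1 outlier flag to each row."""
    flagged = []
    for row in rows:
        center = cluster_centers[row["cluster"]]
        distance = math.dist(row["features"], center)  # Euclidean distance
        flagged.append({
            **row,
            "distance_to_center": distance,
            "is_outlier": int(distance > threshold),
        })
    return flagged


result = flag_outliers(users, centers)
for row in result:
    print(row["user_id"], row["distance_to_center"], row["is_outlier"])
```

Because the features are not rescaled here, a fixed threshold only makes sense when all features share a similar range. With mixed features such as age and taste levels, standardizing them first would likely give more meaningful distances.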

Documentation Showcase of the Food Taste Difference Data Analysis System Based on Big Data

[Documentation screenshot]
