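Python Big Data Graduation Project in Practice: A Guide to a Hadoop+Django Food Nutrition Data Visualization and Analysis System. Graduation project / topic recommendations / deep learning / data analysis / machine learning / data mining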

计算机毕设指导师

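⭐⭐About me: I really enjoy digging into technical problems! I specialize in hands-on projects in Java, Python, mini programs, Android, big data, web crawlers, Golang, and dashboard screens.

Feel free to like, bookmark, and follow; if you have questions, leave a comment.

Hands-on projects: if you have questions about the source code or the technology, you are welcome to discuss them in the comments!

⚡⚡If you run into a specific technical problem or need help with a computer science graduation project, you can also contact me via my profile page ↑↑

⚡⚡Source code via my profile page -->: 计算机毕设指导师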

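Food Nutrition Data Visualization and Analysis System - Introduction

The Hadoop+Django food nutrition data visualization and analysis system is an analysis platform that combines big data processing with web-based visualization. It uses the Hadoop Distributed File System (HDFS) as the storage layer, the Spark engine to process and analyze large volumes of nutrition data, and the Django web framework to build the user-facing interface, which presents food nutrition information along multiple dimensions. The core functionality consists of five modules: macro nutrition landscape analysis, ranking and filtering by specific nutrient, food category comparison, dietary health risk factor analysis, and machine-learning-based advanced algorithm exploration. The platform separates front end and back end: the front end builds interactive visualizations with Vue, ElementUI, and Echarts, while the Django back end integrates Spark SQL, Pandas, and NumPy to cover the full data pipeline, from collecting, cleaning, and storing raw nutrition data to multidimensional statistics, clustering, and visualization. Supported analyses include calorie distribution statistics, nutrient leaderboards, food clustering, and nutrient correlation exploration, giving users data-backed support for dietary decisions.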
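
The pipeline described above starts with a cleaning pass over the raw data. The post does not show that step, so the following is only a minimal sketch of what it might look like, using the Python standard library. The column names mirror those used in the code listing later in this post; `clean_rows`, `NUMERIC_COLUMNS`, and the sample rows are hypothetical and purely illustrative.

```python
import csv
import io

# Hypothetical subset of numeric columns; the real dataset has more.
NUMERIC_COLUMNS = ["热量", "蛋白质", "脂肪", "碳水化合物"]


def clean_rows(csv_text):
    """Parse CSV text, drop rows without a food name, and coerce numeric
    fields to float (None when a value is missing, negative, or not a number)."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = (row.get("食物名称") or "").strip()
        if not name:
            continue  # a row without a food name cannot be ranked or displayed
        record = {"食物名称": name}
        for column in NUMERIC_COLUMNS:
            raw = (row.get(column) or "").strip()
            try:
                value = float(raw)
            except ValueError:
                value = None
            # Negative (or NaN) amounts are treated as invalid in this sketch.
            record[column] = value if value is not None and value >= 0 else None
        cleaned.append(record)
    return cleaned


# Illustrative input only, not taken from the project's dataset.
sample = """食物名称,热量,蛋白质,脂肪,碳水化合物
苹果,52,0.3,0.2,13.8
,100,1,1,1
鸡胸肉,165,31,3.6,N/A
"""
rows = clean_rows(sample)
```

Keeping invalid values as `None` rather than dropping the whole row leaves the decision to each analysis, which matches how the queries in the listing filter with `IS NOT NULL` per column.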

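Food Nutrition Data Visualization and Analysis System - Technology

Development language: Java or Python

Database: MySQL

System architecture: B/S

Front end: Vue+ElementUI+HTML+CSS+JavaScript+jQuery+Echarts

Big data framework: Hadoop+Spark (Hive is not used in this build; customization is available)

Back-end framework: Django+Spring Boot (Spring+SpringMVC+MyBatis)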

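Food Nutrition Data Visualization and Analysis System - Background

As the pace of modern life quickens and diets become more varied, people pay increasing attention to the nutritional content of food, and a balanced diet has become an important factor in staying healthy. Traditional nutrition analysis tends to rely on a single data source and simple statistics; faced with large volumes of food nutrition data, it struggles to uncover the relationships and patterns among foods. Meanwhile, the rapid development of big data technology has opened new opportunities for nutrition data analysis: the Hadoop ecosystem is well suited to processing large-scale structured data, and the Spark engine handles complex analysis tasks efficiently. Most existing nutrition analysis tools, however, are not built on big data technology and fall short in processing capacity, depth of analysis, and visualization. A Hadoop+Django food nutrition data visualization and analysis system can apply big data technology to large volumes of nutrition data and offer a more systematic and comprehensive technical solution for nutrition analysis.

This project has both theoretical and practical value. Technically, the system combines Hadoop distributed storage with the Spark engine to provide a scalable processing solution for nutrition data, demonstrating that big data technology is feasible in this specific domain. By integrating a machine learning clustering algorithm, the system can uncover latent patterns among foods in a data-driven way, giving nutrition research an additional analytical perspective. In practical terms, the system gives ordinary users a convenient tool for looking up and comparing nutrition data, helping them understand what foods contain and make more reasonable dietary choices. For dietitians and health management practitioners, the multidimensional analysis features can support professional dietary guidance and nutrition assessment. As a graduation project, the system is limited in functional complexity and data scale, but its architecture and analysis methods leave room for extension and refinement. Working through the project also deepens understanding of the big data technology stack and lays a foundation for further study in related fields.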

Food Nutrition Data Visualization and Analysis System - Video Demo

www.bilibili.com/video/BV1PY…  

Food Nutrition Data Visualization and Analysis System - Screenshots

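Screenshot: Cover image (Python big data project in practice: a graduation project guide to the Hadoop+Django nutrition data visualization and analysis system)

Screenshot: Login

Screenshot: Advanced algorithm exploration analysis

Screenshot: Macro nutrition landscape analysis

Screenshot: Dietary health risk analysis

Screenshot: Food category comparison analysis

Screenshot: Food nutrition ranking analysis

Screenshot: Food nutrition information

Screenshot: Data dashboard

Screenshot: Users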

Food Nutrition Data Visualization and Analysis System - Code

from pyspark.sql import SparkSession  # required by SparkSession.builder below
from pyspark.sql.functions import avg, count, desc, asc, col, when, expr
from pyspark.ml.feature import VectorAssembler, StandardScaler
from pyspark.ml.clustering import KMeans
from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt
import pandas as pd
import numpy as np
import json

# One SparkSession shared by all views, with adaptive query execution enabled.
spark = (
    SparkSession.builder
    .appName("NutritionAnalysis")
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
    .getOrCreate()
)

@csrf_exempt
def macro_nutrition_analysis(request):
    """Return dataset-wide nutrition statistics as JSON."""
    # The cleaned per-100 g dataset is re-read from HDFS on every request.
    nutrition_df = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("hdfs://localhost:9000/nutrition_data/cleaned_nutrition_dataset_per100g.csv")
    nutrition_df.createOrReplaceTempView("nutrition_table")
    
    core_stats = spark.sql("""
        SELECT 
            ROUND(AVG(热量), 2) as avg_calories,
            ROUND(AVG(蛋白质), 2) as avg_protein,
            ROUND(AVG(脂肪), 2) as avg_fat,
            ROUND(AVG(碳水化合物), 2) as avg_carbs,
            ROUND(AVG(钠), 2) as avg_sodium,
            ROUND(AVG(糖), 2) as avg_sugar,
            ROUND(STDDEV(热量), 2) as std_calories,
            ROUND(STDDEV(蛋白质), 2) as std_protein,
            ROUND(STDDEV(脂肪), 2) as std_fat,
            ROUND(STDDEV(碳水化合物), 2) as std_carbs
        FROM nutrition_table
    """).collect()[0]
    
    # Bucket foods by calories. The ranges are contiguous, so a fractional
    # value such as 100.5 lands in a bucket instead of falling through to '未知'.
    calorie_distribution = spark.sql("""
        SELECT 
            CASE 
                WHEN 热量 >= 0 AND 热量 <= 100 THEN '0-100大卡'
                WHEN 热量 > 100 AND 热量 <= 200 THEN '101-200大卡'
                WHEN 热量 > 200 AND 热量 <= 300 THEN '201-300大卡'
                WHEN 热量 > 300 AND 热量 <= 400 THEN '301-400大卡'
                WHEN 热量 > 400 THEN '400+大卡'
                ELSE '未知'
            END as calorie_range,
            COUNT(*) as food_count
        FROM nutrition_table
        WHERE 热量 IS NOT NULL
        GROUP BY 
            CASE 
                WHEN 热量 >= 0 AND 热量 <= 100 THEN '0-100大卡'
                WHEN 热量 > 100 AND 热量 <= 200 THEN '101-200大卡'
                WHEN 热量 > 200 AND 热量 <= 300 THEN '201-300大卡'
                WHEN 热量 > 300 AND 热量 <= 400 THEN '301-400大卡'
                WHEN 热量 > 400 THEN '400+大卡'
                ELSE '未知'
            END
        ORDER BY food_count DESC
    """).collect()
    
    macro_composition = spark.sql("""
        SELECT 
            ROUND(AVG(蛋白质), 2) as avg_protein_g,
            ROUND(AVG(脂肪), 2) as avg_fat_g,
            ROUND(AVG(碳水化合物), 2) as avg_carbs_g,
            ROUND(AVG(蛋白质) / (AVG(蛋白质) + AVG(脂肪) + AVG(碳水化合物)) * 100, 2) as protein_ratio,
            ROUND(AVG(脂肪) / (AVG(蛋白质) + AVG(脂肪) + AVG(碳水化合物)) * 100, 2) as fat_ratio,
            ROUND(AVG(碳水化合物) / (AVG(蛋白质) + AVG(脂肪) + AVG(碳水化合物)) * 100, 2) as carbs_ratio
        FROM nutrition_table
        WHERE 蛋白质 IS NOT NULL AND 脂肪 IS NOT NULL AND 碳水化合物 IS NOT NULL
    """).collect()[0]
    
    mineral_analysis = spark.sql("""
        SELECT 
            ROUND(AVG(钙), 2) as avg_calcium,
            ROUND(AVG(铁), 2) as avg_iron,
            ROUND(AVG(钠), 2) as avg_sodium,
            ROUND(STDDEV(钙), 2) as std_calcium,
            ROUND(STDDEV(铁), 2) as std_iron,
            ROUND(STDDEV(钠), 2) as std_sodium,
            ROUND(PERCENTILE_APPROX(钙, 0.5), 2) as median_calcium,
            ROUND(PERCENTILE_APPROX(铁, 0.5), 2) as median_iron,
            ROUND(PERCENTILE_APPROX(钠, 0.5), 2) as median_sodium
        FROM nutrition_table
        WHERE 钙 IS NOT NULL AND 铁 IS NOT NULL AND 钠 IS NOT NULL
    """).collect()[0]
    
    vitamin_analysis = spark.sql("""
        SELECT 
            ROUND(AVG(维生素C), 2) as avg_vitamin_c,
            ROUND(AVG(维生素B11), 2) as avg_vitamin_b11,
            ROUND(STDDEV(维生素C), 2) as std_vitamin_c,
            ROUND(STDDEV(维生素B11), 2) as std_vitamin_b11,
            COUNT(CASE WHEN 维生素C > 50 THEN 1 END) as high_vitamin_c_count,
            COUNT(CASE WHEN 维生素B11 > 100 THEN 1 END) as high_vitamin_b11_count
        FROM nutrition_table
        WHERE 维生素C IS NOT NULL AND 维生素B11 IS NOT NULL
    """).collect()[0]
    
    result_data = {
        'core_nutrition_stats': core_stats.asDict(),
        'calorie_distribution': [row.asDict() for row in calorie_distribution],
        'macro_composition': macro_composition.asDict(),
        'mineral_analysis': mineral_analysis.asDict(),
        'vitamin_analysis': vitamin_analysis.asDict()
    }
    
    return JsonResponse(result_data, safe=False)
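
The calorie-range query in the view above can be mirrored in plain Python, which makes the boundary behaviour easy to check without a Spark cluster. This is only a sketch: `calorie_bucket` and `calorie_distribution` are hypothetical helpers, and the contiguous ranges with inclusive upper bounds are an assumption about the intended bucketing.

```python
from collections import Counter

# (inclusive upper bound, label) pairs, using the same labels as the SQL query.
BUCKETS = [(100, "0-100大卡"), (200, "101-200大卡"),
           (300, "201-300大卡"), (400, "301-400大卡")]


def calorie_bucket(kcal):
    """Return the calorie-range label for one food (kcal per 100 g)."""
    if kcal is None or kcal < 0:
        return "未知"
    for upper, label in BUCKETS:
        if kcal <= upper:
            return label
    return "400+大卡"


def calorie_distribution(values):
    """Count foods per bucket, skipping missing values like the WHERE clause."""
    counts = Counter(calorie_bucket(v) for v in values if v is not None)
    # Most populated bucket first, as in ORDER BY food_count DESC.
    return counts.most_common()
```

For example, `calorie_bucket(100.5)` falls into the second range, whereas integer-only `BETWEEN` bounds would leave such a value unclassified.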

@csrf_exempt
def nutrition_ranking_analysis(request):
    """Return top-N rankings for individual nutrients and a composite score as JSON."""
    nutrition_df = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("hdfs://localhost:9000/nutrition_data/cleaned_nutrition_dataset_per100g.csv")
    nutrition_df.createOrReplaceTempView("nutrition_ranking")
    
    protein_ranking = spark.sql("""
        SELECT 食物名称, ROUND(蛋白质, 2) as protein_content
        FROM nutrition_ranking
        WHERE 蛋白质 IS NOT NULL AND 蛋白质 > 0
        ORDER BY 蛋白质 DESC
        LIMIT 20
    """).collect()
    
    fiber_ranking = spark.sql("""
        SELECT 食物名称, ROUND(膳食纤维, 2) as fiber_content
        FROM nutrition_ranking
        WHERE 膳食纤维 IS NOT NULL AND 膳食纤维 > 0
        ORDER BY 膳食纤维 DESC
        LIMIT 20
    """).collect()
    
    calcium_ranking = spark.sql("""
        SELECT 食物名称, ROUND(钙, 2) as calcium_content
        FROM nutrition_ranking
        WHERE 钙 IS NOT NULL AND 钙 > 0
        ORDER BY 钙 DESC
        LIMIT 20
    """).collect()
    
    iron_ranking = spark.sql("""
        SELECT 食物名称, ROUND(铁, 2) as iron_content
        FROM nutrition_ranking
        WHERE 铁 IS NOT NULL AND 铁 > 0
        ORDER BY 铁 DESC
        LIMIT 20
    """).collect()
    
    low_calorie_ranking = spark.sql("""
        SELECT 食物名称, ROUND(热量, 2) as calorie_content
        FROM nutrition_ranking
        WHERE 热量 IS NOT NULL AND 热量 > 0
        ORDER BY 热量 ASC
        LIMIT 20
    """).collect()
    
    low_sodium_ranking = spark.sql("""
        SELECT 食物名称, ROUND(钠, 2) as sodium_content
        FROM nutrition_ranking
        WHERE 钠 IS NOT NULL AND 钠 >= 0
        ORDER BY 钠 ASC
        LIMIT 20
    """).collect()
    
    protein_density_analysis = spark.sql("""
        SELECT 
            食物名称,
            ROUND(蛋白质, 2) as protein_content,
            ROUND(热量, 2) as calorie_content,
            ROUND(蛋白质 / 热量 * 100, 3) as protein_calorie_ratio
        FROM nutrition_ranking
        WHERE 蛋白质 IS NOT NULL AND 热量 IS NOT NULL AND 热量 > 0 AND 蛋白质 > 0
        ORDER BY protein_calorie_ratio DESC
        LIMIT 15
    """).collect()
    
    comprehensive_ranking = spark.sql("""
        SELECT 
            食物名称,
            ROUND(蛋白质, 2) as protein,
            ROUND(膳食纤维, 2) as fiber,
            ROUND(钙, 2) as calcium,
            ROUND(铁, 2) as iron,
            ROUND(热量, 2) as calories,
            ROUND((蛋白质 * 0.3 + 膳食纤维 * 0.25 + 钙 * 0.002 + 铁 * 0.1 - 热量 * 0.001), 3) as nutrition_score
        FROM nutrition_ranking
        WHERE 蛋白质 IS NOT NULL AND 膳食纤维 IS NOT NULL AND 钙 IS NOT NULL AND 铁 IS NOT NULL AND 热量 IS NOT NULL
        ORDER BY nutrition_score DESC
        LIMIT 15
    """).collect()
    
    ranking_data = {
        'protein_champions': [row.asDict() for row in protein_ranking],
        'fiber_stars': [row.asDict() for row in fiber_ranking],
        'calcium_experts': [row.asDict() for row in calcium_ranking],
        'iron_specialists': [row.asDict() for row in iron_ranking],
        'low_calorie_options': [row.asDict() for row in low_calorie_ranking],
        'low_sodium_choices': [row.asDict() for row in low_sodium_ranking],
        'protein_density_leaders': [row.asDict() for row in protein_density_analysis],
        'comprehensive_nutrition_ranking': [row.asDict() for row in comprehensive_ranking]
    }
    
    return JsonResponse(ranking_data, safe=False)
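
The composite score in the last query of the view above is a plain weighted sum. The sketch below reproduces it in Python with the same weights so the arithmetic can be checked by hand; `nutrition_score` and `top_foods` are hypothetical names, and the sketch is not part of the project's code.

```python
# Weights copied from the comprehensive_ranking query; calories count against a food.
WEIGHTS = {"蛋白质": 0.3, "膳食纤维": 0.25, "钙": 0.002, "铁": 0.1, "热量": -0.001}


def nutrition_score(food):
    """Weighted sum of per-100 g nutrient amounts, rounded to 3 decimals."""
    return round(sum(food[name] * weight for name, weight in WEIGHTS.items()), 3)


def top_foods(foods, limit=15):
    """Rank foods by score, skipping any food with a missing nutrient value."""
    complete = [f for f in foods if all(f.get(n) is not None for n in WEIGHTS)]
    return sorted(complete, key=nutrition_score, reverse=True)[:limit]
```

With 10 of protein, 4 of fiber, 100 of calcium, 2 of iron, and 200 of calories, the score is 3 + 1 + 0.2 + 0.2 - 0.2 = 4.2. The weights are a heuristic from the query itself; how well they balance columns of different magnitude depends on the units used in the dataset.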

@csrf_exempt
def advanced_clustering_analysis(request):
    """Cluster foods by six nutrients with K-means (k=5) and return the results as JSON."""
    nutrition_df = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("hdfs://localhost:9000/nutrition_data/cleaned_nutrition_dataset_per100g.csv")
    
    feature_cols = ['蛋白质', '脂肪', '碳水化合物', '膳食纤维', '钠', '糖']
    clean_df = nutrition_df.select(['食物名称'] + feature_cols).dropna()
    
    assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
    assembled_df = assembler.transform(clean_df)
    
    scaler = StandardScaler(inputCol="features", outputCol="scaledFeatures", withStd=True, withMean=True)
    scaler_model = scaler.fit(assembled_df)
    scaled_df = scaler_model.transform(assembled_df)
    
    kmeans = KMeans(featuresCol="scaledFeatures", predictionCol="cluster", k=5, seed=42, maxIter=100, tol=1e-6)
    kmeans_model = kmeans.fit(scaled_df)
    clustered_df = kmeans_model.transform(scaled_df)
    
    clustered_df.createOrReplaceTempView("clustered_nutrition")
    
    cluster_characteristics = spark.sql("""
        SELECT 
            cluster,
            COUNT(*) as food_count,
            ROUND(AVG(蛋白质), 2) as avg_protein,
            ROUND(AVG(脂肪), 2) as avg_fat,
            ROUND(AVG(碳水化合物), 2) as avg_carbs,
            ROUND(AVG(膳食纤维), 2) as avg_fiber,
            ROUND(AVG(钠), 2) as avg_sodium,
            ROUND(AVG(糖), 2) as avg_sugar,
            ROUND(STDDEV(蛋白质), 2) as std_protein,
            ROUND(STDDEV(脂肪), 2) as std_fat,
            ROUND(STDDEV(碳水化合物), 2) as std_carbs
        FROM clustered_nutrition
        GROUP BY cluster
        ORDER BY cluster
    """).collect()
    
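    # Pick the 3 foods nearest to each cluster's mean. Note that distances here use
    # the raw nutrient values, while K-means itself ran on the standardized features.
    representative_foods = spark.sql("""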
        WITH cluster_centers AS (
            SELECT 
                cluster,
                AVG(蛋白质) as center_protein,
                AVG(脂肪) as center_fat,
                AVG(碳水化合物) as center_carbs,
                AVG(膳食纤维) as center_fiber,
                AVG(钠) as center_sodium,
                AVG(糖) as center_sugar
            FROM clustered_nutrition
            GROUP BY cluster
        ),
        distances AS (
            SELECT 
                c.食物名称,
                c.cluster,
                c.蛋白质, c.脂肪, c.碳水化合物, c.膳食纤维, c.钠, c.糖,
                SQRT(
                    POW(c.蛋白质 - cc.center_protein, 2) +
                    POW(c.脂肪 - cc.center_fat, 2) +
                    POW(c.碳水化合物 - cc.center_carbs, 2) +
                    POW(c.膳食纤维 - cc.center_fiber, 2) +
                    POW(c.钠 - cc.center_sodium, 2) +
                    POW(c.糖 - cc.center_sugar, 2)
                ) as distance_to_center,
                ROW_NUMBER() OVER (PARTITION BY c.cluster ORDER BY 
                    SQRT(
                        POW(c.蛋白质 - cc.center_protein, 2) +
                        POW(c.脂肪 - cc.center_fat, 2) +
                        POW(c.碳水化合物 - cc.center_carbs, 2) +
                        POW(c.膳食纤维 - cc.center_fiber, 2) +
                        POW(c.钠 - cc.center_sodium, 2) +
                        POW(c.糖 - cc.center_sugar, 2)
                    )
                ) as rank
            FROM clustered_nutrition c
            JOIN cluster_centers cc ON c.cluster = cc.cluster
        )
        SELECT 
            cluster,
            食物名称,
            ROUND(蛋白质, 2) as protein,
            ROUND(脂肪, 2) as fat,
            ROUND(碳水化合物, 2) as carbs,
            ROUND(膳食纤维, 2) as fiber,
            ROUND(钠, 2) as sodium,
            ROUND(糖, 2) as sugar,
            ROUND(distance_to_center, 3) as distance
        FROM distances
        WHERE rank <= 3
        ORDER BY cluster, rank
    """).collect()
    
    cluster_distribution = spark.sql("""
        SELECT 
            cluster,
            COUNT(*) as count,
            ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 2) as percentage
        FROM clustered_nutrition
        GROUP BY cluster
        ORDER BY cluster
    """).collect()
    
    cluster_nutrition_profiles = []
    for cluster_info in cluster_characteristics:
        cluster_id = cluster_info['cluster']
        profile = {
            'cluster_id': cluster_id,
            'food_count': cluster_info['food_count'],
            'nutrition_profile': {
                '蛋白质': cluster_info['avg_protein'],
                '脂肪': cluster_info['avg_fat'],
                '碳水化合物': cluster_info['avg_carbs'],
                '膳食纤维': cluster_info['avg_fiber'],
                '钠': cluster_info['avg_sodium'],
                '糖': cluster_info['avg_sugar']
            },
            'variability': {
                '蛋白质标准差': cluster_info['std_protein'],
                '脂肪标准差': cluster_info['std_fat'],
                '碳水化合物标准差': cluster_info['std_carbs']
            }
        }
        cluster_nutrition_profiles.append(profile)
    
    clustering_results = {
        'cluster_characteristics': [row.asDict() for row in cluster_characteristics],
        'representative_foods': [row.asDict() for row in representative_foods],
        'cluster_distribution': [row.asDict() for row in cluster_distribution],
        'nutrition_profiles': cluster_nutrition_profiles,
        'clustering_summary': {
            'total_clusters': 5,
            'total_foods_analyzed': sum([row['food_count'] for row in cluster_characteristics]),
            'feature_dimensions': len(feature_cols)
        }
    }
    
    return JsonResponse(clustering_results, safe=False)
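
The clustering view fits K-means on standardized features but selects representative foods by distance to the cluster mean in raw units, so a column with a large numeric range can dominate that distance. One alternative, sketched here in plain Python under the assumption that distances should be measured in the same standardized space, is shown below. `standardize` and `closest_to_center` are hypothetical helpers, not the project's implementation.

```python
import math
from statistics import mean, stdev


def standardize(rows):
    """Z-score each column: subtract the column mean, divide by the sample std."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [stdev(col) or 1.0 for col in columns]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def closest_to_center(names, rows, labels, top_n=3):
    """For each cluster, return the top_n names nearest to the cluster mean,
    with distances measured in standardized feature space."""
    scaled = standardize(rows)
    dims = len(scaled[0])
    result = {}
    for cluster in sorted(set(labels)):
        members = [i for i, label in enumerate(labels) if label == cluster]
        center = [mean(scaled[i][d] for i in members) for d in range(dims)]
        ranked = sorted(members, key=lambda i: math.dist(scaled[i], center))
        result[cluster] = [names[i] for i in ranked[:top_n]]
    return result
```

Here `rows` holds one feature list per food and `labels` the cluster id assigned to each food. In Spark, the same idea would amount to computing distances on the `scaledFeatures` column rather than on the original nutrient columns.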

 

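Food Nutrition Data Visualization and Analysis System - Conclusion

Is your graduation project topic too ordinary? A Hadoop+Django food nutrition data visualization and analysis system can help you stand out.

If you found this article useful, please like, comment, and share, and feel free to follow me; that is the biggest support you can give.

I also look forward to your thoughts and suggestions in the comments or by private message. Thank you all!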
